#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

                             CQL BINARY PROTOCOL v5


Table of Contents

  1. Overview
  2. Frame format
    2.1. Uncompressed Format
    2.2. Compressed Format
    2.3 Protocol Negotiation
      2.3.1 Initial Handshake
      2.3.2 Compression
    2.4. Frame Payload
      2.4.1 Frame Header
        2.4.1.1. version
        2.4.1.2. flags
        2.4.1.3. stream
        2.4.1.4. opcode
        2.4.1.5. length
  3. Notations
  4. Messages
    4.1. Requests
      4.1.1. STARTUP
      4.1.2. AUTH_RESPONSE
      4.1.3. OPTIONS
      4.1.4. QUERY
      4.1.5. PREPARE
      4.1.6. EXECUTE
      4.1.7. BATCH
      4.1.8. REGISTER
    4.2. Responses
      4.2.1. ERROR
      4.2.2. READY
      4.2.3. AUTHENTICATE
      4.2.4. SUPPORTED
      4.2.5. RESULT
        4.2.5.1. Void
        4.2.5.2. Rows
        4.2.5.3. Set_keyspace
        4.2.5.4. Prepared
        4.2.5.5. Schema_change
      4.2.6. EVENT
      4.2.7. AUTH_CHALLENGE
      4.2.8. AUTH_SUCCESS
  5. Data Type Serialization Formats
  6. User Defined Type Serialization
  7. Result paging
  8. Error codes
  9. Changes from v4


1. Overview

  The CQL binary protocol is a frame-based protocol in which each frame comprises a header,
  a payload and a trailer. In v5 there are two distinct frame formats, compressed and
  uncompressed; in both cases, the payload is a stream of CQL envelopes (Section 2.4). Each
  envelope contains a single CQL message, along with a metadata header. In effect, the v5
  framing format is a simple wrapper around protocol v5 envelopes.

  In either format, a frame may or may not be self-contained. If self-contained, the
  payload includes one or more complete envelopes and can be fully processed immediately.
  Otherwise, the payload contains part of a large envelope, which has been split into
  its own sequence of frames. These are expected to be transmitted/received in order, so
  the receiver can accumulate them as they arrive and process them once all have been received.

  The frame header contains length information for the payload, a flag to indicate whether
  or not the frame is self-contained, and a CRC24 to assert the integrity of the header itself.
  There are slight variations in the header format between the compressed and uncompressed
  variants.

  The payload is opaque as far as the framing format is concerned, modulo the self-contained
  variation.

  The trailer contains a CRC32 to protect the integrity of the payload, covering all envelopes
  (whole or partial) contained therein.


2. Frame Format

2.1 Uncompressed Format

  The uncompressed variant uses a 6-byte header containing the payload length, the
  self-contained flag and a CRC24 for the header itself. The maximum payload size is
  2^17 - 1 bytes (just under 128KiB, the largest value the 17-bit length field can
  hold), and the payload is followed by its CRC32.

    1. Payload length (17 bits)
    2. isSelfContained flag (1 bit)
    3. Header padding (6 bits)
    4. CRC24 of the header (24 bits)
    5. Payload (up to 2^17 - 1 bytes)
    6. Payload CRC32 (32 bits)

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Payload Length          |C|  Padding  |               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        CRC24 of Header        |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
     |                                                               |
     +                                                               +
     |                            Payload                            |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       CRC32 of Payload                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.2 LZ4 Compressed Format

  The variant with LZ4 compression uses an 8-byte header, containing both the compressed
  and uncompressed lengths of the payload, the self-contained flag and a CRC24 for the
  header. As with uncompressed frames, the maximum payload size is 2^17 - 1 bytes and the
  payload is followed by a CRC32 trailer. This is the CRC of the compressed payload.

    1. Compressed length (17 bits)
    2. Uncompressed length (17 bits)
    3. isSelfContained flag (1 bit)
    4. Header padding (5 bits)
    5. CRC24 of Header contents (24 bits)
    6. Compressed Payload (up to 2^17 - 1 bytes)
    7. CRC32 of Compressed Payload (32 bits)

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        Compressed Length        |     Uncompressed Length
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |C| Padding |                CRC24 of Header                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     +                                                               +
     |                      Compressed Payload                       |
     +                                                               +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                  CRC32 of Compressed Payload                  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


2.3 Protocol Negotiation

2.3.1 Initial Handshake

  In order to support both v5 and earlier formats, the v5 framing format is not
  applied to message exchanges before an initial handshake is completed. Practically,
  this means that the initial STARTUP message and any OPTIONS messages which precede
  it are expected to be unframed. Likewise, the responses returned by the server
  (SUPPORTED in response to OPTIONS, and either READY or AUTHENTICATE in response to
  STARTUP) are transmitted unframed.

  After sending the READY or AUTHENTICATE response to a STARTUP message, the server
  will begin encoding and decoding all further transmissions according to the protocol
  version of that STARTUP message. Compression of the frames is dictated by the
  COMPRESSION option sent in the STARTUP message. Only LZ4 compression is currently
  supported for v5.

  Note: OPTIONS requests may be sent by the client at any time in the connection
  lifecycle, both before and after the STARTUP exchange. As mentioned, those
  transmitted before STARTUP, as well as the SUPPORTED responses the server returns,
  are unframed. Any OPTIONS/SUPPORTED exchanges after the STARTUP handshake are
  formatted according to the negotiated protocol version, so for v5 these must be
  framed.
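The field list in Section 2.1 can be illustrated with a small Go sketch that packs and unpacks the 6-byte uncompressed frame header. This is a sketch under stated assumptions, not a reference implementation: it assumes the fields are packed least-significant-bit first into a 48-bit integer (payload length in bits 0-16, isSelfContained in bit 17, zero padding in bits 18-23, CRC24 in bits 24-47) that is written least significant byte first, and that the CRC24 covers the first three header bytes. This section does not define the CRC24 polynomial or initial value, so the checksum is injected through a hypothetical `CRC24Func` hook; check both the byte order and the checksum against a reference implementation before relying on them.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPayloadLength is the largest value the 17-bit length field can hold.
const maxPayloadLength = 1<<17 - 1

// FrameHeader holds the decoded fields of an uncompressed v5 frame header.
type FrameHeader struct {
	PayloadLength   int
	IsSelfContained bool
}

// CRC24Func is a hypothetical hook returning a 24-bit checksum of b.
// The real CRC24 parameters are not specified in this section.
type CRC24Func func(b []byte) uint32

// EncodeHeader packs h into 6 bytes. Assumed layout: a 48-bit integer with
// the length in bits 0-16, the flag in bit 17, zero padding in bits 18-23
// and the CRC24 in bits 24-47, written least significant byte first.
func EncodeHeader(h FrameHeader, crc24 CRC24Func) ([6]byte, error) {
	var out [6]byte
	if h.PayloadLength < 0 || h.PayloadLength > maxPayloadLength {
		return out, fmt.Errorf("payload length %d outside 0..%d", h.PayloadLength, maxPayloadLength)
	}
	v := uint64(h.PayloadLength)
	if h.IsSelfContained {
		v |= 1 << 17
	}
	// Write the first 3 bytes (length, flag, padding); the CRC is computed over them.
	for i := 0; i < 3; i++ {
		out[i] = byte(v >> (8 * i))
	}
	v |= uint64(crc24(out[:3])&0xFFFFFF) << 24
	for i := 3; i < 6; i++ {
		out[i] = byte(v >> (8 * i))
	}
	return out, nil
}

// DecodeHeader verifies the CRC24 and unpacks the fields of a 6-byte header.
func DecodeHeader(b [6]byte, crc24 CRC24Func) (FrameHeader, error) {
	var v uint64
	for i := 0; i < 6; i++ {
		v |= uint64(b[i]) << (8 * i)
	}
	if uint32(v>>24) != crc24(b[:3])&0xFFFFFF {
		return FrameHeader{}, errors.New("frame header CRC24 mismatch")
	}
	return FrameHeader{
		PayloadLength:   int(v & maxPayloadLength),
		IsSelfContained: v&(1<<17) != 0,
	}, nil
}

func main() {
	// Toy checksum for demonstration only; it is NOT the protocol's CRC24.
	toyCRC := func(b []byte) uint32 {
		return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16
	}
	hdr, err := EncodeHeader(FrameHeader{PayloadLength: 1000, IsSelfContained: true}, toyCRC)
	if err != nil {
		panic(err)
	}
	back, err := DecodeHeader(hdr, toyCRC)
	fmt.Println(back, err) // {1000 true} <nil>
}
```

Keeping the checksum pluggable means the bit packing can be exercised on its own; the 8-byte LZ4 header from Section 2.2 would follow the same pattern with an extra 17-bit length field.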

2.3.2 Compression

  Before compression can be used, client and server must agree on a compression
  algorithm, which is done in the STARTUP message. As a consequence, a STARTUP message
  must never be compressed. However, once the STARTUP message has been received
  by the server, messages can be compressed (including the response to the STARTUP
  request). Frames do not have to be compressed, however, even if compression has
  been agreed upon (a sender may, at its discretion, compress only frames above a
  certain size). Where compression has been agreed, the sender signals that the payload
  is not compressed by setting the compressed length to 0.

  As of v5 of the protocol, the only compression available is lz4
  (https://code.google.com/p/lz4/).


2.4. Frame Payload

  Envelopes are defined as:

      0         8         16        24        32        40
      +---------+---------+---------+---------+---------+
      | version |  flags  |      stream       | opcode  |
      +---------+---------+---------+---------+---------+
      |                length                 |
      +---------+---------+---------+---------+
      |                                       |
      .             ... body ...              .
      .                                       .
      .                                       .
      +----------------------------------------

  The protocol is big-endian (network byte order).

  Each envelope contains a fixed size header (9 bytes) followed by a variable size
  body. The header is described in Section 2.4.1. The content of the body depends
  on the header opcode value (the body can in particular be empty for some
  opcode values). The list of allowed opcodes is defined in Section 2.4.1.4 and the
  details of each corresponding message are described in Section 4.

  The protocol distinguishes two types of envelope: requests and responses. Requests
  are those envelopes sent by the client to the server. Responses are those envelopes sent
  by the server to the client. Note, however, that the protocol supports server pushes
  (events), so a response does not necessarily come right after a client request.
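As a sketch of the 9-byte envelope header described above, the following Go code encodes and decodes the version, flags, stream, opcode and length fields in big-endian order. The type and function names are hypothetical. The stream id is modelled as a signed 16-bit value because Section 2.4.1.3 reserves negative ids for server-initiated streams (EVENT messages use -1), and the direction bit is split out of the version byte as described in Section 2.4.1.1.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// envelopeHeaderSize: version (1) + flags (1) + stream (2) + opcode (1) + length (4).
const envelopeHeaderSize = 9

// EnvelopeHeader holds the fixed-size header that precedes every envelope body.
type EnvelopeHeader struct {
	IsResponse bool   // most significant bit of the version byte
	Version    uint8  // low 7 bits of the version byte (5 for v5)
	Flags      uint8  // e.g. 0x02 tracing, 0x04 custom payload, 0x08 warning
	Stream     int16  // negative ids are reserved for server-initiated streams
	Opcode     uint8  // e.g. 0x01 STARTUP, 0x07 QUERY, 0x0C EVENT
	Length     uint32 // length in bytes of the body that follows the header
}

// Encode serializes the header in big-endian (network) byte order.
func (h EnvelopeHeader) Encode() []byte {
	b := make([]byte, envelopeHeaderSize)
	b[0] = h.Version & 0x7F
	if h.IsResponse {
		b[0] |= 0x80
	}
	b[1] = h.Flags
	binary.BigEndian.PutUint16(b[2:4], uint16(h.Stream))
	b[4] = h.Opcode
	binary.BigEndian.PutUint32(b[5:9], h.Length)
	return b
}

// DecodeEnvelopeHeader parses the first 9 bytes of b.
func DecodeEnvelopeHeader(b []byte) (EnvelopeHeader, error) {
	if len(b) < envelopeHeaderSize {
		return EnvelopeHeader{}, errors.New("truncated envelope header")
	}
	return EnvelopeHeader{
		IsResponse: b[0]&0x80 != 0,
		Version:    b[0] & 0x7F,
		Flags:      b[1],
		Stream:     int16(binary.BigEndian.Uint16(b[2:4])),
		Opcode:     b[4],
		Length:     binary.BigEndian.Uint32(b[5:9]),
	}, nil
}

func main() {
	// A server-pushed EVENT (opcode 0x0C) on stream -1 with a 42-byte body.
	h := EnvelopeHeader{IsResponse: true, Version: 5, Stream: -1, Opcode: 0x0C, Length: 42}
	enc := h.Encode()
	fmt.Printf("% x\n", enc) // 85 00 ff ff 0c 00 00 00 2a
	back, err := DecodeEnvelopeHeader(enc)
	fmt.Println(back == h, err) // true <nil>
}
```

A reader would typically decode the header first and then read exactly `Length` more bytes for the body, ignoring any trailing body bytes it does not understand.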

  Note to client implementors: client libraries should always assume that the
  body of a given envelope may contain more data than what is described in this
  document. It will however always be safe to ignore the remainder of the body
  in such cases. The reason is that this may enable extending the protocol
  with optional features without needing to change the protocol version.

  Envelope headers are designed to support backwards compatibility with earlier
  protocol versions. For that reason, they include an unused leading byte in place
  of the version field from previous protocol versions. This was always to some extent
  redundant as the version is set and enforced at the connection level. It was also
  previously possible to enable compression for an individual envelope. This is no
  longer possible, as the framing format is responsible for compression, which is set for
  the lifetime of a connection and applies to all messages transmitted throughout it
  (see Section 2.3.2 for caveats). The compression flag is therefore deprecated and
  ignored in protocol v5.

2.4.1.1. version

  The version is a single byte that indicates both the direction of the message
  (request or response) and the version of the protocol in use. The most
  significant bit of version is used to define the direction of the message:
  0 indicates a request, 1 indicates a response. This can be useful for protocol
  analyzers to distinguish the nature of the packet from the direction in which
  it is moving. The rest of that byte is the protocol version (5 for the protocol
  defined in this document). In other words, for this version of the protocol,
  version will be one of:
    0x05    Request frame for this protocol version
    0x85    Response frame for this protocol version

  Please note that while every message ships with the version, only one version
  of messages is accepted on a given connection.
  In other words, the first message exchanged (STARTUP) sets the version for the
  lifetime of the connection.

  This document describes version 5 of the protocol. For the changes made since
  version 4, see Section 9.

2.4.1.2. flags

  Flags applying to this envelope. The flags have the following meaning (described
  by the mask that allows selecting them):
    0x01: Compression flag. In protocol v5 this flag is deprecated and ignored.
    0x02: Tracing flag. For a request, this indicates the client requires tracing
          of the request. Note that only QUERY, PREPARE and EXECUTE queries
          support tracing. Other requests will simply ignore the tracing flag if
          set. If a request supports tracing and the tracing flag is set, the
          response to this request will have the tracing flag set and contain
          tracing information.
          If a response has the tracing flag set, its body contains a tracing ID.
          The tracing ID is a [uuid] and is the first thing in the body.
    0x04: Custom payload flag. For a request or response, this indicates that a
          generic key-value custom payload for a custom QueryHandler implementation
          is present. Such a custom payload is simply ignored by the default
          QueryHandler implementation. Currently, only QUERY, PREPARE, EXECUTE and
          BATCH requests support custom payloads.
          The type of the custom payload is [bytes map] (see Section 3). If either
          or both of the tracing and warning flags are set, the custom payload will
          follow those indicated elements in the body. If neither is set, the custom
          payload will be the first value in the body.
    0x08: Warning flag. The response contains warnings which were generated by the
          server to go along with this response.
          If a response has the warning flag set, its body will contain the text of
          the warnings.
          The warnings are a [string list] and will be the first value
          in the body if the tracing flag is not set, or directly after the tracing
          ID if it is.
    0x10: Use beta flag. Indicates that the client opts in to use a protocol version
          that is currently in beta. The server will respond with an ERROR if the
          protocol version is marked as beta on the server and the client does not
          provide this flag.

  The remaining flags are currently unused and ignored.

2.4.1.3. stream

  An envelope has a stream id (a [short] value). When sending request messages, this
  stream id must be set by the client to a non-negative value (negative stream ids
  are reserved for streams initiated by the server; currently all EVENT messages
  (Section 4.2.6) have a stream id of -1). If a client sends a request message
  with the stream id X, it is guaranteed that the stream id of the response to
  that message will be X.

  This helps to enable the asynchronous nature of the protocol. If a client
  sends multiple messages simultaneously (without waiting for responses), there
  is no guarantee on the order of the responses. For instance, if the client
  writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the server might
  respond to REQ_3 (or REQ_2) first. Assigning different stream ids to these 3
  requests allows the client to determine which request a received answer
  responds to. As there can only be 32768 different simultaneous streams, it is up
  to the client to reuse stream ids.

  Note that clients are free to use the protocol synchronously (i.e. wait for
  the response to REQ_N before sending REQ_N+1). In that case, the stream id
  can be safely set to 0 as long as each frame contains only a single envelope.
  Clients should also feel free to use only a subset of the 32768 maximum possible
  stream ids if it is simpler for their implementation.

2.4.1.4. opcode

  An integer byte that distinguishes the actual message:
    0x00    ERROR
    0x01    STARTUP
    0x02    READY
    0x03    AUTHENTICATE
    0x05    OPTIONS
    0x06    SUPPORTED
    0x07    QUERY
    0x08    RESULT
    0x09    PREPARE
    0x0A    EXECUTE
    0x0B    REGISTER
    0x0C    EVENT
    0x0D    BATCH
    0x0E    AUTH_CHALLENGE
    0x0F    AUTH_RESPONSE
    0x10    AUTH_SUCCESS

  Messages are described in Section 4.

  (Note that there is no 0x04 message in this version of the protocol.)

2.4.1.5. length

  A 4-byte integer representing the length of the body of the envelope (note:
  currently an envelope body is limited to 256MB in length).


3. Notations

  To describe the layout of the envelope body for the messages in Section 4, we
  define the following:

    [int]              A 4-byte integer
    [long]             An 8-byte integer
    [byte]             A 1-byte unsigned integer
    [short]            A 2-byte unsigned integer
    [string]           A [short] n, followed by n bytes representing a UTF-8
                       string.
    [long string]      An [int] n, followed by n bytes representing a UTF-8 string.
    [uuid]             A 16-byte long uuid.
    [string list]      A [short] n, followed by n [string].
    [bytes]            An [int] n, followed by n bytes if n >= 0. If n < 0,
                       no byte should follow and the value represented is `null`.
    [value]            An [int] n, followed by n bytes if n >= 0.
                       If n == -1 no byte should follow and the value represented is `null`.
                       If n == -2 no byte should follow and the value represented is
                       `not set`, not resulting in any change to the existing value.
                       n < -2 is an invalid value and results in an error.
    [short bytes]      A [short] n, followed by n bytes if n >= 0.

    [unsigned vint]    An unsigned variable length integer. A vint is encoded with the
                       most significant byte (MSB) first. The most significant byte will
                       contain the information about how many extra bytes need to be
                       read, as well as the most significant bits of the integer.
                       The number of extra bytes to read is encoded as 1 bits on the left side.
                       For example, if we need to read 2 more bytes the first byte will start
                       with 110 (e.g. 256 000 will be encoded on 3 bytes as
                       [110]00011 11101000 00000000).
                       If the encoded integer is 8 bytes long, the vint will be encoded on
                       9 bytes and the first byte will be: 11111111

    [vint]             A signed variable length integer. This is encoded using zig-zag
                       encoding and then sent like an [unsigned vint]. Zig-zag encoding
                       converts numbers as follows:
                       0 = 0, -1 = 1, 1 = 2, -2 = 3, 2 = 4, -3 = 5, 3 = 6 and so forth.
                       The purpose is to send small negative values as small unsigned
                       values, so that we save bytes on the wire.
                       To encode a value n use "(n >> 31) ^ (n << 1)" for 32 bit values,
                       and "(n >> 63) ^ (n << 1)" for 64 bit values, where "^" is the xor
                       operation, "<<" is the left shift operation and ">>" is the
                       arithmetic right shift operation (highest-order bit is replicated).
                       Decode with "(n >> 1) ^ -(n & 1)", where ">>" is here a logical
                       (unsigned) right shift.

    [option]           A pair of <id><value> where <id> is a [short] representing
                       the option id and <value> depends on that option (and can be
                       of size 0). The supported id (and the corresponding <value>)
                       will be described when this is used.
    [option list]      A [short] n, followed by n [option].
    [inet]             An address (ip and port) to a node. It consists of one
                       [byte] n, that represents the address size, followed by n
                       [byte] representing the IP address (in practice n can only be
                       either 4 (IPv4) or 16 (IPv6)), followed by one [int]
                       representing the port.
    [inetaddr]         An IP address (without a port) to a node. It consists of one
                       [byte] n, that represents the address size, followed by n
                       [byte] representing the IP address.
    [consistency]      A consistency level specification.
                       This is a [short] representing a consistency level with the
                       following correspondence:
                         0x0000    ANY
                         0x0001    ONE
                         0x0002    TWO
                         0x0003    THREE
                         0x0004    QUORUM
                         0x0005    ALL
                         0x0006    LOCAL_QUORUM
                         0x0007    EACH_QUORUM
                         0x0008    SERIAL
                         0x0009    LOCAL_SERIAL
                         0x000A    LOCAL_ONE

    [string map]       A [short] n, followed by n pairs <k><v> where <k> and <v>
                       are [string].
    [string multimap]  A [short] n, followed by n pairs <k><v> where <k> is a
                       [string] and <v> is a [string list].
    [bytes map]        A [short] n, followed by n pairs <k><v> where <k> is a
                       [string] and <v> is a [bytes].


4. Messages

  Depending on the flags specified in the header, the layout of the message body must be:
    [<tracing_id>][<warnings>][<custom_payload>]<message>
  where:
    - <tracing_id> is a UUID tracing ID, present if this is a response message and the
      Tracing flag is set.
    - <warnings> is a string list of warnings, present if this is a response message and
      the Warning flag is set.
    - <custom_payload> is a bytes map for the serialised custom payload, present if this is
      one of the message types which support custom payloads (QUERY, PREPARE, EXECUTE and
      BATCH) and the Custom payload flag is set.
    - <message> as defined below through sections 4 and 5.

4.1. Requests

  Note that outside of their normal responses (described below), all requests
  can get an ERROR message (Section 4.2.1) as response.

4.1.1. STARTUP

  Initializes the connection. The server will respond with either a READY message
  (in which case the connection is ready for queries) or an AUTHENTICATE message
  (in which case credentials will need to be provided using AUTH_RESPONSE).

  This must be the first message of the connection, except for OPTIONS that can
  be sent before to find out the options supported by the server. Once the
  connection has been initialized, a client should not send any more STARTUP
  messages.
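The STARTUP body is a [string map] as defined in Section 3: a [short] count followed by that many key/value pairs, each a [string] (a [short] length plus UTF-8 bytes). The Go sketch below encodes and decodes such a map. The function names are hypothetical, the keys are sorted only to make the output deterministic (the notation does not require any order), and the DRIVER_NAME value is an arbitrary example label. A complete STARTUP request would also need the 9-byte envelope header with opcode 0x01 and, per Section 2.3, must be sent unframed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// appendString appends a [string]: a [short] byte length followed by the UTF-8 bytes.
func appendString(buf []byte, s string) ([]byte, error) {
	if len(s) > 0xFFFF {
		return nil, errors.New("string too long for a [short] length")
	}
	buf = append(buf, byte(len(s)>>8), byte(len(s)))
	return append(buf, s...), nil
}

// EncodeStringMap encodes a [string map]: a [short] n followed by n <k><v>
// pairs of [string]. Keys are sorted only to make the output deterministic.
func EncodeStringMap(m map[string]string) ([]byte, error) {
	if len(m) > 0xFFFF {
		return nil, errors.New("too many entries for a [short] count")
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	buf := []byte{byte(len(m) >> 8), byte(len(m))}
	var err error
	for _, k := range keys {
		if buf, err = appendString(buf, k); err != nil {
			return nil, err
		}
		if buf, err = appendString(buf, m[k]); err != nil {
			return nil, err
		}
	}
	return buf, nil
}

// readString reads one [string] from b and returns it with the remaining bytes.
func readString(b []byte) (string, []byte, error) {
	if len(b) < 2 {
		return "", nil, errors.New("truncated [string] length")
	}
	n := int(binary.BigEndian.Uint16(b))
	if len(b) < 2+n {
		return "", nil, errors.New("truncated [string] bytes")
	}
	return string(b[2 : 2+n]), b[2+n:], nil
}

// DecodeStringMap is the inverse of EncodeStringMap.
func DecodeStringMap(b []byte) (map[string]string, error) {
	if len(b) < 2 {
		return nil, errors.New("truncated [string map] count")
	}
	n := int(binary.BigEndian.Uint16(b))
	b = b[2:]
	m := make(map[string]string, n)
	for i := 0; i < n; i++ {
		k, rest, err := readString(b)
		if err != nil {
			return nil, err
		}
		v, rest, err := readString(rest)
		if err != nil {
			return nil, err
		}
		m[k] = v
		b = rest
	}
	return m, nil
}

func main() {
	// "example-driver" is an arbitrary free-form DRIVER_NAME label.
	body, err := EncodeStringMap(map[string]string{
		"CQL_VERSION": "3.0.0",
		"DRIVER_NAME": "example-driver",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body)) // 51
	opts, err := DecodeStringMap(body)
	fmt.Println(opts["CQL_VERSION"], err) // 3.0.0 <nil>
}
```

The 51 bytes are the 2-byte count plus four [string] values, each carrying its own 2-byte length prefix.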

  The body is a [string map] of options. Possible options are:
    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
      currently the only version supported is "3.0.0". Note that this is
      different from the protocol version.
    - "COMPRESSION": the compression algorithm to use for frames (see Section 2.3.2).
      This is optional; if not specified no compression will be used.
    - "DRIVER_NAME": allows clients to supply a free-form label representing the driver
      implementation. This is displayed in the output of `nodetool clientstats`.
    - "DRIVER_VERSION": allows clients to supply a free-form label representing the driver
      version. This is displayed in the output of `nodetool clientstats`.
    - "THROW_ON_OVERLOAD": flag to specify the server behaviour when the incoming message
      rate is too high. A [string] value of "1" instructs the server to respond with
      an ERROR when its resources are exhausted. With any other value, or if the key
      is not present, the server will apply backpressure to the connection until it
      has cleared its backlog of inbound messages.

  As mentioned in Section 2.3, STARTUP messages must not be sent in the framed format.
  STARTUP, any OPTIONS requests which precede it, as well as the server's responses to
  those messages must be unframed to support protocol negotiation with older clients.

4.1.2. AUTH_RESPONSE

  Answers a server authentication challenge.

  Authentication in the protocol is SASL based. The server sends authentication
  challenges (a bytes token) to which the client answers with this message. Those
  exchanges continue until the server accepts the authentication by sending an
  AUTH_SUCCESS message after a client AUTH_RESPONSE. Note that the exchange
  begins with the client sending an initial AUTH_RESPONSE in response to a
  server AUTHENTICATE request.

  The body of this message is a single [bytes] token.
  The details of what this token contains (and when it can be null/empty, if
  ever) depend on the actual authenticator used.

  The response to an AUTH_RESPONSE is either a follow-up AUTH_CHALLENGE message,
  an AUTH_SUCCESS message or an ERROR message.


4.1.3. OPTIONS

  Asks the server to return which STARTUP options are supported. The body of an
  OPTIONS message should be empty and the server will respond with a SUPPORTED
  message.


4.1.4. QUERY

  Performs a CQL query. The body of the message must be:
    <query><query_parameters>
  where <query> is a [long string] representing the query and
  <query_parameters> must be
    <consistency><flags>[<n>[name_1]<value_1>...[name_n]<value_n>][<result_page_size>][<paging_state>][<serial_consistency>][<timestamp>][<keyspace>][<now_in_seconds>]
  where:
    - <consistency> is the [consistency] level for the operation.
    - <flags> is an [int] whose bits define the options for this query and
      in particular influence what the remainder of the message contains.
      A flag is set if the bit corresponding to its `mask` is set. Supported
      flags are, given their mask:
        0x0001: Values. If set, a [short] <n> followed by <n> [value]
                values are provided. Those values are used for bound variables in
                the query. Optionally, if the 0x40 flag is present, each value
                will be preceded by a [string] name, representing the name of
                the marker the value must be bound to.
        0x0002: Skip_metadata. If set, the Result Set returned as a response
                to the query (if any) will have the NO_METADATA flag (see
                Section 4.2.5.2).
        0x0004: Page_size. If set, <result_page_size> is an [int]
                controlling the desired page size of the result (in CQL3 rows).
                See the section on paging (Section 7) for more details.
        0x0008: With_paging_state. If set, <paging_state> should be present.
                <paging_state> is a [bytes] value that should have been returned
                in a result set (Section 4.2.5.2). The query will be executed,
                but starting from the given paging state. This can also be used
                to continue paging on a different node than the one where it
                started (see Section 7 for more details).
        0x0010: With serial consistency. If set, <serial_consistency> should be
                present. <serial_consistency> is the [consistency] level for the
                serial phase of conditional updates. That consistency can only be
                either SERIAL or LOCAL_SERIAL and if not present, it defaults to
                SERIAL. This option will be ignored for anything else other than a
                conditional update/insert.
        0x0020: With default timestamp. If set, <timestamp> must be present.
                <timestamp> is a [long] representing the default timestamp for the query
                in microseconds (negative values are forbidden). This will
                replace the server side assigned timestamp as default timestamp.
                Note that a timestamp in the query itself will still override
                this timestamp. This is entirely optional.
        0x0040: With names for values. This only makes sense if the 0x01 flag is set and
                is ignored otherwise. If present, the values from the 0x01 flag will
                be preceded by a name (see above). Note that this is only useful for
                QUERY requests where named bind markers are used; for EXECUTE statements,
                since the names for the expected values were returned during preparation,
                a client can always provide values in the right order without any names,
                and using this flag, while supported, is almost surely inefficient.
        0x0080: With keyspace. If set, <keyspace> must be present. <keyspace> is a
                [string] indicating the keyspace that the query should be executed in.
                It supersedes the keyspace that the connection is bound to, if any.
        0x0100: With now in seconds. If set, <now_in_seconds> must be present.
                <now_in_seconds> is an [int] representing the current time (now) for
                the query. Affects TTL cell liveness in read queries and local deletion
                time for tombstones and TTL cells in update requests. It's intended
                for testing purposes and is optional.

  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
  TRUNCATE, ...).

  The server will respond to a QUERY message with a RESULT message, the content
  of which depends on the query.


4.1.5. PREPARE

  Prepares a query for later execution (through EXECUTE). The body of the message must be:
    <query><flags>[<keyspace>]
  where:
    - <query> is a [long string] representing the CQL query.
    - <flags> is an [int] whose bits define the options for this statement and in particular
      influence what the remainder of the message contains.
      A flag is set if the bit corresponding to its `mask` is set. Supported
      flags are, given their mask:
        0x01: With keyspace. If set, <keyspace> must be present. <keyspace> is a
              [string] indicating the keyspace that the query should be executed in.
              It supersedes the keyspace that the connection is bound to, if any.

  The server will respond with a RESULT message with a `prepared` kind (0x0004,
  see Section 4.2.5).


4.1.6. EXECUTE

  Executes a prepared query. The body of the message must be:
    <id><result_metadata_id><query_parameters>
  where
    - <id> is the prepared query ID. It's the [short bytes] returned as a
      response to a PREPARE message.
    - <result_metadata_id> is the ID of the resultset metadata that was sent
      along with the response to the PREPARE message. If a RESULT/Rows message reports
      changed resultset metadata with the Metadata_changed flag, the reported new
      resultset metadata must be used in subsequent executions.
    - <query_parameters> has the exact same definition as in QUERY (see Section 4.1.4).


4.1.7. BATCH

  Allows executing a list of queries (prepared or not) as a batch (note that
  only DML statements are accepted in a batch). The body of the message must
  be:
    <type><n><query_1>...<query_n><consistency><flags>[<serial_consistency>][<timestamp>][<keyspace>][<now_in_seconds>]
  where:
    - <type> is a [byte] indicating the type of batch to use:
        - If <type> == 0, the batch will be "logged". This is equivalent to a
          normal CQL3 batch statement.
        - If <type> == 1, the batch will be "unlogged".
        - If <type> == 2, the batch will be a "counter" batch (and non-counter
          statements will be rejected).
    - <flags> is an [int] whose bits define the options for this query and
      in particular influence what the remainder of the message contains. It is similar
      to the <flags> from QUERY and EXECUTE methods, except that the 4 rightmost
      bits must always be 0 as their corresponding options do not make sense for
      Batch. A flag is set if the bit corresponding to its `mask` is set. Supported
      flags are, given their mask:
        0x0010: With serial consistency. If set, <serial_consistency> should be
                present. <serial_consistency> is the [consistency] level for the
                serial phase of conditional updates. That consistency can only be
                either SERIAL or LOCAL_SERIAL and if not present, it defaults to
                SERIAL. This option will be ignored for anything else other than a
                conditional update/insert.
        0x0020: With default timestamp. If set, <timestamp> should be present.
                <timestamp> is a [long] representing the default timestamp for the query
                in microseconds. This will replace the server side assigned
                timestamp as default timestamp. Note that a timestamp in the query itself
                will still override this timestamp. This is entirely optional.
        0x0040: With names for values.
                If set, then all values for all <query_i> must be
                preceded by a [string] <name_i> that has the same meaning as in QUERY
                requests [IMPORTANT NOTE: this feature does not work and should not be
                used. It is specified in a way that makes it impossible for the server
                to implement. This will be fixed in a future version of the native
                protocol. See https://issues.apache.org/jira/browse/CASSANDRA-10246 for
                more details].
        0x0080: With keyspace. If set, <keyspace> must be present. <keyspace> is a
                [string] indicating the keyspace that the query should be executed in.
                It supersedes the keyspace that the connection is bound to, if any.
        0x0100: With now in seconds. If set, <now_in_seconds> must be present.
                <now_in_seconds> is an [int] representing the current time (now) for
                the query. Affects TTL cell liveness in read queries and local deletion
                time for tombstones and TTL cells in update requests. It's intended
                for testing purposes and is optional.
    - <n> is a [short] indicating the number of following queries.
    - <query_1>...<query_n> are the queries to execute. A <query_i> must be of the
      form:
        <kind><string_or_id><n>[<name_1>]<value_1>...[<name_n>]<value_n>
      where:
        - <kind> is a [byte] indicating whether the following query is a prepared
          one or not. <kind> value must be either 0 or 1.
        - <string_or_id> depends on the value of <kind>. If <kind> == 0, it should be
          a [long string] query string (as in QUERY, the query string might contain
          bind markers). Otherwise (that is, if <kind> == 1), it should be a
          [short bytes] representing a prepared query ID.
        - <n> is a [short] indicating the number (possibly 0) of following values.
        - <name_i> is the optional name of the following <value_i>. It must be present
          if and only if the 0x40 flag is provided for the batch.
        - <value_i> is the [value] to use for bound variable i (or bound variable <name_i>
          if the 0x40 flag is used).
    - <consistency> is the [consistency] level for the operation.
    - <serial_consistency> is only present if the 0x10 flag is set. In that case,
      <serial_consistency> is the [consistency] level for the serial phase of
      conditional updates. That consistency can only be either SERIAL or
      LOCAL_SERIAL and if not present, it defaults to SERIAL. This option will
      be ignored for anything other than a conditional update/insert.

  The server will respond with a RESULT message.


4.1.8. REGISTER

  Register this connection to receive some types of events. The body of the
  message is a [string list] representing the event types to register for. See
  section 4.2.6 for the list of valid event types.

  The response to a REGISTER message will be a READY message.

  Please note that if a client driver maintains multiple connections to a
  Cassandra node and/or connections to multiple nodes, it is advised to
  dedicate a handful of connections to receive events, but to *not* register
  for events on all connections, as this would only result in receiving the
  same event messages multiple times, wasting bandwidth.


4.2. Responses

  This section describes the content of the frame body for the different
  responses. Please note that to make room for future evolution, clients should
  tolerate extra information at the end of the frame body beyond what is
  described in this document (and simply discard it).

4.2.1. ERROR

  Indicates an error processing a request. The body of the message will be an
  error code ([int]) followed by a [string] error message. Then, depending on
  the exception, more content may follow. The error codes are defined in
  Section 8, along with their additional content if any.


4.2.2. READY

  Indicates that the server is ready to process queries.
  This message will be sent by the server after a STARTUP message if no
  authentication is required (if authentication is required, the server
  indicates readiness by sending an AUTH_SUCCESS message).

  The body of a READY message is empty.


4.2.3. AUTHENTICATE

  Indicates that the server requires authentication, and which authentication
  mechanism to use.

  The authentication is SASL based and thus consists of a number of server
  challenges (AUTH_CHALLENGE, Section 4.2.7) followed by client responses
  (AUTH_RESPONSE, Section 4.1.2). The initial exchange is however bootstrapped
  by an initial client response. The details of that exchange (including how
  many challenge-response pairs are required) are specific to the authenticator
  in use. The exchange ends when the server sends an AUTH_SUCCESS message or
  an ERROR message.

  This message will be sent following a STARTUP message if authentication is
  required and must be answered by an AUTH_RESPONSE message from the client.

  The body consists of a single [string] indicating the full class name of the
  IAuthenticator in use.


4.2.4. SUPPORTED

  Indicates which startup options are supported by the server. This message
  comes as a response to an OPTIONS message.

  The body of a SUPPORTED message is a [string multimap]. This multimap gives,
  for each of the supported STARTUP options, the list of supported values. It
  also includes:
      - "PROTOCOL_VERSIONS": the list of native protocol versions that are
        supported, encoded as the version number followed by a slash and the
        version description. For example: 3/v3, 4/v4, 5/v5-beta. If a version is
        in beta, it will have the word "beta" in its description.


4.2.5. RESULT

  The result to a query (QUERY, PREPARE, EXECUTE or BATCH messages).
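  As a rough illustration, a client can dispatch on the 4-byte big-endian [int]
  `kind` that opens every RESULT body (the values are listed just below) before
  touching the kind-specific remainder. The Go sketch assumes the whole message
  body is already in memory; `readResultKind` and `resultKindNames` are
  hypothetical names chosen for this example, not part of any driver API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// resultKindNames maps the RESULT kinds of Section 4.2.5 to their names.
var resultKindNames = map[int32]string{
	0x0001: "Void",
	0x0002: "Rows",
	0x0003: "Set_keyspace",
	0x0004: "Prepared",
	0x0005: "Schema_change",
}

// readResultKind decodes the leading [int] kind of a RESULT body and returns
// the kind's name plus the remaining, kind-specific bytes (not parsed here).
func readResultKind(body []byte) (string, []byte, error) {
	if len(body) < 4 {
		return "", nil, errors.New("RESULT body shorter than its [int] kind")
	}
	kind := int32(binary.BigEndian.Uint32(body[:4]))
	name, ok := resultKindNames[kind]
	if !ok {
		return "", nil, fmt.Errorf("unknown RESULT kind 0x%04X", kind)
	}
	return name, body[4:], nil
}

func main() {
	// A Void result carries nothing after its kind.
	name, rest, err := readResultKind([]byte{0x00, 0x00, 0x00, 0x01})
	fmt.Println(name, len(rest), err) // Void 0 <nil>
}
```

  Returning the unparsed remainder keeps this dispatch step separate from the
  per-kind layouts defined in the following subsections.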

  The first element of the body of a RESULT message is an [int] representing the
  `kind` of result. The rest of the body depends on the kind. The kind can be
  one of:
    0x0001    Void: for results carrying no information.
    0x0002    Rows: for results to select queries, returning a set of rows.
    0x0003    Set_keyspace: the result to a `use` query.
    0x0004    Prepared: result to a PREPARE message.
    0x0005    Schema_change: the result to a schema altering query.

  The body for each kind (after the [int] kind) is defined below.


4.2.5.1. Void

  The rest of the body for a Void result is empty. It indicates that a query was
  successful without providing more information.


4.2.5.2. Rows

  Indicates a set of rows. The rest of the body of a Rows result is:
    <metadata><rows_count><rows_content>
  where:
    - <metadata> is composed of:
        <flags><columns_count>[<paging_state>][<new_metadata_id>][<global_table_spec>?<col_spec_1>...<col_spec_n>]
      where:
        - <flags> is an [int]. The bits of <flags> provide information on the
          formatting of the remaining information. A flag is set if the bit
          corresponding to its `mask` is set. Supported flags are, given their
          mask:
            0x0001    Global_tables_spec: if set, only one table spec (keyspace
                      and table name) is provided as <global_table_spec>. If not
                      set, <global_table_spec> is not present.
            0x0002    Has_more_pages: indicates whether this is not the last
                      page of results and more should be retrieved. If set, the
                      <paging_state> will be present. The <paging_state> is a
                      [bytes] value that should be used in QUERY/EXECUTE to
                      continue paging and retrieve the remainder of the result
                      for this query (see Section 7 for more details).
            0x0004    No_metadata: if set, the <metadata> is only composed of
                      these <flags>, the <columns_count> and optionally the
                      <paging_state> (depending on the Has_more_pages flag) but
                      no other information (so no <global_table_spec> nor
                      <col_spec_i>). This will only ever be the case if this was
                      requested during the query (see the Skip_metadata flag of
                      the QUERY and EXECUTE messages).
            0x0008    Metadata_changed: if set, the No_metadata flag has to be
                      unset and <new_metadata_id> has to be supplied. This flag
                      is to be used to avoid a roundtrip in case of metadata
                      changes for queries that requested metadata to be skipped.
        - <columns_count> is an [int] representing the number of columns selected
          by the query that produced this result. It defines the number of
          <col_spec_i> elements and the number of elements for each row in
          <rows_content>.
        - <new_metadata_id> is [short bytes] representing the new, changed
          resultset metadata. The new metadata ID must also be used in subsequent
          executions of the corresponding prepared statement, if any.
        - <global_table_spec> is present if the Global_tables_spec is set in
          <flags>. It is composed of two [string] representing the
          (unique) keyspace name and table name the columns belong to.
        - <col_spec_i> specifies the columns returned in the query. There are
          <columns_count> such column specifications that are composed of:
            (<ksname><tablename>)?<name><type>
          The initial <ksname> and <tablename> are two [string] and are only
          present if the Global_tables_spec flag is not set. The <name> is a
          [string] and <type> is an [option] that corresponds to the description
          (what this description is depends a bit on the context: in results to
          selects, this will be either the user-chosen alias or the selection used
          (often a column name, but it can be a function call too).
          In results to a PREPARE, this will be either the name of the
          corresponding bind variable or the column name for the variable if it
          is "anonymous") and type of the corresponding result. The option for
          <type> is either a native type (see below), in which case the option
          has no value, or a 'custom' type, in which case the value is a
          [string] representing the fully qualified class name of the type
          represented. Valid option ids are:
            0x0000    Custom: the value is a [string], see above.
            0x0001    Ascii
            0x0002    Bigint
            0x0003    Blob
            0x0004    Boolean
            0x0005    Counter
            0x0006    Decimal
            0x0007    Double
            0x0008    Float
            0x0009    Int
            0x000B    Timestamp
            0x000C    Uuid
            0x000D    Varchar
            0x000E    Varint
            0x000F    Timeuuid
            0x0010    Inet
            0x0011    Date
            0x0012    Time
            0x0013    Smallint
            0x0014    Tinyint
            0x0015    Duration
            0x0020    List: the value is an [option], representing the type
                            of the elements of the list.
            0x0021    Map: the value is two [option], representing the types of the
                           keys and values of the map
            0x0022    Set: the value is an [option], representing the type
                            of the elements of the set
            0x0030    UDT: the value is <ks><udt_name><n><name_1><type_1>...<name_n><type_n>
                           where:
                              - <ks> is a [string] representing the keyspace name this
                                UDT is part of.
                              - <udt_name> is a [string] representing the UDT name.
                              - <n> is a [short] representing the number of fields of
                                the UDT, and thus the number of <name_i><type_i> pairs
                                following
                              - <name_i> is a [string] representing the name of the
                                i_th field of the UDT.
                              - <type_i> is an [option] representing the type of the
                                i_th field of the UDT.
            0x0031    Tuple: the value is <n><type_1>...<type_n> where <n> is a [short]
                             representing the number of values in the type, and <type_i>
                             are [option] representing the type of the i_th component
                             of the tuple

    - <rows_count> is an [int] representing the number of rows present in this
      result. Those rows are serialized in the <rows_content> part.
    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
      Each <row_i> is composed of <value_1>...<value_n> where n is
      <columns_count> and where <value_j> is a [bytes] representing the value
      returned for the jth column of the ith row. In other words, <rows_content>
      is composed of (<rows_count> * <columns_count>) [bytes].


4.2.5.3. Set_keyspace

  The result to a `use` query. The body (after the kind [int]) is a single
  [string] indicating the name of the keyspace that has been set.


4.2.5.4. Prepared

  The result to a PREPARE message. The body of a Prepared result is:
    <id><result_metadata_id><metadata><result_metadata>
  where:
    - <id> is [short bytes] representing the prepared query ID.
    - <result_metadata_id> is [short bytes] representing the resultset metadata ID.
    - <metadata> is composed of:
        <flags><columns_count><pk_count>[<pk_index_1>...<pk_index_n>][<global_table_spec>?<col_spec_1>...<col_spec_n>]
      where:
        - <flags> is an [int]. The bits of <flags> provide information on the
          formatting of the remaining information. A flag is set if the bit
          corresponding to its `mask` is set. Supported masks and their flags
          are:
            0x0001    Global_tables_spec: if set, only one table spec (keyspace
                      and table name) is provided as <global_table_spec>. If not
                      set, <global_table_spec> is not present.
        - <columns_count> is an [int] representing the number of bind markers
          in the prepared statement. It defines the number of <col_spec_i>
          elements.
        - <pk_count> is an [int] representing the number of <pk_index_i>
          elements to follow. If this value is zero, at least one of the
          partition key columns in the table that the statement acts on
          did not have a corresponding bind marker (or the bind marker
          was wrapped in a function call).
        - <pk_index_i> is a [short] that represents the index of the bind marker
          that corresponds to the partition key column in position i.
          For example, a <pk_index> sequence of [2, 0, 1] indicates that the
          table has three partition key columns; the full partition key
          can be constructed by creating a composite of the values for
          the bind markers at index 2, at index 0, and at index 1.
          This allows implementations with token-aware routing to correctly
          construct the partition key without needing to inspect table
          metadata.
        - <global_table_spec> is present if the Global_tables_spec is set in
          <flags>. If present, it is composed of two [string]s. The first
          [string] is the name of the keyspace that the statement acts on.
          The second [string] is the name of the table that the columns
          represented by the bind markers belong to.
        - <col_spec_i> specifies the bind markers in the prepared statement.
          There are <columns_count> such column specifications, each with the
          following format:
            (<ksname><tablename>)?<name><type>
          The initial <ksname> and <tablename> are two [string] that are only
          present if the Global_tables_spec flag is not set. The <name> field
          is a [string] that holds the name of the bind marker (if named),
          or the name of the column, field, or expression that the bind marker
          corresponds to (if the bind marker is "anonymous"). The <type>
          field is an [option] that represents the expected type of values for
          the bind marker. See the Rows documentation (section 4.2.5.2) for
          full details on the <type> field.

    - <result_metadata> is defined exactly the same as <metadata> in the Rows
      documentation (section 4.2.5.2). This describes the metadata for the
      result set that will be returned when this prepared statement is executed.
      Note that <result_metadata> may be empty (have the No_metadata flag and
      0 columns, see section 4.2.5.2) and will be for any query that is not a
      Select. In fact, there is never a guarantee that this will be non-empty, so
      implementations should protect themselves accordingly. This result metadata
      is an optimization that allows implementations to later execute the
      prepared statement without requesting the metadata (see the Skip_metadata
      flag in EXECUTE). Clients can safely discard this metadata if they do not
      want to take advantage of that optimization.

  Note that the prepared query ID returned is global to the node on which the
  query has been prepared. It can be used on any connection to that node
  until the node is restarted (after which the query must be reprepared).

4.2.5.5. Schema_change

  The result to a schema altering query (creation/update/drop of a
  keyspace/table/index). The body (after the kind [int]) is the same
  as the body for a "SCHEMA_CHANGE" event, that is:
    <change_type><target><options>
  Please refer to section 4.2.6 below for the meaning of those fields.

  Note that a query to create or drop an index is considered to be a change
  to the table the index is on.


4.2.6. EVENT

  An event pushed by the server. A client will only receive events for the
  types it has REGISTERed to. The body of an EVENT message will start with a
  [string] representing the event type. The rest of the message depends on the
  event type. The valid event types are:
    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
      Currently, events are sent when new nodes are added to the cluster, and
      when nodes are removed. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of change ("NEW_NODE" or "REMOVED_NODE") followed by the address of
      the new/removed node.
    - "STATUS_CHANGE": events related to change of node status. Currently,
      up/down events are sent. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of status change ("UP" or "DOWN") followed by the address of the
      concerned node.
    - "SCHEMA_CHANGE": events related to schema change. After the event type,
      the rest of the message will be <change_type><target><options> where:
        - <change_type> is a [string] representing the type of change involved.
          It will be one of "CREATED", "UPDATED" or "DROPPED".
        - <target> is a [string] that can be one of "KEYSPACE", "TABLE", "TYPE",
          "FUNCTION" or "AGGREGATE" and describes what has been modified
          ("TYPE" stands for modifications related to user types, "FUNCTION"
          for modifications related to user defined functions, "AGGREGATE"
          for modifications related to user defined aggregates).
        - <options> depends on the preceding <target>:
          - If <target> is "KEYSPACE", then <options> will be a single [string]
            representing the keyspace changed.
          - If <target> is "TABLE" or "TYPE", then
            <options> will be 2 [string]: the first one will be the keyspace
            containing the affected object, and the second one will be the name
            of said affected object (either the table or the user type name).
          - If <target> is "FUNCTION" or "AGGREGATE", multiple arguments follow:
            - [string] keyspace containing the user defined function / aggregate
            - [string] the function/aggregate name
            - [string list] one string for each argument type (as CQL type)

  All EVENT messages have a streamId of -1 (Section 2.4.1.3).

  Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip
  communication and as such may be sent a short delay before the binary
  protocol server on the newly up node is fully started. Clients are thus
  advised to wait a short time before trying to connect to the node (1 second
  should be enough), otherwise they may experience a connection refusal at
  first.

4.2.7. AUTH_CHALLENGE

  A server authentication challenge (see AUTH_RESPONSE (Section 4.1.2) for more
  details).

  The body of this message is a single [bytes] token. The details of what this
  token contains (and when it can be null/empty, if ever) depend on the actual
  authenticator used.

  Clients are expected to answer the server challenge with an AUTH_RESPONSE
  message.

4.2.8. AUTH_SUCCESS

  Indicates the success of the authentication phase. See Section 4.2.3 for more
  details.

  The body of this message is a single [bytes] token holding final information
  from the server that the client may require to finish the authentication
  process. What that token contains and whether it can be null depends on the
  actual authenticator used.


5. Data Type Serialization Formats

  This section describes the serialization formats for all CQL data types
  supported by Cassandra through the native protocol. These serialization
  formats should be used by client drivers to encode values for EXECUTE
  messages. Cassandra will use these formats when returning values in
  RESULT messages.

  All values are represented as [bytes] in EXECUTE and RESULT messages.
  The [bytes] format includes an int prefix denoting the length of the value.
  For that reason, the serialization formats described here will not include
  a length component.

  For legacy compatibility reasons, note that most non-string types support
  "empty" values (i.e. a value with zero length). An empty value is distinct
  from NULL, which is encoded with a negative length.

  As with the rest of the native protocol, all encodings are big-endian.

5.1. ascii

  A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of
  this range will result in a validation error.

5.2 bigint

  An eight-byte two's complement integer.

5.3 blob

  Any sequence of bytes.

5.4 boolean

  A single byte. A value of 0 denotes "false"; any other value denotes "true".
  (However, it is recommended that a value of 1 be used to represent "true".)

5.5 date

  A 4 byte unsigned integer representing days, with the epoch (the unix epoch,
  January 1st, 1970) centered at 2^31.
  A few examples:
    0:    -5877641-06-23
    2^31: 1970-01-01
    2^32: 5881580-07-11

5.6 decimal

  The decimal format represents an arbitrary-precision number. It contains an
  [int] "scale" component followed by a varint encoding (see section 5.24)
  of the unscaled value. The encoded value represents "<unscaled>E<-scale>".
  In other words, "<unscaled> * 10 ^ (-1 * <scale>)".

5.7 double

  An 8 byte floating point number in the IEEE 754 binary64 format.

5.8 duration

  A duration is composed of 3 signed variable length integers ([vint]s).
  The first [vint] represents a number of months, the second [vint] represents
  a number of days, and the last [vint] represents a number of nanoseconds.
  The number of months and days must be valid 32-bit integers whereas the
  number of nanoseconds must be a valid 64-bit integer.
  A duration can either be positive or negative. If a duration is positive
  all the integers must be positive or zero. If a duration is
  negative all the numbers must be negative or zero.

5.9 float

  A 4 byte floating point number in the IEEE 754 binary32 format.

5.10 inet

  A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.

5.11 int

  A 4 byte two's complement integer.

5.12 list

  An [int] n indicating the number of elements in the list, followed by n
  elements. Each element is [bytes] representing the serialized value.

5.13 map

  An [int] n indicating the number of key/value pairs in the map, followed by
  n entries. Each entry is composed of two [bytes] representing the key
  and value.

5.14 set

  An [int] n indicating the number of elements in the set, followed by n
  elements. Each element is [bytes] representing the serialized value.

5.15 smallint

  A 2 byte two's complement integer.

5.16 text

  A sequence of bytes conforming to the UTF-8 specifications.

5.17 time

  An 8 byte two's complement long representing nanoseconds since midnight.
  Valid values are in the range 0 to 86399999999999.

5.18 timestamp

  An 8 byte two's complement integer representing a millisecond-precision
  offset from the unix epoch (00:00:00, January 1st, 1970). Negative values
  represent a negative offset from the epoch.

5.19 timeuuid

  A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.

5.20 tinyint

  A 1 byte two's complement integer.

5.21 tuple

  A sequence of [bytes] values representing the items in a tuple.
  The encoding of each element depends on the data type for that position in
  the tuple. Null values may be represented by using length -1 for the [bytes]
  representation of an element.

5.22 uuid

  A 16 byte sequence representing any valid UUID as defined by RFC 4122.

5.23 varchar

  An alias of the "text" type.

5.24 varint

  A variable-length two's complement encoding of a signed integer.

  The following examples may help implementors of this spec:

  Value | Encoding
  ------|---------
      0 |     0x00
      1 |     0x01
    127 |     0x7F
    128 |   0x0080
    129 |   0x0081
     -1 |     0xFF
   -128 |     0x80
   -129 |   0xFF7F

  Note that positive numbers must use a most-significant byte with a value
  less than 0x80, because a most-significant bit of 1 indicates a negative
  value. Implementors should pad positive values that have a MSB >= 0x80
  with a leading 0x00 byte.


6. User Defined Types

  This section describes the serialization format for User defined types (UDT),
  as described in section 4.2.5.2.

  A UDT value is composed of successive [bytes] values, one for each field of
  the UDT value (in the order defined by the type). A UDT value will generally
  have one value for each field of the type it represents, but it is allowed to
  have fewer values than the type has fields.


7. Result paging

  The protocol allows for paging the result of queries. For that, the QUERY and
  EXECUTE messages have a <result_page_size> value that indicates the desired
  page size in CQL3 rows.

  If a positive value is provided for <result_page_size>, the result set of the
  RESULT message returned for the query will contain at most the
  <result_page_size> first rows of the query result.
  If that first page of results contains the full result set for the query, the
  RESULT message (of kind `Rows`) will have the Has_more_pages flag *not* set.
  However, if some results are not part of the first response, the
  Has_more_pages flag will be set and the result will contain a <paging_state>
  value. In that case, the <paging_state> value should be used in a QUERY or
  EXECUTE message (that has the *same* query as the original one or the
  behavior is undefined) to retrieve the next page of results.

  Only CQL3 queries that return a result set (RESULT message with a Rows `kind`)
  support paging. For other types of queries, the <result_page_size> value is
  ignored.

  Note to client implementors:
  - While <result_page_size> can be as low as 1, it will likely be detrimental
    to performance to pick a value too low. A value below 100 is probably too
    low for most use cases.
  - Clients should not rely on the actual size of the result set returned to
    decide if there are more results to fetch or not. Instead, they should always
    check the Has_more_pages flag (unless they did not enable paging for the query
    obviously). Clients should also not assert that no result will have more than
    <result_page_size> results. While the current implementation always respects
    the exact value of <result_page_size>, we reserve the right to return
    slightly smaller or bigger pages in the future for performance reasons.
  - The <paging_state> is specific to a protocol version and drivers should not
    send a <paging_state> returned by a node using the protocol v3 to query a node
    using the protocol v4 for instance.


8. Error codes

  Let us recall that an ERROR message is composed of <code><message>[...]
  (see 4.2.1 for details).
  The supported error codes, as well as any additional information the message
  may contain after the <message>, are described below:
    0x0000    Server error: something unexpected happened. This indicates a
              server-side bug.
    0x000A    Protocol error: some client message triggered a protocol
              violation (for instance a QUERY message is sent before a STARTUP
              one has been sent)
    0x0100    Authentication error: authentication was required and failed. The
              possible reason for failing depends on the authenticator in use,
              which may or may not include more detail in the accompanying
              error message.
    0x1000    Unavailable exception. The rest of the ERROR message body will be
                <cl><required><alive>
              where:
                <cl> is the [consistency] level of the query that triggered
                     the exception.
                <required> is an [int] representing the number of nodes that
                           should be alive to respect <cl>
                <alive> is an [int] representing the number of replicas that
                        were known to be alive when the request had been
                        processed (since an unavailable exception has been
                        triggered, there will be <alive> < <required>)
    0x1001    Overloaded: the request cannot be processed because the
              coordinator node is overloaded
    0x1002    Is_bootstrapping: the request was a read request but the
              coordinator node is bootstrapping
    0x1003    Truncate_error: error during a truncation operation.
    0x1100    Write_timeout: Timeout exception during a write request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><writeType><contentions>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           acknowledged the request.
                <blockfor> is an [int] representing the number of replicas whose
                           acknowledgement is required to achieve <cl>.
                <writeType> is a [string] that describes the type of the write
                            that timed out. The value of that string can be one
                            of:
                             - "SIMPLE": the write was a non-batched
                               non-counter write.
                             - "BATCH": the write was a (logged) batch write.
                               If this type is received, it means the batch log
                               has been successfully written (otherwise a
                               "BATCH_LOG" type would have been sent instead).
                             - "UNLOGGED_BATCH": the write was an unlogged
                               batch. No batch log write has been attempted.
                             - "COUNTER": the write was a counter write
                               (batched or not).
                             - "BATCH_LOG": the timeout occurred during the
                               write to the batch log when a (logged) batch
                               write was requested.
                             - "CAS": the timeout occurred during the Compare
                               And Set write/update.
                             - "VIEW": the timeout occurred when a write
                               involving a VIEW update failed to acquire the
                               local view (MV) lock for the key within the
                               timeout.
                             - "CDC": the timeout occurred when
                               cdc_total_space_in_mb is exceeded when doing a
                               write to data tracked by cdc.
                <contentions> is a [short] that describes the number of
                              contentions that occurred during the CAS
                              operation. The field is only present when the
                              <writeType> is "CAS".
    0x1200    Read_timeout: Timeout exception during a read request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><data_present>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           answered the request.
                <blockfor> is an [int] representing the number of replicas whose
                           response is required to achieve <cl>. Please note that
                           it is possible to have <received> >= <blockfor> if
                           <data_present> is false. This can also happen in the
                           (unlikely) case where <cl> is achieved but the
                           coordinator node times out while waiting for
                           read-repair acknowledgement.
                <data_present> is a single byte.
                               If its value is 0, it means the replica that was
                               asked for data has not responded. Otherwise, the
                               value is != 0.
    0x1300    Read_failure: A non-timeout exception during a read request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><reasonmap><data_present>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           answered the request.
                <blockfor> is an [int] representing the number of replicas whose
                           acknowledgement is required to achieve <cl>.
                <reasonmap> is a map of endpoint to failure reason codes. This maps
                            the endpoints of the replica nodes that failed when
                            executing the request to a code representing the reason
                            for the failure. The map is encoded starting with an [int] n
                            followed by n pairs of <endpoint><failurecode> where
                            <endpoint> is an [inetaddr] and <failurecode> is a [short].
                <data_present> is a single byte. If its value is 0, it means
                               the replica that was asked for data had not
                               responded. Otherwise, the value is != 0.
    0x1400    Function_failure: A (user defined) function failed during execution.
              The rest of the ERROR message body will be
                <keyspace><function><arg_types>
              where:
                <keyspace> is the keyspace [string] of the failed function
                <function> is the name [string] of the failed function
                <arg_types> [string list] one string for each argument type (as
                            CQL type) of the failed function
    0x1500    Write_failure: A non-timeout exception during a write request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><reasonmap><writeType>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           answered the request.
                <blockfor> is an [int] representing the number of replicas whose
                           acknowledgement is required to achieve <cl>.
                <reasonmap> is a map from endpoints to failure reason codes. This
                            maps the endpoints of the replica nodes that failed
                            when executing the request to a code representing the
                            reason for the failure. The map is encoded starting
                            with an [int] n followed by n pairs of
                            <endpoint><failurecode> where <endpoint> is an
                            [inetaddr] and <failurecode> is a [short].
                <writeType> is a [string] that describes the type of the write
                            that failed. The value of that string can be one
                            of:
                             - "SIMPLE": the write was a non-batched
                               non-counter write.
                             - "BATCH": the write was a (logged) batch write.
                               If this type is received, it means the batch log
                               has been successfully written (otherwise a
                               "BATCH_LOG" type would have been sent instead).
                             - "UNLOGGED_BATCH": the write was an unlogged
                               batch. No batch log write has been attempted.
                             - "COUNTER": the write was a counter write
                               (batched or not).
                             - "BATCH_LOG": the failure occurred during the
                               write to the batch log when a (logged) batch
                               write was requested.
                             - "CAS": the failure occurred during the Compare
                               And Set write/update.
                             - "VIEW": the failure occurred when a write
                               involving a VIEW update failed to acquire the
                               local view (MV) lock for the key within the
                               timeout.
                             - "CDC": the failure occurred when
                               cdc_total_space_in_mb was exceeded while writing
                               data tracked by CDC.
    0x1600    CDC_WRITE_FAILURE: // todo
    0x1700    CAS_WRITE_UNKNOWN: An exception occurred due to a contended Compare
              And Set write/update. The CAS operation was only partially
              completed, and it may or may not be completed by the contending
              CAS write or SERIAL/LOCAL_SERIAL read.
              The rest of the ERROR message body will be
                <cl><received><blockfor>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           acknowledged the request.
                <blockfor> is an [int] representing the number of replicas whose
                           acknowledgement is required to achieve <cl>.

    0x2000    Syntax_error: The submitted query has a syntax error.
    0x2100    Unauthorized: The logged-in user does not have the right to perform
              the query.
    0x2200    Invalid: The query is syntactically correct but invalid.
    0x2300    Config_error: The query is invalid because of some configuration issue.
    0x2400    Already_exists: The query attempted to create a keyspace or a
              table that already exists. The rest of the ERROR message
              body will be <ks><table> where:
                <ks> is a [string] representing either the keyspace that
                     already exists, or the keyspace containing the table
                     that already exists.
                <table> is a [string] representing the name of the table that
                        already exists. If the query was attempting to create a
                        keyspace, <table> will be present but will be the empty
                        string.
    0x2500    Unprepared: Can be thrown when executing a prepared statement
              whose ID is not known to this host. The rest of the ERROR
              message body will be [short bytes] representing the unknown ID.

9. Changes from v4

  * Added result set metadata id to Prepared responses (Section 4.2.5.4)
  * Beta protocol flag for v5 native protocol is added (Section 2.4.1.2)
  * <numfailures> in Read_failure and Write_failure error message bodies (Section 8)
    has been replaced with <reasonmap>. The <reasonmap> maps node IP addresses to
    a failure reason code which indicates why the request failed on that node.
  * Enlarged the flags bitmaps of QUERY, EXECUTE and BATCH messages from [byte] to [int]
    (Sections 4.1.4, 4.1.6 and 4.1.7).
  * Added the duration data type
  * Added keyspace field in QUERY, PREPARE, and BATCH messages (Sections 4.1.4, 4.1.5, and 4.1.7).
  * Added now_in_seconds field in QUERY, EXECUTE, and BATCH messages (Sections 4.1.4, 4.1.6, and 4.1.7).
  * Added [int] flags field in PREPARE message (Section 4.1.5).
  * Removed NO_COMPACT startup option (Section 4.1.1)
  * Introduced an outer framing format wrapping the "frames" of v4 and earlier, which are
    now referred to as "envelopes" (Sections 2.1, 2.2 and 2.3)
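The <reasonmap> that replaced <numfailures> is a length-prefixed sequence of
<endpoint><failurecode> pairs. As an illustration only, the Go sketch below decodes a
complete Read_failure (0x1300) body, <cl><received><blockfor><reasonmap><data_present>.
It assumes the usual encodings from the Notations section: [short] and [consistency] as
2-byte unsigned big-endian integers, [int] as a 4-byte signed big-endian integer, and
[inetaddr] as one length [byte] (4 or 16) followed by the address bytes. The ReadFailure
type and decodeReadFailure function are hypothetical names, not the API of any driver.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

// ReadFailure is a hypothetical struct holding a decoded Read_failure (0x1300) body.
type ReadFailure struct {
	Consistency uint16            // <cl>, a [consistency] encoded as a [short]
	Received    int32             // <received>, an [int]
	BlockFor    int32             // <blockfor>, an [int]
	ReasonMap   map[string]uint16 // <reasonmap>: replica IP -> failure reason code
	DataPresent bool              // true when <data_present> != 0
}

// decodeReadFailure parses <cl><received><blockfor><reasonmap><data_present>,
// reading every multi-byte integer as big-endian.
func decodeReadFailure(body []byte) (*ReadFailure, error) {
	r := bytes.NewReader(body)
	var f ReadFailure
	var n int32 // number of <endpoint><failurecode> pairs in <reasonmap>
	for _, p := range []any{&f.Consistency, &f.Received, &f.BlockFor, &n} {
		if err := binary.Read(r, binary.BigEndian, p); err != nil {
			return nil, err // truncated body
		}
	}
	if n < 0 {
		return nil, fmt.Errorf("negative reasonmap size %d", n)
	}
	f.ReasonMap = make(map[string]uint16)
	for i := int32(0); i < n; i++ {
		// <endpoint> is an [inetaddr]: a length byte (4 or 16), then the address.
		size, err := r.ReadByte()
		if err != nil {
			return nil, err
		}
		if size != 4 && size != 16 {
			return nil, fmt.Errorf("invalid inetaddr length %d", size)
		}
		addr := make(net.IP, size)
		if _, err := io.ReadFull(r, addr); err != nil {
			return nil, err
		}
		var code uint16 // <failurecode> is a [short]
		if err := binary.Read(r, binary.BigEndian, &code); err != nil {
			return nil, err
		}
		f.ReasonMap[addr.String()] = code
	}
	// <data_present> is a single byte; 0 means the data replica did not respond.
	present, err := r.ReadByte()
	if err != nil {
		return nil, err
	}
	f.DataPresent = present != 0
	return &f, nil
}

func main() {
	body := []byte{
		0x00, 0x01,             // <cl> = 0x0001 (ONE)
		0x00, 0x00, 0x00, 0x01, // <received> = 1
		0x00, 0x00, 0x00, 0x02, // <blockfor> = 2
		0x00, 0x00, 0x00, 0x01, // <reasonmap> holds 1 pair
		0x04, 127, 0, 0, 1,     // <endpoint> = 127.0.0.1
		0x00, 0x00,             // <failurecode> = 0
		0x00,                   // <data_present> = 0
	}
	f, err := decodeReadFailure(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Consistency, f.Received, f.BlockFor, f.ReasonMap, f.DataPresent)
	// prints: 1 1 2 map[127.0.0.1:0] false
}
```

A Write_failure (0x1500) body shares the same <cl><received><blockfor><reasonmap> prefix
and ends with a <writeType> [string] instead of <data_present>, so the same
field-by-field approach would apply there.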