                             CQL BINARY PROTOCOL v3


Table of Contents

  1. Overview
  2. Frame header
    2.1. version
    2.2. flags
    2.3. stream
    2.4. opcode
    2.5. length
  3. Notations
  4. Messages
    4.1. Requests
      4.1.1. STARTUP
      4.1.2. AUTH_RESPONSE
      4.1.3. OPTIONS
      4.1.4. QUERY
      4.1.5. PREPARE
      4.1.6. EXECUTE
      4.1.7. BATCH
      4.1.8. REGISTER
    4.2. Responses
      4.2.1. ERROR
      4.2.2. READY
      4.2.3. AUTHENTICATE
      4.2.4. SUPPORTED
      4.2.5. RESULT
        4.2.5.1. Void
        4.2.5.2. Rows
        4.2.5.3. Set_keyspace
        4.2.5.4. Prepared
        4.2.5.5. Schema_change
      4.2.6. EVENT
      4.2.7. AUTH_CHALLENGE
      4.2.8. AUTH_SUCCESS
  5. Compression
  6. Data Type Serialization Formats
  7. User Defined Type Serialization
  8. Result paging
  9. Error codes
  10. Changes from v2


1. Overview

  The CQL binary protocol is a frame based protocol. Frames are defined as:

      0         8        16        24        32         40
      +---------+---------+---------+---------+---------+
      | version |  flags  |      stream       | opcode  |
      +---------+---------+---------+---------+---------+
      |                length                 |
      +---------+---------+---------+---------+
      |                                       |
      .            ...  body ...              .
      .                                       .
      .                                       .
      +----------------------------------------

  The protocol is big-endian (network byte order).

  Each frame contains a fixed size header (9 bytes) followed by a variable size
  body. The header is described in Section 2. The content of the body depends
  on the header opcode value (the body can in particular be empty for some
  opcode values). The list of allowed opcodes is defined in Section 2.4 and the
  details of each corresponding message are described in Section 4.

  The protocol distinguishes 2 types of frames: requests and responses. Requests
  are the frames sent by clients to the server, responses are the ones sent
  by the server.
  Note however that the protocol supports server pushes (events),
  so responses do not necessarily come right after a client request.

  Note to client implementors: client libraries should always assume that the
  body of a given frame may contain more data than what is described in this
  document. It will however always be safe to ignore the remainder of the frame
  body in such cases. The reason is that this makes it possible to sometimes
  extend the protocol with optional features without needing to change the
  protocol version.



2. Frame header

2.1. version

  The version is a single byte that indicates both the direction of the message
  (request or response) and the version of the protocol in use. The most
  significant bit of version is used to define the direction of the message:
  0 indicates a request, 1 indicates a response. This can be useful for protocol
  analyzers to distinguish the nature of the packet from the direction in which
  it is moving. The rest of that byte is the protocol version (3 for the
  protocol defined in this document). In other words, for this version of the
  protocol, version will have one of the following values:
    0x03    Request frame for this protocol version
    0x83    Response frame for this protocol version

  Please note that while every message ships with the version, only one version
  of messages is accepted on a given connection. In other words, the first message
  exchanged (STARTUP) sets the version for the connection for the lifetime of this
  connection.

  This document describes version 3 of the protocol. For the changes made since
  version 2, see Section 10.


2.2. flags

  Flags applying to this frame. The flags have the following meaning (described
  by the mask that allows selecting them):
    0x01: Compression flag. If set, the frame body is compressed.
          The actual compression to use should have been set up beforehand
          through the Startup message (which thus cannot be compressed;
          Section 4.1.1).
    0x02: Tracing flag. For a request frame, this indicates the client requires
          tracing of the request. Note that not all requests support tracing.
          Currently, only QUERY, PREPARE and EXECUTE queries support tracing.
          Other requests will simply ignore the tracing flag if set. If a
          request supports tracing and the tracing flag was set, the response to
          this request will have the tracing flag set and contain tracing
          information.
          If a response frame has the tracing flag set, its body contains
          a tracing ID. The tracing ID is a [uuid] and is the first thing in
          the frame body. The rest of the body will then be the usual body
          corresponding to the response opcode.

  The rest of the flags are currently unused and ignored.

2.3. stream

  A frame has a stream id (a [short] value). When sending request messages, this
  stream id must be set by the client to a non-negative value (negative stream ids
  are reserved for streams initiated by the server; currently all EVENT messages
  (Section 4.2.6) have a stream id of -1). If a client sends a request message
  with the stream id X, it is guaranteed that the stream id of the response to
  that message will be X.

  This makes it possible to deal with the asynchronous nature of the protocol.
  If a client sends multiple messages simultaneously (without waiting for
  responses), there is no guarantee on the order of the responses. For instance,
  if the client writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the
  server might respond to REQ_3 (or REQ_2) first. Assigning different stream ids
  to these 3 requests allows the client to determine which request a received
  answer responds to. As there can only be 32768 different simultaneous streams,
  it is up to the client to reuse stream ids.
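
  As an illustration of the header layout (Section 1) and of stream id reuse,
  the following Python sketch decodes the 9-byte header and hands out stream
  ids from a pool. The names FrameHeader, parse_header and StreamIdPool are
  hypothetical and not part of the protocol. Reading the stream id as a signed
  16-bit value is an assumption made here so that the -1 used by EVENT messages
  is recovered directly.

```python
import struct
from typing import NamedTuple

# version, flags, stream, opcode, length: 1 + 1 + 2 + 1 + 4 = 9 bytes, big-endian.
HEADER = struct.Struct(">BBhBi")


class FrameHeader(NamedTuple):
    is_response: bool  # most significant bit of the version byte
    version: int       # remaining 7 bits (3 for this protocol version)
    flags: int
    stream: int        # negative for server-initiated streams
    opcode: int
    length: int        # length of the body that follows the header


def parse_header(data: bytes) -> FrameHeader:
    """Decode the fixed-size frame header from the first 9 bytes of data."""
    raw_version, flags, stream, opcode, length = HEADER.unpack(data[:HEADER.size])
    return FrameHeader(bool(raw_version & 0x80), raw_version & 0x7F,
                       flags, stream, opcode, length)


class StreamIdPool:
    """Hands out non-negative stream ids and takes them back once answered."""

    def __init__(self, size: int = 32768):
        # Reversed so that pop() returns the smallest free id first.
        self._free = list(range(size - 1, -1, -1))

    def acquire(self) -> int:
        return self._free.pop()  # raises IndexError when no id is free

    def release(self, stream_id: int) -> None:
        self._free.append(stream_id)


# An EVENT response header (opcode 0x0C) with an empty body and stream id -1.
event_header = parse_header(bytes([0x83, 0x00, 0xFF, 0xFF, 0x0C, 0, 0, 0, 0]))

pool = StreamIdPool()
first, second = pool.acquire(), pool.acquire()
pool.release(first)           # the response for `first` has arrived
reused = pool.acquire()       # the released id becomes available again
```

  A client that only ever has one request in flight does not need such a pool
  and can use a single fixed stream id.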

  Note that clients are free to use the protocol synchronously (i.e. wait for
  the response to REQ_N before sending REQ_N+1). In that case, the stream id
  can be safely set to 0. Clients should also feel free to use only a subset of
  the 32768 maximum possible stream ids if it is simpler for their
  implementation.

2.4. opcode

  An integer byte that distinguishes the actual message:
    0x00    ERROR
    0x01    STARTUP
    0x02    READY
    0x03    AUTHENTICATE
    0x05    OPTIONS
    0x06    SUPPORTED
    0x07    QUERY
    0x08    RESULT
    0x09    PREPARE
    0x0A    EXECUTE
    0x0B    REGISTER
    0x0C    EVENT
    0x0D    BATCH
    0x0E    AUTH_CHALLENGE
    0x0F    AUTH_RESPONSE
    0x10    AUTH_SUCCESS

  Messages are described in Section 4.

  (Note that there is no 0x04 message in this version of the protocol)


2.5. length

  A 4-byte integer representing the length of the body of the frame (note:
  currently a frame is limited to 256MB in length).


3. Notations

  To describe the layout of the frame body for the messages in Section 4, we
  define the following:

    [int]              A 4-byte signed integer
    [long]             An 8-byte signed integer
    [short]            A 2-byte unsigned integer
    [string]           A [short] n, followed by n bytes representing a UTF-8
                       string.
    [long string]      An [int] n, followed by n bytes representing a UTF-8
                       string.
    [uuid]             A 16-byte long uuid.
    [string list]      A [short] n, followed by n [string].
    [bytes]            An [int] n, followed by n bytes if n >= 0. If n < 0,
                       no byte should follow and the value represented is `null`.
    [short bytes]      A [short] n, followed by n bytes if n >= 0.

    [option]           A pair of <id><value> where <id> is a [short] representing
                       the option id and <value> depends on that option (and can
                       be of size 0). The supported ids (and the corresponding
                       <value>) will be described when this is used.
    [option list]      A [short] n, followed by n [option].
    [inet]             An address (ip and port) to a node. It consists of one
                       [byte] n, that represents the address size, followed by n
                       [byte] representing the IP address (in practice n can only
                       be either 4 (IPv4) or 16 (IPv6)), followed by one [int]
                       representing the port.
    [consistency]      A consistency level specification. This is a [short]
                       representing a consistency level with the following
                       correspondence:
                         0x0000    ANY
                         0x0001    ONE
                         0x0002    TWO
                         0x0003    THREE
                         0x0004    QUORUM
                         0x0005    ALL
                         0x0006    LOCAL_QUORUM
                         0x0007    EACH_QUORUM
                         0x0008    SERIAL
                         0x0009    LOCAL_SERIAL
                         0x000A    LOCAL_ONE

    [string map]       A [short] n, followed by n pairs <k><v> where <k> and <v>
                       are [string].
    [string multimap]  A [short] n, followed by n pairs <k><v> where <k> is a
                       [string] and <v> is a [string list].


4. Messages

4.1. Requests

  Note that outside of their normal responses (described below), all requests
  can get an ERROR message (Section 4.2.1) as response.

4.1.1. STARTUP

  Initialize the connection. The server will respond with either a READY message
  (in which case the connection is ready for queries) or an AUTHENTICATE message
  (in which case credentials will need to be provided using AUTH_RESPONSE).

  This must be the first message of the connection, except for OPTIONS that can
  be sent before to find out the options supported by the server. Once the
  connection has been initialized, a client should not send any more STARTUP
  messages.

  The body is a [string map] of options. Possible options are:
    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
      currently, the only version supported is "3.0.0". Note that this is
      different from the protocol version.
    - "COMPRESSION": the compression algorithm to use for frames (see Section 5).
      This is optional; if not specified, no compression will be used.


4.1.2. AUTH_RESPONSE

  Answers a server authentication challenge.

  Authentication in the protocol is SASL based. The server sends authentication
  challenges (a bytes token) to which the client answers with this message. Those
  exchanges continue until the server accepts the authentication by sending an
  AUTH_SUCCESS message after a client AUTH_RESPONSE. It is however the client that
  initiates the exchange by sending an initial AUTH_RESPONSE in response to a
  server AUTHENTICATE request.

  The body of this message is a single [bytes] token. The details of what this
  token contains (and when it can be null/empty, if ever) depend on the actual
  authenticator used.

  The response to an AUTH_RESPONSE is either a follow-up AUTH_CHALLENGE message,
  an AUTH_SUCCESS message or an ERROR message.


4.1.3. OPTIONS

  Asks the server to return what STARTUP options are supported. The body of an
  OPTIONS message should be empty and the server will respond with a SUPPORTED
  message.


4.1.4. QUERY

  Performs a CQL query. The body of the message must be:
    <query><query_parameters>
  where <query> is a [long string] representing the query and
  <query_parameters> must be
    <consistency><flags>[<n>[name_1]<value_1>...[name_n]<value_n>][<result_page_size>][<paging_state>][<serial_consistency>][<timestamp>]
  where:
    - <consistency> is the [consistency] level for the operation.
    - <flags> is a [byte] whose bits define the options for this query and
      in particular influence what the remainder of the message contains.
      A flag is set if the bit corresponding to its `mask` is set. Supported
      flags are, given their mask:
        0x01: Values. In that case, a [short] <n> followed by <n> [bytes]
              values are provided. Those values are used for bound variables in
              the query.
              Optionally, if the 0x40 flag is present, each value
              will be preceded by a [string] name, representing the name of
              the marker the value must be bound to. This is optional, and
              if not present, values will be bound by position.
        0x02: Skip_metadata. If present, the Result Set returned as a response
              to that query (if any) will have the NO_METADATA flag (see
              Section 4.2.5.2).
        0x04: Page_size. In that case, <result_page_size> is an [int]
              controlling the desired page size of the result (in CQL3 rows).
              See the section on paging (Section 8) for more details.
        0x08: With_paging_state. If present, <paging_state> should be present.
              <paging_state> is a [bytes] value that should have been returned
              in a result set (Section 4.2.5.2). If provided, the query will be
              executed but starting from a given paging state. This also allows
              paging to continue on a different node than the one where it was
              started (see Section 8 for more details).
        0x10: With serial consistency. If present, <serial_consistency> should be
              present. <serial_consistency> is the [consistency] level for the
              serial phase of conditional updates. That consistency can only be
              either SERIAL or LOCAL_SERIAL and if not present, it defaults to
              SERIAL. This option will be ignored for anything other than a
              conditional update/insert.
        0x20: With default timestamp. If present, <timestamp> should be present.
              <timestamp> is a [long] representing the default timestamp for the
              query in microseconds (negative values are discouraged but
              supported for backward compatibility reasons except for the
              smallest negative value (-2^63) that is forbidden). If provided,
              this will replace the server side assigned timestamp as default
              timestamp. Note that a timestamp in the query itself will still
              override this timestamp. This is entirely optional.
        0x40: With names for values.
              This only makes sense if the 0x01 flag is set and
              is ignored otherwise. If present, the values from the 0x01 flag
              will be preceded by a name (see above). Note that this is only
              useful for QUERY requests where named bind markers are used; for
              EXECUTE statements, since the names for the expected values were
              returned during preparation, a client can always provide values in
              the right order without any names and using this flag, while
              supported, is almost surely inefficient.

  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
  TRUNCATE, ...).

  The server will respond to a QUERY message with a RESULT message, the content
  of which depends on the query.


4.1.5. PREPARE

  Prepare a query for later execution (through EXECUTE). The body consists of
  the CQL query to prepare as a [long string].

  The server will respond with a RESULT message with a `prepared` kind (0x0004,
  see Section 4.2.5).


4.1.6. EXECUTE

  Executes a prepared query. The body of the message must be:
    <id><query_parameters>
  where <id> is the prepared query ID. It's the [short bytes] returned as a
  response to a PREPARE message. As for <query_parameters>, it has the exact
  same definition as in QUERY (see Section 4.1.4).

  The response from the server will be a RESULT message.


4.1.7. BATCH

  Allows executing a list of queries (prepared or not) as a batch (note that
  only DML statements are accepted in a batch). The body of the message must
  be:
    <type><n><query_1>...<query_n><consistency><flags>[<serial_consistency>][<timestamp>]
  where:
    - <type> is a [byte] indicating the type of batch to use:
        - If <type> == 0, the batch will be "logged". This is equivalent to a
          normal CQL3 batch statement.
        - If <type> == 1, the batch will be "unlogged".
        - If <type> == 2, the batch will be a "counter" batch (and non-counter
          statements will be rejected).
    - <flags> is a [byte] whose bits define the options for this query and
      in particular influence what the remainder of the message contains. It is
      similar to the <flags> from QUERY and EXECUTE methods, except that the 4
      rightmost bits must always be 0 as their corresponding options do not make
      sense for Batch. A flag is set if the bit corresponding to its `mask` is
      set. Supported flags are, given their mask:
        0x10: With serial consistency. If present, <serial_consistency> should be
              present. <serial_consistency> is the [consistency] level for the
              serial phase of conditional updates. That consistency can only be
              either SERIAL or LOCAL_SERIAL and if not present, it defaults to
              SERIAL. This option will be ignored for anything other than a
              conditional update/insert.
        0x20: With default timestamp. If present, <timestamp> should be present.
              <timestamp> is a [long] representing the default timestamp for the
              query in microseconds. If provided, this will replace the server
              side assigned timestamp as default timestamp. Note that a timestamp
              in the query itself will still override this timestamp. This is
              entirely optional.
        0x40: With names for values. If set, then all values for all <query_i>
              must be preceded by a [string] <name_i> that has the same meaning
              as in QUERY requests [IMPORTANT NOTE: this feature does not work
              and should not be used. It is specified in a way that makes it
              impossible for the server to implement. This will be fixed in a
              future version of the native protocol. See
              https://issues.apache.org/jira/browse/CASSANDRA-10246 for more
              details].
    - <n> is a [short] indicating the number of following queries.
    - <query_1>...<query_n> are the queries to execute.
      A <query_i> must be of the form:
        <kind><string_or_id><n>[<name_1>]<value_1>...[<name_n>]<value_n>
      where:
       - <kind> is a [byte] indicating whether the following query is a prepared
         one or not. <kind> value must be either 0 or 1.
       - <string_or_id> depends on the value of <kind>. If <kind> == 0, it should
         be a [long string] query string (as in QUERY, the query string might
         contain bind markers). Otherwise (that is, if <kind> == 1), it should be
         a [short bytes] representing a prepared query ID.
       - <n> is a [short] indicating the number (possibly 0) of following values.
       - <name_i> is the optional name of the following <value_i>. It must be
         present if and only if the 0x40 flag is provided for the batch.
       - <value_i> is the [bytes] to use for bound variable i (or bound variable
         <name_i> if the 0x40 flag is used).
    - <consistency> is the [consistency] level for the operation.
    - <serial_consistency> is only present if the 0x10 flag is set. In that case,
      <serial_consistency> is the [consistency] level for the serial phase of
      conditional updates. That consistency can only be either SERIAL or
      LOCAL_SERIAL and if not present, it defaults to SERIAL. This option will
      be ignored for anything other than a conditional update/insert.

  The server will respond with a RESULT message.


4.1.8. REGISTER

  Register this connection to receive some types of events. The body of the
  message is a [string list] representing the event types to register for. See
  Section 4.2.6 for the list of valid event types.

  The response to a REGISTER message will be a READY message.
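
  To make the REGISTER layout concrete, here is a minimal Python sketch that
  builds a complete request frame whose body is a [string list]. The helper
  names (encode_string, encode_string_list, build_request) are hypothetical,
  and the stream id 1 is an arbitrary choice for the example.

```python
import struct

OPCODE_REGISTER = 0x0B
REQUEST_VERSION = 0x03  # request frame, protocol version 3


def encode_string(value: str) -> bytes:
    """[string]: a [short] n followed by n bytes of UTF-8."""
    raw = value.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def encode_string_list(values) -> bytes:
    """[string list]: a [short] n followed by n [string]."""
    return struct.pack(">H", len(values)) + b"".join(encode_string(v) for v in values)


def build_request(opcode: int, stream: int, body: bytes, flags: int = 0) -> bytes:
    """Prefix the body with the 9-byte header: version, flags, stream, opcode, length."""
    header = struct.pack(">BBhBi", REQUEST_VERSION, flags, stream, opcode, len(body))
    return header + body


register_body = encode_string_list(["TOPOLOGY_CHANGE", "STATUS_CHANGE"])
register_frame = build_request(OPCODE_REGISTER, stream=1, body=register_body)
```

  The same build_request helper would work for any request whose body has
  already been encoded, since only the opcode and the body differ.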

  Please note that if a client driver maintains multiple connections to a
  Cassandra node and/or connections to multiple nodes, it is advised to
  dedicate a handful of connections to receive events, but to *not* register
  for events on all connections, as this would only result in receiving
  the same event messages multiple times, wasting bandwidth.


4.2. Responses

  This section describes the content of the frame body for the different
  responses. Please note that to make room for future evolution, clients should
  tolerate extra information at the end of the frame body, beyond what is
  described in this document, and simply discard it.

4.2.1. ERROR

  Indicates an error processing a request. The body of the message will be an
  error code ([int]) followed by a [string] error message. Then, depending on
  the exception, more content may follow. The error codes are defined in
  Section 9, along with their additional content if any.


4.2.2. READY

  Indicates that the server is ready to process queries. This message will be
  sent by the server either after a STARTUP message if no authentication is
  required, or after a successful CREDENTIALS message.

  The body of a READY message is empty.


4.2.3. AUTHENTICATE

  Indicates that the server requires authentication, and which authentication
  mechanism to use.

  The authentication is SASL based and thus consists of a number of server
  challenges (AUTH_CHALLENGE, Section 4.2.7) followed by client responses
  (AUTH_RESPONSE, Section 4.1.2). The initial exchange is however bootstrapped
  by an initial client response. The details of that exchange (including how
  many challenge-response pairs are required) are specific to the authenticator
  in use. The exchange ends when the server sends an AUTH_SUCCESS message or
  an ERROR message.
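
  The exchange described above can be sketched as a small loop. This is only
  an illustration under stated assumptions: `exchange` is a hypothetical
  callable that sends one request (opcode, body) and returns the opcode and
  body of the matching response, and `authenticator` is a hypothetical
  callable that maps a server challenge (None for the initial response) to the
  next client token. What the tokens contain depends on the authenticator in
  use and is not modelled here. The fake server at the end exists only to
  exercise the loop.

```python
import struct

ERROR, STARTUP, READY, AUTHENTICATE = 0x00, 0x01, 0x02, 0x03
AUTH_CHALLENGE, AUTH_RESPONSE, AUTH_SUCCESS = 0x0E, 0x0F, 0x10


def encode_string(value: str) -> bytes:
    raw = value.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def encode_string_map(options: dict) -> bytes:
    """[string map]: a [short] n followed by n <k><v> pairs of [string]."""
    pairs = b"".join(encode_string(k) + encode_string(v) for k, v in options.items())
    return struct.pack(">H", len(options)) + pairs


def encode_bytes(token) -> bytes:
    """[bytes]: an [int] n followed by n bytes; a negative n encodes null."""
    if token is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(token)) + token


def decode_bytes(body: bytes):
    (n,) = struct.unpack(">i", body[:4])
    return None if n < 0 else body[4:4 + n]


def handshake(exchange, authenticator):
    """Run STARTUP and, if requested by the server, the authentication loop."""
    opcode, body = exchange(STARTUP, encode_string_map({"CQL_VERSION": "3.0.0"}))
    if opcode == READY:
        return None  # no authentication required
    if opcode != AUTHENTICATE:
        raise ConnectionError(f"unexpected opcode 0x{opcode:02X} after STARTUP")
    token = authenticator(None)  # the client sends the initial response
    while True:
        opcode, body = exchange(AUTH_RESPONSE, encode_bytes(token))
        if opcode == AUTH_SUCCESS:
            return decode_bytes(body)  # final token from the server, may be null
        if opcode != AUTH_CHALLENGE:
            raise ConnectionError(f"authentication failed (opcode 0x{opcode:02X})")
        token = authenticator(decode_bytes(body))


# Fake server used only for this example: one challenge, then success.
scripted_replies = [
    (AUTHENTICATE, encode_string("org.example.HypotheticalAuthenticator")),
    (AUTH_CHALLENGE, encode_bytes(b"challenge-1")),
    (AUTH_SUCCESS, encode_bytes(None)),
]
sent = []
challenges_seen = []


def fake_exchange(opcode, body):
    sent.append((opcode, body))
    return scripted_replies.pop(0)


def fake_authenticator(challenge):
    challenges_seen.append(challenge)
    return b"client-token"


final_token = handshake(fake_exchange, fake_authenticator)
```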

  This message will be sent following a STARTUP message if authentication is
  required and must be answered by an AUTH_RESPONSE message from the client.

  The body consists of a single [string] indicating the full class name of the
  IAuthenticator in use.


4.2.4. SUPPORTED

  Indicates which startup options are supported by the server. This message
  comes as a response to an OPTIONS message.

  The body of a SUPPORTED message is a [string multimap]. This multimap gives,
  for each of the supported STARTUP options, the list of supported values.


4.2.5. RESULT

  The result to a query (QUERY, PREPARE, EXECUTE or BATCH messages).

  The first element of the body of a RESULT message is an [int] representing the
  `kind` of result. The rest of the body depends on the kind. The kind can be
  one of:
    0x0001    Void: for results carrying no information.
    0x0002    Rows: for results to select queries, returning a set of rows.
    0x0003    Set_keyspace: the result to a `use` query.
    0x0004    Prepared: result to a PREPARE message.
    0x0005    Schema_change: the result to a schema altering query.

  The body for each kind (after the [int] kind) is defined below.


4.2.5.1. Void

  The rest of the body for a Void result is empty. It indicates that a query was
  successful without providing more information.


4.2.5.2. Rows

  Indicates a set of rows. The rest of the body of a Rows result is:
    <metadata><rows_count><rows_content>
  where:
    - <metadata> is composed of:
        <flags><columns_count>[<paging_state>][<global_table_spec>?<col_spec_1>...<col_spec_n>]
      where:
        - <flags> is an [int]. The bits of <flags> provide information on the
          formatting of the remaining information. A flag is set if the bit
          corresponding to its `mask` is set.
          Supported flags are, given their mask:
            0x0001    Global_tables_spec: if set, only one table spec (keyspace
                      and table name) is provided as <global_table_spec>. If not
                      set, <global_table_spec> is not present.
            0x0002    Has_more_pages: indicates whether this is not the last
                      page of results and more should be retrieved. If set, the
                      <paging_state> will be present. The <paging_state> is a
                      [bytes] value that should be used in QUERY/EXECUTE to
                      continue paging and retrieve the remainder of the result
                      for this query (see Section 8 for more details).
            0x0004    No_metadata: if set, the <metadata> is only composed of
                      these <flags>, the <columns_count> and optionally the
                      <paging_state> (depending on the Has_more_pages flag) but
                      no other information (so no <global_table_spec> nor
                      <col_spec_i>). This will only ever be the case if this was
                      requested during the query (see QUERY and RESULT messages).
        - <columns_count> is an [int] representing the number of columns selected
          by the query this result is of. It defines the number of <col_spec_i>
          elements and the number of elements for each row in <rows_content>.
        - <global_table_spec> is present if the Global_tables_spec is set in
          <flags>. If present, it is composed of two [string] representing the
          (unique) keyspace name and table name the returned columns belong to.
        - <col_spec_i> specifies the columns returned in the query. There are
          <columns_count> such column specifications that are composed of:
            (<ksname><tablename>)?<name><type>
          The initial <ksname> and <tablename> are two [string] that are only
          present if the Global_tables_spec flag is not set. The <name> is a
          [string] and <type> is an [option] that correspond to the description
          (what this description is depends a bit on the context: in results to
          selects, this will be either the user chosen alias or the selection used
          (often a column name, but it can be a function call too).
          In results to a PREPARE, this will be either the name of the
          corresponding bind variable or the column name for the variable if it
          is "anonymous") and type of the corresponding result. The option for
          <type> is either a native type (see below), in which case the option
          has no value, or a 'custom' type, in which case the value is a [string]
          representing the fully qualified class name of the type represented.
          Valid option ids are:
            0x0000    Custom: the value is a [string], see above.
            0x0001    Ascii
            0x0002    Bigint
            0x0003    Blob
            0x0004    Boolean
            0x0005    Counter
            0x0006    Decimal
            0x0007    Double
            0x0008    Float
            0x0009    Int
            0x000B    Timestamp
            0x000C    Uuid
            0x000D    Varchar
            0x000E    Varint
            0x000F    Timeuuid
            0x0010    Inet
            0x0020    List: the value is an [option], representing the type
                            of the elements of the list.
            0x0021    Map: the value is two [option], representing the types of
                           the keys and values of the map.
            0x0022    Set: the value is an [option], representing the type
                           of the elements of the set.
            0x0030    UDT: the value is <ks><udt_name><n><name_1><type_1>...<name_n><type_n>
                           where:
                              - <ks> is a [string] representing the keyspace name
                                this UDT is part of.
                              - <udt_name> is a [string] representing the UDT name.
                              - <n> is a [short] representing the number of fields
                                of the UDT, and thus the number of <name_i><type_i>
                                pairs following.
                              - <name_i> is a [string] representing the name of the
                                i_th field of the UDT.
                              - <type_i> is an [option] representing the type of the
                                i_th field of the UDT.
            0x0031    Tuple: the value is <n><type_1>...<type_n> where <n> is a
                             [short] representing the number of values in the
                             type, and <type_i> are [option] representing the type
                             of the i_th component of the tuple.

    - <rows_count> is an [int] representing the number of rows present in this
      result. Those rows are serialized in the <rows_content> part.
    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
      Each <row_i> is composed of <value_1>...<value_n> where n is
      <columns_count> and where <value_j> is a [bytes] representing the value
      returned for the jth column of the ith row. In other words, <rows_content>
      is composed of (<rows_count> * <columns_count>) [bytes].


4.2.5.3. Set_keyspace

  The result to a `use` query. The body (after the kind [int]) is a single
  [string] indicating the name of the keyspace that has been set.


4.2.5.4. Prepared

  The result to a PREPARE message. The rest of the body of a Prepared result is:
    <id><metadata><result_metadata>
  where:
    - <id> is [short bytes] representing the prepared query ID.
    - <metadata> is defined exactly as for a Rows RESULT (see Section 4.2.5.2; you
      can however assume that the Has_more_pages flag is always off) and
      is the specification for the variables bound in this prepared statement.
    - <result_metadata> is defined exactly as <metadata> but corresponds to the
      metadata for the result set that executing this query will yield. Note that
      <result_metadata> may be empty (have the No_metadata flag and 0 columns, see
      Section 4.2.5.2) and will be for any query that is not a Select. There is
      in fact never a guarantee that this will be non-empty, so clients should
      protect themselves accordingly. The presence of this information is an
      optimization that allows the prepared statement to be executed later
      without requesting the metadata (Skip_metadata flag in EXECUTE).
      Clients can safely discard this metadata if they do not want to take
      advantage of that optimization.

  Note that the prepared query ID returned is global to the node on which the
  query has been prepared. It can be used on any connection to that node until
  the node is restarted (after which the query must be reprepared).

4.2.5.5. Schema_change

  The result to a schema altering query (creation/update/drop of a
  keyspace/table/index). The body (after the kind [int]) is the same
  as the body for a "SCHEMA_CHANGE" event, that is:
    <change_type><target><options>
  Please refer to Section 4.2.6 below for the meaning of those fields.

  Note that queries to create and drop an index are considered changes
  updating the table the index is on.


4.2.6. EVENT

  An event pushed by the server. A client will only receive events for the
  types it has registered for with REGISTER. The body of an EVENT message will
  start with a [string] representing the event type. The rest of the message
  depends on the event type. The valid event types are:
    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
      Currently, events are sent when new nodes are added to the cluster, and
      when nodes are removed. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of change ("NEW_NODE", "REMOVED_NODE", or "MOVED_NODE") followed
      by the address of the new/removed/moved node.
    - "STATUS_CHANGE": events related to change of node status. Currently,
      up/down events are sent. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of status change ("UP" or "DOWN") followed by the address of the
      concerned node.
    - "SCHEMA_CHANGE": events related to schema change. After the event type,
      the rest of the message will be <change_type><target><options> where:
        - <change_type> is a [string] representing the type of change involved.
          It will be one of "CREATED", "UPDATED" or "DROPPED".
        - <target> is a [string] that can be one of "KEYSPACE", "TABLE" or "TYPE"
          and describes what has been modified ("TYPE" stands for modifications
          related to user types).
681 - <options> depends on the preceding <target>. If <target> is 682 "KEYSPACE", then <options> will be a single [string] representing the 683 keyspace changed. Otherwise, if <target> is "TABLE" or "TYPE", then 684 <options> will be 2 [string]: the first one will be the keyspace 685 containing the affected object, and the second one will be the name 686 of said affected object (so either the table name or the user type 687 name). 688 689 All EVENT message have a streamId of -1 (Section 2.3). 690 691 Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip 692 communication and as such may be sent a short delay before the binary 693 protocol server on the newly up node is fully started. Clients are thus 694 advise to wait a short time before trying to connect to the node (1 seconds 695 should be enough), otherwise they may experience a connection refusal at 696 first. 697 698 It is possible for the same event to be sent multiple times. Therefore, 699 a client library should ignore the same event if it has already been notified 700 of a change. 701 702 4.2.7. AUTH_CHALLENGE 703 704 A server authentication challenge (see AUTH_RESPONSE (Section 4.1.2) for more 705 details). 706 707 The body of this message is a single [bytes] token. The details of what this 708 token contains (and when it can be null/empty, if ever) depends on the actual 709 authenticator used. 710 711 Clients are expected to answer the server challenge by an AUTH_RESPONSE 712 message. 713 714 4.2.7. AUTH_SUCCESS 715 716 Indicate the success of the authentication phase. See Section 4.2.3 for more 717 details. 718 719 The body of this message is a single [bytes] token holding final information 720 from the server that the client may require to finish the authentication 721 process. What that token contains and whether it can be null depends on the 722 actual authenticator used. 723 724 725 5. 
Compression

  Frame compression is supported by the protocol, but only the frame body
  is compressed (the frame header is never compressed).

  Before compression is used, client and server must agree on a compression
  algorithm, which is done in the STARTUP message. As a consequence, a STARTUP
  message must never be compressed. However, once the STARTUP frame has been
  received by the server, frames can be compressed (including the response to
  the STARTUP request). Frames do not have to be compressed however, even if
  compression has been agreed upon (a server may only compress frames above a
  certain size at its discretion). A frame body should be compressed if and
  only if the compression flag (see Section 2.2) is set.

  As of this version of the protocol, the following compressions are available:
    - lz4 (https://code.google.com/p/lz4/). In that case, note that the first
      4 bytes of the body will be the uncompressed length (followed by the
      compressed bytes).
    - snappy (https://code.google.com/p/snappy/). This compression might not be
      available as it depends on a native lib (server-side) that might not be
      available on some installations.


6. Data Type Serialization Formats

  This section describes the serialization formats for all CQL data types
  supported by Cassandra through the native protocol. These serialization
  formats should be used by client drivers to encode values for EXECUTE
  messages. Cassandra will use these formats when returning values in
  RESULT messages.

  All values are represented as [bytes] in EXECUTE and RESULT messages.
  The [bytes] format includes an int prefix denoting the length of the value.
  For that reason, the serialization formats described here will not include
  a length component.

  For legacy compatibility reasons, note that most non-string types support
  "empty" values (i.e. a value with zero length).
  An empty value is distinct from NULL, which is encoded with a negative
  length.

  As with the rest of the native protocol, all encodings are big-endian.

6.1. ascii

  A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of
  this range will result in a validation error.

6.2. bigint

  An eight-byte two's complement integer.

6.3. blob

  Any sequence of bytes.

6.4. boolean

  A single byte. A value of 0 denotes "false"; any other value denotes "true".
  (However, it is recommended that a value of 1 be used to represent "true".)

6.5. decimal

  The decimal format represents an arbitrary-precision number. It contains an
  [int] "scale" component followed by a varint encoding (see Section 6.17)
  of the unscaled value. The encoded value represents "<unscaled>E<-scale>".
  In other words, "<unscaled> * 10 ^ (-1 * <scale>)".

6.6. double

  An eight-byte floating point number in the IEEE 754 binary64 format.

6.7. float

  A four-byte floating point number in the IEEE 754 binary32 format.

6.8. inet

  A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.

6.9. int

  A four-byte two's complement integer.

6.10. list

  An [int] n indicating the number of elements in the list, followed by n
  elements. Each element is [bytes] representing the serialized value.

6.11. map

  An [int] n indicating the number of key/value pairs in the map, followed by
  n entries. Each entry is composed of two [bytes] representing the key
  and value.

6.12. set

  An [int] n indicating the number of elements in the set, followed by n
  elements. Each element is [bytes] representing the serialized value.

6.13. text

  A sequence of bytes conforming to the UTF-8 specifications.
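
  The collection formats of Sections 6.10 to 6.12 can be sketched as follows.
  This is only an illustration in Python (standard library only), not a
  reference implementation: the helper names encode_bytes, encode_int,
  encode_list and encode_map are hypothetical and not part of the protocol,
  and each element is assumed to have already been serialized according to
  its own type.

```python
import struct


def encode_bytes(value: bytes) -> bytes:
    """[bytes]: a 4-byte big-endian length followed by the payload."""
    return struct.pack(">i", len(value)) + value


def encode_int(value: int) -> bytes:
    """CQL int (Section 6.9): four-byte two's complement, big-endian."""
    return struct.pack(">i", value)


def encode_list(elements) -> bytes:
    """list or set: an [int] element count, then one [bytes] per element.

    `elements` is a sequence of already-serialized element values.
    """
    out = struct.pack(">i", len(elements))
    for element in elements:
        out += encode_bytes(element)
    return out


def encode_map(pairs) -> bytes:
    """map: an [int] pair count, then [bytes] key and [bytes] value per entry.

    `pairs` is a sequence of (serialized_key, serialized_value) tuples.
    """
    out = struct.pack(">i", len(pairs))
    for key, value in pairs:
        out += encode_bytes(key) + encode_bytes(value)
    return out


# Example: a list<int> holding [1, 2] and a map<text, int> holding {"a": 1}.
list_value = encode_list([encode_int(1), encode_int(2)])
map_value = encode_map([("a".encode("utf-8"), encode_int(1))])
```

  The resulting value is itself carried as a [bytes] in EXECUTE and RESULT
  messages, so the outer length prefix is added by the message encoder and
  not by these helpers.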

6.14. timestamp

  An eight-byte two's complement integer representing a millisecond-precision
  offset from the unix epoch (00:00:00, January 1st, 1970). Negative values
  represent a negative offset from the epoch.

6.15. uuid

  A 16 byte sequence representing any valid UUID as defined by RFC 4122.

6.16. varchar

  An alias of the "text" type.

6.17. varint

  A variable-length two's complement encoding of a signed integer.

  The following examples may help implementors of this spec:

  Value | Encoding
  ------|---------
      0 |     0x00
      1 |     0x01
    127 |     0x7F
    128 |   0x0080
    129 |   0x0081
     -1 |     0xFF
   -128 |     0x80
   -129 |   0xFF7F

  Note that positive numbers must use a most-significant byte with a value
  less than 0x80, because a most-significant bit of 1 indicates a negative
  value. Implementors should pad positive values that have a MSB >= 0x80
  with a leading 0x00 byte.

6.18. timeuuid

  A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.

6.19. tuple

  A sequence of [bytes] values representing the items in a tuple. The encoding
  of each element depends on the data type for that position in the tuple.
  Null values may be represented by using length -1 for the [bytes]
  representation of an element.

  Within a tuple, all data types should use the v3 protocol serialization format.


7. User Defined Types

  This section describes the serialization format for User Defined Types (UDT),
  as described in Section 4.2.5.2.

  A UDT value is composed of successive [bytes] values, one for each field of the UDT
  value (in the order defined by the type). A UDT value will generally have one value
  for each field of the type it represents, but it is allowed to have fewer values than
  the type has fields.
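
  As a rough sketch of the layout just described, the Python snippet below
  serializes a UDT value whose fields are an int and a varint. The names
  encode_varint, encode_bytes and encode_udt are hypothetical helpers, not
  part of the protocol. Encoding a null field with length -1 follows the
  [bytes] convention used for tuples in Section 6.19 and is an assumption
  here, since this section does not state it explicitly.

```python
import struct


def encode_varint(n: int) -> bytes:
    """varint (Section 6.17): minimal-length big-endian two's complement."""
    # For negative n, ~n is non-negative and needs the same number of
    # magnitude bits; one extra bit is always reserved for the sign.
    magnitude = n if n >= 0 else ~n
    length = magnitude.bit_length() // 8 + 1
    return n.to_bytes(length, "big", signed=True)


def encode_bytes(value) -> bytes:
    """[bytes]: 4-byte length then payload; None becomes length -1 (null)."""
    if value is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(value)) + value


def encode_udt(field_values) -> bytes:
    """Concatenate one [bytes] per field, in the order defined by the type.

    `field_values` holds already-serialized fields (or None for null).
    Trailing fields may simply be left out.
    """
    return b"".join(encode_bytes(v) for v in field_values)


# Example: a UDT with fields (int 7, varint 128).
udt_value = encode_udt([struct.pack(">i", 7), encode_varint(128)])
```

  The varint helper reproduces the examples in the table of Section 6.17,
  including the leading 0x00 padding for 128.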

  Within a user-defined type value, all data types should use the v3 protocol
  serialization format.


8. Result paging

  The protocol allows for paging the result of queries. For that, the QUERY and
  EXECUTE messages have a <result_page_size> value that indicates the desired
  page size in CQL3 rows.

  If a positive value is provided for <result_page_size>, the result set of the
  RESULT message returned for the query will contain at most the
  <result_page_size> first rows of the query result. If that first page of results
  contains the full result set for the query, the RESULT message (of kind `Rows`)
  will have the Has_more_pages flag *not* set. However, if some results are not
  part of the first response, the Has_more_pages flag will be set and the result
  will contain a <paging_state> value. In that case, the <paging_state> value
  should be used in a QUERY or EXECUTE message (that has the *same* query as
  the original one; otherwise the behavior is undefined) to retrieve the next
  page of results.

  Only CQL3 queries that return a result set (RESULT message with a Rows `kind`)
  support paging. For other types of queries, the <result_page_size> value is
  ignored.

  Note to client implementors:
    - While <result_page_size> can be as low as 1, it will likely be detrimental
      to performance to pick a value too low. A value below 100 is probably too
      low for most use cases.
    - Clients should not rely on the actual size of the result set returned to
      decide if there are more results to fetch or not. Instead, they should always
      check the Has_more_pages flag (unless they did not enable paging for the query
      obviously). Clients should also not assert that no result will have more than
      <result_page_size> results.
      While the current implementation always respects
      the exact value of <result_page_size>, we reserve the right to return
      slightly smaller or bigger pages in the future for performance reasons.


9. Error codes

  The supported error codes are described below:
    0x0000    Server error: something unexpected happened. This indicates a
              server-side bug.
    0x000A    Protocol error: some client message triggered a protocol
              violation (for instance a QUERY message is sent before a STARTUP
              one has been sent).
    0x0100    Bad credentials: CREDENTIALS request failed because Cassandra
              did not accept the provided credentials.

    0x1000    Unavailable exception. The rest of the ERROR message body will be
                <cl><required><alive>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <required> is an [int] representing the number of nodes that
                           should be alive to respect <cl>.
                <alive> is an [int] representing the number of replicas that
                        were known to be alive when the request was
                        processed (since an unavailable exception has been
                        triggered, <alive> < <required>).
    0x1001    Overloaded: the request cannot be processed because the
              coordinator node is overloaded.
    0x1002    Is_bootstrapping: the request was a read request but the
              coordinator node is bootstrapping.
    0x1003    Truncate_error: error during a truncation.
    0x1100    Write_timeout: Timeout exception during a write request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><writeType>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           acknowledged the request.
                <blockfor> is an [int] representing the number of replicas whose
                           acknowledgement is required to achieve <cl>.
                <writeType> is a [string] that describes the type of the write
                            that timed out. The value of that string can be one
                            of:
                             - "SIMPLE": the write was a non-batched
                               non-counter write.
                             - "BATCH": the write was a (logged) batch write.
                               If this type is received, it means the batch log
                               has been successfully written (otherwise a
                               "BATCH_LOG" type would have been sent instead).
                             - "UNLOGGED_BATCH": the write was an unlogged
                               batch. No batch log write has been attempted.
                             - "COUNTER": the write was a counter write
                               (batched or not).
                             - "BATCH_LOG": the timeout occurred during the
                               write to the batch log when a (logged) batch
                               write was requested.
    0x1200    Read_timeout: Timeout exception during a read request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><data_present>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           answered the request.
                <blockfor> is an [int] representing the number of replicas whose
                           response is required to achieve <cl>. Please note that
                           it is possible to have <received> >= <blockfor> if
                           <data_present> is false, and also in the (unlikely)
                           case where <cl> is achieved but the coordinator node
                           times out while waiting for read-repair
                           acknowledgement.
                <data_present> is a single byte. If its value is 0, it means
                               the replica that was asked for data has not
                               responded. Otherwise, the value is != 0.

    0x2000    Syntax_error: The submitted query has a syntax error.
    0x2100    Unauthorized: The logged-in user doesn't have the right to perform
              the query.
    0x2200    Invalid: The query is syntactically correct but invalid.
    0x2300    Config_error: The query is invalid because of some configuration issue.
    0x2400    Already_exists: The query attempted to create a keyspace or a
              table that already existed.
              The rest of the ERROR message
              body will be <ks><table> where:
                <ks> is a [string] representing either the keyspace that
                     already exists, or the keyspace in which the table that
                     already exists is.
                <table> is a [string] representing the name of the table that
                        already exists. If the query was attempting to create a
                        keyspace, <table> will be present but will be the empty
                        string.
    0x2500    Unprepared: Can be thrown when a prepared statement is
              executed and the provided prepared statement ID is not known by
              this host. The rest of the ERROR message body will be [short
              bytes] representing the unknown ID.

10. Changes from v2

  * stream id is now 2 bytes long (a [short] value), so the header is now 1 byte longer (9 bytes total).
  * BATCH messages now have <flags> (like QUERY and EXECUTE) and a corresponding optional
    <serial_consistency> parameter (see Section 4.1.7).
  * User Defined Types and tuple types have been added to ResultSet metadata (see 4.2.5.2) and a
    new section on the serialization format of UDT and tuple values has been added to the documentation
    (Section 7).
  * The serialization format for collections has changed (both the collection size and
    the length of each element are now 4 bytes long). See Section 6.
  * QUERY, EXECUTE and BATCH messages can now optionally provide the default timestamp for the query.
    As this feature is optionally enabled by clients, implementing it is at the discretion of the
    client.
  * QUERY and EXECUTE messages can now optionally provide the names for the values of the
    query. As this feature is optionally enabled by clients, implementing it is at the discretion of the
    client (note that while the BATCH message has a flag for this, it actually doesn't work for BATCH;
    see Section 4.1.7 for details).
  * The format of "Schema_change" results (Section 4.2.5.5) and "SCHEMA_CHANGE" events (Section 4.2.6)
    has been modified, and now includes changes related to user types.
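
  To illustrate the first change in the list above, the sketch below unpacks
  the 9-byte v3 frame header laid out in Section 1. It is written in Python as
  an illustration only: parse_header and the dictionary keys are hypothetical
  names, and reading the stream id as a signed [short] is an assumption based
  on EVENT messages using a streamId of -1.

```python
import struct

# version (1 byte), flags (1 byte), stream (2 bytes, signed),
# opcode (1 byte), length (4 bytes), all big-endian.
HEADER = struct.Struct(">BBhBi")


def parse_header(data: bytes) -> dict:
    """Decode the fixed-size v3 frame header from the start of `data`."""
    version, flags, stream, opcode, length = HEADER.unpack_from(data)
    return {
        # The most significant bit of the version byte gives the direction.
        "is_response": bool(version & 0x80),
        "protocol_version": version & 0x7F,
        # 0x01 is the compression flag (Section 2.2).
        "compressed": bool(flags & 0x01),
        "stream": stream,
        "opcode": opcode,
        "length": length,
    }


# Example: a response frame (0x83) on stream -1 with an empty body.
# 0x0C is used here as the EVENT opcode (see Section 2.4).
example = parse_header(bytes([0x83, 0x00, 0xFF, 0xFF, 0x0C, 0x00, 0x00, 0x00, 0x00]))
```

  A v2 header would be one byte shorter because its stream id occupied a
  single byte, so a parser should select the layout from the version byte
  before reading the remaining fields.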