                             CQL BINARY PROTOCOL v2


Table of Contents

  1. Overview
  2. Frame header
    2.1. version
    2.2. flags
    2.3. stream
    2.4. opcode
    2.5. length
  3. Notations
  4. Messages
    4.1. Requests
      4.1.1. STARTUP
      4.1.2. AUTH_RESPONSE
      4.1.3. OPTIONS
      4.1.4. QUERY
      4.1.5. PREPARE
      4.1.6. EXECUTE
      4.1.7. BATCH
      4.1.8. REGISTER
    4.2. Responses
      4.2.1. ERROR
      4.2.2. READY
      4.2.3. AUTHENTICATE
      4.2.4. SUPPORTED
      4.2.5. RESULT
        4.2.5.1. Void
        4.2.5.2. Rows
        4.2.5.3. Set_keyspace
        4.2.5.4. Prepared
        4.2.5.5. Schema_change
      4.2.6. EVENT
      4.2.7. AUTH_CHALLENGE
      4.2.8. AUTH_SUCCESS
  5. Compression
  6. Collection types
  7. Result paging
  8. Error codes
  9. Changes from v1


1. Overview

  The CQL binary protocol is a frame based protocol. Frames are defined as:

      0         8        16        24        32
      +---------+---------+---------+---------+
      | version |  flags  | stream  | opcode  |
      +---------+---------+---------+---------+
      |                length                 |
      +---------+---------+---------+---------+
      |                                       |
      .            ...  body ...              .
      .                                       .
      .                                       .
      +----------------------------------------

  The protocol is big-endian (network byte order).

  Each frame contains a fixed size header (8 bytes) followed by a variable size
  body. The header is described in Section 2. The content of the body depends
  on the header opcode value (the body can in particular be empty for some
  opcode values). The list of allowed opcodes is defined in Section 2.4 and the
  details of each corresponding message are described in Section 4.

  The protocol distinguishes 2 types of frames: requests and responses. Requests
  are the frames sent by the clients to the server, responses are the ones sent
  by the server.
  Note however that the protocol supports server pushes (events)
  so responses do not necessarily come right after a client request.

  Note to client implementors: client libraries should always assume that the
  body of a given frame may contain more data than what is described in this
  document. It will however always be safe to ignore the remainder of the frame
  body in such cases. The reason is that this may sometimes allow the protocol
  to be extended with optional features without needing to change the protocol
  version.



2. Frame header

2.1. version

  The version is a single byte that indicates both the direction of the message
  (request or response) and the version of the protocol in use. The uppermost bit
  of version is used to define the direction of the message: 0 indicates a
  request, 1 indicates a response. This can be useful for protocol analyzers to
  distinguish the nature of the packet from the direction in which it is moving.
  The rest of that byte is the protocol version (2 for the protocol defined in
  this document). In other words, for this version of the protocol, version will
  have one of:
    0x02    Request frame for this protocol version
    0x82    Response frame for this protocol version

  Please note that while every message ships with the version, only one version
  of messages is accepted on a given connection. In other words, the first message
  exchanged (STARTUP) sets the version for the connection for the lifetime of this
  connection.

  This document describes version 2 of the protocol. For the changes made since
  version 1, see Section 9.


2.2. flags

  Flags applying to this frame. The flags have the following meaning (described
  by the mask that allows selecting them):
    0x01: Compression flag. If set, the frame body is compressed.
          The actual compression to use should have been set up beforehand
          through the STARTUP message (which thus cannot be compressed;
          Section 4.1.1).
    0x02: Tracing flag. For a request frame, this indicates the client requires
          tracing of the request. Note that not all requests support tracing.
          Currently, only QUERY, PREPARE and EXECUTE queries support tracing.
          Other requests will simply ignore the tracing flag if set. If a
          request supports tracing and the tracing flag was set, the response to
          this request will have the tracing flag set and contain tracing
          information.
          If a response frame has the tracing flag set, its body contains
          a tracing ID. The tracing ID is a [uuid] and is the first thing in
          the frame body. The rest of the body will then be the usual body
          corresponding to the response opcode.

  The rest of the flags are currently unused and ignored.

2.3. stream

  A frame has a stream id (one signed byte). When sending request messages, this
  stream id must be set by the client to a non-negative byte (negative stream ids
  are reserved for streams initiated by the server; currently all EVENT messages
  (Section 4.2.6) have a streamId of -1). If a client sends a request message
  with the stream id X, it is guaranteed that the stream id of the response to
  that message will be X.

  This allows dealing with the asynchronous nature of the protocol. If a client
  sends multiple messages simultaneously (without waiting for responses), there
  is no guarantee on the order of the responses. For instance, if the client
  writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the server might
  respond to REQ_3 (or REQ_2) first. Assigning different stream ids to these 3
  requests allows the client to distinguish which request a received answer
  responds to. As there can only be 128 different simultaneous streams, it is up
  to the client to reuse stream ids.
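
  As an illustration of the header layout (version, flags, stream, opcode and
  length, per the frame diagram in Section 1), the following Python sketch packs
  and unpacks the 8-byte header. The helper names pack_request_header and
  unpack_header are hypothetical and not part of the protocol; the length is
  treated as a signed 4 byte integer, which is an assumption.

```python
import struct

# Four single bytes (version, flags, stream, opcode) followed by a 4-byte
# body length, all in network byte order. The stream id is a signed byte.
HEADER = struct.Struct(">BBbBi")

RESPONSE_BIT = 0x80      # top bit of the version byte: 1 means response
PROTOCOL_VERSION = 0x02  # this document describes version 2


def pack_request_header(stream, opcode, body_length, flags=0):
    """Build the 8-byte header of a request frame (hypothetical helper)."""
    if not 0 <= stream <= 127:
        raise ValueError("client stream ids must be in 0..127")
    return HEADER.pack(PROTOCOL_VERSION, flags, stream, opcode, body_length)


def unpack_header(data):
    """Split an 8-byte header into a dict of its fields."""
    version, flags, stream, opcode, length = HEADER.unpack(data)
    return {
        "is_response": bool(version & RESPONSE_BIT),
        "version": version & 0x7F,
        "compressed": bool(flags & 0x01),  # Compression flag
        "tracing": bool(flags & 0x02),     # Tracing flag
        "stream": stream,
        "opcode": opcode,
        "length": length,
    }
```

  For example, pack_request_header(stream=5, opcode=0x07, body_length=42)
  yields the header of a QUERY request on stream 5, and unpacking a header
  whose third byte is 0xFF reports a stream id of -1, the value used by EVENT
  messages.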

  Note that clients are free to use the protocol synchronously (i.e. wait for
  the response to REQ_N before sending REQ_N+1). In that case, the stream id
  can be safely set to 0. Clients should also feel free to use only a subset of
  the 128 maximum possible stream ids if it is simpler for their
  implementation.

2.4. opcode

  An integer byte that distinguishes the actual message:
    0x00    ERROR
    0x01    STARTUP
    0x02    READY
    0x03    AUTHENTICATE
    0x05    OPTIONS
    0x06    SUPPORTED
    0x07    QUERY
    0x08    RESULT
    0x09    PREPARE
    0x0A    EXECUTE
    0x0B    REGISTER
    0x0C    EVENT
    0x0D    BATCH
    0x0E    AUTH_CHALLENGE
    0x0F    AUTH_RESPONSE
    0x10    AUTH_SUCCESS

  Messages are described in Section 4.

  (Note that there is no 0x04 message in this version of the protocol)


2.5. length

  A 4 byte integer representing the length of the body of the frame (note:
  currently a frame is limited to 256MB in length).


3. Notations

  To describe the layout of the frame body for the messages in Section 4, we
  define the following:

    [int]          A 4 byte integer
    [short]        A 2 byte unsigned integer
    [string]       A [short] n, followed by n bytes representing a UTF-8
                   string.
    [long string]  An [int] n, followed by n bytes representing a UTF-8 string.
    [uuid]         A 16 byte long uuid.
    [string list]  A [short] n, followed by n [string].
    [bytes]        An [int] n, followed by n bytes if n >= 0. If n < 0,
                   no byte should follow and the value represented is `null`.
    [short bytes]  A [short] n, followed by n bytes if n >= 0.

    [option]       A pair of <id><value> where <id> is a [short] representing
                   the option id and <value> depends on that option (and can be
                   of size 0). The supported ids (and the corresponding <value>)
                   will be described when this is used.
    [option list]  A [short] n, followed by n [option].
    [inet]         An address (ip and port) to a node.
                   It consists of one
                   [byte] n, that represents the address size, followed by n
                   [byte] representing the IP address (in practice n can only be
                   either 4 (IPv4) or 16 (IPv6)), followed by one [int]
                   representing the port.
    [consistency]  A consistency level specification. This is a [short]
                   representing a consistency level with the following
                   correspondence:
                     0x0000    ANY
                     0x0001    ONE
                     0x0002    TWO
                     0x0003    THREE
                     0x0004    QUORUM
                     0x0005    ALL
                     0x0006    LOCAL_QUORUM
                     0x0007    EACH_QUORUM
                     0x0008    SERIAL
                     0x0009    LOCAL_SERIAL
                     0x000A    LOCAL_ONE

    [string map]      A [short] n, followed by n pairs <k><v> where <k> and <v>
                      are [string].
    [string multimap] A [short] n, followed by n pairs <k><v> where <k> is a
                      [string] and <v> is a [string list].


4. Messages

4.1. Requests

  Note that outside of their normal responses (described below), all requests
  can get an ERROR message (Section 4.2.1) as response.

4.1.1. STARTUP

  Initialize the connection. The server will respond with either a READY message
  (in which case the connection is ready for queries) or an AUTHENTICATE message
  (in which case credentials will need to be provided using AUTH_RESPONSE).

  This must be the first message of the connection, except for OPTIONS that can
  be sent before to find out the options supported by the server. Once the
  connection has been initialized, a client should not send any more STARTUP
  messages.

  The body is a [string map] of options. Possible options are:
    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
      currently, the only version supported is "3.0.0". Note that this is
      different from the protocol version.
    - "COMPRESSION": the compression algorithm to use for frames (See Section 5).
      This is optional; if not specified, no compression will be used.


4.1.2. AUTH_RESPONSE

  Answers a server authentication challenge.

  Authentication in the protocol is SASL based. The server sends authentication
  challenges (a bytes token) to which the client answers with this message. Those
  exchanges continue until the server accepts the authentication by sending an
  AUTH_SUCCESS message after a client AUTH_RESPONSE. It is however the client that
  initiates the exchange by sending an initial AUTH_RESPONSE in response to a
  server AUTHENTICATE request.

  The body of this message is a single [bytes] token. The details of what this
  token contains (and when it can be null/empty, if ever) depend on the actual
  authenticator used.

  The response to an AUTH_RESPONSE is either a follow-up AUTH_CHALLENGE message,
  an AUTH_SUCCESS message or an ERROR message.


4.1.3. OPTIONS

  Asks the server to return what STARTUP options are supported. The body of an
  OPTIONS message should be empty and the server will respond with a SUPPORTED
  message.


4.1.4. QUERY

  Performs a CQL query. The body of the message must be:
    <query><query_parameters>
  where <query> is a [long string] representing the query and
  <query_parameters> must be
    <consistency><flags>[<n><value_1>...<value_n>][<result_page_size>][<paging_state>][<serial_consistency>]
  where:
    - <consistency> is the [consistency] level for the operation.
    - <flags> is a [byte] whose bits define the options for this query and
      in particular influence what the remainder of the message contains.
      A flag is set if the bit corresponding to its `mask` is set. Supported
      flags are, given their mask:
        0x01: Values. In that case, a [short] <n> followed by <n> [bytes]
              values are provided. Those values are used for bound variables in
              the query.
        0x02: Skip_metadata.
              If present, the Result Set returned as a response
              to that query (if any) will have the No_metadata flag (see
              Section 4.2.5.2).
        0x04: Page_size. In that case, <result_page_size> is an [int]
              controlling the desired page size of the result (in CQL3 rows).
              See the section on paging (Section 7) for more details.
        0x08: With_paging_state. If present, <paging_state> should be present.
              <paging_state> is a [bytes] value that should have been returned
              in a result set (Section 4.2.5.2). If provided, the query will be
              executed but starting from a given paging state. This also allows
              continuing paging on a different node from the one where it was
              started (See Section 7 for more details).
        0x10: With serial consistency. If present, <serial_consistency> should be
              present. <serial_consistency> is the [consistency] level for the
              serial phase of conditional updates. That consistency can only be
              either SERIAL or LOCAL_SERIAL and if not present, it defaults to
              SERIAL. This option will be ignored for anything other than a
              conditional update/insert.

  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
  TRUNCATE, ...).

  The server will respond to a QUERY message with a RESULT message, the content
  of which depends on the query.


4.1.5. PREPARE

  Prepare a query for later execution (through EXECUTE). The body consists of
  the CQL query to prepare as a [long string].

  The server will respond with a RESULT message with a `prepared` kind (0x0004,
  see Section 4.2.5).


4.1.6. EXECUTE

  Executes a prepared query. The body of the message must be:
    <id><query_parameters>
  where <id> is the prepared query ID. It's the [short bytes] returned as a
  response to a PREPARE message. As for <query_parameters>, it has the exact
  same definition as in QUERY (see Section 4.1.4).

  The response from the server will be a RESULT message.
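
  To make the <query><query_parameters> layout of Section 4.1.4 concrete, here
  is a minimal Python sketch that serializes a QUERY body. The function and
  parameter names (query_body, long_string, bytes_value and so on) are
  hypothetical; only the field order and flag masks come from this document.

```python
import struct


def long_string(text):
    """[long string]: an [int] length followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">i", len(raw)) + raw


def bytes_value(value):
    """[bytes]: an [int] length then the bytes; a negative length means null."""
    if value is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(value)) + value


def query_body(query, consistency, values=None, page_size=None,
               paging_state=None, serial_consistency=None,
               skip_metadata=False):
    """Build <query><query_parameters> for a QUERY message (illustrative)."""
    flags = 0
    tail = b""
    if values:
        flags |= 0x01  # Values
        tail += struct.pack(">H", len(values))
        tail += b"".join(bytes_value(v) for v in values)
    if skip_metadata:
        flags |= 0x02  # Skip_metadata (adds no field to the body)
    if page_size is not None:
        flags |= 0x04  # Page_size
        tail += struct.pack(">i", page_size)
    if paging_state is not None:
        flags |= 0x08  # With_paging_state
        tail += bytes_value(paging_state)
    if serial_consistency is not None:
        flags |= 0x10  # With serial consistency
        tail += struct.pack(">H", serial_consistency)
    return long_string(query) + struct.pack(">HB", consistency, flags) + tail
```

  An EXECUTE body could reuse the same parameter encoding by replacing the
  leading [long string] with the [short bytes] prepared query ID.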


4.1.7. BATCH

  Allows executing a list of queries (prepared or not) as a batch (note that
  only DML statements are accepted in a batch). The body of the message must
  be:
    <type><n><query_1>...<query_n><consistency>
  where:
    - <type> is a [byte] indicating the type of batch to use:
        - If <type> == 0, the batch will be "logged". This is equivalent to a
          normal CQL3 batch statement.
        - If <type> == 1, the batch will be "unlogged".
        - If <type> == 2, the batch will be a "counter" batch (and non-counter
          statements will be rejected).
    - <n> is a [short] indicating the number of following queries.
    - <query_1>...<query_n> are the queries to execute. A <query_i> must be of the
      form:
        <kind><string_or_id><n><value_1>...<value_n>
      where:
       - <kind> is a [byte] indicating whether the following query is a prepared
         one or not. <kind> value must be either 0 or 1.
       - <string_or_id> depends on the value of <kind>. If <kind> == 0, it should be
         a [long string] query string (as in QUERY, the query string might contain
         bind markers). Otherwise (that is, if <kind> == 1), it should be a
         [short bytes] representing a prepared query ID.
       - <n> is a [short] indicating the number (possibly 0) of following values.
       - <value_1>...<value_n> are the [bytes] to use for bound variables.
    - <consistency> is the [consistency] level for the operation.

  The server will respond with a RESULT message with a `Void` kind (0x0001,
  see Section 4.2.5).


4.1.8. REGISTER

  Register this connection to receive some types of events. The body of the
  message is a [string list] representing the event types to register for. See
  Section 4.2.6 for the list of valid event types.

  The response to a REGISTER message will be a READY message.
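
  As a small illustration, a REGISTER body is just a [string list] of event
  type names. The Python sketch below builds one; the helper names string and
  register_body are hypothetical.

```python
import struct


def string(text):
    """[string]: a [short] length followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def register_body(event_types):
    """[string list]: a [short] count followed by that many [string]."""
    return struct.pack(">H", len(event_types)) + b"".join(
        string(t) for t in event_types
    )
```

  For instance, register_body(["TOPOLOGY_CHANGE", "STATUS_CHANGE"]) produces
  the body for a connection that wants both kinds of node events.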

  Please note that if a client driver maintains multiple connections to a
  Cassandra node and/or connections to multiple nodes, it is advised to
  dedicate a handful of connections to receive events, but to *not* register
  for events on all connections, as this would only result in receiving
  the same event messages multiple times, wasting bandwidth.


4.2. Responses

  This section describes the content of the frame body for the different
  responses. Please note that to make room for future evolution, clients should
  tolerate extra information at the end of the frame body, beyond what is
  described in this document, and simply discard it.

4.2.1. ERROR

  Indicates an error processing a request. The body of the message will be an
  error code ([int]) followed by a [string] error message. Then, depending on
  the exception, more content may follow. The error codes are defined in
  Section 8, along with their additional content if any.


4.2.2. READY

  Indicates that the server is ready to process queries. This message will be
  sent by the server either after a STARTUP message if no authentication is
  required, or after a successful CREDENTIALS message.

  The body of a READY message is empty.


4.2.3. AUTHENTICATE

  Indicates that the server requires authentication, and which authentication
  mechanism to use.

  The authentication is SASL based and thus consists of a number of server
  challenges (AUTH_CHALLENGE, Section 4.2.7) followed by client responses
  (AUTH_RESPONSE, Section 4.1.2). The initial exchange is however bootstrapped
  by an initial client response. The details of that exchange (including how
  many challenge-response pairs are required) are specific to the authenticator
  in use. The exchange ends when the server sends an AUTH_SUCCESS message or
  an ERROR message.
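
  The exchange described above can be sketched as a simple loop. In this
  Python sketch, send, receive and respond are hypothetical callbacks standing
  in for the transport and for the authenticator-specific logic; only the
  opcodes and the order of messages come from this document.

```python
ERROR = 0x00
AUTH_CHALLENGE = 0x0E
AUTH_RESPONSE = 0x0F
AUTH_SUCCESS = 0x10


def authenticate(send, receive, initial_token, respond):
    """Drive the exchange that follows an AUTHENTICATE message.

    send(opcode, token) writes one message, receive() returns the next
    (opcode, token) pair from the server, and respond(challenge) computes
    the client token for a challenge. All three are assumptions made for
    this sketch.
    """
    send(AUTH_RESPONSE, initial_token)  # the client bootstraps the exchange
    while True:
        opcode, token = receive()
        if opcode == AUTH_SUCCESS:
            return token  # final server token, possibly None
        if opcode == AUTH_CHALLENGE:
            send(AUTH_RESPONSE, respond(token))
            continue
        if opcode == ERROR:
            raise RuntimeError("authentication failed")
        raise RuntimeError("unexpected opcode 0x%02X" % opcode)
```

  The loop makes no assumption about how many challenge-response pairs occur,
  since that number is specific to the authenticator in use.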

  This message will be sent following a STARTUP message if authentication is
  required and must be answered by an AUTH_RESPONSE message from the client.

  The body consists of a single [string] indicating the full class name of the
  IAuthenticator in use.


4.2.4. SUPPORTED

  Indicates which startup options are supported by the server. This message
  comes as a response to an OPTIONS message.

  The body of a SUPPORTED message is a [string multimap]. This multimap gives,
  for each of the supported STARTUP options, the list of supported values.


4.2.5. RESULT

  The result to a query (QUERY, PREPARE, EXECUTE or BATCH messages).

  The first element of the body of a RESULT message is an [int] representing the
  `kind` of result. The rest of the body depends on the kind. The kind can be
  one of:
    0x0001    Void: for results carrying no information.
    0x0002    Rows: for results to select queries, returning a set of rows.
    0x0003    Set_keyspace: the result to a `use` query.
    0x0004    Prepared: result to a PREPARE message.
    0x0005    Schema_change: the result to a schema altering query.

  The body for each kind (after the [int] kind) is defined below.


4.2.5.1. Void

  The rest of the body for a Void result is empty. It indicates that a query was
  successful without providing more information.


4.2.5.2. Rows

  Indicates a set of rows. The rest of the body of a Rows result is:
    <metadata><rows_count><rows_content>
  where:
    - <metadata> is composed of:
        <flags><columns_count>[<paging_state>][<global_table_spec>?<col_spec_1>...<col_spec_n>]
      where:
        - <flags> is an [int]. The bits of <flags> provide information on the
          formatting of the remaining information. A flag is set if the bit
          corresponding to its `mask` is set.
          Supported flags are, given their
          mask:
            0x0001    Global_tables_spec: if set, only one table spec (keyspace
                      and table name) is provided as <global_table_spec>. If not
                      set, <global_table_spec> is not present.
            0x0002    Has_more_pages: indicates whether this is not the last
                      page of results and more should be retrieved. If set, the
                      <paging_state> will be present. The <paging_state> is a
                      [bytes] value that should be used in QUERY/EXECUTE to
                      continue paging and retrieve the remainder of the result for
                      this query (See Section 7 for more details).
            0x0004    No_metadata: if set, the <metadata> is only composed of
                      these <flags>, the <columns_count> and optionally the
                      <paging_state> (depending on the Has_more_pages flag) but
                      no other information (so no <global_table_spec> nor <col_spec_i>).
                      This will only ever be the case if this was requested
                      during the query (see QUERY and RESULT messages).
        - <columns_count> is an [int] representing the number of columns selected
          by the query this result is of. It defines the number of <col_spec_i>
          elements in <metadata> and the number of elements for each row in
          <rows_content>.
        - <global_table_spec> is present if the Global_tables_spec is set in
          <flags>. If present, it is composed of two [string] representing the
          (unique) keyspace name and table name the returned columns belong to.
        - <col_spec_i> specifies the columns returned in the query. There are
          <columns_count> such column specifications, each composed of:
            (<ksname><tablename>)?<name><type>
          The initial <ksname> and <tablename> are two [string] that are only
          present if the Global_tables_spec flag is not set. The <name> is a
          [string] and <type> is an [option] that correspond to the description
          (what this description is depends a bit on the context: in results to
          selects, this will be either the user chosen alias or the selection used
          (often a column name, but it can be a function call too).
          In results to
          a PREPARE, this will be either the name of the corresponding bind
          variable or the column name for the variable if it is "anonymous") and
          type of the corresponding result. The option for <type> is either a
          native type (see below), in which case the option has no value, or a
          'custom' type, in which case the value is a [string] representing
          the fully qualified class name of the type represented. Valid option
          ids are:
            0x0000    Custom: the value is a [string], see above.
            0x0001    Ascii
            0x0002    Bigint
            0x0003    Blob
            0x0004    Boolean
            0x0005    Counter
            0x0006    Decimal
            0x0007    Double
            0x0008    Float
            0x0009    Int
            0x000A    Text
            0x000B    Timestamp
            0x000C    Uuid
            0x000D    Varchar
            0x000E    Varint
            0x000F    Timeuuid
            0x0010    Inet
            0x0020    List: the value is an [option], representing the type
                            of the elements of the list.
            0x0021    Map: the value is two [option], representing the types of the
                           keys and values of the map
            0x0022    Set: the value is an [option], representing the type
                            of the elements of the set
    - <rows_count> is an [int] representing the number of rows present in this
      result. Those rows are serialized in the <rows_content> part.
    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
      Each <row_i> is composed of <value_1>...<value_n> where n is
      <columns_count> and where <value_j> is a [bytes] representing the value
      returned for the jth column of the ith row. In other words, <rows_content>
      is composed of (<rows_count> * <columns_count>) [bytes].


4.2.5.3. Set_keyspace

  The result to a `use` query. The body (after the kind [int]) is a single
  [string] indicating the name of the keyspace that has been set.


4.2.5.4. Prepared

  The result to a PREPARE message.
  The rest of the body of a Prepared result is:
    <id><metadata><result_metadata>
  where:
    - <id> is [short bytes] representing the prepared query ID.
    - <metadata> is defined exactly as for a Rows RESULT (See Section 4.2.5.2; you
      can however assume that the Has_more_pages flag is always off) and
      is the specification for the variables bound in this prepared statement.
    - <result_metadata> is defined exactly as <metadata> but corresponds to the
      metadata for the result set that executing this query will yield. Note that
      <result_metadata> may be empty (have the No_metadata flag and 0 columns, See
      Section 4.2.5.2) and will be for any query that is not a Select. There is
      in fact never a guarantee that this will be non-empty, so clients should
      protect themselves accordingly. The presence of this information is an
      optimization that allows later executing the statement that has been
      prepared without requesting the metadata (Skip_metadata flag in EXECUTE).
      Clients can safely discard this metadata if they do not want to take
      advantage of that optimization.

  Note that the returned prepared query ID is global to the node on which the
  query has been prepared. It can be used on any connection to that node until
  the node is restarted (after which the query must be reprepared).

4.2.5.5. Schema_change

  The result to a schema altering query (creation/update/drop of a
  keyspace/table/index). The body (after the kind [int]) is composed of 3
  [string]:
    <change><keyspace><table>
  where:
    - <change> describes the type of change that has occurred. It can be one of
      "CREATED", "UPDATED" or "DROPPED".
    - <keyspace> is the name of the affected keyspace or the keyspace of the
      affected table.
    - <table> is the name of the affected table. <table> will be empty (i.e.
      the empty string "") if the change was affecting a keyspace and not a
      table.
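
  As an illustration of the simpler RESULT kinds (Void, Set_keyspace and
  Schema_change), a decoder could look like the Python sketch below. The names
  read_string and decode_simple_result are hypothetical, the Rows and Prepared
  kinds are deliberately left out, and any trailing bytes are ignored as
  Section 4.2 recommends.

```python
import struct


def read_string(buf, offset):
    """Read a [string] at offset; return (text, next_offset)."""
    (n,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    return buf[start:start + n].decode("utf-8"), start + n


def decode_simple_result(body):
    """Decode the Void, Set_keyspace and Schema_change RESULT kinds."""
    (kind,) = struct.unpack_from(">i", body, 0)
    offset = 4  # the rest of the body starts after the [int] kind
    if kind == 0x0001:
        return {"kind": "Void"}
    if kind == 0x0003:
        keyspace, _ = read_string(body, offset)
        return {"kind": "Set_keyspace", "keyspace": keyspace}
    if kind == 0x0005:
        change, offset = read_string(body, offset)
        keyspace, offset = read_string(body, offset)
        table, offset = read_string(body, offset)
        return {"kind": "Schema_change", "change": change,
                "keyspace": keyspace, "table": table}
    raise NotImplementedError("kind 0x%04X is not handled here" % kind)
```

  A Schema_change result whose "table" entry is the empty string then signals
  a change that affected a keyspace rather than a table.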

  Note that queries to create and drop an index are considered as changes
  updating the table the index is on.


4.2.6. EVENT

  An event pushed by the server. A client will only receive events for the
  types it has REGISTERed for. The body of an EVENT message will start with a
  [string] representing the event type. The rest of the message depends on the
  event type. The valid event types are:
    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
      Currently, events are sent when new nodes are added to the cluster, and
      when nodes are removed. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of change ("NEW_NODE" or "REMOVED_NODE") followed by the address of
      the new/removed node.
    - "STATUS_CHANGE": events related to change of node status. Currently,
      up/down events are sent. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of status change ("UP" or "DOWN") followed by the address of the
      concerned node.
    - "SCHEMA_CHANGE": events related to schema change. The body of the message
      (after the event type) consists of 3 [string] corresponding respectively
      to the type of schema change ("CREATED", "UPDATED" or "DROPPED"),
      followed by the name of the affected keyspace and the name of the
      affected table within that keyspace. For changes that affect a keyspace
      directly, the table name will be empty (i.e. the empty string "").

  All EVENT messages have a streamId of -1 (Section 2.3).

  Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip
  communication and as such may be sent a short delay before the binary
  protocol server on the newly up node is fully started.
  Clients are thus
  advised to wait a short time before trying to connect to the node (1 second
  should be enough), otherwise they may experience a connection refusal at
  first.

4.2.7. AUTH_CHALLENGE

  A server authentication challenge (see AUTH_RESPONSE (Section 4.1.2) for more
  details).

  The body of this message is a single [bytes] token. The details of what this
  token contains (and when it can be null/empty, if ever) depend on the actual
  authenticator used.

  Clients are expected to answer the server challenge with an AUTH_RESPONSE
  message.

4.2.8. AUTH_SUCCESS

  Indicates the success of the authentication phase. See Section 4.2.3 for more
  details.

  The body of this message is a single [bytes] token holding final information
  from the server that the client may require to finish the authentication
  process. What that token contains and whether it can be null depends on the
  actual authenticator used.


5. Compression

  Frame compression is supported by the protocol, but then only the frame body
  is compressed (the frame header should never be compressed).

  Before being used, client and server must agree on a compression algorithm to
  use, which is done in the STARTUP message. As a consequence, a STARTUP message
  must never be compressed. However, once the STARTUP frame has been received
  by the server, frames can be compressed (including the response to the STARTUP
  request). Frames do not have to be compressed however, even if compression has
  been agreed upon (a server may only compress frames above a certain size at its
  discretion). A frame body should be compressed if and only if the compressed
  flag (see Section 2.2) is set.

  As of this version 2 of the protocol, the following compressions are available:
    - lz4 (https://code.google.com/p/lz4/).
      In that case, note that the first 4 bytes
      of the body will be the uncompressed length (followed by the compressed
      bytes).
    - snappy (https://code.google.com/p/snappy/). This compression might not be
      available as it depends on a native lib (server-side) that might not be
      available on some installations.


6. Collection types

  This section describes the serialization format for the collection types:
  list, map and set. This serialization format is useful both to decode values
  returned in RESULT messages and to encode values for EXECUTE ones.

  The serialization formats are:
     List: a [short] n indicating the size of the list, followed by n elements.
           Each element is [short bytes] representing the serialized element
           value.
     Map: a [short] n indicating the size of the map, followed by n entries.
          Each entry is composed of two [short bytes] representing the key and
          the value of the map entry.
     Set: a [short] n indicating the size of the set, followed by n elements.
          Each element is [short bytes] representing the serialized element
          value.


7. Result paging

  The protocol allows for paging the result of queries. For that, the QUERY and
  EXECUTE messages have a <result_page_size> value that indicates the desired
  page size in CQL3 rows.

  If a positive value is provided for <result_page_size>, the result set of the
  RESULT message returned for the query will contain at most the
  <result_page_size> first rows of the query result. If that first page of results
  contains the full result set for the query, the RESULT message (of kind `Rows`)
  will have the Has_more_pages flag *not* set. However, if some results are not
  part of the first response, the Has_more_pages flag will be set and the result
  will contain a <paging_state> value.
  In that case, the <paging_state> value
  should be used in a QUERY or EXECUTE message (that has the *same* query as
  the original one or the behavior is undefined) to retrieve the next page of
  results.

  Only CQL3 queries that return a result set (RESULT message with a Rows `kind`)
  support paging. For other types of queries, the <result_page_size> value is
  ignored.

  Note to client implementors:
  - While <result_page_size> can be as low as 1, it will likely be detrimental
    to performance to pick a value too low. A value below 100 is probably too
    low for most use cases.
  - Clients should not rely on the actual size of the result set returned to
    decide if there are more results to fetch or not. Instead, they should always
    check the Has_more_pages flag (unless they did not enable paging for the query
    obviously). Clients should also not assert that no result will have more than
    <result_page_size> results. While the current implementation always respects
    the exact value of <result_page_size>, we reserve the right to return
    slightly smaller or bigger pages in the future for performance reasons.


8. Error codes

  The supported error codes are described below:
    0x0000    Server error: something unexpected happened. This indicates a
              server-side bug.
    0x000A    Protocol error: some client message triggered a protocol
              violation (for instance a QUERY message is sent before a STARTUP
              one has been sent)
    0x0100    Bad credentials: CREDENTIALS request failed because Cassandra
              did not accept the provided credentials.

    0x1000    Unavailable exception. The rest of the ERROR message body will be
                <cl><required><alive>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <required> is an [int] representing the number of nodes that
                           should be alive to respect <cl>.
                <alive> is an [int] representing the number of replicas that
                        were known to be alive when the request was
                        processed (since an unavailable exception has been
                        triggered, there will be <alive> < <required>).
    0x1001    Overloaded: the request cannot be processed because the
              coordinator node is overloaded.
    0x1002    Is_bootstrapping: the request was a read request but the
              coordinator node is bootstrapping.
    0x1003    Truncate_error: error during a truncation.
    0x1100    Write_timeout: Timeout exception during a write request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><writeType>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           acknowledged the request.
                <blockfor> is an [int] representing the number of replicas
                           whose acknowledgement is required to achieve <cl>.
                <writeType> is a [string] that describes the type of the write
                            that timed out. The value of that string can be one
                            of:
                             - "SIMPLE": the write was a non-batched
                               non-counter write.
                             - "BATCH": the write was a (logged) batch write.
                               If this type is received, it means the batch log
                               has been successfully written (otherwise a
                               "BATCH_LOG" type would have been sent instead).
                             - "UNLOGGED_BATCH": the write was an unlogged
                               batch. No batch log write has been attempted.
                             - "COUNTER": the write was a counter write
                               (batched or not).
                             - "BATCH_LOG": the timeout occurred during the
                               write to the batch log when a (logged) batch
                               write was requested.
    0x1200    Read_timeout: Timeout exception during a read request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><data_present>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           answered the request.
                <blockfor> is an [int] representing the number of replicas
                           whose response is required to achieve <cl>. Please
                           note that it is possible to have
                           <received> >= <blockfor> if <data_present> is
                           false, and also in the (unlikely) case where <cl>
                           is achieved but the coordinator node times out
                           while waiting for read-repair acknowledgement.
                <data_present> is a single byte. If its value is 0, it means
                               the replica that was asked for data has not
                               responded. Otherwise, the value is != 0.

    0x2000    Syntax_error: The submitted query has a syntax error.
    0x2100    Unauthorized: The logged-in user doesn't have the right to
              perform the query.
    0x2200    Invalid: The query is syntactically correct but invalid.
    0x2300    Config_error: The query is invalid because of some configuration
              issue.
    0x2400    Already_exists: The query attempted to create a keyspace or a
              table that already exists. The rest of the ERROR message
              body will be <ks><table> where:
                <ks> is a [string] representing either the keyspace that
                     already exists, or the keyspace in which the table that
                     already exists is.
                <table> is a [string] representing the name of the table that
                        already exists. If the query was attempting to create a
                        keyspace, <table> will be present but will be the empty
                        string.
    0x2500    Unprepared: Can be thrown when executing a prepared statement if
              the provided prepared statement ID is not known by this host. The
              rest of the ERROR message body will be [short bytes] representing
              the unknown ID.


9. Changes from v1
  * Protocol is versioned to allow old clients to connect to a newer server. If
    a newer client connects to an older server, it needs to check whether it
    gets a ProtocolException on connection and, if so, try connecting with a
    lower version.
  * A query can now have bind variables even though the statement is not
    prepared; see Section 4.1.4.
  * A new BATCH message allows batching a set of queries (prepared or not); see
    Section 4.1.7.
  * Authentication now uses SASL. Concretely, the CREDENTIALS message has been
    removed and replaced by a server/client challenge/response exchange (done
    through the new AUTH_RESPONSE/AUTH_CHALLENGE messages). See Section 4.2.3
    for details.
  * Query paging has been added (Section 7): QUERY and EXECUTE messages have an
    additional <result_page_size> [int] and <paging_state> [bytes], and
    the Rows kind of RESULT message has an additional flag and <paging_state>
    value. Note that paging is optional, and a client that does not want to
    handle it can simply avoid including the Page_size flag and parameter in
    QUERY and EXECUTE.
  * QUERY and EXECUTE statements can request that the metadata be skipped in
    the result set returned (for efficiency reasons) if said metadata are known
    in advance. Furthermore, the result to a PREPARE (Section 4.2.5.4) now
    includes the metadata for the result of executing the statement just
    prepared (though those metadata will be empty for non-SELECT statements).
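

  Note to client implementors: as an illustration of the collection formats
  described in Section 6, the following Python sketch encodes and decodes a
  list (or set) and a map whose elements are already serialized. It is not
  part of the protocol. The function names are hypothetical, and the sketch
  assumes that [short] is a 2-byte unsigned big-endian integer and that
  [short bytes] is a [short] n followed by n bytes, as defined in Section 3.

```python
import struct


def encode_short_bytes(value: bytes) -> bytes:
    """Encode a [short bytes]: a 2-byte big-endian length, then the bytes."""
    if len(value) > 0xFFFF:
        raise ValueError("value too large for [short bytes]")
    return struct.pack(">H", len(value)) + value


def encode_list(elements) -> bytes:
    """Encode a list or set: a [short] n, then n [short bytes] elements."""
    if len(elements) > 0xFFFF:
        raise ValueError("too many elements for a [short] count")
    parts = [struct.pack(">H", len(elements))]
    parts.extend(encode_short_bytes(e) for e in elements)
    return b"".join(parts)


def encode_map(entries) -> bytes:
    """Encode a map given as (key, value) pairs of serialized bytes."""
    if len(entries) > 0xFFFF:
        raise ValueError("too many entries for a [short] count")
    parts = [struct.pack(">H", len(entries))]
    for key, value in entries:
        parts.append(encode_short_bytes(key))
        parts.append(encode_short_bytes(value))
    return b"".join(parts)


def read_short_bytes(buf: bytes, offset: int):
    """Read one [short bytes] at offset; return (value, next_offset)."""
    (n,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + n
    if end > len(buf):
        raise ValueError("truncated [short bytes]")
    return buf[start:end], end


def decode_list(buf: bytes) -> list:
    """Decode a list or set into a list of serialized element values."""
    (n,) = struct.unpack_from(">H", buf, 0)
    offset = 2
    elements = []
    for _ in range(n):
        element, offset = read_short_bytes(buf, offset)
        elements.append(element)
    return elements


def decode_map(buf: bytes) -> list:
    """Decode a map into a list of (key, value) pairs, in wire order."""
    (n,) = struct.unpack_from(">H", buf, 0)
    offset = 2
    entries = []
    for _ in range(n):
        key, offset = read_short_bytes(buf, offset)
        value, offset = read_short_bytes(buf, offset)
        entries.append((key, value))
    return entries
```

  For instance, encode_list([b"a", b"bc"]) yields the 9 bytes
  00 02 00 01 61 00 02 62 63. The elements are left as raw bytes here:
  serializing each element according to its CQL type is a separate step that
  this sketch does not cover.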