github.com/datastax/go-cassandra-native-protocol@v0.0.0-20220706104457-5e8aad05cf90/specs/native_protocol_v1.spec

                             CQL BINARY PROTOCOL v1


Table of Contents

  1. Overview
  2. Frame header
    2.1. version
    2.2. flags
    2.3. stream
    2.4. opcode
    2.5. length
  3. Notations
  4. Messages
    4.1. Requests
      4.1.1. STARTUP
      4.1.2. CREDENTIALS
      4.1.3. OPTIONS
      4.1.4. QUERY
      4.1.5. PREPARE
      4.1.6. EXECUTE
      4.1.7. REGISTER
    4.2. Responses
      4.2.1. ERROR
      4.2.2. READY
      4.2.3. AUTHENTICATE
      4.2.4. SUPPORTED
      4.2.5. RESULT
        4.2.5.1. Void
        4.2.5.2. Rows
        4.2.5.3. Set_keyspace
        4.2.5.4. Prepared
        4.2.5.5. Schema_change
      4.2.6. EVENT
  5. Compression
  6. Collection types
  7. Error codes


1. Overview

  The CQL binary protocol is a frame-based protocol. Frames are defined as:

      0         8        16        24        32
      +---------+---------+---------+---------+
      | version |  flags  | stream  | opcode  |
      +---------+---------+---------+---------+
      |                length                 |
      +---------+---------+---------+---------+
      |                                       |
      .            ...  body ...              .
      .                                       .
      .                                       .
      +----------------------------------------

  The protocol is big-endian (network byte order).

  Each frame contains a fixed-size header (8 bytes) followed by a variable-size
  body. The header is described in Section 2. The content of the body depends
  on the header opcode value (the body can, in particular, be empty for some
  opcode values). The list of allowed opcodes is defined in Section 2.4 and the
  details of each corresponding message are described in Section 4.

  The protocol distinguishes 2 types of frames: requests and responses. Requests
  are the frames sent by clients to the server; responses are the ones sent by
  the server.
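  To make the 8-byte header concrete, here is a minimal Go sketch that packs
  and unpacks it with the standard encoding/binary package. The frameHeader
  type and the encodeHeader/decodeHeader helpers are hypothetical names chosen
  for illustration, not the API of this repository or of any driver. Field
  semantics follow Section 2 (direction bit plus version in the first byte, a
  signed stream id, a big-endian 4-byte length, treated as unsigned here).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerSize is the fixed size of a v1 frame header (Section 2).
const headerSize = 8

// frameHeader is a hypothetical in-memory view of the v1 header.
type frameHeader struct {
	Response bool   // most significant bit of the version byte (1 = response)
	Version  uint8  // remaining 7 bits of the version byte (1 for v1)
	Flags    uint8  // 0x01 = compression, 0x02 = tracing (Section 2.2)
	Stream   int8   // signed stream id; EVENT messages use -1 (Section 2.3)
	Opcode   uint8  // message type (Section 2.4)
	Length   uint32 // body length in bytes, big-endian (Section 2.5)
}

// encodeHeader serializes h in network byte order.
func encodeHeader(h frameHeader) []byte {
	buf := make([]byte, headerSize)
	buf[0] = h.Version & 0x7F
	if h.Response {
		buf[0] |= 0x80
	}
	buf[1] = h.Flags
	buf[2] = byte(h.Stream) // two's complement: -1 becomes 0xFF
	buf[3] = h.Opcode
	binary.BigEndian.PutUint32(buf[4:], h.Length)
	return buf
}

// decodeHeader parses the first 8 bytes of a frame.
func decodeHeader(buf []byte) (frameHeader, error) {
	if len(buf) < headerSize {
		return frameHeader{}, errors.New("frame header needs 8 bytes")
	}
	return frameHeader{
		Response: buf[0]&0x80 != 0,
		Version:  buf[0] & 0x7F,
		Flags:    buf[1],
		Stream:   int8(buf[2]),
		Opcode:   buf[3],
		Length:   binary.BigEndian.Uint32(buf[4:8]),
	}, nil
}

func main() {
	// STARTUP request (opcode 0x01) on stream 0 with a 22-byte body.
	req := encodeHeader(frameHeader{Version: 1, Opcode: 0x01, Length: 22})
	fmt.Printf("% x\n", req) // 01 00 00 01 00 00 00 16

	// Server-pushed EVENT (opcode 0x0C) on stream -1.
	ev, err := decodeHeader([]byte{0x81, 0x00, 0xFF, 0x0C, 0, 0, 0, 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Response, ev.Version, ev.Stream, ev.Opcode) // true 1 -1 12
}
```

  Decoding a first byte of 0x81 yields Response=true and Version=1, matching
  the two version values listed in Section 2.1.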
  Note, however, that while communication is initiated by the client with the
  server responding to requests, the protocol may add server pushes in the
  future, so responses do not necessarily come right after a client request.

  Note to client implementors: client libraries should always assume that the
  body of a given frame may contain more data than what is described in this
  document. It will, however, always be safe to ignore the remainder of the
  frame body in such cases. The reason is that this may allow the protocol to
  be extended with optional features without needing to change the protocol
  version.


2. Frame header

2.1. version

  The version is a single byte that indicates both the direction of the message
  (request or response) and the version of the protocol in use. The most
  significant bit of version is used to define the direction of the message:
  0 indicates a request, 1 indicates a response. This can be useful for
  protocol analyzers to distinguish the nature of the packet from the direction
  in which it is moving. The rest of that byte is the protocol version (1 for
  the protocol defined in this document). In other words, for this version of
  the protocol, version will have one of:
    0x01    Request frame for this protocol version
    0x81    Response frame for this protocol version


2.2. flags

  Flags applying to this frame. The flags have the following meaning (described
  by the mask that allows selecting them):
    0x01: Compression flag. If set, the frame body is compressed. The actual
          compression to use should have been set up beforehand through the
          STARTUP message (which thus cannot be compressed; Section 4.1.1).
    0x02: Tracing flag. For a request frame, this indicates that the client
          requires tracing of the request. Note that not all requests support
          tracing. Currently, only QUERY, PREPARE and EXECUTE queries support
          tracing.
          Other requests will simply ignore the tracing flag if set. If a
          request supports tracing and the tracing flag was set, the response
          to this request will have the tracing flag set and contain tracing
          information.
          If a response frame has the tracing flag set, its body contains
          a tracing ID. The tracing ID is a [uuid] and is the first thing in
          the frame body. The rest of the body will then be the usual body
          corresponding to the response opcode.

  The rest of the flags are currently unused and ignored.

2.3. stream

  A frame has a stream id (one signed byte). When sending request messages, this
  stream id must be set by the client to a non-negative value (negative stream
  ids are reserved for streams initiated by the server; currently all EVENT
  messages (Section 4.2.6) have a streamId of -1). If a client sends a request
  message with the stream id X, it is guaranteed that the stream id of the
  response to that message will be X.

  This allows the client to deal with the asynchronous nature of the protocol.
  If a client sends multiple messages simultaneously (without waiting for
  responses), there is no guarantee on the order of the responses. For
  instance, if the client writes REQ_1, REQ_2, REQ_3 on the wire (in that
  order), the server might respond to REQ_3 (or REQ_2) first. Assigning
  different stream ids to these 3 requests allows the client to determine which
  request a received answer responds to. As there can only be 128 different
  simultaneous streams, it is up to the client to reuse stream ids.

  Note that clients are free to use the protocol synchronously (i.e. wait for
  the response to REQ_N before sending REQ_N+1). In that case, the stream id
  can be safely set to 0. Clients should also feel free to use only a subset of
  the 128 maximum possible stream ids if it is simpler for their
  implementation.

2.4. opcode

  An integer byte that distinguishes the actual message:
    0x00    ERROR
    0x01    STARTUP
    0x02    READY
    0x03    AUTHENTICATE
    0x04    CREDENTIALS
    0x05    OPTIONS
    0x06    SUPPORTED
    0x07    QUERY
    0x08    RESULT
    0x09    PREPARE
    0x0A    EXECUTE
    0x0B    REGISTER
    0x0C    EVENT

  Messages are described in Section 4.


2.5. length

  A 4-byte integer representing the length of the body of the frame (note:
  currently a frame is limited to 256MB in length).


3. Notations

  To describe the layout of the frame body for the messages in Section 4, we
  define the following:

    [int]          A 4-byte integer
    [short]        A 2-byte unsigned integer
    [string]       A [short] n, followed by n bytes representing a UTF-8
                   string.
    [long string]  An [int] n, followed by n bytes representing a UTF-8 string.
    [uuid]         A 16-byte uuid.
    [string list]  A [short] n, followed by n [string].
    [bytes]        An [int] n, followed by n bytes if n >= 0. If n < 0,
                   no byte should follow and the value represented is `null`.
    [short bytes]  A [short] n, followed by n bytes if n >= 0.

    [option]       A pair of <id><value> where <id> is a [short] representing
                   the option id and <value> depends on that option (and can be
                   of size 0). The supported ids (and the corresponding
                   <value>s) will be described where options are used.
    [option list]  A [short] n, followed by n [option].
    [inet]         An address (ip and port) to a node. It consists of one
                   [byte] n, that represents the address size, followed by n
                   [byte] representing the IP address (in practice n can only be
                   either 4 (IPv4) or 16 (IPv6)), followed by one [int]
                   representing the port.
    [consistency]  A consistency level specification.
                   This is a [short] representing a consistency level with the
                   following correspondence:
                     0x0000    ANY
                     0x0001    ONE
                     0x0002    TWO
                     0x0003    THREE
                     0x0004    QUORUM
                     0x0005    ALL
                     0x0006    LOCAL_QUORUM
                     0x0007    EACH_QUORUM
                     0x000A    LOCAL_ONE

    [string map]      A [short] n, followed by n pairs <k><v> where <k> and <v>
                      are [string].
    [string multimap] A [short] n, followed by n pairs <k><v> where <k> is a
                      [string] and <v> is a [string list].


4. Messages

4.1. Requests

  Note that outside of their normal responses (described below), all requests
  can get an ERROR message (Section 4.2.1) as response.

4.1.1. STARTUP

  Initializes the connection. The server will respond with either a READY
  message (in which case the connection is ready for queries) or an
  AUTHENTICATE message (in which case credentials will need to be provided
  using CREDENTIALS).

  This must be the first message of the connection, except for OPTIONS, which
  can be sent beforehand to find out the options supported by the server. Once
  the connection has been initialized, a client should not send any more
  STARTUP messages.

  The body is a [string map] of options. Possible options are:
    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
      currently, the only version supported is "3.0.0". Note that this is
      different from the protocol version.
    - "COMPRESSION": the compression algorithm to use for frames (see Section 5).
      This is optional; if not specified, no compression will be used.


4.1.2. CREDENTIALS

  Provides credentials information for the purpose of identification. This
  message comes as a response to an AUTHENTICATE message from the server, but
  can be used later in the communication to change the authentication
  information.

  The body is a list of key/value pairs. It is a [short] n, followed by n
  pairs of [string].
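  The CREDENTIALS body just described, like the STARTUP body of Section 4.1.1,
  is a [short] count followed by pairs of [string], i.e. a [string map]
  (Section 3). The Go sketch below encodes such a map. encodeStringMap,
  appendString and appendShort are hypothetical helpers, and the keys are
  sorted only to make the output deterministic (this document does not mandate
  any order). The usage shows the mandatory STARTUP option rather than
  credentials, since the CREDENTIALS keys depend on the server's authenticator.

```go
package main

import (
	"fmt"
	"sort"
)

// appendShort appends a [short]: 2-byte unsigned, big-endian.
func appendShort(buf []byte, v uint16) []byte {
	return append(buf, byte(v>>8), byte(v))
}

// appendString appends a [string]: a [short] length, then the UTF-8 bytes.
func appendString(buf []byte, s string) ([]byte, error) {
	if len(s) > 0xFFFF {
		return nil, fmt.Errorf("string of %d bytes does not fit a [short] length", len(s))
	}
	buf = appendShort(buf, uint16(len(s)))
	return append(buf, s...), nil
}

// encodeStringMap encodes a [string map]: a [short] n, then n <k><v> [string] pairs.
func encodeStringMap(m map[string]string) ([]byte, error) {
	if len(m) > 0xFFFF {
		return nil, fmt.Errorf("map of %d entries does not fit a [short] count", len(m))
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output only; no order is required

	buf := appendShort(nil, uint16(len(m)))
	var err error
	for _, k := range keys {
		if buf, err = appendString(buf, k); err != nil {
			return nil, err
		}
		if buf, err = appendString(buf, m[k]); err != nil {
			return nil, err
		}
	}
	return buf, nil
}

func main() {
	body, err := encodeStringMap(map[string]string{"CQL_VERSION": "3.0.0"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body))    // 22
	fmt.Printf("% x\n", body) // 00 01 00 0b 43 51 4c 5f 56 45 52 53 49 4f 4e 00 05 33 2e 30 2e 30
}
```

  The resulting 22 bytes would form the (uncompressed) body of a STARTUP frame.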
  The CREDENTIALS key/value pairs are passed as-is to the Cassandra
  IAuthenticator, and thus which information is needed depends on that
  authenticator.

  The response to a CREDENTIALS message is a READY message (or an ERROR
  message).


4.1.3. OPTIONS

  Asks the server to return which STARTUP options are supported. The body of an
  OPTIONS message should be empty and the server will respond with a SUPPORTED
  message.


4.1.4. QUERY

  Performs a CQL query. The body of the message consists of a CQL query as a
  [long string] followed by the [consistency] for the operation.

  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
  TRUNCATE, ...).

  The server will respond to a QUERY message with a RESULT message, the content
  of which depends on the query.


4.1.5. PREPARE

  Prepares a query for later execution (through EXECUTE). The body consists of
  the CQL query to prepare as a [long string].

  The server will respond with a RESULT message with a `prepared` kind (0x0004,
  see Section 4.2.5).


4.1.6. EXECUTE

  Executes a prepared query. The body of the message must be:
    <id><n><value_1>...<value_n><consistency>
  where:
    - <id> is the prepared query ID. It's the [short bytes] returned as a
      response to a PREPARE message.
    - <n> is a [short] indicating the number of following values.
    - <value_1>...<value_n> are the [bytes] to use for bound variables in the
      prepared query.
    - <consistency> is the [consistency] level for the operation.

  Note that the consistency is ignored by some (prepared) queries (USE, CREATE,
  ALTER, TRUNCATE, ...).

  The response from the server will be a RESULT message.


4.1.7. REGISTER

  Registers this connection to receive some types of events. The body of the
  message is a [string list] representing the event types to register to.
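  For illustration, a REGISTER body subscribing to all three event types
  defined in Section 4.2.6 can be built as follows. This hypothetical Go sketch
  encodes a [string list] (a [short] count, then each entry as a [string]) and
  rejects inputs whose count or lengths do not fit in a [short].

```go
package main

import "fmt"

// encodeStringList encodes a [string list]: a [short] n, then n [string],
// where each [string] is a [short] length followed by UTF-8 bytes.
func encodeStringList(items []string) ([]byte, error) {
	if len(items) > 0xFFFF {
		return nil, fmt.Errorf("%d items do not fit a [short] count", len(items))
	}
	buf := []byte{byte(len(items) >> 8), byte(len(items))}
	for _, s := range items {
		if len(s) > 0xFFFF {
			return nil, fmt.Errorf("%q does not fit a [short] length", s)
		}
		buf = append(buf, byte(len(s)>>8), byte(len(s)))
		buf = append(buf, s...)
	}
	return buf, nil
}

func main() {
	body, err := encodeStringList([]string{"TOPOLOGY_CHANGE", "STATUS_CHANGE", "SCHEMA_CHANGE"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body)) // 49
}
```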
  See Section 4.2.6 for the list of valid event types.

  The response to a REGISTER message will be a READY message.

  Please note that if a client driver maintains multiple connections to a
  Cassandra node and/or connections to multiple nodes, it is advised to
  dedicate a handful of connections to receive events, but to *not* register
  for events on all connections, as this would only result in receiving the
  same event messages multiple times, wasting bandwidth.


4.2. Responses

  This section describes the content of the frame body for the different
  responses. Please note that to make room for future evolution, clients should
  accept extra information at the end of the frame body, beyond what is
  described in this document, and simply discard it.

4.2.1. ERROR

  Indicates an error processing a request. The body of the message will be an
  error code ([int]) followed by a [string] error message. Then, depending on
  the exception, more content may follow. The error codes are defined in
  Section 7, along with their additional content if any.


4.2.2. READY

  Indicates that the server is ready to process queries. This message will be
  sent by the server either after a STARTUP message if no authentication is
  required, or after a successful CREDENTIALS message.

  The body of a READY message is empty.


4.2.3. AUTHENTICATE

  Indicates that the server requires authentication. This will be sent following
  a STARTUP message and must be answered by a CREDENTIALS message from the
  client to provide authentication information.

  The body consists of a single [string] indicating the full class name of the
  IAuthenticator in use.


4.2.4. SUPPORTED

  Indicates which startup options are supported by the server. This message
  comes as a response to an OPTIONS message.

  The body of a SUPPORTED message is a [string multimap]. This multimap gives,
  for each of the supported STARTUP options, the list of supported values.


4.2.5. RESULT

  The result to a query (QUERY, PREPARE or EXECUTE messages).

  The first element of the body of a RESULT message is an [int] representing the
  `kind` of result. The rest of the body depends on the kind. The kind can be
  one of:
    0x0001    Void: for results carrying no information.
    0x0002    Rows: for results to select queries, returning a set of rows.
    0x0003    Set_keyspace: the result to a `use` query.
    0x0004    Prepared: result to a PREPARE message.
    0x0005    Schema_change: the result to a schema altering query.

  The body for each kind (after the [int] kind) is defined below.


4.2.5.1. Void

  The rest of the body for a Void result is empty. It indicates that a query was
  successful without providing more information.


4.2.5.2. Rows

  Indicates a set of rows. The rest of the body of a Rows result is:
    <metadata><rows_count><rows_content>
  where:
    - <metadata> is composed of:
        <flags><columns_count><global_table_spec>?<col_spec_1>...<col_spec_n>
      where:
        - <flags> is an [int]. The bits of <flags> provide information on the
          formatting of the remaining information. A flag is set if the bit
          corresponding to its `mask` is set. Supported flags are, given their
          mask:
            0x0001    Global_tables_spec: if set, only one table spec (keyspace
                      and table name) is provided as <global_table_spec>. If not
                      set, <global_table_spec> is not present.
        - <columns_count> is an [int] representing the number of columns
          selected by the query that produced this result. It defines the
          number of <col_spec_i> elements and the number of elements in each
          row of <rows_content>.
        - <global_table_spec> is present if the Global_tables_spec flag is set
          in <flags>.
          If present, it is composed of two [string] representing the
          (unique) keyspace name and table name the returned columns belong to.
        - <col_spec_i> specifies the columns returned in the query. There are
          <columns_count> such column specifications, each composed of:
            (<ksname><tablename>)?<column_name><type>
          The initial <ksname> and <tablename> are two [string] that are only
          present if the Global_tables_spec flag is not set. The <column_name>
          is a [string] and <type> is an [option]; they correspond respectively
          to the column name and type. The option for <type> is either a native
          type (see below), in which case the option has no value, or a
          'custom' type, in which case the value is a [string] representing the
          fully qualified class name of the type represented. Valid option ids
          are:
            0x0000    Custom: the value is a [string], see above.
            0x0001    Ascii
            0x0002    Bigint
            0x0003    Blob
            0x0004    Boolean
            0x0005    Counter
            0x0006    Decimal
            0x0007    Double
            0x0008    Float
            0x0009    Int
            0x000A    Text
            0x000B    Timestamp
            0x000C    Uuid
            0x000D    Varchar
            0x000E    Varint
            0x000F    Timeuuid
            0x0010    Inet
            0x0020    List: the value is an [option], representing the type
                      of the elements of the list.
            0x0021    Map: the value is two [option], representing the types of
                      the keys and values of the map.
            0x0022    Set: the value is an [option], representing the type
                      of the elements of the set.
    - <rows_count> is an [int] representing the number of rows present in this
      result. Those rows are serialized in the <rows_content> part.
    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
      Each <row_i> is composed of <value_1>...<value_n> where n is
      <columns_count> and where <value_j> is a [bytes] representing the value
      returned for the jth column of the ith row. In other words, <rows_content>
      is composed of (<rows_count> * <columns_count>) [bytes].


4.2.5.3. Set_keyspace

  The result to a `use` query. The body (after the kind [int]) is a single
  [string] indicating the name of the keyspace that has been set.


4.2.5.4. Prepared

  The result to a PREPARE message. The rest of the body of a Prepared result is:
    <id><metadata>
  where:
    - <id> is [short bytes] representing the prepared query ID.
    - <metadata> is defined exactly as for a Rows RESULT (see Section 4.2.5.2).

  Note that the prepared query ID returned is global to the node on which the
  query has been prepared. It can be used on any connection to that node until
  the node is restarted (after which the query must be re-prepared).

4.2.5.5. Schema_change

  The result to a schema altering query (creation/update/drop of a
  keyspace/table/index). The body (after the kind [int]) is composed of 3
  [string]:
    <change><keyspace><table>
  where:
    - <change> describes the type of change that has occurred. It can be one of
      "CREATED", "UPDATED" or "DROPPED".
    - <keyspace> is the name of the affected keyspace or the keyspace of the
      affected table.
    - <table> is the name of the affected table. <table> will be empty (i.e.
      the empty string "") if the change was affecting a keyspace and not a
      table.

  Note that queries to create and drop an index are considered changes that
  update the table the index is on.


4.2.6. EVENT

  An event pushed by the server. A client will only receive events for the
  types it has registered for (with REGISTER, Section 4.1.7). The body of an
  EVENT message will start with a [string] representing the event type. The
  rest of the message depends on the event type. The valid event types are:
    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
      Currently, events are sent when new nodes are added to the cluster, and
      when nodes are removed.
      The body of the message (after the event type) consists of a [string]
      and an [inet], corresponding respectively to the type of change
      ("NEW_NODE" or "REMOVED_NODE") followed by the address of the
      new/removed node.
    - "STATUS_CHANGE": events related to change of node status. Currently,
      up/down events are sent. The body of the message (after the event type)
      consists of a [string] and an [inet], corresponding respectively to the
      type of status change ("UP" or "DOWN") followed by the address of the
      concerned node.
    - "SCHEMA_CHANGE": events related to schema change. The body of the message
      (after the event type) consists of 3 [string] corresponding respectively
      to the type of schema change ("CREATED", "UPDATED" or "DROPPED"),
      followed by the name of the affected keyspace and the name of the
      affected table within that keyspace. For changes that affect a keyspace
      directly, the table name will be empty (i.e. the empty string "").

  All EVENT messages have a streamId of -1 (Section 2.3).

  Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip
  communication and as such may be sent a short delay before the binary
  protocol server on the newly up node is fully started. Clients are thus
  advised to wait a short time before trying to connect to the node (1 second
  should be enough), otherwise they may experience a connection refusal at
  first.


5. Compression

  Frame compression is supported by the protocol, but then only the frame body
  is compressed (the frame header should never be compressed).

  Before being used, client and server must agree on a compression algorithm to
  use, which is done in the STARTUP message. As a consequence, a STARTUP message
  must never be compressed. However, once the STARTUP frame has been received
  by the server, frames can be compressed (including the response to the
  STARTUP request).
  Frames do not have to be compressed, however, even if compression has been
  agreed upon (a server may compress only frames above a certain size, at its
  discretion). A frame body should be compressed if and only if the compression
  flag (see Section 2.2) is set.


6. Collection types

  This section describes the serialization format for the collection types:
  list, map and set. This serialization format is useful both to decode values
  returned in RESULT messages and to encode values for EXECUTE messages.

  The serialization formats are:
     List: a [short] n indicating the size of the list, followed by n elements.
           Each element is [short bytes] representing the serialized element
           value.
     Map: a [short] n indicating the size of the map, followed by n entries.
          Each entry is composed of two [short bytes] representing the key and
          the value of the map entry.
     Set: a [short] n indicating the size of the set, followed by n elements.
          Each element is [short bytes] representing the serialized element
          value.


7. Error codes

  The supported error codes are described below:
    0x0000    Server error: something unexpected happened. This indicates a
              server-side bug.
    0x000A    Protocol error: some client message triggered a protocol
              violation (for instance a QUERY message is sent before a STARTUP
              one has been sent).
    0x0100    Bad credentials: CREDENTIALS request failed because Cassandra
              did not accept the provided credentials.

    0x1000    Unavailable exception. The rest of the ERROR message body will be
                <cl><required><alive>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <required> is an [int] representing the number of nodes that
                           should be alive to respect <cl>.
                <alive> is an [int] representing the number of replicas that
                        were known to be alive when the request was
                        processed (since an unavailable exception has been
                        triggered, there will be <alive> < <required>).
    0x1001    Overloaded: the request cannot be processed because the
              coordinator node is overloaded.
    0x1002    Is_bootstrapping: the request was a read request but the
              coordinator node is bootstrapping.
    0x1003    Truncate_error: error during a truncation operation.
    0x1100    Write_timeout: Timeout exception during a write request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><writeType>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           acknowledged the request.
                <blockfor> is the number of replicas whose acknowledgement is
                           required to achieve <cl>.
                <writeType> is a [string] that describes the type of the write
                            that timed out. The value of that string can be one
                            of:
                             - "SIMPLE": the write was a non-batched
                               non-counter write.
                             - "BATCH": the write was a (logged) batch write.
                               If this type is received, it means the batch log
                               has been successfully written (otherwise a
                               "BATCH_LOG" type would have been sent instead).
                             - "UNLOGGED_BATCH": the write was an unlogged
                               batch. No batch log write has been attempted.
                             - "COUNTER": the write was a counter write
                               (batched or not).
                             - "BATCH_LOG": the timeout occurred during the
                               write to the batch log when a (logged) batch
                               write was requested.
    0x1200    Read_timeout: Timeout exception during a read request. The rest
              of the ERROR message body will be
                <cl><received><blockfor><data_present>
              where:
                <cl> is the [consistency] level of the query having triggered
                     the exception.
                <received> is an [int] representing the number of nodes having
                           answered the request.
                <blockfor> is the number of replicas whose response is
                           required to achieve <cl>. Please note that it is
                           possible to have <received> >= <blockfor> if
                           <data_present> is false, and also in the (unlikely)
                           case where <cl> is achieved but the coordinator node
                           times out while waiting for read-repair
                           acknowledgement.
                <data_present> is a single byte. If its value is 0, it means
                               the replica that was asked for data has not
                               responded. Otherwise, the value is != 0.

    0x2000    Syntax_error: The submitted query has a syntax error.
    0x2100    Unauthorized: The logged-in user doesn't have the right to perform
              the query.
    0x2200    Invalid: The query is syntactically correct but invalid.
    0x2300    Config_error: The query is invalid because of some configuration
              issue.
    0x2400    Already_exists: The query attempted to create a keyspace or a
              table that already exists. The rest of the ERROR message body
              will be <ks><table> where:
                <ks> is a [string] representing either the keyspace that
                     already exists, or the keyspace in which the table that
                     already exists is.
                <table> is a [string] representing the name of the table that
                        already exists. If the query was attempting to create a
                        keyspace, <table> will be present but will be the empty
                        string.
    0x2500    Unprepared: Can be thrown while a prepared statement tries to be
              executed if the provided prepared statement ID is not known by
              this host. The rest of the ERROR message body will be
              [short bytes] representing the unknown ID.
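  As a worked example of Section 4.2.1 combined with the codes above, the
  following hypothetical Go sketch decodes an ERROR body: the [int] code, the
  [string] message, and then the extra content of two codes (Already_exists
  and Unprepared). The extra fields of the other codes are left unparsed to
  keep the sketch short; like any trailing data (Section 4.2), they are simply
  ignored. Names such as decodeError, serverError and reader are illustrative
  only; the reader uses a "sticky error" so that a truncated body is reported
  once at the end.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated ERROR body")

// reader is a minimal cursor over a frame body. The first error sticks, so
// callers can read several fields and check r.err once at the end.
type reader struct {
	b   []byte
	err error
}

// next consumes n raw bytes, or records errTruncated and returns nil.
func (r *reader) next(n int) []byte {
	if r.err == nil && len(r.b) < n {
		r.err = errTruncated
	}
	if r.err != nil {
		return nil
	}
	v := r.b[:n]
	r.b = r.b[n:]
	return v
}

// readInt reads an [int]: 4 bytes, big-endian.
func (r *reader) readInt() int32 {
	if b := r.next(4); b != nil {
		return int32(binary.BigEndian.Uint32(b))
	}
	return 0
}

// readShort reads a [short]: 2 bytes, big-endian, unsigned.
func (r *reader) readShort() int {
	if b := r.next(2); b != nil {
		return int(binary.BigEndian.Uint16(b))
	}
	return 0
}

// readString reads a [string]: a [short] length, then that many UTF-8 bytes.
func (r *reader) readString() string { return string(r.next(r.readShort())) }

// serverError holds the common ERROR fields plus the extras of two codes.
type serverError struct {
	Code     int32
	Message  string
	Keyspace string // Already_exists (0x2400) only
	Table    string // Already_exists (0x2400) only; "" when a keyspace exists
	ID       []byte // Unprepared (0x2500) only
}

// decodeError parses an ERROR body (Section 4.2.1 and Section 7).
func decodeError(body []byte) (*serverError, error) {
	r := &reader{b: body}
	e := &serverError{}
	e.Code = r.readInt()
	e.Message = r.readString()
	switch e.Code {
	case 0x2400: // Already_exists: <ks><table>, two [string]
		e.Keyspace = r.readString()
		e.Table = r.readString()
	case 0x2500: // Unprepared: [short bytes] holding the unknown ID
		e.ID = r.next(r.readShort())
	}
	// Extra fields of other codes (and any trailing bytes) are left unread.
	if r.err != nil {
		return nil, r.err
	}
	return e, nil
}

func main() {
	body := []byte{0x00, 0x00, 0x25, 0x00, 0x00, 0x07} // [int] 0x2500, [short] 7
	body = append(body, "unknown"...)
	body = append(body, 0x00, 0x02, 0xCA, 0xFE) // [short bytes] ID
	e, err := decodeError(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("0x%04X %q %x\n", e.Code, e.Message, e.ID) // 0x2500 "unknown" cafe
}
```

  On an Unprepared error, a client would typically re-send PREPARE for the
  statement and retry the EXECUTE with the new ID (see Section 4.2.5.4).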