# A Guide to the Protocols Supported by TiCDC

# Overview

The current version of TiCDC supports five protocols for writing to a message queue. A “protocol” here refers to the serialization format used when writing TiDB's data changes to Kafka or another message queue, together with the accompanying policies (such as message ordering guarantees and partitioning rules).

All protocols guarantee that, once duplicate messages are ignored, data changes to **a given table** within **a given partition** are output in commit-timestamp order. Duplicate messages **can** occur when a TiCDC node re-establishes its connection to TiKV, or when the replication task of a table is migrated between TiCDC nodes.

Except for Avro, all protocols support outputting the old value of a row before the change.

# Protocols

### Open Protocol

Open Protocol is developed by PingCAP. Its output can be used to fully restore cross-table transactions in the downstream, and it contains DDL information. Users need to write their own consumer to suit their needs ([example](https://docs.pingcap.com/zh/tidb/dev/ticdc-open-protocol#%E6%B6%88%E8%B4%B9%E7%AB%AF%E5%8D%8F%E8%AE%AE%E8%A7%A3%E6%9E%90)). See the [user documentation](https://docs.pingcap.com/tidb/dev/ticdc-open-protocol).
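Because consuming Open Protocol output requires a custom consumer, the following is a minimal sketch, in Go, of how the message framing could be decoded. It assumes the framing described in the Open Protocol documentation: the message key begins with an 8-byte big-endian protocol version, and both the key and the value are then sequences of 8-byte big-endian length prefixes, each followed by a JSON chunk. The helper names (`decodeChunks`, `decodeMessage`) are illustrative only; verify the details against the documentation for the TiCDC version in use.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// decodeChunks splits a length-prefixed Open Protocol payload into its JSON
// chunks. Both the key (after its version prefix) and the value are assumed
// to be a sequence of [8-byte big-endian length][JSON bytes] pairs.
func decodeChunks(payload []byte) ([][]byte, error) {
	var chunks [][]byte
	for len(payload) > 0 {
		if len(payload) < 8 {
			return nil, fmt.Errorf("truncated length prefix")
		}
		n := binary.BigEndian.Uint64(payload)
		payload = payload[8:]
		if uint64(len(payload)) < n {
			return nil, fmt.Errorf("truncated chunk: want %d bytes, have %d", n, len(payload))
		}
		chunks = append(chunks, payload[:n])
		payload = payload[n:]
	}
	return chunks, nil
}

// decodeMessage prints the events contained in one Kafka message.
// Resolved-ts events carry a key but no corresponding value chunk,
// so the value list may be shorter than the key list.
func decodeMessage(key, value []byte) error {
	if len(key) < 8 {
		return fmt.Errorf("key too short to contain a version prefix")
	}
	version := binary.BigEndian.Uint64(key) // protocol version prefix
	keys, err := decodeChunks(key[8:])
	if err != nil {
		return err
	}
	values, err := decodeChunks(value)
	if err != nil {
		return err
	}
	fmt.Println("protocol version:", version)
	for i, k := range keys {
		var kv, vv map[string]interface{}
		if err := json.Unmarshal(k, &kv); err != nil {
			return err
		}
		if i < len(values) {
			if err := json.Unmarshal(values[i], &vv); err != nil {
				return err
			}
		}
		fmt.Printf("event key=%v value=%v\n", kv, vv)
	}
	return nil
}

func main() {
	// key and value would normally come from a Kafka consumer,
	// e.g. the Key and Value fields of a consumed Kafka message.
	var key, value []byte
	if err := decodeMessage(key, value); err != nil {
		fmt.Println("decode failed:", err)
	}
}
```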
### Canal

Canal is a data change replication protocol developed by Alibaba Group. The keyword “canal” in the TiCDC configuration specifically refers to Canal's Protobuf-based form (as opposed to its JSON form). The Canal output generated by TiCDC **does not** contain enough information to fully restore transactions in the downstream, but PingCAP plans to support this in the future. The output contains DDL information. The [Adapter](https://github.com/alibaba/canal/wiki/ClientAdapter) by Alibaba is **theoretically** capable of replicating data to HBase, Elasticsearch and JDBC-compatible relational databases. See the [official repository](https://github.com/alibaba/canal).

### Avro

Avro is a data serialization system developed by the Apache Software Foundation. Confluent Platform (Kafka) has native support for Avro and recommends it as the serialization format for data changes written to Kafka. The Avro data produced by TiCDC _is_ compatible with [Kafka Connect](https://docs.confluent.io/3.0.1/connect/intro.html). In the current version of TiCDC, the Avro output contains only the row data, encoded in the appropriate Avro types, but **not the commit timestamps**; it therefore **cannot** be used to restore transactions, nor does it carry any DDL information.

### Maxwell

[Maxwell](https://maxwells-daemon.io/) is a change data capture protocol developed by Zendesk. TiCDC **does not** support the **xid** and **commit** fields of the Maxwell message, and as a result its output does not provide enough information for the downstream to restore transactions. The Maxwell output contains DDL information.

### Canal-Json

Canal-Json has limited third-party support in the ecosystem ([Flink](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/formats/canal.html), for example). Canal-Json data cannot be used to restore transactions, but it does contain complete DDL information (a minimal decoding sketch in Go follows the comparison table below).

# Comparison

|                                     | Open Protocol                  | Avro                                | Canal                | Canal-Json                      | Maxwell |
| ----------------------------------- | ------------------------------ | ----------------------------------- | -------------------- | ------------------------------- | ------- |
| Protocol maintainer                 | PingCAP                        | Apache Software Foundation          | Alibaba Group        | Alibaba Group                   | Zendesk |
| Intra-table transactions restorable | Yes                            | No                                  | No (support planned) | No (due to protocol limitation) | No      |
| Global transactions restorable      | Yes                            | No                                  | No                   | No (due to protocol limitation) | No      |
| DDL restorable                      | Yes                            | No                                  | Yes                  | Yes                             | Yes     |
| Serialization scheme                | JSON with some raw binary data | Custom (Avro binary)                | Protobuf             | JSON                            | JSON    |
| Level of support                    | High (by PingCAP)              | High (by Confluent)                 | Low                  | Low                             | Medium  |
| Documentation quality               | High                           | High for supported Kafka connectors | Low                  | Low                             | High    |
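To complement the comparison above, here is a minimal, hypothetical sketch of reading a Canal-Json row change event in Go. The struct lists only commonly used Canal-Json fields (`database`, `table`, `type`, `isDdl`, `data`, `old`, and so on); the exact set of fields emitted by TiCDC may differ between versions, so treat this as an illustration rather than a complete schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canalJSONMessage models a subset of the fields in a Canal-Json event.
// Field names follow the upstream Canal JSON format; check the actual
// TiCDC output for the version in use before relying on them.
type canalJSONMessage struct {
	Database string                   `json:"database"`
	Table    string                   `json:"table"`
	Type     string                   `json:"type"` // e.g. INSERT, UPDATE, DELETE, or a DDL type
	IsDDL    bool                     `json:"isDdl"`
	SQL      string                   `json:"sql"`  // DDL statement, empty for row changes
	ES       int64                    `json:"es"`   // event time in milliseconds
	TS       int64                    `json:"ts"`   // time the message was produced, in milliseconds
	Data     []map[string]interface{} `json:"data"` // new values of the changed rows
	Old      []map[string]interface{} `json:"old"`  // old values, when old value output is enabled
}

func main() {
	// A hypothetical INSERT event, for illustration only.
	raw := []byte(`{
		"database": "test", "table": "t1", "type": "INSERT", "isDdl": false,
		"es": 1604476737000, "ts": 1604476737100,
		"data": [{"id": "1", "name": "a"}], "old": null
	}`)

	var msg canalJSONMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		panic(err)
	}
	fmt.Printf("%s.%s %s: %d row(s)\n", msg.Database, msg.Table, msg.Type, len(msg.Data))
}
```

In practice, consumers in the Flink ecosystem can rely on the built-in canal-json format referenced above instead of hand-written structs.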