# TiCDC Design Documents

- Author(s): [yumchina](https://github.com/yumchina)
- Tracking Issue: https://github.com/pingcap/tiflow/issues/9413

## Table of Contents

- [Introduction](#introduction)
- [Motivation or Background](#motivation-or-background)
- [Detailed Design](#detailed-design)
  - [Protocol Support](#protocol-support)
  - [Row Order and Transactions](#row-order-and-transactions)
  - [Pulsar Client](#pulsar-client)
    - [Information](#information)
    - [Different from Kafka](#different-from-kafka)
    - [Pulsar Client Config](#pulsar-client-config)
  - [Pulsar Producer](#pulsar-producer)
    - [Producer Message](#producer-message)
  - [Pulsar Authentication](#pulsar-authentication)
  - [Pulsar Route Rule](#pulsar-route-rule)
  - [Pulsar Topic Rule](#pulsar-topic-rule)
  - [Produce DDL Event](#produce-ddl-event)
    - [SyncSendMessage Method](#syncsendmessage-method)
    - [SyncBroadcastMessage Method](#syncbroadcastmessage-method)
    - [Close Method](#close-method)
  - [Produce DML Event](#produce-dml-event)
    - [AsyncSendMessage Method](#asyncsendmessage-method)
    - [Close Method](#close-method)
  - [Pulsar Metrics](#pulsar-metrics)
  - [User Interface](#user-interface)
- [Test Design](#test-design)
  - [Functional Tests](#functional-tests)
  - [Scenario Tests](#scenario-tests)
  - [Compatibility Tests](#compatibility-tests)
  - [Benchmark Tests](#benchmark-tests)
- [Impacts & Risks](#impacts--risks)
- [Investigation & Alternatives](#investigation--alternatives)
- [Unresolved Questions](#unresolved-questions)

## Introduction

This document provides a complete design for implementing a Pulsar sink for TiCDC.
The Pulsar sink distributes the DML change records and DDL events generated by TiCDC.
## Motivation or Background

Incorporating Pulsar into TiCDC expands the downstream MQ distribution channels.
Users want to output TiDB events to Pulsar because they can share existing Pulsar machines
with other workloads, and Pulsar is easy to scale out horizontally.

## Detailed Design

#### Protocol Support

In order to keep the MQ-class sinks consistent, we give priority to supporting
some of the protocols already supported by the Kafka sink:

**CanalJSON**

**Canal**

**Maxwell**

CanalJSON protocol sample (for more information, please refer to
https://docs.pingcap.com/tidb/dev/ticdc-canal-json):

```
{
    "id": 0,
    "database": "test",
    "table": "",
    "pkNames": null,
    "isDdl": true,
    "type": "QUERY",
    "es": 1639633094670,
    "ts": 1639633095489,
    "sql": "drop database if exists test",
    "sqlType": null,
    "mysqlType": null,
    "data": null,
    "old": null,
    "_tidb": {    // TiDB extension field
        "commitTs": 163963309467037594
    }
}
```

#### Row Order and Transactions

- Ensure that the commit-ts of events is monotonically increasing and that events are sent to Pulsar in order.
- Ensure that there are no incomplete intra-table transactions in Pulsar.
- Ensure that every event is sent to Pulsar at least once.

#### Pulsar Client

##### Information

- Client library: https://github.com/apache/pulsar-client-go, version v0.10.0
- Requires Golang 1.18+

##### Different from Kafka

Unlike Kafka, a producer in the Pulsar client must be bound to a single topic when it is created.

##### Pulsar Client Config

```go
type ClientOptions struct {
    // Configure the service URL for the Pulsar service.
    // This parameter is required.
    URL string

    // Timeout for the establishment of a TCP connection (default: 5 seconds)
    ConnectionTimeout time.Duration

    // Set the operation timeout (default: 30 seconds)
    // Producer-create, subscribe and unsubscribe operations will be retried until this interval,
    // after which the operation will be marked as failed
    OperationTimeout time.Duration

    // Configure the ping send and check interval, default to 30 seconds.
    KeepAliveInterval time.Duration

    // Configure the authentication provider. (default: no authentication)
    // Example: `Authentication: NewAuthenticationToken("token")`
    Authentication

    // Add custom labels to all the metrics reported by this client instance
    CustomMetricsLabels map[string]string

    // Specify metric registerer used to register metrics.
    // Default prometheus.DefaultRegisterer
    MetricsRegisterer prometheus.Registerer
}
```

**Main Note:**

- URL: for example, pulsar://127.0.0.1:6650
- Authentication: we only support token, token-from-file, and account-with-password.
- MetricsRegisterer: we initialize the Pulsar MetricsRegisterer with `prometheus.NewRegistry()` from `cdc/server/metrics.go` in the tiflow project.

#### Pulsar Producer

```go
type ProducerOptions struct {
    // Topic specifies the topic this producer will be publishing on.
    // This argument is required when constructing the producer.
    Topic string

    // Properties specifies a set of application defined properties for the producer.
    // These properties will be visible in the topic stats.
    Properties map[string]string

    // ... other fields omitted
}
```
**We must cache a producer per topic in the client. Every changefeed's Pulsar client holds a producer map of type `map[string]pulsar.Producer`, where the key is the topic name and the value is the producer created by the Pulsar client.**

##### Producer Message

```go
type ProducerMessage struct {
    // Payload for the message
    Payload []byte
    // Value and payload is mutually exclusive, `Value interface{}` for schema message.
    Value interface{}
    // Key sets the key of the message for routing policy
    Key string
    // OrderingKey sets the ordering key of the message
    OrderingKey string

    // ... other fields are unused
}
```

- Payload: carries the real binary data.
- Value: Value and Payload are mutually exclusive; Value is used for schema messages.
- Key: the optional key associated with the message (particularly useful for things like topic compaction).
- OrderingKey: sets the ordering key of the message. It behaves the same as Key, so we do not use it.

#### Pulsar Authentication

- Use `authentication-token` from the sink-uri to authenticate to the Pulsar server with a token.
- Use `basic-user-name` and `basic-password` from the sink-uri to authenticate to the Pulsar server.
- Use `token-from-file` from the sink-uri to authenticate to the Pulsar server with a token read from a file.

#### Pulsar Route Rule

- We support routing events to different partitions via the changefeed dispatcher config;
  refer to `Pulsar Topic Rule`.
- You can set the message key to any characters. By default we do not set a key, and the event is dispatched to a partition by a hash algorithm.
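The default key-hash dispatch described above can be sketched as follows. This is a minimal illustration, not the actual router: the helper name `partitionForKey` is hypothetical, and FNV-1a is an assumed hash; pulsar-client-go's default router may use a different function.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionForKey sketches key-based dispatch: hash the message key and
// map it onto one of the topic's partitions. FNV-1a is an illustrative
// choice, not necessarily what Pulsar's default router uses.
func partitionForKey(key string, partitionNum int) int {
	h := fnv.New32a()
	_, _ = h.Write([]byte(key))
	return int(h.Sum32() % uint32(partitionNum))
}

func main() {
	// The same key always lands on the same partition,
	// which preserves per-key ordering. Prints "true".
	fmt.Println(partitionForKey("test.t1", 4) == partitionForKey("test.t1", 4))
}
```

The important property for ordering is only that the mapping is deterministic: all events carrying the same key reach the same partition.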
#### Pulsar Topic Rule

```yaml
dispatchers = [
    {matcher = ['test1.*', 'test2.*'], topic = "Topic expression 1", partition = "table"},
    {matcher = ['test6.*'], topic = "Topic expression 2", partition = "ts"}
]
```

A topic expression is legal if it meets the following conditions:

1. `{schema}` and `{table}` identify the database name and the table name to be matched, and are required fields.
   Pulsar supports "(persistent|non-persistent)://tenant/namespace/topic" as a topic name.
2. The tenant, namespace and topic must be separated by 2 slashes, such as "tenant/namespace/topic".
3. If an event does not match any topic expression, it goes to the default topic, which is the topic in the sink-uri.
4. `partition = "xxx"` chooses the dispatch mode (refer to https://docs.pingcap.com/tidb/dev/ticdc-sink-to-kafka#customize-the-rules-for-topic-and-partition-dispatchers-of-kafka-sink):
   - default: when multiple unique indexes (including the primary key) exist or the Old Value feature is enabled, events are dispatched in the table mode. When only one unique index (or the primary key) exists, events are dispatched in the index-value mode.
   - ts: use the commitTs of the row change to hash and dispatch events.
   - index-value: use the value of the primary key or the unique index of the table to hash and dispatch events.
   - table: use the schema name of the table and the table name to hash and dispatch events.

#### Produce DDL Event

We implement the `DDLProducer` interface.

##### SyncSendMessage Method

It finds a producer by topic name, sends the event to Pulsar, and reports some metrics.
`partitionNum` is not used, because the partition number can only be set on the Pulsar server side.
##### SyncBroadcastMessage Method

It does nothing.

##### Close Method

It closes every producer.

#### Produce DML Event

We implement the `DMLProducer` interface.

##### AsyncSendMessage Method

It finds a producer by topic name, sets a callback function on the Pulsar producer client,
sends the event to Pulsar, and reports some metrics.
`partitionNum` is not used, because the partition number can only be set on the Pulsar server side.

##### Close Method

It closes every producer.

#### Pulsar Metrics

The Pulsar client reports metrics through a `prometheus.Registry`.
The following are the Pulsar client metrics:

```
pulsar_client_bytes_published
pulsar_client_bytes_received
pulsar_client_connections_closed
pulsar_client_connections_establishment_errors
pulsar_client_connections_handshake_errors
pulsar_client_connections_opened
pulsar_client_lookup_count
pulsar_client_messages_published
pulsar_client_messages_received
pulsar_client_partitioned_topic_metadata_count
pulsar_client_producer_errors
pulsar_client_producer_latency_seconds_bucket
pulsar_client_producer_latency_seconds_count
pulsar_client_producer_latency_seconds_sum
pulsar_client_producer_pending_bytes
pulsar_client_producer_pending_messages
pulsar_client_producer_rpc_latency_seconds_bucket
pulsar_client_producer_rpc_latency_seconds_count
pulsar_client_producer_rpc_latency_seconds_sum
pulsar_client_producers_closed
pulsar_client_producers_opened
pulsar_client_producers_partitions_active
pulsar_client_producers_reconnect_failure
pulsar_client_producers_reconnect_max_retry
pulsar_client_readers_closed
pulsar_client_readers_opened
pulsar_client_rpc_count
```

#### User Interface

**Sink-URI**

When creating a changefeed, the user can specify the sink-uri like this:

cdc cli changefeed create
--sink-uri="${scheme}://${address}/${topic-name}?protocol=${protocol}&pulsar-version=${pulsar-version}&authentication-token=${authentication-token}"

Example:

```
cdc cli changefeed create --server=http://127.0.0.1:8300
--sink-uri="pulsar://127.0.0.1:6650/persistent://public/default/test?protocol=canal-json&pulsar-version=v2.10.0&authentication-token=eyJhbGciOiJSUzIxxxxxxxxxxxxxxxxx"
```

## Test Design

Pulsar sink is a new feature. For tests, we focus on functional tests, scenario tests, and benchmark tests.

### Functional Tests

- Regular unit testing and integration testing cover the correctness of data replication using the canal/maxwell/canal-json protocols.

### Scenario Tests

Run stability and chaos tests under different workloads.

- The upstream and downstream data are consistent.
- Throughput and latency are stable for most scenarios.

### Compatibility Tests

#### Compatibility with other features/components

Should be compatible with other features.

#### Upgrade Downgrade Compatibility

Pulsar sink is a new feature, so there should be no upgrade
or downgrade compatibility issues.

### Benchmark Tests

Perform benchmark tests under common scenarios, big data scenarios, multi-table scenarios, and wide table scenarios with different parameters.

## Impacts & Risks

N/A

## Investigation & Alternatives

N/A

## Unresolved Questions

N/A