# TiCDC supports multi-topic dispatch

- Author(s): [hi-rustin](https://github.com/hi-rustin)
- Tracking Issue: https://github.com/pingcap/tiflow/issues/4423

## Table of Contents

- [Introduction](#introduction)
- [Motivation or Background](#motivation-or-background)
- [Detailed Design](#detailed-design)
- [Test Design](#test-design)
  - [Functional Tests](#functional-tests)
  - [Scenario Tests](#scenario-tests)
  - [Compatibility Tests](#compatibility-tests)
  - [Benchmark Tests](#benchmark-tests)
- [Impacts & Risks](#impacts--risks)
- [Investigation & Alternatives](#investigation--alternatives)
- [Unresolved Questions](#unresolved-questions)

## Introduction

This document provides a complete design for implementing multi-topic support in the TiCDC MQ Sink.

## Motivation or Background

The TiCDC MQ Sink currently supports sending messages to only a single topic. However, in typical MQ Sink usage
scenarios the data is consumed by systems like [Flink], which treat each topic as a separate data source, so the MQ Sink
needs to support multiple topics.

## Detailed Design

This solution introduces a new option in the changefeed configuration file that specifies which topic the sink sends
each table's data to.

The original topic configuration in the sinkURI is kept. It serves two purposes:

1. When there is no new configuration, or a table does not match any rule in it, the data is sent to this default topic.
2. Schema-level DDLs are sent to this topic by default.

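As a rough illustration of where the default topic comes from, the sketch below reads it from the path of a sinkURI. It
is written in Python for brevity and is not TiCDC's implementation. The helper name `default_topic` and the sample URI
are hypothetical, and it assumes the topic is the path component of the URI.

```python
from urllib.parse import urlsplit


def default_topic(sink_uri: str) -> str:
    """Return the topic named in the path of an MQ sink URI."""
    # Assumption: the default topic is the URI path, as in
    # kafka://host:port/topic-name?key=value.
    topic = urlsplit(sink_uri).path.strip("/")
    if not topic:
        raise ValueError("sink URI does not name a default topic")
    return topic


print(default_topic("kafka://127.0.0.1:9092/default-topic?protocol=canal-json"))
```

Failing early on a missing topic matches the role the default topic plays here: every unmatched table and every
schema-level DDL needs somewhere to go.
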
### Topic dispatch configuration format

This configuration will be added to the TiCDC changefeed configuration file.

```toml
[sink]
dispatchers = [
  { matcher = ['test1.*', 'test2.*'], dispatcher = "ts", topic = "Topic dispatch expression" },
  { matcher = ['test3.*', 'test4.*'], dispatcher = "rowid", topic = "Topic dispatch expression" },
  { matcher = ['test1.*', 'test5.*'], dispatcher = "ts", topic = "Topic dispatch expression" },
]
```

Each entry in `dispatchers` gains a new `topic` field, which specifies the topic dispatch rule for the tables selected
by that entry's `matcher`.

### Topic dispatch expression details

The expression format looks like `flink_{schema}{table}`. This expression consists of two keywords and the `flink_`
prefix.

The two keywords (case-insensitive):

| Keyword  | Description            | Required |
| -------- | ---------------------- | -------- |
| {schema} | the name of the schema | no       |
| {table}  | the name of the table  | no       |

> When neither keyword is used, the expression is equivalent to sending the data to a fixed topic.

`flink_` is the user-definable part: the user can put any custom string in the expression.

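The keyword substitution described above can be sketched as follows. This is a minimal Python illustration rather than
TiCDC's code: the function name `expand_topic` is hypothetical, and it assumes that any text other than the two keywords
is copied into the topic name unchanged.

```python
import re

# Matches {schema} and {table} regardless of letter case.
_KEYWORD = re.compile(r"\{(schema|table)\}", re.IGNORECASE)


def expand_topic(expression: str, schema: str, table: str) -> str:
    """Replace the {schema} and {table} keywords in a topic expression."""
    values = {"schema": schema, "table": table}
    # A function replacement avoids backslash handling in the substituted names.
    return _KEYWORD.sub(lambda m: values[m.group(1).lower()], expression)


print(expand_topic("flink_{schema}{table}", "test1", "table1"))
print(expand_topic("{SCHEMA}_{Table}", "test1", "table1"))
print(expand_topic("test-cdc", "test1", "table1"))
```

An expression without keywords passes through untouched, which is the fixed-topic case noted above.
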
Some examples:

```toml
[sink]
dispatchers = [
  { matcher = ['test1.table1', 'test2.table1'], topic = "{schema}_{table}" },
  { matcher = ['test3.*', 'test4.*'], topic = "flink{schema}" },
  { matcher = ['test1.*', 'test5.*'], topic = "test-cdc" },
]
```

- matcher = ['test1.table1', 'test2.table1'], topic = "{schema}\_{table}"
  - Send the data from `test1.table1` and `test2.table1` to the `test1_table1` and `test2_table1` topics, respectively
- matcher = ['test3.*', 'test4.*'], topic = "flink{schema}"
  - Send the data from all tables in `test3` and `test4` to the `flinktest3` and `flinktest4` topics, respectively
- matcher = ['test1.*', 'test5.*'], topic = "test-cdc"
  - Send the data from all tables in `test1` (except `test1.table1`) and `test5` to the `test-cdc` topic
  - `test1.table1` is sent to the `test1_table1` topic because, for a table that matches multiple matcher rules, the
    topic expression of the first matching rule takes precedence

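The first-match-wins behaviour in these examples can be sketched as below. This is an illustrative Python sketch under
stated assumptions, not the TiCDC implementation: `resolve_topic` and `RULES` are hypothetical names, and `fnmatch`-style
globbing stands in for TiCDC's actual matcher syntax.

```python
import re
from fnmatch import fnmatchcase

# Rules from the example configuration, in file order: (matcher, topic).
RULES = [
    (["test1.table1", "test2.table1"], "{schema}_{table}"),
    (["test3.*", "test4.*"], "flink{schema}"),
    (["test1.*", "test5.*"], "test-cdc"),
]


def resolve_topic(schema, table, default_topic, rules=RULES):
    """Return the topic for schema.table; the first matching rule wins."""
    name = f"{schema}.{table}"
    values = {"schema": schema, "table": table}
    for patterns, expression in rules:
        # Assumption: fnmatch-style globbing approximates the matcher syntax.
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return re.sub(
                r"\{(schema|table)\}",
                lambda m: values[m.group(1).lower()],
                expression,
                flags=re.IGNORECASE,
            )
    # No rule matched: fall back to the topic from the sinkURI.
    return default_topic


for schema, table in [("test1", "table1"), ("test1", "table2"), ("test3", "orders"), ("test9", "t")]:
    print(f"{schema}.{table} -> {resolve_topic(schema, table, 'default-topic')}")
```

Because the rules are scanned in file order, `test1.table1` stops at the first rule and never reaches the broader
`test1.*` rule further down.
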
### DDL dispatch rules

- Schema-level DDLs are sent to the default topic.
- Table-level DDLs are sent to the topic that matches the table. If no topic matches, they are sent to the default
  topic.

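The two DDL rules can be sketched as a small routing function. This is a hedged Python illustration only: `ddl_topic`,
`match_topic`, and `lookup` are hypothetical names, and representing a schema-level DDL by `table=None` is an assumption
made for the example.

```python
from typing import Callable, Optional


def ddl_topic(
    schema: str,
    table: Optional[str],
    default_topic: str,
    match_topic: Callable[[str, str], Optional[str]],
) -> str:
    """Pick the topic for a DDL event.

    table is None for a schema-level DDL such as CREATE DATABASE.
    match_topic is a hypothetical lookup that returns None when no rule matches.
    """
    if table is None:
        return default_topic
    return match_topic(schema, table) or default_topic


def lookup(schema: str, table: str) -> Optional[str]:
    """Toy lookup standing in for the dispatcher rules."""
    return {("test1", "table1"): "test1_table1"}.get((schema, table))


print(ddl_topic("test1", None, "default-topic", lookup))
print(ddl_topic("test1", "table1", "default-topic", lookup))
print(ddl_topic("test9", "t", "default-topic", lookup))
```
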
## Test Design

This functionality will be mainly covered by unit and integration tests.

### Functional Tests

#### Unit test

Coverage of newly added code should be more than 75%.

#### Integration test

All existing integration tests must pass when a changefeed has no topic dispatch configuration. In addition, we will
integrate [Flink] into our integration tests to verify multi-topic functionality.

### Scenario Tests

We will test the scenario of using the `canal-json` format to feed data into [Flink].

### Compatibility Tests

#### Compatibility with other features/components

This change does not break TiCDC's existing single-topic support. When only the default topic is passed in the sinkURI
and there is no topic expression configuration, TiCDC works as before.

#### Upgrade compatibility

When the new configuration is absent, TiCDC works in single-topic mode. To use multiple topics after the upgrade, add
the configuration and create a new changefeed.

#### Downgrade compatibility

The new configuration is not recognized by older versions of TiCDC, so you need to remove the changefeed before
downgrading.

### Benchmark Tests

N/A

## Impacts & Risks

N/A

## Investigation & Alternatives

N/A

## Unresolved Questions

N/A

[flink]: https://flink.apache.org/