<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# DebeziumIO
## Connect your Debezium Databases to Apache Beam easily.

### What is DebeziumIO?
DebeziumIO is an Apache Beam connector that lets users connect their event-driven databases on [Debezium](https://debezium.io) to [Apache Beam](https://beam.apache.org/) without the need to set up a [Kafka](https://kafka.apache.org/) instance.

### Getting Started

DebeziumIO uses [Debezium Connectors v1.3](https://debezium.io/documentation/reference/1.3/connectors/) to connect to Apache Beam. Choose the Debezium Connector that suits your Debezium setup and pick a [Serializable Function](https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/transforms/SerializableFunction.html); you can then connect to Apache Beam and start building your own pipelines.

These connectors have been tested and are known to work:
*  MySQL Connector
*  PostgreSQL Connector
*  SQLServer Connector
*  DB2 Connector

Other connectors might also work.

Setting up a connector and running a pipeline should be as simple as:
```
Pipeline p = Pipeline.create();                   // Create a Pipeline
p.apply(DebeziumIO.<String>read()
        .withConnectorConfiguration(...)  // Debezium Connector setup
        .withFormatFunction(...)          // Serializable Function to use
).setCoder(StringUtf8Coder.of());
p.run().waitUntilFinish();                        // Run your pipeline!
```

### Setting up a Debezium Connector

DebeziumIO comes with a handy ConnectorConfiguration builder, which lets you provide all the configuration needed to access your Debezium Database.

Basic settings (**username**, **password**, **port number**, and **host name**) must be specified, along with the **Debezium Connector class** you will use, through these methods:

|Method|Param|Description|
|-|-|-|
|`.withConnectorClass(connectorClass)`|_Class_|Debezium Connector|
|`.withUsername(username)`|_String_|Database Username|
|`.withPassword(password)`|_String_|Database Password|
|`.withHostName(hostname)`|_String_|Database Hostname|
|`.withPort(portNumber)`|_String_|Database Port number|

You can also add more configuration, such as connector-specific properties, with the `withConnectionProperty` method:

|Method|Params|Description|
|-|-|-|
|`.withConnectionProperty(propName, propValue)`|_String_, _String_|Adds a custom property to the connector.|

> **Note:** For more information on custom properties, see your [Debezium Connector](https://debezium.io/documentation/reference/1.3/connectors/) specific documentation.

Example of a MySQL Debezium Connector setup:
```
DebeziumIO.ConnectorConfiguration.create()
        .withUsername("dbUsername")
        .withPassword("dbPassword")
        .withConnectorClass(MySqlConnector.class)
        .withHostName("127.0.0.1")
        .withPort("3306")
        .withConnectionProperty("database.server.id", "serverId")
        .withConnectionProperty("database.server.name", "serverName")
        .withConnectionProperty("database.include.list", "dbName")
        .withConnectionProperty("include.schema.changes", "false")
```
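
To make the builder's role more concrete, here is a minimal, JDK-only sketch of how fluent setters like the ones above could be collected into the flat key/value properties that a Debezium connector reads. The class `ConnectorConfigSketch` is hypothetical and is not the DebeziumIO implementation. The keys `connector.class`, `database.user`, `database.password`, `database.hostname`, and `database.port` are standard Debezium property names, but mapping each setter to them in this way is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a connector configuration builder.
 * NOT the DebeziumIO implementation: it only illustrates collecting
 * fluent setters into flat Debezium-style properties.
 */
final class ConnectorConfigSketch {
    private final Map<String, String> props = new LinkedHashMap<>();

    static ConnectorConfigSketch create() {
        return new ConnectorConfigSketch();
    }

    // Assumed mapping: each basic setter writes one well-known Debezium key.
    ConnectorConfigSketch withConnectorClass(String className) {
        return withConnectionProperty("connector.class", className);
    }

    ConnectorConfigSketch withUsername(String username) {
        return withConnectionProperty("database.user", username);
    }

    ConnectorConfigSketch withPassword(String password) {
        return withConnectionProperty("database.password", password);
    }

    ConnectorConfigSketch withHostName(String hostName) {
        return withConnectionProperty("database.hostname", hostName);
    }

    ConnectorConfigSketch withPort(String port) {
        return withConnectionProperty("database.port", port);
    }

    // Custom properties are stored under the key the caller provides.
    ConnectorConfigSketch withConnectionProperty(String key, String value) {
        props.put(key, value);
        return this;
    }

    // Returns a copy so later builder calls do not change the result.
    Map<String, String> toProperties() {
        return new LinkedHashMap<>(props);
    }

    public static void main(String[] args) {
        Map<String, String> config = ConnectorConfigSketch.create()
                .withConnectorClass("io.debezium.connector.mysql.MySqlConnector")
                .withUsername("dbUsername")
                .withHostName("127.0.0.1")
                .withPort("3306")
                .withConnectionProperty("database.include.list", "dbName")
                .toProperties();
        // Print only the keys, in insertion order.
        config.keySet().forEach(System.out::println);
    }
}
```

Seen this way, `withConnectionProperty` is the general case, and the named setters are convenience methods for the settings every connector needs.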

### Setting a Serializable Function

A Serializable Function is required to format each `SourceRecord` fetched from the database.

DebeziumIO comes with a built-in JSON Mapper that you can optionally use to map every `SourceRecord` fetched from the database to a JSON object. This helps users visualize and access their data in a simple way.

To use this built-in JSON Mapper, set an instance of **SourceRecordJsonMapper** as the Serializable Function for DebeziumIO:
```
.withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper())
```
> **Note:** `SourceRecordJsonMapper` comes out of the box, but you may use any Format Function you prefer.
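
If you prefer your own Format Function, the idea is simply a serializable, per-record mapping from a change record to your output type. The JDK-only sketch below illustrates that shape. `ChangeRecordSketch` and `RecordFormatterSketch` are hypothetical stand-ins (a real pipeline receives Kafka Connect `SourceRecord` objects and passes its mapper to `withFormatFunction`), and the topic name and row values are made-up sample data.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a change record (a topic plus row values). */
final class ChangeRecordSketch {
    final String topic;
    final Map<String, Object> value;

    ChangeRecordSketch(String topic, Map<String, Object> value) {
        this.topic = topic;
        this.value = value;
    }
}

/** Hypothetical mapper contract: one record in, one output element out. */
interface RecordFormatterSketch<T> extends Serializable {
    T format(ChangeRecordSketch record);
}

final class FormatFunctionSketch {
    /** Formats a record as "topic: key1=value1 key2=value2". */
    static String describe(ChangeRecordSketch record) {
        StringBuilder sb = new StringBuilder(record.topic).append(':');
        for (Map.Entry<String, Object> e : record.value.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Serializable because the target interface extends Serializable.
    static final RecordFormatterSketch<String> AS_TEXT = FormatFunctionSketch::describe;

    public static void main(String[] args) {
        // Made-up sample row, for illustration only.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1001);
        row.put("first_name", "Sally");
        ChangeRecordSketch record =
                new ChangeRecordSketch("dbserver1.inventory.customers", row);
        System.out.println(AS_TEXT.format(record));
        // prints "dbserver1.inventory.customers: id=1001 first_name=Sally"
    }
}
```

The mapper must be serializable because Beam ships transforms to workers; keeping it stateless, as here, is the simplest way to satisfy that.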

## Quick Example

The following example shows what an actual setup looks like, using a **MySQL Debezium Connector** and **SourceRecordJsonMapper** as the Serializable Function.
```
import io.debezium.connector.mysql.MySqlConnector;
import org.apache.beam.io.debezium.DebeziumIO;
import org.apache.beam.io.debezium.SourceRecordJson;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
p.apply(DebeziumIO.<String>read()
        .withConnectorConfiguration(                    // Debezium Connector setup
                DebeziumIO.ConnectorConfiguration.create()
                        .withUsername("debezium")
                        .withPassword("dbz")
                        .withConnectorClass(MySqlConnector.class)
                        .withHostName("127.0.0.1")
                        .withPort("3306")
                        .withConnectionProperty("database.server.id", "184054")
                        .withConnectionProperty("database.server.name", "dbserver1")
                        .withConnectionProperty("database.include.list", "inventory")
                        .withConnectionProperty("include.schema.changes", "false")
        ).withFormatFunction(
                new SourceRecordJson.SourceRecordJsonMapper() // Serializable Function
        )
).setCoder(StringUtf8Coder.of());

p.run().waitUntilFinish();
```

## Shortcut!

If you will be using the built-in **SourceRecordJsonMapper** as your Serializable Function for all your pipelines, you should use **readAsJson()**.

DebeziumIO comes with a method called `readAsJson`, which automatically sets the `SourceRecordJsonMapper` as the Serializable Function for your pipeline. This way, you only need to set up your connector before running your pipeline, without explicitly setting a Serializable Function.

Example of using **readAsJson**:
```
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
p.apply(DebeziumIO.readAsJson()                         // No Serializable Function needed
        .withConnectorConfiguration(                    // Debezium Connector setup
                DebeziumIO.ConnectorConfiguration.create()
                        .withUsername("debezium")
                        .withPassword("dbz")
                        .withConnectorClass(MySqlConnector.class)
                        .withHostName("127.0.0.1")
                        .withPort("3306")
                        .withConnectionProperty("database.server.id", "184054")
                        .withConnectionProperty("database.server.name", "dbserver1")
                        .withConnectionProperty("database.include.list", "inventory")
                        .withConnectionProperty("include.schema.changes", "false")));

p.run().waitUntilFinish();
```

## Under the hood

### KafkaSourceConsumerFn and Restrictions

KafkaSourceConsumerFn (KSC from here on) is a [DoFn](https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/transforms/DoFn.html) in charge of the database replication and CDC (change data capture).

There are two ways of initializing KSC:
*  Restricted by number of records
*  Restricted by amount of time (minutes)

By default, DebeziumIO initializes it with the former, though users may choose the latter by passing the number of minutes as a parameter:

|Function|Param|Description|
|-|-|-|
|`KafkaSourceConsumerFn(connectorClass, recordMapper, maxRecords)`|_Class, SourceRecordMapper, Int_|Restrict run by number of records (Default).|
|`KafkaSourceConsumerFn(connectorClass, recordMapper, timeToRun)`|_Class, SourceRecordMapper, Long_|Restrict run by amount of time (in minutes).|
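
The two restrictions in the table can be thought of as two stop conditions checked while records are consumed. The sketch below is a JDK-only illustration of that idea, not the KafkaSourceConsumerFn implementation: `RunLimitSketch` and `RunLimitDemo` are hypothetical names, and the loop uses a simulated clock (a fixed cost per record) instead of reading real change events.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stop condition; NOT the KafkaSourceConsumerFn implementation. */
interface RunLimitSketch {
    boolean shouldStop(long recordsSeen, long elapsedMillis);

    /** Stop once at least maxRecords records have been processed. */
    static RunLimitSketch byRecords(int maxRecords) {
        return (recordsSeen, elapsedMillis) -> recordsSeen >= maxRecords;
    }

    /** Stop once at least minutesToRun minutes have elapsed. */
    static RunLimitSketch byMinutes(long minutesToRun) {
        long limitMillis = TimeUnit.MINUTES.toMillis(minutesToRun);
        return (recordsSeen, elapsedMillis) -> elapsedMillis >= limitMillis;
    }
}

final class RunLimitDemo {
    /**
     * Consumes simulated records until the limit trips and returns how many
     * were processed. Each record is assumed to take millisPerRecord.
     */
    static long consume(RunLimitSketch limit, long millisPerRecord) {
        long records = 0;
        long elapsedMillis = 0;
        while (!limit.shouldStop(records, elapsedMillis)) {
            records++;
            elapsedMillis += millisPerRecord;
        }
        return records;
    }

    public static void main(String[] args) {
        // Record-count limit: stops after exactly 5 records.
        System.out.println(consume(RunLimitSketch.byRecords(5), 1_000));   // prints 5
        // Time limit: 1 minute at 10 seconds per record allows 6 records.
        System.out.println(consume(RunLimitSketch.byMinutes(1), 10_000));  // prints 6
    }
}
```

A record-count limit gives a predictable output size, while a time limit bounds how long the run lasts regardless of how many changes arrive.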

### Requirements and Supported versions

-  JDK v8
-  Debezium Connectors v1.3
-  Apache Beam 2.25

## Running Integration Tests

You can run Integration Tests using **gradlew**.

Example of running the MySQL Connector Integration Test:
```
./gradlew integrationTest -p sdks/java/io/debezium/ --tests org.apache.beam.io.debezium.DebeziumIOMySqlConnectorIT -DintegrationTestRunner=direct
```