---
title: Migrate data in PostgreSQL to KubeBlocks
description: How to migrate data in PostgreSQL v14 to KubeBlocks by kbcli migration
keywords: [postgresql, migration, kbcli migration, migrate data in PostgreSQL to KubeBlocks]
sidebar_position: 2
sidebar_label: Migration
---

# Migrate data in PostgreSQL to KubeBlocks

## Environment and aim

* Source: PostgreSQL version 14 installed by Pigsty on Alibaba Cloud ECS. The PostGIS plugin is also installed and the open-source map data is imported by osm2pgsql.

* Sink: PostgreSQL version 14.7.0 installed by KubeBlocks on AWS EKS. No plugin is installed.

* Aim: Migrate `db_test.public.table_test_1` and `db_test.public.table_test_2` in the source to the sink.

## Before you start

### Enable kbcli migration

1. Install KubeBlocks: You can install KubeBlocks by [kbcli](./../../installation/install-with-kbcli/install-kubeblocks-with-kbcli.md) or by [Helm](./../../installation/install-with-helm/install-kubeblocks-with-helm.md).
2. [Enable the migration add-on](./../../overview/supported-addons.md).

   ```bash
   kbcli addon list

   kbcli addon enable migration
   ```

### Configure the source

Modify the configuration of the source to support CDC.

1. Set the `wal_level` configuration to `logical`.
2. Make sure that the number of `max_replication_slots` configured is sufficient.
3. Make sure that the number of `max_wal_senders` configured is sufficient.

:::note

* If you install PostgreSQL by Pigsty, you can modify the configuration by executing the `pg edit-config` command.
* Modifying the WAL (Write Ahead Log) configuration restarts the database. Make sure the modification is performed during off-peak hours.

:::

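If you manage the source instance with SQL rather than Pigsty, the settings above can be applied with `ALTER SYSTEM`. This is a sketch; the counts below are placeholder values, so size them to your number of migration tasks and existing replicas.

```sql
-- Placeholder values: adjust max_replication_slots and max_wal_senders
-- to your workload before applying.
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
```

These settings take effect only after a restart; verify afterwards with `SHOW wal_level;`.
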
### Check the account permission

Make sure both the source and sink accounts meet the following permissions.

* The source account
  * LOGIN permission
  * The read permission of the source migration objects
  * REPLICATION permission
* The sink account
  * LOGIN permission
  * The read/write permission of the sink

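The permissions above can be granted as in the following sketch. The account names `migration_src` and `migration_sink` and the password are hypothetical; adjust them to your environment.

```sql
-- Source: login, read access to the migration objects, and replication.
CREATE USER migration_src WITH LOGIN REPLICATION PASSWORD 'change_me';
GRANT SELECT ON public.table_test_1, public.table_test_2 TO migration_src;

-- Sink: login and read/write access to the target database.
CREATE USER migration_sink WITH LOGIN PASSWORD 'change_me';
GRANT ALL PRIVILEGES ON DATABASE db_test TO migration_sink;
GRANT ALL ON SCHEMA public TO migration_sink;
```
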
### Initialize the sink

1. Create a database named `db_test`.

   ```sql
   create database db_test;
   ```

2. Install PostGIS and import osm data.

   * [Install PostGIS](https://postgis.net/install/). If you install PostgreSQL by Pigsty, PostGIS is built-in and you can execute `CREATE EXTENSION` according to your need.
   * [Import osm data](https://github.com/openstreetmap/osm2pgsql).

   :::note

   For the migration function in KubeBlocks version 5.0, there are limits on structure initialization.

   1. User-defined types are not supported.
   2. A field filled with an Array data type (such as text[], text[3][3], and integer[]) is not supported for migration.

   :::

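For example, enabling PostGIS in the sink is a single `CREATE EXTENSION` statement run in `db_test` (a minimal sketch, assuming the PostGIS packages are already installed on the instance):

```sql
-- Run in db_test on the sink after installing the PostGIS packages.
CREATE EXTENSION IF NOT EXISTS postgis;
```
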
### Prepare data sampling

It is recommended to prepare data sampling for verification after the migration to ensure correctness.

## Migrate data

### Steps

1. Create a migration task.

   ```bash
   kbcli migration create mytask --template apecloud-pg2pg \
   --source user:123456@127.0.0.1:5432/db_test \
   --sink user:123456@127.0.0.2:5432/db_test \
   --migration-object '"public.table_test_1","public.table_test_2"'
   ```

   :paperclip: Table 1. Options explanation

   | Option     | Description |
   | :--------- | :---------- |
   | mytask    | The name of the migration task. You can customize it. |
   | `--template` | It specifies the migration template. `--template apecloud-pg2pg` means that this migration task uses the template of migrating from PostgreSQL to PostgreSQL created by KubeBlocks. Run `kbcli migration templates` to view all available templates and the supported database information.   |
   | `--source`  | It specifies the source. `user:123456@127.0.0.1:5432/db_test` in the above example follows the format `${user_name}:${password}@${database connection url}/${database}`. For this guide, the connection URL uses the public network address. |
   | `--sink`     | It specifies the destination. `user:123456@127.0.0.2:5432/db_test` in the above example follows the format `${user_name}:${password}@${database connection url}/${database}`. For this guide, the connection URL uses the service address inside the Kubernetes cluster. |
   | `--migration-object`  | It specifies the migration objects. In the above example, data in "public.table_test_1" and "public.table_test_2", including structure data, stock data, and incremental data generated while the migration task is running, is migrated to the sink. |

   :::note

   An example of the `--sink` URL:

   ![Sink](./../../../img/pg-migration-sink.png)

   :::

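   The connection string format can be decomposed with standard shell parameter expansion; this sketch only splits the example `--source` value from this guide:

   ```shell
   # Decompose the format ${user_name}:${password}@${connection url}/${database}
   # using the example value from this guide.
   conn="user:123456@127.0.0.1:5432/db_test"
   credentials="${conn%%@*}"      # user:123456
   rest="${conn#*@}"              # 127.0.0.1:5432/db_test
   user_name="${credentials%%:*}"
   password="${credentials#*:}"
   database="${rest##*/}"
   echo "$user_name" "$database"
   ```
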
2. (Optional) Specify migration steps by the flag `--steps`.

   The default steps follow the order precheck -> structure initialization -> data initialization -> incremental migration. You can use `--steps` to specify migration steps. For example, perform tasks in the order of precheck, data initialization, and incremental migration, skipping structure initialization.

   ```bash
   kbcli migration create mytask --template apecloud-pg2pg \
   --source user:123456@127.0.0.1:5432/db_test \
   --sink user:123456@127.0.0.2:5432/db_test \
   --migration-object '"public.table_test_1","public.table_test_2"' \
   --steps precheck=true,init-struct=false,init-data=true,cdc=true
   ```

3. View the task status.

   ```bash
   # View the migration task list
   kbcli migration list

   # View the details of a specified task
   kbcli migration describe ${migration-task-name}
   ```

   ![Describe migration task](../../../img/pgsql-migration-describe-task.png)

   Pay attention to Initialization, CDC, and CDC Metrics.

   * Initialization
     * Precheck: If the status shows `Failed`, the initialization precheck did not pass. Troubleshoot the failure by referring to the examples in [Troubleshooting](#troubleshooting).
     * Init-struct: Structure initialization. Idempotent processing logic is adopted. A failure occurs only when a severe problem occurs, such as failing to connect to a database.
     * Init-data: Data initialization. If there is a large amount of stock data, this step takes a long time, so pay attention to Status.
   * CDC: Incremental migration. Based on the timestamp recorded by the system before the init-data step, the system starts data migration following eventual consistency: it captures WAL (Write Ahead Log) changes from the source library and writes them to the sink. Under normal circumstances, the CDC phase continues as long as the migration link is not actively terminated.
   * CDC Metrics: Incremental migration indicators. Currently, the indicators mainly provide the WAL LSN (Log Sequence Number) of the source library and the corresponding timestamp (note that the timestamp is in the local time zone of the Pod container runtime) at which the CDC process completed the "capturing -> writing" process.

     :::note

     The CDC Metrics are updated every 10 minutes by the system, i.e., if data is continuously written into the source, `metrics.timestamp` lags the current time by up to 10 minutes.

     :::

4. Validate the migration with the prepared data sampling.

### Troubleshooting

If any step above fails, run the command below to troubleshoot the failure.

```bash
# --step: Specify the step. Allowed values: precheck,init-struct,init-data,cdc
kbcli migration logs ${migration-task-name} --step ${step-name}
```

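To review the logs of every step at once, you can loop over the four step names. This sketch only prints the commands so you can inspect them before running them; `mytask` is the example task name from this guide:

```shell
# Print one "kbcli migration logs" command per migration step.
task="mytask"
for step in precheck init-struct init-data cdc; do
  echo "kbcli migration logs ${task} --step ${step}"
done
```
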
## Switch applications

### Before you start

* Make sure the KubeBlocks migration task runs normally.
* To separate the migration sessions from business sessions and to improve data security, it is recommended to create and authorize a separate account dedicated to data migration.
* For safety concerns, stop business writes and switch the application during off-peak hours.
* Before switching the application, it is recommended to prepare data sampling for verification after switching to ensure correctness.
* Pay attention to serial, sequence, and slot.
* Serial and sequence

   Search and record the maximum value of each sequence before switching applications and set it as the initial value of the corresponding sequence in the sink.

   After the business is switched to the sink, newly written sequence values do not, by default, continue from the maximum value in the source. You need to set the initial value manually.

   A PostgreSQL function sample that generates the required `setval` statements:

    ```sql
    CREATE FUNCTION build_setvals() returns void
    AS $$
    declare
      nsp name;
      rel name;
      val int8;
    begin
      for nsp,rel in select nspname,relname from pg_class t2 , pg_namespace t3 where t2.relnamespace=t3.oid and t2.relkind='S'
      loop
        execute format($_$select last_value from %I.%I$_$, nsp, rel) into val;
        raise notice '%',
        format($_$select setval('%I.%I'::regclass, %s);$_$, nsp, rel, val+1);
      end loop;
    end;
    $$
    LANGUAGE plpgsql;
    ```

   Execute the function in the source and collect the generated statements:

    ```bash
    psql -hxx -p xx -U xx -d xx -c "set client_min_messages = notice; select build_setvals();" | grep setval
    ```

   The output looks like:

    ```bash
    NOTICE:  select setval('public.seq_test_1'::regclass, 2);
    NOTICE:  select setval('public.seq_test_2'::regclass, 1001);
    NOTICE:  select setval('public.seq_test_3'::regclass, 203);
    NOTICE:  select setval('public.seq_test_4'::regclass, 476);
    ```

   Execute the generated `setval` statements in the sink.

* Slot lifecycle

  The CDC phase (the incremental migration) relies on slots. A replication slot and a publication are created during the CDC phase, and this metadata must be deleted before the migration task metadata can be deleted.

  If the incremental migration is performed, the migration task creates a replication slot named after the task name during the initialization phase of the init-data step (hyphens, "-", in the task name are replaced with underscores, "_"). When the incremental migration starts, the migration task creates a publication, named after the replication slot name plus the suffix "_publication_for_all_tables", to perform WAL consumption.

  When the migration task is deleted (by running `kbcli migration terminate`), the task changes to the Terminating status. The termination operation first stops the CDC process, then tries to clear the above replication slot and publication. Only when this metadata is cleared is the metadata of the migration task deleted.

  :::note

  If the slot is not cleaned up, it affects the log-cleaning logic in PostgreSQL. When the PostgreSQL disk space is insufficient and a leftover slot prevents logs from being cleaned up, a serious failure may occur.

  Therefore, the migration task can be deleted only when the slot cleanup is completed. If a PostgreSQL task stays in the Terminating status for a long time, pay attention to the status of the slot and the disk usage, and intervene manually when necessary.

  Cleanup operations adopt idempotent logic. The reasons for cleanup failures include but are not limited to the following:

  * While the migration task is running, the connection string of the source library changes, which prevents the migration task from connecting to the source.
  * While the migration task is running, the account or password of the source changes, which prevents the migration task from connecting to the source.
  * While the migration task is running, the permissions of the source account change, which prevents the migration task from being deleted.

  :::

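The naming rule described above can be sketched in shell; `my-task-1` is a hypothetical task name:

```shell
# Derive the replication slot and publication names the migration task
# would create for a hypothetical task named "my-task-1".
task_name="my-task-1"
slot_name=$(echo "$task_name" | tr '-' '_')   # hyphens become underscores
publication_name="${slot_name}_publication_for_all_tables"
echo "$slot_name"
echo "$publication_name"
```

If you need to inspect leftover metadata manually, query `pg_replication_slots` and `pg_publication` in the source, and drop a stale slot with `pg_drop_replication_slot`.
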
### Steps

1. Check the migration task status and ensure the task is performed normally.
   1. Describe the migration task details and confirm that all steps in Initialization are `Complete` and CDC is `Running`.

      ```bash
      kbcli migration describe ${migration-task-name}
      ```

   2. On the premise that data is continuously written into the source, observe whether the timestamp keeps advancing with little delay. For example,

      ```bash
      kbcli migration logs ${migration-task-name} --step cdc | grep current_position
      ```

      The results update every 10 seconds.

      ![Timestamp](../../../img/pgsql-migration-timestamp.png)
2. Pause the business and stop new business data from being written into the source.
3. View the migration status again and ensure the migration task runs normally for at least one minute.

   Refer to the operations in step 1 and observe whether the link is normal and the timestamp meets the expectation.
4. Use the sink to restore the business.
5. Validate the switch with the prepared data sampling.

## Clean up the environment

After the migration task is completed, you can terminate the migration task and the migration function.

### Terminate the migration task

Deleting the migration task does not affect the data in the source and sink.

```bash
kbcli migration terminate ${migration-task-name}
```

### Terminate kbcli migration

1. Check whether there are running migration tasks.

   ```bash
   kbcli migration list
   ```

2. Disable the migration add-on.

   ```bash
   kbcli addon disable migration
   ```

3. Delete the Kubernetes CRDs (Custom Resource Definitions) manually.

   ```bash
   kubectl delete crd migrationtasks.datamigration.apecloud.io migrationtemplates.datamigration.apecloud.io serialjobs.common.apecloud.io
   ```