
# Data Migration Roadmap

## Primary Focus

- Make DM stable to use: all bugs labeled [`severity/critical`](https://github.com/pingcap/dm/labels/severity%2Fcritical), [`severity/major`](https://github.com/pingcap/dm/issues?q=is%3Aissue+is%3Aopen+label%3Aseverity%2Fmajor) or [`severity/moderate`](https://github.com/pingcap/dm/issues?q=is%3Aissue+is%3Aopen+label%3Aseverity%2Fmoderate) must be fixed, and bugs labeled [`severity/minor`](https://github.com/pingcap/dm/issues?q=is%3Aissue+is%3Aopen+label%3Aseverity%2Fminor) should be fixed where possible.
- Make DM easy to use: continue to improve usability and optimize the user experience.
## Usability Improvement

- [ ] bring relay log support back in v2.0 [#1234](https://github.com/pingcap/dm/issues/1234)
  - What: the binlog replication unit can read binlog events from the relay log, as it did in v1.0
  - Why:
    - AWS Aurora and some other RDS services may purge binlogs as soon as possible, while the full dump & import may take a long time
    - some users create many data migration tasks for a single upstream instance, and it's better to avoid pulling the same binlog events multiple times
- [ ] support migrating binlog files larger than 4 GB automatically [#989](https://github.com/pingcap/dm/issues/989)
  - What: a binlog file larger than 4 GB doesn't interrupt the migration task
  - Why: some operations in the upstream (like `DELETE FROM` with a large number of rows, or `CREATE TABLE new_tbl AS SELECT * FROM orig_tbl`) may generate large binlog files
- [ ] better configuration files [#775](https://github.com/pingcap/dm/issues/775)
  - What: make configuration files harder to misuse
  - Why: many users run into problems when writing configuration files and don't know how to resolve them
- [ ] solve other known usability issues (continuous work)
  - What: solve the usability issues recorded in [the project](https://github.com/pingcap/dm/projects/3)
  - Why: many usability issues remain unresolved, and we need to stop user churn
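
To make the relay-log item concrete, a source configuration might opt back in with a flag along these lines. This is only an illustrative sketch: the `enable-relay` and `relay-dir` field names and the overall layout are assumptions here, not a committed interface.

```yaml
# Hypothetical dm-worker source configuration re-enabling the relay log.
source-id: "mysql-replica-01"
enable-relay: true        # pull binlog from the upstream once, into a local relay log
relay-dir: "./relay_log"  # where relay log files would be stored
from:
  host: "127.0.0.1"
  port: 3306
  user: "root"
```

With a setup like this, multiple tasks on the same upstream would read from the local relay log instead of each opening its own binlog dump connection.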

## New features

- [ ] stop/pause only at the end of a transaction [#1095](https://github.com/pingcap/dm/issues/1095)
  - What: when stopping or pausing a task, keep replicating binlog events until the end of the current transaction is reached
  - Why: preserve transaction consistency with the upstream MySQL after the task is stopped/paused
- [ ] stop at a specified position/GTID [#348](https://github.com/pingcap/dm/issues/348)
  - What: stop replicating once the specified binlog position or GTID is reached, like `until_option` in MySQL
  - Why: more precise control over the data we want to replicate
- [ ] update source config online [#1076](https://github.com/pingcap/dm/issues/1076)
  - What: update source configs (like upstream connection arguments) online
  - Why: make it easier to switch from one MySQL instance to another in the replica group
- [ ] provide a complete online replication checksum feature [#1097](https://github.com/pingcap/dm/issues/1097)
  - What:
    - check data without stopping writes in the upstream
    - no extra writes in the upstream
  - Why: find potential inconsistencies earlier
- [ ] support DM v2.0 in TiDB Operator [tidb-operator#2868](https://github.com/pingcap/tidb-operator/issues/2868)
  - What: use TiDB Operator to manage DM v2.0
- [ ] use [Lightning](https://github.com/pingcap/tidb-lightning/) to import the full dumped data [#405](https://github.com/pingcap/dm/issues/405)
  - What: use Lightning as the full data load unit
  - Why:
    - Lightning is more stable than the current Loader in DM
    - Lightning supports more source data formats, like CSV
    - Lightning supports more storage backends, like AWS S3
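
The `until_option`-style stop condition above boils down to comparing the current replication position against a user-specified stopping point after each event. The `Position` type and `reachedUntil` helper below are an illustrative Go sketch of that check, not DM's actual API, and they ignore GTID sets for brevity:

```go
package main

import "fmt"

// Position identifies a point in a binlog stream by file name and offset.
// A real implementation would also support GTID-set comparison.
type Position struct {
	Name string // binlog file name, e.g. "mysql-bin.000003"
	Pos  uint32 // byte offset within the file
}

// reachedUntil reports whether the current replication position is at or
// past the user-specified "until" position, mirroring the semantics of
// MySQL's START SLAVE UNTIL option.
func reachedUntil(cur, until Position) bool {
	if cur.Name != until.Name {
		// binlog file names share a prefix and a zero-padded sequence
		// number, so lexicographic order matches creation order
		return cur.Name > until.Name
	}
	return cur.Pos >= until.Pos
}

func main() {
	cur := Position{Name: "mysql-bin.000003", Pos: 4096}
	until := Position{Name: "mysql-bin.000003", Pos: 1024}
	fmt.Println(reachedUntil(cur, until)) // true: we have passed the target
}
```

A replicator would evaluate this after each complete event (or, combined with the transaction-boundary item above, after each complete transaction) and stop the task once it returns true.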

## Performance Improvement

- [ ] flush the incremental checkpoint asynchronously [#605](https://github.com/pingcap/dm/pull/605)
  - What: flushing the checkpoint doesn't block replication of DML statements
  - Why: removing blocking and serialization from DML replication should yield better performance
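
The idea can be sketched as a background flusher that the replication loop hands checkpoint positions to without waiting for the write to complete. This is an illustrative Go sketch of the pattern, not DM's implementation; the real version must also persist the checkpoint and handle errors:

```go
package main

import (
	"fmt"
	"sync"
)

// checkpointFlusher persists checkpoints in the background so that DML
// replication never blocks on checkpoint I/O.
type checkpointFlusher struct {
	ch      chan int64 // pending checkpoint positions
	wg      sync.WaitGroup
	mu      sync.Mutex
	flushed int64 // last position durably flushed
}

func newCheckpointFlusher() *checkpointFlusher {
	f := &checkpointFlusher{ch: make(chan int64, 16)}
	f.wg.Add(1)
	go func() {
		defer f.wg.Done()
		for pos := range f.ch {
			// A real flusher would write to the checkpoint table here;
			// this sketch only records the position.
			f.mu.Lock()
			f.flushed = pos
			f.mu.Unlock()
		}
	}()
	return f
}

// asyncFlush enqueues a checkpoint without blocking the replication loop,
// unless the buffer is full, which applies natural back-pressure.
func (f *checkpointFlusher) asyncFlush(pos int64) { f.ch <- pos }

// close waits for all pending flushes to finish.
func (f *checkpointFlusher) close() {
	close(f.ch)
	f.wg.Wait()
}

func main() {
	f := newCheckpointFlusher()
	for pos := int64(100); pos <= 500; pos += 100 {
		f.asyncFlush(pos) // the replication loop continues immediately
	}
	f.close()
	fmt.Println(f.flushed) // 500
}
```

The trade-off is that on a crash the persisted checkpoint may lag behind the replicated position, so recovery must tolerate re-applying a small window of already-replicated events.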

## Out of Scope Currently

- Only supports synchronizing MySQL-protocol binlog to a TiDB cluster, not from all MySQL-protocol databases
  - Why: some MySQL-protocol databases are not binlog-compatible
- Provide a fully automated shard DDL merge and synchronization solution
  - Why: many scenarios are difficult to automate
- Replicate multiple upstream MySQL sources with one dm-worker
  - Why: a large amount of development work and many uncertainties