# Data Migration Roadmap

## Primary Focus

- Make DM stable to use: all bugs labeled [`severity/critical`](https://github.com/pingcap/dm/labels/severity%2Fcritical), [`severity/major`](https://github.com/pingcap/dm/issues?q=is%3Aissue+is%3Aopen+label%3Aseverity%2Fmajor) or [`severity/moderate`](https://github.com/pingcap/dm/issues?q=is%3Aissue+is%3Aopen+label%3Aseverity%2Fmoderate) must be fixed, and bugs labeled [`severity/minor`](https://github.com/pingcap/dm/issues?q=is%3Aissue+is%3Aopen+label%3Aseverity%2Fminor) should be fixed where possible.
- Make DM easy to use: continue to improve usability and optimize the user experience.

## Usability Improvement

- [ ] bring relay log support back in v2.0 [#1234](https://github.com/pingcap/dm/issues/1234)
  - What: the binlog replication unit can read binlog events from the relay log, as it did in v1.0
  - Why:
    - AWS Aurora and some other RDS services may purge binlogs as soon as possible, while a full dump & import can take a long time
    - some users create many data migration tasks for a single upstream instance, and pulling the same binlog events multiple times should be avoided
- [ ] support migrating binlog files exceeding 4GB automatically [#989](https://github.com/pingcap/dm/issues/989)
  - What: a binlog file exceeding 4GB does not interrupt the migration task
  - Why: some upstream operations (such as `DELETE FROM` with a large number of rows, or `CREATE TABLE new_tbl AS SELECT * FROM orig_tbl`) may generate large binlog files
- [ ] better configuration file [#775](https://github.com/pingcap/dm/issues/775)
  - What: prevent misuse of configuration files
  - Why: many users run into problems when writing configuration files and don't know how to resolve them
- [ ] solve other known usability issues (continuous work)
  - What: solve the usability issues recorded in [the project](https://github.com/pingcap/dm/projects/3)
  - Why: many usability issues have not been resolved yet, and we need to stop user churn

## New features

- [ ] stop/pause only at the end of a transaction [#1095](https://github.com/pingcap/dm/issues/1095)
  - What: when stopping or pausing a task, keep replicating binlog events until the end of the current transaction
  - Why: preserve transaction consistency with the upstream MySQL after the task is stopped or paused
- [ ] stop at a specified position/GTID [#348](https://github.com/pingcap/dm/issues/348)
  - What: stop replicating when the specified binlog position or GTID is reached, like `until_option` in MySQL
  - Why: control more precisely which data is replicated
- [ ] update source config online [#1076](https://github.com/pingcap/dm/issues/1076)
  - What: update source configs (such as upstream connection arguments) online
  - Why: make it easier to switch from one MySQL instance to another within a replica group
- [ ] provide a complete online replication checksum feature [#1097](https://github.com/pingcap/dm/issues/1097)
  - What:
    - check data without stopping writes in the upstream
    - no extra writes in the upstream
  - Why: find potential inconsistencies earlier
- [ ] support DM v2.0 in TiDB Operator [tidb-operator#2868](https://github.com/pingcap/tidb-operator/issues/2868)
  - What: use TiDB Operator to manage DM 2.0
- [ ] use [Lightning](https://github.com/pingcap/tidb-lightning/) to import full dumped data [#405](https://github.com/pingcap/dm/issues/405)
  - What: use Lightning as the full data load unit
  - Why:
    - Lightning is more stable than the current Loader in DM
    - Lightning supports more source data formats, such as CSV
    - Lightning supports more storage backends, such as AWS S3

## Performance Improvement

- [ ] flush incremental checkpoints asynchronously [#605](https://github.com/pingcap/dm/pull/605)
  - What: flushing a checkpoint does not block replication of DML statements
  - Why: removing this blocking/serialization point from DML replication should yield better performance

## Out of Scope currently

- Only support replicating MySQL-protocol binlogs to a TiDB cluster, not to all MySQL-protocol databases
  - Why: some MySQL-protocol databases are not binlog-protocol compatible
- Provide a fully automated shard DDL merge and replication solution
  - Why: many scenarios are difficult to automate
- Replicate multiple upstream MySQL sources with one dm-worker
  - Why: large amount of development work and many uncertainties
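To give a feel for the kind of per-source settings touched by the relay log and online source-config items above, here is a sketch of a DM source configuration; the field names follow DM's documented source config, but all values are illustrative:

```yaml
# Illustrative DM source configuration (values are made up).
source-id: "mysql-replica-01"
enable-gtid: true
# Relay log support (the v2.0 item above) is toggled per source:
enable-relay: true
from:
  host: "127.0.0.1"
  port: 3306
  user: "dm_user"
  password: "******"
```

Updating the `from` block online (issue #1076) is what would let an operator repoint a source at another instance in the replica group without recreating the task.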
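The "stop at a specified position/GTID" item boils down to comparing the currently replicated binlog coordinate against a user-specified stop coordinate on every event. A minimal sketch of that comparison, using an illustrative `Position` type rather than DM's actual API:

```go
package main

import "fmt"

// Position is a simplified (file, offset) binlog coordinate, in the spirit of
// MySQL's `until_option`. The type and field names here are illustrative.
type Position struct {
	File   string
	Offset uint64
}

// Reached reports whether the current replicated position has caught up to
// the user-specified stop position.
func Reached(current, until Position) bool {
	if current.File != until.File {
		// Binlog file names like mysql-bin.000001 sort lexically in rotation order.
		return current.File > until.File
	}
	return current.Offset >= until.Offset
}

func main() {
	until := Position{File: "mysql-bin.000003", Offset: 4096}
	fmt.Println(Reached(Position{"mysql-bin.000002", 9000}, until)) // false: earlier file
	fmt.Println(Reached(Position{"mysql-bin.000003", 4096}, until)) // true: stop here
}
```

For GTID-based stopping the comparison would instead test whether the stop GTID set is contained in the executed GTID set, but the control flow is the same: check after each event, then halt the unit cleanly.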