github.com/matrixorigin/matrixone@v1.2.0/pkg/txn/trace/README.md (about) 1 ## Txn trace framework 2 During MO iterations, you may encounter various transaction correctness issues, such as Duplicate and W-W Conflict issues. 3 4 These problems are often difficult to analyze by relying on logs and error messages, and often take a lot of developer time because debugging transactional correctness problems is quite difficult. Often, these problems can be very difficult to reproduce. 5 6 ## Design thinking 7 ## Trace what data 8 Combined with previous experience analyzing transaction correctness issues, we typically need some of the following information: 9 10 * Transaction metadata 11 12 The metadata of the transaction, and information about each metadata change. There is no doubt that this information is very important. At least, it includes, the transaction's snapshot ts, commit ts, and so on. 13 14 * Transaction Read data 15 16 The content and version of all the data read during the execution of the transaction. 17 18 * Transaction Commit data 19 20 The content of the data that the transaction commits to TN. 21 22 * Logtail data 23 24 What is the Logtail received by CN, including the data content, and the order. 25 26 * Transaction correctness errors 27 28 Known transaction correctness errors, including information about the table and row where the problem was found. 29 30 * Conflict Handling 31 32 Information about lock conflicts that occur in pessimistic mode and how they are handled. This information determines whether or not the transaction will lost update. 33 34 When we have this information, and in chronological order of occurrence, then we have all the information we need to analyze the problem, and theoretically find the problem whenever it occurs. 35 36 ### Storage and Access 37 The amount of data in Trace is huge and how it is stored and used determines how difficult it is to analyze the problem. 38 39 When analyzing the problem, we need to retrieve the data as well as filter the data based on some conditions and the best way to access it is SQL. 40 41 Once we decide to use SQL to access this trace data, it is natural that this data should be stored in MO's own internal tables. 42 43 ## Design program 44 45 ### Trace data internal table 46 47 #### Features 48 ```sql 49 create table trace_features ( 50 name varchar(50) not null primary key, ``sql create table trace_features ( 51 state varchar(20) not null 52 ); 53 ``` 54 55 * txn 56 57 Tracks transaction metadata change events over the lifetime of the transaction. 58 59 Enable : `select mo_ctl('cn', 'txn-trace', 'enable txn') ` 60 61 Disable: `select mo_ctl('cn', 'txn-trace', 'disable txn')` 62 63 64 * data 65 66 Trace data change events such as commit, apply logtail, flush, etc. 67 68 Enable : `select mo_ctl('cn', 'txn-trace', 'enable data')` 69 70 Disable: `select mo_ctl('cn', 'txn-trace', 'disable data')` 71 72 * txn-workspace 73 74 Tracks all data changes to the workspace during the lifecycle of the transaction. 75 76 Enable: `select mo_ctl('cn', 'txn-trace', 'enable txn-workspace')` 77 78 Disable: `select mo_ctl('cn', 'txn-trace', 'disable txn-workspace')` 79 80 * txn-action 81 82 Trace the overhead of all actions in the transaction's lifecycle (time consumed, number of data blocks read, etc.). 83 84 Enable: `select mo_ctl('cn', 'txn-trace', 'enable txn-action') ` 85 86 Disabled: `select mo_ctl('cn', 'txn-trace', 'disable txn-action')` 87 88 * statement 89 90 Trace the elapsed time of a statement that satisfies a condition. Used in conjunction with the txn-action feature to track the overhead of each stage of statement execution for performance tuning. 91 92 Enable: `select mo_ctl('cn', 'txn-trace', 'enable statement')` 93 94 Disable: `select mo_ctl('cn', 'txn-trace', 'disable statement')` 95 96 #### Trace data 97 ```sql 98 create table trace_event_txn ( 99 ts bigint not null, 100 txn_id varchar(50) not null, 101 cn varchar(100) not null, 102 event_type varchar(50) not null, 103 txn_status varchar(10), 104 snapshot_ts varchar(50), 105 commit_ts varchar(50), 106 info varchar(1000) 107 ) 108 109 create table trace_event_data ( 110 ts bigint not null, 111 cn varchar(100) not null, 112 event_type varchar(50) not null, 113 entry_type varchar(50) not null, 114 table_id bigint UNSIGNED not null, 115 txn_id varchar(50), 116 row_data varchar(500) not null, 117 committed_ts varchar(50), 118 snapshot_ts varchar(50) 119 ) 120 121 create table trace_event_txn_action ( 122 ts bigint not null, 123 txn_id varchar(50) not null, 124 cn varchar(50) not null, 125 table_id bigint UNSIGNED, 126 action varchar(100) not null, 127 action_sequence bigint UNSIGNED not null, 128 value bigint, 129 unit varchar(10), 130 err varchar(100) 131 ) 132 133 create table trace_event_error ( 134 ts bigint not null, 135 txn_id varchar(50) not null, 136 error_info varchar(1000) not null 137 ) 138 139 create table trace_statement ( 140 ts bigint not null, 141 txn_id varchar(50) not null, 142 sql varchar(1000) not null, 143 cost_us bigint not null 144 ) 145 ``` 146 147 * trace_event_txn 148 149 The transaction events table, which records data about changes to the transaction metadata, as well as information about lock conflict handling for the transaction. The `txn` feature takes effect when it is turned on. 150 151 * trace_event_data 152 153 The trace_event_data table contains all transaction read data, commit data, and logtail event data. The `data` feature takes effect when it is turned on. 154 155 * trace_event_txn_action 156 157 The transaction action table, which records information about all actions during the transaction lifecycle, including the overhead of those actions (execution time, number of data blocks, etc.). The `txn-action` feature takes effect when it is turned on. 158 159 * ` trace_event_error 160 161 Event data table for transaction correctness errors. This table is the starting point for querying transaction correctness issues. Find the transactions that went wrong and analyze the data. The `txn` feature takes effect when it is opened. 162 163 Use `select mo_ctl('cn', 'txn-trace', 'decode-complex complex_pk_value')` to decode those with a complex primary key. 164 165 * trace_statement 166 167 statement table, the statement that meets the conditions will be recorded, including the content of the sql, time consumed, and transaction 168 169 #### filter 170 ```sql 171 create table trace_table_filters ( 172 id bigint UNSIGNED primary key auto_increment, 173 table_id bigint UNSIGNED not null, 174 table_name varchar(50) not null, 175 columns varchar(200) 176 ); 177 178 create table trace_txn_filters ( 179 id bigint UNSIGNED primary key auto_increment, 180 method varchar(50) not null, 181 value varchar(500) not null 182 ); 183 184 create table trace_statement_filters ( 185 id bigint UNSIGNED primary key auto_increment, 186 method varchar(50) not null, 187 value varchar(500) not null 188 ); 189 ``` 190 191 Filter condition table, supports filtering by table, transaction and statement conditions, used to trace the transaction and data to be concerned. 192 193 194 * trace_table_filters 195 196 Table data filtering, one row corresponds to one table, can create multiple records at the same time, used to trace multiple tables. columns field is not set, trace all the fields of this table. `There are multiple rows, data that satisfies any row will be traced; 0 rows, no data will be traced`. 197 198 Add all columns of the table: `select mo_ctl('cn', 'txn-trace', 'add-table table_name')` 199 200 Add table specified columns: `select mo_ctl('cn', 'txn-trace', 'add-table table_name col1,col2')`, multiple cols split by `,` 201 202 Clear: `select mo_ctl('cn', 'txn-trace', 'clear data')` 203 204 Refresh: `select mo_ctl('cn', 'txn-trace', 'refresh data')` 205 206 207 * trace_txn_filters 208 209 Transaction filtering, one row represents a constraint on a transaction, multiple constraints can be created at the same time, `when there are multiple rows, all constraints must be satisfied for the transaction to be traced; 0 rows, no trace will be performed on the transaction`. 210 211 Filter transactions that specify a user: `select mo_ctl('cn', 'txn-trace', 'add-txn user user_name')` 212 213 Filter transactions for a specified tenant: `select mo_ctl('cn', 'txn-trace', 'add-txn tenant account_id user_name')` 214 215 Filter transactions that specify a session: `select mo_ctl('cn', 'txn-trace', 'add-txn session session_id')` 216 217 Filter transactions for the specified connection: `select mo_ctl('cn', 'txn-trace', 'add-txn connection session_id connection_id')` 218 219 clear: `select mo_ctl('cn', 'txn-trace', 'clear txn')` 220 221 Refresh: `select mo_ctl('cn', 'txn-trace', 'refresh txn')` 222 223 * trace_statement_filters 224 225 Statement filtering, one row of records represents the constraints of a statement, multiple constraints can be created at the same time, `when there are multiple rows of records, all constraints must be satisfied for the statement to be traced; 0 rows of records, no trace will be performed on the statement`. 226 227 Filter statements whose execution time exceeds a threshold: `select mo_ctl('cn', 'txn-trace', 'add-statement cost value[us|ms|s|m|h]')`. 228 229 Filter statements containing keywords: `select mo_ctl('cn', 'txn-trace', 'add-statement contains value)` 230 231 Clear: `select mo_ctl('cn', 'txn-trace', 'clear statement')` 232 233 Refresh: `select mo_ctl('cn', 'txn-trace', 'refresh statement')` 234 235 ### How trace data is written 236 Trace data if written in real time then this performance is terrible and almost unusable. So we take an asynchronous write approach. 237 238 Our CN nodes come with a local disk, so we use the local disk as a cache and use load csv file to write the trace data. 239 240 Using a single threaded approach, write trace data to CSV file using sequential write, write a certain size (default 16M), flush to disk, and load to MO.