github.com/matrixorigin/matrixone@v1.2.0/pkg/txn/trace/README.md (about)

     1  ## Txn trace framework
     2  During MO iterations, you may encounter various transaction correctness issues, such as Duplicate and W-W Conflict issues.
     3  
     4  These problems are often difficult to analyze by relying on logs and error messages, and often take a lot of developer time because debugging transactional correctness problems is quite difficult. Often, these problems can be very difficult to reproduce.
     5  
     6  ## Design thinking
     7  ## Trace what data
     8  Combined with previous experience analyzing transaction correctness issues, we typically need some of the following information:
     9  
    10  * Transaction metadata
    11  
    12    The metadata of the transaction, and information about each metadata change. There is no doubt that this information is very important. At least, it includes, the transaction's snapshot ts, commit ts, and so on.
    13  
    14  * Transaction Read data
    15    
    16    The content and version of all the data read during the execution of the transaction.
    17  
    18  * Transaction Commit data
    19  
    20    The content of the data that the transaction commits to TN.
    21  
    22  * Logtail data
    23   
    24    What is the Logtail received by CN, including the data content, and the order.
    25  
    26  * Transaction correctness errors
    27    
    28    Known transaction correctness errors, including information about the table and row where the problem was found.
    29  
    30  * Conflict Handling
    31    
    32    Information about lock conflicts that occur in pessimistic mode and how they are handled. This information determines whether or not the transaction will lost update.
    33  
    34  When we have this information, and in chronological order of occurrence, then we have all the information we need to analyze the problem, and theoretically find the problem whenever it occurs.
    35  
    36  ### Storage and Access
    37  The amount of data in Trace is huge and how it is stored and used determines how difficult it is to analyze the problem.
    38  
    39  When analyzing the problem, we need to retrieve the data as well as filter the data based on some conditions and the best way to access it is SQL.
    40  
    41  Once we decide to use SQL to access this trace data, it is natural that this data should be stored in MO's own internal tables.
    42  
    43  ## Design program
    44  
    45  ### Trace data internal table
    46  
    47  #### Features
    48  ```sql
    49  create table trace_features (
    50      name varchar(50) not null primary key, ``sql create table trace_features (
    51      state varchar(20) not null
    52  );
    53  ```
    54  
    55  * txn
    56    
    57    Tracks transaction metadata change events over the lifetime of the transaction.
    58  
    59    Enable : `select mo_ctl('cn', 'txn-trace', 'enable txn') `
    60    
    61    Disable: `select mo_ctl('cn', 'txn-trace', 'disable txn')`
    62  
    63  
    64  * data
    65  
    66    Trace data change events such as commit, apply logtail, flush, etc.
    67    
    68    Enable : `select mo_ctl('cn', 'txn-trace', 'enable data')`
    69    
    70    Disable: `select mo_ctl('cn', 'txn-trace', 'disable data')`
    71  
    72  * txn-workspace
    73  
    74    Tracks all data changes to the workspace during the lifecycle of the transaction.
    75    
    76    Enable:  `select mo_ctl('cn', 'txn-trace', 'enable txn-workspace')`
    77    
    78    Disable: `select mo_ctl('cn', 'txn-trace', 'disable txn-workspace')`
    79  
    80  * txn-action
    81  
    82    Trace the overhead of all actions in the transaction's lifecycle (time consumed, number of data blocks read, etc.).
    83    
    84    Enable:   `select mo_ctl('cn', 'txn-trace', 'enable txn-action') `
    85    
    86    Disabled: `select mo_ctl('cn', 'txn-trace', 'disable txn-action')`
    87  
    88  * statement
    89   
    90    Trace the elapsed time of a statement that satisfies a condition. Used in conjunction with the txn-action feature to track the overhead of each stage of statement execution for performance tuning.
    91    
    92    Enable:  `select mo_ctl('cn', 'txn-trace', 'enable statement')`
    93    
    94    Disable: `select mo_ctl('cn', 'txn-trace', 'disable statement')`
    95  
    96  #### Trace data
    97  ```sql
    98  create table trace_event_txn (
    99      ts 			          bigint       not null,
   100      txn_id            varchar(50)  not null,
   101      cn                varchar(100) not null,
   102      event_type        varchar(50)  not null,
   103      txn_status			  varchar(10),
   104      snapshot_ts       varchar(50),
   105      commit_ts         varchar(50),
   106      info              varchar(1000)
   107  )
   108  
   109  create table trace_event_data (
   110      ts 			          bigint          not null,
   111      cn                varchar(100)    not null,
   112      event_type        varchar(50)     not null,
   113      entry_type			  varchar(50)     not null,
   114      table_id 	        bigint UNSIGNED not null,
   115      txn_id            varchar(50),
   116      row_data          varchar(500)    not null, 
   117      committed_ts      varchar(50),
   118      snapshot_ts       varchar(50)
   119  )
   120  
   121  create table trace_event_txn_action (
   122      ts 			          bigint          not null,
   123      txn_id            varchar(50)     not null,
   124      cn                varchar(50)     not null,
   125      table_id          bigint UNSIGNED,
   126      action            varchar(100)    not null,
   127      action_sequence   bigint UNSIGNED not null,
   128      value             bigint,
   129      unit              varchar(10),
   130      err               varchar(100) 
   131  )
   132  
   133  create table trace_event_error (
   134      ts 			          bigint          not null,
   135      txn_id            varchar(50)     not null,
   136      error_info        varchar(1000)   not null
   137  )
   138  
   139  create table trace_statement (
   140      ts 			   bigint          not null,
   141      txn_id     varchar(50)     not null,
   142      sql        varchar(1000)   not null,
   143      cost_us    bigint          not null 
   144  )
   145  ```
   146  
   147  * trace_event_txn
   148  
   149    The transaction events table, which records data about changes to the transaction metadata, as well as information about lock conflict handling for the transaction. The `txn` feature takes effect when it is turned on.
   150  
   151  * trace_event_data
   152   
   153    The trace_event_data table contains all transaction read data, commit data, and logtail event data. The `data` feature takes effect when it is turned on.
   154  
   155  * trace_event_txn_action
   156  
   157    The transaction action table, which records information about all actions during the transaction lifecycle, including the overhead of those actions (execution time, number of data blocks, etc.). The `txn-action` feature takes effect when it is turned on.
   158  
   159  * ` trace_event_error
   160  
   161    Event data table for transaction correctness errors. This table is the starting point for querying transaction correctness issues. Find the transactions that went wrong and analyze the data. The `txn` feature takes effect when it is opened.
   162  
   163    Use `select mo_ctl('cn', 'txn-trace', 'decode-complex complex_pk_value')` to decode those with a complex primary key.
   164  
   165  * trace_statement
   166  
   167    statement table, the statement that meets the conditions will be recorded, including the content of the sql, time consumed, and transaction
   168  
   169  #### filter
   170  ```sql
   171  create table trace_table_filters (
   172      id              bigint UNSIGNED primary key auto_increment,
   173      table_id			  bigint UNSIGNED not null,
   174      table_name      varchar(50)     not null,
   175      columns         varchar(200)
   176  );
   177  
   178  create table trace_txn_filters (
   179      id             bigint UNSIGNED primary key auto_increment,
   180      method         varchar(50)     not null,
   181      value          varchar(500)    not null
   182  );
   183  
   184  create table trace_statement_filters (
   185      id             bigint UNSIGNED primary key auto_increment,
   186      method         varchar(50)     not null,
   187      value          varchar(500)    not null
   188  );
   189  ```
   190  
   191  Filter condition table, supports filtering by table, transaction and statement conditions, used to trace the transaction and data to be concerned.
   192  
   193  
   194  * trace_table_filters
   195  
   196    Table data filtering, one row corresponds to one table, can create multiple records at the same time, used to trace multiple tables. columns field is not set, trace all the fields of this table. `There are multiple rows, data that satisfies any row will be traced; 0 rows, no data will be traced`.
   197  
   198    Add all columns of the table: `select mo_ctl('cn', 'txn-trace', 'add-table table_name')`
   199  
   200    Add table specified columns: `select mo_ctl('cn', 'txn-trace', 'add-table table_name col1,col2')`, multiple cols split by `,`
   201  
   202    Clear: `select mo_ctl('cn', 'txn-trace', 'clear data')`
   203  
   204    Refresh: `select mo_ctl('cn', 'txn-trace', 'refresh data')`
   205  
   206  
   207  * trace_txn_filters
   208   
   209    Transaction filtering, one row represents a constraint on a transaction, multiple constraints can be created at the same time, `when there are multiple rows, all constraints must be satisfied for the transaction to be traced; 0 rows, no trace will be performed on the transaction`.
   210  
   211    Filter transactions that specify a user: `select mo_ctl('cn', 'txn-trace', 'add-txn user user_name')`
   212  
   213    Filter transactions for a specified tenant: `select mo_ctl('cn', 'txn-trace', 'add-txn tenant account_id user_name')`
   214  
   215    Filter transactions that specify a session: `select mo_ctl('cn', 'txn-trace', 'add-txn session session_id')`
   216  
   217    Filter transactions for the specified connection: `select mo_ctl('cn', 'txn-trace', 'add-txn connection session_id connection_id')`
   218  
   219    clear: `select mo_ctl('cn', 'txn-trace', 'clear txn')`
   220  
   221    Refresh: `select mo_ctl('cn', 'txn-trace', 'refresh txn')`
   222  
   223  * trace_statement_filters
   224  
   225    Statement filtering, one row of records represents the constraints of a statement, multiple constraints can be created at the same time, `when there are multiple rows of records, all constraints must be satisfied for the statement to be traced; 0 rows of records, no trace will be performed on the statement`.
   226  
   227    Filter statements whose execution time exceeds a threshold: `select mo_ctl('cn', 'txn-trace', 'add-statement cost value[us|ms|s|m|h]')`.
   228  
   229    Filter statements containing keywords: `select mo_ctl('cn', 'txn-trace', 'add-statement contains value)`
   230  
   231    Clear: `select mo_ctl('cn', 'txn-trace', 'clear statement')`
   232  
   233    Refresh: `select mo_ctl('cn', 'txn-trace', 'refresh statement')`
   234  
   235  ### How trace data is written
   236  Trace data if written in real time then this performance is terrible and almost unusable. So we take an asynchronous write approach.
   237  
   238  Our CN nodes come with a local disk, so we use the local disk as a cache and use load csv file to write the trace data.
   239  
   240  Using a single threaded approach, write trace data to CSV file using sequential write, write a certain size (default 16M), flush to disk, and load to MO.