- Feature Name: Analytic Optimized Engine Layout
- Status: In Progress
- Start Date: 2021-05-10
- Authors: [Xu Peng](https://github.com/XuPeng-SH)
- Implementation PR: [#1348](https://github.com/matrixorigin/matrixone/pull/1348)
- Issue for this RFC: [#1320](https://github.com/matrixorigin/matrixone/pull/1320)

# Summary
This is a proposal to define the persistent data layout.

# Motivation
**AOE** (Analytic Optimized Engine) is designed for analytical query workloads.
- In practice, a columnar store is well suited to OLAP-like workloads.
- Dynamic index creation and deletion
- To keep the cost of queries down, avoid accumulating too many sort runs
- High compression rate

# Detailed Design
## Data Hierarchy
![image](https://user-images.githubusercontent.com/39627130/145402537-6500bf1b-4e2b-4ee2-b3e5-cc6e1c6d28c5.png)

As described [here](https://github.com/matrixorigin/matrixone/blob/main/docs/rfcs/20211210_aoe_overall_design.md#data-storage), the data of each table is organized as a three-level LSM tree. For example, suppose the maximum number of blocks in a segment is 4. Segments [0,1,2] have already been merge sorted, and each corresponds to a sort run in **L2**. Segment [3] has not been merge sorted yet, but Blocks [12,13] have been sorted, and each corresponds to a sort run in **L1**. Transient block 14 has reached the maximum row count of a block and is flowing to **L1**. Transient block 15 is the latest appendable block.
![image]()

## File Format
![image](https://user-images.githubusercontent.com/39627130/145402662-2ba1b2e9-732a-4132-911c-fafe2b36ce31.png)
- Zones
  1) Header
  2) Footer
  3) MetaInfo
  4) Columns
  5) Indexes
- Indexes specified in the **CREATE TABLE** statement are embedded in the segment file. Any other index is stored in a dedicated index file.
- A zonemap index is automatically created for every column and embedded in the segment file.
- In a segment or block file, the data of each column is stored in a contiguous space.
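- Column block
  1) Fixed-length type column block format (uncompressed)
     ![image](https://user-images.githubusercontent.com/39627130/145402829-69ac1d4b-7e60-4ac0-8a5d-fa30fb3ffd28.png)
  2) Variable-length type column block format (uncompressed)
     ![image](https://user-images.githubusercontent.com/39627130/145402845-7aa7c6b5-be81-4eef-8e00-bc27f50d5e7d.png)

## Compression
The compression unit is always a column block. The compression algorithm, the compressed size, and the uncompressed size of each column block are all serialized into the **MetaInfo** zone.

# Future Work
1. Per-column compression codec
2. Data deletion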
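
As an illustration of the scheme described under **Compression**, the following minimal Python sketch compresses one column block at a time and records the algorithm, compressed size, and uncompressed size in a small metadata entry. The codec identifiers, the 9-byte entry layout, and the function names are all assumptions made for this example; the RFC does not specify the actual **MetaInfo** encoding, and zlib merely stands in for whichever codec the engine uses.

```python
import struct
import zlib

# Hypothetical codec identifiers; the RFC does not name concrete codecs.
ALGO_NONE = 0
ALGO_ZLIB = 1

# Hypothetical MetaInfo entry: algo (u8), compressed size (u32),
# uncompressed size (u32), little-endian, no padding.
META_ENTRY = struct.Struct("<BII")


def compress_column_block(raw: bytes, algo: int = ALGO_ZLIB):
    """Compress a single column block; return (payload, meta_entry)."""
    if algo == ALGO_ZLIB:
        payload = zlib.compress(raw)
    elif algo == ALGO_NONE:
        payload = raw
    else:
        raise ValueError(f"unknown compression algo: {algo}")
    # The metadata needed to read the block back is kept separately,
    # mirroring the idea of serializing it into the MetaInfo zone.
    return payload, META_ENTRY.pack(algo, len(payload), len(raw))


def decompress_column_block(payload: bytes, meta_entry: bytes) -> bytes:
    """Restore a column block using only its payload and metadata entry."""
    algo, compressed_size, uncompressed_size = META_ENTRY.unpack(meta_entry)
    if len(payload) != compressed_size:
        raise ValueError("payload length does not match recorded compressed size")
    if algo == ALGO_ZLIB:
        raw = zlib.decompress(payload)
    elif algo == ALGO_NONE:
        raw = payload
    else:
        raise ValueError(f"unknown compression algo: {algo}")
    if len(raw) != uncompressed_size:
        raise ValueError("decompressed length does not match recorded size")
    return raw


# Usage: a fixed-length column block holding eight int64 values.
block = struct.pack("<8q", *range(8))
payload, meta = compress_column_block(block)
assert decompress_column_block(payload, meta) == block
```

Because each column block carries its own metadata entry, a reader can decompress a single column of a single block without touching its neighbours, which is what makes the column block a natural compression unit.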