# Some ideas about EE Design

Motivation
- heeus.io: [About Heeus EE design](https://dev.heeus.io/launchpad/#!26633)
- Deliver PaaS in a short time (so that potential customers can try the product)
- Allow potential customers to upload applications with a fixed and small number of Partitions/Query Processors

TOC
- [Request kinds](#request-kinds)
- [Node kinds](#node-kinds)
- [VVM](#vvm)
- [Scheduling](#scheduling)
- [Scheduling: Assumptions](#assumptions)
- [Scheduling: Init cluster](#init-cluster)
- [Scheduling: Deploy app](#deploy-app)
- [Scheduling: Scale cluster](#scale-cluster)
- [Problems](#problems)

## Request kinds

1. Commands
2. Queries
3. BLOB-related requests

This article focuses on Commands and Queries.

## Node kinds

```mermaid
flowchart TD

%% Entities ====================

Routing:::G
subgraph Routing[Routing Layer]
  RouterNode1{{RouterNode1}}:::H
  RouterNode2{{RouterNode2}}:::H
end

Applications:::G
subgraph Applications[Applications Layer]
  VVMNode1{{VVMNode1}}:::H
  VVMNode2{{VVMNode2}}:::H
  VVMNode3{{VVMNode3}}:::H
  VVMNode4{{VVMNode4}}:::H
  VVMNode5{{VVMNode5}}:::H
end

Database:::G
subgraph Database[Database Layer]
  DBNode1{{DBNode1}}:::H
  DBNode2{{DBNode2}}:::H
  DBNode3{{DBNode3}}:::H
end

%% Relations ====================

Routing -.- Applications
Applications -.- Database

classDef B fill:#FFFFB5
classDef S fill:#B5FFFF
classDef H fill:#C9E7B7
classDef G fill:#FFFFFF,stroke:#000000, stroke-width:1px, stroke-dasharray: 5 5
```

- RouterNode: routing nodes
- VVMNode: virtual machine (VVM) nodes
- DBNode: database nodes

**Naive design**
- Use swarm
- For `N` VVMNodes create `N` VVMNodeService swarm services
- If VVMNodeN goes down, swarm restarts VVMNodeServiceN on another node
- Problem: after such a failover the receiving VVMNode carries two nodes' workloads, so it can end up with a 100% resource (CPU/RAM) overload

## VVM

```mermaid
flowchart TD

%% Entities ====================

VVMNode{{VVMNode}}:::H

VVMNode --x|up to 8| VVM["VVM"]:::S

classDef B fill:#FFFFB5
classDef S fill:#B5FFFF
classDef H fill:#C9E7B7
classDef G fill:#FFFFFF,stroke:#000000, stroke-width:1px, stroke-dasharray: 5 5
```

Why 8 VVMs per VVMNode?

[Cassandra Virtual Nodes](https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeDistribute.html):

> - Prior to Cassandra 1.2, you had to calculate and assign a single token to each node in a cluster
> - Each token determined the node's position in the ring and its portion of data according to its hash value
> - In Cassandra 1.2 and later, each node is allowed many tokens
> - The new paradigm is called **virtual nodes (vnodes)**
> - Vnodes allow each node to own a large number of small partition ranges distributed throughout the cluster
> - Vnodes also use consistent hashing to distribute data but using them doesn't require token generation and assignment
>
> Note: DataStax recommends using 8 vnodes (tokens). Using 8 vnodes distributes the workload between systems with a ~10% variance and has minimal impact on performance.

> It is no longer recommended to assign 256 as it was for a number of years since large values impact the performance of nodes and operations such as repairs ([link](https://community.datastax.com/questions/4966/what-is-the-maximum-vnodes-per-node.html)).

Ideas
- Cassandra vnode => VVM
- Each VVM is a swarm service
- If a VVMNode goes down, swarm restarts its VVMs on the other VVMNodes
- Since the 8 VVMs of a failed node are spread across the surviving nodes, each of `N` nodes should reserve ~`1/(N-1)` of its CPU/Memory to handle a one-node crash: 9 VVMNodes => ~1/8 of resources, 5 VVMNodes => 1/4, 3 VVMNodes => 1/2

## Scheduling

```mermaid
flowchart TD

%% Entities ====================

Cluster{{Cluster}}:::H

VVMNode{{VVMNode}}:::H

App[App]:::S
AppPartition[AppPartition]:::S
VVM["VVM"]:::S

%% Relations ====================

Cluster --x |consists of several| VVMNode
Cluster -.-x |has several deployed| App

VVMNode -.-x|runs up to 8| VVM

App --x AppPartition

VVM -.-x |processes several| AppPartition

classDef B fill:#FFFFB5
classDef S fill:#B5FFFF
classDef H fill:#C9E7B7
classDef G fill:#FFFFFF,stroke:#000000, stroke-width:1px, stroke-dasharray: 5 5
```

### Assumptions

- Definition: `CommandProcessor` throughput is `T` kbit/s
- `T` is specific to every cluster
- `T` depends, among other things, on the Database layer latency
  - For a stretched cluster this latency is higher than for a non-stretched one
- `AppPartition_ComputingPower = k0 * T * (k1 * NumQueryProcessors + k2 * NumProjectors)` (see the sketch below)

Problems
- BLOBs
- Lazy Projectors
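To make the `AppPartition_ComputingPower` formula concrete, here is a minimal Go sketch. The coefficients `k0`, `k1`, `k2` and the sample inputs are illustrative assumptions; the article does not fix their values:

```go
package main

import "fmt"

// Illustrative coefficients; the article leaves k0, k1, k2 unspecified.
const (
	k0 = 1.0
	k1 = 1.0
	k2 = 0.5
)

// appPartitionComputingPower implements
// AppPartition_ComputingPower = k0 * T * (k1*NumQueryProcessors + k2*NumProjectors),
// where T is the CommandProcessor throughput in kbit/s.
func appPartitionComputingPower(tKbitPerSec float64, numQueryProcessors, numProjectors int) float64 {
	return k0 * tKbitPerSec * (k1*float64(numQueryProcessors) + k2*float64(numProjectors))
}

func main() {
	// Assumed example: T = 100 kbit/s, 10 query processors, 4 projectors.
	fmt.Println(appPartitionComputingPower(100, 10, 4)) // 1200
}
```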
### Init cluster

- Create 8 swarm services; each service runs the `voedger.VVM` image
- Each service has 1 replica
- Each service has an assigned VVMID
- Each service gets an equal share of CPU/Memory resources

### Deploy app

**Developer**:

- Calculate the number of AppPartitions
  - One of the factors: Command Processor throughput is `T` kbit/s
- Design the AppPartition
  - NumQueryProcessors: fixed for now, say, 10
  - CacheSize, MB (can also be fixed, say, 10 MB)
- Create a Deployment Descriptor with the calculated number of AppPartitions
- Upload the Deployment Descriptor to the Cluster

**Cluster Scheduler**:

- Assign each AppPartition to a randomly selected VVMNodeID that satisfies the AppPartition requirements (see the sketch in the appendix below)

### Scale cluster

- If there are too many AppPartitions per VVM, increase the number of VVMs
- If there are too many VVMs per VVMNode, increase the number of VVMNodes (see the appendix for a sketch of these checks)

## Problems

- Cassandra wide partition problem
  - Solution: FoundationDB ???
- FoundationDB problem: how to measure the space used by an App?
- FoundationDB problem: removing an application can take a long time (in Cassandra we just drop the keyspace)
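## Appendix: Scheduling sketches

A minimal Go sketch of the **Cluster Scheduler** step from "Deploy app": assign each AppPartition to a randomly selected VVMNode that satisfies its requirements. All type names, fields, and capacity numbers here are hypothetical, and a real scheduler would also have to consult live resource usage:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Hypothetical node/partition descriptions; field names are illustrative only.
type VVMNode struct {
	ID      string
	FreeCPU float64 // cores
	FreeMem int     // MB
}

type AppPartition struct {
	ID      int
	NeedCPU float64 // cores
	NeedMem int     // MB, e.g. the CacheSize from the Deployment Descriptor
}

// assign picks a random VVMNode that satisfies the partition's requirements
// and reserves the resources there, so later assignments see reduced capacity.
func assign(p AppPartition, nodes []VVMNode) (string, error) {
	var fit []int
	for i, n := range nodes {
		if n.FreeCPU >= p.NeedCPU && n.FreeMem >= p.NeedMem {
			fit = append(fit, i)
		}
	}
	if len(fit) == 0 {
		return "", errors.New("no VVMNode satisfies the AppPartition requirements")
	}
	i := fit[rand.Intn(len(fit))]
	nodes[i].FreeCPU -= p.NeedCPU
	nodes[i].FreeMem -= p.NeedMem
	return nodes[i].ID, nil
}

func main() {
	nodes := []VVMNode{
		{ID: "VVMNode1", FreeCPU: 4, FreeMem: 8192},
		{ID: "VVMNode2", FreeCPU: 2, FreeMem: 4096},
	}
	for p := 0; p < 4; p++ {
		nodeID, err := assign(AppPartition{ID: p, NeedCPU: 0.5, NeedMem: 512}, nodes)
		fmt.Println(p, nodeID, err)
	}
}
```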
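The "Scale cluster" rules then reduce to two threshold checks. The thresholds are assumptions; only the 8-VVMs-per-VVMNode limit comes from the vnode analogy above:

```go
package scheduler

// Assumed thresholds; only maxVVMsPerVVMNode = 8 is taken from the article.
const (
	maxAppPartitionsPerVVM = 16 // hypothetical
	maxVVMsPerVVMNode      = 8  // per the Cassandra vnode analogy
)

// needMoreVVMs reports whether the number of VVMs should be increased.
func needMoreVVMs(appPartitions, vvms int) bool {
	return appPartitions > vvms*maxAppPartitionsPerVVM
}

// needMoreVVMNodes reports whether the number of VVMNodes should be increased.
func needMoreVVMNodes(vvms, vvmNodes int) bool {
	return vvms > vvmNodes*maxVVMsPerVVMNode
}
```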