# Some ideas about EE Design

Motivation
- heeus.io: [About Heeus EE design](https://dev.heeus.io/launchpad/#!26633)
  - Deliver PaaS in a short time (so that potential customers can try the product)
  - Allow potential customers to upload an application with a fixed and small number of Partitions/Query Processors

TOC
- [Requests kinds](#requests-kinds)
- [Node kinds](#node-kinds)
- [VVM](#vvm)
- [Scheduling](#scheduling)
    - [Scheduling: Init cluster](#scheduling-init-cluster)
    - [Scheduling: Deploy app](#scheduling-deploy-app)
    - [Scheduling: Scale cluster](#scheduling-scale-cluster)
- [Problems](#problems)

## Requests kinds

1. Commands
2. Queries
3. BLOB-related requests

This article focuses on Commands and Queries.

## Node kinds

```mermaid
flowchart TD

    %% Entities ====================

    Routing:::G
    subgraph Routing[Routing Layer]
        RouterNode1{{RouterNode1}}:::H
        RouterNode2{{RouterNode2}}:::H
    end

    Applications:::G
    subgraph Applications[Applications Layer]
        VVMNode1{{VVMNode1}}:::H
        VVMNode2{{VVMNode2}}:::H
        VVMNode3{{VVMNode3}}:::H
        VVMNode4{{VVMNode4}}:::H
        VVMNode5{{VVMNode5}}:::H
    end

    Database:::G
    subgraph Database[Database Layer]
        DBNode1{{DBNode1}}:::H
        DBNode2{{DBNode2}}:::H
        DBNode3{{DBNode3}}:::H
    end

    %% Relations ====================

    Routing -.- Applications
    Applications -.- Database

    classDef B fill:#FFFFB5
    classDef S fill:#B5FFFF
    classDef H fill:#C9E7B7
    classDef G fill:#FFFFFF,stroke:#000000, stroke-width:1px, stroke-dasharray: 5 5
```

- RouterNode: routing nodes
- VVMNode: virtual machine nodes
- DBNode: database nodes

**Naive design**
- Use swarm
- For `N` VVMNodes create `N` VVMNodeService swarm services
- If VVMNodeN goes down then swarm runs VVMNodeServiceN on another node
- Problem: the node that takes over then runs two full services, so its CPU/RAM load can double (100% overload)

## VVM

```mermaid
flowchart TD

    %% Entities ====================

    VVMNode{{VVMNode}}:::H

    VVMNode --x|up to 8| VVM["VVM"]:::S

    classDef B fill:#FFFFB5
    classDef S fill:#B5FFFF
    classDef H fill:#C9E7B7
    classDef G fill:#FFFFFF,stroke:#000000, stroke-width:1px, stroke-dasharray: 5 5
```

Why 8 VVMs per VVMNode?

[Cassandra Virtual Nodes](https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeDistribute.html):

> - Prior to Cassandra 1.2, you had to calculate and assign a single token to each node in a cluster
> - Each token determined the node's position in the ring and its portion of data according to its hash value.
> - In Cassandra 1.2 and later, each node is allowed many tokens
> - The new paradigm is called **virtual nodes (vnodes)**
> - Vnodes allow each node to own a large number of small partition ranges distributed throughout the cluster
> - Vnodes also use consistent hashing to distribute data but using them doesn't require token generation and assignment
>
> Note: DataStax recommends using 8 vnodes (tokens). Using 8 vnodes distributes the workload between systems with a ~10% variance and has minimal impact on performance.

> It is no longer recommended to assign 256 as it was for a number of years since large values impact the performance of nodes and operations such as repairs ([link](https://community.datastax.com/questions/4966/what-is-the-maximum-vnodes-per-node.html)).

Ideas
- Cassandra vnode => VVM
- Each VVM is a swarm service
- If a VVMNode goes down, swarm runs its VVMs on other VVMNodes
- If the cluster has 9 VVMNodes then, to handle a one-node crash, each node should keep ~1/8 of its CPU/memory resources in reserve; 5 VVMNodes => 1/4 of resources, 3 VVMNodes => 1/2 of resources

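The reserve arithmetic in the last bullet can be checked with a small Go helper (the function name is illustrative): with `N` VVMNodes, each surviving node must absorb `1/(N-1)` of a failed node's load.

```go
package main

import "fmt"

// reserveFraction returns the fraction of CPU/memory each VVMNode must
// keep in reserve so that the VVMs of one crashed node can be absorbed
// by the remaining nodes: 1/(N-1) for an N-node cluster.
func reserveFraction(vvmNodes int) float64 {
	if vvmNodes < 2 {
		return 1 // a single node cannot tolerate its own crash
	}
	return 1 / float64(vvmNodes-1)
}

func main() {
	for _, n := range []int{9, 5, 3} {
		fmt.Printf("%d VVMNodes => reserve ~%.3f of resources\n", n, reserveFraction(n))
	}
}
```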
## Scheduling

```mermaid
flowchart TD

    %% Entities ====================

    Cluster{{Cluster}}:::H

    VVMNode{{VVMNode}}:::H

    App[App]:::S
    AppPartition[AppPartition]:::S
    VVM["VVM"]:::S

    %% Relations ====================

    Cluster --x |consists of few| VVMNode
    Cluster -.-x |has few deployed| App

    VVMNode -.-x|runs up to 8| VVM

    App --x AppPartition

    VVM -.-x |processes few| AppPartition

    classDef B fill:#FFFFB5
    classDef S fill:#B5FFFF
    classDef H fill:#C9E7B7
    classDef G fill:#FFFFFF,stroke:#000000, stroke-width:1px, stroke-dasharray: 5 5
```

### Assumptions

- Definition: `CommandProcessor` throughput is `T` kbit/s
- `T` is specific to every cluster
- `T` depends, among other things, on the Database layer latency
- For a stretched cluster the latency is higher than for a non-stretched one, so `T` is lower
- `AppPartition_ComputingPower = k0 * T * (k1 * NumQueryProcessors + k2 * NumProjectors)`
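The formula can be transcribed directly into Go; note that `k0`, `k1`, `k2` are cluster-specific coefficients and the values used in `main` are placeholders, not measured numbers.

```go
package main

import "fmt"

// appPartitionComputingPower transcribes:
//   AppPartition_ComputingPower = k0 * T * (k1*NumQueryProcessors + k2*NumProjectors)
// where t is the CommandProcessor throughput in kbit/s, and k0..k2 are
// cluster-specific coefficients.
func appPartitionComputingPower(t float64, numQueryProcessors, numProjectors int, k0, k1, k2 float64) float64 {
	return k0 * t * (k1*float64(numQueryProcessors) + k2*float64(numProjectors))
}

func main() {
	// Hypothetical numbers: T = 100 kbit/s, 10 query processors, 4 projectors.
	p := appPartitionComputingPower(100, 10, 4, 1.0, 0.5, 0.25)
	fmt.Println(p) // 1 * 100 * (0.5*10 + 0.25*4) = 600
}
```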

Problems
- BLOBs
- Lazy Projectors

### Scheduling: Init cluster

- Create 8 swarm services, each service runs the `voedger.VVM` image
  - Each service has 1 replica
  - Each service has an assigned VVMID
  - Each service has equal CPU/memory resources

### Scheduling: Deploy app

**Developer**:

- Calculate the number of AppPartitions
  - One of the factors: Command Processor throughput is `T` kbit/s
- Design the AppPartition
  - NumQueryProcessors: fixed for now, say, 10
  - CacheSize, MB (can also be fixed, say, 10 MB)
- Create a Deployment Descriptor with the calculated number of AppPartitions
- Upload the Deployment Descriptor to the Cluster
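The sizing step above could be sketched as a ceiling division of the expected command workload by `T`; reducing the calculation to this single factor is an assumption for illustration.

```go
package main

import "fmt"

// numAppPartitions estimates how many AppPartitions an app needs so that
// the expected command workload (kbit/s) fits within the per-partition
// CommandProcessor throughput T (kbit/s). Plain ceiling division.
func numAppPartitions(expectedWorkloadKbits, t float64) int {
	if expectedWorkloadKbits <= 0 {
		return 1 // always deploy at least one partition
	}
	n := int(expectedWorkloadKbits / t)
	if float64(n)*t < expectedWorkloadKbits {
		n++
	}
	return n
}

func main() {
	// Hypothetical: 950 kbit/s expected command traffic, T = 100 kbit/s.
	fmt.Println(numAppPartitions(950, 100)) // 10
}
```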

**Cluster Scheduler**:

- Assign each AppPartition to a randomly selected VVMNode that satisfies the AppPartition requirements
### Scheduling: Scale cluster

- If there are too many AppPartitions per VVM then increase the number of VVMs
- If there are too many VVMs per VVMNode then increase the number of VVMNodes
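The two scale-up rules translate directly into a pair of threshold checks; the limit values used below are placeholders, not figures from the design.

```go
package main

import "fmt"

// scaleDecision applies the two rules above:
//   - too many AppPartitions per VVM  => add VVMs
//   - too many VVMs per VVMNode       => add VVMNodes
func scaleDecision(appPartitions, vvms, vvmNodes, maxPartitionsPerVVM, maxVVMsPerNode int) (addVVMs, addNodes bool) {
	addVVMs = appPartitions > vvms*maxPartitionsPerVVM
	addNodes = vvms > vvmNodes*maxVVMsPerNode
	return
}

func main() {
	// Hypothetical cluster: 200 partitions, 16 VVMs, 2 nodes,
	// limits of 10 partitions/VVM and 8 VVMs/node.
	addVVMs, addNodes := scaleDecision(200, 16, 2, 10, 8)
	fmt.Println(addVVMs, addNodes) // true false
}
```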

## Problems

- Cassandra Wide Partition problem
  - Solution: FoundationDB ???
  - FoundationDB problem: space used by an App?
  - FoundationDB problem: removing an application can take a long time (with Cassandra we just drop the keyspace)