# What's shared memory for?
Shared memory is for interprocess communication, such as exchanging data or
commands. In particular, shared memory is for multiple processes on the same
machine.

In Nebulas, we follow the rule of thumb in system design, *modularity*, which
means we separate different functionalities into different processes. Thus, we
will have multiple processes even on one node (physical machine). Given this,
interprocess communication is quite critical to the Nebulas mainnet.

# Why shared memory?
As we stated, interprocess communication (IPC) is critical to the Nebulas
mainnet. However, there are different ways to do IPC, and one of the most
popular is networking, like RPC, or gRPC from Google. Then why shared memory?

The answer is *performance*. Most networking libraries, like RPC and gRPC, are
designed for cross-machine communication, which means their design goals are
totally different from those of local-machine IPC. For example, networking
needs to serialize and deserialize data, handle different byte orders
(big-endian vs. little-endian), etc. What's worse, it may need to conform to
standard protocols (like HTTP) and common security standards (like SSL). Even
if we can ignore some of these constraints on a local machine, we still need
to copy data into the networking buffer on the sender side and copy it out of
the networking buffer on the receiver side. This causes a performance penalty.

To the best of our knowledge, shared memory is the best IPC practice when
pursuing performance.

# The design space of Nebulas shared memory communication
The two main goals of this shared memory communication mechanism are
performance and stability. To achieve these goals, we simplify the model of
shared memory communication.

In our communication model, we have one *server* and one *client*. The server
side is responsible for initializing and tearing down the environment. Both
the server and the client can write data to and read data from the shared
memory.
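
As a rough illustration of this model, here is a minimal sketch built on
Boost.Interprocess (the library mentioned below). The segment name
`nbre_demo_shm`, the object name `message`, and the command-line role switch
are illustrative only, not the identifiers actually used by NBRE.

```cpp
// Minimal sketch: the server process sets up the shared segment, the client
// process attaches to it; both can then read and write the same bytes.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <cstring>
#include <iostream>

namespace bip = boost::interprocess;

int main(int argc, char **argv) {
  const char *seg_name = "nbre_demo_shm";
  bool is_server = argc > 1 && std::strcmp(argv[1], "server") == 0;

  if (is_server) {
    // The server owns the environment: remove any stale segment, create a
    // fresh one, and construct a named buffer inside it.
    bip::shared_memory_object::remove(seg_name);
    bip::managed_shared_memory seg(bip::create_only, seg_name, 64 * 1024);
    char *buf = seg.construct<char>("message")[256](0);
    std::strcpy(buf, "hello from server");
    std::cout << "server wrote message; press enter to tear down" << std::endl;
    std::cin.get();
    // Tearing the environment down is also the server's job.
    bip::shared_memory_object::remove(seg_name);
  } else {
    // The client only attaches to an existing segment; it never creates or
    // destroys it, but it can read and write the shared buffer.
    bip::managed_shared_memory seg(bip::open_only, seg_name);
    auto res = seg.find<char>("message");  // pair of pointer and size
    if (res.first) {
      std::cout << "client read: " << res.first << std::endl;
      std::strcpy(res.first, "hello back from client");
    }
  }
  return 0;
}
```

Running the same binary once with `server` and once without shows both roles:
the client sees the server's bytes without any copy through a socket.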

Performance is relatively easy to achieve. However, there are concerns since
we are using Boost instead of the low-level POSIX APIs. We may need to revise
this in the future.
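
For comparison, the same "create a segment and write into it" step with the
low-level POSIX APIs (`shm_open`/`mmap`) that a future revision might call
directly could look roughly like the sketch below. The name `/nbre_demo_shm`
is again illustrative; on older Linux systems, link with `-lrt`.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int main() {
  // Create (or open) a named shared memory object and give it a size.
  int fd = shm_open("/nbre_demo_shm", O_CREAT | O_RDWR, 0600);
  if (fd < 0) return 1;
  if (ftruncate(fd, 4096) != 0) return 1;

  // Map it into this process's address space; any process that opens and
  // maps the same name sees the same bytes, with no extra copies.
  void *p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) return 1;
  std::strcpy(static_cast<char *>(p), "hello via POSIX shm");

  munmap(p, 4096);
  close(fd);
  shm_unlink("/nbre_demo_shm");  // destroy the object on clean shutdown
  return 0;
}
```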

For stability, there are tradeoffs to make. The key is how we handle failures
of the interacting processes. For example, what should we do when the client
or the server crashes? Note that a client crash and a server crash are
different scenarios. For a client crash, the server can restart the client
without re-initializing the handlers. Yet for a server crash, should we
restart the server directly, or should we restart both the client and the
server? Here we choose to restart both the client and the server, for two
reasons. First, the server may fail to initialize the handlers since the
client still holds them. Second, a server crash means the whole functionality
stops working, and it is meaningless to keep the client running.

Thus, it is important for each side to know whether the other side has
crashed. An intuitive way is heartbeating, as used in typical distributed
systems. We use the same idea here, yet with a different implementation.
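
As a sketch of how such a heartbeat can live inside the shared segment itself,
consider the following. The segment name, the `heartbeat` object, the counters,
and the timeout value are illustrative assumptions rather than NBRE's actual
layout, and the sketch assumes `std::atomic<uint64_t>` is lock-free and thus
safe to place in shared memory.

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <thread>

namespace bip = boost::interprocess;

// Each side bumps its own counter and watches the other side's counter.
struct heartbeat_block {
  std::atomic<uint64_t> server_beat{0};
  std::atomic<uint64_t> client_beat{0};
};

int main(int argc, char **argv) {
  bool is_server = argc > 1 && std::strcmp(argv[1], "server") == 0;

  bip::managed_shared_memory seg(bip::open_or_create, "nbre_demo_hb", 4096);
  heartbeat_block *hb = seg.find_or_construct<heartbeat_block>("heartbeat")();

  std::atomic<uint64_t> &mine = is_server ? hb->server_beat : hb->client_beat;
  std::atomic<uint64_t> &theirs = is_server ? hb->client_beat : hb->server_beat;

  uint64_t last_seen = theirs.load();
  auto last_change = std::chrono::steady_clock::now();
  const auto timeout = std::chrono::seconds(3);

  for (;;) {
    mine.fetch_add(1);  // publish "I am alive"

    uint64_t cur = theirs.load();
    if (cur != last_seen) {
      last_seen = cur;
      last_change = std::chrono::steady_clock::now();
    } else if (std::chrono::steady_clock::now() - last_change > timeout) {
      // Heartbeat timeout: treat the peer as crashed. Per the design above,
      // the server would restart the client here, and the client would exit.
      std::cerr << "peer heartbeat timed out" << std::endl;
      return 1;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  }
}
```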

Consequently, we make two design choices. First, the server cannot start when
there is already another server instance or any client instance. Second, both
the server and the client may raise an exception when the other side crashes
(heartbeat timeout); the server may then choose to restart the client, and the
client may choose to exit immediately.
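
One way to enforce the first choice is to rely on exclusive creation of the
shared segment: `create_only` throws if the segment already exists. This is
only a sketch of the idea with illustrative names; the actual liveness check
in NBRE may differ.

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/exceptions.hpp>
#include <iostream>

namespace bip = boost::interprocess;

int main() {
  try {
    // create_only throws bip::interprocess_exception if "nbre_demo_shm"
    // already exists, i.e. another server (or a session that was never
    // cleaned up) is still around, so this server refuses to start.
    bip::managed_shared_memory seg(bip::create_only, "nbre_demo_shm", 64 * 1024);
    std::cout << "server started" << std::endl;
    // ... serve requests, then remove the segment on clean shutdown ...
    bip::shared_memory_object::remove("nbre_demo_shm");
  } catch (const bip::interprocess_exception &e) {
    std::cerr << "another instance seems to be running: " << e.what() << "\n";
    return 1;
  }
  return 0;
}
```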

# Future work
We still need a comprehensive performance evaluation that compares this shared
memory mechanism with networking-based IPC and with a native implementation.