# What's shared memory for?
Shared memory is for interprocess communication, such as exchanging data or
commands. In particular, shared memory is for multiple processes on the same
machine.

In Nebulas, we follow the rule of thumb in system design, *modularity*, which
means we separate different functionalities into different processes. Thus, we
will have multiple processes even on one node (physical machine). Given this,
interprocess communication is quite critical to the Nebulas mainnet.

# Why shared memory?
As we stated, interprocess communication (IPC) is critical to the Nebulas
mainnet. However, there are different ways to do IPC, and one of the most
popular is networking, like RPC, or gRPC from Google. Then why shared memory?

The answer is *performance*. Most networking libraries, like RPC and gRPC, are
designed for cross-machine communication, which means their design goals are
totally different from those of local-machine IPC. For example, networking
needs to serialize and deserialize data, handle different byte orders
(big-endian vs. little-endian), etc. What's worse, it may need to conform to
standard protocols (like HTTP) and common security standards (like SSL). Even
if we can ignore some of these constraints on a local machine, we still need
to copy data into the networking buffer on the sender side and copy it out of
the networking buffer on the receiver side. This causes a performance penalty.

To the best of our knowledge, shared memory is the best IPC practice when
pursuing performance.

# The design space of Nebulas shared memory communication
The two main goals of this shared memory communication mechanism are
performance and stability. To achieve these goals, we simplify the model of
shared memory communication.

In our communication model, we have one *server* and one *client*. The server
side is responsible for initializing and tearing down the environment. Both
the server and the client can write data to and read data from the shared
memory.
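
As a rough illustration of this model, here is a minimal sketch built on
Boost.Interprocess (the library mentioned below). The segment name
`nbre_demo_shm`, the object name `message`, and the command-line role switch
are illustrative only, not the identifiers actually used by NBRE.

```cpp
// Minimal sketch: the server process sets up the shared segment, the client
// process attaches to it; both can then read and write the same bytes.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <cstring>
#include <iostream>

namespace bip = boost::interprocess;

int main(int argc, char **argv) {
  const char *seg_name = "nbre_demo_shm";
  bool is_server = argc > 1 && std::strcmp(argv[1], "server") == 0;

  if (is_server) {
    // The server owns the environment: remove any stale segment, create a
    // fresh one, and construct a named buffer inside it.
    bip::shared_memory_object::remove(seg_name);
    bip::managed_shared_memory seg(bip::create_only, seg_name, 64 * 1024);
    char *buf = seg.construct<char>("message")[256](0);
    std::strcpy(buf, "hello from server");
    std::cout << "server wrote message; press enter to tear down" << std::endl;
    std::cin.get();
    // Tearing the environment down is also the server's job.
    bip::shared_memory_object::remove(seg_name);
  } else {
    // The client only attaches to an existing segment; it never creates or
    // destroys it, but it can read and write the shared buffer.
    bip::managed_shared_memory seg(bip::open_only, seg_name);
    auto res = seg.find<char>("message");  // pair of pointer and size
    if (res.first) {
      std::cout << "client read: " << res.first << std::endl;
      std::strcpy(res.first, "hello back from client");
    }
  }
  return 0;
}
```

Running the same binary once with `server` and once without shows both roles:
the client sees the server's bytes without any copy through a socket.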

Performance is relatively easy to achieve. However, there are concerns since
we are using Boost instead of the low-level POSIX APIs. We may need to revise
this in the future.
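
For comparison, the same "create a segment and write into it" step with the
low-level POSIX APIs (`shm_open`/`mmap`) that a future revision might call
directly could look roughly like the sketch below. The name `/nbre_demo_shm`
is again illustrative; on older Linux systems, link with `-lrt`.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int main() {
  // Create (or open) a named shared memory object and give it a size.
  int fd = shm_open("/nbre_demo_shm", O_CREAT | O_RDWR, 0600);
  if (fd < 0) return 1;
  if (ftruncate(fd, 4096) != 0) return 1;

  // Map it into this process's address space; any process that opens and
  // maps the same name sees the same bytes, with no extra copies.
  void *p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) return 1;
  std::strcpy(static_cast<char *>(p), "hello via POSIX shm");

  munmap(p, 4096);
  close(fd);
  shm_unlink("/nbre_demo_shm");  // destroy the object on clean shutdown
  return 0;
}
```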

For stability, there are tradeoffs to make. The key is how we handle failures
of the interacting processes. For example, what should we do when the client
or the server crashes? Note that a client crash and a server crash are
different scenarios. For a client crash, the server can restart the client
without re-initializing the handlers. Yet for a server crash, should we
restart the server directly, or should we restart both the client and the
server? Here we choose to restart both the client and the server, for two
reasons. First, the server may fail to initialize the handlers since the
client still holds them. Second, a server crash means the whole functionality
stops working, and it is meaningless to keep the client running.

Thus, it is important for each side to know whether the other side has
crashed. An intuitive way is heartbeating, as used in typical distributed
systems. We use the same idea here, yet with a different implementation.
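
As a sketch of how such a heartbeat can live inside the shared segment itself,
consider the following. The segment name, the `heartbeat` object, the counters,
and the timeout value are illustrative assumptions rather than NBRE's actual
layout, and the sketch assumes `std::atomic<uint64_t>` is lock-free and thus
safe to place in shared memory.

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <thread>

namespace bip = boost::interprocess;

// Each side bumps its own counter and watches the other side's counter.
struct heartbeat_block {
  std::atomic<uint64_t> server_beat{0};
  std::atomic<uint64_t> client_beat{0};
};

int main(int argc, char **argv) {
  bool is_server = argc > 1 && std::strcmp(argv[1], "server") == 0;

  bip::managed_shared_memory seg(bip::open_or_create, "nbre_demo_hb", 4096);
  heartbeat_block *hb = seg.find_or_construct<heartbeat_block>("heartbeat")();

  std::atomic<uint64_t> &mine = is_server ? hb->server_beat : hb->client_beat;
  std::atomic<uint64_t> &theirs = is_server ? hb->client_beat : hb->server_beat;

  uint64_t last_seen = theirs.load();
  auto last_change = std::chrono::steady_clock::now();
  const auto timeout = std::chrono::seconds(3);

  for (;;) {
    mine.fetch_add(1);  // publish "I am alive"

    uint64_t cur = theirs.load();
    if (cur != last_seen) {
      last_seen = cur;
      last_change = std::chrono::steady_clock::now();
    } else if (std::chrono::steady_clock::now() - last_change > timeout) {
      // Heartbeat timeout: treat the peer as crashed. Per the design above,
      // the server would restart the client here, and the client would exit.
      std::cerr << "peer heartbeat timed out" << std::endl;
      return 1;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  }
}
```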

Consequently, we make two design choices. First, the server cannot start when
there is already another server instance or any client instance. Second, both
the server and the client may raise an exception when the other side crashes
(heartbeat timeout); the server may then choose to restart the client, and the
client may choose to exit immediately.
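
One way to enforce the first choice is to rely on exclusive creation of the
shared segment: `create_only` throws if the segment already exists. This is
only a sketch of the idea with illustrative names; the actual liveness check
in NBRE may differ.

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/exceptions.hpp>
#include <iostream>

namespace bip = boost::interprocess;

int main() {
  try {
    // create_only throws bip::interprocess_exception if "nbre_demo_shm"
    // already exists, i.e. another server (or a session that was never
    // cleaned up) is still around, so this server refuses to start.
    bip::managed_shared_memory seg(bip::create_only, "nbre_demo_shm", 64 * 1024);
    std::cout << "server started" << std::endl;
    // ... serve requests, then remove the segment on clean shutdown ...
    bip::shared_memory_object::remove("nbre_demo_shm");
  } catch (const bip::interprocess_exception &e) {
    std::cerr << "another instance seems to be running: " << e.what() << "\n";
    return 1;
  }
  return 0;
}
```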

# Future work
We still need a comprehensive performance evaluation that compares this shared
memory mechanism with networking-based IPC and with a native implementation.