# What's shared memory for?
Shared memory is a mechanism for interprocess communication (IPC), such as
exchanging data or commands. Specifically, shared memory only works between
processes running on the same machine.

In Nebulas, we follow a basic rule of thumb in system design, *modularity*,
which means we separate different functionalities into different processes.
Thus, we run multiple processes even on a single node (physical machine).
Given this, interprocess communication is critical to the Nebulas mainnet.

# Why shared memory?
As stated above, interprocess communication (IPC) is critical to the Nebulas
mainnet. However, there are different ways to do IPC, and some of the most
popular ones use networking, like RPC or Google's gRPC. So why shared memory?

The answer is *performance*. Most networking libraries, like RPC and gRPC,
are designed for cross-machine communication, which means they must meet
design goals quite different from those of local-machine IPC. For example,
networking needs to serialize and deserialize data, handle different
endianness (big-endian vs. little-endian), and so on. What's worse, such
libraries may need to conform to standard protocols (like HTTP) and common
security standards (like SSL). Even if we can ignore some of these
constraints on a local machine, we still need to copy data into the
networking buffer on the sender side and copy it out of the networking
buffer on the receiver side, which incurs a performance penalty.

To the best of our knowledge, shared memory is the best IPC practice when
pursuing performance.

# The design space of Nebulas shared memory communication
The two main goals of this shared memory communication mechanism are
performance and stability. To achieve these goals, we simplify the model of
shared memory communication.

In our communication model, we have one *server* and one *client*. The
server side is responsible for initiating and closing the environment. Both
the server and the client can write data to and read data from the shared
memory.

Performance is relatively easy to achieve. However, there are concerns since
we are using Boost instead of low-level POSIX APIs. We may need to revise
this in the future.

For stability, there are trade-offs to make. The key question is how we
handle failures of the interacting processes. For example, what should we do
when the client or the server crashes? Notice that a client crash and a
server crash are different scenarios. For a client crash, the server can
restart the client without re-initiating the handlers. Yet for a server
crash, should we restart the server alone, or restart both the client and
the server? Here we choose to restart both. There are two reasons. First,
the server may fail to re-initiate the handlers while the client still holds
them. Second, a server crash means the whole functionality stops working, so
it is meaningless to keep the client running.

Thus, it is important for each side to be aware of whether the other side
has crashed. An intuitive way is heartbeating, as in typical distributed
systems. We use the same idea here, yet with a different implementation.
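The following is only a minimal sketch of this model, assuming Boost.Interprocess
as the underlying library. The segment name, control-block fields, timeout, and
polling interval are illustrative assumptions, not the ones used by NBRE: the
server creates a shared segment holding a small control block, both sides update
their own heartbeat timestamp in it, and each side checks the other's timestamp
against a timeout.

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <thread>

namespace bip = boost::interprocess;

// Hypothetical control block kept in shared memory; the field names are
// illustrative, not the ones used by NBRE.
struct heartbeat_block {
  bip::interprocess_mutex mutex;
  std::uint64_t server_ts_ms = 0;
  std::uint64_t client_ts_ms = 0;
};

static std::uint64_t now_ms() {
  return std::chrono::duration_cast<std::chrono::milliseconds>(
             std::chrono::steady_clock::now().time_since_epoch())
      .count();
}

int main(int argc, char *argv[]) {
  const char *seg_name = "nbre.ipc.demo";  // hypothetical segment name
  const std::uint64_t timeout_ms = 3000;   // hypothetical timeout
  const bool is_server = argc > 1 && std::string(argv[1]) == "server";

  if (is_server) {
    // The server owns the environment: remove any stale segment, create a
    // fresh one, and construct the shared control block.
    bip::shared_memory_object::remove(seg_name);
    bip::managed_shared_memory seg(bip::create_only, seg_name, 65536);
    heartbeat_block *hb = seg.construct<heartbeat_block>("heartbeat")();
    for (;;) {  // runs until the process is stopped
      std::this_thread::sleep_for(std::chrono::milliseconds(500));
      bip::scoped_lock<bip::interprocess_mutex> lock(hb->mutex);
      hb->server_ts_ms = now_ms();
      if (hb->client_ts_ms != 0 && now_ms() - hb->client_ts_ms > timeout_ms) {
        // The server notices the timeout; this is where it could restart
        // the client.
        std::cerr << "client heartbeat timed out" << std::endl;
        hb->client_ts_ms = 0;
      }
    }
  } else {
    // The client only attaches to an environment the server created; if the
    // segment is missing, opening it throws and the client fails fast.
    bip::managed_shared_memory seg(bip::open_only, seg_name);
    heartbeat_block *hb = seg.find<heartbeat_block>("heartbeat").first;
    if (hb == nullptr)
      throw std::runtime_error("control block not constructed yet");
    for (;;) {
      std::this_thread::sleep_for(std::chrono::milliseconds(500));
      bip::scoped_lock<bip::interprocess_mutex> lock(hb->mutex);
      hb->client_ts_ms = now_ms();
      if (hb->server_ts_ms != 0 && now_ms() - hb->server_ts_ms > timeout_ms) {
        // The client gives up immediately when the server stops responding.
        throw std::runtime_error("server heartbeat timed out");
      }
    }
  }
}
```

In this sketch the server creates the segment and the client only attaches to
it, and a heartbeat timeout surfaces as an exception on either side; this is
how the design choices described below would play out.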
Consequently, we make two design choices. First, the server cannot start when
there is already another server instance or any client instance running.
Second, both the server and the client may raise exceptions when the other
side crashes (i.e., on heartbeat timeout); the server may then choose to
restart the client, and the client may choose to crash immediately.

# Future work
We need a comprehensive performance evaluation comparing this mechanism to
networking-based IPC and to a native (low-level POSIX) implementation.
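As a rough starting point, and only as a sketch under assumed names and
parameters (the `nbre.bench.*` queue names, the 64-byte message size, and the
iteration count are all illustrative), the program below times ping-pong round
trips over a Boost.Interprocess message queue. An equivalent loop over a
loopback socket or a gRPC call would provide the networking baseline, and a
raw POSIX shared memory version the native baseline.

```cpp
#include <boost/interprocess/ipc/message_queue.hpp>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>

namespace bip = boost::interprocess;

int main(int argc, char *argv[]) {
  const char *req_name = "nbre.bench.req";  // hypothetical queue names
  const char *rsp_name = "nbre.bench.rsp";
  const std::size_t msg_size = 64;
  const int iterations = 100000;
  const bool is_server = argc > 1 && std::string(argv[1]) == "server";

  if (is_server) {
    // Echo server: receive a request and send it straight back.
    bip::message_queue::remove(req_name);
    bip::message_queue::remove(rsp_name);
    bip::message_queue req(bip::create_only, req_name, 16, msg_size);
    bip::message_queue rsp(bip::create_only, rsp_name, 16, msg_size);
    char buf[msg_size];
    bip::message_queue::size_type recvd = 0;
    unsigned int prio = 0;
    for (int i = 0; i < iterations; ++i) {
      req.receive(buf, sizeof(buf), recvd, prio);
      rsp.send(buf, recvd, 0);
    }
  } else {
    // Client (start after the server): drive the round trips and report
    // the average latency.
    bip::message_queue req(bip::open_only, req_name);
    bip::message_queue rsp(bip::open_only, rsp_name);
    char buf[msg_size] = {0};
    bip::message_queue::size_type recvd = 0;
    unsigned int prio = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
      req.send(buf, sizeof(buf), 0);
      rsp.receive(buf, sizeof(buf), recvd, prio);
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    std::cout << "average round trip: " << ns / iterations << " ns" << std::endl;
  }
  return 0;
}
```

Run the same binary twice on one machine, once with the `server` argument and
once without; swapping the transport while keeping the loop fixed gives a
like-for-like comparison.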