---
title: Reliable message delivery
authors:
  - "@fisherxu"
  - "@kevin-wangzefeng"
  - "@rohitsardesai83"
approvers:
  - "@kevin-wangzefeng"
  - "@sids-b"
  - "@kadisi"
creation-date: 2019-12-23
last-updated: 2019-12-23
status: Implemented
---

# Reliable message delivery

## Motivation

At present, the ACK-based message delivery mechanism is incomplete. Unstable networks
between cloud and edge can cause edge nodes to disconnect frequently.
If cloudcore or edgecore is restarted or goes offline for a while, messages sent to
edge nodes that are temporarily unreachable can be lost. When new events are not
delivered to the edge, the cloud and the edge become inconsistent.
This proposal addresses this problem by making message delivery reliable.

### Goals

- Improve the reliability of message delivery between cloud and edge.

### Non-goals

- To provide an HA / failover mechanism for cloudhub.
- To address secure communication.
- To address encryption of stored data.

## Proposal

Currently, all messages from the controllers go through the channel queue (which uses the beehive context for messaging)
to the cloudhub. The cloudhub then uses the configured protocol server (websocket/quic) to send the data to edge nodes.
The proposal is to introduce node-level send queues in cloudhub and to use the ACK messages
returned from edge nodes to make message delivery reliable.

### Use Cases

- If cloudcore is restarted or offline for a while, then whenever cloudcore is back online, it
sends the latest event to the edge node (if there is any update to be sent).
- If an edge node is restarted or offline for a while, then whenever the node is back online,
cloudcore sends the latest event to bring it up to date.

## Design Details

### Message Delivery Mechanisms

There are three types of message delivery mechanisms:

- At-Most-Once
- Exactly-Once
- At-Least-Once

The existing implementation (without this proposal) in KubeEdge uses
the first approach, “At-Most-Once”, which is unreliable.

The second approach, “Exactly-Once”, is very expensive and has the worst performance of the three,
although it guarantees delivery with no message loss or duplication.
Since KubeEdge follows Kubernetes’ eventual consistency design principles,
it is not a problem for the edge to receive the same message repeatedly, as long as the message is the latest one.

This proposal therefore adopts “At-Least-Once” as the delivery mechanism.

### At-Least-Once Delivery

Shown below is a design using MessageQueue and ACKs to ensure that
the messages are delivered from the cloud to the edge.

<img src="../images/reliable-message-delivery/reliablemessage-workflow.PNG">

- We use a K8s CRD to store the latest resourceVersion of each resource that has been sent
 successfully to the edge. When cloudcore restarts or starts normally,
 it checks the resourceVersion to avoid sending old messages.

- EdgeController and DeviceController send messages to CloudHub, and MessageDispatcher routes each message
to the corresponding NodeMessageQueue according to the node name in the message.

- CloudHub sequentially sends data from the NodeMessageQueue to the corresponding edge node
 and also stores the message ID in an ACK channel. When the ACK message from the edge node is received,
 the ACK channel triggers saving the message resourceVersion to K8s as a CRD, and the next message is sent.

- When edgecore receives the message, it first saves the message to the local datastore and
then returns an ACK message to the cloud.

- If cloudhub does not receive an ACK message within the interval, it resends the message up to 5 times.
If all 5 retries fail, cloudhub discards the event. SyncController will handle these failed events.

- Even if the edge node receives the message, the returned ACK message may be lost during transmission.
 In this case, cloudhub sends the message again, and the edge can handle the duplicate message.
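
The send, wait-for-ACK, and resend steps above can be sketched roughly as follows. This is a minimal illustration and not the cloudhub implementation: the `Message` type, `sendWithRetry`, `ErrNoAck`, and the ACK channel carrying plain message IDs are all hypothetical names chosen for the example. It assumes one initial send followed by up to `maxRetries` resends.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Message is a hypothetical, trimmed-down message type used only in this sketch.
type Message struct {
	ID              string
	ResourceVersion string
}

// ErrNoAck is returned when no attempt was acknowledged in time.
var ErrNoAck = errors.New("no ACK received, giving up")

// sendWithRetry sends msg, then waits up to interval for an ACK carrying msg.ID.
// It makes one initial attempt plus at most maxRetries resends and returns the
// number of attempts made. ACKs for other message IDs are ignored.
func sendWithRetry(msg Message, send func(Message), acks <-chan string,
	interval time.Duration, maxRetries int) (int, error) {
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		send(msg)
		timeout := time.After(interval)
	wait:
		for {
			select {
			case id := <-acks:
				if id == msg.ID {
					return attempt, nil
				}
			case <-timeout:
				break wait // interval elapsed: fall through to the next attempt
			}
		}
	}
	return maxRetries + 1, ErrNoAck
}

func main() {
	acks := make(chan string, 1)
	delivered := 0
	// Simulated edge: the first two deliveries are lost, the third is ACKed.
	send := func(m Message) {
		delivered++
		if delivered >= 3 {
			acks <- m.ID
		}
	}
	attempts, err := sendWithRetry(Message{ID: "m1", ResourceVersion: "42"},
		send, acks, 10*time.Millisecond, 5)
	fmt.Println(attempts, err)
}
```

When `sendWithRetry` returns `ErrNoAck`, the caller would drop the event and leave recovery to the SyncController, as described in the list above.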

### SyncController

SyncController periodically compares the saved resourceVersion of each object with the object in K8s
and then triggers events such as retry and deletion.

When cloudhub adds an event to the nodeMessageQueue, the event is compared with the corresponding object already in the nodeMessageQueue.
If the object in the nodeMessageQueue is newer, the event is discarded directly.

<img src="../images/reliable-message-delivery/sync-controller.PNG">
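
The enqueue-time check described above might look like the sketch below. It assumes that resourceVersions can be parsed as unsigned integers and compared numerically. Kubernetes documents resourceVersion as an opaque string, so this comparison is an assumption made for illustration. The names `isNewer` and `shouldEnqueue`, and the map holding queued versions, are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// isNewer reports whether candidate is strictly newer than current.
// Assumption: both resourceVersions parse as unsigned integers.
func isNewer(candidate, current string) (bool, error) {
	c, err := strconv.ParseUint(candidate, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse candidate %q: %w", candidate, err)
	}
	cur, err := strconv.ParseUint(current, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse current %q: %w", current, err)
	}
	return c > cur, nil
}

// shouldEnqueue decides whether an incoming event should replace what is
// already queued under the same key. queued maps message key to the
// resourceVersion of the object currently waiting in the queue.
func shouldEnqueue(queued map[string]string, key, incomingRV string) (bool, error) {
	currentRV, ok := queued[key]
	if !ok {
		return true, nil // nothing queued for this object yet
	}
	return isNewer(incomingRV, currentRV)
}

func main() {
	queued := map[string]string{"pod/default/nginx": "120"}
	for _, rv := range []string{"118", "125"} {
		ok, err := shouldEnqueue(queued, "pod/default/nginx", rv)
		fmt.Println(rv, ok, err)
	}
}
```

In this sketch an event with the same resourceVersion as the queued object is also discarded, since it carries nothing newer.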

### Message Queue

When an edge node successfully connects to the cloud, a message queue is created for it,
which caches all the messages sent to that edge node.

We use the [workQueue](https://github.com/kubernetes/client-go/blob/master/util/workqueue/rate_limiting_queue.go) and
 [cacheStore](https://github.com/kubernetes/client-go/blob/master/tools/cache/store.go) from [kubernetes/client-go](https://github.com/kubernetes/client-go)
to implement the message queue and object storage. With the Kubernetes workQueue,
duplicate events are merged to improve transmission efficiency.

- Add message to the queue:

```go
// Store the message body under its key and queue only the key.
key, _ := getMsgKey(&message)
nodeStore.Add(message)
nodeQueue.Add(key)
```

- Get the message from the queue:

```go
// Pop the next key, then look up the latest message stored under it.
key, _ := nodeQueue.Get()
msg, _, _ := nodeStore.GetByKey(key.(string))
```

- Structure of the message key:

```go
Key = resourceType/resourceNamespace/resourceName
```
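
To show why queuing keys instead of message bodies merges duplicate events, here is a small standard-library sketch. It is only a stand-in for the client-go workQueue and cacheStore pair: the `Message` fields, `msgKey`, and `NodeQueue` are hypothetical, and the type is not safe for concurrent use.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for the KubeEdge message type.
type Message struct {
	ResourceType string
	Namespace    string
	Name         string
	Body         string
}

// msgKey builds the key as resourceType/resourceNamespace/resourceName.
func msgKey(m Message) string {
	return m.ResourceType + "/" + m.Namespace + "/" + m.Name
}

// NodeQueue pairs a FIFO of keys with a store of the latest message per key.
type NodeQueue struct {
	keys   []string
	queued map[string]bool
	store  map[string]Message
}

func NewNodeQueue() *NodeQueue {
	return &NodeQueue{queued: map[string]bool{}, store: map[string]Message{}}
}

// Add always refreshes the stored body, but queues the key only if it is
// not already waiting, so repeated updates collapse into one send.
func (q *NodeQueue) Add(m Message) {
	k := msgKey(m)
	q.store[k] = m
	if !q.queued[k] {
		q.queued[k] = true
		q.keys = append(q.keys, k)
	}
}

// Get pops the oldest key and returns the latest message stored under it.
func (q *NodeQueue) Get() (Message, bool) {
	if len(q.keys) == 0 {
		return Message{}, false
	}
	k := q.keys[0]
	q.keys = q.keys[1:]
	delete(q.queued, k)
	return q.store[k], true
}

// Len returns the number of keys waiting to be sent.
func (q *NodeQueue) Len() int { return len(q.keys) }

func main() {
	q := NewNodeQueue()
	q.Add(Message{ResourceType: "pod", Namespace: "default", Name: "nginx", Body: "v1"})
	q.Add(Message{ResourceType: "pod", Namespace: "default", Name: "nginx", Body: "v2"})
	for m, ok := q.Get(); ok; m, ok = q.Get() {
		fmt.Println(msgKey(m), m.Body)
	}
}
```

Two updates to the same pod leave a single key in the queue, and the one send that follows carries the latest body.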

### ACK Message Format

The ACK message is constructed as follows:

```go
AckMessage.ParentID = receivedMessage.ID
AckMessage.Operation = "response"
```
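
As a rough sketch of how the two fields above could be used on both sides, the edge builds the ACK from the received message and the cloud matches it against the ID it is waiting for. The flattened `Message` struct, `NewAck`, and `IsAckFor` are hypothetical and only mirror the `ParentID` and `Operation` assignments shown above.

```go
package main

import "fmt"

// Message is a hypothetical, flattened stand-in for the real message type.
type Message struct {
	ID        string
	ParentID  string
	Operation string
}

// OperationResponse is the operation value carried by ACK messages.
const OperationResponse = "response"

// NewAck builds the ACK for a received message. The ACK's own ID is
// supplied by the caller; how IDs are generated is outside this sketch.
func NewAck(received Message, ackID string) Message {
	return Message{ID: ackID, ParentID: received.ID, Operation: OperationResponse}
}

// IsAckFor reports whether m acknowledges the message with ID sentID.
func IsAckFor(m Message, sentID string) bool {
	return m.Operation == OperationResponse && m.ParentID == sentID
}

func main() {
	received := Message{ID: "msg-1", Operation: "update"}
	ack := NewAck(received, "ack-1")
	fmt.Println(ack.ParentID, ack.Operation, IsAckFor(ack, "msg-1"))
}
```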

### ReliableSync CRD

We use K8s CRDs to save the resourceVersion of objects that have been successfully persisted to the edge.

We designed two types of CRD to save the resourceVersion. ClusterObjectSync is used to save
cluster-scoped objects and ObjectSync is used to save namespace-scoped objects.
Their names consist of the related node name and the object UUID.

#### The ClusterObjectSync

```go
// ClusterObjectSync records the sync state of one cluster-scoped object for one edge node.
type ClusterObjectSync struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ClusterObjectSyncSpec   `json:"spec,omitempty"`
	Status ClusterObjectSyncStatus `json:"status,omitempty"`
}

// ClusterObjectSyncSpec stores the details of objects that were sent to the edge.
type ClusterObjectSyncSpec struct {
	// Required: ObjectGroupVerion is the group and version of the object
	// that was successfully sent to the edge node.
	ObjectGroupVerion string `json:"objectGroupVerion,omitempty"`
	// Required: ObjectKind is the type of the object
	// that was successfully sent to the edge node.
	ObjectKind string `json:"objectKind,omitempty"`
	// Required: ObjectName is the name of the object
	// that was successfully sent to the edge node.
	ObjectName string `json:"objectName,omitempty"`
}

// ClusterObjectSyncStatus stores the resourceVersion of objects that were sent to the edge.
type ClusterObjectSyncStatus struct {
	// Required: ObjectResourceVersion is the resourceVersion of the object
	// that was successfully sent to the edge node.
	ObjectResourceVersion string `json:"objectResourceVersion,omitempty"`
}
```

#### The ObjectSync

```go
// ObjectSync records the sync state of one namespace-scoped object for one edge node.
type ObjectSync struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ObjectSyncSpec   `json:"spec,omitempty"`
	Status ObjectSyncStatus `json:"status,omitempty"`
}

// ObjectSyncSpec stores the details of objects that were sent to the edge.
type ObjectSyncSpec struct {
	// Required: ObjectGroupVerion is the group and version of the object
	// that was successfully sent to the edge node.
	ObjectGroupVerion string `json:"objectGroupVerion,omitempty"`
	// Required: ObjectKind is the type of the object
	// that was successfully sent to the edge node.
	ObjectKind string `json:"objectKind,omitempty"`
	// Required: ObjectName is the name of the object
	// that was successfully sent to the edge node.
	ObjectName string `json:"objectName,omitempty"`
}

// ObjectSyncStatus stores the resourceVersion of objects that were sent to the edge.
type ObjectSyncStatus struct {
	// Required: ObjectResourceVersion is the resourceVersion of the object
	// that was successfully sent to the edge node.
	ObjectResourceVersion string `json:"objectResourceVersion,omitempty"`
}
```

## Exception scenarios/Corner cases handling

### CloudCore restart

- When cloudcore restarts or starts normally, it checks the resourceVersion to avoid sending old messages.

- If objects are deleted while cloudcore is restarting, the delete events may be lost.
The SyncController handles this situation with an object GC mechanism that ensures the deletion:
it checks whether the objects stored in the CRD still exist in K8s. If not, SyncController generates and sends a delete event
to the edge and deletes the object in the CRD when the ACK is received.
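
The comparison pass described above can be sketched as a pure function over two maps. This is an illustration under stated assumptions, not the SyncController code: `synced` stands for the resourceVersions recorded in the sync CRDs, `live` for the objects currently in K8s, and the `Action` values and `reconcile` function are hypothetical names. It uses plain inequality on resourceVersion, so no ordering between versions is assumed.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is what the pass decides for one recorded object.
type Action string

const (
	ActionNone   Action = "none"   // edge already has the current version
	ActionRetry  Action = "retry"  // recorded version differs from K8s: resend
	ActionDelete Action = "delete" // object is gone from K8s: send a delete event
)

// reconcile compares the versions recorded as delivered (synced) with the
// versions currently in K8s (live). Both maps are keyed by object identifier.
// Objects present only in live are not handled here; they reach the edge
// through the normal dispatch path.
func reconcile(synced, live map[string]string) map[string]Action {
	actions := make(map[string]Action, len(synced))
	for key, sentRV := range synced {
		liveRV, exists := live[key]
		switch {
		case !exists:
			actions[key] = ActionDelete
		case liveRV != sentRV:
			actions[key] = ActionRetry
		default:
			actions[key] = ActionNone
		}
	}
	return actions
}

func main() {
	synced := map[string]string{"pod-a": "10", "pod-b": "20", "pod-c": "30"}
	live := map[string]string{"pod-a": "10", "pod-b": "25"}
	actions := reconcile(synced, live)

	keys := make([]string, 0, len(actions))
	for k := range actions {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Println(k, actions[k])
	}
}
```

Here `pod-c` was deleted from K8s while its record survived, so the pass yields a delete event for it. The CRD record itself would be removed only after the edge ACKs that event.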

### EdgeCore restart

- When edgecore restarts or is offline for a while, the node message queue caches all the messages.
Whenever the node is back online, the messages are sent.

- When the edge node is offline, cloudhub stops sending messages and does not retry until
the edge node is back online.

### EdgeNode deleted

- When an edge node is deleted from the cloud, cloudcore removes the corresponding message queue and store.

## Performance

We need to run performance tests after introducing the reliability feature and publish the difference
in the results. Reliability comes at a cost that the user has to bear.

The following optimizations have already been considered.

### Message queuing and merging duplicated ones

We propose to use the Kubernetes workQueue to implement NodeMessageQueue, so only the message key is queued.
The message data is fetched only when it is ready to be sent.

When a message is already queued (by its key), a follow-up message for the same object (an update to the same K8s object, e.g. a pod)
only refreshes the message body in the cache. Thus, when cloudcore proceeds to send, the latest message data is
sent (no duplicate send operations for the same message).

### Lazy creation of NodeMessageQueues

To save memory, the NodeMessageQueue is only created when an edge node first connects to cloudcore.
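
A minimal sketch of lazy, per-node queue creation is shown below, assuming a mutex-guarded map keyed by node name. `QueuePool`, `GetOrCreate`, and `Remove` are hypothetical names, and `NodeQueue` is an empty placeholder for the real queue and store pair.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeQueue is a placeholder for the per-node queue and store.
type NodeQueue struct {
	keys []string
}

// QueuePool holds one NodeQueue per connected edge node.
type QueuePool struct {
	mu     sync.Mutex
	queues map[string]*NodeQueue
}

func NewQueuePool() *QueuePool {
	return &QueuePool{queues: make(map[string]*NodeQueue)}
}

// GetOrCreate returns the queue for nodeName, allocating it on first use,
// for example when the node connects for the first time.
func (p *QueuePool) GetOrCreate(nodeName string) *NodeQueue {
	p.mu.Lock()
	defer p.mu.Unlock()
	q, ok := p.queues[nodeName]
	if !ok {
		q = &NodeQueue{}
		p.queues[nodeName] = q
	}
	return q
}

// Remove drops the queue for nodeName, for example when the node is deleted.
func (p *QueuePool) Remove(nodeName string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.queues, nodeName)
}

// Len returns the number of queues currently allocated.
func (p *QueuePool) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.queues)
}

func main() {
	pool := NewQueuePool()
	fmt.Println(pool.Len()) // no node has connected yet
	a := pool.GetOrCreate("edge-1")
	b := pool.GetOrCreate("edge-1") // reconnect reuses the same queue
	fmt.Println(a == b, pool.Len())
	pool.Remove("edge-1")
	fmt.Println(pool.Len())
}
```

Reusing the same queue on reconnect is what lets messages cached while a node was offline be sent once it returns.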

### Stop sending and retries when node disconnected

When an edge node is offline, cloudcore stops the pointless sends and retries,
caches the messages, and resumes when the node is back.

In the long term, we may release NodeMessageQueues that have been held for a long period
of time (because the edge node stayed offline).

## Implementation plan

- Alpha: v1.2
- Beta: TBD
- GA: TBD

Check out the tracking issue for the latest implementation details.