---
title: Reliable message delivery
authors:
  - "@fisherxu"
  - "@kevin-wangzefeng"
  - "@rohitsardesai83"
approvers:
  - "@kevin-wangzefeng"
  - "@sids-b"
  - "@kadisi"
creation-date: 2019-12-23
last-updated: 2019-12-23
status: Implemented
---

# Reliable message delivery

## Motivation

At present, the ACK-based message delivery mechanism is incomplete. Unstable networks
between cloud and edge can cause edge nodes to disconnect frequently.
If cloudcore or edgecore is restarted or goes offline for a while, messages sent to edge nodes
that are temporarily unreachable can be lost. If new events are not successfully
delivered to the edge, the cloud and the edge become inconsistent.
This proposal addresses this problem and improves the reliability of message delivery.

### Goals

- Improve the reliable message delivery mechanism between cloud and edge.

### Non-goals

- To provide an HA/failover mechanism for cloudhub.
- To address secure communication.
- To address encryption of stored data.

## Proposal

Currently, all messages from the controllers go to cloudhub via the channel queue (which uses the beehive context for messaging).
Cloudhub then uses the configured protocol server (websocket/quic) to send the data to edge nodes.
This proposal introduces per-node message send queues in cloudhub, and uses the ACK messages
returned from edge nodes to make message delivery reliable.

### Use Cases

- If cloudcore is restarted or offline for a while, then once cloudcore is back online,
  it sends the latest event to the edge node (if there is any update to be sent).
- If an edge node is restarted or offline for a while, then once the node is back online,
  cloudcore sends the latest event to bring it up to date.

## Design Details

### Message Delivery Mechanisms

There are three types of message delivery mechanisms:

- At-Most-Once
- Exactly-Once
- At-Least-Once

The existing implementation in KubeEdge (without this proposal) uses
the first approach, “At-Most-Once”, which is unreliable.

The second approach, “Exactly-Once”, guarantees delivery with no message loss or duplication,
but it is very expensive and has the worst performance.
Since KubeEdge follows Kubernetes’ eventual consistency design principles,
it is not a problem for the edge to receive the same message repeatedly, as long as the message is the latest one.

This proposal therefore adopts “At-Least-Once”.

### At-Least-Once Delivery

The design below uses a MessageQueue and ACKs to ensure that
messages are delivered from the cloud to the edge.

<img src="../images/reliable-message-delivery/reliablemessage-workflow.PNG">

- A K8s CRD stores the latest resourceVersion of each resource that has been sent
  successfully to the edge. When cloudcore restarts or starts normally,
  it checks the resourceVersion to avoid sending old messages.

- EdgeController and DeviceController send messages to CloudHub, and the MessageDispatcher routes each message
  to the corresponding NodeMessageQueue according to the node name in the message.

- CloudHub sends data from the NodeMessageQueue to the corresponding edge node sequentially,
  and stores the message ID in an ACK channel. When the ACK message from the edge node is received,
  the ACK channel triggers saving the message's resourceVersion to K8s as a CRD, and the next message is sent.

- When edgecore receives a message, it first saves the message to the local datastore and
  then returns an ACK message to the cloud.

- If cloudhub does not receive an ACK message within the timeout interval, it resends the message, up to 5 times.
  If all 5 retries fail, cloudhub discards the event, and SyncController handles these failed events.

- Even if the edge node receives the message, the returned ACK message may be lost in transit.
  In this case, cloudhub sends the message again, and the edge handles the duplicate message.

### SyncController

SyncController periodically compares the saved resourceVersion of each object with the object in K8s,
and then triggers events such as retries and deletions.

When cloudhub adds an event to the nodeMessageQueue, the event is compared with the corresponding object in the nodeMessageQueue.
If the object in the nodeMessageQueue is newer, the event is discarded directly.

<img src="../images/reliable-message-delivery/sync-controller.PNG">

### Message Queue

When an edge node successfully connects to the cloud, a message queue is created for it,
which caches all the messages to be sent to that edge node.

We use the [workQueue](https://github.com/kubernetes/client-go/blob/master/util/workqueue/rate_limiting_queue.go) and
[cacheStore](https://github.com/kubernetes/client-go/blob/master/tools/cache/store.go) from [kubernetes/client-go](https://github.com/kubernetes/client-go)
to implement the message queue and object storage. With the Kubernetes workQueue,
duplicate events are merged to improve transmission efficiency.
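
The key-based merging and the "discard if the cached object is newer" rule can be illustrated without client-go. The sketch below is a hypothetical, stdlib-only stand-in (the `NodeMessageQueue`, `Message`, `msgKey`, and `newer` names are illustrative, not KubeEdge or client-go API): a FIFO slice holds each pending key at most once, a map plays the role of the cache store and always holds the latest message for a key, and an incoming event is dropped when the cached object is already at the same or a newer resourceVersion. Comparing resourceVersions as integers is an assumption; Kubernetes treats them as opaque strings.

```go
package main

import (
	"fmt"
	"strconv"
)

// Message is a hypothetical, simplified stand-in for a cloudhub message.
type Message struct {
	ResourceType    string
	Namespace       string
	Name            string
	ResourceVersion string
	Body            string
}

// msgKey builds the key resourceType/resourceNamespace/resourceName.
func msgKey(m Message) string {
	return m.ResourceType + "/" + m.Namespace + "/" + m.Name
}

// NodeMessageQueue imitates a workqueue plus cache store for one edge node:
// keys holds each pending key at most once (FIFO), store holds the latest message per key.
type NodeMessageQueue struct {
	keys    []string
	pending map[string]bool
	store   map[string]Message
}

func NewNodeMessageQueue() *NodeMessageQueue {
	return &NodeMessageQueue{pending: map[string]bool{}, store: map[string]Message{}}
}

// newer reports whether resourceVersion a is newer than b. Parsing them as
// integers is an assumption; if either fails to parse, a is treated as newer.
func newer(a, b string) bool {
	x, errA := strconv.ParseUint(a, 10, 64)
	y, errB := strconv.ParseUint(b, 10, 64)
	if errA != nil || errB != nil {
		return true
	}
	return x > y
}

// Add caches m and enqueues its key. It returns false, discarding the event,
// when the cached object for the same key is already at the same or a newer version.
func (q *NodeMessageQueue) Add(m Message) bool {
	key := msgKey(m)
	if old, ok := q.store[key]; ok && !newer(m.ResourceVersion, old.ResourceVersion) {
		return false
	}
	q.store[key] = m // refresh the body; the key itself is queued only once
	if !q.pending[key] {
		q.pending[key] = true
		q.keys = append(q.keys, key)
	}
	return true
}

// Get pops the next key and returns the latest cached message for it.
func (q *NodeMessageQueue) Get() (Message, bool) {
	if len(q.keys) == 0 {
		return Message{}, false
	}
	key := q.keys[0]
	q.keys = q.keys[1:]
	delete(q.pending, key)
	return q.store[key], true
}

// Len returns the number of distinct keys waiting to be sent.
func (q *NodeMessageQueue) Len() int { return len(q.keys) }

func main() {
	q := NewNodeMessageQueue()
	q.Add(Message{"pod", "default", "nginx", "100", "v1"})
	q.Add(Message{"pod", "default", "nginx", "101", "v2"})   // merged into the queued key
	q.Add(Message{"pod", "default", "nginx", "99", "stale"}) // older: discarded
	q.Add(Message{"configmap", "default", "cfg", "5", "c1"})
	for q.Len() > 0 {
		m, _ := q.Get()
		fmt.Println(msgKey(m), m.ResourceVersion, m.Body)
	}
	// Output:
	// pod/default/nginx 101 v2
	// configmap/default/cfg 5 c1
}
```

Three events for the same pod collapse into one queued key carrying version 101, and the stale version 99 is dropped. Unlike the real client-go workqueue, this sketch is single-goroutine, does not track in-flight items, and keeps sent messages in the store so later stale events can still be filtered.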

- Add a message to the queue:

```go
key, _ := getMsgKey(&message) // resourceType/resourceNamespace/resourceName
nodeStore.Add(message)        // the store's KeyFunc must produce the same key
nodeQueue.Add(key)            // only the key is queued; duplicate keys are merged
```

- Get a message from the queue:

```go
key, _ := nodeQueue.Get()                     // blocks until a key is available
msg, _, _ := nodeStore.GetByKey(key.(string)) // latest message body for this key
// ... send msg to the edge node ...
nodeQueue.Done(key) // mark the key as processed so it can be queued again
```

- Structure of the message key:

```
resourceType/resourceNamespace/resourceName
```

### ACK message Format

An ACK message is constructed as follows:

```go
// The ACK refers back to the message it acknowledges via ParentID.
AckMessage.ParentID = receivedMessage.ID
AckMessage.Operation = "response"
```

### ReliableSync CRD

We use K8s CRDs to save the resourceVersion of objects that have been successfully persisted to the edge.

We define two CRD types to save the resourceVersion. ClusterObjectSync saves cluster-scoped
objects, and ObjectSync saves namespace-scoped objects.
The name of each instance consists of the related node name and the object UUID.

#### The ClusterObjectSync

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type ClusterObjectSync struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ClusterObjectSyncSpec   `json:"spec,omitempty"`
	Status ClusterObjectSyncStatus `json:"status,omitempty"`
}

// ClusterObjectSyncSpec stores the details of the object that was sent to the edge.
type ClusterObjectSyncSpec struct {
	// Required: ObjectGroupVersion is the group and version of the object
	// that was successfully sent to the edge node.
	ObjectGroupVersion string `json:"objectGroupVersion,omitempty"`
	// Required: ObjectKind is the type of the object
	// that was successfully sent to the edge node.
	ObjectKind string `json:"objectKind,omitempty"`
	// Required: ObjectName is the name of the object
	// that was successfully sent to the edge node.
	ObjectName string `json:"objectName,omitempty"`
}

// ClusterObjectSyncStatus stores the resourceVersion of the object that was sent to the edge.
type ClusterObjectSyncStatus struct {
	// Required: ObjectResourceVersion is the resourceVersion of the object
	// that was successfully sent to the edge node.
	ObjectResourceVersion string `json:"objectResourceVersion,omitempty"`
}
```

#### The ObjectSync

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type ObjectSync struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ObjectSyncSpec   `json:"spec,omitempty"`
	Status ObjectSyncStatus `json:"status,omitempty"`
}

// ObjectSyncSpec stores the details of the object that was sent to the edge.
type ObjectSyncSpec struct {
	// Required: ObjectGroupVersion is the group and version of the object
	// that was successfully sent to the edge node.
	ObjectGroupVersion string `json:"objectGroupVersion,omitempty"`
	// Required: ObjectKind is the type of the object
	// that was successfully sent to the edge node.
	ObjectKind string `json:"objectKind,omitempty"`
	// Required: ObjectName is the name of the object
	// that was successfully sent to the edge node.
	ObjectName string `json:"objectName,omitempty"`
}

// ObjectSyncStatus stores the resourceVersion of the object that was sent to the edge.
type ObjectSyncStatus struct {
	// Required: ObjectResourceVersion is the resourceVersion of the object
	// that was successfully sent to the edge node.
	ObjectResourceVersion string `json:"objectResourceVersion,omitempty"`
}
```

## Exception scenarios/Corner cases handling

### CloudCore restart

- When cloudcore restarts or starts normally, it checks the resourceVersion to avoid sending old messages.

- During a cloudcore restart, if some objects are deleted, their delete events may be lost.
  SyncController handles this situation. An object GC mechanism is needed here to ensure the deletion:
  SyncController checks whether the objects recorded in the CRDs still exist in K8s. If not, it generates and sends a delete event
  to the edge, and deletes the object's CRD entry once the ACK is received.

### EdgeCore restart

- When edgecore restarts or is offline for a while, the node message queue caches all the messages,
  and they are sent once the node is back online.

- While the edge node is offline, cloudhub stops sending messages and does not retry until
  the edge node is back online.

### EdgeNode deleted

- When an edge node is deleted from the cloud, cloudcore removes the corresponding message queue and store.

## Performance

We need to run performance tests after introducing the reliability feature and publish the difference
in the results. Reliability comes at a cost that users need to bear.

The following optimizations have already been considered.

### Message queuing and merging of duplicates

Since we propose to implement NodeMessageQueue with the Kubernetes workQueue, only message keys are queued.
The message data is fetched only when it is ready to be sent.

When a message is already queued (by its key), follow-up messages for the same object (updates to the same K8s object, e.g. a pod)
only refresh the message body in the cache. Thus, when cloudcore proceeds with sending, the latest message data is
sent, and there are no duplicate send operations for the same message.

### Lazy creation of NodeMessageQueues

To save memory, a NodeMessageQueue is only created when an edge node first connects to cloudcore.

### Stop sending and retrying when a node is disconnected

When an edge node is offline, cloudcore stops pointless sends and retries,
caches the messages, and resumes sending when the node is back online.

In the long term, we may release NodeMessageQueues that have been held for a long period
(i.e., when the edge node has stayed offline for a long time).

## Implementation plan

- Alpha: v1.2
- Beta: TBD
- GA: TBD

Check the tracking issue for the latest implementation details.