---
title: Reliable message delivery
authors:
- "@fisherxu"
- "@kevin-wangzefeng"
- "@rohitsardesai83"
approvers:
- "@kevin-wangzefeng"
- "@sids-b"
- "@kadisi"
creation-date: 2019-12-23
last-updated: 2019-12-23
status: Implemented
---

# Reliable message delivery

## Motivation

At present, the message delivery mechanism with ACK is incomplete. Unstable networks
between cloud and edge can cause edge nodes to disconnect frequently.
If cloudcore or edgecore is restarted or stays offline for a while,
messages sent to edge nodes that are temporarily unreachable can be lost. If new events are not
successfully delivered to the edge, the cloud and the edge become inconsistent.
This proposal addresses this problem and thereby improves the reliability of message delivery.

### Goals

- Improve the reliable message delivery mechanism between cloud and edge.

### Non-goals
- To provide an HA / failover mechanism for cloudhub.
- To address secure communication.
- To address encryption of stored data.

## Proposal

Currently, all messages from the controllers go via the channel queue (which uses the beehive context for messaging)
to the cloudhub. The cloudhub then uses the configured protocol server (websocket/quic) to send the data to edge nodes.
The proposal is to introduce node-level sending message queues in cloudhub, and to use the ACK message
returned from edge nodes to ensure that messages are delivered reliably.

### Use Cases

- If cloudcore is restarted or offline for a while, then whenever cloudcore is back online,
it sends the latest event to the edge node (if there is any update to be sent).
- If an edge node is restarted or offline for a while, then whenever the node is back online,
cloudcore will send the latest event to bring it up to date.

## Design Details

### Message Delivery Mechanisms

There are three types of message delivery mechanisms:

- At-Most-Once
- Exactly-Once
- At-Least-Once

The existing implementation (without this proposal) in KubeEdge is
the first approach “At-Most-Once”, which is unreliable.

The second approach “Exactly-Once” is very expensive and has the worst performance,
although it provides guaranteed delivery with no message loss or duplication.
Since KubeEdge follows Kubernetes’ eventual consistency design principles,
it is not a problem for the edge to receive the same message repeatedly, as long as the message is the latest one.

In this proposal, “At-Least-Once” is the proposed mechanism.

### At-Least-Once Delivery

Shown below is a design using MessageQueue and ACKs to ensure that
the messages are delivered from the cloud to the edge.

<img src="../images/reliable-message-delivery/reliablemessage-workflow.PNG">

- We use a K8s CRD to store the latest resourceVersion of each resource that has been sent
successfully to the edge. When cloudcore restarts or starts normally,
it will check the resourceVersion to avoid sending old messages.

- EdgeController and devicecontroller send the messages to the Cloudhub, and MessageDispatcher will send each message
to the corresponding NodeMessageQueue according to the node name in the message.

- CloudHub will sequentially send data from the NodeMessageQueue to the corresponding edge node,
and will also store the message ID in an ACK channel. When the ACK message from the edge node is received,
the ACK channel will trigger saving the message resourceVersion to K8s as a CRD, and the next message will be sent.

- When the edgecore receives the message, it will first save the message to the local datastore and
then return an ACK message to the cloud.

- If cloudhub does not receive an ACK message within the interval, it will resend the message up to 5 times.
If all 5 retries fail, cloudhub will discard the event. SyncController will handle these failed events.

- Even if the edge node receives the message, the returned ACK message may be lost during transmission.
In this case, cloudhub will send the message again and the edge can handle the duplicate message.

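The send, wait-for-ACK, and resend steps above can be sketched as follows. This is only an illustration under stated assumptions, not the cloudhub implementation: `sendWithAck`, `errNoAck`, `demoInterval`, and the `send` callback are hypothetical names, and the sketch assumes one initial send followed by up to 5 resends.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// maxRetries mirrors the 5 resend attempts described in the text.
const maxRetries = 5

// demoInterval is an arbitrary ACK wait time chosen for this sketch.
const demoInterval = 10 * time.Millisecond

// errNoAck is returned when no attempt was acknowledged in time.
var errNoAck = errors.New("no ACK received, message discarded")

// sendWithAck (hypothetical helper) sends msgID, then waits up to interval
// for the same ID to arrive on acks. It resends on timeout and returns the
// number of transmissions performed.
func sendWithAck(msgID string, send func(id string), acks <-chan string, interval time.Duration) (int, error) {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		send(msgID)
		timeout := time.After(interval)
	wait:
		for {
			select {
			case id := <-acks:
				if id == msgID {
					return attempt + 1, nil
				}
				// ACK for another message: keep waiting for ours.
			case <-timeout:
				break wait
			}
		}
	}
	// All attempts timed out; the caller discards the event and leaves
	// recovery to a periodic reconciler such as SyncController.
	return maxRetries + 1, errNoAck
}

func main() {
	acks := make(chan string, 1)
	sends := 0
	// Simulated edge node that only acknowledges the third transmission.
	send := func(id string) {
		sends++
		if sends == 3 {
			acks <- id
		}
	}
	n, err := sendWithAck("msg-1", send, acks, demoInterval)
	fmt.Println(n, err)
}
```

Because a resend can follow a lost ACK, the receiver in such a scheme has to tolerate duplicates, which matches the last bullet above.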
### SyncController

SyncController will periodically compare the resourceVersion of the saved objects with the objects in K8s,
and then trigger events such as retry and deletion.

When cloudhub adds an event to the nodeMessageQueue, the event is compared with the corresponding object already in the nodeMessageQueue.
If the object in the nodeMessageQueue is newer, the event is discarded directly.

<img src="../images/reliable-message-delivery/sync-controller.PNG">

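The periodic comparison could look roughly like the sketch below. The `decide` function and the `syncAction` values are hypothetical names. The sketch also assumes that resourceVersions can be parsed and compared as unsigned integers, which Kubernetes does not guarantee in general because it documents resourceVersion as an opaque string.

```go
package main

import (
	"fmt"
	"strconv"
)

// syncAction is the outcome of one comparison (hypothetical type).
type syncAction string

const (
	actionNone   syncAction = "none"
	actionRetry  syncAction = "retry"
	actionDelete syncAction = "delete"
)

// decide compares the resourceVersion recorded as delivered (saved) with the
// object's current resourceVersion in K8s. exists reports whether the object
// is still present in K8s.
func decide(saved, current string, exists bool) (syncAction, error) {
	if !exists {
		// The object is gone from K8s but still recorded as synced.
		return actionDelete, nil
	}
	s, err := strconv.ParseUint(saved, 10, 64)
	if err != nil {
		return actionNone, fmt.Errorf("parse saved resourceVersion: %w", err)
	}
	c, err := strconv.ParseUint(current, 10, 64)
	if err != nil {
		return actionNone, fmt.Errorf("parse current resourceVersion: %w", err)
	}
	if c > s {
		// The edge has an older version than K8s: send again.
		return actionRetry, nil
	}
	return actionNone, nil
}

func main() {
	a, _ := decide("100", "105", true)
	b, _ := decide("105", "105", true)
	c, _ := decide("105", "", false)
	fmt.Println(a, b, c)
}
```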
### Message Queue

When each edge node successfully connects to the cloud, a message queue will be created,
which will cache all the messages sent to the edge node.

We use the [workQueue](https://github.com/kubernetes/client-go/blob/master/util/workqueue/rate_limiting_queue.go) and
[cacheStore](https://github.com/kubernetes/client-go/blob/master/tools/cache/store.go) from [kubernetes/client-go](https://github.com/kubernetes/client-go)
to implement the message queue and object storage. With Kubernetes workQueue,
duplicate events will be merged to improve the transmission efficiency.

- Add message to the queue:

```go
// Store the message body, then enqueue only its key.
key, _ := getMsgKey(&message)
nodeStore.Add(message)
nodeQueue.Add(key)
```

- Get the message from the queue:

```go
// Dequeue a key, then look up the latest message body for it.
key, _ := nodeQueue.Get()
msg, _, _ := nodeStore.GetByKey(key.(string))
```

- Structure of the message key:

```go
Key = resourceType/resourceNamespace/resourceName
```

### ACK message Format

We will construct the following ACK message format:

```go
AckMessage.ParentID = receivedMessage.ID
AckMessage.Operation = "response"
```

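As a minimal sketch of how the two fields above let the sender match an ACK to the message it acknowledges, consider the following. The `Message` struct here is a simplified stand-in rather than the actual KubeEdge message model, and `newAck` and `isAckFor` are hypothetical helpers.

```go
package main

import "fmt"

// Message is a simplified stand-in with only the fields used by the ACK format.
type Message struct {
	ID        string
	ParentID  string
	Operation string
}

// newAck builds an ACK for received: ParentID points at the received
// message and Operation is set to "response".
func newAck(received Message, ackID string) Message {
	return Message{ID: ackID, ParentID: received.ID, Operation: "response"}
}

// isAckFor reports whether ack acknowledges sent.
func isAckFor(ack, sent Message) bool {
	return ack.Operation == "response" && ack.ParentID == sent.ID
}

func main() {
	sent := Message{ID: "msg-42", Operation: "update"}
	ack := newAck(sent, "ack-1")
	fmt.Println(ack.ParentID, ack.Operation, isAckFor(ack, sent))
}
```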
### ReliableSync CRD

We use K8s CRDs to save the resourceVersion of objects that have been successfully persisted to the edge.

We designed two types of CRD to save the resourceVersion. ClusterObjectSync is used to save
cluster-scoped objects and ObjectSync is used to save namespace-scoped objects.
Their names consist of the related node name and object UUID.

#### The ClusterObjectSync

```go
type ClusterObjectSync struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ClusterObjectSyncSpec   `json:"spec,omitempty"`
	Status ClusterObjectSyncStatus `json:"status,omitempty"`
}

// ClusterObjectSyncSpec stores the details of objects that were sent to the edge.
type ClusterObjectSyncSpec struct {
	// Required: ObjectGroupVerion is the group and version of the object
	// that was successfully sent to the edge node.
	ObjectGroupVerion string `json:"objectGroupVerion,omitempty"`
	// Required: ObjectKind is the type of the object
	// that was successfully sent to the edge node.
	ObjectKind string `json:"objectKind,omitempty"`
	// Required: ObjectName is the name of the object
	// that was successfully sent to the edge node.
	ObjectName string `json:"objectName,omitempty"`
}

// ClusterObjectSyncStatus stores the resourceVersion of objects that were sent to the edge.
type ClusterObjectSyncStatus struct {
	// Required: ObjectResourceVersion is the resourceVersion of the object
	// that was successfully sent to the edge node.
	ObjectResourceVersion string `json:"objectResourceVersion,omitempty"`
}
```

#### The ObjectSync

```go
type ObjectSync struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ObjectSyncSpec   `json:"spec,omitempty"`
	Status ObjectSyncStatus `json:"status,omitempty"`
}

// ObjectSyncSpec stores the details of objects that were sent to the edge.
type ObjectSyncSpec struct {
	// Required: ObjectGroupVerion is the group and version of the object
	// that was successfully sent to the edge node.
	ObjectGroupVerion string `json:"objectGroupVerion,omitempty"`
	// Required: ObjectKind is the type of the object
	// that was successfully sent to the edge node.
	ObjectKind string `json:"objectKind,omitempty"`
	// Required: ObjectName is the name of the object
	// that was successfully sent to the edge node.
	ObjectName string `json:"objectName,omitempty"`
}

// ObjectSyncStatus stores the resourceVersion of objects that were sent to the edge.
type ObjectSyncStatus struct {
	// Required: ObjectResourceVersion is the resourceVersion of the object
	// that was successfully sent to the edge node.
	ObjectResourceVersion string `json:"objectResourceVersion,omitempty"`
}
```

## Exception scenarios/Corner cases handling

### CloudCore restart

- When cloudcore restarts or starts normally, it will check the resourceVersion to avoid sending old messages.

- If some objects are deleted while cloudcore is restarting, the delete events may be lost.
The SyncController will handle this situation. An object GC mechanism is needed here to ensure the deletion:
it checks whether the objects stored in the CRDs still exist in K8s. If not, SyncController will generate and send a delete event
to the edge, and delete the object in the CRD when the ACK is received.

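The GC check described above amounts to a set difference between what the CRDs record and what still exists in K8s. A minimal sketch, in which `orphanedKeys` and the `resourceType/namespace/name` key layout used in the example are illustrative assumptions:

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedKeys returns the keys recorded in the sync CRDs (synced maps a key
// to its saved resourceVersion) whose objects no longer exist in K8s.
// A delete event would be generated for each returned key.
func orphanedKeys(synced map[string]string, existing map[string]bool) []string {
	var orphans []string
	for key := range synced {
		if !existing[key] {
			orphans = append(orphans, key)
		}
	}
	// Sort for a deterministic order, since map iteration order is random.
	sort.Strings(orphans)
	return orphans
}

func main() {
	synced := map[string]string{
		"pod/default/nginx":     "105",
		"pod/default/old":       "90",
		"configmap/default/cfg": "42",
	}
	existing := map[string]bool{
		"pod/default/nginx":     true,
		"configmap/default/cfg": true,
	}
	fmt.Println(orphanedKeys(synced, existing))
}
```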
### EdgeCore restart

- When edgecore restarts or is offline for a while, the node message queue will cache all the messages;
whenever the node is back online, the messages will be sent.

- When the edge node is offline, cloudhub will stop sending messages and will not retry until
the edge node is back online.

### EdgeNode deleted

- When an edge node is deleted from the cloud, cloudcore will remove the corresponding message queue and store.

## Performance

We need to run performance tests after introducing the reliability feature and publish the difference
in the results. Reliability comes at a cost that the user needs to bear.

The following are the optimizations already considered.

### Message queuing and merging duplicated ones

Because we propose to use the Kubernetes workQueue to implement NodeMessageQueue, only the message key will be queued.
The message data is fetched only when it’s ready to be sent.

When a message is already queued (with its index), a follow-up message with the same key (an update on the same k8s object, e.g. a pod)
will only refresh the message body in the cache. Thus, when cloudcore proceeds with sending, the latest message data is
sent (no duplicated sending operations for the same message).

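The merging behaviour can be illustrated without client-go using a small key queue plus a body store. This is a stdlib-only sketch of the idea, not the workQueue/cacheStore implementation; `nodeQueue` and its methods are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// nodeQueue queues message keys in FIFO order and keeps only the latest
// body per key, so repeated updates to one object are sent once.
type nodeQueue struct {
	order  []string          // keys waiting to be sent
	queued map[string]bool   // keys currently present in order
	store  map[string]string // latest message body per key
}

func newNodeQueue() *nodeQueue {
	return &nodeQueue{queued: map[string]bool{}, store: map[string]string{}}
}

// Add refreshes the stored body and enqueues the key only if it is not
// already waiting.
func (q *nodeQueue) Add(key, body string) {
	q.store[key] = body
	if !q.queued[key] {
		q.queued[key] = true
		q.order = append(q.order, key)
	}
}

// Get pops the oldest key and returns the latest body stored for it.
func (q *nodeQueue) Get() (key, body string, ok bool) {
	if len(q.order) == 0 {
		return "", "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.queued, key)
	return key, q.store[key], true
}

// Len returns the number of keys waiting to be sent.
func (q *nodeQueue) Len() int { return len(q.order) }

func main() {
	q := newNodeQueue()
	q.Add("pod/default/nginx", "resourceVersion=100")
	q.Add("pod/default/nginx", "resourceVersion=105") // merged with the entry above
	q.Add("configmap/default/cfg", "resourceVersion=42")
	fmt.Println(q.Len())
	key, body, _ := q.Get()
	fmt.Println(key, body)
}
```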
### Lazy creation of NodeMessageQueues

To save memory, the NodeMessageQueue will only be created when an edge node first connects to cloudcore.

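A get-or-create lookup guarded by a mutex is one simple way to realise lazy creation, and the same structure supports removing a queue when a node is deleted. The names `queuePool`, `getOrCreate`, `remove`, and `size` below are hypothetical and only illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeQueue is a placeholder for the per-node message queue and store.
type nodeQueue struct {
	node string
}

// queuePool creates per-node queues on first use.
type queuePool struct {
	mu     sync.Mutex
	queues map[string]*nodeQueue
}

// getOrCreate returns the queue for node, allocating it on first connection.
func (p *queuePool) getOrCreate(node string) *nodeQueue {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.queues == nil {
		p.queues = make(map[string]*nodeQueue)
	}
	q, ok := p.queues[node]
	if !ok {
		q = &nodeQueue{node: node}
		p.queues[node] = q
	}
	return q
}

// remove drops the queue for node, for example when the node is deleted.
func (p *queuePool) remove(node string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.queues, node)
}

// size returns the number of queues currently allocated.
func (p *queuePool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.queues)
}

func main() {
	pool := &queuePool{}
	first := pool.getOrCreate("edge-1")
	again := pool.getOrCreate("edge-1")
	fmt.Println(first == again, pool.size())
	pool.remove("edge-1")
	fmt.Println(pool.size())
}
```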
### Stop sending and retries when node disconnected

When an edge node is offline, cloudcore will stop meaningless sending and retries,
cache the messages, and wait to resume until the node is back.

In the long term, we may release NodeMessageQueues that have been held for a period
of time (when an edge node has been offline for a long time).

## Implementation plan

- Alpha: v1.2
- Beta: TBD
- GA: TBD

Check out the tracking issue for the latest implementation details.