---
author: Milos, Jake and Asim
layout:	post
title:	Building a global services network using Go, QUIC and Micro
date:	2019-12-05 09:00:00
---

Over the past 6 months we at [Micro](https://m3o.com/) have been hard at work developing a global services network to build, share and collaborate on microservices.

In this post we're going to share some of the technical details, the design decisions we made, the challenges we faced and ultimately how we succeeded in building the microservices network.

## Motivations

The power of collaborative development has largely been restricted to trusted environments within organisations. When done right, these private in-house platforms unlock incredible productivity and compounding value with every new service added. They provide an always-on runtime and a known developer workflow for engineers to collaborate on and deliver new features to their customers.

Historically, this has been quite difficult to achieve outside of organisations. When developers decide to work on new services they often have to do a lot of unnecessary work to make those services available for others to consume and collaborate on. Public cloud providers are too complex, and the elaborate setups required to host things yourself don’t make things any easier. At [Micro](https://m3o.com/) we felt this pain and decided to do something about it. We built a microservices network!

The micro network looks to solve these problems using a shared global network for microservices. Let’s see how we’ve made this dream a reality!

## Design

The micro network is a globally distributed network based on [go-micro](https://go-micro.dev), a Go microservices framework which enables developers to build services quickly without dealing with the complexity of distributed systems. Go Micro provides strongly opinionated interfaces that are pluggable but also come with sane defaults. This allows Go Micro services to be built once and deployed anywhere, with zero code changes.

The micro network leverages five of the core primitives: registry, transport, broker, client and server. Our default implementations can be found in each package in the [go-micro](https://github.com/micro/go-micro) framework. Community maintained plugins live in the [go-plugins](https://github.com/micro/go-plugins) repo.

The micro "network" is an overloaded term, referring both to the global network over which services discover and communicate with each other and to the underpinning system of peer nodes which connect to each other and establish the routes over which services communicate.

The network abstracts away the low level details of distributed system communication at large scale, across any cloud or machine, and allows anyone to build services together without thinking about where they are running. This essentially enables large scale sharing of resources and, more importantly, of microservices.

There are four fundamental concepts that make the micro network possible. These are entirely new and have been built into [Go Micro](https://go-micro.dev/) over the last 6 months:

- **Tunnel** - point to point tunnelling
- **Proxy** - transparent RPC proxying
- **Router** - route aggregation and advertising
- **Network** - multi-cloud networking built on the above three

Each of these components is just like any other [Go Micro](https://go-micro.dev/) component: pluggable, with an out of the box default implementation to get started. In the case of the micro network it was important that the defaults worked at scale across the world.

Let’s dig into the details.

### Tunnel

From a high level view the micro network is an overlay network that spans the internet. All micro network nodes maintain secure tunnel connections between each other to enable secure communication between the services running in the network. Go Micro provides a default tunnel implementation using the QUIC protocol along with custom session management.

We chose QUIC because it has some excellent properties, especially when it comes to dealing with high latency networks, an important consideration when [running services in large distributed networks](https://eng.uber.com/employing-quic-protocol/). QUIC runs over UDP, but by adding connection based semantics on top it supports reliable packet delivery. QUIC also supports multiple streams without [head of line blocking](https://en.wikipedia.org/wiki/Head-of-line_blocking) and it’s designed to work with encryption natively. Finally, QUIC runs in userspace rather than in the kernel on conventional systems, which can bring both performance and security benefits.

Micro tunnel uses [quic-go](https://github.com/lucas-clemente/quic-go), which was the most complete Go implementation of QUIC that we could find at the inception of the micro network. We are aware quic-go is a work in progress and that it can occasionally break, but we are happy to pay the early adopter cost as we believe QUIC will become the de facto standard internet communication protocol in the future, enabling large scale networks such as the micro network.

Let’s look at the Go Micro tunnel interface:

```go
// Tunnel creates a gre tunnel on top of the go-micro/transport.
// It establishes multiple streams using the Micro-Tunnel-Channel header
// and Micro-Tunnel-Session header. The tunnel id is a hash of
// the address being requested.
type Tunnel interface {
	// Address the tunnel is listening on
	Address() string
	// Connect connects the tunnel
	Connect() error
	// Close closes the tunnel
	Close() error
	// Links returns all the links the tunnel is connected to
	Links() []Link
	// Dial to a tunnel channel
	Dial(channel string, opts ...DialOption) (Session, error)
	// Accept connections on a channel
	Listen(channel string, opts ...ListenOption) (Listener, error)
}
```

It may look fairly familiar to Go developers. With Go Micro we’ve tried to maintain common interfaces in line with distributed systems development while stepping in at a lower layer to solve some of the nitty gritty details.

Most of the interface methods should hopefully be self-explanatory, but you might be wondering about channels and sessions. Channels are much like addresses, providing a way to segment different message streams over the tunnel. Listeners listen on a given channel and return a unique session when a client dials into the channel. The session is used to communicate between peers on the same tunnel channel. The Go Micro tunnel also provides different communication semantics: you can choose to use either unicast or multicast.

<img src="https://m3o.com/docs/images/session.svg"  alt="" />

In addition, tunnels enable bidirectional connections: sessions can be dialled or listened for from either side. This enables the reversal of connections, so anything behind a [NAT](https://en.wikipedia.org/wiki/Network_address_translation) or without a public IP can become a server.
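To make channels and sessions a little more concrete, here is a minimal sketch of how frames arriving over one tunnel connection could be demultiplexed using the `Micro-Tunnel-Channel` and `Micro-Tunnel-Session` headers named in the interface comment above. The `Frame` type, the `demux` and `frame` helpers, and the session IDs are hypothetical; this only illustrates the idea and is not the Go Micro wire format or implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// Frame is a hypothetical tunnel frame: a payload plus the two headers the
// Tunnel doc comment mentions. The real go-micro wire format is not shown here.
type Frame struct {
	Header map[string]string
	Body   string
}

// demux groups frame bodies first by channel and then by session, so each
// dialling peer's stream stays separate even though all frames share one
// underlying connection. Bodies keep their arrival order within a session.
func demux(frames []Frame) map[string]map[string][]string {
	out := make(map[string]map[string][]string)
	for _, f := range frames {
		ch := f.Header["Micro-Tunnel-Channel"]
		sess := f.Header["Micro-Tunnel-Session"]
		if out[ch] == nil {
			out[ch] = make(map[string][]string)
		}
		out[ch][sess] = append(out[ch][sess], f.Body)
	}
	return out
}

// frame is a small constructor for the example (hypothetical helper).
func frame(channel, session, body string) Frame {
	return Frame{
		Header: map[string]string{
			"Micro-Tunnel-Channel": channel,
			"Micro-Tunnel-Session": session,
		},
		Body: body,
	}
}

func main() {
	// Two peers dial the same channel and get distinct sessions; one frame
	// belongs to a different channel entirely.
	frames := []Frame{
		frame("greeter", "s1", "hello from peer A"),
		frame("greeter", "s2", "hello from peer B"),
		frame("adverts", "s3", "route update"),
		frame("greeter", "s1", "second message from peer A"),
	}
	byChannel := demux(frames)

	channels := make([]string, 0, len(byChannel))
	for ch := range byChannel {
		channels = append(channels, ch)
	}
	sort.Strings(channels) // map iteration order is random; sort for stable output
	for _, ch := range channels {
		fmt.Printf("%s: %d session(s)\n", ch, len(byChannel[ch]))
	}
}
```

Keeping sessions separate per channel is what lets a listener hand each dialling peer its own stream while everything shares a single connection.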

### Router

The micro router is a critical component of the micro network. It provides the network’s routing plane. Without the router, we wouldn’t know where to send messages. It constructs a routing table based on the local service registry (a component of Go Micro). The routing table maintains the routes to the services available on the local network. With the tunnel it’s then also able to process messages from any other datacenter or network, enabling global routing by default.

Our default routing table implementation uses a simple Go in-memory map, but as with all things in Go Micro, the router and routing table are both pluggable. As we scale we’re thinking about alternative implementations and even the possibility of switching dynamically based on the size of networks.

The Go Micro router interface is as follows:

```go
// Router is an interface for a routing control plane
type Router interface {
	// The routing table
	Table() Table
	// Advertise advertises routes to the network
	Advertise() (<-chan *Advert, error)
	// Process processes incoming adverts
	Process(*Advert) error
	// Solicit advertises the whole routing table to the network
	Solicit() error
	// Lookup queries routes in the routing table
	Lookup(...QueryOption) ([]Route, error)
	// Watch returns a watcher which tracks updates to the routing table
	Watch(opts ...WatchOption) (Watcher, error)
}
```

When the router starts it automatically creates a watcher for its local registry. The micro registry emits events any time services are created, updated or deleted. The router processes these events and then applies actions to its routing table accordingly. The router itself advertises routing table events, which you can think of as a cut down version of the registry solely concerned with the routing of requests, whereas the registry provides more feature rich information like API endpoints.

These routes are propagated as events to other routers on both the local and global network and applied by every router to its own routing table, thus maintaining the global network routing plane.
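The event handling described in the last two paragraphs can be sketched as follows, assuming a hypothetical `Event` type and an in-memory table keyed by service name and then address. This is an illustration, not the actual Go Micro routing table: create and update events upsert a route, and delete events remove it. The same `Apply` step would serve both local registry events and adverts received from other routers.

```go
package main

import "fmt"

// EventType mirrors the three registry event kinds the post mentions.
type EventType int

const (
	Create EventType = iota
	Update
	Delete
)

// Route is a trimmed-down, hypothetical route with just enough fields to
// key the table. The full Route struct is shown further down in the post.
type Route struct {
	Service string
	Address string
	Metric  int64
}

// Event is a hypothetical routing event, either derived from a local
// registry watch or received as an advert from another router.
type Event struct {
	Type  EventType
	Route Route
}

// Table is an in-memory routing table: service name -> address -> route.
type Table map[string]map[string]Route

// Apply updates the table in place for a single event.
func (t Table) Apply(e Event) {
	r := e.Route
	switch e.Type {
	case Create, Update:
		if t[r.Service] == nil {
			t[r.Service] = make(map[string]Route)
		}
		t[r.Service][r.Address] = r
	case Delete:
		// delete on a nil inner map is a no-op, so unknown services are safe.
		delete(t[r.Service], r.Address)
		if len(t[r.Service]) == 0 {
			delete(t, r.Service)
		}
	}
}

func main() {
	t := Table{}
	t.Apply(Event{Create, Route{"greeter", "10.0.0.1:8080", 1}})
	t.Apply(Event{Create, Route{"greeter", "10.0.0.2:8080", 1}})
	t.Apply(Event{Update, Route{"greeter", "10.0.0.1:8080", 5}})
	t.Apply(Event{Delete, Route{"greeter", "10.0.0.2:8080", 1}})
	fmt.Println(len(t["greeter"]), t["greeter"]["10.0.0.1:8080"].Metric)
}
```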

Here’s a look at a typical route:

```go
// Route is a network route
type Route struct {
	// Service is destination service name
	Service string
	// Address is service node address
	Address string
	// Gateway is route gateway
	Gateway string
	// Network is the network name
	Network string
	// Router is router id
	Router string
	// Link is networks link
	Link string
	// Metric is the route cost
	Metric int64
}
```

What we’re primarily concerned with here is routing by service name first, finding its address if it’s local or a gateway if we have to go through some remote endpoint or different network. We also want to know what type of Link to use, e.g. whether we’re routing through our tunnel, a Cloudflare Argo tunnel or some other network implementation. And then, most importantly, the metric, a.k.a. the cost of routing to that node. We may have many routes and we want to take the routes with optimal cost to ensure the lowest latency. This doesn’t always mean your request is sent to the local network though! Imagine a situation where the service running on your local network is overloaded. We will always pick the route with the lowest cost, no matter where the service is running.
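Picking the lowest-cost route can be sketched as a linear scan over the candidate routes for a service. The metric values and node names below are made up, and how Go Micro actually computes metrics is not covered here. In this sketch, ties keep the first route seen.

```go
package main

import (
	"errors"
	"fmt"
)

// Route copies the fields of the Route struct above that this sketch needs.
type Route struct {
	Service string
	Address string
	Gateway string
	Network string
	Link    string
	Metric  int64
}

// bestRoute returns the lowest-metric route for a service, regardless of
// whether it is local or reached through the network.
func bestRoute(routes []Route, service string) (Route, error) {
	var best Route
	found := false
	for _, r := range routes {
		if r.Service != service {
			continue
		}
		if !found || r.Metric < best.Metric {
			best, found = r, true
		}
	}
	if !found {
		return Route{}, errors.New("no route to " + service)
	}
	return best, nil
}

func main() {
	routes := []Route{
		// The local instance is overloaded, so its cost is high.
		{Service: "greeter", Address: "10.0.0.5:9090", Link: "local", Metric: 900},
		{Service: "greeter", Gateway: "node-eu-1", Network: "micro", Link: "network", Metric: 120},
		{Service: "greeter", Gateway: "node-us-2", Network: "micro", Link: "network", Metric: 300},
	}
	r, err := bestRoute(routes, "greeter")
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Link, r.Gateway, r.Metric)
}
```

Here the remote route through `node-eu-1` wins over the overloaded local instance, which is exactly the situation described above.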

### Proxy

We’ve already discussed the tunnel (how messages get from point to point) and routing (how to find where the services are), but the real question is how services actually make use of this. For that we need a proxy.

It was important to us when building the micro network that we build something that was native to micro and capable of understanding our routing protocol. Building another VPN or IP based networking solution was not our goal. Instead we wanted to facilitate communication between services.

When a service needs to communicate with other services in the network it uses micro proxy.

The proxy is a native RPC proxy implementation built on the Go Micro `Client` and `Server` interfaces. It encapsulates the core means of communication for our services and provides a forwarding mechanism for requests based on service name and endpoints. Additionally, since Go Micro supports both request/response and pub/sub communication, the proxy can also act as a message exchange for asynchronous communication. This is native to Go Micro and a powerful building block for request routing.

The interface itself is straightforward and encapsulates the complexity of proxying.

```go
// Proxy can be used as a proxy server for go-micro services
type Proxy interface {
	// ProcessMessage handles inbound messages
	ProcessMessage(context.Context, server.Message) error
	// ServeRequest handles inbound requests
	ServeRequest(context.Context, server.Request, server.Response) error
}
```

The proxy receives RPC requests and routes them to an endpoint. It asks the router for the location of the service (caching as needed) and decides, based on the `Link` field in the routing table, whether to send the request locally or over the tunnel across the global network. The value of the `Link` field is either `"local"` (for local services) or `"network"` if the service is accessible only via the network.
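The local-versus-network decision can be sketched as a small switch on the `Link` value. The `nextHop` helper and its string return values are hypothetical; in Go Micro the actual forwarding goes through the client and the tunnel rather than returning a description of where to send the request.

```go
package main

import "fmt"

// Route holds the subset of routing fields the proxy decision needs.
type Route struct {
	Service string
	Address string
	Gateway string
	Link    string
}

// nextHop decides where a proxied request goes based on the route's Link
// value: "local" routes go straight to the service address, "network"
// routes go over the tunnel to the advertising gateway.
func nextHop(r Route) (transport, target string, err error) {
	switch r.Link {
	case "local":
		return "direct", r.Address, nil
	case "network":
		return "tunnel", r.Gateway, nil
	default:
		return "", "", fmt.Errorf("unknown link %q for %s", r.Link, r.Service)
	}
}

func main() {
	for _, r := range []Route{
		{Service: "greeter", Address: "10.0.0.5:9090", Link: "local"},
		{Service: "greeter", Gateway: "node-eu-1", Link: "network"},
	} {
		transport, target, err := nextHop(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(transport, target)
	}
}
```

Returning an error for an unrecognised `Link` keeps the sketch from silently sending requests somewhere unexpected if new link types are added later.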

Like everything else, the proxy is something we built standalone that can work between services in one datacenter but also across many when used in conjunction with the tunnel and router.

And finally, we arrive at the pièce de résistance: the network interface.

### Network

Network nodes are the magic that ties all the core components together, enabling us to build a truly global service network. It was really important when creating the network interface that it fit in line with our existing assumptions and understanding about Go Micro and distributed systems development. We really wanted to embrace the existing interfaces of the framework and design something with symmetry with regard to a Service.

What we arrived at was something very similar to the [micro.Service](https://github.com/micro/go-micro/blob/master/micro.go#L16) interface itself:

```go
// Network is a micro network
type Network interface {
	// Node is network node
	Node
	// Name of the network
	Name() string
	// Connect starts the resolver and tunnel server
	Connect() error
	// Close stops the tunnel and resolving
	Close() error
	// Client is micro client
	Client() client.Client
	// Server is micro server
	Server() server.Server
}

// Node is a network node
type Node interface {
	// Id is node id
	Id() string
	// Address is node bind address
	Address() string
	// Peers returns node peers
	Peers() []Node
	// Network is the network node is in
	Network() Network
}
```

As you can see, a `Network` has a Name, Client and Server, much like a `Service`, so it provides a similar method of communication. This means we can reuse a lot of the existing code base, but it also goes much further. A `Network` includes the concept of a `Node` directly in the interface, one which has peers and which may belong to the same network or to others. This means networks are peer-to-peer, while services are largely focused on client/server. On a day to day basis developers stay focused on building services, but services built to communicate globally need to operate across networks made up of identical peers.

Network nodes can behave as peers that route for others, but they may also provide some sort of service themselves. In our case, this is mostly routing related information.

So how does it all work together?

Networks have a list of peer nodes to talk to. In the default implementation the peer list comes from the registry: it consists of the other network nodes registered under the same name (the name of the network itself). When a node starts it "connects" to the network by establishing its tunnel, resolving the nodes and then connecting to them. Once connected, the nodes peer over two multicast sessions, one for peer announcements and the other for route advertisements. As these propagate, the network converges on identical routing information, building a full mesh that allows services to be routed from any node to any other.

The nodes maintain keepalives, periodically advertise the full routing table and flush any events as they occur. Our core network nodes make use of multiple resolvers to find each other, including DNS and the local registry. For peers that join our network, we’ve configured them to use an HTTP resolver which is geo-steered via Cloudflare anycast DNS and globally load balanced to the most local region. From there they pull a list of nodes and connect to the ones with the lowest metric. They then repeat the same song and dance as above to continue the growth of the network and participate in service routing.
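The step of connecting to the lowest-metric nodes from a resolved list might look like the following sketch. The `Candidate` type, the addresses and ports, and the idea that the metric represents something like latency are all assumptions for illustration; the post does not describe how node metrics are measured.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical entry returned by a resolver: a node address
// and a metric where lower is better.
type Candidate struct {
	Address string
	Metric  int64
}

// pickPeers returns up to n candidates with the lowest metric. It sorts a
// copy so the caller's slice is left in its original order.
func pickPeers(nodes []Candidate, n int) []Candidate {
	sorted := append([]Candidate(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Metric < sorted[j].Metric
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	resolved := []Candidate{
		{"node-us-1:8085", 140},
		{"node-eu-1:8085", 25},
		{"node-ap-1:8085", 210},
		{"node-eu-2:8085", 30},
	}
	for _, p := range pickPeers(resolved, 2) {
		fmt.Println("connecting to", p.Address)
	}
}
```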

Each node maintains its own network graph based on the peer messages it receives. Peer messages contain the graph of each peer up to 3 hops, which enables every node to build a local view of the network. Peers ignore anything beyond a 3 hop radius to avoid potential performance problems.
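A node's bounded view can be sketched as a breadth-first search that stops expanding at three hops. This assumes the peer messages have already been merged into a simple adjacency list of node IDs, which is a simplification for illustration and not the Go Micro peer message format.

```go
package main

import (
	"fmt"
	"sort"
)

const maxHops = 3 // the 3 hop radius described above

// Graph is an adjacency list of node IDs (hypothetical representation).
type Graph map[string][]string

// localView returns every node reachable from self within maxHops, mapped
// to its hop distance. Anything further away is ignored.
func localView(g Graph, self string) map[string]int {
	dist := map[string]int{self: 0}
	queue := []string{self}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if dist[cur] == maxHops {
			continue // do not expand beyond the radius
		}
		for _, next := range g[cur] {
			if _, seen := dist[next]; !seen {
				dist[next] = dist[cur] + 1
				queue = append(queue, next)
			}
		}
	}
	return dist
}

func main() {
	// A chain a-b-c-d-e: from a, node e is 4 hops away and falls outside the view.
	g := Graph{
		"a": {"b"},
		"b": {"a", "c"},
		"c": {"b", "d"},
		"d": {"c", "e"},
		"e": {"d"},
	}
	view := localView(g, "a")
	nodes := make([]string, 0, len(view))
	for n := range view {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)
	for _, n := range nodes {
		fmt.Printf("%s: %d hop(s)\n", n, view[n])
	}
}
```

Capping the search depth bounds the work each node does per peer message, whatever the size of the whole network.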

We mentioned peer and route advertisements earlier, so what messages do the network nodes actually exchange? First, the network embeds the router interface, through which it advertises its local routes to other network nodes. These routes are then propagated across the whole network, much like on the internet. Each node receives route advertisements from its peers and applies the advertised changes to its own routing table. The message types are "solicit", to ask for routes, and "advert", for broadcasting updates.

Network nodes send "connect" messages on start and "close" on exit. Throughout their lifetime they periodically broadcast "peer" messages so that others can discover them and all nodes can build the network topology.
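The five message types above can be sketched as a single handler switch. This is an illustration under stated assumptions, not Go Micro's code: in the real network, peer and route messages travel on two separate multicast sessions, and the `Node` and `Message` types here are hypothetical and keep only the state the example needs. Answering a "solicit" by queuing an "advert" is also an assumption based on how the two message types are described.

```go
package main

import "fmt"

// Message types named after the ones described above.
const (
	MsgConnect = "connect" // node joined the network
	MsgClose   = "close"   // node is leaving
	MsgPeer    = "peer"    // periodic announcement used to build the topology
	MsgSolicit = "solicit" // request for routes
	MsgAdvert  = "advert"  // route updates
)

// Message is a hypothetical control message.
type Message struct {
	Type string
	From string
}

// Node keeps only the state this sketch needs.
type Node struct {
	peers map[string]bool
	sent  []string // replies queued for sending, for illustration
}

// Handle applies one incoming control message to the node's state.
func (n *Node) Handle(m Message) error {
	switch m.Type {
	case MsgConnect, MsgPeer:
		n.peers[m.From] = true
	case MsgClose:
		delete(n.peers, m.From)
	case MsgSolicit:
		// Assumed behaviour: answer with our routes as an advert.
		n.sent = append(n.sent, MsgAdvert+" -> "+m.From)
	case MsgAdvert:
		// Route updates would be handed to the router's Process method here.
	default:
		return fmt.Errorf("unknown message type %q", m.Type)
	}
	return nil
}

func main() {
	n := &Node{peers: map[string]bool{}}
	for _, m := range []Message{
		{MsgConnect, "node-b"},
		{MsgPeer, "node-c"},
		{MsgSolicit, "node-c"},
		{MsgClose, "node-b"},
	} {
		if err := n.Handle(m); err != nil {
			panic(err)
		}
	}
	fmt.Println(len(n.peers), n.sent)
}
```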

When the network has been created and converges, services are then able to send messages across it. When a service on the network needs to communicate with some other service on the network it sends a request to the network node. The micro network node embeds micro proxy and can therefore forward the request over the network or locally, whichever it deems a better fit based on the metrics it retrieves after looking up the routes in the routing table.

Together, this forms our microservices network.

## Challenges

Building a global services network is not without its challenges. We encountered many, from the initial design phase right through to the present day: broken nodes, bad actors, event storms and more.

### Initial Implementation

The actual task we’d set out to accomplish was pretty monumental and we’d underestimated how much effort it would take, even for the MVP phase of the first implementation.

Every time we attempted to go from design diagram to implementing code we found ourselves stuck. In theory everything made sense, but no matter how many times we attempted to write code things just didn’t click.

We wrote 3-4 implementations that were essentially thrown away before figuring out that the best approach was to make local networking work first and then slowly carve out specific problems to solve: proxying first, followed by routing and then a network interface. Eventually, when these pieces were in place, we could get to multi-cloud or global networking by implementing a tunnel to handle the heavy lifting.

<center>
<img src="https://m3o.com/images/it-works.jpg" style="width: 80%; height: auto;" />
</center>

Once again, the lesson is to keep it simple, but where the task itself is complex, break it down into steps you can actually keep simple independently and then piece back together in practice.

### Multipoint Tunneling

One of the most complex pieces of code we had to write was the tunnel. It’s still not where we’d like it to be, but it was pretty important to us to write this from the ground up so we’d have a good understanding of how we wanted to operate globally and also have full control over the foundations of the system.

The complexity of writing network protocols really came to light in this effort, from trying NOT to reimplement TCP, crypto or anything else while still finding a path to a working solution. In the end we were able to create a subset of commands which formed enough of a bidirectional, multi-session tunnel over QUIC. We left most of the heavy lifting to QUIC, but we also needed the ability to do multicast.

For us it didn’t make sense to rely on unicast alone: considering the async and pub/sub based semantics built into Go Micro, we were adamant that multicast needed to be part of core network routing. As a result, sessions needed to be reimplemented on top of QUIC.

We’ll spare you the gory details, but what’s really clear to us is that writing state machines and reliable connection protocol code is not where we want to spend the majority of our time. We have a fairly resilient working solution, but our hope is to replace this with something far better in the future.

### Event Storms

When things work they work, and when they break they break badly. For us everything came crashing down when we started to encounter broadcast storms caused by services being recycled. When a service is created the service registry fires a create event, and when it shuts down it automatically deregisters from the service registry, which fires a delete event. Maybe you can see where this is going. As services cycled in our network they’d generate these events, which led to the routers generating new route events that were then propagated every 5 seconds to every other node on the network.

This sounds OK if the network converges and nodes stop propagating events, but in our case the sequence of events was observed and applied at random time intervals on every node. In essence this can lead to a broadcast storm which never stops. Understanding and resolving this is an incredibly difficult task.

For us this led to research into BGP internet routing, where flap detection algorithms have been defined to handle exactly this. We read a few whitepapers to get familiar with the concepts and hacked up a simple flap detection algorithm in the router.

At its core, the flap detection assigns a numerical cost to every route event. When a route event occurs its cost gets incremented. If the same event happens multiple times within a certain period of time and the accumulated cost reaches a predefined threshold, the event is immediately suppressed. Suppressed events are not processed by the router, but are kept in a memory cache for a predefined period of time. Meanwhile the cost of the event decays over time, although it can keep on growing if the event keeps flapping. If the cost drops below another threshold, the event is unsuppressed and can be processed by the routers. If the event remains suppressed for longer than a predefined time period it’s discarded.

The picture below depicts how the decaying actually works.

<center>
<img src="https://m3o.com/assets/images/flap-detection.png" style="width: 80%; height: auto;" />
</center>

<small>source: http://linuxczar.net/blog/2016/01/31/flap-detection/</small>

This had a huge effect on the issues we had been experiencing in the network. The nodes were no longer hammered with crazy event storms, and the network stabilised and continued to work without any interruptions. Happy days!

## Architecture

Our overall goal is to build a microservices network that manages not only communication but all aspects of running services, governance, and more. To accomplish this we started by addressing networking from the ground up for Go Micro services, not just to communicate locally within one private network but to have the ability to do so across many networks.

For this purpose we’ve created a global multi-cloud network that enables communication from anywhere, with anyone. This is fundamental to the architecture of the microservices network.

Our next goal will be to tackle the runtime aspects so that we can offer the ability to host services without the need to manage them. This could be imagined as the basis of a serverless microservices platform, which we’re looking to launch soon.

The platform is designed to be open. Anyone should be able to run services on the platform or join the global network using their own node. What’s more, you can even pick up the open source code and build your own private network, or join it to our public one.

<center>
  <img src="https://github.com/micro/development/raw/f4c77580acac228c522623c217575fb266d2d4ab/images/arch.jpg" style="width: 80%; height: auto;" />
</center>
<br>

What we think is pretty cool and rather unique about the micro network is that the network nodes themselves are just regular micro services like any other. Because we built everything using Go Micro they behave just like any other service. In fact, what’s even more exciting is that literally *everything is a service* in the micro network.

This holds true for all the individual components that make up the network. If you don’t want to run full network nodes, you can also run individual components of the network, such as the tunnel, router and proxy, separately as standalone micro services. All the components register themselves with the local registry, via which they can be discovered.

## Eventual success

On 29th August 2019 around 4PM we sent the first successful RPC request between our laptops across the internet using the micro network.

<center>
  <img src="https://m3o.com/assets/images/success.jpg" style="width: 80%; height: auto;" />
</center>
<br>

Since then we have squashed a lot of bugs and deployed the network nodes across the globe.
At the moment we are running the micro network in 4 cloud providers across 4 geographical regions with 3 nodes in each region.

<center>
  <img src="https://m3o.com/assets/images/radar.png" style="width: 80%; height: auto;" />
</center>

## Usage

If you're interested in testing out micro and the network, just do the following.

```shell
# enable go modules
export GO111MODULE=on

# download micro
go get github.com/tickoalcantara12/micro@master

# connect to the network
micro --peer
```

Now you're connected to the network. Start to explore what's there.

```shell
# List the services in the network
micro network services

# See which nodes you're connected to
micro network connections

# List all the nodes in your network graph
micro network nodes

# See what the metrics look like to different service routes
micro network routes
```

So what does a micro network developer workflow look like? Developers write their Go code using the [Go Micro](https://github.com/micro/go-micro) framework, and once they’re ready they can make their services available on the network either directly from their laptop or from anywhere a micro network node runs.

Here is an example of a simple service written using `go-micro`:

```go
package main

import (
	"context"
	"log"

	hello "github.com/micro/examples/greeter/srv/proto/hello"
	"github.com/micro/go-micro"
)

type Say struct{}

func (s *Say) Hello(ctx context.Context, req *hello.Request, rsp *hello.Response) error {
	log.Print("Received Say.Hello request")
	rsp.Msg = "Hello " + req.Name
	return nil
}

func main() {
	service := micro.NewService(
		micro.Name("helloworld"),
	)

	// optionally set up command line usage
	service.Init()

	// Register Handlers
	hello.RegisterSayHandler(service.Server(), new(Say))

	// Run server
	if err := service.Run(); err != nil {
		log.Fatal(err)
	}
}
```

Once you launch the service it automatically registers with the service registry and becomes instantly accessible to everyone on the network to consume and collaborate on. All of this is completely transparent to developers. No need to deal with low level distributed systems cruft!

We’re already running a greeter service in the network, so why not try giving it a call?

```shell
# enable proxying through the network
export MICRO_PROXY=go.micro.network

# call a service
micro call go.micro.srv.greeter Say.Hello '{"name": "John"}'
```

It works!

## Conclusion

Building distributed systems is difficult, but it turns out building the networks they communicate over is an equally difficult, if not more difficult, problem. The classic fallacy, [the network is reliable](https://queue.acm.org/detail.cfm?id=2655736), continues to hold, as we found while building the micro network. However, what’s also clear is that our world and most technology thrive through the use of networks. They underpin the very fabric of all that we’ve come to know. Our goal with the micro network is to create a new type of foundation for the open services of the future. Hopefully this post has shed some light on the technical accomplishments and challenges of building such a thing.

<br />
To learn more check out the [website](https://m3o.com), follow us on [twitter](https://twitter.com/m3ocloud) or 
join the [slack](https://m3o.com/slack) community. We are hiring!