---
layout: "intro"
page_title: "Consul vs. Nagios, Sensu"
sidebar_current: "vs-other-nagios-sensu"
description: |-
  Nagios and Sensu are both tools built for monitoring. They are used to
  quickly notify operators when an issue occurs.
---

# Consul vs. Nagios, Sensu

Nagios and Sensu are both tools built for monitoring. They are used
to quickly notify operators when an issue occurs.

Nagios uses a group of central servers that are configured to perform
checks on remote hosts. This design makes it difficult to scale Nagios,
as large fleets quickly reach the limit of vertical scaling, and Nagios
does not easily scale horizontally. Nagios is also notoriously
difficult to use with modern DevOps and configuration management tools,
as local configurations must be updated when remote servers are added
or removed.

Sensu has a much more modern design, relying on local agents to run
checks and push results to an AMQP broker. A number of servers
ingest and handle the results of the health checks from the broker. This model
is more scalable than Nagios, as it allows for much more horizontal scaling
and a weaker coupling between the servers and agents. However, the central broker
has scaling limits and acts as a single point of failure in the system.

Consul provides the same health checking abilities as both Nagios and Sensu,
is friendly to modern DevOps, and avoids the scaling issues inherent in the
other systems. Consul runs all checks locally, like Sensu, avoiding placing
a burden on central servers. The status of checks is maintained by the Consul
servers, which are fault tolerant and have no single point of failure.
Lastly, Consul can scale to vastly more checks because it relies on edge-triggered
updates. This means that an update is only triggered when a check transitions
from "passing" to "failing" or vice versa.

In a large fleet, the majority of checks are passing, and even the minority
that are failing are persistent. By capturing changes only, Consul reduces
the amount of networking and compute resources used by the health checks,
allowing the system to be much more scalable.

An astute reader may notice that if a Consul agent dies, then no edge-triggered
updates will occur. From the perspective of other nodes, all checks will appear
to be in a steady state. However, Consul guards against this as well. The
[gossip protocol](/docs/internals/gossip.html) used between clients and servers
integrates a distributed failure detector. This means that if a Consul agent fails,
the failure will be detected, and thus all checks being run by that node can be
assumed failed. This failure detector distributes the work among the entire cluster
while, most importantly, enabling the edge-triggered architecture to work.
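To make the "checks run locally" model concrete, here is a minimal sketch that registers an HTTP check with the local agent using the official Go API client (`github.com/hashicorp/consul/api`). The check name and endpoint are illustrative, not part of any real deployment:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (defaults to 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register a check that the local agent runs itself. The agent
	// executes it on the given interval; the servers only learn about
	// status transitions.
	check := &api.AgentCheckRegistration{
		Name: "web-health", // hypothetical check name
		AgentServiceCheck: api.AgentServiceCheck{
			HTTP:     "http://localhost:8080/health", // hypothetical endpoint
			Interval: "10s",
			Timeout:  "1s",
		},
	}
	if err := client.Agent().CheckRegister(check); err != nil {
		log.Fatal(err)
	}
}
```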
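The edge-triggered behavior described above can also be illustrated with a short, self-contained Go sketch. This is not Consul's actual implementation, only the general pattern: each result is compared against the last reported status, and only transitions produce an update:

```go
package main

import "fmt"

// Status is the result of one local check execution.
type Status string

const (
	Passing Status = "passing"
	Failing Status = "failing"
)

// edgeTrigger remembers the last reported status of each check and
// reports an update only when that status changes.
type edgeTrigger struct {
	last map[string]Status
}

func newEdgeTrigger() *edgeTrigger {
	return &edgeTrigger{last: make(map[string]Status)}
}

// Observe records the latest result for a check. It returns true only
// when the status transitioned, i.e. when an update needs to be sent;
// steady-state results generate no traffic.
func (e *edgeTrigger) Observe(check string, s Status) bool {
	if prev, ok := e.last[check]; ok && prev == s {
		return false // no change: nothing to report
	}
	e.last[check] = s
	return true
}

func main() {
	t := newEdgeTrigger()
	fmt.Println(t.Observe("web-health", Passing)) // true: first observation
	fmt.Println(t.Observe("web-health", Passing)) // false: steady state
	fmt.Println(t.Observe("web-health", Failing)) // true: transition
}
```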