+++
title = "Dgraph compared to other databases"
+++

This page compares Dgraph with other popular graph databases and datastores. The brief summaries that follow may help you decide whether Dgraph suits your needs.

# Batch based
Batch-based graph processing frameworks provide very high throughput for periodic processing of data. This is useful for converting graph data into a shape that other systems can readily use to serve end users.

## Pregel
* [Pregel](https://kowshik.github.io/JPregel/pregel_paper.pdf) is a system for large-scale graph processing by Google. You can think of it as equivalent to MapReduce/Hadoop.
* Pregel isn't designed to be exposed directly to users, i.e. to run with real-time updates and execute arbitrarily complex queries. Dgraph is designed to respond to arbitrarily complex user queries with low latency and to allow user interaction.
* Pregel can be used alongside Dgraph for complementary processing of the graph, to handle queries that would take over a minute to run via Dgraph or that produce too much data to be consumed by clients directly.

---

# Database
Graph databases optimize their internal data representation to perform graph operations efficiently.

## Neo4j
[Neo4j](https://neo4j.com/) is the most popular graph database according to [db-engines.com](http://db-engines.com/en/ranking/graph+dbms) and has been around since 2007. Dgraph is a much newer graph database built to scale to Google web scale and for serious production usage as the primary database.

### Language

Dgraph supports [GraphQL+-]({{< relref "query-language/index.md#graphql">}}),
a variation of [GraphQL](https://graphql.org/), a query language created by
Facebook.
GraphQL+-, like GraphQL itself, allows results to be produced as subgraphs rather than lists.
Schema validation is also useful to ensure data correctness during both input and output.

### Scalability

Neo4j runs on a single server. The enterprise version of Neo4j only runs
universal data replicas. As the data grows, this requires users to vertically
scale their servers. [Vertical scaling is expensive.][vert]

Dgraph has a distributed architecture. You can split your data among many Dgraph
servers to distribute it horizontally. As you add more data, you can just add
more commodity hardware to serve it. Dgraph bakes in further performance features,
such as reducing network calls in a cluster and highly concurrent execution of
queries, to achieve high query throughput. Dgraph does consistent replication
of each shard, which makes it crash resilient and protects users from server
downtime.

[vert]: https://blog.openshift.com/best-practices-for-horizontal-application-scaling/

### Transactions

Both systems provide ACID transactions. Neo4j supports ACID transactions in its
single server architecture. Dgraph, despite being a distributed and consistently
replicated system, supports ACID transactions with snapshot isolation.

### Replication

Neo4j's universal data replication is only available to users who purchase their
[enterprise license][neo4je]. At Dgraph, we consider horizontal scaling and
consistent replication the basic necessities of any application built today.
Dgraph not only shards your data automatically, it also moves data around
to rebalance these shards, so users achieve the best machine utilization and
query latency possible.

Dgraph is consistently replicated. Any read that follows a write sees that
write, irrespective of which replica it hits. In short, we achieve
linearizable reads.

[neo4je]: https://neo4j.com/subscriptions/#editions

***For a more thorough comparison of Dgraph vs Neo4j, you can read our [blog](https://open.dgraph.io/post/benchmark-neo4j)***

---

# Datastore
Graph datastores act as a graph layer above some other SQL/NoSQL database, which does the data management for them. This other database is the one responsible for backups, snapshots, server failures and data integrity.

## Cayley
* Both [Cayley](https://cayley.io/) and Dgraph are written primarily in Go and inspired by different projects at Google.
* Cayley acts as a graph layer, providing a clean storage interface that can be implemented by various stores, e.g., PostgreSQL or RocksDB for a single machine, or MongoDB to allow distribution. In other words, Cayley hands over data to other databases. While Dgraph uses [Badger](https://github.com/dgraph-io/badger), it assumes complete ownership over the data and tightly couples data storage and management to allow for efficient distributed queries.
* Cayley's design suffers from high fan-out issues: if intermediate steps return a lot of results and the data is distributed, a query results in many network calls between Cayley and the underlying data layer. Dgraph's design minimizes the number of network calls and the number of servers it needs to touch to respond to a query. This design produces better and more predictable query latencies in a cluster, even as cluster size increases.

***For a comparison of query and data loading benchmarks for Dgraph vs Cayley, you can read [Differences between Dgraph and Cayley](https://discuss.dgraph.io/t/differences-between-dgraph-and-cayley/23/3)***.
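
To make the fan-out point from the Cayley comparison concrete, here is a toy model that only counts round trips. It is a sketch under stated assumptions, not the implementation of either system: the graph `EDGES`, the function names, and the idea that one request can expand a whole frontier (which presumes all edges for a hop live on one server) are hypothetical and chosen purely for illustration.

```python
# Toy graph: each node maps to the nodes it points to (hypothetical data).
EDGES = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["erin", "frank"],
    "carol": ["erin", "grace"],
    "dave": ["heidi"],
}


def per_node_traversal(edges, start, hops):
    """Layered model: one round trip to the storage layer per frontier node."""
    calls = 0
    frontier = {start}
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            calls += 1  # each node lookup is its own network call
            next_frontier.update(edges.get(node, []))
        frontier = next_frontier
    return frontier, calls


def batched_traversal(edges, start, hops):
    """Integrated model: the whole frontier is expanded in one call per hop.

    Assumes (for this sketch only) that all edges needed for a hop are
    held by a single server.
    """
    calls = 0
    frontier = {start}
    for _ in range(hops):
        if not frontier:
            break
        calls += 1  # one network call carries the entire frontier
        next_frontier = set()
        for node in frontier:
            next_frontier.update(edges.get(node, []))
        frontier = next_frontier
    return frontier, calls


if __name__ == "__main__":
    print(per_node_traversal(EDGES, "alice", 2))
    print(batched_traversal(EDGES, "alice", 2))
```

In this model both traversals reach the same nodes, but the per-node version's call count grows with the size of each intermediate result, while the batched version's grows only with the number of hops.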