github.com/dgraph-io/dgraph@v1.2.8/wiki/content/dgraph-compared-to-other-databases/index.md

+++
title = "Dgraph compared to other databases"
+++

This page compares Dgraph with other popular graph databases and datastores. The brief summaries that follow may help you decide whether Dgraph suits your needs.

# Batch based
Batch-based graph processing frameworks provide very high throughput for periodic processing of data. This is useful for converting graph data into a shape that other systems can readily use to serve end users.

## Pregel
* [Pregel](https://kowshik.github.io/JPregel/pregel_paper.pdf) is a system for large-scale graph processing built by Google. You can think of it as equivalent to MapReduce/Hadoop.
* Pregel isn't designed to be exposed directly to users, i.e. to run with real-time updates and execute arbitrarily complex queries. Dgraph is designed to respond to arbitrarily complex user queries with low latency and to allow user interaction.
* Pregel can be used alongside Dgraph for complementary processing of the graph, to handle queries that would take over a minute to run in Dgraph or that would produce too much data for clients to consume directly.

---

# Database
Graph databases optimize their internal data representation so that graph operations run efficiently.

## Neo4j
[Neo4j](https://neo4j.com/) is the most popular graph database according to [db-engines.com](http://db-engines.com/en/ranking/graph+dbms) and has been around since 2007. Dgraph is a much newer graph database, built to scale to Google web scale and for serious production use as the primary database.

### Language

Dgraph supports [GraphQL+-]({{< relref "query-language/index.md#graphql">}}),
a variation of [GraphQL](https://graphql.org/), a query language created by
Facebook.
GraphQL+-, like GraphQL itself, returns results as a subgraph rather than as
flat lists. Schema validation also helps ensure data correctness on both input
and output.
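
To make the "subgraph rather than lists" point concrete, here is a minimal Python sketch that reshapes a flat, row-style result into the nested shape a GraphQL-style query would return. The field names `name` and `friend` and the sample data are invented for illustration. This is not Dgraph client code.

```python
from collections import defaultdict


def rows_to_subgraph(rows):
    """Group flat (person, friend) rows into a nested, subgraph-shaped result."""
    friends = defaultdict(list)
    for person, friend in rows:
        friends[person].append({"name": friend})
    # One entry per person, with that person's friends nested underneath.
    return [{"name": person, "friend": fs} for person, fs in friends.items()]


# Flat, list-style result: one row per (person, friend) pair (made-up data).
rows = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]

# Subgraph-style result: each person appears once, with friends as children.
subgraph = rows_to_subgraph(rows)
```

In the flat form a client has to regroup the rows itself. In the subgraph form the structure of the response already mirrors the structure of the query.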

### Scalability

Neo4j runs on a single server. The enterprise version of Neo4j only runs
universal data replicas, so every server holds a full copy of the data. As the
data grows, users must scale their servers vertically.
[Vertical scaling is expensive.][vert]

Dgraph has a distributed architecture. You can split your data among many Dgraph
servers to distribute it horizontally. As you add more data, you can simply add
more commodity hardware to serve it. Dgraph also bakes in performance features,
such as reducing network calls within a cluster and highly concurrent query
execution, to achieve high query throughput. Dgraph consistently replicates
each shard, which makes it crash resilient and protects users from server
downtime.

[vert]: https://blog.openshift.com/best-practices-for-horizontal-application-scaling/
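
The idea of spreading shards over more commodity servers can be sketched with a toy placement routine. Everything here is an assumption made for illustration: the shard names, their sizes, and the greedy "least-loaded server" policy are not taken from Dgraph, whose real placement logic this page does not describe.

```python
def place_shards(shard_sizes, num_servers):
    """Greedily place each shard on the currently least-loaded server."""
    servers = [{"load": 0, "shards": []} for _ in range(num_servers)]
    # Place the largest shards first so the greedy choice balances better.
    for name, size in sorted(shard_sizes.items(), key=lambda kv: -kv[1]):
        target = min(servers, key=lambda s: s["load"])
        target["shards"].append(name)
        target["load"] += size
    return servers


# Hypothetical shard sizes in arbitrary units.
shard_sizes = {"name": 40, "friend": 90, "age": 10, "lives_in": 60}

two_servers = place_shards(shard_sizes, 2)
four_servers = place_shards(shard_sizes, 4)

# The busiest server carries less data once more servers are added.
max_load_two = max(s["load"] for s in two_servers)
max_load_four = max(s["load"] for s in four_servers)
```

The total amount of data is the same in both runs. Only its distribution changes, which is the point of scaling horizontally instead of buying a bigger machine.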

### Transactions

Both systems provide ACID transactions. Neo4j supports ACID transactions in its
single-server architecture. Dgraph, despite being a distributed and consistently
replicated system, supports ACID transactions with snapshot isolation.
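
For readers unfamiliar with snapshot isolation, the following is a generic, single-process Python sketch of the idea. Each transaction reads from the snapshot taken when it began, and a commit aborts if another transaction has since written the same key. The class and method names are hypothetical, and this is a textbook illustration, not Dgraph's implementation.

```python
class VersionedStore:
    """Toy multi-version store: every committed write gets a new timestamp."""

    def __init__(self):
        self.commit_ts = 0
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def begin(self):
        # A transaction reads from the snapshot at its start timestamp.
        return self.commit_ts

    def read(self, key, start_ts):
        # Return the newest value committed at or before start_ts.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= start_ts:
                return value
        return None

    def commit(self, writes, start_ts):
        # Abort if any key being written was committed after start_ts.
        for key in writes:
            history = self.versions.get(key, [])
            if history and history[-1][0] > start_ts:
                return False
        self.commit_ts += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.commit_ts, value))
        return True


store = VersionedStore()
store.commit({"balance": 100}, store.begin())

t1 = store.begin()                        # both transactions start from
t2 = store.begin()                        # the same snapshot
ok2 = store.commit({"balance": 50}, t2)   # t2 commits first
seen_by_t1 = store.read("balance", t1)    # t1 still sees its own snapshot
ok1 = store.commit({"balance": 70}, t1)   # t1 conflicts with t2 and aborts
```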

### Replication

Neo4j's universal data replication is only available to users who purchase its
[enterprise license][neo4je]. At Dgraph, we consider horizontal scaling and
consistent replication basic necessities for any application built today.
Dgraph not only shards your data automatically, it also moves data around
to rebalance these shards, so users get the best possible machine utilization
and query latency.

Dgraph is consistently replicated. Any write followed by a read is visible
to the client, irrespective of which replica the read hits. In short, we achieve
linearizable reads.

[neo4je]: https://neo4j.com/subscriptions/#editions

***For a more thorough comparison of Dgraph vs Neo4j, you can read our [blog post](https://open.dgraph.io/post/benchmark-neo4j).***

---

# Datastore
Graph datastores act as a graph layer on top of some other SQL/NoSQL database, which does the data management for them. That other database is responsible for backups, snapshots, server failures and data integrity.

## Cayley
* Both [Cayley](https://cayley.io/) and Dgraph are written primarily in Go and were inspired by different projects at Google.
* Cayley acts as a graph layer, providing a clean storage interface that can be implemented by various stores, e.g. PostgreSQL or RocksDB for a single machine, or MongoDB to allow distribution. In other words, Cayley hands data over to other databases. Although Dgraph uses [Badger](https://github.com/dgraph-io/badger), it assumes complete ownership of the data and tightly couples data storage and management to allow efficient distributed queries.
* Cayley's design suffers from high fan-out issues: if intermediate steps return a lot of results and the data is distributed, a query causes many network calls between Cayley and the underlying data layer. Dgraph's design minimizes the number of network calls and the number of servers it needs to touch to respond to a query. This produces better and more predictable query latencies in a cluster, even as the cluster grows.

***For a comparison of query and data loading benchmarks for Dgraph vs Cayley, you can read [Differences between Dgraph and Cayley](https://discuss.dgraph.io/t/differences-between-dgraph-and-cayley/23/3).***
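
The fan-out argument in the Cayley section can be illustrated with a toy count of round trips. The sketch below assumes two simplified strategies: one storage lookup per intermediate node, and one batched request per traversal hop. The graph, the function names, and the batching model are illustrative assumptions, not measurements or code from either project.

```python
def calls_per_node(graph, start, hops):
    """One storage round trip per frontier node (the high fan-out case)."""
    frontier, calls = {start}, 0
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            calls += 1  # one lookup per intermediate result
            next_frontier.update(graph.get(node, []))
        frontier = next_frontier
    return frontier, calls


def calls_per_hop(graph, start, hops):
    """One batched round trip per hop, carrying the whole frontier."""
    frontier, calls = {start}, 0
    for _ in range(hops):
        calls += 1  # the whole frontier travels in a single request
        frontier = {n for node in frontier for n in graph.get(node, [])}
    return frontier, calls


# Small made-up graph: "a" fans out to three nodes on the first hop.
graph = {"a": ["b", "c", "d"], "b": ["e"], "c": ["e", "f"], "d": ["g"]}

result_per_node, per_node = calls_per_node(graph, "a", 2)
result_per_hop, per_hop = calls_per_hop(graph, "a", 2)
```

Both strategies reach the same nodes. The per-node count grows with the size of every intermediate result, while the per-hop count depends only on the depth of the query.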