%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% writeLaTeX Example: Academic Paper Template
%
% Source: http://www.writelatex.com
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[twocolumn,showpacs,%
  nofootinbib,aps,superscriptaddress,%
  eqsecnum,prd,notitlepage,showkeys,10pt]{revtex4-1}

\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{xcolor} % needed for \textcolor in the \todo macro
\usepackage{dcolumn}
\usepackage{hyperref}
\usepackage{draftwatermark}
\usepackage{enumitem}
\usepackage{setspace} % for \onehalfspacing and \singlespacing macros
\usepackage{etoolbox}
\AtBeginEnvironment{quote}{\singlespace\vspace{-\topsep}\small}
\AtEndEnvironment{quote}{\vspace{-\topsep}\endsinglespace}
\usepackage{url}
\SetWatermarkText{Draft}
\SetWatermarkColor[gray]{0.9}
\SetWatermarkScale{5}


\newtheorem{itemlet}{}

\begin{document}

% macros
\newcommand\todo[1]{\textcolor{red}{#1}}
\newcommand\term[1]{\textbf{\textit{#1}}}
\newcommand{\sbtc}{$\Omega_{\text{bitcoin}}$}
\newcommand{\sgit}{$\Omega_{\text{git}}$}
\newcommand{\shc}{$\Omega_{\text{hc}}$}
\newcommand{\hcdna}{$\text{DNA}$}
\newcommand{\hcid}{\iota}
\newcommand{\dhtget}{\texttt{get}}
\newcommand{\dhtput}{\texttt{put}}
\newcommand{\dhtstate}{\Delta}
\newcommand{\dhtfns}{F_\mathrm{DHT}}
\newcommand{\sysfns}{F_\mathrm{sys}}
\newcommand{\appfns}{F_\mathrm{app}}
\newcommand{\hcdht}{DHT_\text{hc}}
\newcommand{\chain}{\mathcal{X}}
\newcommand{\eqbang}{\stackrel{!}{=}}

\title{Holochain \\
\small scalable agent-centric distributed computing\\DRAFT (ALPHA 1) -- 2/15/2018}

\author{Eric Harris-Braun, Nicolas Luck, Arthur Brock}

\affiliation{Ceptr, LLC}

\begin{abstract}
We present a scalable, agent-centric distributed computing platform.
We use a formalism to characterize distributed systems, show how it applies to some existing distributed systems, and demonstrate the benefits of shifting from a data-centric to an agent-centric model. We present a detailed formal specification of the Holochain system, along with an analysis of its systemic integrity, capacity for evolution, total system computational complexity, implications for use-cases, and current implementation status.

\end{abstract}

\maketitle

\section{Introduction}
\label{sec:intro}

Distributed computing platforms have achieved a new level of viability with the advent of two foundational cryptographic tools: secure hashing algorithms and public-key encryption. These have provided solutions to two key problems in distributed computing: verifiable, tamper-proof data for sharing state across nodes in the distributed system, and confirmation of data provenance via digital signature algorithms. The former is achieved by hash-chains, where monotonic data-stores are rendered intrinsically tamper-proof (and thus confidently sharable across nodes) by including hashes of previous entries in subsequent entries. The latter is achieved by cryptographically signing hashes of data and using the public keys themselves as the addresses of agents, thus allowing other agents in the system to mathematically verify the data's source.

Though hash-chains help solve the problem of independently acting agents reliably sharing state, we see two very different approaches in their use which have deep systemic consequences. These approaches are demonstrated by two of today's canonical distributed systems:
\begin{enumerate}
\item git\footnote{https://git-scm.com/about}: In git, all nodes can update their hash-chains as they see fit.
The degree of overlapping shared state of chain entries (known as commit objects) across all nodes is not managed by git but rather explicitly by action of the agent making pull requests and doing merges. We call this approach \term{agent-centric} because of its focus on allowing nodes to share independently evolving data realities.
\item Bitcoin\footnote{https://bitcoin.org/bitcoin.pdf}: In Bitcoin (and blockchain in general), the ``problem'' is understood to be that of figuring out how to choose one block of transactions among the many variants being experienced by the mining nodes (as they collect transactions from clients in different orders), and committing that single variant to the single globally shared chain. We call this approach \term{data-centric} because of its focus on creating a single shared data reality among all nodes.
\end{enumerate}

We claim that this fundamental original stance results directly in the most significant limitation of the blockchain: scalability. This limitation is widely known\footnote{add various sources} and many solutions have been offered\footnote{more footnotes here}. Holochain offers a way forward by directly addressing the root data-centric assumptions of the blockchain approach.

\section{Prior Work}
This paper builds largely on recent work in cryptographic distributed systems, distributed hash tables, and multi-agent systems.

Ethereum: Wood \cite{yellowpaper}; DHT: \cite{kademlia}; IPFS: Benet \cite{ipfs}


\todo{TODO: discussion and more references here}

\section{Distributed Systems}
\label{sec:data-centric-systems}

\subsection{Formalism}
\label{sec:formalism}

We define a simple generalized model of a distributed system $\Omega$ using hash-chains as follows:
\begin{enumerate}
\item Let $N$ be the set of elements $\{n_1,n_2,\dots n_n\}$ participating in the system. Call the elements of $N$ \term{nodes} or \term{agents}.
\item Let each node $n$ consist of a set $S_n$ with elements $\{\sigma_1,\sigma_2,\dots\}$. Call the elements of $S_n$ the \term{state} of node $n$. For the purposes of this paper we assume $\forall \sigma_i \in S_n : \sigma_i = \{ \chain_i, D_i\}$ with $\chain_i$ being a \term{hash-chain} and $D_i$ a set of non-hash-chain \term{data elements}.
\item Let $H$ be a cryptographically secure hash function.
\item Let there be a \term{state transition function}:
\begin{equation}
\tau(\sigma_i, t) = (\tau_\chain(\chain_i, t), \tau_D(D_i,t))
\end{equation}
where:
\begin{enumerate}
\item
$\tau_\chain(\chain_i,t) = \chain_{i+1}$ where
\begin{equation}
\begin{split}
\chain_{i+1} &= \chain_{i} \cup \{x_{i+1}\} \\
&= \{x_1, \dots, x_i, x_{i+1}\}
\end{split}
\end{equation}
with
\begin{equation}
\begin{split}
x_{i+1} &= \{h,t\} \\
h &= \{ H(t),y\} \\
y &= \{H(x_j) \,|\, j \leq i\}
\end{split}
\end{equation}

Call $h$ a \term{header} and note how the sequence of headers creates a chain (tree, in the general case) by linking each header to the previous header(s) and the transaction.
\item $D_{i+1}=\tau_D(D_i,t)$

\end{enumerate}
\item Let $V(t,v)$ be a function that takes $t$, along with extra validation data $v$, verifies the validity of $t$, and only if valid calls a transition function for $t$. Call $V$ a \term{validation} function.
\label{formalism:validation}
\item Let $I(t)$ be a function that takes a transaction $t$, evaluates it using a function $V$, and if valid, uses $\tau$ to transform $S$. Call $I$ the \term{input} or \term{stimulus} function.
\item Let $P(x)$ be a function that can create transactions $t$ and trigger functions $V$ and $\tau$, where $P$ itself is triggered by state changes or the passage of time. Call $P$ the \term{processing} function.
\item Let $C$ be a channel that allows all nodes in $N$ to communicate and over which each node has a unique address $A_n$. Call $C$ and the nodes that communicate on it the \term{network}.
\item Let $E(i)$ be a function that changes functions $V,I,P$. Call $E$ the \term{evolution} function.
\end{enumerate}

Explanation: this formalism allows us to model key aspects of agents separately.

First we separate the agent's state into a cryptographically secured hash-chain part $\chain$ and another part that holds arbitrary data $D$. Then we split the process of updating the state into two steps: 1) the validation of new transactions $t$ through the validation function $V(t,v)$, and 2) the actual change of internal state $S$ (as either $\chain$ or $D$) through the state transition functions $\tau_\chain$ and $\tau_D$. Finally, we distinguish between 1) state transitions triggered by external events, or stimuli, received through $I(t)$, and 2) a node's internal processing $P(x)$ that also results in calling $V$ and $\tau$ with an internally created transaction.

We define some key properties of distributed systems:
\begin{enumerate}
\item Call a set of nodes in $N$ for which any of the functions $\tau,V,P$ and $E$ have the properties of being both reliably known and also known to be identical for that set of nodes: \term{trusted} nodes with respect to the functions so known.
\item Call a channel $C$ with the property that messages in transit can be trusted to arrive exactly as sent: \term{secure}.
\item Call a channel $C$ on which the address $A_n$ of a node $n$ is $A_n=H(pk_n)$, where $pk_n$ is the public key of the node $n$, and on which all messages include a digital signature of the message signed by the sender: \term{authenticated}.

\item Call a data element that is accessible by its hash \term{content addressable}.
\end{enumerate}
For the purposes of this paper we assume untrusted nodes, i.e., independently acting agents solely under their own control, and an insecure channel. We do this because the very \textit{raison d'\^etre} of the cryptographic tools mentioned above is to allow individual nodes to trust the whole system under this assumption. The cryptography immediately makes visible in the state data when any other node in the system uses a version of the functions different from its own. This property is often referred to as a \term{trustless} system. However, because it simply means that the locus of trust has been shifted to the state data, rather than to other nodes, we refer to it as systemic reliance on \term{intrinsic data integrity}. See \ref{sec:integrity} for a detailed discussion of trust in distributed systems.

\subsection{Data-Centric and Agent-Centric Systems}
\label{sec:data-agent}

Using this definition, Bitcoin can be understood as that system \sbtc\ where:
\begin{enumerate}
\item $\forall n,m \in N: \chain_n\eqbang\chain_m$ where $\eqbang$ means \textit{is enforced}.
\item $V(e,v)$: $e$ is a block and $v$ is the output from the ``proof-of-work'' hash-crack algorithm, and $V$ confirms the validity of $v$, and the structure and validity of $e$ according to the double-spend rules\footnote{pointer here}.
\item $I(t,n)$ accepts transactions from clients and adds them to $D$ (the \textit{mempool}) to build a block for later use in triggering $V()$.
\item $P(i)$ is the \textit{mining} process, including the ``proof-of-work'' algorithm, and composes with $V()$ and $\tau_\chain$ when the hash is cracked.
\item $E(i)$ is not formally defined but can be mapped informally to a decision by humans operating the nodes to install new versions of the Bitcoin software.
\end{enumerate}

The first point establishes the central aspect of Bitcoin's (and blockchain applications' in general) strategy for solving or avoiding problems otherwise encountered in decentralized systems, namely by trying to maintain a network state in which all nodes \textbf{should} have the same (local) chain.

By contrast, for \sgit\ there is no such constraint on any $\chain_n$, $\chain_m$ in nodes $n$ and $m$ matching, as git's core intent is to allow different agents to act autonomously and divergently on a shared code-base, which would be impossible if the states always had to match.

Through the lens of the formalism some other aspects of \sgit\ can be understood as follows:
\begin{enumerate}
\item the validation function $V(e,v)$ by default only checks the structural validity of $e$ as a commit object, not its content (though note that git does also support signing of commits, which is also part of the validation)
\item the stimulus function $I(t)$ for \sgit\ consists of the set of git commands available to the user
\item the state transition function $\tau_\chain$ is the internal git function that adds a commit object, and $\tau_\textrm{D}$ is the git function that adds code to the \texttt{index}, triggered by \texttt{add}
\item $E$ is, similarly to \sbtc, not formally defined for \sgit.
\end{enumerate}

We leave a more in-depth application of the formalism to \sgit\ as an exercise for the reader; however, we underscore that the core difference between \sbtc\ and \sgit\ lies in the former's constraint of $\forall n,m \in N: \chain_n\eqbang\chain_m$. One direct consequence of this for \sbtc\ is that as the size of $\mathcal{X}_n$ grows, necessarily all nodes of \sbtc\ must grow in size, whereas this is not necessarily the case for \sgit, and therein lies the core of Bitcoin's scalability issues.

It's not surprising that a data-centric approach was used for Bitcoin.
This comes from the fact that its stated intent was to create digitally transferable ``coins,'' i.e., to model in a distributed digital system that property of matter known as location. On centralized computer systems this doesn't even appear as a problem, because centralized systems have been designed to allow us to think from a data-centric perspective. They allow us to believe in a kind of data objectivity, as if data exists like a physical object sitting someplace, having a location. They allow us to think in terms of an absolute frame -- as if there \textit{is} a correct truth about data and/or time sequence -- and suggest that ``consensus'' should converge on this truth. In fact, this is not a property of information. Data always exists from the vantage point of an observer. It is this fact that makes digitally transferable ``coins'' a \textit{hard problem} in distributed systems, which consist entirely of multiple vantage points by definition.

In the distributed world, events don't happen in the same sequence for all observers. For blockchain specifically, this is the heart of the matter: choosing which block, from all the nodes receiving transactions in different orders, to use for the ``consensus,'' i.e., what single vantage point to enforce on all nodes. Blockchains don't record a universal ordering of events -- they manufacture a single authoritative ordering of events -- by stringing together a tiny fragment of local vantage points into one global record that has passed validation rules.

The use of the word consensus seems at best dubious as a description of a systemic requirement that all nodes carry identical values of $\chain_n$, especially when the algorithm for ensuring that sameness is essentially a digital lottery powered by expensive computation, whose primary design feature is to randomize which node gets to run $V_n$ such that no node has preference as to which $e$ gets added to $\chain_n$.
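To make the ``digital lottery'' concrete, here is a minimal sketch of the hash-crack in Python. All names are illustrative and the difficulty scheme is simplified; this is not Bitcoin's actual implementation.

```python
import hashlib

def hash_crack(block: bytes, difficulty_bits: int) -> int:
    """Try successive nonces until SHA-256(block || nonce) has
    `difficulty_bits` leading zero bits -- the expensive lottery ticket."""
    target = 1 << (256 - difficulty_bits)  # winning hashes fall below this
    nonce = 0
    while True:
        digest = hashlib.sha256(block + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # this node "wins" the right to commit its block
        nonce += 1
```

Each additional difficulty bit doubles the expected number of hashes, so which node finds a winning nonce first is effectively random, weighted by hash rate; this is the randomization of $V_n$ described above.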
The term consensus, as normally used, implies deliberation with regard to differences and work on crafting a perspective that holds for all parties, rather than simply selecting one party's dataset at random. In contrast, as a more agent-centric distributed system, git's \textit{merge} command provides for a process more recognizable as consensus, though it is not automated.

Perhaps a more accurate term for the hash-crack algorithm applied in \sbtc\ would be ``proof-of-luck,'' and for the process itself simply sameness, not consensus. A data-centric viewpoint, which naturally throws out the ``experience'' of all agents in favor of just one, makes it much harder to design systems that engage in processes that actually have the real-world properties of consensus. If the constraint of keeping all nodes' states the same were adopted consciously as a fit for a specific purpose, this would not be particularly problematic. Unfortunately, the legacy of this data-centric viewpoint has been held mostly unconsciously and has been adopted by more generalized distributed computing systems whose intent doesn't specifically include the need to model ``digital matter'' with universally absolute location. While it has the advantage of conceptual simplicity, it also immediately creates scalability issues and, worse, it makes it hard to take advantage of the benefits inherent in the agent-centric approach.

\section{Generalized Distributed Computation}
\label{sec:dist-comp}

The previous section described a general formalism for distributed systems and compared git to Bitcoin as an example of an agent-centric vs. a data-centric distributed system. Neither of these systems, however, provides generalized computation in the sense of being a framework for writing computer programs or creating applications.
So let us add the following constraints to the formalism of \ref{sec:formalism}:

\begin{enumerate}
\item With respect to a machine $M$, some values of $S_n$ can be interpreted as executable code and the results of code execution, and they may be accessible to $M$ and the code. Call such values the \term{machine state}.
\item $\exists t$ and nodes $n$ such that $I_n(t)$ will trigger execution of that code. Call such transaction values \term{calls}.
\end{enumerate}

\subsection{Ethereum}
Ethereum\footnote{https://github.com/ethereum/wiki/wiki/White-Paper} provides the current premier example of generalized distributed computing using the blockchain model. The Ethereum approach comes from an ontology of replicating the data certainty of a single physical computer on top of a stratum of distributed nodes, using the blockchain strategy of creating a single data reality in a cryptographic chain, but committing computations, instead of just monetary transactions as in Bitcoin, into the blocks.

This approach does live up to the constraints listed above as described by Wood \cite{yellowpaper}, where the bulk of that paper can be understood as a specification of a validation function $V_n()$, and the described state transition function $\sigma_{t+1} \equiv \Upsilon(\sigma,T)$ as a specification of how the constraints above are met.

Unfortunately, the data-centric legacy inherited by Ethereum from the blockchain model is immediately observable in its high compute cost\footnote{link to our benchmarking} and difficulty in scaling\footnote{find a scholarly article}.


\subsection{Holochain}
\label{holochain}
We now proceed to describe an agent-centric distributed generalized computing system, where nodes can still confidently participate in the system as a whole even though they are not constrained to maintaining the same chain state as all other nodes.
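Before the formal definition, the core idea can be illustrated with a toy sketch in Python (all names are hypothetical; the actual implementation is in Go): each agent keeps its own source chain, while entries are published into a shared content-addressable space keyed by their hash.

```python
import hashlib, json

def h(data) -> str:
    """Content address: SHA-256 over a canonical JSON encoding."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

class Agent:
    """Toy agent: a local source chain plus a handle on a shared space."""
    def __init__(self, shared: dict):
        self.chain = []       # this agent's own source chain
        self.shared = shared  # toy stand-in for the shared DHT

    def commit(self, entry) -> str:
        prev = h(self.chain[-1]) if self.chain else None  # link to prior record
        self.chain.append({"prev": prev, "entry_hash": h(entry), "entry": entry})
        self.shared[h(entry)] = entry  # publish: key is always H(value)
        return h(entry)

shared = {}
alice, bob = Agent(shared), Agent(shared)
k = alice.commit({"msg": "hello"})
# Bob holds a different (here, empty) chain, yet retrieves Alice's entry by hash:
assert bob.shared[k] == {"msg": "hello"}
assert alice.chain != bob.chain  # no enforced chain equality between agents
```

The sketch omits validation, sharding, and provenance checks, which the formal definition below supplies; the point is only that shared data access does not require shared chain state.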
In broad strokes, a Holochain application consists of a network of agents maintaining a unique source chain of their transactions, paired with a shared space implemented as a validating, monotonic, sharded, distributed hash table (DHT), where every node enforces validation rules on the data in the DHT as well as providing the provenance of data from the source chains where it originated.

Using our formalism, a Holochain-based application \shc\ is defined as:

\begin{enumerate}

\item Call $\chain_n$ the \term{source chain} of $n$.

\item Let $M$ be a virtual machine used to execute code.

\item Let the initial entry of all $\chain_n$ in $N$ be identical and consist of the set \hcdna\ $\{e_1,e_2,\dots,f_1,f_2,\dots,p_1,p_2,\dots\}$ where $e_x$ are definitions of entry types that can be added to the chain, $f_x$ are functions defined as executable on $M$ (which we also refer to as the set $\appfns = \{app_1,app_2,\dots\}$), and $p_x$ are system properties which, among other things, declare the expected operating parameters of the application being specified. For example, the resilience factor as defined below is set as one such property.

\item Let $\hcid_n$ be the second entry of all $\chain_n$ and be a set of the form $\{p,i\}$ where $p$ is the public key and $i$ is identifying information appropriate to the use of this particular \shc. Note that though this entry is of the same format for all $\chain_n$, its content is not the same. Call this entry the \term{agent identity} entry.

\item $\forall e_x \in DNA$ let there be an $app_x \in \appfns$ which can be used to validate transactions that involve entries of type $e_x$. Call this set $F_\mathrm{v}$ or the \term{application validation functions}.

\item Let there be a function $V_\mathrm{sys}(e_x,e,v)$ which checks that $e$ is of the form specified by the entry definition for $e_x \in$ \hcdna. Call this function the \term{system entry validation function}.
\item Let the overall validation function $V(e,v) \equiv \bigvee_x F_\mathrm{v}(e_x)(v) \wedge V_\mathrm{sys}(e_x,e,v)$.

\item Let $F_\mathrm{I}$ be a subset of $\appfns$ distinct from $F_\mathrm{v}$ such that $\forall f_x(t) \in F_\mathrm{I}$ there exists a $t$ such that $I(t)$ will trigger $f_x(t)$. Call the functions in $F_\mathrm{I}$ the \term{exposed functions}.

\item Call any functions in $\appfns$ not in $F_\mathrm{v}$ or $F_\mathrm{I}$ \term{internal functions} and allow them to be called by other functions.
\item Let the channel $C$ be \term{authenticated}.

\item Let $DHT$ define a distributed hash table on an authenticated channel as follows:
\begin{enumerate}

\item Let $\dhtstate$ be a set $\{\delta_1,\delta_2,\dots\}$ where $\delta_x$ is a set $\{key,value\}$ and $key$ is always the hash $H(value)$ of $value$. Call $\dhtstate$ the \term{DHT state}.
\item Let $\dhtfns$ be the set of functions $\{dht_\text{put},dht_\text{get}\}$ where:
\begin{enumerate}
\item $dht_\text{put}(\delta_\text{key,value})$ adds $\delta_\text{key,value}$ to $\dhtstate$
\item $dht_\text{get}(key) = value$ of $\delta_\text{key,value}$ in $\dhtstate$
\end{enumerate}
\item \label{routable} Assume $x,y \in N$ and $\delta_i \in \dhtstate_x$ but $\delta_i \notin \dhtstate_y$. Allow that when $y$ calls $dht_\text{get}(key)$, $\delta_i$ will be retrieved from $x$ over channel $C$ and added to $\dhtstate_y$.
\end{enumerate}
DHTs are sufficiently mature that there are a number of ways to ensure property \ref{routable}. For our current alpha version we use a modified version of \cite{kademlia} as implemented in \cite{libp2p}.

\item Let $\hcdht$ augment $DHT$ as follows:
\begin{enumerate}

\item $\forall \delta_\text{key,value} \in \dhtstate$ constrain $value$ to be of an entry type as defined in \hcdna.
Furthermore, enforce that any function call $dht_x(y)$ which modifies $\dhtstate$ also uses $F_\mathrm{v}(y)$ to validate $y$ and records whether it is valid. Note that this validation phase may include contacting the source nodes involved in generating $y$ to gather more information about the context of the transaction; see \ref{sec:membandprov}.

\item Enforce that all elements of $\dhtstate$ only be changed monotonically, that is, elements $\delta$ can only be added to $\dhtstate$, not removed.

\item Include in $\dhtfns$ the functions defined in \ref{apdx:dhtfn}.

\item Allow the sets $\delta \in \dhtstate$ to also include more elements as defined in \ref{apdx:dhtfn}.

\item Let $d(x,y)$ be a \textit{symmetric} and \textit{unidirectional} distance metric within the hash space defined by $H$, as for example the XOR metric defined in \cite{kademlia}. Note that this metric can be applied between entries and nodes alike, since the addresses of both are values of the same hash function $H$ (i.e., $\delta_{key}=H(\delta_{value})$ and $A_n=H(pk_n)$).

\item Let $r$ be a parameter of $\hcdht$ to be set depending on the characteristics deemed beneficial for maintaining multiple copies of entries in the $DHT$ for the given application.
Call $r$ the \term{resilience factor}.

\label{dht:metrics}
\item Allow that each node can maintain a set $M = \{m_n, \dots \}$ of metrics $m_n$ about other nodes, where each $m_n$ contains both the node's direct experience of $n$ with respect to that metric, as well as the experience of other nodes of $n$. Enforce that one such metric kept is \textit{uptime}, which keeps track of the percentage of time a node is experienced to be available. Call the process of nodes sharing these metrics \term{gossip} and refer to \ref{sec:gossip} for details.
\item Enforce that $\forall \delta \in \dhtstate_n$ each node $n$ maintains a set $V_\delta = \{n_1,\dots,n_q\}$ of the $q$ closest nodes to $\delta$ as seen from $n$, which are \textit{expected by $n$} to also hold $\delta$. Resiliency is maintained by taking into account node uptimes and choosing the value of $q$ so that:
\begin{equation}
\sum_{i=1}^{q} uptime(n_i)\geq r
\end{equation}
with $uptime(n) \in [0,1]$.

Call the union of such sets $V_\delta$, from a given node's perspective, the \term{overlap list}, and also note that $q\geq r$.

\item \label{hc:shards} Allow every node $n$ to discard every $\delta_x \in \dhtstate_n$ if the number of closer (with regard to $d(x,y)$) nodes is greater than $q$
(i.e., if other nodes are able to construct their $V_\delta$ sets without including $n$, which in turn means there are enough other nodes responsible for holding $\delta$ in their $\dhtstate_m$ to have the system meet the resilience set by $r$ even without $n$ participating in storing $\delta$).
Note that this results in the network adapting to changes in topology and DHT state migrations by regulating the number of network-wide redundant copies of all $\delta_i\in\dhtstate$ to match $r$ according to node uptime.

\end{enumerate}

Call $\hcdht$ a \term{validating}, \term{monotonic}, \term{sharded} DHT.

\item $\forall n \in N$ assume $n$ implements $\hcdht$, that is: $\dhtstate$ is a subset of $D$ (the non-hash-chain state data), and $\dhtfns$ are available to $n$, though note that these functions are NOT directly available to the functions $\appfns$ defined in \hcdna.

\item Let $\sysfns$ be the set of functions $\{sys_\text{commit},sys_\text{get}, \dots\}$ where:
\begin{enumerate}
\item $sys_\text{commit}(e)$ uses the system validation function $V(e,v)$ to add $e$ to $\chain$, and if successful calls $dht_\text{put}(H(e),e)$.
\item $sys_\text{get}(k) = dht_\text{get}(k)$.
\item See additional system functions defined in \ref{apdx:sysfn}.
\end{enumerate}

\item Allow the functions in $\appfns$ defined in the \hcdna\ to call the functions in $\sysfns$.
\item Let $m$ be an arbitrary message. Include in $\sysfns$ the function $sys_\text{send}(A_\text{to},m)$ which, when called on $n_\text{from}$, will trigger the function $app_\text{receive}(A_\text{from},m)$ in the \hcdna\ on the node $n_\text{to}$. Call this mechanism \term{node-to-node messaging}.
\item \label{private} Allow that the definition of entries in \hcdna\ can mark entry types as \term{private}. Enforce that if an entry $\sigma_x$ is of such a type then $\sigma_x \notin \dhtstate$. Note, however, that entries of such a type can be sent as node-to-node messages.
\item Let the system processing function $P(i)$ be a set of functions in $\appfns$ to be registered in the system as callbacks based on various criteria, e.g., notification of rejected puts to the DHT, passage of time, etc.
\end{enumerate}

\subsection{Systemic Integrity Through Validation}
\label{sec:integrity}

The appeal of the data-centric approach to distributed computing comes from the fact that if you can prove that all nodes reliably have the same data, then that provides a strong general basis from which to prove the integrity of the system as a whole. In the case of Bitcoin, the $\chain$ holds the transactions and the unspent transaction outputs, which allows nodes to verify future transactions against double-spending. In the case of Ethereum, $\chain$ holds what amounts to pointers to machine state. Proving the consistency across all nodes of those data sets is fundamental to the integrity of those systems.
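The role that identical chain state plays in such proofs can be made concrete with a small sketch in Python (illustrative only, not any system's actual code): two replicas of a hash-chain agree exactly when their accumulated header hashes agree, so a single comparison detects any divergence.

```python
import hashlib

def chain_head(entries) -> str:
    """Fold entries into a running header hash; the final value commits
    to every entry and to their order (toy stand-in for a chain head)."""
    head = ""
    for e in entries:
        head = hashlib.sha256((head + e).encode()).hexdigest()
    return head

a = ["tx1", "tx2", "tx3"]
b = ["tx1", "tx2", "tx3"]
c = ["tx1", "tampered", "tx3"]

assert chain_head(a) == chain_head(b)  # identical chains: heads match
assert chain_head(a) != chain_head(c)  # one divergent entry changes the head
```

A node therefore needs only another node's head hash to decide whether that node's chain matches its own, which is exactly the comparison underlying the enforcement discussed next.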
However, because we have started with the assumption (see \ref{sec:formalism}) of distributed systems of independently acting agents, any \textit{proof} of $\forall n,m \in N: \chain_n\eqbang\chain_m$ in a blockchain-based system is better understood as a \textit{choice} (hence our use of the $\eqbang$), in that nodes use their agency to decide when to stop interacting with other nodes based on detecting that the $\chain$ state no longer matches. This might also be called ``proof by enforcement,'' and is also appropriately known as a \term{fork}, because essentially it results in partitioning of the network.

The heart of the matter has to do with the trust any single agent has in the system. In \cite{yellowpaper} Section 1.1 (Driving Factors) we read:
\begin{quote}
Overall, I wish to provide a system such that users can be guaranteed that no matter with which other individuals, systems or organizations they interact, they can do so with absolute confidence in the possible outcomes and how those outcomes might come about.
\end{quote}

The idea of ``absolute confidence'' here seems important, and we attempt to understand it more formally and generally for distributed systems.

\begin{enumerate}
\item Let $\Psi_\alpha$ be a measure of the confidence an agent has in various aspects of the system it participates in, where $0 \leq \Psi_\alpha \leq 1$, 0 represents no confidence, and 1 represents absolute confidence.
\item Let $R_n = \{\alpha_1,\alpha_2,\dots\}$ define a set of aspects about the system with which an agent $n \in N$ measures confidence. Call $R_n$ the \term{requirements} of $n$ with respect to $\Omega$.
\item Let $\varepsilon_n(\alpha)$ be a thresholding function for node $n \in N$ with respect to $\alpha$ such that when $\Psi_\alpha < \varepsilon(\alpha)$ then $n$ will either stop participating in the system, or reject the participation of others (resulting in a fork).
\item Let $R_\mathrm{A}$ and $R_\mathrm{C}$ be a partition of $R_n$ where
\begin{equation}
\begin{split}
\forall \alpha \in R_\mathrm{A}:\varepsilon(\alpha)=1\\
\forall \alpha \in R_\mathrm{C}:\varepsilon(\alpha)<1
\end{split}
\end{equation}
so any value of $\Psi \neq 1$ is rejected in $R_\mathrm{A}$ and any value $\Psi < \varepsilon(\alpha)$ is rejected in $R_\mathrm{C}$. Call $R_\mathrm{A}$ the \term{absolute requirements} and $R_\mathrm{C}$ the \term{considered requirements}.
\end{enumerate}

So we have formally separated system characteristics that we have absolute confidence in ($R_\mathrm{A}$) from those we only have considered confidence in ($R_\mathrm{C}$). Still unclear is how to measure a concrete confidence level $\Psi_\alpha$. In real-world contexts and for real-world decisions, confidence is mainly dependent on a (human) agent's vantage point, set of data at hand, and maybe even intuition. Thus we find it more adequate to call it a soft criterion. In order to comprehend this concept objectively and relate it to the notion conveyed by Wood in the quote above, we proceed by defining the measure of confidence in an aspect $\alpha$ as the conditional probability of it being the case in a given context:
\begin{equation}
\Psi_\alpha \equiv \mathcal{P}(\alpha | \mathcal{C})
\end{equation}
where the context $\mathcal{C}$ models all other information available to the agent, including basic and intuitive assumptions.

Consider the fundamental example of cryptographically signed messages with asymmetric keys as applied throughout the field of cryptographic systems (essentially what coins the term crypto-currency).
The central aspect in this context we call $\alpha_{signature}$: the ability to \textit{know with certainty} that a given message's real author $Author_{real}$ is the same agent as indicated, solely via locally available data, in the message's meta information through the cryptographic signature $Author_{local}$. We gain this confidence because we deem it \textit{very hard} for any agent not in possession of the private key to create a valid signature for a given message.
\begin{equation}
\alpha_{signature} \equiv Author_{real} = Author_{local}
\end{equation}

The appeal of this aspect is that we can check authorship locally, i.e., without the need for a third party or a direct trusted communication channel to the real author.
But the confidence in this aspect of a given cryptographic system depends on the context $\mathcal{C}$:
\begin{equation}
\Psi_{signature} = \mathcal{P}(Author_{real} = Author_{local} | \mathcal{C})
\end{equation}

If we constrain the context to exclude the possibility of an adversary gaining access to an agent's private key, and also exclude the possible (future) existence of computing devices or algorithms that could easily calculate or brute-force the key, we might then assign a (constructed) confidence level of 1, i.e., ``absolute confidence''. Without such constraints on $\mathcal{C}$, we must admit that $\Psi_{signature}<1$, as real-world events, for instance the Mt.~Gox hack of 2014\footnote{``Most or all of the missing bitcoins were stolen straight out of the Mt. Gox hot wallet over time, beginning in late 2011'' \cite{mt-gox}}, make clear.

We describe these relationships in such detail in order to point out that any set $R_A$ of \textit{absolute requirements} cannot reach beyond trivial statements: statements about the content and integrity of the local state of the agent itself.
Following Descartes' way of questioning the confidence in every thought, we project his famous statement \textit{cogito ergo sum} into the reference frame of multi-agent systems by stating: \textbf{Agents can only have honest confidence in the fact that they perceive a certain stimulus to be present and in whether any particular abstract a priori model matches that stimulus without contradiction,} i.e., that an agent sees a certain piece of data and that it \textit{is possible to interpret it in a certain way}. Every conclusion drawn a posteriori through the application of sophisticated models of the context depends on assumptions about the context that are inherent to the model. This is the heart of the agent-centric outlook, and what we claim must always be taken into account in the design of decentralized multi-agent systems: any aspect of the system as a whole that includes assumptions about other agents and non-local events must be in $R_C$, i.e., have an a priori confidence of $\Psi<1$. Facing this truth about multi-agent systems, we find little value in trying to force an absolute truth $\forall n,m \in N: \chain_n\eqbang\chain_m$, and we instead frame the problem as:
\\
\begin{quote}
We wish to provide generalized means by which decentralized multi-agent systems can be built so that:
\begin{enumerate}
\item fit-for-purpose solutions can be applied in order to optimize for application-contextualized confidences $\Psi_\alpha$,
\item violation of any threshold $\varepsilon(\alpha)$ through the actions of other agents can be detected and managed by any agent, such that
\item system integrity is maintained at any point in time or, when it is not, there is a path to regain it (see \ref{sec:evo}).
\end{enumerate}
\end{quote}

We perceive the agent-centric solution to these requirements to be the holographic management of system integrity within every agent/node of the system through application-specific validation routines. These sets of validation rules lie at the heart of every decentralized application, and they vary across applications according to context. Every agent carefully keeps track of its representation of the portion of reality that is of importance to it, within the context of a given application that has to manage the trade-off between high confidence thresholds $\varepsilon(\alpha)$ and a low need for resources and complexity.

For example, consider two different use cases of transactions:
\begin{enumerate}
\item receipt of an email message, where we are trying to validate whether or not it is spam, and
\item commit of a monetary transaction, where we are trying to validate it against double-spend.
\end{enumerate}
These contexts have different consequences that an agent may wish to evaluate differently, and the agent may be willing to expend differing levels of resources to validate them. We designed Holochain to allow such validation functions to be set contextually per application and to expose these contexts explicitly. Thus, one could conceivably build a Holochain application that deliberately makes choices in its validation functions to implement either all or some of the characteristics of Blockchains. Holochain, therefore, can be understood as a framework that opens up a spectrum of decentralized application architectures, in which Blockchain happens to be one specific instance at one end of this spectrum.

In the following sections we show what categories of validation algorithms exist and how these can be stacked on top of each other in order to build decentralized systems that are able to maintain integrity without introducing an absolute truth that every agent would be forced to accept or consider.
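The requirements formalism above can be sketched in a few lines of code. The data structures and names here are purely illustrative (not part of any Holochain API): a hypothetical agent holds per-aspect thresholds $\varepsilon(\alpha)$, aspects with $\varepsilon=1$ form the absolute requirements $R_A$ and the rest the considered requirements $R_C$, and the agent keeps participating only while every $\Psi_\alpha$ clears its threshold.

```python
# Illustrative sketch of the requirements formalism (not a Holochain API).
# Each aspect alpha has a confidence Psi in [0, 1] and a threshold epsilon.

def partition_requirements(requirements):
    """Split aspects into absolute (epsilon == 1) and considered (epsilon < 1)."""
    absolute = {a for a, eps in requirements.items() if eps == 1.0}
    considered = {a for a, eps in requirements.items() if eps < 1.0}
    return absolute, considered

def keeps_participating(requirements, confidence):
    """An agent stays in the system only while every aspect's confidence
    Psi_alpha meets its threshold epsilon(alpha)."""
    return all(confidence[a] >= eps for a, eps in requirements.items())

requirements = {
    "local_chain_integrity": 1.0,   # in R_A: absolute requirement
    "signature_validity": 0.999,    # in R_C: considered requirement
    "peer_honesty": 0.9,            # in R_C
}
confidence = {
    "local_chain_integrity": 1.0,
    "signature_validity": 0.9999,
    "peer_honesty": 0.95,
}

r_a, r_c = partition_requirements(requirements)
print(sorted(r_a))                                    # ['local_chain_integrity']
print(keeps_participating(requirements, confidence))  # True

# A drop in confidence below a threshold triggers exit or fork:
confidence["peer_honesty"] = 0.5
print(keeps_participating(requirements, confidence))  # False
```

The point of the sketch is only the shape of the decision: detecting $\Psi_\alpha < \varepsilon(\alpha)$ is what turns a confidence measure into the participation choice discussed above.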
\subsubsection{Intrinsic Data Integrity}
\label{sec:intrinsic}
All but the most low-level routines utilize non-trivial, structured data types.
Structured implies the existence of a model describing how to interpret raw bits as an instance of a type and how pieces of the structure relate to each other.
Often this includes certain assumptions about the set of possible values.
Certain value combinations might not be meaningful or might violate the intrinsic integrity of this data type.

Consider the example of a cryptographically signed message $m=\{body, signature, author\}$,
where $author$ is given in the form of their public key.
This data type conveys the assumption that the three elements $body$, $signature$ and $author$ correspond to each other
as constrained by the cryptographic algorithm that is assumed to be determined through the definition of this type.
The intrinsic data integrity of a given instance can be validated just by looking at the data itself and checking the signature by
applying the cryptographic algorithm that constitutes the central part of the type's a priori model.
The validation yields a result $\in \{true,false\}$, which means that the confidence in the intrinsic data integrity is absolute, i.e., $\Psi_{intrinsic}=1$.

Generally, \textbf{we define the intrinsic data integrity} of a transaction type $\phi$ as an aspect
$\alpha_{\phi,intrinsic}\in R_A$, expressed through the existence of a deterministic and local
validation function $V_\alpha(t)$ for transactions $t\in\phi$ that does not depend on any inputs
other than $t$ itself.

Note how the intrinsic data integrity of the message example above does not make any assumptions about any message's real author, as the aspect $\alpha_{signature}$ from the previous section does.
With this definition, we focus on aspects that make no claims about system properties non-local to the agent under consideration, which roots the sequence of inferences that constitutes the validity, and therefore the confidence, of a system's high-level aspects and integrity in consistent environmental inputs.

\subsubsection{Membranes \& Provenance}
\label{sec:membandprov}

Distributed systems must rely on mechanisms to restrict participation by nodes in processes that, without such restriction, would compromise systemic integrity.
Systems where the restrictions are based on the nodes' identity, whether declared by type or authority or collected from the history of the nodes' behaviors, are known as \term{permissioned} \cite{CaaS}.
Systems where these restrictions are not based on properties of the nodes themselves are known as \term{permissionless}.
In permissionless multi-agent systems, a principal threat to systemic integrity comes from \textit{Sybil-Attacks} \cite{sybil}, where an adversary tries to
overcome the system's validation rules by spawning a large number of compromised
nodes.\\

However, for both permissioned and permissionless systems, mechanisms exist to gate
participation.
\label{mebandprov:membfn}
Formally: \\

Let $M(n,\phi,z)$ be a binary function that evaluates whether transactions of type $\phi$ submitted by $n\in N$ are to be accepted, where $z$ is any arbitrary extra information needed to make that evaluation. Call $M$ the \term{membrane} function, and note that it will be a component of the validation function $V(t,v)$ from the initial formalism (\ref{formalism:validation}).\\

In the case of \sbtc\ and $\Omega_{ethereum}$, $M$ ignores the value of $n$ and makes its determination solely on whether $z$ demonstrates the ``proof'' in proof-of-\textit{X}, be it \textit{work} or \textit{stake}, which is sufficient gating to protect against Sybil-Attacks.
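To make the two layers concrete, here is a minimal toy sketch of a purely local, deterministic intrinsic-integrity check $V(t)$ alongside a proof-of-work style membrane function $M(n,\phi,z)$. All names are illustrative, and the hash-of-body check stands in for a real signature verification; the point is only that $V$ depends on nothing but $t$, while $M$ gates on the extra data $z$:

```python
import hashlib

def v_intrinsic(t):
    """Deterministic, local validation: depends only on the transaction t.
    Here 'integrity' means the entry's declared hash matches its body and
    author (a toy stand-in for verifying a signature against the public key)."""
    if not {"body", "author", "hash"}.issubset(t):
        return False
    digest = hashlib.sha256((t["body"] + t["author"]).encode()).hexdigest()
    return t["hash"] == digest

def m_membrane(n, phi, z, difficulty=2):
    """Proof-of-work style membrane M(n, phi, z): like Bitcoin's gating it
    ignores who n is and accepts any submitter whose extra data z (a nonce)
    hashes, together with node id and type, below a difficulty target."""
    h = hashlib.sha256(f"{n}:{phi}:{z}".encode()).hexdigest()
    return h.startswith("0" * difficulty)

# Intrinsic check is purely local and yields an absolute true/false:
tx = {"body": "hello", "author": "pubkey1"}
tx["hash"] = hashlib.sha256((tx["body"] + tx["author"]).encode()).hexdigest()
print(v_intrinsic(tx))   # True
tx["body"] = "tampered"
print(v_intrinsic(tx))   # False

# Crossing the membrane requires finding a qualifying nonce:
nonce = next(z for z in range(10**6) if m_membrane("node42", "post", z))
print(m_membrane("node42", "post", nonce))  # True
```

Note the asymmetry the section describes: $V$ contributes to $R_A$ because it is decidable from $t$ alone, while $M$ encodes a policy choice about who may participate.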
Giving up the data-centric fallacy of forcing one absolute truth $\forall n,m \in N: \chain_n\eqbang\chain_m$ reveals that we cannot discard transaction provenance.
Agent-centric distributed systems must instead rely on two central facts about data:
\begin{enumerate}
\item it originates from a source, and
\item its historical sequence is local to that source.
\end{enumerate}
For this reason, \shc\ splits the system state data into two parts:
\begin{enumerate}
\item each node is responsible for maintaining its own entire $\chain_n$ or \term{source chain} and for being ready to confirm that state to other nodes when asked, and
\item all nodes are responsible for sharing portions of other nodes' transactions and those transactions' meta data in their \textbf{DHT shard}; the meta data includes validity status, source, and optionally the source's chain headers, which provide historical sequence.
\end{enumerate}

Thus, the DHT provides distributed access to others' transactions and their evaluations of the validity of those transactions.
This resembles how knowledge gets constructed within social fields and through interaction with others, as described by the sociological theory of \textit{social constructivism}.

The properties of the DHT in conjunction with the hash function provide us with a
deterministically defined set of nodes, i.e., a neighborhood, for every transaction.
One cannot easily construct a transaction such that it lands in a given neighborhood.
Formally:
\begin{equation}
\begin{split}
\forall t\in\dhtstate: \exists \eta: \mathcal{H}\rightarrow N^r\\
\eta(H(t))=(n_1, n_2, \dots, n_r)
\end{split}
\end{equation}
where the function $\eta$ maps from the range $\mathcal{H}$ of the hash function $H$
to the $r$ nodes that keep the $r$ redundant shards of the given transaction $t$ (see \ref{hc:shards}).
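A Kademlia-style $\eta$ can be sketched as picking the $r$ node ids closest to the entry's hash under the XOR metric. This is a deliberate simplification of the actual sharding logic, with toy integer ids standing in for node addresses:

```python
import hashlib

def H(data):
    """Hash an entry (or node name) to an integer address."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big")

def eta(entry_hash, node_ids, r=3):
    """Map a hash to the r node ids closest to it under the XOR metric:
    the deterministic neighborhood that holds the r redundant shards."""
    return sorted(node_ids, key=lambda n: n ^ entry_hash)[:r]

nodes = [H(f"node-{i}") for i in range(20)]
neighborhood = eta(H("some transaction"), nodes)
print(len(neighborhood))  # 3
# Deterministic: every node computes the same neighborhood for the same entry.
print(eta(H("some transaction"), nodes) == neighborhood)  # True
```

Because the entry's position is fixed by $H$, an author cannot cheaply steer a transaction into a neighborhood of its choosing, which is what makes the neighborhood an unbiased set of third-party validators.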
Having the list of nodes $\eta(H(t))$ allows an agent to compare third-party viewpoints regarding $t$ with its own and with that of the transaction's source(s).
The randomization of the hash function $H$ ensures that those viewpoints represent
an unbiased sample.
$r$ can be adjusted depending on the application's constraints and the chosen trade-off between costs and system integrity.
These properties provide sufficient infrastructure to create system integrity
by detecting nodes that don't play by the rules, such as by changing the history or
content of their source chain.
In appendix \ref{apdx:trust} we detail tooling appropriate for different contexts,
including ones where detailed analysis of source chain history is required,
for example financial transaction auditing.

Depending on the application's domain, neighborhoods could become vulnerable to Sybil-Attacks, because a sufficiently large percentage of compromised nodes could introduce bias into the sample used by an agent to evaluate a given transaction.
Holochain allows applications to handle Sybil-Attacks through domain-specific
membrane functions.
Because we chose to inherently model agency within the system,
permission can be granted or declined in a programmatic and decentralized manner,
thus allowing applications to land appropriately on the spectrum between permissioned and permissionless.

In appendix \ref{apdx:membranes}, we provide some membrane schemes that can be
chosen either for the outer
membrane of an application, which nodes have to cross in order to talk to
any other node within the application, or for any secondary membrane inside
the application.
The latter means that nodes could join permissionlessly and participate without further condition in aspects
of the application that are not integrity-critical,
but must meet certain criteria in order to pass the membrane into application-crucial
validation.

Thus, Holochain applications maintain systemic integrity without introducing
consensus, and therefore without (computationally expensive) absolute truth, because 1) any single node uses provenance to independently verify any single transaction with the sources involved in that transaction, and 2) each Holochain application runs independently of all others, so they are inherently permissioned by application-specific rules for joining and continuing participation in that application's network.
Both properties mean that any given Holochain application can tune the expense of that validation to a contextually appropriate level.


\subsubsection{Gossip \& World Model}
\label{sec:gossip}

So far, we have focused on those parts of the validation function $V$ used to verify elements of $\chain$. However, maintaining system integrity in distributed systems also requires that nodes have mechanisms for sharing information about nodes that have broken the validation rules, so that those nodes can be excluded from participation. There exist, additionally, forms of bad-acting that live not in the content of a transaction but in patterns of transacting that are detrimental to the system, for example denial-of-service attacks.

Holochain uses gossip for nodes to share information about their own experience of the behavior of other nodes. Informally, we call this information the node's \term{world model}. In this section we describe the nature of Holochain's gossip protocols and how they build and maintain a node's world model.
In \ref{dht:metrics} we described one such part of the world model, the \textit{uptime} metric, and how it is used for maintaining redundant copies of entries. In \ref{mebandprov:membfn} we defined a membrane function that determines whether a node shall accept a transaction and allowed that function to take arbitrary data $z$. The main source of that data is this world model.

More formally:

\begin{enumerate}
\item Recall that each node maintains a set $M$ of metrics $m$ about other nodes it knows about. Note that in terms of our formalism, this world model is part of each node's non-chain state data $D$.
\item Let $m$ be a tuple of tuples $((\mu,c)_\text{self},(\mu,c)_\text{others})$ which records an experience $\mu$ of a node with respect to a given metric and a confidence $c$ in that experience, both as directly experienced and as ``hearsay'' received from other nodes.
\item Allow a class of entries stored in $\chain_n$ to be used also as a metric $m_w$ acting as a signed declaration of the experience of $n$ regarding some other node. Call such entries \term{warrants}. Warrants allow us to use the standard tooling of Holochain to make provenance-based, verifiable claims about other nodes in the network, which propagate orthogonally to the usual DHT methods, via gossip, to nodes that need to ``hear'' about these claims so as to make decisions about interacting with nodes.
\item $\forall m \in M$ let the function $G_\text{with}(m)$ return a set of nodes important for a node to gossip \textbf{with}, defined by a probabilistic weighting that information received from those nodes will result in changing $m_\text{others}$.
\item $\forall m \in M$ let the function $G_\text{about}(m)$ return a set of nodes important for a node to gossip \textbf{about}, defined by the properties of $m$.
\item Define subsets of $G_\text{with}(m)$ according to a correlation with what it means to have a low vs.
high confidence value $c$:
\begin{enumerate}
\item \textbf{Pull}: consisting of nodes about which a low confidence implies a need for more frequent gossip to raise the node's confidence. Such nodes include those which, with respect to the given node, hold its published entries, hold entries it is also responsible for holding, are close to the node (i.e., in its lowest k-bucket), or which it relies on for routing (i.e., a subset of each k-bucket).
\item \textbf{Push}: consisting of nodes about which a high confidence implies a need for more frequent gossip to spread the information about that node. Such nodes include ones which a given node has high confidence are bad actors, i.e., it has directly experienced bad acting, or has received bad-actor gossip from nodes that it has high confidence can make that bad-actor evaluation.
\end{enumerate}
\item \todo{TODO: describe a gossip trigger function based on the pull vs. push distinction that demonstrates when gossip happens}
\end{enumerate}

The computational costs of gossip depend on the set of metrics that a particular application needs to keep track of to maintain system integrity. For an application with a very strong membership membrane, perhaps only $uptime$ metrics are necessary to gossip about to balance resilience. But this too may depend on a priori knowledge of the nodes involved in the application. Applications with very loose membership membranes may have a substantial number of metrics, and complex membrane functions using those metrics may require substantial compute effort. The Holochain design intentionally leaves these parameters only loosely specified so that applications can be built fit for purpose.
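The pull/push split above can be sketched as follows. The metric records, thresholds, and selection rules are illustrative stand-ins for whatever an application actually tracks, not the real gossip protocol:

```python
# Illustrative world-model sketch (not the actual gossip protocol).
# Each peer carries a metric record: a directly experienced
# (experience, confidence) pair plus hearsay received from others.

def pull_set(world_model, threshold=0.5):
    """Nodes to gossip WITH more often: low confidence in our direct
    experience means we need fresher information about them."""
    return {peer for peer, m in world_model.items()
            if m["self"][1] < threshold}

def push_set(world_model, threshold=0.9):
    """Nodes to gossip ABOUT more often: high confidence that a peer
    is a bad actor means the (warranted) claim should be spread."""
    return {peer for peer, m in world_model.items()
            if m["self"][0] == "bad_actor" and m["self"][1] >= threshold}

world_model = {
    # peer: {"self": (experience, confidence), "others": [hearsay records]}
    "peer_a": {"self": ("ok", 0.2), "others": []},          # barely known
    "peer_b": {"self": ("ok", 0.95), "others": []},         # well known, fine
    "peer_c": {"self": ("bad_actor", 0.97), "others": []},  # warranted
}

print(pull_set(world_model))   # {'peer_a'}
print(push_set(world_model))   # {'peer_c'}
```

The asymmetry matches the text: uncertainty drives pulling fresh information in, while well-founded bad-actor evidence drives pushing warnings out.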
\subsubsection{CALM \& Logical Monotonicity}
\todo{TODO: description of CALM in multi-agent systems, and how it works in our case}


\section{Complexity In Distributed Systems}
\label{sec:complexity}

In this section we discuss the complexity of our proposed architecture for decentralized systems and compare it to the increasingly adopted Blockchain pattern.

Formally describing the complexity of decentralized multi-agent systems is a non-trivial task for which more complex approaches have been suggested \cite{multi-agent-complex}.
This might be the reason why there is unclarity and misunderstanding within communities discussing the complexity and scalability of Bitcoin, for example \cite{bitcoin-complex}.

In order to enable a ball-park comparison between our approach and the current status quo in decentralized application architecture, we proceed by modeling the worst-case time complexity both for a single node $\Omega_{SystemNode}$ and for the whole system $\Omega_{System}$, each as a function of the number of state transitions (i.e., transactions) $n$ and the number of nodes in the system $m$.

\subsection{Bitcoin}
Let $\Omega_{Bitcoin}$ be the Bitcoin network, $n$ be the number of transactions, and $m$ be the number of full validating nodes (i.e., \textit{miners}\footnote{For the sake of simplicity, and to focus on a lower bound of the system's complexity, we neglect all nodes that are not crucial for the operation of the network, such as light clients and clients not involved in the process of validation}) within $\Omega_{Bitcoin}$.

For every new transaction being issued, any given node will have to check the transaction's signature (among other checks, see
\cite{bitcoin-protocol}), and especially check that the outputs this transaction spends are not used in any other transaction, in order to reject double-spends, resulting in a time complexity of
\begin{equation}
c+n
\end{equation}
per transaction. The time complexity in big-O notation per node, as a function of the number of transactions, is therefore:
\begin{equation}
\Omega_{BitcoinNode}\in O(n^2)
\end{equation}
The complexity handled by one Bitcoin node does not\footnote{not inherently; that is, more participants will result in more transactions, but we model both values as separate parameters} depend on $m$, the total number of nodes in the system. But since every node has to validate exactly the same set of transactions, the system's time complexity as a function of the number of transactions and the number of nodes is
\begin{equation}
\Omega_{Bitcoin}\in O(n^2m)
\end{equation}

Note that this quadratic time complexity of Bitcoin's transaction validation process is what creates its main bottleneck, as it reduces the network's gossip bandwidth, since every node has to validate every transaction before passing it along. In order to still have an average transaction flood through at least $90\%$ of the network, block size and block time can't be pushed beyond 4MB and 12s respectively, according to \cite{scaling}.

\subsection{Ethereum}
Let $\Omega_{Ethereum}$ be the Ethereum main network, $n$ be the number of transactions, and $m$ the number of full clients within the network.
The time complexity of processing a single transaction on a single node is a function of the code whose execution is triggered by the given transaction, plus a constant:
\begin{equation}
c+f_{tx_i}(n,m)
\end{equation}
Similarly to Bitcoin, and as a result of the Blockchain design decision to maintain one single state ($\forall n,m \in N: \chain_n\eqbang\chain_m$; \textit{``This is to be avoided at all costs as the uncertainty that would ensue would likely kill all confidence in the entire system.''} \cite{yellowpaper}), every node has to process every transaction being sent, resulting in a time complexity per node of
\begin{equation}
c+\sum_{i=0}^n f_{tx_i}(n,m)
\end{equation}
that is
\begin{equation}
\Omega_{EthereumNode} \in O(n \cdot f_{avg}(n,m))
\end{equation}
where users are incentivized to keep the average complexity $f_{avg}(n,m)$
of the code being run by Ethereum small,
since execution has to be paid for in gas and is subject to restrictions such as the \textit{block gas limit}.
In other words, because the complexity $\sum_{i=0}^n f_{tx_i}(n,m)$ is burdened upon all nodes of the system, other systemic properties have to keep users from running complex code on Ethereum, so as not to bump into the network's limits.

Again, since every node has to process the same set of all transactions, the time complexity of the whole system is that of one node multiplied by $m$:
\begin{equation}
\Omega_{Ethereum} \in O(nm\cdot f_{avg}(n,m))
\end{equation}

\subsection{Blockchain}
\label{sec:complex:blockchain}
Both examples of Blockchain systems above require a non-trivial computational overhead in order to work at all: the proof-of-work, hash-crack process also called \textit{mining}. Since this overhead is a function of neither the number of transactions nor, directly, the number of nodes, it is often omitted in complexity analysis.
With the total energy consumption of all Bitcoin miners today being greater than that of the country of Iceland \cite{mining-consumption}, neglecting the complexity of Blockchain's consensus algorithm seems like a silly mistake.

Blockchains set the block time, the average time between two blocks, as a fixed parameter that the system keeps in homeostasis by adjusting the hash-crack's difficulty according to the network's total hash-rate. For a given network with a given set of mining nodes and a given total hash-rate, the complexity of the hash-crack is constant. But as the system grows and more miners come on-line, which increases the network's total hash-rate, the difficulty needs to increase in order to keep the average block time constant.

With this approach, the benefit of a higher total hash-rate $x_{HR}$ is an increased difficulty for an adversary to influence the system by creating biased blocks (which would enable that party to carry out double-spend attacks). That is why Blockchains have to subsidize mining: they depend on a high $x_{HR}$ to make it economically infeasible for an attacker to overpower the trusted miners.

So, there is a direct relationship between the network's total trusted hash-rate and its level of security against mining-power attacks.
This means that the confidence $\Psi_{Blockchain}$ any agent can have in the integrity of the system is a function of the system's hash-rate $x_{HR}$, and more precisely, of the cost/work $cost(x_{HR})$ needed to provide it.
Looking only at a certain transaction $t$, and assuming any attacker acts economically rationally, the confidence in $t$ being added to all $\chain_n$ has an upper bound of
\begin{equation}
\Psi_{Blockchain}(t) < \min\left(1, \frac{cost(x_{HR})}{value(t)}\right)
\end{equation}

In order to keep this confidence unconstrained by the mining process, and therefore by the architecture of Blockchain itself, $cost(x_{HR})$ (which includes the setup of mining hardware as well as energy consumption) has to grow linearly with the value exchanged within the system.

\subsection{Holochain}
Let $\Omega_{HC}$ be a given Holochain system, let $n$ be the sum of all public\footnote{Private state transitions (see \ref{private}), i.e., those that are confined to a local $\chain_n$, are completely within the scope of a node's agency, don't affect other parts of the system directly, and can therefore be omitted from the complexity analysis of $\Omega_{HC}$ as a distributed system.} (i.e., \textit{put} to the DHT) state transitions (\textit{transactions}) that all agents in $\Omega_{HC}$ trigger in total, and let $m$ be the number of agents (= nodes) in the system.

Putting a new entry to the DHT involves finding a node that is responsible for holding that specific entry, which in our case, according to \cite{kademlia}, has a time complexity of
\begin{equation}
c+\lceil{\log(m)}\rceil.
\end{equation}
After receiving the state transition data, this node will gossip with its $q$ neighbors, which will result in $r$ copies of this state transition entry being stored throughout the system, on $r$ different nodes. Each of these nodes has to validate this entry, which is application-specific logic whose complexity we shall call $v(n, m)$.
Combined, this results in a system-wide complexity per state transition of
\begin{equation}
\underbrace{c+\lceil{\log(m)}\rceil}_{\text{DHT lookup}}
+ q + r \cdot
\underbrace{v(n,m)}_{\text{validation}}
\end{equation}
which implies the following whole-system complexity in $O$-notation:
\begin{equation}
\Omega_{Holochain} \in O(n\cdot(\log(m) + v(n,m)))
\end{equation}

Now, this is the overall system complexity. In order to enable comparison, we reason that in the case of Holochain, without loss of generality (i.e., dependent on the specific Holochain application), the load of the whole system is shared equally by all nodes. Without further assumptions, for any given state transition, the probability of it originating at a certain node is $\frac{1}{m}$, so the term for the lookup complexity needs to be divided by $m$ to describe the average lookup complexity per node. Unlike in Blockchain systems, where every node has to see every transaction, for the vast majority of state transitions one particular node is not involved at all. The stochastic closeness of the hash of the node's public key to the entry's hash is what triggers the node's involvement. We assume the hash function $H$ to show a uniform distribution of hash values, which results in the probability of a certain node being one of the $r$ nodes that cannot discard this entry being $r \cdot \frac{1}{m}$. The average time complexity handled by an average node is then
\begin{equation}
\Omega_{HolochainNode} \in
O\left(\frac{n}{m}\cdot\left(\log(m) + v(n,m)\right)\right)
\end{equation}
Note that the factor $\frac{n}{m}$ represents the average number of state transitions per node (i.e., the load per node), and that though this is a highly application-specific value, it is an \textit{a priori} expected lower bound, since nodes have to process at least the state transitions they produce themselves.
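To get a feel for the asymptotic difference, a quick back-of-the-envelope computation (illustrative constants, with validation cost taken as $v = O(1)$ and the naive sum $\sum_{i=1}^{n} i$ standing in for Bitcoin's per-node $O(n^2)$) contrasts $\Omega_{BitcoinNode}$ with $\Omega_{HolochainNode}$:

```python
import math

def bitcoin_node_ops(n):
    """Per-node work when every node validates every transaction and each
    validation scans prior transactions: sum_{i=1..n} i, i.e. O(n^2)."""
    return n * (n + 1) // 2

def holochain_node_ops(n, m, v=1):
    """Average per-node work when each node handles only its n/m share of
    state transitions, each costing a DHT lookup (log m) plus validation v."""
    return (n / m) * (math.ceil(math.log2(m)) + v)

n, m = 1_000_000, 10_000   # a million transactions, ten thousand nodes
print(f"{bitcoin_node_ops(n):.3g}")       # ~5e+11 operations per node
print(f"{holochain_node_ops(n, m):.3g}")  # ~1.5e+03 operations per node
```

These numbers illustrate the text's claim: the only architectural overhead on a Holochain node is the $\log(m)$ lookup term, while the $\frac{n}{m}$ factor removes the requirement that every node see every transaction.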
The only overhead added by the architecture of this decentralized system is the node look-up, with its complexity of $\log(m)$.

The unknown and application-specific complexity $v(n,m)$ of the validation routines is what could still drive up the whole system's complexity. And indeed it is conceivable to think of Holochain applications with a great deal of complexity within their validation routines. It is even possible to mimic Blockchain's consensus validation requirement by enforcing that a validating node communicate with all other nodes before adding an entry to the DHT. It could as well be only half of all nodes. And there surely is a host of applications with only little complexity, or specific state transitions within an application that involve only little complexity. \textit{In a Holochain app one can put the complexity where it is needed and keep the rest of the system fast and scalable.}

In section \ref{sec:usecases} we proceed by providing real-world use cases and showing how non-trivial Holochain applications can be built with a validation complexity of $O(1)$, resulting in a total time complexity per node in $O(\log(m))$ and a high enough confidence in integrity without introducing proof-of-work at all.

\section{Use Cases}
\label{sec:usecases}

We now present a few use cases of applications built on Holochain, considering the context of each use case and how it affects both complexity and the evaluation of integrity, and thus validation design.

\subsection{Social Media}
Consider a simple implementation of micro-blogging using Holochain where:
\begin{enumerate}
\item $F_\mathrm{I}=\{f_\mathrm{post}(text,node),f_\mathrm{follow}(node),f_\mathrm{read}(text)\}$ and

\item $F_\mathrm{V}=\{f_\mathrm{isOriginator}\}$
\end{enumerate}
\todo{describe $O(1)$ complexity}

\subsection{Identity}
\todo{DPKI}
\subsection{Money}
\todo{mutual-credit vs.
coins}
Where the complexity of the transaction is higher, complexity may be $O(n^2)$ or $O(\log(n))$; see the Holo currency white paper \cite{holocurrency}.


\section{Implementation}
\label{sec:implementation}

At the time of this writing we have a fully operational implementation of the system as described in this paper, which includes two separate virtual machines for writing \hcdna\ functions in JavaScript or Lisp, along with proof-of-concept implementations of a number of applications, including a twitter clone, a slack-like chat system, DPKI, and a set of mix-in libraries useful for building applications.

\begin{enumerate}
\item 30k+ lines of Go code.
\item DHT: customized version of libp2p/ipfs's Kademlia implementation.
\item Network Transport: libp2p, including end-to-end encryption.
\item JavaScript Virtual Machine: otto \\\url{https://github.com/robertkrimen/otto}.
\item Lisp Virtual Machine: zygomys \\\url{https://github.com/glycerine/zygomys}.
\end{enumerate}

Additionally, we have created a benchmarking suite to examine the processing, bandwidth and storage used in various scenarios, and compared these with Ethereum applications in similar scenarios. These can be seen here: \\\url{https://github.com/holochain/benchmarks}

We have yet to implement scalability tests for large-scale applications, but this is on our roadmap.\todo{TODO}

\appendix

\section{$DHT_\text{hc}$}
\label{apdx:dhtfn}
\begin{enumerate}

\item $dht_\text{putLink}(base,link,tag)$ where $base$ and $link$ are keys and $tag$ is an arbitrary string, which associates the tuple $\{link,tag\}$ with the key $base$.
\item $dht_\text{getLinks}(base,tag)$ where $base$ is a key and $tag$ is an arbitrary string, which returns the set of links on $base$ identified by $tag$.
\item $dht_\text{mod}(key,newkey)$ where $key$ and $newkey$ are keys, which adds $newkey$ as a modifier of $\sigma_\text{key} \in \dhtstate$ and calls $dht_\text{putLink}(key,newkey,``replacedby'')$.
\item $dht_\text{del}(key)$ where $key$ is a key, which marks $\sigma_\text{key} \in \dhtstate$ as deleted.
\item \todo{modification to $dht_\text{get}$ re mod \& del}.
\end{enumerate}

\section{$\sysfns$}
\label{apdx:sysfn}
\begin{enumerate}
\item \todo{all the other sys functions...}
\end{enumerate}

\section{Patterns of Trust Management}
\label{apdx:trust}

Tools in Holochain available to app developers for use in Considered Requirements, some of which are also used at the system level and globally parameterized for an application:
\begin{enumerate}
\item Countersigning \todo{TODO}
\item Notaries \todo{TODO -- ``The network is the notary.''}
\item Published headers \todo{e.g. for chain-rollback detection}
\item Source-chain examination \todo{TODO}
\item Block-lists \todo{e.g. DDoS, spam, etc.}
\item ... \todo{more here...}
\end{enumerate}

\section{Membranes}
\label{apdx:membranes}

\begin{itemize}
\item \textit{Invitation}\\
One of the most natural approaches for membrane crossing, in a space in which agents provide identity, is to rely on invitation by agents that are already inside the membrane. This could be invitation:
\begin{itemize}
\item by anyone
\item by an admin (which could either be set in the application's DNA or be a variable shared within the DHT -- both could be mutable or constant)
\item by multiple users (applying social triangulation)
\end{itemize}
\item \textit{Proof-of-Identity / Reputation}\\
Given the presence of other applications/chains, these can be used to attach an identity and its reputation in that chain to the agent that wants to join.
Since this seems to be a crucial pillar of the ecosystem of Holochain applications, we plan to deliver a system-level application called DPKI (distributed public key infrastructure) that will function as the main identity and reputation platform. A prototype of this app was developed prior to the writing of this paper.
\item \textit{Proof-of-Presence}\\
Use of notarized national documents/passports/identity cards within the agent entry (the second entry in $\chain$).
\item \textit{Proof-of-Service}\\
Cryptographic proof of delivery of a service / hosting of an application. We intend to leverage this technique in our distributed cloud hosting application \textbf{Holo}, which we will build on top of Holochain. See our Holo hosting white paper for much more detail \cite{hosting-wp}.
\item \textit{Proof-of-Work}\\
If the application does not require anonymity, then instead of the cryptographic hash-cracking work applied in most blockchains, this could be useful work that new members are asked to contribute to the community, or a puzzle to prove domain knowledge. Examples are:
\begin{itemize}
\item a test of knowledge about local maps to prove citizenship
\item DNA sequencing
\item protein folding
\item SETI
\item publication of a scientific article
\end{itemize}
\item \textit{Proof-of-Stake / Payment}\\
Deposit or payment to have the agent certified.
\item \textit{Immune System}\\
Blacklisting of nodes that don't play by the application's rules.

\end{itemize}

\begin{acknowledgments}

We thank Steve Sawin for his review of this paper, \LaTeXe\ support, and so much more\dots

\end{acknowledgments}

\bibliographystyle{alpha}
\begin{thebibliography}{9}

\bibitem[DUPONT]{dupont}
Quinn DuPont.
\textit{Experiments in Algorithmic Governance: A history and ethnography of ``The DAO,'' a failed Decentralized Autonomous Organization}.
\\\url{http://www.iqdupont.com/assets/documents/DUPONT-2017-Preprint-Algorithmic-Governance.pdf}

\bibitem[EIP-150]{yellowpaper}
Gavin Wood.
\textit{Ethereum: A Secure Decentralised Generalised Transaction Ledger}.
\\\url{http://yellowpaper.io/}

\bibitem[Kademlia]{kademlia}
Petar Maymounkov and David Mazières.
\textit{Kademlia: A Peer-to-peer Information System Based on the XOR Metric}.
\\\url{https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf}

\bibitem[Zhang13]{dht}
Zhang, H., Wen, Y., Xie, H., Yu, N.
\textit{Distributed Hash Table: Theory, Platforms and Applications}.

\bibitem[Croman et al 16]{scaling}
Kyle Croman, Christian Decker, Ittay Eyal, Adem Efe Gencer, Ari Juels, Ahmed Kosba, Andrew Miller, Prateek Saxena, Elaine Shi, Emin Gün Sirer, Dawn Song, Roger Wattenhofer.
\textit{On Scaling Decentralized Blockchains}.
Financial Cryptography and Data Security, Springer, 2016.

\bibitem[Bitcoin Reddit]{bitcoin-complex}
/u/mike\_hearn, /u/awemany, /u/nullc et al.
Reddit discussion, 2015.
\\\url{https://www.reddit.com/r/Bitcoin/comments/3a5f1v/mike_hearn_on_those_who_want_all_scaling_to_be/csa7exw/?context=3&st=j8jfak3q&sh=6e445294}

\bibitem[Marir2014]{multi-agent-complex}
Marir, Toufik and Mokhati, Farid and Bouchelaghem-Seridi, Hassina and Tamrabet, Zouheyr.
\textit{Complexity Measurement of Multi-Agent Systems}.
Multiagent System Technologies: 12th German Conference, MATES 2014, Stuttgart, Germany, September 23--25, 2014.
Proceedings, Springer International Publishing, 2014.
\\\url{https://doi.org/10.1007/978-3-319-11584-9_13}

\bibitem[Coppock17]{mining-consumption}
Mark Coppock.
\textit{The World's Cryptocurrency Mining Uses More Electricity Than Iceland}.
\\\url{https://www.digitaltrends.com/computing/bitcoin-ethereum-mining-use-significant-electrical-power/}

\bibitem[BitcoinWiki]{bitcoin-protocol}
\textit{Bitcoin Protocol}.
Bitcoin Wiki.
\\\url{https://en.bitcoin.it/wiki/Protocol_rules#.22tx.22_messages}

\bibitem[IPFS]{ipfs}
Juan Benet.
\textit{IPFS -- Content Addressed, Versioned, P2P File System (DRAFT 3)}.
\\\url{https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf}

\bibitem[LibP2P]{libp2p}
Juan Benet and David Dias.
\textit{libp2p Specification}.
\\\url{https://github.com/libp2p/specs}

\bibitem[Oxford]{provenance}
\textit{Provenance}. Oxford online dictionary.
\\\url{https://en.oxforddictionaries.com/definition/provenance}

\bibitem[Douceur02]{sybil}
Douceur, John R. (2002).
\textit{The Sybil Attack}.
International Workshop on Peer-to-Peer Systems. Retrieved 23 April 2016.
\\\url{https://www.microsoft.com/en-us/research/publication/the-sybil-attack/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F74220%2Fiptps2002.pdf}

\bibitem[HoloCurrency]{currency-wp}
Arthur Brock and Eric Harris-Braun.
\textit{Holo: Cryptocurrency Infrastructure for Global Scale and Stable Value}, 2017.
\\\url{https://holo.host/holo-currency-wp/}

\bibitem[Nilsson15]{mt-gox}
Nilsson, Kim (19 April 2015).
\textit{The missing MtGox bitcoins
}. Retrieved 10 December 2015.
\\\url{http://blog.wizsec.jp/2015/04/the-missing-mtgox-bitcoins.html}

\bibitem[Swanson15]{CaaS}
Tim Swanson.
\textit{Consensus-as-a-service: a brief report on the emergence of permissioned, distributed ledger systems}. April 6, 2015.
\\\url{https://pdfs.semanticscholar.org/f3a2/2daa64fc82fcda47e86ac50d555ffc24b8c7.pdf}

\end{thebibliography}


\end{document}