github.com/holochain/holochain-proto@v0.1.0-alpha-26.0.20200915073418-5c83169c9b5b/holochain.tex

     1  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     2  % writeLaTeX Example: Academic Paper Template
     3  %
     4  % Source: http://www.writelatex.com
     5  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     6  \documentclass[twocolumn,showpacs,%
     7    nofootinbib,aps,superscriptaddress,%
     8    eqsecnum,prd,notitlepage,showkeys,10pt]{revtex4-1}
     9  
    10  \usepackage{amssymb}
    11  \usepackage{amsmath}
    12  \usepackage{graphicx}
    13  \usepackage{dcolumn}
    14  \usepackage{hyperref}
    15  \usepackage{draftwatermark}
    16  \usepackage{enumitem}
    17  \usepackage{setspace} % for \onehalfspacing and \singlespacing macros
    18  \usepackage{etoolbox}
    19  \AtBeginEnvironment{quote}{\singlespace\vspace{-\topsep}\small}
    20  \AtEndEnvironment{quote}{\vspace{-\topsep}\endsinglespace}
    21  \usepackage{url}
    22  \SetWatermarkText{Draft}
    23  \SetWatermarkColor[gray]{0.9}
    24  \SetWatermarkScale{5}
    25  
    26  
    27  \newtheorem{itemlet}{}
    28  
    29  \begin{document}
    30  
    31  % macros
    32  \newcommand\todo[1]{\textcolor{red}{#1}}
    33  \newcommand\term[1]{\textbf{\textit{#1}}}
    34  \newcommand{\sbtc}{$\Omega_{\text{bitcoin}}$}
    35  \newcommand{\sgit}{$\Omega_{\text{git}}$}
    36  \newcommand{\shc}{$\Omega_{\text{hc}}$}
    37  \newcommand{\hcdna}{$\text{DNA}$}
    38  \newcommand{\hcid}{\iota}
    39  \newcommand{\dhtget}{\texttt{get}}
    40  \newcommand{\dhtput}{\texttt{put}}
    41  \newcommand{\dhtstate}{\Delta}
    42  \newcommand{\dhtfns}{F_\mathrm{DHT}}
    43  \newcommand{\sysfns}{F_\mathrm{sys}}
    44  \newcommand{\appfns}{F_\mathrm{app}}
    45  \newcommand{\hcdht}{DHT_\text{hc}}
    46  \newcommand{\chain}{\mathcal{X}}
    47  \newcommand{\eqbang}{\stackrel{!}{=}}
    48  
    49  \title{Holochain \\
    50  \small scalable agent-centric distributed computing\\DRAFT(ALPHA 1) -- 2/15/2018}
    51  
    52  \author{Eric Harris-Braun, Nicolas Luck, Arthur Brock}
    53  
    54  \affiliation{Ceptr, LLC}
    55  
    56  \begin{abstract}
We present a scalable, agent-centric distributed computing platform.  We use a formalism to characterize distributed systems, show how it applies to some existing distributed systems, and demonstrate the benefits of shifting from a data-centric to an agent-centric model. We present a detailed formal specification of the Holochain system, along with an analysis of its systemic integrity, capacity for evolution, total system computational complexity, implications for use-cases, and current implementation status.
    58  
    59  \end{abstract}
    60  
    61  \maketitle
    62  
    63  \section{Introduction}
    64  \label{sec:intro}
    65  
Distributed computing platforms have achieved a new level of viability with the advent of two foundational cryptographic tools: secure hashing algorithms and public-key encryption.  These have provided solutions to key problems in distributed computing: verifiable, tamper-proof data for sharing state across nodes in the distributed system, and confirmation of data provenance via digital signature algorithms.  The former is achieved by hash-chains, where monotonic data-stores are rendered intrinsically tamper-proof (and thus confidently sharable across nodes) by including hashes of previous entries in subsequent entries.  The latter is achieved by combining cryptographic encryption of hashes of data with the use of the public keys themselves as the addresses of agents, thus allowing other agents in the system to mathematically verify the data's source.
    67  
    68  Though hash-chains help solve the problem of independently acting agents reliably sharing state, we see two very different approaches in their use which have deep systemic consequences.  These approaches are demonstrated by two of today's canonical distributed systems: \begin{enumerate}
    69  \item git\footnote{https://git-scm.com/about}:  In git, all nodes can update their hash-chains as they see fit.  The degree of overlapping shared state of chain entries (known as commit objects) across all nodes is not managed by git but rather explicitly by action of the agent making pull requests and doing merges.  We call this approach \term{agent-centric} because of its focus on allowing nodes to share independently evolving data realities.
    70  \item Bitcoin\footnote{https://bitcoin.org/bitcoin.pdf}: In Bitcoin (and blockchain in general), the ``problem" is understood to be that of figuring out how to choose one block of transactions among the many variants being experienced by the mining nodes (as they collect transactions from clients in different orders), and committing that single variant to the single globally shared chain. We call this approach \term{data-centric} because of its focus on creating a single shared data reality among all nodes.
    71  \end{enumerate}
    72  
    73  We claim that this fundamental original stance results directly in the most significant limitation of the blockchain: scalability.  This limitation is widely known \footnote{add various sources} and many solutions have been offered \footnote{more footnotes here}.  Holochain offers a way forward by directly addressing the root data-centric assumptions of the blockchain approach.
    74  
    75  \section{Prior Work}
This paper builds largely on recent work in cryptographic distributed systems, distributed hash tables, and multi-agent systems.
    77  
    78  Ethereum: Wood \cite{yellowpaper}, DHT: \cite{kademlia}  Benet \cite{ipfs}
    79  
    80  
    81  \todo{TODO: discussion and more references here}
    82  
    83  \section{Distributed Systems}
    84  \label{sec:data-centric-systems}
    85  
    86  \subsection{Formalism}
    87  \label{sec:formalism}
    88  
    89  We define a simple generalized model of a distributed system $\Omega$ using hash-chains as follows:
    90  \begin{enumerate}
    91  \item Let $N$ be the set of elements $\{n_1,n_2,\dots n_n\}$ participating in the system. Call the elements of $N$ \term{nodes} or \term{agents}.
\item Let each node $n$ consist of a set $S_n$ with elements $\{\sigma_1,\sigma_2,\dots\}$. Call the elements of $S_n$ the \term{state} of node $n$. For the purposes of this paper we assume $\forall \sigma_i \in S_n : \sigma_i = \{ \chain_i, D_i\}$ with $\chain_i$ being a \term{hash-chain} and $D_i$ a set of non-hash-chain \term{data elements}.
    93  \item Let $H$ be a cryptographically secure hash function.
    94  \item Let there be a \term{state transition function}:
    95  \begin{equation}
    96  \tau(\sigma_i, t) = (\tau_\chain(\chain_i, t), \tau_D(D_i,t))
    97  \end{equation}
    98  where:
    99  \begin{enumerate}
   100  \item
   101  $\tau_\chain(\chain_i,t) = \chain_{i+1}$ where
   102  \begin{equation}
   103  \begin{split}
   104   \chain_{i+1} &= \chain_{i} \cup \{x_{i+1}\} \\
   105    &= \{x_1, \dots, x_i, x_{i+1}\}
   106  \end{split}
   107  \end{equation}
   108  with
   109  \begin{equation}
   110  \begin{split}
   111  x_{i+1} &= \{h,t\} \\
   112  h &= \{ H(t),y\} \\
y &= \{H(x_j) \mid j \leq i\}
   114  \end{split}
   115  \end{equation}
   116  
   117  Call $h$ a \term{header} and note how the sequence of headers creates a chain (tree, in the general case) by linking each header to the previous header(s) and the transaction.
\item $D_{i+1}=\tau_D(D_i,t)$
   119  
   120  \end{enumerate}
   121  \item Let $V(t,v)$ be a function that takes $t$, along with extra validation data $v$, verifies the validity of $t$ and only if valid calls a transition function for $t$. Call $V$ a \term{validation} function.
   122  \label{formalism:validation}
   123  \item Let $I(t)$ be a function that takes a transaction $t$, evaluates it using a function $V$, and if valid, uses $\tau$ to transform $S$. Call $I$ the \term{input} or \term{stimulus} function.
\item Let $P(x)$ be a function that can create transactions $t$ and trigger functions $V$ and $\tau$, where $P$ itself is triggered by state changes or the passage of time.  Call $P$ the \term{processing} function.
   125  \item Let $C$ be a channel that allows all nodes in $N$ to communicate and over which each node has a unique address $A_n$. Call $C$ and the nodes that communicate on it the \term{network}.
   126  \item Let $E(i)$ be a function that changes functions $V,I,P$.  Call $E$ the \term{evolution} function.
   127  \end{enumerate}
   128  
   129  Explanation: this formalism allows us to model separately key aspects of agents.
   130  
   131  First we separate the agent's state into a cryptographically secured hash-chain part $\chain$ and another part that holds arbitrary data $D$. Then we split the process of updating the state into two steps: 1) the validation of new transactions $t$ through the validation function $V(t,v)$, and 2) the actual change of internal state $S$ (as either $\chain$ or $D$) through the state transition functions $\tau_\chain$ and $\tau_D$. Finally, we distinguish between 1) state transitions triggered by external events, stimuli, received through $I(t)$, and 2) a node's internal processing $P(x)$ that also results in calling $V$ and $\tau$ with an internally created transaction.
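
As a minimal worked instance of $\tau_\chain$ (a sketch only; the transactions $t_1,t_2,t_3$ are hypothetical placeholders), consider an initially empty chain and the simplest linear case, in which we assume that each header references only the immediately preceding entry, whereas the general definition of $y$ permits several back-references:
\begin{equation}
\begin{split}
x_1 &= \{\{H(t_1),\emptyset\},\,t_1\} \\
x_2 &= \{\{H(t_2),\{H(x_1)\}\},\,t_2\} \\
x_3 &= \{\{H(t_3),\{H(x_2)\}\},\,t_3\}
\end{split}
\end{equation}
so that $\chain_3=\{x_1,x_2,x_3\}$. If $t_1$ were later replaced by some $t_1' \neq t_1$, the altered entry $x_1'$ would, for a secure $H$, satisfy $H(x_1') \neq H(x_1)$ except with negligible probability, and would therefore disagree with the value recorded in the header of $x_2$. Repairing that header would in turn change $H(x_2)$ and break the link from $x_3$.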
   132  
   133  We define some key properties of distributed systems:
   134  \begin{enumerate}
\item Call a set of nodes in $N$ for which any of the functions $\tau,V,P$ and $E$ are both reliably known and known to be identical across that set of nodes: \term{trusted} nodes with respect to the functions so known.
   136  \item Call a channel $C$ with the property that messages in transit can be trusted to arrive exactly as sent: \term{secure}.
   137  \item Call a channel $C$ on which the address $A_n$ of a node $n$ is $A_n=H(pk_n)$, where $pk_n$ is the public key of the node $n$, and on which all messages include a digital signature of the message signed by sender: \term{authenticated}.
   138  
   139  \item Call a data element that is accessible by its hash \term{content addressable}.
   140  \end{enumerate}
For the purposes of this paper we assume untrusted nodes, i.e., independently acting agents solely under their own control, and an insecure channel.  We do this because the very \textit{raison d'\^etre} of the cryptographic tools mentioned above is to allow individual nodes to trust the whole system under this assumption.  The cryptography immediately makes visible in the state data when any other node in the system uses a version of the functions different from its own.  This property is often referred to as a \term{trustless} system.  However, because it simply means that the locus of trust has been shifted to the state data, rather than other nodes, we refer to it as systemic reliance on \term{intrinsic data integrity}. See \ref{sec:integrity} for a detailed discussion on trust in distributed systems.
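
The \term{authenticated} property above can be restated as an acceptance condition. As a sketch, suppose a node receives a message $m$ with signature $s$ from address $A_n$ together with the sender's public key $pk_n$, and let $\mathrm{Verify}$ stand for the verification algorithm of whichever signature scheme is in use (the name is our shorthand, not part of the formalism). The receiver accepts $m$ as coming from $n$ only if
\begin{equation}
A_n = H(pk_n) \;\wedge\; \mathrm{Verify}(pk_n,m,s)=\mathrm{true}
\end{equation}
The first conjunct binds the address to the key, and the second binds the message to the key, so neither check depends on trusting the channel or any intermediate node.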
   142  
   143  \subsection{Data-Centric and Agent-Centric Systems}
   144  \label{sec:data-agent}
   145  
Using this definition, Bitcoin can be understood as that system \sbtc\ where:
   147  \begin{enumerate}
   148  \item $\forall n,m \in N: \chain_n\eqbang\chain_m$ where $\eqbang$ means \textit{is enforced}.
\item In $V(e,v)$, $e$ is a block and $v$ is the output from the ``proof-of-work" hash-crack algorithm; $V$ confirms the validity of $v$ as well as the structure and validity of $e$ according to the double-spend rules\footnote{pointer here}.
   150  \item $I(t,n)$ accepts transactions from clients and adds them to $D$ (the \textit{mempool}) to build a block for later use in triggering $V()$.
   151  \item $P(i)$ is the \textit{mining} process including the ``proof-of-work" algorithm and composes with $V()$ and $\tau_\chain$ when the hash is cracked.
   152  \item $E(i)$ is not formally defined but can be mapped informally to a decision by humans operating the nodes to install new versions of the Bitcoin software.
   153  \end{enumerate}
   154  
The first point establishes the central aspect of Bitcoin's strategy (and that of Blockchain applications in general) for solving or avoiding problems otherwise encountered in decentralized systems: trying to maintain a network state in which all nodes \textbf{should} have the same (local) chain.
   156  
By contrast, for \sgit\ there is no such constraint requiring $\chain_n$ and $\chain_m$ in nodes $n$ and $m$ to match, as git's core intent is to allow different agents to act autonomously and divergently on a shared code-base, which would be impossible if the states always had to match.
   158  
Through the lens of the formalism some other aspects of \sgit\ can be understood as follows:
   160  \begin{enumerate}
\item the validation function $V(e,v)$ by default only checks the structural validity of $e$ as a commit object, not its content (though note that git also supports signing of commits, which is then also part of the validation)
   162  \item the stimulus function $I(t)$ for \sgit\ consists of the set of git commands available to the user
   163  \item the state transition function $\tau_\chain$ is the internal git function that adds a commit object and $\tau_\textrm{D}$ is the git function that adds code to the \texttt{index} triggered by \texttt{add}
   164  \item $E$ is, similarly to \sbtc, not formally defined for \sgit.
   165  \end{enumerate}
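
The header field $y$ of the formalism accommodates this divergence directly. As an illustrative sketch (the entry names are hypothetical), suppose nodes $n$ and $m$ share entries $x_1,x_2$ and then each commit independently:
\begin{equation}
\begin{split}
\chain_n &= \{x_1,x_2,x_3\} \\
\chain_m &= \{x_1,x_2,x_3'\}
\end{split}
\end{equation}
where the headers of both $x_3$ and $x_3'$ carry $y=\{H(x_2)\}$. Here $\chain_n \neq \chain_m$, which \sgit\ permits and \sbtc\ would forbid. If $n$ later fetches $x_3'$ and merges, the resulting entry $x_4$ carries
\begin{equation}
y_4 = \{H(x_3),H(x_3')\}
\end{equation}
just as a git merge commit records each of its parents, so the headers form a branching and rejoining structure rather than a single line.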
   166  
We leave a more in-depth application of the formalism to \sgit\ as an exercise for the reader; however, we underscore that the core difference between \sbtc\ and \sgit\ lies in the former's constraint of $\forall n,m \in N: \chain_n\eqbang\chain_m$.  One direct consequence of this for \sbtc\ is that as the size of $\chain_n$ grows, necessarily all nodes of \sbtc\ must grow in size, whereas this is not necessarily the case for \sgit, and therein lies the core of Bitcoin's scalability issues.
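
To make the scaling consequence concrete, here is a back-of-the-envelope sketch using purely hypothetical numbers. Write $|\chain|$ for the storage size of a chain. Under the constraint $\chain_n\eqbang\chain_m$ every node holds the same chain, so the total storage across the network is
\begin{equation}
\sum_{n \in N} |\chain_n| = |N| \cdot |\chain|
\end{equation}
With, say, $|N| = 10^4$ nodes and $|\chain| = 100$ GB, this comes to $10^6$ GB, i.e. one petabyte, and every additional byte appended to the chain costs $10^4$ bytes network-wide. Without the constraint, as in \sgit, the left-hand sum still gives the total, but each term $|\chain_n|$ is free to stay small.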
   168  
It's not surprising that a data-centric approach was used for Bitcoin.  This comes from the fact that its stated intent was to create digitally transferable ``coins," i.e., to model in a distributed digital system that property of matter known as location. On centralized computer systems this doesn't even appear as a problem because centralized systems have been designed to allow us to think from a data-centric perspective.  They allow us to believe in a kind of data objectivity, as if data exists, like a physical object sitting someplace having a location. They allow us to think in terms of an absolute frame, as if there \textit{is} a correct truth about data and/or time sequence, and suggest that ``consensus" should converge on this truth. In fact, this is not a property of information.  Data exists always from the vantage point of an observer.  It is this fact that makes digitally transferable ``coins" a \textit{hard problem} in distributed systems, which by definition consist entirely of multiple vantage points.
   170  
   171  In the distributed world, events don't happen in the same sequence for all observers.  For Blockchain specifically, this is the heart of the matter: choosing which block, from all the nodes receiving transactions in different orders, to use for the ``consensus," i.e., what single vantage point to enforce on all nodes.  Blockchains don't record a universal ordering of events -- they manufacture a single authoritative ordering of events -- by stringing together a tiny fragment of local vantage points into one global record that has passed validation rules.
   172  
The use of the word consensus seems at best dubious as a description of a systemic requirement that all nodes carry identical values of $\chain_n$, especially when the algorithm for ensuring that sameness is essentially a digital lottery powered by expensive computation, of which the primary design feature is to randomize which node gets to run $V_n$ such that no node has preference as to which $e$ gets added to $\chain_n$.
   174  
The term consensus, as normally used, implies deliberation with regard to differences and work on crafting a perspective that holds for all parties, rather than simply selecting one party's dataset at random.  In contrast, as a more agent-centric distributed system, git's \textit{merge} command provides for a process more recognizable as consensus, though it is not automated.
   176  
Perhaps a more accurate term for the hash-crack algorithm applied in \sbtc\ would be ``proof-of-luck" and for the process itself simply sameness, not consensus.  If you start from a data-centric viewpoint, which naturally throws out the ``experience" of all agents in favor of just one, it's much harder to design agents to engage in processes that actually have the real-world properties of consensus. If the constraint of keeping all nodes' states the same were adopted consciously as a fit for a specific purpose, this would not be particularly problematic.  Unfortunately, this data-centric viewpoint has been held mostly unconsciously and has been adopted by more generalized distributed computing systems, for which the intent doesn't specifically include the need to model ``digital matter" with universally absolute location.  While it has the advantage of conceptual simplicity, it also immediately creates scalability issues and, worse, makes it hard to take advantage of the benefits inherent in the agent-centric approach.
   178  
   179  \section{Generalized Distributed Computation}
   180  \label{sec:dist-comp}
   181  
The previous section described a general formalism for distributed systems and compared git to Bitcoin as an example of an agent-centric vs. a data-centric distributed system.  Neither of these systems, however, provides generalized computation in the sense of being a framework for writing computer programs or creating applications. So, let's add the following constraints to the formalism of \ref{sec:formalism}:
   183  
   184  \begin{enumerate}
   185  \item With respect to a machine $M$, some values of $S_n$ can be interpreted as: executable code and the results of code execution, and they may be accessible to $M$ and the code.  Call such values the \term{machine state}.
   186  \item $\exists t$ and nodes $n$ such that $I_n(t)$ will trigger execution of that code. Call such transaction values \term{calls}.
   187  \end{enumerate}
   188  
   189  \subsection{Ethereum}
Ethereum\footnote{https://github.com/ethereum/wiki/wiki/White-Paper} provides the current premier example of generalized distributed computing using the Blockchain model. The Ethereum approach comes from an ontology of replicating the data certainty of a single physical computer on top of a stratum of distributed nodes, using the blockchain strategy of creating a single data reality in a cryptographic chain, but committing computations, instead of just monetary transactions as in Bitcoin, into the blocks.
   191  
This approach does live up to the constraints listed above, as described by Wood \cite{yellowpaper}: the bulk of that paper can be understood as a specification of a validation function $V_n()$, and the described state transition function $\sigma_{t+1} \equiv \Upsilon(\sigma_t,T)$ as a specification of how the constraints above are met.
   193  
Unfortunately, the data-centric legacy inherited by Ethereum from the blockchain model is immediately observable in its high compute cost\footnote{link to our benchmarking} and difficulty in scaling\footnote{find a scholarly article}.
   195  
   196  
   197  \subsection{Holochain}
   198  \label{holochain}
We now proceed to describe an agent-centric distributed generalized computing system, where nodes can still confidently participate in the system as a whole even though they are not constrained to maintaining the same chain state as all other nodes.
   200  
In broad strokes: a Holochain application consists of a network of agents, each maintaining a unique source chain of their transactions, paired with a shared space implemented as a validating, monotonic, sharded, distributed hash table (DHT), where every node enforces validation rules on the data in the DHT as well as providing provenance of data from the source chains where it originated.
   202  
Using our formalism, a Holochain-based application \shc\ is defined as:
   204  
   205  \begin{enumerate}
   206  
   207  \item Call $\chain_n$ the \term{source chain} of $n$.
   208  
   209  \item Let $M$ be a virtual machine used to execute code.
   210  
\item Let the initial entry of all $\chain_n$ in $N$ be identical and consist of the set \hcdna\ $\{e_1,e_2,\dots,f_1,f_2,\dots,p_1,p_2,\dots\}$ where $e_x$ are definitions of entry types that can be added to the chain, $f_x$ are functions defined as executable on $M$ (which we also refer to as the set $\appfns = \{app_1,app_2,\dots\}$), and $p_x$ are system properties which among other things declare the expected operating parameters of the application being specified.  For example, the resilience factor as defined below is set as one such property.
   212  
\item Let $\hcid_n$ be the second entry of all $\chain_n$ and be a set of the form $\{p,i\}$ where $p$ is the public key and $i$ is identifying information appropriate to the use of this particular \shc. Note that though this entry is of the same format for all $\chain_n$, its content is not the same. Call this entry the \term{agent identity} entry.
   214  
   215  \item $\forall e_x \in DNA$ let there be an $app_x \in \appfns$ which can be used to validate transactions that involve entries of type $e_x$.  Call this set $F_\mathrm{v}$ or the \term{application validation functions}.
   216  
\item Let there be a function $V_\mathrm{sys}(e_x,e,v)$ which checks that $e$ is of the form specified by the entry definition for $e_x \in$ \hcdna.  Call this function the \term{system entry validation function}.
   218  
   219  \item Let the overall validation function $V(e,v) \equiv \bigvee_x  F_\mathrm{v}(e_x)(v) \wedge V_\mathrm{sys}(e_x,e,v)$.
   220  
   221  \item Let $F_\mathrm{I}$ be a subset of $\appfns$ distinct from $F_\mathrm{v}$ such that $\forall f_x(t) \in F_\mathrm{I}$ there exists a $t$ to $I(t)$ that will trigger $f_x(t)$. Call the functions in $F_\mathrm{I}$ the \term{exposed functions}.
   222  
   223  \item Call any functions in $\appfns$ not in $F_\mathrm{v}$ or $F_\mathrm{I}$ \term{internal functions} and allow them to be called by other functions.
   224  \item Let the channel $C$ be \term{authenticated}.
   225  
   226  \item Let $DHT$ define a distributed hash table on an authenticated channel as follows:
   227  \begin{enumerate}
   228  
   229  \item Let $\dhtstate$ be a set $\{\delta_1,\delta_2,\dots\}$ where $\delta_x$ is a set $\{key,value\}$ where $key$ is always the hash $H(value)$ of $value$.  Call $\dhtstate$ the \term{DHT state}.
   230  \item Let $\dhtfns$ be the set of functions $\{dht_\text{put},dht_\text{get}\}$ where:
   231  \begin{enumerate}
   232  \item $dht_\text{put}(\delta_\text{key,value})$ adds $\delta_\text{key,value}$ to $\dhtstate$
   233  \item $dht_\text{get}(key) = value$ of $\delta_\text{key,value}$ in $\dhtstate$
   234  \end{enumerate}
\item \label{routable} Assume $x,y \in N$ and $\delta_i \in \dhtstate_x$ but $\delta_i \notin \dhtstate_y$. Allow that when $y$ calls $dht_\text{get}(key)$, $\delta_i$ will be retrieved from $x$ over channel $C$ and added to $\dhtstate_y$.
   236  \end{enumerate}
DHTs are sufficiently mature that there are a number of ways to ensure property \ref{routable}.  For our current alpha version we use a modified version of \cite{kademlia} as implemented in \cite{libp2p}.
   238  
   239  \item Let $\hcdht$ augment $DHT$ as follows:
   240  \begin{enumerate}
   241  
\item $\forall \delta_\text{key,value} \in \dhtstate$ constrain $value$ to be of an entry type as defined in \hcdna.  Furthermore, enforce that any function call $dht_x(y)$ which modifies $\dhtstate$ also uses $F_\mathrm{v}(y)$ to validate $y$ and records whether it is valid.  Note that this validation phase may include contacting the source nodes involved in generating $y$ to gather more information about the context of the transaction, see \ref{sec:membandprov}.
   243  
   244  \item Enforce that all elements of $\dhtstate$ only be changed monotonically, that is, elements $\delta$ can only be added to $\dhtstate$ not removed.
   245  
   246  \item Include in $\dhtfns$ the functions defined in \ref{apdx:dhtfn}.
   247  
   248  \item Allow the sets $\delta \in \dhtstate$ to also include more elements as defined in  \ref{apdx:dhtfn}.
   249  
   250  \item Let $d(x,y)$ be a \textit{symmetric} and \textit{unidirectional} distance metric within the hash space defined by $H$, as for example the XOR metric defined in \cite{kademlia}. Note that this metric can be applied between entries and nodes alike since the addresses of both are values of the same hash function $H$ (i.e. $\delta_{key}=H(\delta_{value})$ and $A_n=H(pk_n)$).
   251  
   252  \item Let $r$ be a parameter of $\hcdht$ to be set dependent on the characteristics deemed beneficial for maintaining multiple copies of entries in the $DHT$ for the given application.
   253  Call $r$ the \term{resilience factor}.
   254  
\item \label{dht:metrics} Allow that each node can maintain a set $M = \{m_n, \dots \}$ of metrics $m_n$ about other nodes, where each $m_n$ contains both a node's direct experience of $n$ with respect to that metric, as well as the experience of other nodes of $n$.  Enforce that one such metric kept is \textit{uptime}, which keeps track of the percentage of time a node is experienced to be available.   Call the process of nodes sharing these metrics \term{gossip} and refer to \ref{sec:gossip} for details.
   257  
\item  Enforce that $\forall \delta \in \dhtstate_n$ each node $n$ maintains a set $V_\delta = \{n_1,\dots,n_q\}$ of the $q$ closest nodes to $\delta$ as seen from $n$, which are \textit{expected by n} to also hold $\delta$. Resiliency is maintained by taking into account node uptimes and choosing the value of $q$ so that:
\begin{equation}
\sum_{i=1}^q uptime(n_i)\geq r
\end{equation}
with $uptime(n) \in [0,1]$.
   263  
   264  Call the union of such sets $V_\delta$, from a given node's perspective, the \term{overlap list} and also note that $q\geq r$.
   265  
   266  \item \label{hc:shards} Allow every node $n$ to discard every $\delta_x \in \dhtstate_n$ if the number of closer (with regards to $d(x,y)$) nodes is greater than $q$
   267  (i.e. if other nodes are able to construct their $V_\delta$ sets without including $n$, which in turn means there are enough other nodes responsible for holding $\delta$ in their $\Delta_m$ to have the system meet the resilience set by $r$ even without $n$ participating in storing $\delta$).
   268  Note that this results in the network adapting to changes in topology and DHT state migrations by regulating the number of network-wide redundant copies of all $\delta_i\in\dhtstate$ to match $r$ according to node uptime.
   269  
   270  \end{enumerate}
   271  
   272  Call $\hcdht$ a \term{validating}, \term{monotonic}, \term{sharded} DHT.
   273  
   274  \item $\forall n \in N$ assume $n$ implements $\hcdht$, that is: $\dhtstate$ is a subset of $D$ (the non hash-chain state data), and $\dhtfns$ are available to $n$, though note that these functions are NOT directly available to the functions $\appfns$ defined in \hcdna.
   275  
   276  \item Let $\sysfns$ be the set of functions $\{sys_\text{commit},sys_\text{get}, \dots\}$ where:
   277  \begin{enumerate}
\item $sys_\text{commit}(e)$ uses the overall validation function $V(e,v)$ to add $e$ to $\chain$, and if successful calls $dht_\text{put}(H(e),e)$.
   279  \item $sys_\text{get}(k) = dht_\text{get}(k)$.
   280  \item see additional system functions defined in \ref{apdx:sysfn}.
   281  \end{enumerate}
   282  
   283  \item Allow the functions in $\appfns$ defined in the \hcdna\ to call the functions in $\sysfns$.
   284  \item Let $m$ be an arbitrary message. Include in $\sysfns$ the function $sys_\text{send}(A_\text{to},m)$ which when called on $n_\text{from}$ will trigger the function $app_\text{receive}(A_\text{from},m)$ in the \hcdna\ on the node $n_\text{to}$. Call this mechanism \term{node-to-node messaging}.
   285  \item \label{private} Allow that the definition of entries in \hcdna\ can mark entry types as \term{private}. Enforce that if an entry $\sigma_x$ is of such a type then $\sigma_x \notin \dhtstate$. Note however that entries of such type can be sent as node-to-node messages.
   286  \item Let the system processing function $P(i)$ be a set of functions in $\appfns$ to be registered in the system as callbacks based on various criteria, e.g. notification of rejected puts to the DHT, passage of time, etc.
   287  \end{enumerate}
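
The interplay of the distance metric $d$, the uptime metric and the resilience factor $r$ can be illustrated with a small worked example. All values are hypothetical: we assume a toy 4-bit hash space (real hash spaces are far larger), the XOR metric $d(x,y) = x \oplus y$, an entry $\delta$ with key $1010_2$, five nodes $a,\dots,e$, and $r=2$. The distances from the key to each node address are:
\begin{equation}
\begin{split}
d(1010_2, A_a) &= 1010_2 \oplus 1011_2 = 0001_2 = 1 \\
d(1010_2, A_b) &= 1010_2 \oplus 1000_2 = 0010_2 = 2 \\
d(1010_2, A_c) &= 1010_2 \oplus 1110_2 = 0100_2 = 4 \\
d(1010_2, A_d) &= 1010_2 \oplus 0010_2 = 1000_2 = 8 \\
d(1010_2, A_e) &= 1010_2 \oplus 0101_2 = 1111_2 = 15
\end{split}
\end{equation}
Suppose the uptimes known through gossip are $uptime(a)=0.9$, $uptime(b)=0.5$, $uptime(c)=0.8$, $uptime(d)=0.9$ and $uptime(e)=0.7$. Adding uptimes in order of increasing distance gives
\begin{equation}
\begin{split}
0.9 &< 2 \\
0.9 + 0.5 = 1.4 &< 2 \\
0.9 + 0.5 + 0.8 = 2.2 &\geq 2
\end{split}
\end{equation}
so $q=3$ and $V_\delta=\{a,b,c\}$, consistent with $q \geq r$. Node $e$ has four closer nodes, which is more than $q$, so under the sharding rule it may discard $\delta$.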
   288  
   289  \subsection{Systemic Integrity Through Validation}
   290  \label{sec:integrity}
   291  
The appeal of the data-centric approach to distributed computing comes from the fact that if you can prove that all nodes reliably have the same data then that provides a strong general basis from which to prove the integrity of the system as a whole.  In the case of Bitcoin, the $\chain$ holds the transactions and the unspent transaction outputs, which allows nodes to verify future transactions against double-spend. In the case of Ethereum, $\chain$ holds what amounts to pointers to machine state. Proving the consistency across all nodes of those data sets is fundamental to the integrity of those systems.
   293  
However, because we have started with the assumption (see \ref{sec:formalism}) of distributed systems of independently acting agents, any \textit{proof} of $\forall n,m \in N: \chain_n\eqbang\chain_m$ in a blockchain-based system is better understood as a \textit{choice} (hence our use of the $\eqbang$), in that nodes use their agency to decide when to stop interacting with other nodes based on detecting that the $\chain$ state no longer matches.  This might also be called ``proof by enforcement," and is also appropriately known as a \term{fork} because essentially it results in partitioning of the network.
   295  
The heart of the matter has to do with the trust any single agent has in the system.  In \cite{yellowpaper} Section 1.1 (Driving Factors) we read:
   297  \begin{quote}
   298  Overall, I wish to provide a system such that users can be guaranteed that no matter with which other individuals, systems or organizations they interact, they can do so with absolute confidence in the possible outcomes and how those outcomes might come about.
   299  \end{quote}
   300  
   301  The idea of ``absolute confidence" here seems important, and we attempt to understand it more formally and generally for distributed systems.
   302  
   303  \begin{enumerate}
   304  \item Let $\Psi_\alpha$ be a measure of the confidence an agent has in various aspects of the system it participates in, where $0 \leq \Psi \leq 1$, 0 represents no confidence, and 1 represents absolute confidence.
\item Let $R_n = \{\alpha_1,\alpha_2,\dots\}$ define a set of aspects about the system with which an agent $n \in N$ measures confidence.  Call $R_n$ the \term{requirements} of $n$ with respect to $\Omega$.
   306  \item Let $\varepsilon_n(\alpha)$ be a thresholding function for node $n \in N$ with respect to $\alpha$ such that when $\Psi_\alpha < \varepsilon(\alpha)$ then $n$ will either stop participating in the system, or reject the participation of others (resulting in a fork).
   307  
   308  \item  Let $R_\mathrm{A}$ and $R_\mathrm{C}$ be the two parts of a partition of $R_n$ where
   309  \begin{equation}
   310  \begin{split}
   311  \forall \alpha \in R_A:\varepsilon(\alpha)=1\\
   312  \forall \alpha \in R_C:\varepsilon(\alpha)<1
   313  \end{split}
   314  \end{equation}
   315  so any value of $\Psi \neq 1$ is rejected in $R_\mathrm{A}$ and any value $\Psi < \varepsilon(\alpha)$ is rejected in $R_\mathrm{C}$. Call $R_\mathrm{A}$ the \term{absolute requirements} and $R_\mathrm{C}$ the \term{considered requirements}.
   316  \end{enumerate}
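
        As an illustration of these definitions, consider a hypothetical agent $n$ with two requirements, $R_n=\{\alpha_1,\alpha_2\}$, and thresholds chosen purely for the sake of example:
        \begin{equation}
        \varepsilon(\alpha_1)=1, \qquad \varepsilon(\alpha_2)=0.95
        \end{equation}
        Then $R_\mathrm{A}=\{\alpha_1\}$ and $R_\mathrm{C}=\{\alpha_2\}$. If the agent currently holds $\Psi_{\alpha_1}=1$ and $\Psi_{\alpha_2}=0.97$, both thresholds are met and it keeps participating. Should $\Psi_{\alpha_2}$ drop to $0.9$, or $\Psi_{\alpha_1}$ to any value below $1$, the agent stops participating or rejects the participation of others.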
   317  
   318  So we have formally separated system characteristics that we have absolute confidence in ($R_A$) from those we only have considered confidence in ($R_C$). Still unclear is how to measure a concrete confidence level $\Psi_\alpha$. In real-world contexts and for real-world decisions, confidence is mainly dependent on a (human) agent's vantage point, set of data at hand, and maybe even intuition. Thus we find it more adequate to call it a soft criterion. In order to comprehend this concept objectively and relate it to the notion conveyed by Wood in the quote above, we proceed by defining the measure of confidence of an aspect $\alpha$ as the conditional probability of it being the case in a given context:
   319  \begin{equation}
   320  \Psi_\alpha \equiv \mathcal{P}(\alpha | \mathcal{C})
   321  \end{equation}
   322  where the context $\mathcal{C}$ models all other information available to the agent, including basic and intuitive assumptions.
   323  
   324  Consider the fundamental example of cryptographically signed messages with asymmetric keys as applied throughout the field of cryptographic systems (basically what gives rise to the term crypto-currency). The central aspect in this context
   325  we call $\alpha_{signature}$ which provides us with the ability to \textit{know with certainty} that a given message's real author $Author_{real}$ is the same agent indicated solely via locally available data in the message's meta information through the cryptographic signature $Author_{local}$. We gain this confidence because we deem it \textit{very hard} for any agent not in possession of the private key to create a valid signature for a given message.
   326  \begin{equation}
   327  \alpha_{signature} \equiv Author_{real} = Author_{local}
   328  \end{equation}
   329  
   330  The appeal of this aspect is that we can check authorship locally, i.e., without the need for a third party or a direct trusted communication channel to the real author.
   331  However, the confidence in this aspect of a given cryptographic system depends on the context $\mathcal{C}$:
   332  \begin{equation}
   333  \Psi_{signature} = \mathcal{P}(Author_{real} = Author_{local} | \mathcal{C})
   334  \end{equation}
   335  
   336  If we constrain the context to remove the possibility of an adversary gaining access to an agent's private key and also exclude the possible (future) existence of computing devices or algorithms that could easily calculate or brute force the key, we might then assign a (constructed) confidence level of 1, i.e., ``absolute confidence". Without such constraints on $\mathcal{C}$, we must admit that $\Psi_{signature}<1$, which real-world events, for instance the Mt.~Gox hack of 2014\footnote{``Most or all of the missing bitcoins were stolen straight out of the Mt.\ Gox hot wallet over time, beginning in late 2011'' \cite{mt-gox}}, make clear.
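
        To make this dependence on the context explicit, one can split on a hypothetical event $K$, ``the private key is known only to its owner'' (the event $K$ and the symbol $p_K = \mathcal{P}(K|\mathcal{C})$ are introduced here only for illustration). By the law of total probability,
        \begin{equation}
        \begin{split}
        \Psi_{signature} = {}& \mathcal{P}(\alpha_{signature} | K, \mathcal{C})\,p_K \\
        & + \mathcal{P}(\alpha_{signature} | \neg K, \mathcal{C})\,(1-p_K)
        \end{split}
        \end{equation}
        Even if the first conditional probability is taken to be $1$, any context with $p_K<1$ in which a key thief might sign in the owner's name, i.e., $\mathcal{P}(\alpha_{signature}|\neg K,\mathcal{C})<1$, yields $\Psi_{signature}<1$.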
   337  
   338  We aim to describe these relationships in such detail in order to point out that any set $R_A$ of \textit{absolute requirements} can't reach beyond trivial statements - statements about the content and integrity of the local state of the agent itself. Following Descartes' way of questioning the confidence in every thought, we project his famous statement \textit{cogito ergo sum} into the reference frame of multi-agent systems by stating: \textbf{Agents can only have honest confidence in the fact that they perceive a certain stimulus to be present and whether any particular abstract a priori model matches that stimulus without contradiction,} i.e., that an agent sees a certain piece of data and that it \textit{is possible to interpret it in a certain way}. Every conclusion being drawn a posteriori through the application of sophisticated models of the context is dependent on assumptions about the context that are inherent to the model. This is the heart of the agent-centric outlook, and what we claim must always be taken into account in the design of decentralized multi-agent systems, as it shows that any aspect of the system as a whole that includes assumptions about other agents and non-local events must be in $R_C$, i.e., have an a priori confidence of $\Psi<1$. Facing this truth about multi-agent systems, we find little value in trying to force an absolute truth $\forall n,m \in N: \chain_n\eqbang\chain_m$ and we instead frame the problem as:
   339  \\
   340  \begin{quote}
   341  We wish to provide generalized means by which decentralized multi-agent systems can be built so that:
   342  \begin{enumerate}
   343  \item fit-for-purpose solutions can be applied in order to optimize for application contextualized confidences $\Psi_\alpha$,
   344  \item violation of any threshold $\varepsilon(\alpha)$ through the actions of other agents can be detected and managed by any agent, such that
   345  \item the system integrity is maintained at any point in time or, when not, there is a path to regain it (see \ref{sec:evo}).
   346  \end{enumerate}
   347  \end{quote}
   348  
   349  We perceive the agent-centric solution to these requirements to be the holographic management of system-integrity within every agent/node of the system through application specific validation routines. These sets of validation rules lie at the heart of every decentralized application, and they vary across applications according to context. Every agent carefully keeps track of their representation of that portion of reality that is of importance to them - within the context of a given application that has to manage the trade-off between having high confidence thresholds $\varepsilon(\alpha)$ and a low need for resources and complexity.
   350  
   351  For example, consider two different use cases of transactions:
   352  \begin{enumerate}
   353  \item receipt of an email message where we are trying to validate it as spam or not and
   354  \item commit of a monetary transaction where we are trying to validate it against double-spend.
   355  \end{enumerate}
   356  These contexts have different consequences that an agent may wish to evaluate differently and may be willing to expend differing levels of resources to validate. We designed Holochain to allow such validation functions to be set contextually per application and expose these contexts explicitly. Thus, one could conceivably build a Holochain application that deliberately makes choices in its validation functions to implement either all or partial characteristics of Blockchains. Holochain, therefore, can be understood as a framework that opens up a spectrum of decentralized application architectures in which Blockchain happens to be one specific instance at one end of this spectrum.
   357  
   358  In the following sections we will show what categories of validation algorithms exist and how these can be stacked on top of each other in order to build decentralized systems that are able to maintain integrity without introducing an absolute truth every agent would be forced to accept or consider.
   359  
   360  \subsubsection{Intrinsic Data Integrity}
   361  \label{sec:intrinsic}
   362  Every application, apart from the most low-level routines, utilizes non-trivial, structured data types.
   363  Structured implies the existence of a model describing how to interpret raw bits as an instance of a type and how pieces of the structure relate to each other.
   364  Often, this includes certain assumptions about the set of possible values.
   365  Certain value combinations might not be meaningful or violate the intrinsic integrity of this data type.
   366  
   367  Consider the example of a cryptographically signed message $m=\{body, signature, author\}$,
   368  where $author$ is given in the form of their public key.
   369  This data type conveys the assumption that the three elements $body$, $signature$ and $author$ correspond to each other
   370  as constrained by the cryptographic algorithm that is assumed to be determined through the definition of this type.
   371  The intrinsic data integrity of a given instance can be validated just by looking at the data itself and checking the signature by
   372  applying the cryptographic algorithm that constitutes the central part of the type's a priori model.
   373  The validation yields a result $\in \{true,false\}$ which means that the confidence in the intrinsic data integrity is absolute, i.e. $\Psi_{intrinsic}=1$.
   374  
   375  Generally, \textbf{we define the intrinsic data integrity} of a transaction type $\phi$ as an aspect
   376  $\alpha_{\phi,intrinsic}\in R_A$, expressed through the existence of a deterministic and local
   377  validation function $V_\alpha(t)$ for transactions $t\in\phi$ that does not depend on any other inputs
   378  but $t$ itself.
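
        For the signed message above, such a function could be written as follows, where $\mathrm{verify}$ stands for the signature check of whichever cryptographic scheme the type prescribes (the name is used here only for illustration):
        \begin{equation}
        V_\alpha(m)=\mathrm{verify}(author, body, signature)\in\{true,false\}
        \end{equation}
        It takes no argument other than the fields of $m$ itself, which is what makes the check deterministic and local.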
   379  
   380  Note how the intrinsic data integrity of the message example above does not make any assumptions about any message's real author, as the aspect $\alpha_{signature}$ from the previous section does.
   381  With this definition, we focus on aspects that don't make any claims about system properties non-local to the agent under consideration, which roots the sequence of inferences that constitutes the validity and therefore confidence of a system's high-level aspects and integrity in consistent environmental inputs.
   382  
   383  \subsubsection{Membranes \& Provenance}
   384  \label{sec:membandprov}
   385  
   386  Distributed systems must rely on mechanisms to restrict participation by nodes in processes that without such restriction would compromise systemic integrity.
   387  Systems where the restrictions are based on the nodes' identity, whether that be as declared by type or authority, or collected from the history of the nodes' behaviors, are known as \term{permissioned} \cite{CaaS}.
   388  Systems where these restrictions are not based on properties of the nodes themselves are known as \term{permissionless}.
   389  In permissionless multi-agent systems, a principal threat to systemic integrity comes from \textit{Sybil-Attacks} \cite{sybil}, where an adversary tries to
   390  overcome the system's validation rules by spawning a large number of compromised
   391  nodes.\\
   392  
   393  However, for both permissioned and permissionless systems, mechanisms exist to gate
   394  participation.
   395  \label{mebandprov:membfn}
   396  Formally: \\
   397  
   398  Let $M(n,\phi,z)$ be a binary function that evaluates whether transactions of type $\phi$ submitted by $n\in N$ are to be accepted, and where $z$ is any arbitrary extra information needed to make that evaluation.  Call $M$ the \term{membrane} function, and note that it will be a component of the validation function $V(t,v)$ from the initial formalism (see \ref{formalism:validation}).\\
   399  
   400  In the case of \sbtc\ and $\Omega_{ethereum}$, $M$ ignores the value of $n$ and makes its determination solely on whether $z$ demonstrates the ``proof" in proof-of-\textit{X}, be it \textit{work} or \textit{stake}, which is sufficient gating to protect against Sybil-Attacks.
   401  
   402  Giving up the data-centric fallacy of forcing one absolute truth $\forall n,m \in N: \chain_n\eqbang\chain_m$ reveals that we can't discard transaction provenance.
   403  Agent-centric distributed systems instead must rely on two central facts about data:
   404  \begin{enumerate}
   405  \item it originates from a source and
   406  \item its historical sequence is local to that source.
   407  \end{enumerate}
   408  For this reason, \shc\ splits the system state data into two parts:
   409  \begin{enumerate}
   410  \item each node is responsible to maintain its own entire $\chain_n$ or \term{source chain} and be ready to confirm that state to other nodes when asked and
   411  \item all nodes are responsible to share portions of other nodes' transactions and those transactions' meta data in their \textbf{DHT shard} - meta data includes validity status, source, and optionally the source's chain headers which provide historical sequence.
   412  \end{enumerate}
   413  
   414  Thus, the DHT provides distributed access to others' transactions and their evaluations of the validity of those transactions.
   415  This resembles how knowledge gets constructed within social fields and through interaction with others, as described by the sociological theory of \textit{social constructivism}.
   416  
   417  The properties of the DHT in conjunction with the hash function provide us with a
   418  deterministically defined set of nodes, i.e., a neighborhood for every transaction.
   419  One cannot easily construct a transaction such that it lands in a given neighborhood.
   420  Formally:
   421  \begin{equation}
   422  \begin{split}
   423  \forall t\in\dhtstate: \exists \eta: \mathcal{H}\rightarrow N^r\\
   424  \eta(H(t))=(n_1, n_2, \dots, n_r)
   425  \end{split}
   426  \end{equation}
   427  where the function $\eta$ maps from the range $\mathcal{H}$ of the hash function $H$
   428  to the $r$ nodes that keep the $r$ redundant shards of the given transaction $t$ (see \ref{hc:shards}).
   429  
   430  Having the list of nodes $\eta(H(t))$ allows an agent to compare third-party viewpoints regarding $t$ with its own and that of the transaction's source(s).
   431  The randomization of the hash function $H$ ensures that those viewpoints represent
   432  an unbiased sample.
   433  $r$ can be adjusted depending on the application's constraints and the chosen trade-off between costs and system integrity.
   434  These properties provide sufficient infrastructure to create system integrity
   435  by detecting nodes that don't play by the rules - like changing the history or
   436  content of their source chain.
   437  In appendix \ref{apdx:trust} we detail tooling appropriate for different contexts,
   438  including ones where detailed analysis of source chain history is required -
   439  for example financial transaction auditing.
   440  
   441  Depending on the application's domain, neighborhoods could become vulnerable to Sybil-Attacks because a sufficiently large percentage of compromised nodes could introduce bias into the sample used by an agent to evaluate a given transaction.
   442  Holochain allows applications to handle Sybil-Attacks through domain specific
   443  membrane functions.
   444  Because we chose to inherently model agency within the system,
   445  permission can be granted or declined in a programmatic and decentralized manner
   446  thus allowing applications to appropriately land on the spectrum between permissioned and permissionless.
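
        As a rough sketch of this vulnerability, assume (a simplifying assumption made here, not something the design guarantees) that each of the $r$ nodes in a neighborhood is compromised independently with probability $f$. The probability that compromised nodes form a majority of the sample is then
        \begin{equation}
        \mathcal{P}_\text{maj}(r,f)=\sum_{k=\lfloor r/2\rfloor+1}^{r}\binom{r}{k}f^k(1-f)^{r-k}
        \end{equation}
        For example, $r=5$ and $f=0.1$ give $0.0081+0.00045+0.00001\approx 0.0086$, whereas $r=5$ and $f=0.5$ give $0.5$. Under this model, a membrane function that keeps $f$ low directly lowers the chance of a biased sample.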
   447  
   448  In appendix \ref{apdx:membranes}, we provide some membrane schemes that can be
   449  chosen either for the outer
   450  membrane of that application that nodes have to cross in order to talk to
   451  any other node within the application or for any secondary membrane inside
   452  the application.
   453  The latter means that nodes could join without permission and participate in aspects
   454  of the application that are not integrity critical without further condition
   455  but need to provide certain criteria in order to pass the membrane into application crucial
   456  validation.
   457  
   458  Thus, Holochain applications maintain systemic integrity without introducing
   459  consensus and therefore (computationally expensive) absolute truth because 1) any single node uses provenance to independently verify any single transaction with the sources involved in that transaction and 2) because each Holochain application runs independently of all others, they are inherently permissioned by application specific rules for joining and continuing participation in that application's network.
   460  These both provide the benefit that any given Holochain application can tune the expense of that validation to a contextually appropriate level.
   461  
   462  
   463  \subsubsection{Gossip \& World Model}
   464  \label{sec:gossip}
   465  
   466  So far, we have focused on those parts of the validation function $V$ used to verify elements of $\chain$.  However, maintaining system integrity in distributed systems also requires that nodes have mechanisms for sharing information about nodes that have broken the validation rules so that they can be excluded from participation. There exist, additionally, forms of bad-acting that do not live in the content of a transaction but in the patterns of transacting that are detrimental to the system, for example, denial of service attacks.
   467  
   468  Holochain uses gossip for nodes to share information about their own experience of the behavior of other nodes.  Informally we call this information the node's \term{world model}. In this section we describe the nature of Holochain's gossip protocols and how they build and maintain a node's world model.
   469  
   470  In \ref{dht:metrics} we described one such part of the world model, the \textit{uptime} metric and how it is used for maintaining redundant copies of entries.  In \ref{mebandprov:membfn} we defined a membrane function that determines if a node shall accept a transaction and allowed that function to take arbitrary data $z$.  The main source of that data is this world model.
   471  
   472  
   473  More formally:
   474  
   475  \begin{enumerate}
   476  \item Recall that each node maintains a set $M$ of metrics $m$ about other nodes it knows about. Note that in terms of our formalism, this world model is part of each node's non-chain state data $D$.
   477  \item Let $m$ be a tuple of tuples: $((\mu,c)_\text{self},(\mu,c)_\text{others})$ which record an experience $\mu$ of a node with respect to a given metric and a confidence $c$ in that experience, both as directly experienced and as ``hearsay'' received from other nodes.
   478  \item Allow a class of entries stored in $\chain_n$ to also be used as a metric $m_w$, acting as a signed declaration of the experience of $n$ regarding some other node.  Call such entries \term{warrants}.  These warrants allow us to use the standard tooling of Holochain to make provenance-based, verifiable claims about other nodes in the network, which propagate orthogonally to the usual DHT methods, via gossip, to nodes that need to ``hear'' about these claims so as to make decisions about interacting with nodes.
   479  \item $\forall m \in M$ let the function $G_\text{with}(m)$ return a set of nodes important for a node to gossip \textbf{with}, defined by a probabilistic weighting that information received from those nodes will result in changing the $(\mu,c)_\text{others}$ component of $m$.
   480  \item $\forall m \in M$ let the function $G_\text{about}(m)$ return a set of nodes important for a node to gossip \textbf{about} defined by the properties of $m$.
   481  \item Define subsets of $G_\text{with}(m)$ according to a correlation with what it means to have low vs. high confidence value $c$:
   482  \begin{enumerate}
   483  \item \textbf{Pull}: consisting of nodes about which a low confidence means a need for more frequent gossip to raise a node's confidence.  Such nodes would include those that, with respect to the given node, hold its published entries, hold entries it is also responsible for holding, are close to the node (i.e. in its lowest k-bucket), and those it relies on for routing (i.e. a subset of each k-bucket).
   484  \item \textbf{Push}: consisting of nodes about which a high confidence implies a need for more frequent gossip to spread the information about that node.  Such nodes would include ones that a given node has high confidence are bad actors, i.e. it has directly experienced bad acting, or has received bad-actor gossip from nodes that it has high confidence are able to make that bad-actor evaluation.
   485  \end{enumerate}
   486  \item \todo{TODO: describe a gossip trigger function based on the push vs. pull distinction that demonstrates when gossip happens}
   487  \end{enumerate}
   488  
   489  The computational costs of gossip depend on the set of metrics that a particular application needs to keep track of to maintain system integrity.  For an application with a very strong membership membrane perhaps only \textit{uptime} metrics are necessary to gossip about to balance resilience.  But this too may depend on a priori knowledge of the nodes involved in the application.  Applications with very loose membership membranes may have a substantial number of metrics and complex membrane functions using those metrics which may require substantial compute effort.  The Holochain design intentionally leaves these parameters only loosely specified so that applications can be built fit for purpose.
   490  
   491  \subsubsection{CALM \& Logical Monotonicity}
   492  \todo{TODO: description of CALM in multi-agent systems, and how it works in our case}
   493  
   494  
   495  \section{Complexity In Distributed Systems}
   496  \label{sec:complexity}
   497  
   498  In this section we discuss the complexity of our proposed architecture for decentralized systems and compare it to the increasingly adopted Blockchain pattern.
   499  
   500  Formally describing the complexity of decentralized multi-agent systems is a non-trivial task for which more complex approaches have been suggested (\cite{multi-agent-complex}).
   501  This might be the reason for the lack of clarity and the misunderstandings within communities discussing the complexity and scalability of Bitcoin, for example \cite{bitcoin-complex}.
   502  
   503  In order to be able to have a ball-park comparison between our approach and the current status quo in decentralized application architecture, we proceed by modeling the worst-case time complexity both for a single node $\Omega_{SystemNode}$ as well as for the whole system $\Omega_{System}$ and both as functions of the number of state transitions (i.e., transactions) $n$ and the number of nodes in the system $m$.
   504  
   505  \subsection{Bitcoin}
   506  Let $\Omega_{Bitcoin}$ be the Bitcoin network, $n$ be the number of transactions and $m$ be the number of full validating nodes (i.e., \textit{miners}\footnote{For the sake of simplicity and focusing on a lower bound of the system's complexity, we are neglecting all nodes that are not crucial for the operation of the network, such as light-clients and clients not involved in the process of validation}) within $\Omega_{Bitcoin}$.
   507  
   508  For every new transaction being issued, any given node will have to check the transaction's signature (among other checks, see \cite{bitcoin-protocol}) and, in particular, check that this transaction's output is not used in any other transaction in order to reject double-spends, resulting in a time complexity of
   509  \begin{equation}
   510  c+n
   511  \end{equation}
   512  per transaction. The time complexity in big-O notation per node as a function of the number of transactions is therefore:
   513  \begin{equation}
   514  \Omega_{BitcoinNode}\in O(n^2)
   515  \end{equation}
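        The step from the per-transaction cost to this per-node bound can be spelled out as follows, assuming (as a simplification) that the $i$-th transaction is checked against the $i$ transactions seen so far:
        \begin{equation}
        \sum_{i=1}^{n}(c+i)=cn+\frac{n(n+1)}{2}\in O(n^2)
        \end{equation}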
   516  The complexity handled by one Bitcoin node does not\footnote{Not inherently, that is: more participants will result in more transactions, but we model both values as separate parameters.} depend on $m$, the total number of nodes in the system. But since every node has to validate exactly the same set of transactions, the system's time complexity as a function of the number of transactions and the number of nodes is
   517  \begin{equation}
   518  \Omega_{Bitcoin}\in O(n^2m)
   519  \end{equation}
   520  
   521  Note that this quadratic time complexity of Bitcoin's transaction validation process is what creates its main bottleneck as this reduces the network's gossip bandwidth since every node has to validate every transaction before passing it along. In order to still have an average transaction at least flood through $90\%$ of the network, block size and time can't be pushed beyond 4MB and 12s respectively, according to \cite{scaling}.
   522  
   523  \subsection{Ethereum}
   524  Let $\Omega_{Ethereum}$ be the Ethereum main network, $n$ be the number of transactions and $m$ the number of full-clients within the network.
   525  
   526  The time complexity of processing a single transaction on a single node is a function of the code whose execution is triggered by the given transaction plus a constant:
   527  \begin{equation}
   528  c+f_{tx_i}(n,m)
   529  \end{equation}
   530  Similarly to Bitcoin and as a result of the Blockchain design decision to maintain one single state ($\forall n,m \in N: \chain_n\eqbang\chain_m$, \textit{``This is to be avoided at all costs as the uncertainty that would ensue would likely kill all confidence in the entire system."} \cite{yellowpaper}), every node has to process every transaction being sent resulting in a time complexity per node as
   531  \begin{equation}
   532  c+\sum_{i=0}^n f_{tx_i}(n,m)
   533  \end{equation}
   534  that is
   535  \begin{equation}
   536  \Omega_{EthereumNode} \in O(n \cdot f_{avg}(n,m))
   537  \end{equation}
   538  where users are incentivized to keep the average complexity $f_{avg}(n,m)$
   539  of the code being run by Ethereum small,
   540  since execution has to be paid for in gas and is subject to restrictions such as the \textit{block gas limit}.
   541  In other words, because of the complexity $\sum_{i=0}^n f_{tx_i}(n,m)$ being burdened upon all nodes of the system, other systemic properties have to keep users from running complex code on Ethereum so as to not bump into the network's limits.
   542  
   543  Again, since every node has to process the same set of all transactions, the time complexity of the whole system then is that of one node multiplied by $m$:
   544  \begin{equation}
   545  \Omega_{Ethereum} \in O(nm\cdot f_{avg}(n,m))
   546  \end{equation}
   547  
   548  \subsection{Blockchain}
   549  \label{sec:complex:blockchain}
   550  Both examples of Blockchain systems above need a non-trivial computational overhead in order to work at all: the proof-of-work, hash-crack process also called \textit{mining}. Since this overhead is a function of neither the number of transactions nor, directly, the number of nodes, it is often omitted in complexity analysis. With the total energy consumption of all Bitcoin miners today being greater than that of the country of Iceland \cite{mining-consumption}, neglecting the complexity of Blockchain's consensus algorithm seems like a silly mistake.
   551  
   552  Blockchains set the block time, the average time between two blocks, as a fixed parameter that the system keeps in homeostasis by adjusting the hash-crack's difficulty according to the network's total hash-rate. For a given network with a given set of mining nodes and a given total hash-rate, the complexity of the hash-crack is constant. But as the system grows and more miners come on-line, which increases the network's total hash-rate, the difficulty needs to increase in order to keep the average block time constant.
   553  
   554  With this approach, the benefit of a higher total hash-rate $x_{HR}$ is that it becomes harder for an adversary to influence the system by creating biased blocks (which would enable this party to carry out double-spend attacks). That is why Blockchains have to subsidize mining: they depend on a high $x_{HR}$ to make it economically infeasible for an attacker to overpower the trusted miners.
   555  
   556  So, there is a direct relationship between the network's total trusted hash-rate and its level of security against mining power attacks.
   557  This means that the confidence $\Psi_{Blockchain}$ any agent can have in the integrity of the system is a function of the system's hash-rate $x_{HR}$, and more precisely, the cost/work $cost(x_{HR})$ needed to provide it.
   558  Looking only at a certain transaction $t$, and assuming that any attacker acts in a purely economically rational way, the confidence in $t$ being added to all $\chain_n$ is bounded from above by
   559  \begin{equation}
   560  \Psi_{Blockchain}(t) < \min\left(1, \frac{cost(x_{HR})}{value(t)}\right)
   561  \end{equation}
   562  
   563  In order to keep this confidence unconstrained by the mining process and therefore the architecture of Blockchain itself, $cost(x_{HR})$ (which includes the setup of mining hardware as well as the energy consumption) has to grow linearly with the value exchanged within the system.
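
        As a numerical illustration with made-up figures: if overpowering the trusted miners costs $cost(x_{HR})=10^6$ (in some currency unit) and a transaction moves $value(t)=10^7$ of the same unit, then
        \begin{equation}
        \min\left(1,\frac{10^6}{10^7}\right)=0.1
        \end{equation}
        so the bound caps $\Psi_{Blockchain}(t)$ at $0.1$. Only when $cost(x_{HR})\geq value(t)$ does the bound relax to $1$, which is why the mining cost has to keep pace with the value at stake.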
   564  
   565  \subsection{Holochain}
   566  Let $\Omega_{HC}$ be a given Holochain system, let $n$ be the sum of all public\footnote{Private (see \ref{private}) state transitions, i.e., those confined to a local $\chain_n$, are completely within the scope of a node's agency, don't affect other parts of the system directly, and can therefore be omitted from the complexity analysis of $\Omega_{HC}$ as a distributed system.} (i.e., \textit{put} to the DHT) state transitions (\textit{transactions}) that all agents in $\Omega_{HC}$ trigger in total, and let $m$ be the number of agents (= nodes) in the system.
   567  
   568  Putting a new entry to the DHT involves finding a node that is responsible for holding that specific entry, which in our case according to \cite{kademlia} has a time complexity of \begin{equation}
   569  c+\lceil{\log(m)}\rceil.
   570  \end{equation}
   571  After receiving the state transition data, this node will gossip with its $q$ neighbors, which will result in $r$ copies of this state transition entry being stored throughout the system - on $r$ different nodes. Each of these nodes has to validate this entry using application-specific logic, the complexity of which we shall call $v(n, m)$.
   572  
   573  Combined, this results in a system-wide complexity per state transition as given with
   574  \begin{equation}
   575  \underbrace{c+\lceil{\log(m)}\rceil}_{\text{DHT lookup}}
   576  + q + r \cdot
   577  \underbrace{v(n,m)}_{\text{validation}}
   578  \end{equation}
   579  which implies the following whole system complexity in $O$-notation
   580  \begin{equation}
   581  \Omega_{Holochain} \in O(n\cdot(\log(m) + v(n,m)))
   582  \end{equation}
   583  
   584  Now, this is the overall system complexity. In order to enable comparison, we reason that in the case of Holochain without loss of generality (i.e., dependent on the specific Holochain application), the load of the whole system is shared equally by all nodes. Without further assumptions, for any given state transition, the probability of it originating at a certain node is $\frac{1}{m}$, so the term for the lookup complexity needs to be divided by $m$ to describe the average lookup complexity per node. Unlike in Blockchain systems, where every node has to see every transaction, for the vast majority of state transitions one particular node is not involved at all. The stochastic closeness of the node's public key's hash with the entry's hash is what triggers the node's involvement. We assume the hash function $H$ to show a uniform distribution of hash values which results in the probability of a certain node being one of the $r$ nodes that cannot discard this entry to be $\frac{1}{m}$ times $r$.  The average time complexity being handled by an average node then is
   585  \begin{equation}
   586  \Omega_{HolochainNode} \in
   587  O\left(\frac{n}{m}\cdot\left(\log(m) + v(n,m)\right)\right)
   588  \end{equation}
   589  Note that the factor $\frac{n}{m}$ represents the average number of state transitions per node (i.e., the load per node) and that though this is a highly application specific value, it is an \textit{a priori} expected lower bound since nodes have to process at least the state transitions they produce themselves.
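
        The per-node bound can be retraced step by step, treating $q$ and $r$ as application-chosen constants (an assumption made here for the sake of the sketch). Summing the per-transition cost over all $n$ state transitions and spreading it evenly over $m$ nodes gives
        \begin{equation}
        \begin{split}
        \frac{n}{m}\left(c+\lceil\log(m)\rceil+q+r\cdot v(n,m)\right)\\
        \in O\left(\frac{n}{m}\cdot\left(\log(m)+v(n,m)\right)\right)
        \end{split}
        \end{equation}
        For instance, with $v(n,m)\in O(1)$ this reduces to $O\left(\frac{n}{m}\log(m)\right)$, i.e., a cost in $O(\log(m))$ for each of a node's $\frac{n}{m}$ state transitions.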
   590  
The only overhead that is added by the architecture of this decentralized system is the node look-up with its complexity of $\log(m)$.
   592  
The unknown, application-specific complexity $v(n,m)$ of the validation routines is what could still drive up the whole system's complexity. Indeed, one can conceive of Holochain applications with a lot of complexity in their validation routines. It is possible to mimic Blockchain's consensus validation requirement by enforcing that a validating node communicates with all other nodes before adding an entry to the DHT. It could equally be only half of all nodes. And there surely is a host of applications with only little complexity, or with specific state transitions that involve only little complexity. \textit{In a Holochain app one can put the complexity where it is needed and keep the rest of the system fast and scalable.}
   594  
In Section \ref{sec:usecases} we proceed by providing real-world use cases and showing how non-trivial Holochain applications can be built with a validation complexity of $O(1)$, resulting in a total time complexity per node in $O(\log(m))$ and a high enough confidence in integrity without introducing proof-of-work at all.
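
The bound quoted above can be recovered by substitution into the per-node complexity. As a sketch, assume $v(n,m)\in O(1)$ and that the load per node $\frac{n}{m}$ is bounded by a constant $k$ (a symbol introduced here only for illustration):
\begin{align}
\frac{n}{m}\cdot\bigl(\log(m) + v(n,m)\bigr)
&\le k\cdot\bigl(\log(m) + O(1)\bigr) \nonumber\\
&\in O(\log(m))
\end{align}
Without the bound on $\frac{n}{m}$, the same substitution gives $O\left(\frac{n}{m}\cdot\log(m)\right)$.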
   596  
   597  \section{Use Cases}
   598  \label{sec:usecases}
   599  
   600  Now we present a few use cases of applications built on Holochain, considering the context of the use case and how it affects both complexity and evaluation of integrity and thus validation design.
   601  
   602  \subsection{Social Media}
   603  Consider a simple implementation of micro-blogging using Holochain where:
   604  \begin{enumerate}
   605  \item $F_\mathrm{I}=\{f_\mathrm{post}(text,node),f_\mathrm{follow}(node),f_\mathrm{read}(text)\}$ and
   606  
   607  \item $F_\mathrm{V}=\{f_\mathrm{isOriginator}\}$
   608  \end{enumerate}
   609  \todo{describe $O(1)$ complexity}
   610  
   611  \subsection{Identity}
   612  \todo{DPKI}
   613  \subsection{Money}
   614  \todo{mutual-credit vs. coins}
Where the complexity of the transaction is higher, the complexity may be $O(n^2)$ or $O(\log(n))$; see the Holo currency white paper \cite{currency-wp}.
   616  
   617  
   618  \section{Implementation}
   619  \label{sec:implementation}
   620  
At the time of this writing we have a fully operational implementation of the system as described in this paper. It includes two separate virtual machines for writing \hcdna\ functions in JavaScript or Lisp, along with proof-of-concept implementations of a number of applications, including a Twitter clone, a Slack-like chat system, and DPKI, and a set of mix-in libraries useful for building applications.
   622  
   623  \begin{enumerate}
\item 30k+ lines of Go code.
\item DHT: customized version of libp2p/ipfs's Kademlia implementation.
\item Network Transport: libp2p including end-to-end encryption.
\item JavaScript Virtual Machine: otto \\\url{https://github.com/robertkrimen/otto}.
\item Lisp Virtual Machine: zygomys \\\url{https://github.com/glycerine/zygomys}.
   629  \end{enumerate}
   630  
   631  Additionally we have created a benchmarking suite to examine the processing, bandwidth and storage used in various scenarios, and compared these with Ethereum applications in similar scenarios.  These can be seen here: \\\url{https://github.com/holochain/benchmarks}
   632  
We have yet to implement scalability tests for large-scale applications, but this is on our roadmap.\todo{TODO}
   634  
   635  \appendix
   636  
   637  \section{$DHT_\text{hc}$}
   638  \label{apdx:dhtfn}
   639  \begin{enumerate}
   640  
\item $dht_\text{putLink}(base,link,tag)$ where $base$ and $link$ are keys and where $tag$ is an arbitrary string, which associates the tuple $\{link,tag\}$ with the key $base$.
\item $dht_\text{getLinks}(base,tag)$ where $base$ is a key and where $tag$ is an arbitrary string, which returns the set of links on $base$ identified by $tag$.
\item $dht_\text{mod}(key,newkey)$ where $key$ and $newkey$ are keys, which adds $newkey$ as a modifier of $\sigma_\text{key} \in \dhtstate$ and calls $dht_\text{putLink}(key,newkey,\text{``replacedby''})$.
\item $dht_\text{del}(key)$ where $key$ is a key, which marks $\sigma_\text{key} \in \dhtstate$ as deleted.
   645  \item \todo{modification to $dht_\text{get}$ re mod \& del}.
   646  \end{enumerate}
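
Read as postconditions, the definitions of $dht_\text{putLink}$, $dht_\text{getLinks}$ and $dht_\text{mod}$ imply the following relations. This is an informal sketch in the notation of this appendix, and it assumes that no intervening operation removes the link:
\begin{align}
&dht_\text{putLink}(base,link,tag) \nonumber\\
&\quad\Rightarrow link \in dht_\text{getLinks}(base,tag) \nonumber\\
&dht_\text{mod}(key,newkey) \nonumber\\
&\quad\Rightarrow newkey \in dht_\text{getLinks}(key,\text{``replacedby''})
\end{align}
The second relation follows from the first, since $dht_\text{mod}$ is defined to call $dht_\text{putLink}$ with the tag ``replacedby''.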
   647  
   648  \section{$\sysfns$}
   649  \label{apdx:sysfn}
   650  \begin{enumerate}
   651  \item \todo{all the other sys functions...}
   652  \end{enumerate}
   653  
   654  \section{Patterns of Trust Management}
   655  \label{apdx:trust}
   656  
   657  Tools in Holochain available to app developers for use in Considered Requirements, some of which are also used at the system level and globally parameterized for an application:
   658  \begin{enumerate}
   659  \item Countersigning \todo{TODO}
\item Notaries \todo{TODO -- ``The network is the notary.''}
   661  \item Publish Headers  \todo{e.g. for chain-rollback detection}
   662  \item Source-chain examination.  \todo{TODO}
   663  \item Blocked-lists. \todo{e.g. DDOS, spam, etc}
   664  \item ... \todo{more here...}
\end{enumerate}
   666  
   667  \section{Membranes}
   668  \label{apdx:membranes}
   669  
   670  \begin{itemize}
   671    \item \textit{Invitation}\\
   672    One of the most natural approaches for membrane crossing in a space in which
   673    agents provide identity is to rely on invitation by agents that are already
   674    in the membrane. This could be invitation:
   675    \begin{itemize}
   676      \item by anyone
   677      \item by an admin (that could either be set in the application's DNA or a
   678      variable shared within the DHT - both could be mutable or constant)
   679      \item by multiple users (applying social triangulation)
   680    \end{itemize}
   681    \item \textit{Proof-of-Identity / Reputation}\\
   682    Given the presence of other applications/chains, these can be used to attach the
   683    identity and its reputation in that chain to the agent that wants to join.
   684    Since this seems to be a crucial pillar of the ecosystem of Holochain
   685    applications, we plan to deliver a system-level application called DPKI
   686    (distributed public key infrastructure) that will function as the main
   687    identity and reputation platform.
   688    A prototype of this app was already developed prior to the writing of
   689    this paper.
   690    \item \textit{Proof-of-Presence}\\
   691    Use of notarized national documents/passports/identity cards within the agent
   692    entry (second entry in $\chain$).
   693    \item \textit{Proof-of-Service}\\
   694    Cryptographic proof of delivery of a service / hosting of an application.
   695    We intend to leverage this technique with our distributed cloud hosting
   696    application \textbf{Holo}, which we will build on top of Holochain.
   697    See our Holo Hosting white paper for much more detail \cite{hosting-wp}.
   698    \item \textit{Proof-of-Work}\\
  If the application does not require anonymity, then instead of the
  cryptographic hash-cracking work applied in most Blockchains,
  this could also be useful work that new members are asked to contribute
  to the community
  or a puzzle to prove domain knowledge. Examples are:
   704    \begin{itemize}
    \item Test for knowledge about local maps to prove citizenship
   706      \item DNA sequencing
   707      \item Protein folding
   708      \item SETI
   709      \item Publication of scientific article
   710    \end{itemize}
   711    \item \textit{Proof-of-Stake / Payment}\\
  Deposit or payment to have agent certified.
   713    \item \textit{Immune System}\\
   714    Blacklisting of nodes that don't play by the application rules.
   715  
   716  \end{itemize}
   717  
   718  \begin{acknowledgments}
   719  
We thank Steve Sawin for his review of this paper, \LaTeXe\ support, and so much more\dots
   721  
   722  \end{acknowledgments}
   723  
   724  \bibliographystyle{alpha}
   725  \begin{thebibliography}{9}
   726  
   727  \bibitem[DUPONT]{dupont}
   728  Quinn DuPont.
   729  \textit{Experiments in Algorithmic Governance: A history and ethnography of “The DAO,” a failed Decentralized Autonomous Organization}
   730  \\\url{http://www.iqdupont.com/assets/documents/DUPONT-2017-Preprint-Algorithmic-Governance.pdf}
   731  
   732  \bibitem[EIP-150]{yellowpaper}
   733  Gavin Wood.
   734  \textit{Ethereum: A Secure Decentralised Generalised Transaction Ledger}.
   735  \\\url{http://yellowpaper.io/}
   736  
   737  \bibitem[Kademlia]{kademlia}
Petar Maymounkov and David Mazi\`eres.
\textit{Kademlia: A Peer-to-peer Information System Based on the XOR Metric}
   740  \\\url{https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf}
   741  
   742  \bibitem[Zhang13]{dht}
   743  Zhang, H., Wen, Y., Xie, H., Yu, N.
   744  \textit{
   745  Distributed Hash Table
   746  Theory, Platforms and Applications}
   747  
   748  \bibitem[Croman et al 16]{scaling}
   749  Kyle Croman, Christian Decker, Ittay Eyal, Adem Efe Gencer, Ari Juels, Ahmed Kosba, Andrew Miller, Prateek Saxena, Elaine Shi, Emin Gün Sirer, Dawn Song, Roger Wattenhofer,
   750  \textit{On Scaling Blockchains},
   751  Financial Cryptography and Data Security,
   752  Springer Verlag 2016
   753  
   754  \bibitem[Bitcoin Reddit]{bitcoin-complex}
   755  /u/mike\_hearn, /u/awemany, /u/nullc et al.
   756  \\\url{https://www.reddit.com/r/Bitcoin/comments/3a5f1v/mike_hearn_on_those_who_want_all_scaling_to_be/csa7exw/?context=3&st=j8jfak3q&sh=6e445294}
   757  Reddit discussion
   758  2015
   759  
   760  \bibitem[Marir2014]{multi-agent-complex}
Marir, Toufik and Mokhati, Farid and Bouchelaghem-Seridi, Hassina and Tamrabet, Zouheyr,
\textit{Complexity Measurement of Multi-Agent Systems},
   763  Multiagent System Technologies: 12th German Conference, MATES 2014, Stuttgart, Germany, September 23-25, 2014. Proceedings,
   764  Springer International Publishing
   765  2014
   766  \\\url{https://doi.org/10.1007/978-3-319-11584-9_13}
   767  
   768  \bibitem[Coppock17]{mining-consumption}
   769  Mark Coppock
   770  \textit{THE WORLD’S CRYPTOCURRENCY MINING USES MORE ELECTRICITY THAN ICELAND}
   771  \\\url{https://www.digitaltrends.com/computing/bitcoin-ethereum-mining-use-significant-electrical-power/}
   772  
   773  \bibitem[BitcoinWiki]{bitcoin-protocol}
   774  \textit{Bitcoin Protocol}
   775  \\\url{https://en.bitcoin.it/wiki/Protocol_rules#.22tx.22_messages}
   776  Bitcoin Wiki
   777  
   778  \bibitem[IPFS]{ipfs}
   779  Juan Benet
   780  \textit{IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3)}
   781  \\\url{https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf}
   782  
   783  \bibitem[LibP2P]{libp2p}
   784  Juan Benet, David Dias
   785  \textit{libp2p Specification}
   786  \\\url{https://github.com/libp2p/specs}
   787  
   788  \bibitem[Oxford]{provenance}
   789  Oxford
   790  Online dictionary
   791  \\\url{https://en.oxforddictionaries.com/definition/provenance}
   792  
   793  \bibitem[Douceur02]{sybil}
   794  Douceur, John R. (2002).
\textit{The Sybil Attack}
   796  \\\url{https://www.microsoft.com/en-us/research/publication/the-sybil-attack/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F74220%2Fiptps2002.pdf}
   797  International workshop on Peer-To-Peer Systems. Retrieved 23 April 2016.
   798  
   799  
   800  \bibitem[HoloCurrency]{currency-wp}
   801  Arthur Brock and Eric Harris-Braun 2017
   802  \textit{Holo: Cryptocurrency Infrastructure
   803  for Global Scale and Stable Value}
   804  \\\url{https://holo.host/holo-currency-wp/}
   805  
   806  \bibitem[Nilsson15]{mt-gox}
   807   Nilsson, Kim (19 April 2015).
 \textit{The missing MtGox bitcoins}. Retrieved 10 December 2015.
   809  \\\url{http://blog.wizsec.jp/2015/04/the-missing-mtgox-bitcoins.html}
   810  
   811  \bibitem[Swanson15]{CaaS}
   812  Tim Swanson
   813  \textit{Consensus-as-a-service: a brief report on the emergence of permissioned, distributed ledger systems}
   814  April 6, 2015
   815  \\\url{https://pdfs.semanticscholar.org/f3a2/2daa64fc82fcda47e86ac50d555ffc24b8c7.pdf}
   816  
   817  \end{thebibliography}
   818  
   819  
   820  \end{document}