**Draft** / **Work in Progress**

This page documents a proposal for a future ledger architecture based on community feedback. All input is welcome, as the goal is to make this a community effort.

##### Table of Contents  
[Motivation](#motivation)  
[API](#api)  
[Point-in-Time Queries](#pointintime)  
[Query Language](#querylanguage)


## <a name="motivation"></a> Motivation
The motivation for exploring a new ledger architecture is based on community feedback. While the existing ledger is able to support some (but not all) of the requirements below, we wanted to explore what a new ledger would look like given all that has been learned. Based on many discussions in the community over Slack, GitHub, and the face-to-face hackathons, it is clear that there is a strong desire to support the following requirements:

1. Point-in-time queries - The ability to query chaincode state at previous blocks and easily trace lineage **without** replaying transactions
2. SQL-like query language
3. Privacy - The complete ledger may not reside on all committers
4. Cryptographically secure ledger - Data integrity without consulting other nodes
5. Support for consensus algorithms that provide immediate finality, like PBFT
6. Support for consensus algorithms that require stochastic convergence, like PoW and PoET
7. Pruning - Ability to remove old transaction data as needed.
8. Support separation of endorsement from consensus as described in the [Next Consensus Architecture Proposal](https://github.com/hyperledger/fabric/wiki/Next-Consensus-Architecture-Proposal). This implies that some peers may apply endorsed results to their ledger **without** executing transactions or viewing chaincode logic.
9. API / Engine separation - The ability to plug in different storage engines as needed.

<a name="api"></a>
## API

Proposed API in Go pseudocode:

```go
package ledger

import (
	"github.com/hyperledger/fabric/protos/common"
	pb "github.com/hyperledger/fabric/protos/peer"
)

// Encryptor is an interface that a ledger implementation can use to encrypt/decrypt the chaincode state
type Encryptor interface {
	Encrypt([]byte) []byte
	Decrypt([]byte) []byte
}

// PeerMgmt is an interface that a ledger implementation expects from the peer implementation
type PeerMgmt interface {
	// IsPeerEndorserFor returns 'true' if the peer is an endorser for the given chaincodeID
	IsPeerEndorserFor(chaincodeID string) bool

	// ListEndorsingChaincodes returns the chaincodeIDs for which the peer acts as one of the endorsers
	ListEndorsingChaincodes() []string

	// GetEncryptor returns the Encryptor for the given chaincodeID
	GetEncryptor(chaincodeID string) (Encryptor, error)
}

// In the case of a confidential chaincode, the simulation results from the ledger are expected to be encrypted using the 'Encryptor' corresponding to the chaincode.
// Similarly, the blocks returned by the GetBlock(s) methods of the ledger are expected to have the state updates in encrypted form.
// However, internally, the ledger can maintain the latest and historical state in plain text for the chaincodes for which the peer is one of the endorsers.
// TODO - Is this assumption correct?

// DataHolder is a general-purpose interface for forcing a data element to be serializable/deserializable
type DataHolder interface {
	GetData() interface{}
	GetBytes() []byte
	DecodeBytes(b []byte) interface{}
}

type SimulationResults interface {
	DataHolder
}

type QueryResult interface {
	DataHolder
}

type BlockHeader struct {
}

type PrunePolicy interface {
}

type BlockRangePrunePolicy struct {
	FirstBlockHash string
	LastBlockHash  string
}

// QueryExecutor executes the queries.
// Get* methods support a KV-based data model; the ExecuteQuery method supports a rich data model and rich queries.
//
// In the case of a rich data model, ExecuteQuery is expected to support queries on
// the latest state, the historical state, and the intersection of state and transactions
type QueryExecutor interface {
	GetState(key string) ([]byte, error)
	GetStateRangeScanIterator(startKey string, endKey string) (ResultsIterator, error)
	GetStateMultipleKeys(keys []string) ([][]byte, error)
	GetTransactionsForKey(key string) (ResultsIterator, error)

	ExecuteQuery(query string) (ResultsIterator, error)
}

// TxSimulator simulates a transaction on a consistent snapshot of the most recent state possible
type TxSimulator interface {
	QueryExecutor
	StartNewTx()

	// KV data model
	SetState(key string, value []byte)
	DeleteState(key string)
	SetStateMultipleKeys(kvs map[string][]byte)

	// For supporting a rich data model (see comments on QueryExecutor above)
	ExecuteUpdate(query string)

	// This can be a large payload
	CopyState(sourceChaincodeID string) error

	// GetTxSimulationResults encapsulates the results of the transaction simulation.
	// This should contain enough detail for
	// - The update in the chaincode state that would be caused if the transaction were committed
	// - The environment in which the transaction is executed, so as to be able to decide the validity of the environment
	//   (at a later time on a different peer) while committing the transactions
	// Different ledger implementations (or configurations of a single implementation) may want to represent the above two pieces
	// of information in different ways in order to support different data models or optimize the information representations.
	// TODO detailed illustration of a couple of representations.
	GetTxSimulationResults() SimulationResults
	HasConflicts() bool
	Clear()
}

type ResultsIterator interface {
	// Next moves to the next result. Returns true if the next result exists
	Next() bool
	// GetResult returns the current result
	GetResult() QueryResult
	// Close releases resources occupied by the iterator
	Close()
}

// OrdererLedger implements methods required by the 'orderer ledger'
type OrdererLedger interface {
	Ledger
	// CommitBlock adds a new block
	CommitBlock(block *common.Block) error
}

// PeerLedger differs from the OrdererLedger in that PeerLedger locally maintains a bitmask
// that tells apart valid transactions from invalid ones
type PeerLedger interface {
	Ledger
	// GetTransactionByID retrieves a transaction by id
	GetTransactionByID(txID string) (*pb.Transaction, error)
	// GetBlockByHash returns a block given its hash
	GetBlockByHash(blockHash []byte) (*common.Block, error)
	// NewTxSimulator gives a handle to a transaction simulator.
	// A client can obtain more than one 'TxSimulator' for parallel execution.
	// Any snapshotting/synchronization should be performed at the implementation level if required
	NewTxSimulator() (TxSimulator, error)
	// NewQueryExecutor gives a handle to a query executor.
	// A client can obtain more than one 'QueryExecutor' for parallel execution.
	// Any synchronization should be performed at the implementation level if required
	NewQueryExecutor() (QueryExecutor, error)
	// NewHistoryQueryExecutor gives a handle to a history query executor.
	// A client can obtain more than one 'HistoryQueryExecutor' for parallel execution.
	// Any synchronization should be performed at the implementation level if required
	NewHistoryQueryExecutor() (HistoryQueryExecutor, error)
	// Commit commits a block into the ledger
	Commit(block *common.Block) error
	// Prune prunes the blocks/transactions that satisfy the given policy
	Prune(policy PrunePolicy) error
}

// ValidatedLedger represents the 'final ledger' after filtering out invalid transactions from PeerLedger.
// Post-v1
type ValidatedLedger interface {
	Ledger
}

// Ledger captures the methods that are common across the 'PeerLedger', 'OrdererLedger', and 'ValidatedLedger'
type Ledger interface {
	// GetBlockchainInfo returns basic info about the blockchain
	GetBlockchainInfo() (*pb.BlockchainInfo, error)
	// GetBlockByNumber returns the block at a given height
	// A blockNumber of math.MaxUint64 will return the last block
	GetBlockByNumber(blockNumber uint64) (*common.Block, error)
	// GetBlocksIterator returns an iterator that starts from `startBlockNumber` (inclusive).
	// The iterator is a blocking iterator, i.e., it blocks until the next block becomes available in the ledger
	// ResultsIterator contains type BlockHolder
	GetBlocksIterator(startBlockNumber uint64) (ResultsIterator, error)
	// Close closes the ledger
	Close()
}

// BlockChain represents an instance of a blockchain. In the case of a consensus algorithm that could cause a fork, an instance of BlockChain
// represents one of the forks (i.e., one of the chains starting from the genesis block to one of the topmost blocks)
type BlockChain interface {
	GetTopBlockHash() string
	GetBlockchainInfo() (*protos.BlockchainInfo, error)
	GetBlockHeaders(startingBlockHash, endingBlockHash string) []*BlockHeader
	GetBlocks(startingBlockHash, endingBlockHash string) []*protos.Block
	GetBlockByNumber(blockNumber uint64) *protos.Block
	GetBlocksByNumber(startingBlockNumber, endingBlockNumber uint64) []*protos.Block
	GetBlockchainSize() uint64
	VerifyChain(highBlock, lowBlock uint64) (uint64, error)
}
```
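
The `TxSimulator` comments leave the representation of simulation results open. As one illustration only, the sketch below assumes a read/write-set representation: the simulator records the version of every key it reads plus its pending writes, and a committing peer later rejects the results if any key it read has changed since. `GetState`, `SetState`, `DeleteState`, and `GetTxSimulationResults` mirror the proposed method names; everything else (`stateDB`, `simResults`, `validateAndCommit`, and using block numbers as versions) is hypothetical, and for brevity the simulator reads the live map rather than a true snapshot.

```go
package main

import "fmt"

// versionedValue is a committed state entry. In this sketch the version is the
// number of the block that last wrote the key; blocks start at 1, so version 0
// can stand for "key absent".
type versionedValue struct {
	value   []byte
	version uint64
}

// stateDB is a hypothetical in-memory committed state.
type stateDB map[string]versionedValue

// simResults is one possible SimulationResults representation.
type simResults struct {
	readSet  map[string]uint64 // key -> version observed during simulation
	writeSet map[string][]byte // key -> new value (nil means delete)
}

// txSimulator implements a small subset of the proposed TxSimulator methods.
type txSimulator struct {
	db      stateDB
	results simResults
}

func newTxSimulator(db stateDB) *txSimulator {
	return &txSimulator{db: db, results: simResults{
		readSet:  map[string]uint64{},
		writeSet: map[string][]byte{},
	}}
}

// GetState returns the transaction's own pending write if there is one;
// otherwise it reads committed state and records the version it observed.
func (s *txSimulator) GetState(key string) ([]byte, error) {
	if v, ok := s.results.writeSet[key]; ok {
		return v, nil
	}
	vv := s.db[key] // zero value (version 0) if the key is absent
	s.results.readSet[key] = vv.version
	return vv.value, nil
}

// SetState and DeleteState only buffer writes; committed state is untouched.
func (s *txSimulator) SetState(key string, value []byte) { s.results.writeSet[key] = value }
func (s *txSimulator) DeleteState(key string)            { s.results.writeSet[key] = nil }

func (s *txSimulator) GetTxSimulationResults() simResults { return s.results }

// validateAndCommit is what a committing peer might do: reject the results if
// any key that was read has changed version since, otherwise apply the writes.
func validateAndCommit(db stateDB, r simResults, blockNum uint64) bool {
	for key, seen := range r.readSet {
		if db[key].version != seen {
			return false
		}
	}
	for key, val := range r.writeSet {
		if val == nil {
			delete(db, key)
		} else {
			db[key] = versionedValue{value: val, version: blockNum}
		}
	}
	return true
}

func main() {
	db := stateDB{"alice": {value: []byte("100"), version: 1}}

	// Both transactions are simulated before either one commits.
	tx1 := newTxSimulator(db)
	bal, _ := tx1.GetState("alice")
	tx1.SetState("alice", []byte("90"))

	tx2 := newTxSimulator(db)
	tx2.GetState("alice")
	tx2.SetState("alice", []byte("80"))

	fmt.Println(string(bal))                                             // 100
	fmt.Println(validateAndCommit(db, tx1.GetTxSimulationResults(), 2)) // true
	fmt.Println(validateAndCommit(db, tx2.GetTxSimulationResults(), 2)) // false: alice changed after tx2 read it
	fmt.Println(string(db["alice"].value))                              // 90
}
```

In this representation the committing peer only compares versions and applies buffered writes; it never re-executes chaincode, which is in the spirit of requirement 8 in the motivation.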

# Engine-specific thoughts

<a name="pointintime"></a>
### Point-in-Time Queries
In abstract temporal terms, there are three varieties of query important to chaincode and application developers:

1. Retrieve the most recent value of a key. (type: current; e.g., How much money is in Alice's account?)
2. Retrieve the value of a key at a specific time. (type: historical; e.g., What was Alice's account balance at the end of last month?)
3. Retrieve all values of a key over time. (type: lineage; e.g., Produce a statement listing all of Alice's transactions.)
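
To make the three query types concrete, here is a minimal in-memory sketch, assuming the engine keeps every committed version of a key tagged with the number of the block that wrote it. `historyStore`, `put`, `current`, `asOf`, and `lineage` are hypothetical names, not part of the proposed API, and deletes are not modeled. Because versions are kept in block order, a historical query is a binary search rather than a replay of transactions.

```go
package main

import (
	"fmt"
	"sort"
)

// version is one committed value of a key, tagged with the block that wrote it.
type version struct {
	block uint64
	value string
}

// historyStore keeps every committed version of each key in ascending block
// order, instead of overwriting the previous value.
type historyStore struct {
	versions map[string][]version
}

func newHistoryStore() *historyStore {
	return &historyStore{versions: map[string][]version{}}
}

// put records a write. Blocks are assumed to be committed in increasing order,
// which keeps each key's slice sorted by block number.
func (h *historyStore) put(key string, block uint64, value string) {
	h.versions[key] = append(h.versions[key], version{block, value})
}

// current answers a "current" query: the most recent value of key.
func (h *historyStore) current(key string) (string, bool) {
	vs := h.versions[key]
	if len(vs) == 0 {
		return "", false
	}
	return vs[len(vs)-1].value, true
}

// asOf answers a "historical" query: the value of key as of block n. It binary
// searches the version list; no transactions are replayed.
func (h *historyStore) asOf(key string, n uint64) (string, bool) {
	vs := h.versions[key]
	i := sort.Search(len(vs), func(j int) bool { return vs[j].block > n }) // first version after n
	if i == 0 {
		return "", false // key did not exist yet at block n
	}
	return vs[i-1].value, true
}

// lineage answers a "lineage" query: every value key has had, oldest first.
func (h *historyStore) lineage(key string) []version {
	return append([]version(nil), h.versions[key]...) // copy so callers can't mutate the store
}

func main() {
	h := newHistoryStore()
	h.put("alice", 3, "100")
	h.put("alice", 7, "60")
	h.put("alice", 12, "85")

	now, _ := h.current("alice")
	then, _ := h.asOf("alice", 10)
	fmt.Println(now, then)          // 85 60
	fmt.Println(h.lineage("alice")) // [{3 100} {7 60} {12 85}]
}
```

Retaining every version costs storage, which a pruning policy would need to bound.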

When formulating a query, a developer will benefit from the ability to filter, project, and relate transactions to one another. Consider the following examples:

1. Simple Filtering: Find all accounts that fell below a balance of $100 in the last month.
2. Complex Filtering: Find all of Trudy's transactions that occurred in Iraq or Syria where the amount is above a threshold and the other party has a name that matches a regular expression.
3. Relating: Determine if Alice has ever bought from the same gas station more than once in the same day. Feed this information into a fraud detection model.
4. Projection: Retrieve the city, state, country, and amount of Alice's last ten transactions. This information will be fed into a risk/fraud detection model.

<a name="querylanguage"></a>
### Query Language
Developing a query language to support such a diverse range of queries will not be simple. The challenges are:

1. Scaling the query language with developers as their needs grow. To date, the requests from developers have been modest. As the Hyperledger project's user base grows, so will the query complexity.
2. There are two nearly disjoint classes of query:
    1. Find a single value matching a set of constraints. Amenable to existing SQL and NoSQL grammars.
    2. Find a chain or chains of transactions satisfying a set of constraints. Amenable to graph query languages, such as Neo4j's Cypher or SPARQL.
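
The second class differs because the length of the chain is not known in advance, so answering it is a graph traversal rather than a single lookup. A minimal sketch, assuming transactions can be viewed as hypothetical `transfer` edges between accounts: `findChain` runs a breadth-first search for the shortest chain from one account to another in which every hop satisfies a constraint (here, a minimum amount). The type, function, and constraint are illustrative only; a graph query language such as Cypher would express this kind of variable-length path declaratively.

```go
package main

import "fmt"

// transfer is a hypothetical edge: a transaction moving value between accounts.
type transfer struct {
	TxID     string
	From, To string
	Amount   int64
}

// findChain does a breadth-first search for the shortest chain of transfers
// from src to dst in which every hop has Amount >= minAmount. It returns the
// transaction IDs along the chain, or nil if no such chain exists.
func findChain(txs []transfer, src, dst string, minAmount int64) []string {
	out := map[string][]transfer{} // adjacency list, filtered by the constraint
	for _, t := range txs {
		if t.Amount >= minAmount {
			out[t.From] = append(out[t.From], t)
		}
	}
	via := map[string]transfer{} // the edge through which each account was first reached
	visited := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			chain := []string{} // walk back from dst to src
			for cur != src {
				t := via[cur]
				chain = append([]string{t.TxID}, chain...)
				cur = t.From
			}
			return chain
		}
		for _, t := range out[cur] {
			if !visited[t.To] {
				visited[t.To] = true
				via[t.To] = t
				queue = append(queue, t.To)
			}
		}
	}
	return nil
}

func main() {
	txs := []transfer{
		{"tx1", "alice", "bob", 500},
		{"tx2", "bob", "carol", 20},
		{"tx3", "bob", "dave", 450},
		{"tx4", "dave", "carol", 400},
	}
	fmt.Println(findChain(txs, "alice", "carol", 100)) // [tx1 tx3 tx4]
}
```

With a minimum of 100, `tx2` is excluded, so the only qualifying chain from alice to carol goes through dave.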

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.