github.com/ipld/go-ipld-prime@v0.21.0/storage/sharding/sharding.go

/*
This package contains several useful readymade sharding functions,
which should plug nicely into most storage implementations.

The API contract for a sharding function is:

	func(key string, shards *[]string)

In other words, results are returned by mutating a slice that is handed in by pointer.
This API allows the calling code to hand in a slice with existing capacity,
and thus allows sharding functions to work without allocations.

There is no named type for this contract, because we prefer that it be possible
to write packages implementing the storage APIs without
being required to import any code from the go-ipld-prime module.
However, the function type definition above can be seen in many packages.

Not all packages use this API convention.  The `fsstore` package does;
some other storage implementations don't use sharding functions because they don't need them;
and most of the adapter packages which target older code do not,
because those modules have their own sharding APIs already.
*/
package sharding
    24  
// Shard_r133 is a sharding function which will return three hunks,
// the last of which is the full original key,
// and the first two of which are three bytes long each.
// The prefix hunks are taken from the end of the original key,
// after skipping one byte.
// If the key is too short, padding of the ascii "0" character is used.
//
// (This somewhat odd-sounding procedure is a useful one in practice,
// because when applying it to a base32 string that's a CID or multihash (which is the typical usage),
// it avoids the uneven distribution of the trailing characters of a base32 string,
// and also avoids the uneven distribution of the prefixes of CIDs or multihashes.)
//
// If the shards parameter is a pointer to a slice that starts at zero length
// and has a capacity of at least 3, this function will operate with no allocations.
//
// Supposing the key is a base32 string (where each byte effectively contains 2^5 bits),
// if a sufficient range of keys is present that all shards are seen,
// each group of shards will contain (2^5)^3=32768 entries.
func Shard_r133(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 6:
		*shards = append(*shards, key[l-7:l-4], key[l-4:l-1], key)
	case l > 3:
		*shards = append(*shards, "000", key[l-4:l-1], key)
	default:
		*shards = append(*shards, "000", "000", key)
	}
}
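As a standalone sketch of the behavior described above (the function is repeated here so the example compiles on its own; the sample key is arbitrary):

```go
package main

import "fmt"

// Shard_r133 as defined in this package, repeated for a self-contained sketch.
func Shard_r133(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 6:
		*shards = append(*shards, key[l-7:l-4], key[l-4:l-1], key)
	case l > 3:
		*shards = append(*shards, "000", key[l-4:l-1], key)
	default:
		*shards = append(*shards, "000", "000", key)
	}
}

func main() {
	// Starting at zero length with capacity 3 means no allocations occur.
	shards := make([]string, 0, 3)
	Shard_r133("ctbmwjpwwd63tnh4", &shards)
	// The two prefixes come from the tail of the key, skipping the last byte:
	fmt.Println(shards) // [d63 tnh ctbmwjpwwd63tnh4]
}
```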
    54  
// Shard_r122 is a sharding function which will return three hunks.
// It is very similar to Shard_r133, but with shorter hunks.
// The last hunk is the full original key,
// and the first two hunks are two bytes long each.
// The prefix hunks are taken from the end of the original key,
// after skipping one byte.
// If the key is too short, padding of the ascii "0" character is used.
//
// If the shards parameter is a pointer to a slice that starts at zero length
// and has a capacity of at least 3, this function will operate with no allocations.
//
// Supposing the key is a base32 string (where each byte effectively contains 2^5 bits),
// if a sufficient range of keys is present that all shards are seen,
// each group of shards will contain (2^5)^2=1024 entries.
// (This is often a useful number in practice, because if one is mapping shards
// onto filesystem directories, 1024 entries is almost certainly going to fit
// efficiently within any filesystem format you're likely to encounter;
// 1024-within-1024 also means you'll see about a billion entries before
// directories on the second layer of sharding will contain more than 1024 files.
// (If we're assuming 1MB blocks of data as the actual contents, that would be quite
// a few terabytes of storage, so this is a very nice balanced trade for
// most practical systems.))
func Shard_r122(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 4:
		*shards = append(*shards, key[l-5:l-3], key[l-3:l-1], key)
	case l > 2:
		*shards = append(*shards, "00", key[l-3:l-1], key)
	default:
		*shards = append(*shards, "00", "00", key)
	}
}
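The directory-mapping use described above can be sketched as follows (a standalone example with the function repeated so it compiles on its own; the "/blocks" base path is an arbitrary assumption, not anything prescribed by this package):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Shard_r122 as defined in this package, repeated for a self-contained sketch.
func Shard_r122(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 4:
		*shards = append(*shards, key[l-5:l-3], key[l-3:l-1], key)
	case l > 2:
		*shards = append(*shards, "00", key[l-3:l-1], key)
	default:
		*shards = append(*shards, "00", "00", key)
	}
}

func main() {
	shards := make([]string, 0, 3)
	Shard_r122("ctbmwjpwwd63tnh4", &shards)
	// Joining the hunks under a base directory yields the two-level
	// layout discussed above, e.g. "/blocks/3t/nh/ctbmwjpwwd63tnh4"
	// (with the platform's path separator).
	fmt.Println(filepath.Join(append([]string{"/blocks"}, shards...)...))
}
```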
    88  
// Shard_r12 is a sharding function which will return two hunks.
// The last hunk is the full original key,
// and the first hunk is two bytes long.
// The prefix is taken from the end of the original key,
// after skipping one byte.
// If the key is too short, the first hunk is just the ascii characters "00" instead.
//
// If the shards parameter is a pointer to a slice that starts at zero length
// and has a capacity of at least 2, this function will operate with no allocations.
//
// Shard_r12 is functionally equivalent to "flatfs/shard/v1/next-to-last/2",
// as it's known in some other code -- it may be familiar as the default
// for block storage in go-ipfs.
func Shard_r12(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 2:
		*shards = append(*shards, key[l-3:l-1], key)
	default:
		*shards = append(*shards, "00", key)
	}
}
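To make the "next-to-last" behavior concrete, here is a standalone sketch (the function is repeated so the example compiles on its own):

```go
package main

import "fmt"

// Shard_r12 as defined in this package, repeated for a self-contained sketch.
func Shard_r12(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 2:
		*shards = append(*shards, key[l-3:l-1], key)
	default:
		*shards = append(*shards, "00", key)
	}
}

func main() {
	// Capacity 2 is enough for this function to run without allocating.
	shards := make([]string, 0, 2)
	Shard_r12("ctbmwjpwwd63tnh4", &shards)
	// The prefix is the next-to-last pair of characters of the key:
	fmt.Println(shards) // [nh ctbmwjpwwd63tnh4]
}
```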