// Source: github.com/ipld/go-ipld-prime@v0.21.0/storage/sharding/sharding.go

/*
This package contains several useful readymade sharding functions,
which should plug nicely into most storage implementations.

The API contract for a sharding function is:

	func(key string, shards *[]string)

In other words, the result is returned through a pointer to a slice, which the function appends to.
This API allows the calling code to hand in a slice with existing capacity,
and thus allows sharding functions to work without allocations.

There is no named type for this contract, because we prefer that packages
implementing the storage APIs can be written without
being required to import any code from the go-ipld-prime module.
However, the function type definition above can be seen in many packages.

Not all packages use this API convention. The `fsstore` package does;
some other storage implementations don't use sharding functions because they don't need them;
most of the adapter packages which target older code do not,
because those modules have their own sharding APIs already.
*/
package sharding

// Shard_r133 is a sharding function which will return three hunks,
// the last of which is the full original key,
// and the first two of which are three bytes long.
// The prefix hunks are taken from the end of the original key,
// after skipping one byte.
// If the key is too short, padding of the ascii "0" character is used.
//
// (This somewhat odd-sounding procedure is a useful one in practice,
// because if applying it on a base32 string that's a CID or multihash (which is the typical usage),
// it avoids the uneven distribution of the trailing characters of a base32 string,
// and also avoids the uneven distribution of the prefixes of CIDs or multihashes.)
//
// If the shards parameter is a pointer to a slice that starts at zero length
// and a capacity of at least 3, this function will operate with no allocations.
//
// Supposing the key is a base32 string (where each byte effectively contains 2^5 bits),
// if a sufficient range of keys is present that all shards are seen,
// each group of shards will contain (2^5)^3=32768 entries.
func Shard_r133(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 6: // long enough for both prefix hunks
		*shards = append(*shards, key[l-7:l-4], key[l-4:l-1], key)
	case l > 3: // only the second prefix hunk fits; pad the first
		*shards = append(*shards, "000", key[l-4:l-1], key)
	default: // too short for any prefix hunk; pad both
		*shards = append(*shards, "000", "000", key)
	}
}

// Shard_r122 is a sharding function which will return three hunks.
// It is very similar to Shard_r133, but with shorter hunks.
// The last hunk is the full original key,
// and the first two hunks are two bytes long each.
// The prefix hunks are taken from the end of the original key,
// after skipping one byte.
// If the key is too short, padding of the ascii "0" character is used.
//
// If the shards parameter is a pointer to a slice that starts at zero length
// and a capacity of at least 3, this function will operate with no allocations.
//
// Supposing the key is a base32 string (where each byte effectively contains 2^5 bits),
// if a sufficient range of keys is present that all shards are seen,
// each group of shards will contain (2^5)^2=1024 entries.
// (This is often a useful number in practice, because if one is mapping shards
// onto filesystem directories, 1024 entries is almost certainly going to fit
// efficiently within any filesystem format you're likely to encounter;
// 1024-within-1024 also means you'll see about a billion entries before
// directories on the second layer of sharding will contain more than 1024 files.
// (If we're assuming 1MB blocks of data as the actual contents, that would be on the
// order of a petabyte of storage, so this is a very nice balanced trade for
// most practical systems.))
func Shard_r122(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 4: // long enough for both prefix hunks
		*shards = append(*shards, key[l-5:l-3], key[l-3:l-1], key)
	case l > 2: // only the second prefix hunk fits; pad the first
		*shards = append(*shards, "00", key[l-3:l-1], key)
	default: // too short for any prefix hunk; pad both
		*shards = append(*shards, "00", "00", key)
	}
}

// Shard_r12 is a sharding function which will return two hunks.
// The last hunk is the full original key,
// and the first hunk is two bytes long.
// The prefix is taken from the end of the original key,
// after skipping one byte.
// If the key is too short, the first hunk is just the ascii characters "00" instead.
//
// If the shards parameter is a pointer to a slice that starts at zero length
// and a capacity of at least 2, this function will operate with no allocations.
//
// Shard_r12 is functionally equivalent to "flatfs/shard/v1/next-to-last/2",
// as it's known in some other code -- it may be familiar as the default
// for block storage in go-ipfs.
func Shard_r12(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 2: // long enough for the prefix hunk
		*shards = append(*shards, key[l-3:l-1], key)
	default: // too short; pad the prefix
		*shards = append(*shards, "00", key)
	}
}
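
The calling convention described in the package comment can be illustrated with a small sketch. It is not part of go-ipld-prime: `shardPath`, the `blocks` root, and the key `abcdefgh` are hypothetical names chosen for illustration, and `Shard_r122` is copied from the listing so that the example compiles on its own. The caller hands in a zero-length slice with capacity 3, so the sharding function can append its hunks without growing the slice.

```go
package main

import (
	"fmt"
	"path"
)

// Shard_r122 is copied from the listing so this sketch is self-contained.
func Shard_r122(key string, shards *[]string) {
	l := len(key)
	switch {
	case l > 4:
		*shards = append(*shards, key[l-5:l-3], key[l-3:l-1], key)
	case l > 2:
		*shards = append(*shards, "00", key[l-3:l-1], key)
	default:
		*shards = append(*shards, "00", "00", key)
	}
}

// shardPath is a hypothetical helper (an assumption for illustration, not a
// go-ipld-prime API). It maps a key to a slash-separated path under root,
// using any function that follows the sharding contract.
func shardPath(root, key string, shardFn func(key string, shards *[]string)) string {
	// Zero length, capacity 3: enough room for the three hunks Shard_r122 appends.
	hunks := make([]string, 0, 3)
	shardFn(key, &hunks)
	return path.Join(root, path.Join(hunks...))
}

func main() {
	fmt.Println(shardPath("blocks", "abcdefgh", Shard_r122)) // blocks/de/fg/abcdefgh
	fmt.Println(shardPath("blocks", "abc", Shard_r122))      // blocks/00/ab/abc
}
```

Because `shardPath` takes the sharding function as a plain `func(string, *[]string)` value, it can be swapped for `Shard_r133` or `Shard_r12` without importing a named type, which is the design choice the package comment describes.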