Key-Based Sharding
To understand key-based sharding, we first need to understand, or revisit, the concept of a hash function. For this lesson, we can treat the hash function as a black box: it takes a piece of data as its input and outputs a discrete value corresponding to that input. This output is known as the hash value.
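As a rough illustration of the black-box view, the sketch below wraps a standard hash (SHA-256 from Python's hashlib) in a hypothetical helper named hash_value. The helper name and the choice to keep only the first 8 bytes are assumptions made for this example, not something the lesson prescribes.

```python
import hashlib


def hash_value(data: str) -> int:
    """Map an input string to a discrete integer (the hash value).

    Hypothetical helper for illustration; any deterministic hash would do.
    """
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


# The same input always produces the same hash value.
first = hash_value("alice")
second = hash_value("alice")
```

Python's built-in hash() is deliberately avoided here because it is randomized per process for strings, so it would not give stable values across runs.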
In key-based sharding, values from one column of a database table are plugged into the hash function. The output hash value is used as the shard ID, which determines which shard the data will be stored on.
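The routing step can be sketched as follows. This is a minimal example under stated assumptions: the shard count (NUM_SHARDS), the column name customer_id, and the helpers shard_id and route_rows are all hypothetical, and reducing the hash modulo the number of shards is one common way to turn a hash value into a shard ID, not necessarily the one a given database uses.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for this example


def shard_id(key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash a column value and reduce it to a shard ID in [0, num_shards)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_rows(rows, key_column, num_shards=NUM_SHARDS):
    """Group rows by the shard ID computed from one column's value."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_id(str(row[key_column]), num_shards)].append(row)
    return shards


# Hypothetical table rows; customer_id is the column fed to the hash function.
rows = [
    {"customer_id": "c-1001", "name": "Ada"},
    {"customer_id": "c-1002", "name": "Grace"},
    {"customer_id": "c-1003", "name": "Edsger"},
]
shards = route_rows(rows, "customer_id")
```

Because the hash is deterministic, a later lookup for the same customer_id recomputes the same shard ID and goes straight to the shard that holds the row.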

The values passed to the hash function all come from the same column. They can be thought of as primary keys, each establishing a unique identifier for a row in the table. The values in this selected column are known as shard keys. Note that a shard key should be a value that does not change over time. Otherwise, an update to the key can change its hash value, so the row may have to be moved to a different shard, which adds work and can cause errors.
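To see why a changing shard key is costly, consider the sketch below. It assumes a hypothetical setup in which each shard is an in-memory dictionary keyed by an email column, and the helper names shard_id and update_shard_key are invented for the example. The point is only that an update to the key cannot be done in place: the shard ID has to be recomputed and the row may have to be relocated.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for this example


def shard_id(key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash a shard key and reduce it to a shard ID in [0, num_shards)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def update_shard_key(shards, old_key, new_key):
    """Change a row's shard key, relocating the row to the shard for new_key."""
    old_shard = shard_id(old_key)
    new_shard = shard_id(new_key)
    # Remove the row from the shard it was originally hashed to.
    row = shards[old_shard].pop(old_key)
    row["email"] = new_key
    # Insert it on the shard the new key hashes to (may equal old_shard).
    shards[new_shard][new_key] = row
    return old_shard, new_shard


# Each shard is modeled as a dict from shard key to row (an assumption).
shards = {i: {} for i in range(NUM_SHARDS)}
old_email = "ada@example.com"
shards[shard_id(old_email)][old_email] = {"email": old_email, "name": "Ada"}

new_email = "ada@new.example"
old_shard, new_shard = update_shard_key(shards, old_email, new_email)
```

In a real system the delete and the insert may touch two different database servers, so the update is no longer a single local write. That is the extra work and potential for errors that a stable shard key avoids.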