
Key Based Sharding

To understand key-based sharding, we first need to understand, or revisit, the concept of a hash function. For this lesson, we can treat the hash function as a black box: it takes a piece of data as its input and outputs a discrete value determined by that input. The same input always produces the same output, and this output is known as the hash value.

In key-based sharding, the values from one column of a database table are fed into the hash function. The resulting hash value is the shard ID, which determines which shard the row is stored on.
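As a rough illustration of this routing step, here is a minimal Python sketch. It is not taken from any particular database: the shard count, the `customer_id` column, and the `shard_id` helper are all hypothetical, and reducing the hash modulo the number of shards is just one simple way to turn a hash value into a shard ID.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_id(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard ID in the range [0, num_shards)."""
    # SHA-256 stands in for the "black box" hash function. It is used here
    # because it gives the same result on every run, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Assumption: take the first 8 bytes as an integer, then reduce it
    # modulo the shard count to get a shard ID.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical rows; the customer_id column supplies the shard key.
rows = [
    {"customer_id": "C1001", "name": "Ada"},
    {"customer_id": "C1002", "name": "Grace"},
    {"customer_id": "C1003", "name": "Linus"},
]

# Each shard is modeled as a plain list of rows.
shards = {i: [] for i in range(NUM_SHARDS)}
for row in rows:
    shards[shard_id(row["customer_id"])].append(row)
```

Because the hash is deterministic, a later lookup for `"C1002"` can compute `shard_id("C1002")` again and query only that shard.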


The values fed into the hash function all come from the same column. They can be thought of as primary keys, establishing a unique identifier for each row in the table. The values in this selected column are known as shard keys. The shard key should be a value that does not change over time. Otherwise, an update that changes the key can also change its hash value, so the row may have to be moved to a different shard, which adds work and creates room for errors.
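To see why a changing shard key is costly, consider the following sketch. It is a toy model under stated assumptions: each shard is an in-memory dictionary, and `shard_id` and `change_shard_key` are hypothetical helpers, not part of any real database API. The point is only that changing a key becomes a delete on one shard plus an insert on another, rather than a simple in-place update.

```python
import hashlib


def shard_id(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard ID (assumed scheme: SHA-256 modulo shard count)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def change_shard_key(shards: list, old_key: str, new_key: str) -> tuple:
    """Re-key a row and return (source_shard, destination_shard)."""
    num_shards = len(shards)
    src = shard_id(old_key, num_shards)
    dst = shard_id(new_key, num_shards)
    # The row must be removed from the shard the old key hashed to...
    row = shards[src].pop(old_key)
    # ...and written to the shard the new key hashes to. In a real system
    # these are two operations, possibly on two different machines.
    shards[dst][new_key] = row
    return src, dst


# Four toy shards, each a dict from shard key to row.
shards = [dict() for _ in range(4)]
shards[shard_id("alice@example.com", 4)]["alice@example.com"] = {"name": "Alice"}

src, dst = change_shard_key(shards, "alice@example.com", "alice@new.example.com")
```

If `src` and `dst` differ, the row has physically moved between shards. Choosing a key that never changes, such as a generated ID rather than an email address, avoids this extra step.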