One Pager Cheat Sheet
- We will learn about database sharding and its scalability benefits through a focus on sharding strategies, their merits, and their downfalls.
- By scaling or sharding your database, you can ensure optimal performance and customer satisfaction for your application when there is a large number of users.
- Sharding, or horizontal partitioning, is the process of breaking up large tables into multiple shards on different servers, with a sharding key specifying which data belongs where, providing improved data management and more efficient query handling.
- Databases should generally only be sharded if all other methods of optimization, such as caching and upgrading servers, have been exhausted, because sharding makes operations on the database significantly more complex.
- The four CRUD operations - Create, Read, Update and Delete - enable users to interact with and modify data stored in a database.
- Sharding can be employed using various strategies, depending on the type of data and type of application.
- Key-based sharding uses values from a database table as shard keys to plug into a hash function, which outputs a discrete value as the shard ID to determine which shard the data should be stored on.
- Range-based sharding divides data into shards based on specified ranges of values for an attribute, providing a simple and effective strategy for data distribution.
- Directory-based sharding uses a lookup table that maps each shard key, selected from the original table, to the shard ID of the shard that holds the corresponding data.
- In key-based sharding, a hash function converts a shard key into a numerical value that is used as the shard ID, so the shard can be located quickly without a lookup table.
- The hash function and the shard key together determine which shard stores a given row of the database table.
- The sharding strategy that is most suitable depends on factors such as application requirements and performance needs, so the choice of strategy deserves careful consideration.
- Range-based sharding is the simplest approach to implement compared to the key-based and directory-based alternatives.
- Directory-based sharding provides the most flexibility for dynamically adding or removing servers, because only the lookup table changes and existing keys do not need rehashing, which helps avoid server downtime.
- Key-based sharding is preferred for distributed data storage, as it does not rely heavily on lookup tables and allows for algorithmic distribution of data.
- Key-based and directory-based sharding tend to distribute data more evenly than range-based sharding, which may lead to disproportionate reads on some shards.
- Choosing the right sharding strategy for the situation, and carefully assessing whether the database needs to be sharded at all, is important for successful optimization.
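
The three strategies summarized above can be compared side by side in a minimal Python sketch. Everything here is an assumption made for illustration: the shard count, the range boundaries, the region names, and the function names are hypothetical, and a real system would route queries to actual database servers rather than return an integer.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical number of shards


def key_based_shard(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Key-based: hash the shard key and reduce it to a shard ID."""
    # hashlib gives a hash that is stable across processes, unlike
    # Python's built-in hash() for strings.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical boundaries: shard 0 holds values below 100, shard 1 holds
# 100-199, shard 2 holds 200-299, and shard 3 holds 300 and above.
RANGE_BOUNDARIES = [100, 200, 300]


def range_based_shard(value: int) -> int:
    """Range-based: pick the shard whose value range contains the attribute."""
    return bisect.bisect_right(RANGE_BOUNDARIES, value)


# Hypothetical lookup table mapping a shard key (a region) to a shard ID.
DIRECTORY = {"EU": 0, "US": 1, "APAC": 2}


def directory_based_shard(region: str) -> int:
    """Directory-based: look the shard key up in a table."""
    return DIRECTORY[region]


if __name__ == "__main__":
    print(key_based_shard("customer-42"))
    print(range_based_shard(150))
    print(directory_based_shard("US"))
```

In this sketch, adding a shard under the directory-based approach only requires a new entry such as `DIRECTORY["LATAM"] = 3`, and existing entries keep their shard. Changing `NUM_SHARDS` in the key-based version changes the result of the modulo for most keys, which is the rehashing cost mentioned above.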