One Pager Cheat Sheet
- We will learn about database sharding and its scalability benefits, with a focus on sharding strategies, their merits, and their drawbacks.
- By scaling or sharding your database, you can maintain good performance and customer satisfaction for your application when there is a large number of users.
- Sharding, or horizontal partitioning, is the process of breaking up large tables into multiple shards on different servers, with a sharding key specifying which data belongs where, providing improved data management and more efficient query handling.
- Databases should generally only be sharded if all other methods of optimization, such as caching and upgrading servers, have been exhausted, as sharding adds significant complexity to operations on the database.
- The four CRUD operations - Create, Read, Update and Delete - enable users to interact with and modify data stored in a database.
- Sharding can be employed using various strategies, depending on the type of data and type of application.
- Key-based sharding uses values from a database table as shard keys to plug into a hash function, which outputs a discrete value as the shard ID to determine which shard the data should be stored on.
- Range-based sharding divides data into shards based on specified ranges of values for an attribute, providing a simple and effective strategy for data distribution.
- Directory-based sharding uses a lookup table that maps each shard key, selected from the original table, to the shard ID of the shard that holds the corresponding data.
- In key-based sharding, a hash function converts a shard key into a numerical value that serves as the shard ID, so the corresponding shard can be located quickly.
- The hash function and the shard key determine in which shard a given row from the database table will be stored.
- The sharding strategy that is most suitable for your specific needs depends on factors such as application requirements and performance needs, so the choice of strategy deserves careful consideration.
- Range-based sharding is the simplest approach to implement, compared with the key-based and directory-based alternatives.
- Directory-based sharding provides the most flexibility for dynamic addition or removal of servers, without the need for rehashing and without server downtime.
- Key-based sharding is preferred for distributed data storage, as it does not rely heavily on lookup tables and allows for algorithmic distribution of data.
- Sharding with key-based and directory-based techniques ensures an even distribution of data, making it a better option than range-based sharding, which may lead to some shards receiving a disproportionate share of reads.
- Choosing the right sharding strategy for the situation, and carefully assessing whether the database needs to be sharded at all, is important for successful optimization.
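
The three strategies summarized above can be compared side by side in a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from a particular database: the shard count `NUM_SHARDS`, the range boundaries `RANGE_UPPER_BOUNDS`, the lookup table `DIRECTORY`, and the choice of SHA-256 with a modulo as the hash function. Each function takes a shard key and returns a shard ID.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical number of shards


def key_based_shard(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Key-based: hash the shard key and reduce it to a shard ID."""
    # A stable hash (not Python's per-process randomized hash()) so the
    # same key always maps to the same shard across runs.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical exclusive upper bounds for each range:
# shard 0: value < 100, shard 1: 100-499, shard 2: 500-999, shard 3: >= 1000
RANGE_UPPER_BOUNDS = [100, 500, 1000]


def range_based_shard(value: int, upper_bounds=RANGE_UPPER_BOUNDS) -> int:
    """Range-based: pick the shard whose value range contains `value`."""
    return bisect.bisect_right(upper_bounds, value)


# Hypothetical lookup table mapping a shard key to a shard ID.
DIRECTORY = {"EU": 0, "US": 1, "APAC": 2}


def directory_based_shard(shard_key: str, directory=DIRECTORY) -> int:
    """Directory-based: look the shard key up in a table."""
    if shard_key not in directory:
        raise KeyError(f"no shard registered for key {shard_key!r}")
    return directory[shard_key]


if __name__ == "__main__":
    print(key_based_shard("user-42"))
    print(range_based_shard(250))
    print(directory_based_shard("US"))
```

In this sketch, moving a key to another shard under the directory-based scheme only requires updating one entry in `DIRECTORY`, whereas changing `NUM_SHARDS` in the key-based scheme changes the shard ID of most keys, which is the rehashing cost mentioned above.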

