One Pager Cheat Sheet

  • We will learn about database sharding and its scalability benefits by focusing on sharding strategies, their merits, and their drawbacks.
  • Scaling or sharding your database helps maintain performance, and therefore customer satisfaction, when your application serves a large number of users.
  • Sharding, or horizontal partitioning, is the process of splitting the rows of large tables into multiple shards on different servers, with a sharding key specifying which data belongs where; this makes the data easier to manage and queries more efficient to handle.
  • Databases should generally be sharded only after all other optimization methods, such as caching and upgrading servers, have been exhausted, because sharding adds significant complexity to database operations.
  • The four CRUD operations - Create, Read, Update and Delete - enable users to interact with and modify data stored in a database.
  • Sharding can be implemented with various strategies, depending on the type of data and the type of application.
  • Key-based sharding uses values from a database table as shard keys to plug into a hash function, which outputs a discrete value as the shard ID to determine which shard the data should be stored on.
  • Range-based sharding divides data into shards based on specified ranges of values for an attribute, providing a simple and effective strategy for data distribution.
  • Directory-based sharding uses a lookup table that maps each shard key value, taken from the original table, to the ID of the shard that holds the corresponding data.
  • In key-based sharding, a hash function converts a shard key into a numeric value that is mapped directly to a shard ID, so the shard can be located quickly without consulting a lookup table.
  • In key-based sharding, the hash function and the shard key together determine which shard stores a given row of the database table.
  • The most suitable sharding strategy depends on factors such as application requirements and performance needs, so weigh these carefully before deciding which strategy to use.
  • Range-based sharding is the simplest approach to implement compared with the key-based and directory-based alternatives.
  • Directory-based sharding provides the most flexibility: servers can be added or removed dynamically by updating the lookup table, without rehashing and without server downtime.
  • Key-based sharding is preferred for distributed data storage, as it does not depend on a lookup table and distributes data algorithmically.
  • Key-based and directory-based sharding tend to distribute data more evenly, which makes them a better option than range-based sharding, where some ranges may receive a disproportionate share of reads.
  • Choosing the right sharding strategy according to the situation and carefully assessing the need to shard the database is important for successful optimization.
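
The three strategies summarized above can be sketched as small routing functions. This is a minimal illustration, not a production design: the shard count, range boundaries, directory contents, and all function names (`key_based_shard`, `range_based_shard`, `directory_based_shard`, `fraction_moved`) are hypothetical and chosen only for the example. SHA-256 with a modulo is one common way to turn a key into a shard ID; real systems may use other hash functions or consistent hashing.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def key_based_shard(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Key-based: hash the shard key, then reduce it modulo the shard count."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: hypothetical boundaries. Shard 0 holds values below 100,
# shard 1 holds 100-199, shard 2 holds 200-299, shard 3 holds 300 and above.
RANGE_BOUNDS = [100, 200, 300]


def range_based_shard(value: int, bounds=RANGE_BOUNDS) -> int:
    """Range-based: find the range that contains the attribute value."""
    return bisect.bisect_right(bounds, value)


# Directory-based: a hypothetical lookup table from shard key to shard ID.
DIRECTORY = {"EU": 0, "US": 1, "APAC": 2}


def directory_based_shard(shard_key: str, directory=DIRECTORY) -> int:
    """Directory-based: read the shard ID from the lookup table."""
    if shard_key not in directory:
        raise LookupError(f"no shard registered for key {shard_key!r}")
    return directory[shard_key]


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose key-based shard changes when the shard count changes."""
    keys = list(keys)
    if not keys:
        return 0.0
    moved = sum(
        key_based_shard(k, old_shards) != key_based_shard(k, new_shards)
        for k in keys
    )
    return moved / len(keys)


if __name__ == "__main__":
    print("key-based:", key_based_shard("user-42"))
    print("range-based:", range_based_shard(250))
    print("directory-based:", directory_based_shard("US"))
    sample_keys = [f"user-{i}" for i in range(1000)]
    print("moved when going from 4 to 5 shards:", fraction_moved(sample_keys, 4, 5))
```

The `fraction_moved` helper shows why adding a server to a modulo-based key scheme forces rehashing: changing the shard count changes the computed shard for many keys. With the directory approach, adding a shard only requires new entries in the lookup table, at the cost of maintaining that table.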