One Pager Cheat Sheet
- By understanding the different types of storage layers (Ephemeral & Persistent) and their respective use cases, this tutorial will help you address the challenges of storage layer design and choose the best solutions for various workloads.
- Ensuring business continuity and performance, accessing large data volumes, and ensuring correct operation are all concerns when designing a storage layer.
- The four types of data stores, In Memory Key Value, Relational, Large Scale (NoSQL) Key Value, and Distributed File System, provide different performance characteristics and scale to different capacities, so each fits different data management problems.
- In Memory Key Value stores provide the fastest data lookups due to their in-memory architecture, latency advantages, and ability to leverage parallelism.
- Transactions are atomic, consistent, isolated, and durable (ACID) operations, which ensure that data is always kept in a consistent state across restarts.
- The performance of a data store is usually measured by benchmarking its latency (how fast it can process a request) and throughput (how many requests can be processed per unit of time), though it is difficult to have both low latency and high throughput at the same time.
- The most common choice is a relational data store, whose strong ACID guarantees, SQL support, and performance address the correctness, performance, and data access challenges.
- Latency and throughput are not necessarily inversely proportional: throughput can still be high even with high latency if the number of operations in progress at once is high enough.
- Optimize system performance for a particular workload by running benchmarks and adjusting settings according to the product's documentation.
- A well-implemented Ephemeral Storage Layer can dramatically improve performance by serving responses from a cache server, which reduces latency and increases throughput.
- Designing a persistent storage layer requires careful consideration of the different solutions available to serve large volumes of data, such as sharding, distributed file systems, and large scale NoSQL data stores, while meeting the requirements of ACID, scalability, and low latency.
- The sharding method applied to key value stores is horizontal, allowing different subsets of the data store to be retrieved and managed separately, which makes data retrieval more efficient.
- Ensuring business continuity through regular backups or replication across multiple data centers can minimize downtime in the event of disasters.
- Geographical distribution is used to solve latency issues, as well as to form a business continuity strategy using disjoint data sets.
- Data replication provides a way to restore data quickly in case of system or hardware failures, reducing the risk of data loss by creating multiple copies of the data stored in different locations.
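
To make the transactions item above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the `CHECK` constraint are illustrative assumptions, not part of the tutorial. It shows atomicity and consistency only: an in-memory database is used for brevity, so nothing here survives a restart.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a later failure has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # The debit violates the CHECK constraint if funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # credit is rolled back with the debit
```

After the failed transfer, the credit to `bob` is undone together with the rejected debit, so the data never shows a half-applied state.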
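
The point above about latency and throughput can be illustrated with Little's law, which relates the number of requests in flight to throughput and latency in a steady state. This is a back-of-the-envelope sketch: the `throughput` function and the numbers below are made up for illustration and are not taken from the tutorial or from any benchmark.

```python
def throughput(concurrency: int, latency_s: float) -> float:
    """Steady-state requests per second by Little's law (L = lambda * W).

    concurrency: average number of requests in flight (L)
    latency_s:   average time to serve one request, in seconds (W)
    """
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s

# Low latency, but only one request at a time.
low_latency_serial = throughput(concurrency=1, latency_s=0.001)

# 100x higher latency, but 500 requests in flight at once.
high_latency_parallel = throughput(concurrency=500, latency_s=0.100)
```

Under these assumed numbers, the high-latency system still sustains more requests per second, because far more operations are in progress at once.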
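
For the Ephemeral Storage Layer item above, one common way to put a cache in front of a slower store is the cache-aside pattern: check the cache first and fall back to the backing store only on a miss. The sketch below is an assumption about how such a layer might look, not the tutorial's implementation. `CacheAside` and `slow_lookup` are hypothetical names, and a plain dictionary stands in for a real cache server.

```python
import time

class CacheAside:
    """Tiny cache-aside wrapper with a per-entry time-to-live."""

    def __init__(self, load, ttl_s=60.0, clock=time.monotonic):
        self._load = load      # function that reads from the slow backing store
        self._ttl_s = ttl_s
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1     # served from memory, no backing-store call
            return entry[0]
        self.misses += 1       # absent or expired: reload and re-cache
        value = self._load(key)
        self._entries[key] = (value, self._clock() + self._ttl_s)
        return value

def slow_lookup(key):
    """Stand-in for a query against the persistent layer."""
    return key.upper()

cache = CacheAside(slow_lookup, ttl_s=30.0)
first = cache.get("user:1")    # miss: goes to slow_lookup
second = cache.get("user:1")   # hit: answered from the cache
```

Because the cached data is ephemeral, losing it costs only a reload from the persistent layer. The time-to-live bounds how stale a cached value can get.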
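
The horizontal sharding item above can be sketched as a key-value store that routes each key to one of several independent subsets. This is a minimal illustration assuming hash-based routing. The `ShardedKV` class is hypothetical, and in-process dictionaries stand in for separate servers.

```python
import hashlib

class ShardedKV:
    """Key-value store split horizontally across a fixed number of shards."""

    def __init__(self, num_shards: int):
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self._shards = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash; Python's built-in hash() for str varies per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key: str, value) -> None:
        self._shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        # Only the one shard that owns the key is consulted.
        return self._shards[self.shard_for(key)].get(key, default)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]

store = ShardedKV(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", i)
```

Each lookup touches a single shard, which is what lets the subsets be stored and managed separately. One limitation of this simple modulo scheme is that changing the number of shards remaps most keys. Consistent hashing is a common alternative when shards are added or removed often.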