Overview of Riak KV Architecture
In this article, we will explore the core components and architecture of Riak KV, a highly available NoSQL database used by Uber for low-latency, high-scale data storage.
Background
Uber uses several data storage technologies:
- Schemaless - Custom MySQL-backed storage for long-term data
- Riak KV - NoSQL key-value store optimized for availability and latency
- Cassandra - Wide column store for very large datasets
Riak KV meets Uber's needs for always-on low-latency access to critical datasets.
Key Capabilities
Riak KV is a highly resilient NoSQL database optimized for:
- High availability - Designed for fault tolerance with automatic request forwarding
- Low latency - Optimized for fast reads and writes of key-value data
- Operational simplicity - Auto-repair, auto-scaling, and straightforward administration
- Flexible scalability - Linear scaling with no downtime or data migration

These capabilities make Riak KV well-suited for mission-critical applications needing high throughput and low latency access to key-value data.
Architecture Overview
At a high level, Riak KV architecture consists of:
- Node clusters - Scale out capacity by adding nodes
- Virtual nodes - Evenly distribute partitions across physical nodes
- Replication - Maintain multiple copies of each partition
- Hinted handoff - Forward requests if owners are unavailable
- Read/write optimization - Tunable for different application needs
We'll dive into each area in more detail below.
Node Clusters
The basic building block of Riak KV is a cluster of nodes. Each node runs the Riak software and holds a subset of the overall data partitions.
Scaling out capacity is as simple as adding more nodes to the cluster. Riak will automatically rebalance partitions across the new nodes.
Node failures are handled gracefully via request forwarding and hinted handoffs. This provides high availability without complex failover logic.
Virtual Nodes
Each physical node runs multiple virtual nodes (vnodes). This spreads partitions evenly across physical nodes without manual configuration.
For example, with 3 nodes and 10 vnodes per node, you get 30 vnodes total that cleanly divide up the key space.
Adding more physical nodes automatically allocates more vnodes and rebalances partitions.
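The vnode scheme above can be sketched in a few lines of Python. This is a simplified, hypothetical model: it assigns vnodes to physical nodes round-robin and maps keys onto the ring with SHA-1 modulo the vnode count, whereas Riak's actual ring divides a fixed 160-bit hash space into equal ranges. The function names and the node labels are illustrative, not Riak's API.

```python
import hashlib

def build_ring(num_vnodes, nodes):
    # Assign each vnode (partition) to a physical node round-robin,
    # so partitions spread evenly without manual placement.
    return {v: nodes[v % len(nodes)] for v in range(num_vnodes)}

def partition_for_key(key, num_vnodes):
    # Hash the key onto the ring. Simplification: modulo over the
    # vnode count instead of Riak's equal-range 160-bit hash space.
    digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return digest % num_vnodes

# 3 physical nodes x 10 vnodes each = 30 vnodes covering the key space
owners = build_ring(30, ["node1", "node2", "node3"])
partition = partition_for_key("user:42", 30)
owner_node = owners[partition]
```

With this layout, adding a fourth physical node just means rebuilding the ring with the same 30 vnodes spread across 4 owners; only the partitions that change owner need to move.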
Replication
Riak replicates each partition to multiple nodes for fault tolerance. Writes are committed to multiple vnodes before being acknowledged.
The replication factor can be tuned based on availability needs. Higher replication provides more redundancy at the cost of increased storage and write latency.
Hinted Handoff
If a write comes in for a partition whose owners are unreachable, Riak will temporarily buffer the write on another node as a "hint". Once the destination node is back online, the write is forwarded and completed.
This mechanism keeps the cluster available during network splits or node failures. Writes are not lost and resiliency is maintained.
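The hinted-handoff flow can be modeled in miniature. This is a hypothetical sketch under simplifying assumptions: `Node`, `write`, and `handoff` are illustrative names, and a real cluster would detect recovery and trigger handoff automatically rather than by an explicit call.

```python
class Node:
    """Minimal stand-in for a Riak node; not the real implementation."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key/value data this node holds
        self.hints = []   # writes parked here on behalf of downed owners

def write(key, value, primary, fallback):
    # Write to the partition's owner; if it is unreachable, buffer the
    # write on a fallback node together with a hint naming the owner.
    if primary.up:
        primary.data[key] = value
    else:
        fallback.data[key] = value
        fallback.hints.append((primary, key, value))

def handoff(fallback):
    # Once a downed owner is back online, forward its hinted writes.
    still_pending = []
    for owner, key, value in fallback.hints:
        if owner.up:
            owner.data[key] = value
        else:
            still_pending.append((owner, key, value))
    fallback.hints = still_pending

a, b = Node("a"), Node("b")
a.up = False                                  # partition owner goes down
write("trip:9", "started", primary=a, fallback=b)  # write lands on b with a hint
a.up = True                                   # owner recovers
handoff(b)                                    # hinted write forwarded to a
```

After `handoff`, node `a` holds the write and node `b`'s hint queue is empty: the write survived the outage without blocking the client.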
Tunable Optimizations
Riak provides many knobs for tuning read/write performance and replication:
- Number of replicas
- Read/write quorums
- Background vs synchronous replication
- Vector clocks for versioning
- Tombstones and active anti-entropy for repair
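The quorum knobs interact in a simple way: with N replicas, a read quorum R and write quorum W satisfying R + W > N must overlap on at least one replica, so reads see the latest acknowledged write. The helper below is an illustrative sketch of that arithmetic, not a Riak API.

```python
def quorums_overlap(n, r, w):
    # R + W > N guarantees the read and write quorums share at least
    # one replica, so a read always touches an up-to-date copy.
    return r + w > n

# Common Riak-style defaults: N=3, R=2, W=2 -> quorums overlap
assert quorums_overlap(3, 2, 2)

# Latency-first tuning: N=3, R=1, W=1 -> reads may return stale values
assert not quorums_overlap(3, 1, 1)
```

Lowering R and W trades consistency for latency; raising them does the reverse, which is how the same cluster can serve both transient and durable workloads.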
This allows tuning Riak KV for different application profiles, from low-latency transient data to high-durability archival data.

Riak KV's architecture provides a resilient foundation for mission-critical deployments like Uber's. Its automatic scaling, fault tolerance, and tunable performance make it well-suited for high-volume read/write workloads that need low latency and high availability.