One Pager Cheat Sheet
- This article is a tutorial on the system architecture behind a digital service similar to Twitter, which has nearly 200 million users worldwide.
- Twitter's core features are the ability to `tweet`, `follow people`, view your `own timeline`, see a `home timeline` (tweets shared by the people you follow), and search with an `internal search engine` using hashtags and keywords; understanding these features is essential to discussing the platform's system design.
- Twitter's high-level architecture relies on MySQL databases to hold its data: each user gets a new row in the `Users` table, their tweets are stored in the `Tweets` table, and a `feed` connects and displays the tweets of the users that a particular user follows.
- Twitter's database architecture follows the relational model: `user` information and `tweets` are stored in `separate tables`, namely `Users` and `Tweets`, which are linked through a `primary key-foreign key` relationship to maintain `data integrity` and prevent `data duplication`.
- The bottleneck in fetching information from the Tweets table is addressed by adding a Followers table to the architecture and using a Redis Cluster to handle the high volume of queries and maintain `eventual consistency`, while tweets and user info are also stored in a separate database.
- Twitter's architecture has two major timelines. The User Timeline shows a user's own tweets and retweets in chronological order, fetched from the database and optimized with a caching layer. The Home Timeline shows content from the accounts the user follows and is built with a fanout caching approach for efficiency. For users with a large following, the architecture combines the home timeline approach with synchronous calls to optimize tweet loading, while inactive users' timelines are not precalculated or stored in `the cache`.
- Twitter uses the `fanout` approach, which depends on the `cache` rather than the `database`, to push each new tweet immediately into its followers' in-memory timelines, so requests are handled faster and more efficiently and the service feels more responsive.
- Twitter uses Earlybird, a Lucene-based search engine built on an inverted (reverse) index, to break down and tag every tweet for searching; a scatter-gather step divides each query across the index and collects the partial results so that global search stays fast, and results are ranked by tweet popularity.
- The relationship between the user table and the tweet table on platforms like Twitter is one-to-many in `database terminology`, because a single user can generate multiple tweets: the user table holds the `primary key`, and the tweet table references that `primary key` as a `foreign key`; this relationship is essential for the functioning of Twitter's search engine.
- Twitter has an immaculate system design that efficiently supports diverse services such as the `timeline service` and `searching`, with almost negligible room for error.
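
The relational layout summarized above (a `Users` table, a `Tweets` table that references it, and a Followers table) can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module in place of MySQL so that it runs anywhere; every column name and the helper functions are assumptions made for illustration, not Twitter's actual schema.

```python
import sqlite3

# Illustrative schema only: column names are assumptions, and SQLite
# stands in for the MySQL databases mentioned in the summary.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Users (
    id     INTEGER PRIMARY KEY,
    handle TEXT NOT NULL UNIQUE
);
CREATE TABLE Tweets (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES Users(id),  -- one user, many tweets
    body    TEXT NOT NULL
);
CREATE TABLE Followers (
    follower_id INTEGER NOT NULL REFERENCES Users(id),
    followee_id INTEGER NOT NULL REFERENCES Users(id),
    PRIMARY KEY (follower_id, followee_id)
);
""")

conn.execute("INSERT INTO Users (id, handle) VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO Tweets (user_id, body) VALUES (1, 'hello'), (1, 'second tweet')")
conn.execute("INSERT INTO Followers VALUES (2, 1)")  # bob follows alice


def tweets_of(user_id):
    """User timeline: all tweets written by one user, oldest first."""
    rows = conn.execute(
        "SELECT body FROM Tweets WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]


def home_timeline_by_join(user_id):
    """Home timeline computed directly from the database, newest first."""
    rows = conn.execute(
        """
        SELECT t.body
        FROM Tweets t
        JOIN Followers f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.id DESC
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


def insert_orphan_tweet():
    """Try to store a tweet for a user that does not exist."""
    try:
        conn.execute("INSERT INTO Tweets (user_id, body) VALUES (99, 'orphan')")
        return True
    except sqlite3.IntegrityError:
        return False  # the foreign key rejected the row
```

Computing the home timeline with a join on every read is the kind of query that becomes a bottleneck at scale, which is the motivation for the cache-based fanout described in the summary.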
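
The fanout behaviour summarized above can be sketched in a few lines of Python. This is a simplified illustration, not Twitter's implementation: plain dictionaries stand in for the Redis cache and the tweet store, and `FANOUT_LIMIT`, `TIMELINE_SIZE`, `active_users`, and all function names are hypothetical. Tweets from ordinary accounts are pushed into the cached timelines of active followers at write time, while tweets from accounts with a large following are pulled and merged at read time.

```python
from collections import defaultdict, deque
from itertools import count

FANOUT_LIMIT = 2     # hypothetical: authors with more followers are not fanned out
TIMELINE_SIZE = 100  # hypothetical cap on entries kept per cached home timeline

followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # stand-in for the tweet database
home_cache = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))  # stand-in for the cache
active_users = set()                  # only these users get precomputed timelines
_next_id = count(1)                   # increasing ids double as a time order


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def post_tweet(author, text):
    """Store the tweet, then fan it out on write unless the author is too popular."""
    tweet = (next(_next_id), author, text)
    tweets_by_author[author].append(tweet)
    if len(followers[author]) <= FANOUT_LIMIT:
        for follower in followers[author]:
            if follower in active_users:  # inactive users are not precalculated
                home_cache[follower].appendleft(tweet)
    return tweet


def home_timeline(user):
    """Read the cached timeline and merge in tweets that were never fanned out."""
    if user in active_users:
        merged = list(home_cache[user])
        # Popular authors skipped the fanout, so pull their tweets at read time.
        pull_from = [a for a in following[user] if len(followers[a]) > FANOUT_LIMIT]
    else:
        # Nothing was precomputed for inactive users: rebuild from the store.
        merged = []
        pull_from = list(following[user])
    for author in pull_from:
        merged.extend(tweets_by_author[author])
    return sorted(merged, reverse=True)  # newest first, ordered by tweet id
```

The sketch ignores cases a real system must handle, such as an author crossing the follower threshold after earlier tweets were already fanned out, or backfilling a timeline when a user follows someone new.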