AlgoDaily - Design of the Twitter Architecture


One Pager Cheat Sheet

This article is a tutorial on the system architecture behind a service similar to Twitter, a platform with nearly 200 million users worldwide.
Twitter's core features let a user post tweets, follow other users, view their own timeline, see a home timeline (tweets from the accounts they follow), and search by hashtag or keyword with an internal search engine. Understanding these features is essential before discussing the platform's system design.
Twitter's high-level architecture relies on MySQL databases to handle its data: each new user gets a row in the Users table, each tweet is stored in the Tweets table, and a feed connects a user to the tweets of the accounts they follow and displays them.
Twitter's database follows a relational model: user information and tweets are stored in separate tables, Users and Tweets, which are linked through a primary key-foreign key relationship that maintains data integrity and avoids duplicating user data in every tweet.
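A minimal sketch of the Users and Tweets tables and their primary key-foreign key link, using Python's built-in `sqlite3` in place of MySQL. The column names (`user_id`, `username`, `tweet_id`, `body`, `created_at`) are illustrative assumptions, not Twitter's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE Users (
        user_id  INTEGER PRIMARY KEY,          -- primary key: one row per user
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Tweets (
        tweet_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES Users(user_id),  -- foreign key
        body       TEXT NOT NULL,
        created_at INTEGER NOT NULL             -- hypothetical epoch timestamp
    );
    """
)

conn.execute("INSERT INTO Users (user_id, username) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO Tweets (user_id, body, created_at) VALUES (?, ?, ?)",
    [(1, "hello world", 100), (1, "second tweet", 200)],
)

# One Users row relates to many Tweets rows (one-to-many).
tweet_count = conn.execute(
    "SELECT COUNT(*) FROM Tweets WHERE user_id = ?", (1,)
).fetchone()[0]
print(tweet_count)  # 2

# A tweet whose user_id has no matching Users row violates the foreign key.
orphan_rejected = False
try:
    conn.execute(
        "INSERT INTO Tweets (user_id, body, created_at) VALUES (?, ?, ?)",
        (99, "orphan", 300),
    )
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

Because user details live only in Users, a tweet row carries just the `user_id` reference, and the database refuses tweets that point at a user who does not exist.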
Fetching a user's feed directly from the Tweets table becomes a bottleneck at scale. The architecture addresses this by introducing a Followers table, storing tweets and user info in a separate database, and using a Redis Cluster to serve the high volume of queries while maintaining eventual consistency.
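The Followers table turns the feed into a join. This sketch, again with `sqlite3` standing in for MySQL and assuming a hypothetical `Followers(follower_id, followee_id)` layout, computes a home timeline at read time. Running this join on every timeline request is the kind of query that becomes a bottleneck at high volume, which a Redis cache layer is meant to absorb.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE Tweets (
        tweet_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES Users(user_id),
        body       TEXT NOT NULL,
        created_at INTEGER NOT NULL
    );
    -- Hypothetical layout: each row means "follower_id follows followee_id".
    CREATE TABLE Followers (
        follower_id INTEGER NOT NULL REFERENCES Users(user_id),
        followee_id INTEGER NOT NULL REFERENCES Users(user_id),
        PRIMARY KEY (follower_id, followee_id)
    );
    CREATE INDEX idx_tweets_user_time ON Tweets(user_id, created_at);
    """
)
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany(
    "INSERT INTO Tweets (user_id, body, created_at) VALUES (?, ?, ?)",
    [(2, "bob at 10", 10), (3, "carol at 20", 20), (2, "bob at 30", 30), (1, "alice at 40", 40)],
)
# alice (1) follows bob (2) and carol (3).
conn.executemany("INSERT INTO Followers VALUES (?, ?)", [(1, 2), (1, 3)])


def home_timeline(conn, user_id, limit=20):
    """Pull-based feed: join Followers to Tweets on every read, newest first."""
    rows = conn.execute(
        """
        SELECT t.body
        FROM Followers f
        JOIN Tweets t ON t.user_id = f.followee_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [body for (body,) in rows]


print(home_timeline(conn, 1))  # ['bob at 30', 'carol at 20', 'bob at 10']
```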
Twitter's architecture uses two major timelines. The User Timeline shows a user's own tweets and retweets in chronological order; these are fetched from the database and served through a caching layer. The Home Timeline shows content from the accounts a user follows and is built with a fanout-to-cache approach for efficiency. For accounts with a very large following, the architecture combines the precomputed home timeline with synchronous calls made when a follower loads the timeline, and inactive users' timelines are not precalculated or stored in the cache.
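The read path of this hybrid home timeline can be sketched as follows, assuming plain dicts in place of the cache and database. Each tweet is a `(timestamp, tweet_id)` pair, and `home_timeline`, the `celebrities` set, and the sample data are hypothetical; how an account qualifies as a celebrity is left open.

```python
import heapq
from itertools import islice
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, int]  # (timestamp, tweet_id); a larger timestamp is newer


def home_timeline(
    user: str,
    followees: Dict[str, List[str]],
    celebrities: Set[str],
    cache: Dict[str, List[Tweet]],
    user_tweets: Dict[str, List[Tweet]],
    limit: int = 10,
) -> List[Tweet]:
    """Hypothetical hybrid read path; every list is sorted newest first."""
    if user in cache:
        # Active user: start from the fanout-precomputed timeline and pull
        # tweets from followed celebrities synchronously at read time.
        sources = [cache[user]] + [
            user_tweets.get(f, []) for f in followees.get(user, []) if f in celebrities
        ]
    else:
        # Inactive user: nothing was precomputed, so build the timeline on demand.
        sources = [user_tweets.get(f, []) for f in followees.get(user, [])]
    # Every source is newest first, so merge on descending timestamp.
    merged = heapq.merge(*sources, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))


user_tweets = {"bob": [(30, 3), (10, 1)], "carol": [(20, 2)], "star": [(25, 9)]}
followees = {"alice": ["bob", "carol", "star"], "dave": ["bob", "star"]}
celebrities = {"star"}
cache = {"alice": [(30, 3), (20, 2), (10, 1)]}  # fanout already delivered bob's and carol's tweets

print(home_timeline("alice", followees, celebrities, cache, user_tweets))
# [(30, 3), (25, 9), (20, 2), (10, 1)]
print(home_timeline("dave", followees, celebrities, cache, user_tweets))
# [(30, 3), (25, 9), (10, 1)]
```

The celebrity's tweets never enter the per-follower cache, so one post from a huge account does not trigger millions of cache writes; the cost moves to a small merge at read time.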
Twitter uses the fanout approach, which depends on the cache rather than the database: when a user tweets, the tweet is immediately pushed into each follower's in-memory timeline, so timeline requests are handled faster and more efficiently.
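A minimal sketch of fanout on write, assuming Python dicts and deques in place of Redis. `FanoutService` and the `max_len` cap (default 800 here) are hypothetical; the cap keeps each in-memory timeline bounded, so the oldest entries fall off as new ones arrive.

```python
from collections import defaultdict, deque
from typing import Dict, List


class FanoutService:
    """Hypothetical fanout-on-write service backed by in-memory timelines."""

    def __init__(self, followers: Dict[str, List[str]], max_len: int = 800):
        self.followers = followers  # author -> ids of the users who follow them
        # Each timeline is newest first; a full deque drops its oldest entry.
        self.timelines = defaultdict(lambda: deque(maxlen=max_len))

    def post(self, author: str, tweet_id: int) -> None:
        # Push the new tweet into every follower's timeline at write time.
        for follower in self.followers.get(author, []):
            self.timelines[follower].appendleft(tweet_id)

    def read(self, user: str, count: int = 20) -> List[int]:
        # Reads touch only memory; no database query is needed.
        return list(self.timelines[user])[:count]


svc = FanoutService({"bob": ["alice", "carol"]}, max_len=2)
for tweet_id in (1, 2, 3):
    svc.post("bob", tweet_id)
print(svc.read("alice"))  # [3, 2]
```

Each post costs one push per follower, which keeps reads cheap but makes posting from accounts with very large followings expensive.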
Twitter uses Earlybird, a real-time search engine built on Lucene, which breaks every tweet into terms and records them in a reverse (inverted) index for searching. A scatter-gather step sends each query to all index partitions and combines their results, keeping global search fast, and search results are ranked by tweet popularity.
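The indexing and scatter-gather ideas can be illustrated without Lucene. This sketch keeps a Python dict as an inverted index per shard, assigns tweets to shards by `tweet_id % len(shards)`, and ranks hits by a single `popularity` number; the shard layout, tokenizer, and scoring are simplifying assumptions, not Earlybird's actual design.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

TOKEN = re.compile(r"#?\w+")  # words and hashtags; a simplifying assumption


def tokenize(text: str) -> Set[str]:
    return {tok.lower() for tok in TOKEN.findall(text)}


class Shard:
    """One index partition: an inverted index from term to tweet ids."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[int]] = defaultdict(set)
        self.popularity: Dict[int, int] = {}

    def add(self, tweet_id: int, text: str, popularity: int) -> None:
        # Break the tweet into terms and record the tweet under each term.
        self.popularity[tweet_id] = popularity
        for term in tokenize(text):
            self.index[term].add(tweet_id)

    def search(self, query: str) -> List[Tuple[int, int]]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: keep only tweets that contain every query term.
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [(self.popularity[tid], tid) for tid in hits]


def scatter_gather(shards: List[Shard], query: str, k: int = 10) -> List[int]:
    # Scatter: send the query to every shard in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda shard: shard.search(query), shards))
    # Gather: combine partial hits and rank by popularity, highest first.
    hits = sorted((hit for part in partials for hit in part), reverse=True)
    return [tid for _, tid in hits[:k]]


shards = [Shard(), Shard()]
tweets = [
    (1, "Loving the #WorldCup final", 50),
    (2, "#WorldCup tickets sold out", 500),
    (3, "Lunch time", 5),
]
for tweet_id, text, popularity in tweets:
    shards[tweet_id % len(shards)].add(tweet_id, text, popularity)

print(scatter_gather(shards, "#WorldCup"))  # [2, 1]
```

Each shard only sees its own slice of tweets, so no single machine holds the whole index; the gather step is where a global ranking is imposed.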
Platforms like Twitter rely on a one-to-many relationship between the Users and Tweets tables, since a single user can post many tweets. In database terms, the user ID is the primary key of the Users table, and the Tweets table stores that key as a foreign key; this relationship is essential to the functioning of Twitter's search engine.
Twitter has an immaculate system design that efficiently supports diverse services such as the timeline service and search, leaving very little room for error.
