One Pager Cheat Sheet

  • This article provides a tutorial on system architecture for developing a digital service similar to Twitter, which has nearly 200 million users worldwide.
  • Twitter's core features include the user's ability to tweet, follow people, view their own timeline, see a home timeline (tweets shared by people followed), and search using an internal search engine with hashtags and keywords; understanding these features is essential to discussing the platform's system design.
  • Twitter's high-level architecture relies on MySQL databases to store its data: each user gets a new row in the Users table, their tweets are stored in the Tweets table, and a feed ties the two together by collecting and displaying the tweets of everyone a particular user follows.
  • Twitter's database architecture follows the relational model: user information and tweets are stored in separate tables, Users and Tweets respectively, connected through a primary key-foreign key relationship that maintains data integrity and prevents data duplication.
  • The bottleneck in fetching information from the Tweets table is addressed by adding a Followers table to the architecture and using a Redis Cluster to absorb the high volume of queries while maintaining eventual consistency; tweets and user info are also stored in a separate database.
  • Twitter's architecture uses two major timelines: the User Timeline, which shows a user's own tweets and retweets in chronological order, fetched from the user table and optimized with a caching layer, and the Home Timeline, which displays content from the accounts the user follows and uses a fanout caching approach for efficiency. For users with a large following, the architecture combines the home timeline approach with synchronous calls to optimize tweet loading, while inactive users' timelines are neither precalculated nor stored in the cache.
  • Twitter uses the fanout approach, which relies on the cache rather than the database, to push each new tweet immediately to the in-memory timelines of the author's followers, so that timeline requests are served faster and more efficiently.
  • Twitter uses Earlybird, a Lucene-based search engine built on an inverted (reverse) index, to break every tweet into terms and index it for searching, together with a scatter-gather layer that splits a query across index partitions and merges the results to keep global search fast; search results are ranked by the popularity of the tweets.
  • The one-to-many relationship between the Users and Tweets tables, where a single user can generate many tweets, is a key database concept on platforms like Twitter: the primary key of the Users table is used as a foreign key in the Tweets table, and this relationship is essential for the functioning of Twitter's search engine.
  • Twitter has an immaculate system design that efficiently supports diverse services, such as the timeline service and search, with almost negligible room for error.
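
The relational layout summarized above (Users, Tweets, and Followers tables linked by primary and foreign keys) can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table, column, and function name here is an assumption made for the example, not Twitter's actual schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id     INTEGER PRIMARY KEY,
    handle TEXT NOT NULL UNIQUE
);
-- One user, many tweets: tweets.user_id is a foreign key to users.id.
CREATE TABLE tweets (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body    TEXT NOT NULL
);
-- Who follows whom; both columns point back at users.id.
CREATE TABLE followers (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)",
                 [(1, 1, "hello from alice"),
                  (2, 2, "bob here"),
                  (3, 1, "alice again")])
# carol follows alice and bob.
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(3, 1), (3, 2)])


def user_timeline(conn, user_id):
    """Return a user's own tweets, newest first."""
    rows = conn.execute(
        "SELECT body FROM tweets WHERE user_id = ? ORDER BY id DESC",
        (user_id,))
    return [body for (body,) in rows]


def home_timeline(conn, user_id):
    """Return (handle, body) for tweets by accounts the user follows, newest first."""
    rows = conn.execute(
        """
        SELECT u.handle, t.body
        FROM followers f
        JOIN tweets t ON t.user_id = f.followee_id
        JOIN users  u ON u.id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.id DESC
        """,
        (user_id,))
    return rows.fetchall()
```

Calling home_timeline(conn, 3) joins three tables on every read, which hints at why a database-only home timeline becomes a bottleneck at scale and why a cache is placed in front of it.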
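
The fanout approach for the Home Timeline can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions: plain Python dictionaries stand in for the Redis cache and the Tweets table, and the class name, the CELEBRITY_THRESHOLD cutoff, and the TIMELINE_LIMIT cap are all hypothetical values chosen for the example. Tweets from ordinary accounts are pushed to followers' cached timelines at write time, while tweets from accounts with a large following are merged in when the timeline is read.

```python
from collections import defaultdict, deque
from itertools import count

CELEBRITY_THRESHOLD = 2   # hypothetical cutoff for "large following"
TIMELINE_LIMIT = 100      # hypothetical cap on cached entries per user


class TimelineService:
    def __init__(self):
        self.followers = defaultdict(set)          # author -> users following them
        self.following = defaultdict(set)          # user -> authors they follow
        self.tweets_by_author = defaultdict(list)  # stand-in for the Tweets table
        # Stand-in for the Redis cache: user -> bounded list of recent tweets.
        self.home_cache = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))
        self._ids = count(1)

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        tweet = (next(self._ids), author, text)
        self.tweets_by_author[author].append(tweet)
        # Fanout on write: push to every follower's cached timeline,
        # but skip the push for accounts with a large following.
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.home_cache[follower].appendleft(tweet)
        return tweet

    def home_timeline(self, user):
        # Start from the precomputed cache, then merge in tweets from
        # followed celebrities at read time. Keying by tweet id avoids
        # duplicates if an author crossed the threshold after a fanout.
        merged = {t[0]: t for t in self.home_cache[user]}
        for author in self.following[user]:
            if self.is_celebrity(author):
                for t in self.tweets_by_author[author]:
                    merged[t[0]] = t
        return sorted(merged.values(), key=lambda t: t[0], reverse=True)
```

The trade-off in this sketch is that a post costs one cache write per follower, which is cheap for most accounts but expensive for very popular ones, so those are resolved on read instead.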
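
The search path (an inverted index plus scatter-gather, ranked by popularity) can be sketched in the same spirit. This is a toy model, not Earlybird or Lucene: the Shard and SearchCluster classes, the modulo partitioning, and the use of a like count as the popularity signal are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into keywords and #hashtags."""
    return re.findall(r"#?\w+", text.lower())


class Shard:
    """One index partition holding an inverted index over its tweets."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of tweets containing it
        self.tweets = {}                  # tweet id -> (text, likes)

    def add(self, tweet_id, text, likes):
        self.tweets[tweet_id] = (text, likes)
        for term in tokenize(text):
            self.postings[term].add(tweet_id)

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # A tweet matches only if it contains every query term.
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [(tid, self.tweets[tid][1]) for tid in ids]


class SearchCluster:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, tweet_id, text, likes=0):
        # Hypothetical partitioning rule: tweet id modulo shard count.
        self.shards[tweet_id % len(self.shards)].add(tweet_id, text, likes)

    def search(self, query, limit=10):
        hits = []
        for shard in self.shards:            # scatter: ask every shard
            hits.extend(shard.search(query))
        # Gather: merge partial results, most-liked first (id breaks ties).
        hits.sort(key=lambda h: (-h[1], h[0]))
        return [tid for tid, _ in hits[:limit]]
```

In this sketch the shards are queried one after another for simplicity; a real scatter-gather layer would query them in parallel and merge the partial results as they arrive.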