Twitter System Design

Ankita
Nov 9, 2020


Functional requirements

1) A user should be able to sign up.

2) A user should be able to tweet.

3) A user should be able to see the home page.

4) A user should be able to see the user page.

Non-functional requirements

1) Reads far outnumber writes. The ratio of reads to writes is 100:1.

2) The home page should reflect new changes without refreshing.

3) When a user tweets, the tweet should show up quickly on their own user page. Showing up on the home pages of their followers may lag.

Stats

Total 1.3 billion accounts

330 million monthly active users and 145 million daily active users

On average, around 6,000 tweets are posted on Twitter every second, which corresponds to over 350,000 tweets per minute and roughly 500 million tweets per day.
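As a quick sanity check of these figures, the arithmetic can be worked out as below. The read estimate is an assumption: it applies the 100:1 read-to-write ratio from the non-functional requirements directly to the tweet rate, which the stats above do not state explicitly.

```python
TWEETS_PER_SECOND = 6_000  # average write rate quoted above

tweets_per_minute = TWEETS_PER_SECOND * 60
tweets_per_day = TWEETS_PER_SECOND * 60 * 60 * 24

# Assumption: every tweet write is matched by roughly 100 reads.
READ_WRITE_RATIO = 100
reads_per_second = TWEETS_PER_SECOND * READ_WRITE_RATIO

print(tweets_per_minute)  # 360000
print(tweets_per_day)     # 518400000
print(reads_per_second)   # 600000
```

So 6,000 tweets per second gives 360,000 per minute and about 518 million per day, in line with the rounded numbers above.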

Let's segregate the users by their activity status:

1) Active

2) Offline

Let's start with the flows for the functional requirements.

User Onboarding Flow

UserService :- Holds all the details of a user, such as userId, firstName, lastName, emailId, userName, activityStatus (active or offline) and isFamous. It stores this data in an RDBMS, which can easily handle billions of rows. The data is also stored in Redis, with userId as the key and the complete details (userObject) as the value.

API details

1) get userObject by userId

2) bulk get List of userObject by userIds

3) post userObject

4) update userObject
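A minimal sketch of these four calls, assuming an in-memory dict in place of the RDBMS table. The class, method and field names are hypothetical, chosen to mirror the fields listed above; the article does not define a concrete interface.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class User:
    user_id: int
    first_name: str
    last_name: str
    email_id: str
    user_name: str
    activity_status: str = "OFFLINE"  # "ACTIVE" or "OFFLINE"
    is_famous: bool = False


class UserService:
    """In-memory stand-in: a dict plays the role of the RDBMS table."""

    def __init__(self) -> None:
        self._users: Dict[int, User] = {}

    def post_user(self, user: User) -> None:
        # 3) post userObject
        if user.user_id in self._users:
            raise ValueError(f"user {user.user_id} already exists")
        self._users[user.user_id] = user

    def get_user(self, user_id: int) -> Optional[User]:
        # 1) get userObject by userId
        return self._users.get(user_id)

    def bulk_get_users(self, user_ids: List[int]) -> List[User]:
        # 2) bulk get; unknown ids are skipped rather than raising
        return [self._users[uid] for uid in user_ids if uid in self._users]

    def update_user(self, user_id: int, **changes) -> User:
        # 4) update userObject; raises KeyError for an unknown user
        updated = replace(self._users[user_id], **changes)
        self._users[user_id] = updated
        return updated
```

For example, `UserService().update_user(1, activity_status="ACTIVE")` would flip a stored user to active once that user has been posted.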

Request To follow Flow

This call goes to UserLinkService.

UserLinkService

This service stores and serves all the follower and following information. It keeps the data in an RDBMS (possibly sharded) with the following table structure.

RDBMS

Following table (sharded by userId)

userId, followingId

Follower Table (sharded by userId)

userId, followerId

REDIS

Followers
key : userId and value: list of Ids of followers

Followings

key : userId and value : list of Ids of followings

API details :-

1) get followings by userId (paginated view)
2) get followers by userId (paginated view)
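The follow request and the two paginated reads could look roughly like this. It is a sketch only: two in-memory dicts stand in for the Following and Follower tables, and the method names, zero-based page numbering and default page size are assumptions.

```python
from typing import Dict, List


class UserLinkService:
    """In-memory stand-in for the Following and Follower tables."""

    def __init__(self) -> None:
        self._following: Dict[int, List[int]] = {}  # userId -> followingIds
        self._followers: Dict[int, List[int]] = {}  # userId -> followerIds

    def follow(self, user_id: int, target_id: int) -> None:
        """Record that user_id follows target_id in both tables."""
        if user_id == target_id:
            raise ValueError("a user cannot follow themselves")
        following = self._following.setdefault(user_id, [])
        if target_id in following:
            return  # already following; keep the call idempotent
        following.append(target_id)
        self._followers.setdefault(target_id, []).append(user_id)

    @staticmethod
    def _page(items: List[int], page: int, page_size: int) -> List[int]:
        start = page * page_size
        return items[start:start + page_size]

    def get_following(self, user_id: int, page: int = 0,
                      page_size: int = 20) -> List[int]:
        return self._page(self._following.get(user_id, []), page, page_size)

    def get_followers(self, user_id: int, page: int = 0,
                      page_size: int = 20) -> List[int]:
        return self._page(self._followers.get(user_id, []), page, page_size)
```

Writing both directions on every follow is what lets each read be served from a single shard keyed by userId.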

When user tweets Flow

The tweet goes to the Tweet Ingestion Service.

Tweet Ingestion Service

This service ingests the tweet, stores it in a Cassandra database and publishes an event to Kafka.

Cassandra table

userId, tweetId, timeStamp, tweet

partition key (userId), clusterKey (tweetId, timeStamp)

Redis

Tweet

key : TweetId, value : Tweet

UserPage
key : UserId, value : List<TweetId>

(LPUSH and LTRIM are used to keep the list at a constant size)
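To make the LPUSH plus LTRIM pattern concrete, here is a small in-memory sketch that mimics the semantics of the two Redis commands with a plain Python list. The class name `CappedTimeline` and the maximum length are hypothetical; against a real Redis server the same two steps would be the `LPUSH` and `LTRIM` commands.

```python
from typing import Dict, List


class CappedTimeline:
    """Keeps at most max_len entries per key, newest first."""

    def __init__(self, max_len: int) -> None:
        self._lists: Dict[str, List[str]] = {}
        self._max_len = max_len

    def push(self, key: str, tweet_id: str) -> None:
        items = self._lists.setdefault(key, [])
        items.insert(0, tweet_id)   # like: LPUSH key tweet_id
        del items[self._max_len:]   # like: LTRIM key 0 max_len-1

    def latest(self, key: str, count: int) -> List[str]:
        # like: LRANGE key 0 count-1
        return self._lists.get(key, [])[:count]


timeline = CappedTimeline(max_len=3)
for n in range(1, 6):
    timeline.push("userPage:42", f"t{n}")

print(timeline.latest("userPage:42", 10))  # ['t5', 't4', 't3']
```

Pushing at the head and trimming the tail means the list always holds the newest entries, so the memory used per user stays bounded.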

Kafka message

Event key: userId

Event value: tweetId, timeStamp

API :-

1) get tweet by tweetId and userId

2) get bulk tweets by list of tweetId and userId pair
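Putting the ingestion pieces together, a rough sketch might look like the following. Everything here is a stand-in: dicts replace Cassandra and Redis, a list replaces the Kafka topic, and the counter-based tweetId generator is purely an assumption, since the article does not say how ids are produced.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TweetEvent:
    key: int          # event key: userId
    tweet_id: int     # event value: tweetId
    timestamp: float  # event value: timeStamp


class TweetIngestionService:
    def __init__(self) -> None:
        self._ids = itertools.count(1)  # hypothetical id generator
        # Cassandra stand-in: partitioned by userId
        self.tweets_by_user: Dict[int, List[Tuple[int, float, str]]] = {}
        # Redis "Tweet" stand-in: tweetId -> tweet
        self.tweet_cache: Dict[int, str] = {}
        # Kafka topic stand-in
        self.events: List[TweetEvent] = []

    def ingest(self, user_id: int, text: str,
               now: Optional[float] = None) -> TweetEvent:
        timestamp = time.time() if now is None else now
        tweet_id = next(self._ids)
        rows = self.tweets_by_user.setdefault(user_id, [])
        rows.append((tweet_id, timestamp, text))
        self.tweet_cache[tweet_id] = text
        event = TweetEvent(user_id, tweet_id, timestamp)
        self.events.append(event)  # published after the write succeeds
        return event

    def get_tweet(self, tweet_id: int, user_id: int) -> Optional[str]:
        # userId selects the partition, tweetId the row inside it
        for tid, _, text in self.tweets_by_user.get(user_id, []):
            if tid == tweet_id:
                return text
        return None
```

Requiring both tweetId and userId in the read call matches the table layout above, because the partition key is needed to find the right node.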

Tweet Service :-

This service is called for all tweet related information.

API :-

1) Get recent tweets by list of userIds.

2) Get tweets by tweetIds

Updating the HomePage and UserPage Flow

Tweet Consumer Service :-

This service consumes the Kafka message published by the Tweet Ingestion Service and updates the HomePage and UserPage of a user.

The Kafka message received is

Event key: userId

Event value: tweetId, timeStamp

Connection Service :-

This service keeps a web socket connection open with the clients of users who are active. It sends tweets to the client for the home page and user page.

The flow goes like this :-

  • From the Kafka message, get the userId and tweetId.
  • Get the userObject from UserService to find out whether the activity status of the user is active or offline.
  • Get the list of followers for the userId from UserLinkService.
  • If a follower is active, Tweet Consumer Service asks Connection Service to push the tweet to the follower's client over the web socket.
  • Then Tweet Consumer Service adds the entry to the HomePage and UserPage tables in Cassandra.
    HomePage
    followerId, tweetId, userId, timestamp
    partition key : followerId , clusterKey : tweetId, timestamp
    UserPage
    userId, tweetId, timestamp
    partition key : userId , clusterKey : tweetId, timestamp
  • Then Tweet Consumer Service adds (asynchronously) the tweetId and posterId entry to the Redis UserPage of the user and the Redis HomePage of each follower.
    UserPage
    key: userId, value : List<tweetId>
    HomePage
    key: followerId, value : List<Object<userId, tweetId>>
    We use LPUSH to add new entries to the lists and LTRIM to remove the old ones.
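The steps above can be sketched as a single handler for one Kafka event. This is a simplification under stated assumptions: the follower map, the set of active users and the `push` callback stand in for UserLinkService, UserService and Connection Service, and one pair of dicts represents the HomePage and UserPage stores without distinguishing Cassandra from Redis. All names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

HomeEntry = Tuple[int, int, float]  # (posterId, tweetId, timestamp)
UserEntry = Tuple[int, float]       # (tweetId, timestamp)


def handle_tweet_event(
    poster_id: int,
    tweet_id: int,
    timestamp: float,
    followers: Dict[int, List[int]],
    active_users: Set[int],
    push: Callable[[int, int], None],
    home_page: Dict[int, List[HomeEntry]],
    user_page: Dict[int, List[UserEntry]],
) -> None:
    """Fan one tweet out to the poster's user page and followers' home pages."""
    # The poster's own user page is updated first, newest entry at the head.
    user_page.setdefault(poster_id, []).insert(0, (tweet_id, timestamp))

    for follower_id in followers.get(poster_id, []):
        if follower_id in active_users:
            # Active followers get the tweet immediately over the web socket.
            push(follower_id, tweet_id)
        # Every follower, active or offline, gets a home page entry.
        home_page.setdefault(follower_id, []).insert(
            0, (poster_id, tweet_id, timestamp)
        )
```

Offline followers are not pushed to, but they still receive a home page entry, which is what the Display Service reads when they come back online.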

Offline user comes online Flow

The Twitter app contacts the Display Service.

The Display Service does the following:

  • Queries UserLinkService for the followings of the user.
  • Checks Redis for the HomePage and UserPage entries of the user.
    If present and last modified less than 5 minutes ago, it returns the tweetId and posterId entries from the Redis cluster for that user.
    If present but last modified earlier than that, it takes the last modified time and queries the Cassandra HomePage table for tweets after that timestamp.
    If not present, it queries the Cassandra HomePage table for the 15 latest tweetIds.
  • Calls the Tweet Service bulk get API to fetch the tweets for the list of userId, tweetId pairs.
  • Asks Connection Service to establish a web socket for that client and sends the tweets to Connection Service, which pushes them to the Twitter app.
  • After that, the Redis HomePage and UserPage are updated with the new entries using LPUSH, and old entries are removed using LTRIM.
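The three-way cache check in the list above can be written as a small decision function. This is only a sketch: the function and type names are hypothetical, and it returns a plan describing what to load rather than talking to Redis or Cassandra.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FRESHNESS_WINDOW_SECONDS = 5 * 60  # "less than 5 minutes" from the flow above
DEFAULT_PAGE_SIZE = 15             # "top 15 latest" from the flow above


@dataclass
class CachedHomePage:
    entries: List[Tuple[int, int]]  # (posterId, tweetId), newest first
    last_modified: float            # seconds since the epoch


def plan_home_page_load(
    cached: Optional[CachedHomePage], now: float
) -> Tuple[str, Optional[float]]:
    """Return (action, since_timestamp) for the home page request."""
    if cached is None:
        # Nothing in Redis: read the latest DEFAULT_PAGE_SIZE rows from Cassandra.
        return ("load_latest", None)
    if now - cached.last_modified < FRESHNESS_WINDOW_SECONDS:
        # Fresh enough: serve straight from Redis.
        return ("serve_cache", None)
    # Stale: fetch only the rows newer than the cached copy.
    return ("load_since", cached.last_modified)
```

Returning the last modified time in the stale case lets the caller fetch only the delta from Cassandra instead of rebuilding the whole page.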

Optimisations

CACHING

We use Redis as the cache.

1) UserService :- Rather than querying the RDBMS, the service first tries Redis to get the userId to userObject values.

2) TweetService :- Rather than querying Cassandra, the service first tries Redis to get the tweetId to tweet values.

3) UserLinkService :- Rather than querying the RDBMS, the service first tries Redis to get the userId to followers and userId to followings values.

4) DisplayService :- Rather than querying Cassandra, the service first tries Redis to get the HomePage and UserPage of the userId.
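All four cases follow the same read path: try the cache, fall back to the backing store on a miss, then populate the cache. A generic sketch of that pattern is below, with a dict in place of Redis and a caller-supplied loader in place of the RDBMS or Cassandra query. The class name and the hit and miss counters are illustrative additions, not part of the design above.

```python
from typing import Callable, Dict, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class CacheAside(Generic[K, V]):
    """Read-through helper: cache first, backing store on a miss."""

    def __init__(self, load_from_store: Callable[[K], Optional[V]]) -> None:
        self._cache: Dict[K, V] = {}  # stand-in for Redis
        self._load = load_from_store  # stand-in for the database query
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> Optional[V]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:
            self._cache[key] = value  # populate for the next reader
        return value

    def invalidate(self, key: K) -> None:
        # Call after a write so the next read reloads from the store.
        self._cache.pop(key, None)


database = {1: "alice", 2: "bob"}
users = CacheAside(database.get)
users.get(1)  # miss, loads from the store
users.get(1)  # hit, served from the cache
print(users.hits, users.misses)  # 1 1
```

With a 100:1 read-to-write ratio, most requests should end up on the hit path, which is the reason for putting the cache in front of every service.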

The next article covers how to handle the famous users of Twitter.


Ankita

Senior Developer at Twitter. Tech Enthusiast. Learner. If I can so can you !