How to Design Twitter (Part 2)

This post is about the system design interview, specifically the question of how to design Twitter. If you haven’t read Part 1, please go and read it first.

Alright, here’s the system design interview question: how to design Twitter, Part 2. From what we covered in the last post, we know that system design questions can be very open-ended. Even for the same question, you will have totally different discussions with different interviewers.

This analysis is provided by the Gainlo team, and I’m pretty sure there are better approaches.

Twitter is already a huge product with tons of features. In this post, we’re gonna talk about how to design specific features of Twitter from the system design interview perspective.

 

Trending topics

Twitter shows trending topics on both the search page and the left column of your home page (maybe somewhere else as well). Clicking a topic takes you to all related tweets.

The question would be how to design this feature.

If you remember what we said in the previous post, it’s recommended to start with a high-level approach. In a nutshell, I would divide the problem into two subproblems: 1. How do we get trending topic candidates? 2. How do we rank those candidates?

For topic candidates, there are various ideas. We can get the most frequent hashtags over the last N hours. We can also get the hottest search queries. Or we may even fetch the most popular recent tweets and extract some common words or phrases. But personally, I would go with the first two approaches.
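As a rough illustration of the first idea, here is a minimal sketch that counts hashtags over a time window. The function name `hashtag_candidates`, the `(created_at, text)` tweet shape and the default window are all assumptions made for this example; a real system would compute these counts in a streaming pipeline rather than in memory.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_candidates(tweets, now, window_hours=24, top_k=10):
    """Return the top_k most frequent hashtags among tweets inside the window.

    `tweets` is an iterable of (created_at, text) pairs (a hypothetical shape).
    Hashtags are lowercased so "#NBA" and "#nba" count as one candidate.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter()
    for created_at, text in tweets:
        if created_at >= cutoff:
            # Count each hashtag at most once per tweet so that a single
            # tweet repeating a tag cannot inflate its frequency.
            counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts.most_common(top_k)
```

The same counting approach would work for search queries by swapping the hashtag extraction for the normalized query string.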

Ranking can be interesting. The most straightforward way is to rank by frequency, but we can improve on that. For instance, we can integrate signals like reply/retweet/favorite counts and freshness. We may also add personalized signals, like whether many of the people you follow or your followers are talking about the topic.
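To make this concrete, here is one possible way to fold frequency, engagement and freshness into a single score. This is only a sketch: the log scaling, the exponential decay, the six-hour half-life and the weights are assumptions I picked for illustration, not how Twitter actually ranks trends.

```python
import math


def trend_score(mentions, engagements, age_hours,
                half_life_hours=6.0, w_mentions=1.0, w_engagement=0.5):
    """Combine volume and freshness into one score (illustrative weights)."""
    # Log scaling keeps a huge topic from drowning out everything else.
    volume = w_mentions * math.log1p(mentions) + w_engagement * math.log1p(engagements)
    # Exponential decay: the score halves every `half_life_hours`.
    decay = 0.5 ** (age_hours / half_life_hours)
    return volume * decay


def rank_topics(topics):
    """Sort topic dicts (keys: mentions, engagements, age_hours) best first."""
    return sorted(
        topics,
        key=lambda t: trend_score(t["mentions"], t["engagements"], t["age_hours"]),
        reverse=True,
    )
```

With this shape, a day-old topic with many mentions can rank below a smaller topic that started an hour ago, which is usually what you want from a "trending" list.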

Who to follow

Twitter also shows you suggestions about who to follow. Actually, this is a core feature that plays an important role in user onboarding and engagement.

If you play around with the feature, you will notice that there are mainly two kinds of accounts that Twitter will show you: people you may know (friends) and famous accounts (celebrities, brands, etc.).

It won’t be hard to get these candidates: you can just search through the user’s “following graph”, and people within 2 or 3 steps are great candidates. Accounts with the most followers can also be included.
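Here is a small sketch of the graph search for the 2-step case. The `following` dictionary (user to the set of accounts they follow) and the function name `follow_candidates` are assumptions for this example; at Twitter’s scale the graph would live in a dedicated store, not a Python dict.

```python
from collections import Counter


def follow_candidates(following, user, top_k=5):
    """Suggest accounts two steps away in the following graph.

    `following` maps each user to the set of accounts they follow.
    Candidates are counted by how many of the user's follows also follow them.
    """
    already = following.get(user, set())
    counts = Counter()
    for friend in already:
        for candidate in following.get(friend, set()):
            # Skip the user themselves and accounts they already follow.
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Sort by count (descending), then by name so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The count attached to each candidate is a cheap first ranking signal, and the list can then be merged with a precomputed list of the most-followed accounts.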

The question would be how to rank them given that each time we can only show a few suggestions. I would lean toward using a machine learning system to do that.

There are tons of features we can use, e.g. whether the other person already follows this user, the number of common follows/followers, any overlap in basic information (like location) and so on.
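As a sketch of what the input to such a model could look like, the snippet below turns the features mentioned above into a small feature dictionary and feeds it to a logistic function. All names here (`candidate_features`, `follow_probability`, the `locations` map) are hypothetical, and the weights are assumed to come from a model trained elsewhere on past follow/ignore data.

```python
import math


def candidate_features(following, followers, locations, user, candidate):
    """Build the features from the text for one (user, candidate) pair."""
    user_loc = locations.get(user)
    return {
        # Does the candidate already follow this user?
        "follows_back": 1.0 if user in following.get(candidate, set()) else 0.0,
        "common_follows": float(
            len(following.get(user, set()) & following.get(candidate, set()))
        ),
        "common_followers": float(
            len(followers.get(user, set()) & followers.get(candidate, set()))
        ),
        "same_location": 1.0
        if user_loc is not None and user_loc == locations.get(candidate)
        else 0.0,
    }


def follow_probability(features, weights, bias=0.0):
    """Logistic model: weights are assumed to be learned offline."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Candidates can then be sorted by the predicted probability, and only the top few are shown.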

This is a complicated problem and there are various follow-up questions:

  • How to scale the system when there are millions/billions of users?
  • How to evaluate the system?
  • How to design the same feature for Facebook (bi-directional relationships)?

 

Moments

Moments shows you what’s trending right now. The feature is more complicated than trending topics, so I think it’s worth explaining briefly here.

Basically, Moments will show you a list of interesting topics for different categories (news, sports, fun etc.). For each topic, you will also get several top tweets discussing it. So it’s a great way to explore what’s going on at the current moment.

I’m pretty sure that there are a lot of ways to design this system. One option is to get the hottest articles from news websites for the past 1-2 hours. For each article, find tweets related to it and figure out which category (news, sports, fun etc.) it belongs to. Then we can show this article as a trending topic in Moments.

Another similar approach is to get all the trending topics (same as in the first section), figure out each topic’s category, and show them in Moments.

For both approaches, we would have the following three subproblems to solve: A. Assign each tweet/topic to a category (news, sports etc.) B. Generate and rank trending topics at the current moment C. Generate and rank tweets for each topic.

For A, we can pre-define several categories and do supervised learning, or we may consider clustering. In fact, the text of tweets, users’ profiles and followers’ comments contain a lot of information that can make the algorithm accurate.

For B and C, since it’s similar to the first section of this post, I won’t talk about it now.

 

Search

Twitter’s search feature is another popular function that people use every day. If you have no idea how a search engine works, you may want to take a look at this tutorial.

If we limit our discussion to the general feed search function (excluding user search and advanced search), the high-level approach can be pretty similar to Google search, except that you don’t need to crawl the web. Basically, you need to build indexing, ranking and retrieval.
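For the indexing and retrieval parts, a minimal sketch is an in-memory inverted index that maps each term to the set of tweet IDs containing it. The `TweetIndex` class and its tokenizer are assumptions for this example; a production system would shard the index and use a real search engine library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word-like terms."""
    return re.findall(r"[a-z0-9#@_]+", text.lower())


class TweetIndex:
    """Toy inverted index: term -> set of tweet IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.tweets = {}

    def add(self, tweet_id, text):
        self.tweets[tweet_id] = text
        for term in tokenize(text):
            self.postings[term].add(tweet_id)

    def search(self, query):
        """Return IDs of tweets that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            # Intersect posting lists: AND semantics across query terms.
            result &= self.postings.get(term, set())
        return result
```

Retrieval returns an unordered set of matches, which is then handed to the ranking step.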

Things become quite interesting if you dig into how to design the ranking algorithm. Unlike Google, Twitter search may care more about freshness and social signals.

The most straightforward approach is to give each feature/signal a weight and then compute a ranking score for each tweet. Then we can just rank tweets by that score. Features can include reply/retweet/favorite counts, relevance, freshness, user popularity etc.

But how do we evaluate the ranking and search? I think it’s better to define a few core metrics, like the total number of searches per day and tweet click events that follow a search, and observe these metrics every day. These are also the stats we should watch whenever changes are made.
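The weighted-score idea can be sketched in a few lines. I’m assuming here that each signal has already been normalized to the range 0 to 1, and the weights in `DEFAULT_WEIGHTS` are made-up numbers for illustration; in practice they would be tuned against the metrics you care about.

```python
# Illustrative weights only; each signal is assumed to be scaled to [0, 1].
DEFAULT_WEIGHTS = {
    "relevance": 0.5,
    "freshness": 0.3,
    "engagement": 0.15,
    "author_popularity": 0.05,
}


def search_score(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of signals; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank_results(results, weights=DEFAULT_WEIGHTS):
    """Sort result dicts (each with a 'signals' dict) by score, best first."""
    return sorted(
        results,
        key=lambda r: search_score(r["signals"], weights),
        reverse=True,
    )
```

Keeping the weights in one dictionary makes it easy to try a different weighting and compare the two rankings side by side.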
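As an example of such metrics, the sketch below computes the number of searches and the share of searches that were followed by a tweet click from a simple event log. The `(session_id, event_type)` log format and the function name `search_metrics` are assumptions for this example.

```python
def search_metrics(events):
    """Compute basic search metrics from a time-ordered event log.

    `events` is a list of (session_id, event_type) pairs, where event_type
    is "search" or "click" (a hypothetical log format).
    """
    searches = 0
    searches_with_click = 0
    awaiting_click = set()  # sessions whose latest search has no click yet
    for session_id, event_type in events:
        if event_type == "search":
            searches += 1
            awaiting_click.add(session_id)
        elif event_type == "click" and session_id in awaiting_click:
            # Only the first click after a search counts toward the metric.
            searches_with_click += 1
            awaiting_click.discard(session_id)
    rate = searches_with_click / searches if searches else 0.0
    return {
        "searches": searches,
        "searches_with_click": searches_with_click,
        "click_through_rate": rate,
    }
```

Running this per day gives a time series you can compare before and after a ranking change.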

Conclusion

You’ll realize that many features are quite data-driven, which doesn’t mean that you must be a machine learning expert. But having some high-level ideas about how data analysis works can definitely be helpful.

Also, good engineers are usually good at coming up with great questions. Whenever you have a solution, try to ask yourself all kinds of questions to better understand the system.

If you find this post helpful, I would really appreciate it if you could share it with your friends. You can also check out more system design interview questions and analysis here.

This post is written by Gainlo, a platform that allows you to have mock interviews with employees from Google, Amazon etc.

8 thoughts on “How to Design Twitter (Part 2)”

  1. Hi,
    The most important topic in these questions relates to scale issues, and where we can find them in the layers of our design.
    I think the scale issue must be discussed when we choose what kind of DB to use: relational, graph-based, distributed.
    Why do we prefer one over the other?
    Should we consider the use of a cache? What is it solving for us? What will we store inside? What kind of cache?
    Should we use microservices? What are they good for?
    What will we use for the frontend? A REST API? Can we see one example?
    I wish I could find an answer to these questions. That will be a good interview demo.
    This article is too general even for high level design.
    Thanks
    O

    1. Hi Oded,

      Thank you for your suggestion. As you can imagine, it can be a whole book to cover all the aspects you mentioned. However, we’ll continue our posts on this topic in order to discuss more fields deeply. Stay tuned!

  2. Thanks Jake for the article
    I would also join Oded’s observation. I think that in such a large-scale app my attention would go mostly to the scaling issues, since that’s the basic expectation from a social network app at this scale.
    The features you pointed out are definitely challenging, but I would instinctively think they have a secondary priority when discussing this system design in an interview.
