After I published our first system design interview question post, how to design Twitter, we got many requests for more tutorials like it.
The Gainlo team has hand-picked a list of system design interview questions that are both classic and easy to extend. As a result, I believe these questions can be great examples to help you learn how to ace system design interviews.
Again, it’s worth noting that the analysis provided by the Gainlo team is for interview illustration only. A real production solution can be quite different, as we have significantly simplified the problem here.
Create a photo sharing app
How to create a photo sharing app like Instagram?
There are several reasons we’d like to analyze this problem here. First of all, picture sharing systems are quite popular. I’m not picking a weird issue that has almost no applications in the real world. Instead, there are many similar products like Pinterest, Flickr etc.
Secondly, the problem is open-ended, which is extremely common in system design interviews. Interviewers usually won’t ask you to solve a well-defined problem, and that is exactly what makes many people uncomfortable.
Lastly, the analysis covers a wide range of topics like scalability, databases, data analysis etc., so it can be reused in other system design interview questions as well.
As we have emphasized multiple times in previous posts, it’s recommended to start with a high-level solution and then you can dig into all sorts of details later.
The advantage of this approach is that you’ll have a clear idea of what you are trying to solve, and interviewers are less likely to be confused.
To design a picture sharing system, it’s quite straightforward to identify two major objects – user object and picture object.
Personally, I’d like to use a relational database to explain, as it’s usually easier to understand. In this case, we will have a user table for sure, which contains information like name, email, registration date and so on. The same goes for the picture table.
In addition, we also need to store two relations: the user follow relation and the user-picture relation. This comes very naturally, and it’s worth noting that the user follow relation is not bi-directional.
Such a data model allows users to follow each other. To build a user’s feed, we can fetch all pictures from the people they follow.
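The data model above can be sketched with Python’s built-in sqlite3 module. This is only a minimal illustration: the table and column names (users, pictures, follows, owner_id and so on) are my own assumptions, and the user-picture relation is modeled here as an owner column on the picture table.

```python
import sqlite3


def create_schema(conn):
    """Create the two object tables and the follow relation (names are illustrative)."""
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT NOT NULL UNIQUE,
            registered_at TEXT NOT NULL
        );
        CREATE TABLE pictures (
            id INTEGER PRIMARY KEY,
            owner_id INTEGER NOT NULL REFERENCES users(id),
            url TEXT NOT NULL,
            created_at TEXT NOT NULL
        );
        -- One row per direction: (1, 2) means user 1 follows user 2,
        -- and says nothing about whether user 2 follows user 1.
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL REFERENCES users(id),
            followee_id INTEGER NOT NULL REFERENCES users(id),
            PRIMARY KEY (follower_id, followee_id)
        );
    """)


def fetch_feed(conn, user_id):
    """Return picture ids from everyone user_id follows, newest first."""
    rows = conn.execute(
        """
        SELECT p.id
        FROM pictures AS p
        JOIN follows AS f ON f.followee_id = p.owner_id
        WHERE f.follower_id = ?
        ORDER BY p.created_at DESC, p.id DESC
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


def build_demo():
    """Small in-memory database with made-up rows, for trying the query."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "alice@example.com", "2016-01-01"),
            (2, "bob", "bob@example.com", "2016-01-02"),
            (3, "carol", "carol@example.com", "2016-01-03"),
        ],
    )
    conn.executemany(
        "INSERT INTO pictures VALUES (?, ?, ?, ?)",
        [
            (10, 2, "bob-1.jpg", "2016-02-01"),
            (11, 3, "carol-1.jpg", "2016-02-02"),
            (12, 1, "alice-1.jpg", "2016-02-03"),
        ],
    )
    # alice follows bob and carol; bob follows alice; carol follows nobody.
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (2, 1)])
    return conn
```

Calling fetch_feed(build_demo(), 1) returns the pictures of the two people alice follows, while carol, who follows nobody, gets an empty feed even though alice follows her.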
Potential scale issues
The above solution should definitely work well. As an interviewer, I always like to ask what can go wrong when we have millions of users and how to solve it.
This question is a great way to test if a candidate can foresee potential scale issues and it’s better than just asking how can you solve problem XYZ.
Of course, there are no standard answers, and I would like to list a few ideas as inspiration.
1. Response time
Once the number of users reaches a certain point, it’s quite common for slow response times to become the bottleneck.
For instance, one costly operation is rendering a user’s feed. The server has to go over everyone the user follows, fetch all of their pictures, and rank them with some algorithm. When a user follows many people who each have a large number of pictures, the operation can be slow.
Various approaches can be applied here. We can improve the ranking algorithm if it’s the bottleneck. For example, if we are ranking by date, we can read only the N most recent pictures from each person and load more as the user scrolls (infinite scroll). Or we can use offline pipelines to precompute signals that speed up the ranking.
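As a rough sketch of the "top N most recent pictures from each person" idea when ranking by date: assuming each followed user’s pictures are already available newest first as (timestamp, picture_id) pairs, we can cap each list at N and merge the lists lazily. The function name and the in-memory dict are hypothetical stand-ins for whatever storage is actually used.

```python
import heapq
from itertools import islice


def recent_feed(pictures_by_user, followees, n):
    """Return the ids of the n newest pictures across all followees.

    pictures_by_user maps a user to a list of (timestamp, picture_id)
    tuples sorted newest first. Only the first n entries of each list
    are read, because older ones can never reach the top n overall.
    """
    per_user = [pictures_by_user.get(user, [])[:n] for user in followees]
    # heapq.merge with reverse=True lazily merges descending-sorted inputs.
    merged = heapq.merge(*per_user, reverse=True)
    return [picture_id for _, picture_id in islice(merged, n)]
```

The work per request is then bounded by the number of followed users times N, no matter how many pictures each of them has uploaded in total.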
The point is that it’s unlikely for someone to follow hundreds of users, but it’s likely for someone to have thousands of pictures. Therefore, speeding up picture fetching and ranking is the core of the problem.
2. Scale architecture
When there are only tens of users and pictures, we may store and serve everything from a single server.
However, with millions of users, a single server is far from enough because of storage, memory, and CPU limits. That’s why it’s pretty common to see servers crash under a large number of requests.
To scale the architecture, the rule of thumb is that a service-oriented architecture beats a monolithic application.
Instead of having everything together, it’s better to divide the whole system into small components by service and separate each component. For example, we can run the database separately from the web apps (on different servers) behind load balancers.
3. Scale database
Even if we put the database on a separate server, it will not be able to store an unlimited amount of data.
At a certain point, we need to scale the database. For this specific problem, we can either split vertically (partitioning), dividing the database into sub-databases such as a user database, a comment database etc., or split horizontally (sharding), dividing rows by an attribute such as US users versus European users.
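To make horizontal splitting a bit more concrete, here is a minimal routing sketch. The shard names and both functions are hypothetical. The first function follows the attribute-based split described above; the second is a hash-based alternative that I’m adding for comparison, since attribute-based shards can end up very uneven in size.

```python
import hashlib

# Hypothetical shard names, for illustration only.
REGION_SHARDS = {"US": "users-db-us", "EU": "users-db-eu"}
DEFAULT_REGION_SHARD = "users-db-other"
HASH_SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_by_region(region):
    """Attribute-based sharding: route a user by region."""
    return REGION_SHARDS.get(region, DEFAULT_REGION_SHARD)


def shard_by_hash(user_id, shards=HASH_SHARDS):
    """Hash-based sharding: spread users evenly using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string changes between Python processes, which would send the same
    user to different shards after a restart.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Note that with the simple modulo scheme, changing the number of shards moves most users to a different shard, which is something worth bringing up if the interviewer asks about adding capacity later.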
You can check this post for deeper analysis of scalability issues.
It’s also interesting to discuss how to rank feeds (pictures) in users’ timelines.
Although it’s quite straightforward to rank everything in chronological order, is it the best approach? Such open-ended questions are very common in system design interviews.
Actually, there can be quite a few alternatives. For example, an algorithm that combines time with how likely the user is to like the picture is definitely promising.
To design such an algorithm, a common strategy is to come up with a scoring mechanism that takes various features as signals and computes a final score for each picture.
Intuitively, features that matter a lot include the number of likes and comments, whether the user has liked many photos from the owner and so on. A linear combination can be used as a starting point due to its simplicity.
Later on, more advanced machine learning approaches like collaborative filtering are worth trying.
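A linear scoring function along these lines might look like the sketch below. The feature names and especially the weights are made up for illustration; in practice they would be tuned against real engagement data.

```python
from dataclasses import dataclass


@dataclass
class PictureSignals:
    picture_id: str
    age_hours: float            # time since the picture was posted
    likes: int                  # total likes on the picture
    comments: int               # total comments on the picture
    viewer_likes_of_owner: int  # how often the viewer liked this owner's photos


# Made-up weights, purely illustrative.
WEIGHTS = {"recency": 10.0, "likes": 0.01, "comments": 0.05, "affinity": 0.5}


def score(signals, weights=WEIGHTS):
    """Weighted sum of the signals; newer pictures get a higher recency term."""
    recency = 1.0 / (1.0 + signals.age_hours)
    return (
        weights["recency"] * recency
        + weights["likes"] * signals.likes
        + weights["comments"] * signals.comments
        + weights["affinity"] * signals.viewer_likes_of_owner
    )


def rank(candidates, weights=WEIGHTS):
    """Sort candidate pictures from highest to lowest score."""
    return sorted(candidates, key=lambda s: score(s, weights), reverse=True)
```

With these weights, a four-day-old picture with a thousand likes can outrank a fresh picture with none, which is exactly the kind of behavior a purely chronological feed cannot express.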
Since a picture sharing system is full of images, I would also like to ask what can be optimized with respect to images.
First of all, it’s usually recommended to store all pictures separately in production. Amazon S3 is one of the most popular storage systems. However, you don’t need to be able to come up with this.
The point is that images are usually large and seldom get updated, so a separate system for image storage has a lot of advantages. For instance, caching and replication can be much simpler when files are static.
Secondly, to save space, images should be compressed. One common approach is to store and serve only the compressed version of each image. Google Photos took this approach when it offered unlimited free storage for compressed photos.
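One way to see why static files make caching and replication simpler is to key each image by a hash of its content. The sketch below is an assumption of mine, not something a real product is confirmed to do: ImageStore is an in-memory stand-in for a separate blob store, and the Cache-Control value is one a server could attach when serving such files.

```python
import hashlib

# Since a key always maps to the same bytes, responses can be cached for a
# long time without any invalidation logic.
IMMUTABLE_CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


class ImageStore:
    """In-memory stand-in for a separate image store (hypothetical)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        """Store image bytes and return a key derived from the content."""
        key = hashlib.sha256(data).hexdigest()
        # Identical uploads share one key, so duplicates cost no extra space.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key):
        """Return the bytes stored under key."""
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)
```

Because a stored image never changes under its key, a replica either has the object or does not, and there is never a stale version to reconcile.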
There are still some topics that I haven’t covered in this post like how to build the explore feature in Instagram. I hope you can take some time to think about it.
Also for reference, you can check Instagram infrastructure and Flickr architecture. However, I don’t think they are very helpful to system design interviews as they are too focused on techniques rather than design principles.
If you find this post helpful, I would really appreciate it if you could share it with your friends. You can also check out more system design interview questions and analysis here.