Design Facebook Chat Function

One of the most interesting parts of preparing for system design interviews is that you get to learn a lot of details about how existing systems are built.

To make the weekly post more helpful, I’d like to cover a wide range of topics. We’ve been talking a lot about things like recommendation and ranking in the past few weeks, so this time I want to cover something different.


It starts with a very simple question: how would you design the Facebook chat function?

With big news like Facebook buying WhatsApp for $19B and Facebook Messenger becoming really popular recently, chat is definitely a hot topic. So in this post, I’m quite happy to talk about messaging.

A few things to mention here. First and foremost, as I mentioned in previous posts, system design interviews can be extremely diverse. It’s mostly up to the interviewer to decide which direction the discussion takes. As a result, different interviewers can have completely different discussions even with the same question, and you should never treat this article as a standard answer.

Also, I’ve never worked on Facebook Messenger or WhatsApp. All the discussion here is based on the Gainlo team’s analysis.


Basic infrastructure

As said earlier, it’s better to start with a high-level solution and talk about the overall infrastructure. If you have no prior experience with messaging apps, you might not find it easy to come up with a basic solution. But that’s totally fine. Let’s start with a very naive solution and optimize it later.

Basically, one of the most common ways to build a messaging app is to have a chat server that acts as the core of the whole system. When a message is sent, it doesn’t go to the receiver directly. Instead, it goes to the chat server and is stored there first. Then, based on the receiver’s status, the server either delivers the message to the receiver immediately or sends a push notification.

A more detailed flow works like this:

  • User A wants to send the message “Hello Gainlo” to user B. A first sends the message to the chat server.
  • The chat server receives the message and sends an acknowledgement back to A, meaning the message has been received. Depending on the product, the front end may display a single check mark in A’s UI.
  • Case 1: If B is online and connected to the chat server, that’s great. The chat server just sends the message to B.
  • Case 2: If B is not online, the chat server sends a push notification to B.
  • B receives the message and sends an acknowledgement back to the chat server.
  • The chat server notifies A that B has received the message, and A’s UI updates to a double check mark.
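The flow above can be sketched in a few lines of Python. This is only a toy, in-memory model and not how Facebook or WhatsApp actually implement it: the class name `ChatServer`, its methods and the event names are all made up for illustration, and real network connections are replaced by an event log.

```python
from dataclasses import dataclass, field


@dataclass
class ChatServer:
    """Toy in-memory chat server; every name here is hypothetical."""
    online: set = field(default_factory=set)    # users with a live connection
    stored: dict = field(default_factory=dict)  # msg_id -> (sender, receiver, text)
    events: list = field(default_factory=list)  # log of what the server did
    next_id: int = 1

    def send(self, sender, receiver, text):
        # Store the message first, then acknowledge it to the sender
        # (the single check mark in the sender's UI).
        msg_id = self.next_id
        self.next_id += 1
        self.stored[msg_id] = (sender, receiver, text)
        self.events.append(("ack_sent", sender, msg_id))
        # Case 1: receiver is connected, deliver directly.
        # Case 2: receiver is offline, fall back to a push notification.
        if receiver in self.online:
            self.events.append(("deliver", receiver, msg_id))
        else:
            self.events.append(("push", receiver, msg_id))
        return msg_id

    def ack_received(self, msg_id):
        # The receiver confirmed receipt, so tell the sender
        # (the double check mark in the sender's UI).
        sender, _receiver, _text = self.stored[msg_id]
        self.events.append(("ack_delivered", sender, msg_id))


server = ChatServer(online={"B"})
msg_id = server.send("A", "B", "Hello Gainlo")
server.ack_received(msg_id)
```

Storing the message before acknowledging it matters: once A sees the single check mark, the server has taken responsibility for the message even if B is offline.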



The whole system can be costly and inefficient once it’s scaled to a certain level. So is there any way to optimize the system to support a huge number of concurrent requests?

There are many approaches. One obvious cost is that when delivering a message to the receiver, the chat server might need to spawn an OS process or thread, initialize an HTTP (or other protocol) request and close the connection at the end. This happens for every message. Even if we do it the other way around, with the receiver repeatedly asking the server whether there are any new messages, it’s still costly.

One solution is to use an HTTP persistent connection, a technique often called long polling. In a nutshell, the receiver makes an HTTP GET request over a persistent connection, and the request doesn’t return until the chat server has data to send back. The request is re-issued whenever it times out or is interrupted. Compared with opening a new connection per message or polling repeatedly, this approach offers clear advantages in response time, throughput and cost.

If you want to know more about this use of HTTP persistent connections, you can look at things like BOSH (Bidirectional-streams Over Synchronous HTTP).
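To make the idea of a request that “doesn’t return until there is data” more concrete, here is a minimal sketch that models the waiting behavior with a blocking queue instead of a real HTTP connection. The names `Mailbox`, `long_poll` and `client_loop` are hypothetical, and the timeout value is arbitrary.

```python
import queue
import threading


class Mailbox:
    """Per-user inbox that a pending request waits on (illustrative only)."""

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message):
        # Called by the chat server when a message arrives for this user.
        self._messages.put(message)

    def long_poll(self, timeout_seconds):
        # Block until a message is available or the timeout expires.
        # Returning None models a timed-out request.
        try:
            return self._messages.get(timeout=timeout_seconds)
        except queue.Empty:
            return None


def client_loop(mailbox, received, max_polls, timeout_seconds=0.2):
    # The client re-issues the request after every response or timeout.
    for _ in range(max_polls):
        message = mailbox.long_poll(timeout_seconds)
        if message is not None:
            received.append(message)


box = Mailbox()
got = []
client = threading.Thread(target=client_loop, args=(box, got, 3))
client.start()
box.publish("Hello Gainlo")  # delivered to the waiting request
client.join()
```

The point of the design is that the server answers the moment a message shows up, while an idle client costs one parked request rather than a stream of empty polls.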


Online notification

Another cool feature of Facebook chat is showing which friends are online. Although the feature seems simple at first glance, it improves the user experience tremendously and is definitely worth discussing. If you were asked to design this feature, how would you do it?

Obviously, the most straightforward approach is that once a user is online, he sends a notification to all his friends. But how would you evaluate the cost of this?

At peak time, we need roughly O(average number of friends × peak users) requests, which can be a lot when there are millions of users. This cost can even exceed the cost of the messages themselves. One idea for improving this is to cut unnecessary requests. For instance, we can issue a notification only when the user reloads a page or sends a message. In other words, we limit the scope to “very active users”. Alternatively, we can hold off on the notification until the user has been online for 5 minutes, which handles the case where a user comes online and immediately goes offline.
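Both the cost estimate and the 5-minute delay can be illustrated with a small sketch. Everything here is an assumption for illustration: the function and class names are invented, and the figures in the usage lines (1,000,000 peak users with 200 friends each) are hypothetical, not real Facebook numbers. Time is passed in explicitly as seconds so the example stays deterministic.

```python
def naive_fanout(peak_users, avg_friends):
    # One notification per friend for every user who comes online.
    return peak_users * avg_friends


class PresenceTracker:
    """Announces a user as online only after a grace period (illustrative)."""

    def __init__(self, friends, grace_seconds=300):
        self.friends = friends              # user -> list of friends
        self.grace_seconds = grace_seconds  # 5 minutes, as in the text
        self.pending = {}                   # user -> time they connected
        self.notifications = []             # (friend, user) pairs sent so far

    def connect(self, user, now):
        self.pending[user] = now

    def disconnect(self, user):
        # A user who leaves before the grace period never triggers fan-out.
        # (Offline notifications for announced users are out of scope here.)
        self.pending.pop(user, None)

    def tick(self, now):
        # Announce every user who has stayed online long enough.
        ready = [u for u, t in self.pending.items()
                 if now - t >= self.grace_seconds]
        for user in ready:
            del self.pending[user]
            for friend in self.friends.get(user, []):
                self.notifications.append((friend, user))


estimate = naive_fanout(peak_users=1_000_000, avg_friends=200)

tracker = PresenceTracker({"A": ["B", "C"], "D": ["A"]})
tracker.connect("A", now=0)
tracker.connect("D", now=0)
tracker.disconnect("D")  # D leaves right away, so nobody is notified
tracker.tick(now=300)    # A has been online for 5 minutes
```

With these made-up figures the naive approach needs 200 million notifications, while the tracker only notifies the friends of users who actually stay online.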



There are many other topics I haven’t covered in this post. For example, if you dig deeper into the networking side, we can talk about which network protocol to use for the connection. How to handle system errors and replicate data is also interesting, since a chat app is quite different from other applications.

Feel free to leave a comment if you want to have further discussion with me.

This post was written by Gainlo, a platform that allows you to have mock interviews with employees from Google, Amazon, etc.


11 thoughts on “Design Facebook Chat Function”

  1. Hi Jake,

    Thank you for awesome posts. I have been following your blogs for a while now and these are really helpful.

    Also, it’d be great if you could dig a bit deeper into the solutions and present a few technical details about how the chat server and client communicate with each other, e.g. whether we are looking at separate read and write channels to make the architecture async, or something else. The reason for this suggestion is that after interviewing at a few companies, I have seen many interviewers dive into these technical details. I have found myself struggling when the questions narrow down to individual components and their interactions in the system.

    Nevertheless, I have learnt a lot from your blogs and look forward to reading more of them. Thanks and I am really grateful to you and your team at Gainlo!


    1. Hi Lonely Warrior,

      Thanks for the great suggestion. The reason I didn’t go deeper into every technical detail is that system design interview questions are usually quite broad, and I can hardly cover all the aspects. In future posts, I can probably select one or two areas to go even deeper. But I’d like to cover as many areas as I can.

  2. Very helpful. Moreover, WebSocket is frequently used in the networking component of modern real-time chat. If you could discuss that, it would be great.

  3. Thanks !!! Amazing post, loved its simple explanation.

    A small suggestion: it would be great if you could provide a rough sketch of the high-level design and some comments about a suitable protocol (or one vs. another) for the application.
