Deep Dive into WebSockets and Their Role in Client-Server Communication

Feb 3, 2025 - 19:57

How WebSockets work, their tradeoffs, and how to design a real-time messaging app

Real-time communication is everywhere — live chatbots, data streams, or instant messaging. WebSockets are a powerful enabler of this, but when should you use them? How do they work, and how do they differ from traditional HTTP requests?

This article was inspired by a recent system design interview — “design a real time messaging app” — where I stumbled through some concepts. Now that I’ve dug deeper, I’d like to share what I’ve learned so you can avoid the same mistakes.

In this article, we’ll explore how WebSockets fit into the bigger picture of client‑server communication. We’ll discuss what they do well, where they fall short, and — yes — how to design a real‑time messaging app.

Client-server communication

At its core, client-server communication is the exchange of data between two entities: a client and a server.

The client requests data, and the server processes each request and returns a response. These roles are not exclusive: a service can act as both a client and a server, depending on the context.

Before diving into the details of WebSockets, let’s take a step back and explore the bigger picture of client-server communication methods.

1. Short polling

Short polling is the simplest, most familiar approach.

The client repeatedly sends HTTP requests to the server at regular intervals (e.g., every few seconds) to check for new data. Each request is independent and one-directional (client → server).

This method is easy to set up but can waste resources if the server rarely has fresh data. Use it for less time‑sensitive applications where occasional polling is sufficient.
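
As a rough sketch of the polling loop described above (not a production client), the function below calls a `fetch` callable at a fixed interval until it returns data. The names `short_poll` and `fetch`, and the fake server responses, are assumptions made for illustration; a real client would issue an HTTP request inside `fetch`.

```python
import time
from typing import Callable, Optional


def short_poll(fetch: Callable[[], Optional[str]],
               interval_s: float,
               max_attempts: int) -> Optional[str]:
    """Call fetch() at a fixed interval until it returns data or attempts run out."""
    for attempt in range(max_attempts):
        data = fetch()  # one independent request per iteration
        if data is not None:
            return data
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before asking again
    return None


# Hypothetical server that only has fresh data on the third request.
responses = iter([None, None, "order shipped"])
result = short_poll(lambda: next(responses), interval_s=0.01, max_attempts=5)
print(result)
```

Two of the three requests here come back empty, which is exactly the wasted work the paragraph above warns about.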

2. Long polling

Long polling is an improvement over short polling, designed to reduce the number of unnecessary requests. Instead of the server immediately responding to a client request, the server keeps the connection open until new data is available. Once the server has data, it sends the response, and the client immediately establishes a new connection.

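Like short polling, each long-polling request is independent and client-initiated (client → server): the server can only answer a request that the client has already sent.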

A typical example is a ride‑hailing app, where the client waits for a match or booking update.
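
The "hold the request open until data arrives" behaviour can be sketched with a blocking queue. This is a simplified, in-process stand-in for the server side, assuming a hypothetical `long_poll` helper and a timer thread that plays the role of the event source; no real HTTP is involved.

```python
import queue
import threading
from typing import Optional


def long_poll(updates: "queue.Queue[str]", timeout_s: float) -> Optional[str]:
    """Block until an update is available or the timeout elapses."""
    try:
        return updates.get(timeout=timeout_s)
    except queue.Empty:
        return None  # the client would normally re-issue the request here


updates: "queue.Queue[str]" = queue.Queue()
# Simulate the server producing data a little later (e.g. a driver match).
threading.Timer(0.05, updates.put, args=("driver matched",)).start()

first = long_poll(updates, timeout_s=2.0)    # held open until the update arrives
second = long_poll(updates, timeout_s=0.05)  # nothing new, so it times out
print(first, second)
```

The timeout matters in practice: without it, a request could be held open indefinitely when no update ever arrives.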

3. Webhooks

Webhooks flip the script by making the server the initiator. The server sends HTTP POST requests to a client-defined endpoint whenever specific events occur.

Each request is independent and does not rely on a persistent connection. Webhooks are also one-directional (server to client).

Webhooks are widely used for asynchronous notifications, especially when integrating with third-party services. For example, payment systems use webhooks to notify clients when the status of a transaction changes.

4. Server-Sent Events (SSE)

SSE is an HTTP-based event-streaming mechanism, supported natively by browsers, that lets a server push real-time updates to clients over a single, persistent connection.

SSE works using the EventSource API, making it simple to implement in modern web applications. It is one-directional (server to client) and ideal for situations where the client only needs to receive updates.

SSE is well-suited for applications like trading platforms or live sports updates, where the server pushes data like stock prices or scores in real time. The client does not need to send data back to the server in these scenarios.
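
On the wire, an SSE stream is plain text: each event is a set of `field: value` lines ended by a blank line. The sketch below formats events that way. The helper names and the price data are made up for illustration, and the code only produces the text a server would write to the open response.

```python
from typing import Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialise one event in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become one "data:" field per line.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def price_stream(prices: dict) -> Iterator[str]:
    """Hypothetical generator a server could write to the open response."""
    for symbol, price in prices.items():
        yield format_sse(f"{symbol} {price}", event="price")


chunks = list(price_stream({"ACME": 101.5, "INIT": 42.0}))
print("".join(chunks), end="")
```

On the browser side, an `EventSource` listener registered for the `price` event would receive each `data` payload as it arrives.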

But what about two-way communication?

All the methods above focus on one‑directional flow. For true two‑way, real‑time exchanges, we need a different approach. That’s where WebSockets shine.

Let’s dive in.

How do WebSockets work?

WebSockets enable real-time, bidirectional communication, making them perfect for applications like chat apps, live notifications, and online gaming. Unlike the traditional HTTP request-response model, WebSockets create a persistent connection, where both client and server can send messages independently without waiting for a request.

The connection begins as a regular HTTP request and is upgraded to a WebSocket connection through a handshake.

Once established, the connection runs over a single TCP connection and uses the same ports as HTTP and HTTPS (80 and 443). WebSocket frames add only a few bytes of header overhead to each message, which makes them efficient for low-latency, high-interactivity use cases.
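
To make the "lightweight" claim concrete, here is a minimal sketch of how a server-to-client text frame is laid out, following the framing rules in RFC 6455. It covers only unmasked, unfragmented text frames; the function name is an assumption, and a real library also handles masking, fragmentation, and control frames.

```python
import struct


def encode_text_frame(message: str) -> bytes:
    """Encode one unmasked (server-to-client) text frame."""
    payload = message.encode("utf-8")
    header = bytearray([0x81])  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length < 126:
        header.append(length)                 # length fits in 7 bits
    elif length < 65536:
        header.append(126)                    # 16-bit extended length follows
        header += struct.pack("!H", length)
    else:
        header.append(127)                    # 64-bit extended length follows
        header += struct.pack("!Q", length)
    return bytes(header) + payload


frame = encode_text_frame("Hello")
print(frame.hex())  # 810548656c6c6f
```

A five-character message costs two bytes of framing, compared with the hundreds of bytes of headers a typical HTTP request carries.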

WebSocket connections follow a specific URI format: ws:// for regular connections and wss:// for secure, encrypted connections.

What’s a handshake?

A handshake is the process of initialising a connection between two systems. For WebSockets, it begins with an HTTP GET request from the client, asking for a protocol upgrade. This ensures compatibility with HTTP infrastructure before transitioning to a persistent WebSocket connection.

1. Client sends a request

The client sends a request with headers that look like:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

Upgrade — signals the request to switch the protocol
Sec-WebSocket-Key — Randomly generated, base64 encoded string used for handshake verification
Sec-WebSocket-Protocol (optional) — Lists subprotocols the client supports, allowing the server to pick one.
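
A client could assemble that request roughly as follows. This is a sketch under the assumption that we build the raw bytes by hand; the helper names are hypothetical, and the `Origin` header from the example is omitted because browsers add it themselves.

```python
import base64
import os


def new_websocket_key() -> str:
    """16 random bytes, base64-encoded, as the handshake requires."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_handshake_request(host: str, path: str, key: str,
                            subprotocols: tuple = ()) -> bytes:
    """Build the HTTP upgrade request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if subprotocols:
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(subprotocols))
    # HTTP headers end with an empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


key = new_websocket_key()
request = build_handshake_request(
    "server.example.com", "/chat", key, subprotocols=("chat", "superchat")
)
print(request.decode("ascii"))
```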

2. Server responds to request

If the server supports WebSockets and agrees to the upgrade, it responds with a 101 Switching Protocols status. Example headers:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

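Sec-WebSocket-Accept is the Base64-encoded SHA-1 hash of the client’s Sec-WebSocket-Key concatenated with a fixed GUID defined by the WebSocket protocol. It proves that the server understood the WebSocket handshake; it is not an encryption or authentication mechanism.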
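
The Sec-WebSocket-Accept value can be reproduced in a few lines. The GUID below is the fixed constant from RFC 6455; the function name is just an illustrative choice. Feeding in the key from the example request yields the accept value from the example response.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; it is the same for every connection.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```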

3. Handshake validation

Once the client receives the 101 Switching Protocols response and confirms that Sec-WebSocket-Accept matches the value it expects, the WebSocket connection is established and both client and server can start exchanging messages in real time.

This connection remains open until it is explicitly closed by either party.

If the server returns any status code other than 101, the handshake has not completed and no WebSocket connection is established; the client treats the reply as an ordinary HTTP response.
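
On the client side, the validation step might look like the sketch below. It assumes the response has already been parsed into a status code and a header dictionary; `handshake_ok` is a hypothetical helper, and real clients perform additional checks (for example on the selected subprotocol).

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed by RFC 6455


def expected_accept(key: str) -> str:
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_ok(status: int, headers: dict, key: str) -> bool:
    """Return True only if the server's reply completes the upgrade."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    return (
        status == 101
        and h.get("upgrade", "").lower() == "websocket"
        and "upgrade" in h.get("connection", "").lower()
        and h.get("sec-websocket-accept") == expected_accept(key)
    )


response_headers = {
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Accept": "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=",
}
client_key = "dGhlIHNhbXBsZSBub25jZQ=="
print(handshake_ok(101, response_headers, client_key))
```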

WebSocket use cases

We’ve talked about how WebSockets enable real-time, bidirectional communication, but that’s still a pretty abstract description. Let’s nail down some real examples.

WebSockets are widely used in real-time collaboration tools and chat applications, such as Excalidraw, Telegram, WhatsApp, Google Docs, Google Maps and the live chat section during a YouTube or TikTok live stream.

Tradeoffs

1. Having a fallback strategy if connections are terminated

WebSockets don’t automatically recover if the connection is terminated due to network issues, server crashes, or other failures. The client must explicitly detect the disconnection and attempt to re-establish the connection.

Long polling is often used as a fallback while the client tries to re-establish the WebSocket connection.
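
One common way to handle the reconnect attempts is exponential backoff with jitter. The article does not prescribe a strategy, so treat this as one possible sketch: `connect` stands for whatever function opens the WebSocket, and the delays and attempt limit are arbitrary assumptions.

```python
import random
import time
from typing import Callable


def connect_with_backoff(connect: Callable[[], object],
                         max_attempts: int = 5,
                         base_delay_s: float = 0.5,
                         max_delay_s: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep):
    """Retry connect() with exponential backoff and random jitter."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up; the caller can switch to a fallback
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads out reconnects


attempts = {"n": 0}


def flaky_connect() -> str:
    """Hypothetical connection that fails twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("network unreachable")
    return "connected"


delays: list = []
result = connect_with_backoff(flaky_connect, sleep=delays.append)
print(result, attempts["n"])
```

Passing `sleep` as a parameter keeps the example fast to run; in a real client the default `time.sleep` (or an async equivalent) would be used.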

2. Not optimised for streaming audio and video data

WebSockets are designed for small, structured messages. For streaming large media such as audio and video, a technology like WebRTC is better suited.

3. WebSockets are stateful, so horizontal scaling is not trivial

WebSockets are stateful, meaning the server must maintain an active connection for every client. This makes horizontal scaling more complex compared to stateless HTTP, where any server can handle a client request without maintaining persistent state.

You’ll need an additional pub/sub layer so that a message received by one server can reach clients connected to another.
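
The idea behind that pub/sub layer can be sketched in memory. Everything here is a stand-in: `InMemoryBroker` plays the role of a real pub/sub system, the channel name `room:general` is invented, and each "socket" is just a list that collects what would be pushed to the client.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class InMemoryBroker:
    """Hypothetical stand-in for a real pub/sub system."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        for handler in self._subs[channel]:
            handler(message)
        return len(self._subs[channel])  # number of servers reached


class SocketServer:
    """One WebSocket server holding its own set of client connections."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.local_clients: Dict[str, List[str]] = {}  # user_id -> fake socket
        broker.subscribe("room:general", self.on_broadcast)

    def connect(self, user_id: str) -> None:
        self.local_clients[user_id] = []

    def on_broadcast(self, message: str) -> None:
        for outbox in self.local_clients.values():
            outbox.append(message)  # push to every locally connected client


broker = InMemoryBroker()
server1, server2 = SocketServer("server1", broker), SocketServer("server2", broker)
server1.connect("user1")
server2.connect("user2")
reached = broker.publish("room:general", "hello everyone")
print(reached, server1.local_clients, server2.local_clients)
```

No server needs to know which clients the others hold; each one only delivers to its own connections.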

Design a real-time messaging app

Now let’s see how this is applied in system design. I’ve covered both the simple (unscalable) solution and a horizontally scaled one.

End-to-end flow for a horizontally scaled, real time 1–1 chat (drawn by me)

Non-scalable single server app: How do two users chat in real time?

All users connect via WebSocket to one server. The server holds an in-memory mapping of userID : WebSocket connection (for example, user1 : WebSocket conn 1).
user1 sends the message over its WebSocket connection to the server.
The server writes the message to the MessageDB (persistence first).
The server then looks up user2 : WebSocket conn 2 in its in-memory map. If user2 is online, it delivers the message in real time.
If user2 is offline, the server writes to InboxDB (a store of undelivered messages). When user2 returns online, the server fetches all offline messages from InboxDB.
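
The single-server flow above can be sketched as follows. This is a toy model under clear assumptions: MessageDB and InboxDB are plain Python containers, a "WebSocket connection" is a list that collects delivered messages, and the class and method names are invented for illustration.

```python
from collections import defaultdict


class SingleChatServer:
    def __init__(self) -> None:
        self.connections = {}              # user_id -> fake WebSocket (a list)
        self.message_db = []               # stand-in for MessageDB
        self.inbox_db = defaultdict(list)  # stand-in for InboxDB

    def connect(self, user_id: str) -> list:
        socket: list = []
        self.connections[user_id] = socket
        # Deliver anything that arrived while the user was offline.
        socket.extend(self.inbox_db.pop(user_id, []))
        return socket

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def send(self, sender: str, recipient: str, text: str) -> str:
        message = {"from": sender, "to": recipient, "text": text}
        self.message_db.append(message)           # persistence first
        socket = self.connections.get(recipient)  # in-memory lookup
        if socket is not None:
            socket.append(message)                # real-time delivery
            return "delivered"
        self.inbox_db[recipient].append(message)  # recipient is offline
        return "queued"


server = SingleChatServer()
server.connect("user1")
first = server.send("user1", "user2", "are you there?")  # user2 is offline
user2_socket = server.connect("user2")                   # inbox is flushed
second = server.send("user1", "user2", "welcome back")
print(first, second, [m["text"] for m in user2_socket])
```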

Horizontally scaled system: How do two users chat in real time?

A single server can only handle so many concurrent WebSockets. To serve more users, you need to horizontally scale your WebSocket connections.

The key challenge: If user1 is connected to server1 but user2 is connected to server2, how does the system know where to send the message?

Redis can be used as a global data store that maps userID : serverID for active WebSocket sessions. Each server updates Redis when a user connects (goes online) or disconnects (goes offline).

For instance:

user1 connects to server1.
server1’s in-memory map: user1 : WebSocket connection
server1 also writes to Redis: user1 : server1
user2 connects to server2.
server2’s in-memory map: user2 : WebSocket connection
server2 also writes to Redis: user2 : server2
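
The bookkeeping in this example can be sketched with a dictionary standing in for Redis. `PresenceRegistry` and `WebSocketServer` are hypothetical names; with a real Redis instance, the three registry methods would map onto setting, deleting, and reading a key.

```python
from typing import Dict, Optional


class PresenceRegistry:
    """Dict-backed stand-in for the Redis userID : serverID map."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def set_online(self, user_id: str, server_id: str) -> None:
        self._store[user_id] = server_id

    def set_offline(self, user_id: str) -> None:
        self._store.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[str]:
        return self._store.get(user_id)  # None means the user is offline


class WebSocketServer:
    def __init__(self, server_id: str, registry: PresenceRegistry) -> None:
        self.server_id = server_id
        self.registry = registry
        self.local: Dict[str, object] = {}  # user_id -> WebSocket connection

    def on_connect(self, user_id: str, connection: object) -> None:
        self.local[user_id] = connection                   # local in-memory map
        self.registry.set_online(user_id, self.server_id)  # global map

    def on_disconnect(self, user_id: str) -> None:
        self.local.pop(user_id, None)
        self.registry.set_offline(user_id)


registry = PresenceRegistry()
server1 = WebSocketServer("server1", registry)
server2 = WebSocketServer("server2", registry)
server1.on_connect("user1", object())
server2.on_connect("user2", object())
print(registry.lookup("user1"), registry.lookup("user2"))
```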

End-to-end chat flow: user1 sends a message to user2

user1 sends a message through its WebSocket on server1.
server1 passes the message to a Chat Service.
Chat Service first writes the message to MessageDB for persistence.
Chat Service then checks Redis to get the online/offline status of user2.
If user2 is online, Chat Service publishes the message to a message broker, tagging it with: “user2: server2”.
The broker then routes the message to server2.
server2 looks up its local in-memory mapping to find the WebSocket connection of user2 and pushes the message in real time over that WebSocket.
If user2 is offline (no entry in Redis), Chat Service writes the message to the InboxDB. When user2 returns online, Chat Service will fetch all the undelivered messages.
Whenever a new WebSocket connection is opened or closed, the servers update Redis.
When a user first loads the app or opens a chat, the Chat Service fetches historical messages (e.g., from the last 10 days) from MessageDB. A cache layer can reduce repeated DB queries.
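
Putting the flow together, a heavily simplified in-process model might look like this. All names are assumptions: a shared dict plays Redis, a dict of gateways plays the message broker's routing role, lists play MessageDB and the sockets, and history fetching and caching are left out.

```python
from collections import defaultdict


class Gateway:
    """One WebSocket server: owns local sockets and reports presence."""

    def __init__(self, server_id: str, presence: dict) -> None:
        self.server_id = server_id
        self.presence = presence  # shared dict standing in for Redis
        self.sockets = {}         # user_id -> fake socket (a list)

    def connect(self, user_id: str) -> list:
        self.sockets[user_id] = []
        self.presence[user_id] = self.server_id
        return self.sockets[user_id]

    def disconnect(self, user_id: str) -> None:
        self.sockets.pop(user_id, None)
        self.presence.pop(user_id, None)

    def deliver(self, user_id: str, message: dict) -> None:
        self.sockets[user_id].append(message)  # push over the local socket


class ChatService:
    def __init__(self, presence: dict, gateways: dict) -> None:
        self.presence = presence
        self.gateways = gateways  # server_id -> Gateway (broker stand-in)
        self.message_db = []
        self.inbox_db = defaultdict(list)

    def send(self, sender: str, recipient: str, text: str) -> str:
        message = {"from": sender, "to": recipient, "text": text}
        self.message_db.append(message)           # persist first
        server_id = self.presence.get(recipient)  # online/offline lookup
        if server_id is None:
            self.inbox_db[recipient].append(message)  # offline: store for later
            return "queued"
        self.gateways[server_id].deliver(recipient, message)  # route to server
        return "delivered"

    def fetch_undelivered(self, user_id: str) -> list:
        return self.inbox_db.pop(user_id, [])


presence: dict = {}
gateways = {name: Gateway(name, presence) for name in ("server1", "server2")}
chat = ChatService(presence, gateways)

gateways["server1"].connect("user1")
user2_socket = gateways["server2"].connect("user2")
online_result = chat.send("user1", "user2", "hi")

gateways["server2"].disconnect("user2")
offline_result = chat.send("user1", "user2", "still there?")
pending = chat.fetch_undelivered("user2")
print(online_result, offline_result, len(pending))
```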

Some important design considerations:

Persistence first
All messages go to the DB before being delivered. If a push to WebSocket fails, the message is still safe in the DB.
Redis
Stores only active connections to minimize overhead.
A replica can be added to prevent a single point of failure.
InboxDB handles offline cases cleanly.
Chat Service abstraction
The WebSocket servers handle real‐time connections and routing.
The Chat Service layer handles HTTP requests and all DB writes.
This separation of concerns makes it easier to scale or evolve each piece.
Ensuring in-order delivery of messages
Network variations in real-time push workflows can cause messages to arrive out of order.
Many message brokers also do not guarantee strict ordering.
To handle this, each message is assigned a timestamp at creation. Even if messages arrive out of order, the client can reorder them based on the timestamp.
Load balancers
L4 Load Balancer (TCP) for sticky WebSocket connections.
L7 Load Balancer (HTTP) for regular requests (CRUD, login, etc).
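
Returning to the in-order delivery point above, client-side reordering by timestamp could be sketched like this. The `Message` fields and the `Conversation` class are assumptions; the message ID is used only as a tie-breaker when two timestamps are equal.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Message:
    timestamp_ms: int                 # assigned when the message is created
    message_id: str                   # tie-breaker for equal timestamps
    text: str = field(compare=False)  # ignored when ordering


class Conversation:
    """Keeps received messages sorted by creation time."""

    def __init__(self) -> None:
        self.messages: List[Message] = []

    def receive(self, message: Message) -> None:
        # Insert at the position that keeps the list sorted.
        bisect.insort(self.messages, message)


conversation = Conversation()
# Messages arrive out of order over the network.
conversation.receive(Message(1700000002000, "m2", "How are you?"))
conversation.receive(Message(1700000003000, "m3", "Free for lunch?"))
conversation.receive(Message(1700000001000, "m1", "Hey!"))
ordered = [m.text for m in conversation.messages]
print(ordered)
```

This relies on timestamps from different senders being comparable, which in practice usually means assigning them in one place, such as the Chat Service.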

Wrapping up

That’s all for now! There’s so much more we could explore, but I hope this gave you a solid starting point. Feel free to drop your questions in the comments below :)

I write regularly on Python, software development and the projects I build, so give me a follow to not miss out. See you in the next article.

Deep Dive into WebSockets and Their Role in Client-Server Communication was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.