Rate Limiter System Design
Design a rate limiter
Requirement gathering
Functional Requirements:
- Focus is on building a server-side rate limiter.
- The rate limiter should be flexible enough to limit requests based on different criteria such as IP address, user ID, or any other property.
- The rate limiter should work in a distributed environment.
- Proper error responses should let clients know when their request has been throttled.
Non-Functional Requirements:
- Low latency
- High fault tolerance.
- Minimal memory overhead.
- 10M requests per day.
Capacity estimation
RPS:
10M requests per day = 10,000,000
Requests per second = 10,000,000 / (24 * 60 * 60) ≈ 115 RPS
Latency: < 10 ms
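A quick back-of-the-envelope check of the numbers above (only the 10M requests/day figure comes from the requirements; the peak factor is an assumption of mine):

```python
# Back-of-the-envelope capacity estimation for 10M requests/day.
requests_per_day = 10_000_000
seconds_per_day = 24 * 60 * 60              # 86,400

rps = requests_per_day / seconds_per_day
print(f"Average RPS: {rps:.1f}")            # ~115.7, i.e. roughly 115 RPS

# Traffic is rarely uniform; a 2-3x peak factor is a common assumption.
peak_factor = 3
print(f"Assumed peak RPS: {rps * peak_factor:.0f}")
```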
High Level Design
Comparing a client-side rate limiter with a server-side rate limiter: the client-side option is less flexible and less reliable because requests can be manipulated on the client end. A server-side rate limiter offers more control and flexibility.
Whether to implement rate limiting inside each microservice or as a separate middleware service depends on the business and system requirements. If there is only one service in the system, rate limiting can be implemented in the application service layer itself. If there are several microservices and an API gateway component is already present in the system architecture, we can implement rate limiting at the API gateway layer.
Algorithms for Rate limiting:
There are various rate limiting algorithms, each suited to different use cases and each with its own pros and cons. Some of the common rate limiting algorithms are:
1- Token-Bucket
2- Leaking Bucket
3- Fixed window counter
4- Sliding window log
5- Sliding window counter
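As an illustration, below is a minimal, single-process sketch of the token-bucket algorithm. The class and parameter names are my own; a production limiter would keep this state in a shared store such as Redis rather than in process memory.

```python
import time

class TokenBucket:
    """Minimal single-process token bucket (illustrative only)."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens accumulated since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # consume one token for this request
            return True
        return False                      # bucket empty -> throttle the request

# Example: on average 5 requests per 60-second window
bucket = TokenBucket(capacity=5, refill_rate=5 / 60)
print(bucket.allow())   # True while tokens remain, False once exhausted
```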
Rate limiting Rules :
ratelimit:
  api: /v1/xyz
  algorithm: sliding-window
  criteria: IP-ADDRESS
  throttle:
    threshold: 5
    timeFrameInSecs: 60
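As a rough sketch, such rules could be loaded from disk into a simple in-memory structure at startup. The field names follow the example above; PyYAML and the RateLimitRule class are my own assumptions, not part of any particular gateway.

```python
from dataclasses import dataclass
import yaml  # PyYAML, assumed available

@dataclass
class RateLimitRule:
    api: str
    algorithm: str
    criteria: str
    threshold: int
    time_frame_secs: int

def load_rules(path: str) -> dict[str, RateLimitRule]:
    """Load throttling rules from disk into an in-memory map keyed by API path."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    rule = raw["ratelimit"]
    return {
        rule["api"]: RateLimitRule(
            api=rule["api"],
            algorithm=rule["algorithm"],
            criteria=rule["criteria"],
            threshold=rule["throttle"]["threshold"],
            time_frame_secs=rule["throttle"]["timeFrameInSecs"],
        )
    }
```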
Rate limiting Headers :
We want the client to know the necessary information about the resource that has rate limiting applied, whether the request has been throttled, when they should retry, etc. Below are the headers that can be included in the response for this purpose.
X-Ratelimit-Remaining: The remaining number of allowed requests within the current window.
X-Ratelimit-Limit: The number of calls the client can make per time window.
X-Ratelimit-Retry-After: The number of seconds to wait before a request can be made again without being throttled.
Error code: 429 (Too Many Requests) is the standard status code that tells clients they have been throttled and should look at the headers to handle retries on their end.
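A small sketch of what a throttled response might carry; the helper below is purely illustrative and not tied to any particular framework.

```python
def build_throttled_response(limit: int, retry_after_secs: int) -> tuple[int, dict]:
    """Illustrative helper: status code and headers for a throttled request."""
    headers = {
        "X-Ratelimit-Limit": str(limit),                    # allowed calls per window
        "X-Ratelimit-Remaining": "0",                       # no calls left this window
        "X-Ratelimit-Retry-After": str(retry_after_secs),   # seconds until retry
    }
    return 429, headers

status, headers = build_throttled_response(limit=5, retry_after_secs=42)
print(status, headers)
```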
Deep Dive
All incoming requests from clients pass through the API gateway layer, and we implement the rate limiter as part of the API gateway itself.
Throttling rules are loaded from disk and stored in an in-memory cache for quick access.
Requests are intercepted and validated, and a decision is made to pass through or decline each request. When a request is rejected, the 429 error code and the X-Ratelimit-Retry-After header are sent in the response.
All rate limiting state for clients is stored in Redis for quick access. Keeping this data in Redis also lets us scale the rate limiter middleware independently, since requests can land on any instance of the API gateway.
Race conditions are one problem that needs to be addressed in a distributed setup. They can be avoided by performing the check-and-update atomically in Redis, for example via Lua scripts operating on a sorted set.
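For example, a sliding-window-log check can be done atomically with a single Lua script over a Redis sorted set. The sketch below uses redis-py; the key format, parameter names, and thresholds are my own assumptions based on the rule example above.

```python
import time
import uuid

import redis  # redis-py, assumed available

r = redis.Redis(host="localhost", port=6379)

# The whole check-and-update runs as one Lua script, so it is atomic in Redis
# even when multiple API gateway instances call it concurrently.
SLIDING_WINDOW_LUA = """
local key       = KEYS[1]
local now_ms    = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
local threshold = tonumber(ARGV[3])
local member    = ARGV[4]

redis.call('ZREMRANGEBYSCORE', key, 0, now_ms - window_ms)  -- drop expired entries
if redis.call('ZCARD', key) < threshold then
  redis.call('ZADD', key, now_ms, member)                   -- record this request
  redis.call('PEXPIRE', key, window_ms)
  return 1                                                  -- allowed
end
return 0                                                    -- throttled
"""

sliding_window_check = r.register_script(SLIDING_WINDOW_LUA)

def is_allowed(client_id: str, threshold: int = 5, window_secs: int = 60) -> bool:
    now_ms = int(time.time() * 1000)
    result = sliding_window_check(
        keys=[f"ratelimit:{client_id}"],
        args=[now_ms, window_secs * 1000, threshold, uuid.uuid4().hex],
    )
    return result == 1
```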
Monitoring and Alerting
Analytics are an important aspect to keep in mind while implementing a rate limiter. We might end up with rules that are too strict, throttling a high volume of legitimate requests, or rules that are too lenient, so that the rate limiter's purpose is not achieved at all. Beyond tuning rules, deriving metrics on API interaction patterns with respect to clients, time of day, etc. is another important use case covered by monitoring. We can use Prometheus and Grafana to achieve this.
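A minimal sketch of how the gateway could expose throttling metrics with the Python Prometheus client; the metric name, label names, and port are my own assumptions.

```python
from prometheus_client import Counter, start_http_server

# Counter scraped by Prometheus and visualized in Grafana dashboards.
requests_total = Counter(
    "ratelimiter_requests_total",
    "Requests seen by the rate limiter",
    ["api", "decision"],          # decision: "allowed" or "throttled"
)

def record_decision(api: str, allowed: bool) -> None:
    decision = "allowed" if allowed else "throttled"
    requests_total.labels(api=api, decision=decision).inc()

if __name__ == "__main__":
    start_http_server(9102)       # expose /metrics for Prometheus to scrape
    record_decision("/v1/xyz", allowed=False)
```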