Rate Limiter

By TechTutor

A rate limiter is a mechanism that restricts the number of requests or operations that can be performed by a user or system over a specified period of time. It ensures that a resource (e.g., an API, a server, or a database) isn't overwhelmed by too many requests in a short amount of time, preventing denial of service, resource exhaustion, or degraded performance for other users.


Importance of a Rate Limiter in an Application:

  • Prevent Resource Overload : Rate limiters protect systems from being overwhelmed by too many requests, which can lead to crashes, degraded performance, or server unavailability.

  • Security : They help mitigate denial-of-service (DoS) or brute-force attacks by limiting the number of requests a client can make in a given timeframe.

  • Fair Resource Distribution : They ensure fair usage of resources by preventing a single client or user from monopolizing system resources.

  • Cost Control : For paid services, rate limiting helps control costs by preventing excessive resource consumption.

  • Prevent Abuse : They restrict usage patterns that could lead to abuse of free-tier or limited-tier services.

  • Performance Optimization : Rate limiters help maintain optimal performance for the majority of users by preventing sudden spikes in load that can slow down systems.


Algorithms for rate limiting

Token Bucket : In this model, tokens are added to a bucket at a fixed rate (e.g., 10 tokens per second). Each request consumes a token. If the bucket is empty, the request is rejected or delayed until tokens are available again. It allows burst traffic within the token limits.

Example: A bucket holds up to 100 tokens and refills at 100 tokens per minute; a client can burst up to 100 requests at once, after which it is throttled to the refill rate.
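The token bucket can be sketched in a few lines of Python. This is a minimal, single-threaded sketch; the class and parameter names are illustrative, not from any particular library:

```python
import time

class TokenBucket:
    """Token bucket: holds at most `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full, so bursts are allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False

# ~100 requests per minute, with bursts of up to 100 allowed
bucket = TokenBucket(rate=100 / 60, capacity=100)
```

A real deployment would also need locking (or a single-threaded event loop) around `allow`, since the refill arithmetic is not thread-safe as written.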


Leaky Bucket : Similar to the token bucket, but with a constant outflow rate: requests are processed at a fixed pace, and if the bucket overflows, excess requests are discarded or queued. It is usually implemented with a first-in-first-out (FIFO) queue. When a request arrives, the system checks whether the queue is full; if not, the request is added to the queue, otherwise it is dropped. Requests are then pulled from the queue and processed at regular intervals.

Example: Process 10 requests per second, with any excess placed in a queue or rejected.
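The queue-based mechanics described above can be sketched as follows. This is an illustrative sketch: the enqueue side drops requests on overflow, and `drain_one` is assumed to be called on a fixed schedule (e.g., every 100 ms for 10 requests/second) by some external timer or worker:

```python
from collections import deque

class LeakyBucket:
    """Bounded FIFO queue; a worker drains it at a constant rate."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def try_enqueue(self, request) -> bool:
        if len(self.queue) >= self.capacity:
            return False              # bucket overflow: drop the request
        self.queue.append(request)
        return True

    def drain_one(self):
        # Called at a constant rate by a scheduler, e.g. every 100 ms.
        return self.queue.popleft() if self.queue else None
```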


Fixed Window : This model limits the number of requests within a fixed window of time (e.g., 1000 requests per minute). All requests within that window are counted, and once the limit is reached, further requests are blocked until the window resets.

Example: A system allows 100 requests per user per minute. At minute 00:00, the counter resets.
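A fixed-window counter is simple to implement because the window index can be derived from the timestamp. The sketch below keeps counters in a plain dict keyed by (user, window index); the `now` parameter is there only to make the behavior easy to demonstrate:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Per-user counter that resets at fixed window boundaries."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)   # (user, window index) -> count

    def allow(self, user: str, now=None) -> bool:
        now = time.time() if now is None else now
        key = (user, int(now // self.window))  # all requests in a window share a key
        if self.counters[key] >= self.limit:
            return False
        self.counters[key] += 1
        return True
```

Note the known weakness of this scheme: a client can send the full limit just before a boundary and again just after it, briefly doubling the effective rate.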


Sliding Window : Unlike the fixed window, this model calculates the rate limit based on a sliding time window. It adds more precision by calculating limits for a dynamic window based on the actual time of each request.

Example: Over a rolling window of the last 60 seconds, a user can send a maximum of 100 requests.
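One common way to implement a sliding window is the "sliding log": keep the timestamps of each user's recent requests and evict those that fall out of the rolling window. This sketch takes that approach (at the cost of memory proportional to the limit per user):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-log limiter: per-user timestamps of recent requests."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = {}   # user -> deque of request timestamps

    def allow(self, user: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        stamps = self.log.setdefault(user, deque())
        # Evict timestamps that have slid out of the rolling window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True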


Concurrent Rate Limiting : This limits the number of concurrent requests or operations being processed simultaneously. This is especially useful for APIs or databases with capacity constraints.

Example: Only 5 concurrent connections per user to the system, and any new connection request waits for an existing one to finish.


Rate Limiting by IP/User : Rate limiting can also be based on specific user accounts or IP addresses to ensure that different users have unique limits.

Example: Allow 10 requests per second for each user or IP address.


Architecture

The concept behind rate-limiting algorithms is straightforward. Essentially, we need a counter to monitor how many requests are made by the same user, IP address, or other identifiers. If this counter exceeds the defined limit, further requests are blocked.


The challenge is determining where to store these counters. Using a database isn't ideal due to the slower nature of disk access. Instead, in-memory caching solutions are preferred because of their speed and ability to manage time-based expiration. For example, Redis is a widely used in-memory store for implementing rate limiting. It provides two useful commands:

  • INCR: This increments the counter by 1.

  • EXPIRE: This sets a timeout for the counter, automatically removing it when the time expires.


Popular available Rate Limiters

Redis-Based Rate Limiters : Redis is a popular choice for implementing rate limiting due to its in-memory speed and support for time-based expiration.


NGINX Rate Limiting : NGINX has built-in support for rate limiting, which is highly efficient for controlling the rate of HTTP requests to the server.


API Gateway Rate Limiting : AWS API Gateway: Provides built-in rate-limiting features for managing request rates and throttling at the API level.


Envoy Proxy : Envoy is a cloud-native proxy that supports rate limiting at the edge, making it ideal for microservices and service mesh environments.


Istio (Service Mesh) : Istio is a service mesh that offers rate limiting as part of its traffic management features.


Throttler (Node.js) : Express Rate Limit: A simple rate-limiting middleware for Express.js applications.


Choosing the Best Rate Limiter:


For distributed systems or microservices: Tools like Redis-based rate limiters, Envoy are ideal due to their ability to handle distributed state and high throughput.


For API gateways: Cloud-native gateways like AWS API Gateway or Azure API Management are great choices as they provide built-in rate limiting with minimal setup.


For monolithic apps: Libraries such as Guava (Java), Django Ratelimit (Python), or Rack Attack (Ruby) offer easy-to-integrate solutions for rate limiting within the application code.

Each tool has specific use cases, so the choice depends on the technology stack, traffic patterns, and performance requirements of your application.


Real-World Examples


API Usage Rate Limiting :

A third-party API like Twitter or GitHub limits users to a certain number of API requests per hour. For instance, GitHub’s API might allow 5000 requests per hour per user.

Purpose: Prevents abuse, maintains service performance, and ensures fair usage across all users.


Login Rate Limiting : A website might restrict users to 5 login attempts within a 15-minute window to prevent brute-force attacks.

Purpose: Enhances security by mitigating brute-force or credential-stuffing attacks.


Payment Gateway Rate Limiting : A payment gateway might limit API calls to process payments to 100 requests per minute.

Purpose: Ensures stable performance, prevents overload during peak times, and secures the service from fraudulent transactions.


Content Upload/Download Rate Limiting : A service like Dropbox or Google Drive limits the number of file uploads or downloads per user to 1000 requests per day.

Purpose: Protects against overuse of bandwidth and ensures the service remains responsive for all users.


Email Sending Rate Limiting : An email service like SendGrid,Microsoft exchange might limit the number of emails an account can send to 500 per hour.

Purpose: Prevents spam and ensures email servers remain functional for all users


Media Streaming Rate Limiting : A streaming service like Netflix or Spotify might limit API requests for fetching media content to prevent overuse of server resources.

Purpose: Ensures a smooth user experience and prevents server overload, especially during peak hours.


Summary :

A rate limiter controls the number of requests or operations a user or system can make within a set time to prevent resource overload, ensure fair access, and improve security by mitigating DoS attacks. Common rate-limiting algorithms include token bucket, leaky bucket, fixed window, sliding window, and concurrent rate limiting. Redis is a popular choice for rate limiting due to its speed and support for time-based expiration. Other tools include NGINX, API gateways (AWS, Azure), Envoy, and Istio. The best solution depends on the architecture, with Redis, Envoy, and cloud-native gateways suitable for microservices and distributed systems.


Reference :


24 views0 comments

Recent Posts

See All

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating

TechTutorTips.com


SUBSCRIBE 


Thanks for submitting!

© 2035 by FEEDs & GRIDs. Powered and secured by Wix

bottom of page