🔒 Security + 🔧 DevOps intermediate

Rate Limiting

A technique to control the number of requests a client can make to an API or service within a specified time window.

What is Rate Limiting?

Rate limiting is a strategy used to control the rate of requests that clients can make to a server. It protects APIs from abuse, ensures fair usage, and prevents system overload.

Why Rate Limit?

| Threat | Protection |
| --- | --- |
| DDoS attacks | Limit request floods |
| Brute force | Slow down password guessing |
| Scraping | Prevent data harvesting |
| API abuse | Enforce fair usage |
| Cost control | Cap expensive operations |

Common Algorithms

1. Fixed Window

Count requests in fixed time intervals:

Window: 1 minute
Limit: 100 requests

[Minute 0-1]: 100 requests allowed
[Minute 1-2]: Counter resets, 100 more allowed

Problem: Bursts at window boundaries — a client can send 100 requests at the very end of one window and 100 more at the start of the next, i.e. 200 requests in a few seconds.
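A fixed window counter can be sketched as a small in-memory class (the class and method names here are illustrative, not from any particular library):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # key -> [count, window_start_timestamp]
        self.counters = defaultdict(lambda: [0, 0.0])

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        count, start = self.counters[key]
        if now - start >= self.window:
            # window expired: reset the counter and start a new window
            self.counters[key] = [1, now]
            return True
        if count < self.limit:
            self.counters[key][0] += 1
            return True
        return False
```

In production this state would live in shared storage (e.g. Redis) rather than process memory, so every server enforces the same counts.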

2. Sliding Window

Smooths out the fixed window problem:

Window: 1 minute, limit: 100
Current time: 30 seconds into the current window

Previous window total: 60 requests
Overlap with the 60-second lookback: 50% → weighted contribution = 60 × 0.5 = 30
Current window so far: 50 requests
Estimated count: 30 + 50 = 80 → under the limit, allow
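The sliding window counter above can be sketched as follows; the optional `start`/`now` parameters exist only to make the example deterministic (a minimal sketch, not a production implementation):

```python
import time
from typing import Optional

class SlidingWindowLimiter:
    """Sliding window counter: weight the previous window's count by how
    much of it still overlaps the lookback period (an approximation,
    not an exact per-request log)."""

    def __init__(self, limit: int, window: float, start: Optional[float] = None):
        self.limit = limit
        self.window = window
        self.prev_count = 0
        self.curr_count = 0
        self.curr_start = time.monotonic() if start is None else start

    def allow(self, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:
            # roll windows forward; if more than one full window passed,
            # the previous window is empty
            self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
            self.curr_start += self.window * int(elapsed // self.window)
            self.curr_count = 0
            elapsed = now - self.curr_start
        # fraction of the previous window still inside the lookback period
        prev_weight = (self.window - elapsed) / self.window
        estimated = self.prev_count * prev_weight + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```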

3. Token Bucket

Tokens replenish over time:

Bucket capacity: 10 tokens
Refill rate: 1 token/second

Request arrives:
  - If tokens > 0: Allow, consume 1 token
  - If tokens = 0: Reject (429 Too Many Requests)
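The refill-and-consume logic can be written in a few lines; this is a minimal single-process sketch with an optional `now` parameter for testability:

```python
import time
from typing import Optional

class TokenBucket:
    """Tokens refill continuously at `refill_rate`; each request consumes one."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens per second
        self.tokens = capacity           # bucket starts full
        self.last = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because tokens accumulate while the client is idle, the token bucket tolerates short bursts up to `capacity` while enforcing the average rate.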

4. Leaky Bucket

Requests processed at constant rate:

Queue capacity: 10
Processing rate: 1 request/second

Overflow: Rejected when queue full
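As a sketch, here is the "leaky bucket as meter" variant: instead of a real queue of pending requests, it tracks a water level that drains at a constant rate and rejects when the bucket would overflow (a true queue variant would buffer requests and process them at the fixed rate):

```python
class LeakyBucket:
    """Meter variant: track a water level that drains at `leak_rate`
    requests/second; reject requests that would overflow `capacity`."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0   # current "water level"
        self.last = None   # timestamp of the previous request

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # drain the bucket for the time elapsed since the last request
            self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level < self.capacity:
            self.level += 1
            return True
        return False
```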

Implementation

FastAPI with slowapi

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/api/search")
@limiter.limit("10/minute")  # 10 requests per minute per client IP
async def search(request: Request, q: str):
    return {"results": [...]}

Express.js with express-rate-limit

const express = require("express");
const rateLimit = require("express-rate-limit");

const app = express();

const limiter = rateLimit({
    windowMs: 60 * 1000,  // 1 minute
    max: 100,             // 100 requests per window per IP
    message: "Too many requests"
});

app.use("/api/", limiter);

Nginx

http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
        }
    }
}

Response Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000

And when the limit is exceeded:

HTTP/1.1 429 Too Many Requests
Retry-After: 30

Rate Limit by Key

| Key Type | Use Case |
| --- | --- |
| IP address | Anonymous users |
| API key | Authenticated apps |
| User ID | Per-user limits |
| Endpoint | Different limits per route |

Best Practices

✅ Return 429 Too Many Requests with Retry-After header
✅ Use distributed storage (Redis) for multi-server setups
✅ Different limits for different endpoints
✅ Higher limits for authenticated users
✅ Implement backoff in clients
❌ Don't rely solely on IP (NAT, proxies)
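On the client side, "implement backoff" means honoring the server's Retry-After hint when it is present and falling back to exponential delays otherwise. A minimal sketch using only the standard library (`get_with_backoff` is an illustrative helper, not a real API):

```python
import time
import urllib.error
import urllib.request
from typing import Optional

def backoff_delay(retry_after: Optional[str], attempt: int) -> float:
    """Prefer the server's Retry-After hint; else exponential backoff."""
    if retry_after is not None:
        return float(retry_after)
    return float(2 ** attempt)   # 1s, 2s, 4s, ...

def get_with_backoff(url: str, max_retries: int = 3) -> bytes:
    """Fetch a URL, retrying on 429 Too Many Requests."""
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            time.sleep(backoff_delay(err.headers.get("Retry-After"), attempt))
```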

Example Usage

GitHub's API allows 60 requests/hour for unauthenticated users and 5,000/hour for authenticated users.