Overview
The Adaptive Rate Limiter is designed to handle the complex rate-limiting requirements of modern AI APIs. It goes beyond simple token buckets by actively discovering rate limits from API responses, managing distributed state across multiple instances, and providing first-class support for streaming responses.
Get Started
Install the library and run your first rate-limited request in minutes.
Key Features
- Provider-Agnostic: Works with any OpenAI-compatible API (OpenAI, Anthropic, Venice, Groq, Together, etc.)
- Adaptive Strategies: Intelligent rate limit discovery from response headers
- Streaming Support: Automatic reservation tracking for streaming responses with refund-based accounting
- Distributed Backends: In-memory for single instances, Redis for distributed deployments
- Multiple Scheduling Modes: Basic, Intelligent, and Account-level strategies
- Observability: Built-in Prometheus metrics collection
- Type-Safe: Full typing with protocols and Pydantic models
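The adaptive-discovery feature above can be illustrated with a small self-contained sketch. This is not the library's actual API: the class name, header keys, and update rules below are assumptions, using the `x-ratelimit-*` header convention common to many OpenAI-compatible APIs.

```python
import time

# Illustrative sketch only; the real library's classes and header names
# may differ. Header keys follow the common "x-ratelimit-*" convention.
class AdaptiveBucket:
    def __init__(self, default_limit: int = 60, window: float = 60.0):
        self.limit = default_limit       # discovered requests-per-window
        self.remaining = default_limit   # budget left in the current window
        self.window = window             # assumed window length in seconds
        self.reset_at = time.monotonic() + window

    def observe(self, headers: dict) -> None:
        """Adapt local state from rate-limit headers on an API response."""
        if "x-ratelimit-limit-requests" in headers:
            self.limit = int(headers["x-ratelimit-limit-requests"])
        if "x-ratelimit-remaining-requests" in headers:
            self.remaining = int(headers["x-ratelimit-remaining-requests"])
        if "x-ratelimit-reset-requests" in headers:
            # Treat the reset value as seconds until the window rolls over.
            self.reset_at = time.monotonic() + float(
                headers["x-ratelimit-reset-requests"]
            )

    def allow(self) -> bool:
        """Spend one unit of the discovered budget if any remains."""
        now = time.monotonic()
        if now >= self.reset_at:         # window rolled over: refill
            self.remaining = self.limit
            self.reset_at = now + self.window
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

The point of the pattern: the limiter starts from a conservative default and converges on the provider's real limits as responses arrive, instead of requiring the limits to be configured up front.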
Quick Start
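The quick-start snippet itself is not reproduced on this page. As a stdlib-only illustration of the usage pattern such a limiter typically exposes (a blocking acquire before each request), with all names hypothetical rather than the library's real API:

```python
import time
from collections import deque

# Hypothetical stand-in for the library's limiter; the real package,
# class names, and call signatures are not shown on this page.
class RateLimiter:
    """Sliding-window limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self.calls: deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until a slot is free, then record the call."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Window is full: wait until the oldest call expires.
            time.sleep(self.period - (now - self.calls[0]))
            return self.acquire()
        self.calls.append(now)

limiter = RateLimiter(max_calls=5, period=1.0)

def rate_limited_request(payload: str) -> str:
    limiter.acquire()           # wait for budget before calling the API
    return f"sent: {payload}"   # stand-in for the real HTTP call
```

The real library layers the features listed above (header discovery, Redis-backed state, streaming reservations) behind the same basic shape: every request passes through an acquire step before it reaches the provider.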
Public API
The library exports 30+ public symbols.
Explore the Docs
- Quick Start: Installation and basic usage
- Configuration: 49 configuration options
- Backends: Memory and Redis state storage
- Providers: Custom AI provider integration
- Streaming: Streaming response support
- Reservations: Reservation tracking system
- Exceptions: Error handling patterns
- Observability: Prometheus metrics