Overview
The Adaptive Rate Limiter is designed to handle the complex rate-limiting requirements of modern AI APIs. It goes beyond simple token buckets by actively discovering rate limits from API responses, managing distributed state across multiple instances, and providing first-class support for streaming responses.
Get Started
Install the library and run your first rate-limited request in minutes.
Key Features
- Provider-Agnostic: Works with any OpenAI-compatible API (OpenAI, Anthropic, Venice, Groq, Together, etc.)
- Adaptive Strategies: Intelligent rate limit discovery from response headers
- Streaming Support: Automatic reservation tracking for streaming responses with refund-based accounting
- Distributed Backends: In-memory for single instances, Redis for distributed deployments
- Multiple Scheduling Modes: Basic, Intelligent, and Account-level strategies
- Observability: Built-in Prometheus metrics collection
- Type-Safe: Full typing with protocols and Pydantic models
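The adaptive-discovery feature above can be illustrated with a small self-contained sketch. This is not the library's actual API: the class name, header keys, and update rules below are assumptions, using the `x-ratelimit-*` header convention common to many OpenAI-compatible APIs.

```python
import time

# Illustrative sketch only; the real library's classes and header names
# may differ. Header keys follow the common "x-ratelimit-*" convention.
class AdaptiveBucket:
    def __init__(self, default_limit: int = 60, window: float = 60.0):
        self.limit = default_limit       # discovered requests-per-window
        self.remaining = default_limit   # budget left in the current window
        self.window = window             # assumed window length in seconds
        self.reset_at = time.monotonic() + window

    def observe(self, headers: dict) -> None:
        """Adapt local state from rate-limit headers on an API response."""
        if "x-ratelimit-limit-requests" in headers:
            self.limit = int(headers["x-ratelimit-limit-requests"])
        if "x-ratelimit-remaining-requests" in headers:
            self.remaining = int(headers["x-ratelimit-remaining-requests"])
        if "x-ratelimit-reset-requests" in headers:
            # Treat the reset value as seconds until the window rolls over.
            self.reset_at = time.monotonic() + float(
                headers["x-ratelimit-reset-requests"]
            )

    def allow(self) -> bool:
        """Spend one unit of the discovered budget if any remains."""
        now = time.monotonic()
        if now >= self.reset_at:         # window rolled over: refill
            self.remaining = self.limit
            self.reset_at = now + self.window
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

The point of the pattern: the limiter starts from a conservative default and converges on the provider's real limits as responses arrive, instead of requiring the limits to be configured up front.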
Quick Start
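The quick-start snippet itself is not reproduced on this page. As a stdlib-only illustration of the usage pattern such a limiter typically exposes (a blocking acquire before each request), with all names hypothetical rather than the library's real API:

```python
import time
from collections import deque

# Hypothetical stand-in for the library's limiter; the real package,
# class names, and call signatures are not shown on this page.
class RateLimiter:
    """Sliding-window limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self.calls: deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until a slot is free, then record the call."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Window is full: wait until the oldest call expires.
            time.sleep(self.period - (now - self.calls[0]))
            return self.acquire()
        self.calls.append(now)

limiter = RateLimiter(max_calls=5, period=1.0)

def rate_limited_request(payload: str) -> str:
    limiter.acquire()           # wait for budget before calling the API
    return f"sent: {payload}"   # stand-in for the real HTTP call
```

The real library layers the features listed above (header discovery, Redis-backed state, streaming reservations) behind the same basic shape: every request passes through an acquire step before it reaches the provider.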
Public API
The library exports 30+ public symbols.
Explore the Docs
- Quick Start: Installation and basic usage
- Configuration: 49 configuration options
- Backends: Memory and Redis state storage
- Providers: Custom AI provider integration
- Streaming: Streaming response support
- Reservations: Reservation tracking system
- Exceptions: Error handling patterns
- Observability: Prometheus metrics