Imports
Architecture
The reservation system uses a two-layer architecture:
Local Layer (ReservationTracker):
- In-memory tracking within the application process.
- Fast O(1) lookups.
- LOCAL-ONLY: Does NOT call backends directly.
- Manages the lifecycle of local request objects.
- Handles cleanup of stale local state via a background task.
Distributed Layer (Backend):
- Redis-based tracking (when using RedisBackend).
- Manages global concurrency and token quotas.
- Handles “Orphan Recovery” for crashed instances.
- Strategy layer (e.g., IntelligentModeStrategy) handles backend release calls.
The IntelligentModeStrategy creates its own internal ReservationTracker using a composition pattern. The tracker is a pure local storage layer.
ReservationContext
The ReservationContext dataclass holds reservation metadata:
| Field | Type | Description |
|---|---|---|
| reservation_id | str | Unique identifier for this reservation |
| bucket_id | str | Rate limit bucket (e.g., "shared_tier:chat") |
| estimated_tokens | int | Number of tokens reserved |
| created_at | float | Unix timestamp in seconds (auto-populated via time.time()) |
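For illustration, a dataclass matching the table above might look like the following minimal sketch. It is reconstructed only from the listed fields; the library's real definition may differ (for example, it may be frozen or carry additional fields).

```python
import time
from dataclasses import dataclass, field


@dataclass
class ReservationContext:
    """Sketch of the reservation metadata record, built from the field table."""

    reservation_id: str     # unique identifier for this reservation
    bucket_id: str          # rate limit bucket, e.g. "shared_tier:chat"
    estimated_tokens: int   # number of tokens reserved
    # default_factory calls time.time() once per instance, at construction.
    created_at: float = field(default_factory=time.time)


ctx = ReservationContext("res-1", "shared_tier:chat", 512)
```

Using field(default_factory=time.time) matters here: a plain default of time.time() would be evaluated once, when the class is defined, and every instance would share that same timestamp.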
ReservationTracker
The ReservationTracker class provides local storage for reservation state. It does not interact with backends; that responsibility belongs to the strategy layer.
Configuration
Lifecycle
The ReservationTracker runs a background cleanup task that must be explicitly started and stopped. If start() is never called, stale reservations are never cleaned up, which leaks memory in long-running applications.
async start()
Starts the background cleanup task. This method is idempotent: after the first call, further calls have no effect.
async stop()
Stops the background cleanup task. Call this during application shutdown to ensure clean termination.
Proper Usage Pattern
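A minimal sketch of this pattern, assuming an asyncio application. LifecycleSketch, cleanup_interval, and the loop body are hypothetical stand-ins, not the ReservationTracker API; the class models only an idempotent start() and a stop() that waits for the cancelled task, and main() wraps application work in try/finally so stop() always runs.

```python
import asyncio
from typing import Optional, Tuple


class LifecycleSketch:
    """Hypothetical stand-in that models only the start()/stop() lifecycle."""

    def __init__(self, cleanup_interval: float = 0.05) -> None:
        self.cleanup_interval = cleanup_interval  # hypothetical parameter name
        self.cleanup_runs = 0
        self._task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        # Idempotent: a second call while the task is running does nothing.
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._cleanup_loop())

    async def stop(self) -> None:
        # Cancel the loop and wait until it has actually finished.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _cleanup_loop(self) -> None:
        while True:
            await asyncio.sleep(self.cleanup_interval)
            self.cleanup_runs += 1  # a real tracker would evict stale reservations here


async def main() -> Tuple[LifecycleSketch, bool]:
    tracker = LifecycleSketch()
    await tracker.start()
    first_task = tracker._task
    await tracker.start()  # no-op: the same task keeps running
    same_task = tracker._task is first_task
    try:
        await asyncio.sleep(0.2)  # application work happens here
    finally:
        await tracker.stop()  # runs even if the work above raises
    return tracker, same_task


tracker, same_task = asyncio.run(main())
```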
Key Features
- Compound Key Indexing: Stores reservations using (request_id, bucket_id) tuples for precise lookups.
- Secondary Index: Maintains a mapping of request_id -> Set[(request_id, bucket_id)] to allow clearing all reservations for a single request (e.g., on timeout).
- Stale Cleanup: A background task periodically removes reservations that have exceeded their max_reservation_age to prevent memory leaks.
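The two indexes can be illustrated with plain dictionaries. This is a hypothetical sketch: CompoundIndexSketch and its put/get/remove/clear_request methods are illustrative names, and it stores only a token count per key. It shows why the secondary index makes clearing a whole request cheap.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, str]  # (request_id, bucket_id)


class CompoundIndexSketch:
    """Hypothetical illustration of the primary and secondary indexes."""

    def __init__(self) -> None:
        # Primary index: compound key -> reserved tokens (O(1) exact lookups).
        self._by_key: Dict[Key, int] = {}
        # Secondary index: request_id -> every compound key that request holds.
        self._by_request: Dict[str, Set[Key]] = defaultdict(set)

    def put(self, request_id: str, bucket_id: str, tokens: int) -> None:
        key = (request_id, bucket_id)
        self._by_key[key] = tokens
        self._by_request[request_id].add(key)

    def get(self, request_id: str, bucket_id: str) -> Optional[int]:
        return self._by_key.get((request_id, bucket_id))

    def remove(self, request_id: str, bucket_id: str) -> Optional[int]:
        key = (request_id, bucket_id)
        keys = self._by_request.get(request_id)
        if keys is not None:
            keys.discard(key)
            if not keys:  # drop empty sets so the secondary index does not leak
                del self._by_request[request_id]
        return self._by_key.pop(key, None)

    def clear_request(self, request_id: str) -> List[Key]:
        # Cost is proportional to this request's buckets, not to the total size.
        keys = self._by_request.pop(request_id, set())
        for key in keys:
            del self._by_key[key]
        return sorted(keys)


idx = CompoundIndexSketch()
idx.put("req-1", "shared_tier:chat", 100)
idx.put("req-1", "shared_tier:embed", 20)
idx.put("req-2", "shared_tier:chat", 50)
cleared = idx.clear_request("req-1")  # e.g. on timeout
```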
Methods
Store a Reservation
Raises ReservationCapacityError if the tracker is at maximum capacity (as configured by max_reservations).
Before raising the error, store() attempts to clean up stale reservations; the error is raised only if cleanup does not free sufficient capacity.
Get Without Clearing (for Streaming)
Get and Clear Atomically (for Completion)
Clear All for Error Recovery
Clears all reservations associated with the given request_id and returns a list of the cleared ReservationContext objects. Use this for error recovery, when every reservation held by a failed request must be released.
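The store, streaming, completion, and error-recovery behavior described above can be modelled in one small class. This is a sketch under assumptions: only store() and ReservationCapacityError are named in the text; the class name, the get / get_and_clear / clear_all method names, and the trimmed Ctx record are hypothetical, and the eviction rule assumes "stale" means older than max_reservation_age.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class ReservationCapacityError(Exception):
    """Local stand-in for the error described above."""


@dataclass
class Ctx:  # trimmed, hypothetical stand-in for ReservationContext
    reservation_id: str
    bucket_id: str
    estimated_tokens: int
    created_at: float = field(default_factory=time.time)


class TrackerMethodsSketch:
    def __init__(self, max_reservations: int, max_reservation_age: float) -> None:
        self.max_reservations = max_reservations
        self.max_reservation_age = max_reservation_age
        self._items: Dict[Tuple[str, str], Ctx] = {}

    def _evict_stale(self) -> None:
        cutoff = time.time() - self.max_reservation_age
        for key in [k for k, c in self._items.items() if c.created_at < cutoff]:
            del self._items[key]

    def store(self, request_id: str, ctx: Ctx) -> None:
        key = (request_id, ctx.bucket_id)
        if key not in self._items and len(self._items) >= self.max_reservations:
            self._evict_stale()  # try to free space before giving up
            if len(self._items) >= self.max_reservations:
                raise ReservationCapacityError(f"{self.max_reservations} reservations tracked")
        self._items[key] = ctx

    def get(self, request_id: str, bucket_id: str) -> Optional[Ctx]:
        # Streaming: read without removing; later chunks still need the entry.
        return self._items.get((request_id, bucket_id))

    def get_and_clear(self, request_id: str, bucket_id: str) -> Optional[Ctx]:
        # Completion: one dict.pop with no await in between, so no other
        # coroutine can observe the entry after it has been read.
        return self._items.pop((request_id, bucket_id), None)

    def clear_all(self, request_id: str) -> List[Ctx]:
        # Error recovery. Linear scan for brevity; a secondary index avoids it.
        keys = [k for k in self._items if k[0] == request_id]
        return [self._items.pop(k) for k in keys]


tracker = TrackerMethodsSketch(max_reservations=2, max_reservation_age=60.0)
tracker.store("req-1", Ctx("r1", "chat", 100))
tracker.store("req-1", Ctx("r2", "embed", 10))
streaming = tracker.get("req-1", "chat")       # still tracked afterwards
done = tracker.get_and_clear("req-1", "chat")  # removed
tracker.store("req-2", Ctx("r3", "chat", 5))
try:
    tracker.store("req-3", Ctx("r4", "chat", 5))  # full, and nothing is stale
    capacity_error = False
except ReservationCapacityError:
    capacity_error = True
recovered = tracker.clear_all("req-1")

small = TrackerMethodsSketch(max_reservations=1, max_reservation_age=60.0)
small.store("old", Ctx("r0", "chat", 1, created_at=time.time() - 120))
small.store("new", Ctx("r9", "chat", 1))  # succeeds: the stale entry is evicted first
```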
Maintenance Methods
get_and_clear_stale()
Retrieves all stale reservation entries and removes them from tracking.
Returns a list of stale entries that exceeded their TTL.
compact_heap()
Optimizes the internal min-heap by removing stale entries.
Call this periodically in long-running applications to prevent memory growth.
stale_entry_ratio (property)
Returns the current ratio of stale entries to total entries in the heap.
Useful for monitoring and deciding when to trigger compaction.
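One common way to get cheap stale detection is a min-heap ordered by creation time with lazy deletion. The sketch below assumes that design: clearing a reservation leaves its heap entry behind as a "stale" entry until compact_heap() rebuilds the heap, and get_and_clear_stale() pops expired reservations from the top. HeapSketch and its store/clear methods are hypothetical; the tracker's actual heap layout is not documented here.

```python
import heapq
import time
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]           # (request_id, bucket_id)
Entry = Tuple[float, Key, str]  # (created_at, key, reservation_id)


class HeapSketch:
    """Hypothetical model: creation-time min-heap with lazy deletion."""

    def __init__(self, max_reservation_age: float) -> None:
        self.max_reservation_age = max_reservation_age
        self._live: Dict[Key, str] = {}  # key -> current reservation_id
        self._heap: List[Entry] = []

    def store(self, key: Key, reservation_id: str, created_at: Optional[float] = None) -> None:
        ts = time.time() if created_at is None else created_at
        self._live[key] = reservation_id
        heapq.heappush(self._heap, (ts, key, reservation_id))

    def clear(self, key: Key) -> None:
        # Lazy deletion: forget the reservation but leave its heap entry in place.
        self._live.pop(key, None)

    def _is_dead(self, entry: Entry) -> bool:
        _, key, rid = entry
        return self._live.get(key) != rid  # cleared, or replaced by a newer reservation

    @property
    def stale_entry_ratio(self) -> float:
        if not self._heap:
            return 0.0
        # Scanned for clarity; a production version would keep a running count.
        return sum(self._is_dead(e) for e in self._heap) / len(self._heap)

    def compact_heap(self) -> None:
        # Keep only live entries, then restore the heap invariant in O(n).
        self._heap = [e for e in self._heap if not self._is_dead(e)]
        heapq.heapify(self._heap)

    def get_and_clear_stale(self) -> List[Tuple[Key, str]]:
        cutoff = time.time() - self.max_reservation_age
        expired: List[Tuple[Key, str]] = []
        # The oldest entry is always on top, so stop at the first one newer than cutoff.
        while self._heap and self._heap[0][0] < cutoff:
            entry = heapq.heappop(self._heap)
            if not self._is_dead(entry):  # dead entries are simply discarded
                _, key, rid = entry
                del self._live[key]
                expired.append((key, rid))
        return expired


now = time.time()
heap = HeapSketch(max_reservation_age=60.0)
heap.store(("req-1", "chat"), "r1", created_at=now - 120)  # already expired
heap.store(("req-2", "chat"), "r2", created_at=now)
heap.store(("req-3", "chat"), "r3", created_at=now)
heap.clear(("req-3", "chat"))          # leaves one dead heap entry behind
ratio_before = heap.stale_entry_ratio  # one of three entries is dead
heap.compact_heap()
ratio_after = heap.stale_entry_ratio
expired = heap.get_and_clear_stale()
```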
Monitoring
The ReservationTracker provides properties to monitor its current state:
| Property | Type | Description |
|---|---|---|
| reservation_count | int | Current number of tracked reservations |
| request_count | int | Current number of unique requests with reservations |
Usage Pattern
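A hedged sketch of how a strategy might compose a local tracker with a backend, following the division of responsibilities in the Architecture section: the tracker only stores state, and the strategy makes every backend release call on both the completion and the error path. FakeBackend, LocalTracker, StrategySketch, and all their methods are hypothetical stand-ins, not the IntelligentModeStrategy or backend APIs.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


class FakeBackend:
    """Stand-in for a distributed backend; it only records release calls."""

    def __init__(self) -> None:
        self.released: List[Tuple[str, int]] = []

    async def release(self, bucket_id: str, tokens: int) -> None:
        self.released.append((bucket_id, tokens))


class LocalTracker:
    """Local-only storage: it never calls the backend itself."""

    def __init__(self) -> None:
        self.items: Dict[Tuple[str, str], int] = {}

    def store(self, request_id: str, bucket_id: str, tokens: int) -> None:
        self.items[(request_id, bucket_id)] = tokens

    def get_and_clear(self, request_id: str, bucket_id: str) -> Optional[int]:
        return self.items.pop((request_id, bucket_id), None)

    def clear_all(self, request_id: str) -> List[Tuple[str, int]]:
        keys = [k for k in self.items if k[0] == request_id]
        return [(k[1], self.items.pop(k)) for k in keys]


class StrategySketch:
    """Composition: the strategy owns its tracker and makes every backend call."""

    def __init__(self, backend: FakeBackend) -> None:
        self.backend = backend
        self.tracker = LocalTracker()  # created internally, not injected

    async def run(self, request_id: str, bucket_id: str, estimate: int,
                  work: Callable[[], Awaitable[str]]) -> str:
        self.tracker.store(request_id, bucket_id, estimate)
        try:
            result = await work()
        except Exception:
            # Error path: release everything this request still holds, then re-raise.
            for bucket, tokens in self.tracker.clear_all(request_id):
                await self.backend.release(bucket, tokens)
            raise
        tokens = self.tracker.get_and_clear(request_id, bucket_id)
        if tokens is not None:  # completion path: release exactly once
            await self.backend.release(bucket_id, tokens)
        return result


async def demo():
    backend = FakeBackend()
    strategy = StrategySketch(backend)

    async def ok() -> str:
        return "done"

    async def boom() -> str:
        raise RuntimeError("upstream timeout")

    result = await strategy.run("req-1", "shared_tier:chat", 100, ok)
    try:
        await strategy.run("req-2", "shared_tier:chat", 40, boom)
    except RuntimeError:
        pass
    return result, backend.released, strategy.tracker.items


result, released, leftover = asyncio.run(demo())
```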
Distributed Tracking & Orphan Recovery
In a distributed system, a worker might crash after reserving capacity but before releasing it. This creates “orphaned” reservations that permanently consume quota. The RedisBackend implements an Orphan Recovery mechanism:
- In-Flight Tracking: Every reservation is recorded in a “pending” set in Redis with a timestamp.
- Recovery Task: A background task runs on every instance (leader-elected or randomized) to scan for pending reservations older than the max_request_timeout.
- Reclamation: Expired reservations are assumed to be from crashed workers and are automatically released, returning their capacity to the pool.
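The recovery loop can be modelled in memory. In this sketch a dict stands in for the Redis pending set; a sorted set scored by timestamp and queried with ZRANGEBYSCORE would be one natural Redis encoding, but that is an assumption about RedisBackend, not its documented layout. OrphanRecoveryModel and its methods are hypothetical, and timestamps are passed explicitly so the example is deterministic.

```python
import time
from typing import Dict, List, Optional, Tuple


class OrphanRecoveryModel:
    """In-memory model of the pending-set mechanism; not RedisBackend code."""

    def __init__(self, capacity: int, max_request_timeout: float) -> None:
        self.available = capacity
        self.max_request_timeout = max_request_timeout
        # reservation_id -> (reserved_at, tokens); stands in for the Redis pending set
        self.pending: Dict[str, Tuple[float, int]] = {}

    def reserve(self, reservation_id: str, tokens: int, now: Optional[float] = None) -> bool:
        if tokens > self.available:
            return False
        self.available -= tokens
        self.pending[reservation_id] = (time.time() if now is None else now, tokens)
        return True

    def release(self, reservation_id: str) -> None:
        entry = self.pending.pop(reservation_id, None)
        if entry is not None:  # a second release for the same id is ignored
            self.available += entry[1]

    def recover_orphans(self, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        cutoff = now - self.max_request_timeout
        orphans = [rid for rid, (ts, _) in self.pending.items() if ts < cutoff]
        for rid in orphans:
            self.release(rid)  # assume the owner crashed; return its capacity
        return orphans


pool = OrphanRecoveryModel(capacity=1000, max_request_timeout=300.0)
t0 = 1_000_000.0
pool.reserve("crashed-worker", 400, now=t0)
pool.reserve("live-worker", 100, now=t0 + 290)
reclaimed = pool.recover_orphans(now=t0 + 310)
pool.release("crashed-worker")  # late release from a slow worker is ignored
```

The guard in release() matters because a worker that was merely slow, not crashed, may try to release after recovery has already reclaimed its reservation. Against real Redis the check-and-remove would need to be atomic, for example by returning capacity only when ZREM reports that it actually removed the member.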
Drift Correction
The system also handles clock drift and state erasure:
- Drift Correction: Uses sequence numbers to order requests and updates.
- State Erasure Prevention: When updating limits from API headers, the system accounts for “in-flight” requests that haven’t yet been reflected in the server’s response headers. This prevents the local state from “jumping back” and ignoring recent consumption.
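A minimal sketch of both ideas for a single bucket, under stated assumptions. Each request gets a monotonically increasing sequence number, so ordering never depends on wall clocks that may drift, and header updates older than one already applied are dropped. The server is assumed to process requests in issue order, so only in-flight requests issued after the reporting one are subtracted. BucketStateSketch and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BucketStateSketch:
    """Hypothetical local view of one bucket's remaining token budget."""

    remaining: int
    last_applied_seq: int = -1
    next_seq: int = 0
    in_flight: Dict[int, int] = field(default_factory=dict)  # seq -> reserved tokens

    def begin_request(self, tokens: int) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight[seq] = tokens
        self.remaining -= tokens  # consume locally before the server reports it
        return seq

    def apply_headers(self, seq: int, header_remaining: int) -> bool:
        """Apply the server's remaining-token value from the response to request `seq`."""
        self.in_flight.pop(seq, None)  # this request is now complete
        if seq <= self.last_applied_seq:
            return False  # older than an update already applied: drop it
        self.last_applied_seq = seq
        # Requests issued after `seq` are assumed not yet counted by the server,
        # so subtract them instead of letting the local value jump back.
        unreflected = sum(t for s, t in self.in_flight.items() if s > seq)
        self.remaining = header_remaining - unreflected
        return True


bucket = BucketStateSketch(remaining=1000)
a = bucket.begin_request(100)  # seq 0
b = bucket.begin_request(200)  # seq 1
c = bucket.begin_request(300)  # seq 2; local remaining is now 400
applied_b = bucket.apply_headers(b, header_remaining=700)  # server counted a and b
applied_a = bucket.apply_headers(a, header_remaining=900)  # arrives late, dropped
```

In this run, a naive update would set remaining to 700 and then to 900, forgetting the 300 tokens still in flight; the sketch keeps it at 400.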