Understanding Rate Limits

Each account has a default rate limit of 300 requests per minute (RPM) per model.

Rate limits control how frequently users can make requests to our LLM API within specific time periods. These limits help ensure fair resource distribution and maintain service stability.

- Prevent Abuse: Protect against API misuse and abuse
- Fair Usage: Ensure fair resource distribution
- Stability: Maintain consistent API performance

Rate Limit Details

By default, each account can make up to 300 requests per minute (RPM) to each model. Limits are applied per account, per model.

Best Practices

1. Implement Request Throttling

Add rate limiting in your application code to stay within the 300 RPM limit:

const rateLimiter = new RateLimiter({
  requests: 300,
  period: '1m'
});
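The `RateLimiter` above is illustrative rather than a specific library. A minimal sliding-window version can be sketched as follows (class and method names here are assumptions, not part of any official SDK):

```javascript
// Minimal sliding-window rate limiter sketch. Names are illustrative,
// not an official SDK class.
class RateLimiter {
  constructor({ requests, periodMs }) {
    this.requests = requests; // max requests allowed per window
    this.periodMs = periodMs; // window length in milliseconds
    this.timestamps = [];     // send times within the current window
  }

  // Returns true if a request may be sent now, false otherwise.
  tryAcquire(now = Date.now()) {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.periodMs);
    if (this.timestamps.length >= this.requests) return false;
    this.timestamps.push(now);
    return true;
  }
}

// 300 requests per 60-second window, matching the default limit.
const limiter = new RateLimiter({ requests: 300, periodMs: 60_000 });
```

Before each API call, check `limiter.tryAcquire()` and delay the request when it returns `false`.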
2. Add Exponential Backoff

Implement retry logic with increasing delays:

const backoff = (attempt) => Math.min(1000 * Math.pow(2, attempt), 10000);
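A retry loop built on this backoff formula might look like the sketch below. `sendRequest` is a placeholder for your own HTTP call, not a documented SDK function:

```javascript
// Delay doubles each attempt (1s, 2s, 4s, 8s), capped at 10s.
const backoff = (attempt) => Math.min(1000 * Math.pow(2, attempt), 10000);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries `sendRequest` while it returns HTTP 429, waiting longer each time.
async function withRetries(sendRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429) return response; // not rate-limited; done
    await sleep(backoff(attempt));
  }
  throw new Error("Rate limited: retries exhausted");
}
```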
3. Monitor Usage

Track your API usage through the dashboard.

Handling Rate Limits

When you receive an HTTP 429 (Too Many Requests) error, apply these handling strategies:

  1. Retry Later: Wait for the specified cooldown period
  2. Optimize Requests: Batch operations when possible
  3. Monitor Usage: Track your consumption patterns
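The "retry later" strategy can be sketched like this. It assumes the API sends the standard `Retry-After` header on 429 responses; the exact cooldown mechanism your API uses may differ:

```javascript
// Sketch: respect the server's cooldown on a 429, then retry once.
// Assumes a Fetch-style response object and a standard Retry-After header.
async function handleRateLimited(response, retryFn) {
  if (response.status !== 429) return response; // not rate-limited; pass through
  // Prefer the server-specified cooldown (seconds); fall back to 1s.
  const retryAfterSec = Number(response.headers.get("Retry-After")) || 1;
  await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000));
  return retryFn();
}
```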

Increasing Rate Limits

Repeatedly exceeding rate limits may result in temporary account restrictions. Monitor your usage, and request a limit increase if you need more capacity.