AI Request Queue and Rate Limiting
Type
Document
Status
Published
Created
Aug 8, 2025
Updated
Mar 18, 2026
Updated by
Dosu Bot

Request Queue (RequestQueue)#

The request queue is implemented as a token bucket rate limiter with two primary parameters: rate (tokens per second) and capacity (bucket size). Each translation request consumes a token; tokens refill at the configured rate, and the bucket can hold up to the configured capacity. This mechanism ensures that requests are dispatched at a controlled pace, preventing overload of external translation services and smoothing out bursts of activity. The queue schedules tasks using a priority queue (binary heap), executing them when tokens are available and their scheduled time has arrived.
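The token bucket described above can be sketched as follows. This is a minimal illustration, not the extension's actual implementation; the class and method names are assumptions.

```typescript
// Minimal token-bucket sketch: `rate` tokens/second refill, up to `capacity`.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(private rate: number, private capacity: number, now = Date.now()) {
    this.tokens = capacity
    this.lastRefill = now
  }

  // Refill tokens proportionally to elapsed time, capped at capacity.
  private refill(now: number): void {
    const elapsedSec = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.rate)
    this.lastRefill = now
  }

  // Consume one token if available; returns whether the request may proceed.
  tryConsume(now = Date.now()): boolean {
    this.refill(now)
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}
```

With `rate = 8` and `capacity = 60` (the documented defaults), a burst of up to 60 requests can pass immediately, after which requests proceed at 8 per second.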

Provider Support and Custom Options

The request queue supports multiple AI translation providers, including OpenAI, Gemini, MiniMax, Alibaba Cloud (Bailian), Moonshot AI, and Hugging Face. Users can select their preferred provider and configure provider-specific options, such as temperature, directly from the translation options page. The system validates custom options, but recommended provider options must be manually reviewed and applied through the sparkles icon in the model selector UI. Empty provider options remain empty unless explicitly populated, giving users full control over which options are sent with translation requests.

Request Deduplication

Before enqueuing a new translation request, the queue checks for duplicates using a hash of the request parameters. If a matching task is already waiting or executing, the queue returns the existing promise, ensuring that identical requests are processed only once and saving resources.
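The deduplication pattern can be sketched like this. The key derivation shown (`hashKey` via sorted-key JSON) is an illustrative stand-in for the real hash, and the function names are assumptions.

```typescript
// Map from request hash to the in-flight promise for that request.
const inFlight = new Map<string, Promise<string>>()

// Stand-in hash: stable JSON with keys in sorted order.
function hashKey(params: Record<string, unknown>): string {
  return JSON.stringify(params, Object.keys(params).sort())
}

// Returns the existing promise for an identical waiting/executing request,
// otherwise starts the task and records its promise until it settles.
function enqueueOnce(
  params: Record<string, unknown>,
  run: () => Promise<string>,
): Promise<string> {
  const key = hashKey(params)
  const existing = inFlight.get(key)
  if (existing)
    return existing
  const promise = run().finally(() => inFlight.delete(key))
  inFlight.set(key, promise)
  return promise
}
```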

Retry Strategies

Failed requests are retried using exponential backoff with jitter. The queue will attempt a task up to maxRetries times, with each retry delayed by baseRetryDelayMs * 2^(retryCount-1), plus a small random jitter to avoid thundering herd effects. Retries are rescheduled in the queue, and if all attempts fail, the promise is rejected.
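The delay formula above can be expressed directly; note that the jitter bound used here is an assumption for illustration, not the extension's actual value.

```typescript
// Exponential backoff with jitter: baseRetryDelayMs * 2^(retryCount - 1)
// plus a small random offset to avoid thundering-herd retries.
function retryDelayMs(
  retryCount: number, // 1 for the first retry
  baseRetryDelayMs: number,
  maxJitterMs = 100, // illustrative jitter bound
): number {
  const backoff = baseRetryDelayMs * 2 ** (retryCount - 1)
  return backoff + Math.random() * maxJitterMs
}
```

For example, with a 500 ms base delay, successive retries wait roughly 500 ms, 1 s, 2 s, and so on (plus jitter), up to maxRetries attempts.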

Timeout Handling

Each task is raced against a timeout promise. If the task does not complete within timeoutMs, the promise is rejected and the timeout is cleared. This prevents requests from hanging indefinitely and ensures system responsiveness.
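The race-with-timeout pattern looks roughly like this (a sketch; the helper name is an assumption):

```typescript
// Race a task against a timer; whichever settles first wins,
// and the timer is cleared either way so it cannot leak.
function withTimeout<T>(task: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    )
  })
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer))
}
```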

Performance and Reliability

The queue's design improves reliability by controlling request rates, deduplicating identical requests, retrying transient failures, and timing out slow operations. High-volume scenarios (e.g., 100 concurrent tasks) are tested to ensure the queue drains tasks without starvation or leaks, demonstrating robustness under load.

Configuring Queue Parameters and Provider Options

Queue parameters are exposed on the translate page via the RequestRate UI component. Users can configure capacity (maximum burst size) and rate (steady request rate per second) using numeric input fields. Provider-specific options, such as temperature, are also available for supported providers (e.g., OpenAI, MiniMax), with recommended options accessible via a sparkles icon that opens a popover for manual review and application. The UI validates values against minimums (MIN_TRANSLATE_CAPACITY = 1, MIN_TRANSLATE_RATE = 1) and applies defaults (DEFAULT_REQUEST_CAPACITY = 60, DEFAULT_REQUEST_RATE = 8). Changes to queue parameters are propagated to the queue in real time, updating its behavior immediately.

// Example: Setting queue parameters and provider options in the UI
<RequestRate />
// Users select capacity, rate, and provider-specific options; changes are validated and sent to the queue

Validation and Schema

Queue configuration is validated using a schema that enforces minimum values:

export const requestQueueConfigSchema = z.object({
  capacity: z.number().gte(MIN_TRANSLATE_CAPACITY),
  rate: z.number().gte(MIN_TRANSLATE_RATE),
  // providerOptions: z.object({ ... }) // validated per provider
})

Batch Queue (BatchQueue) for LLM Translation#

For LLM-based translation providers, Read Frog uses a batch queue to group multiple translation tasks into a single API call. This reduces the number of requests, improves translation throughput, and can lower costs for providers that charge per request.

Provider Support and Configuration

Batch translation is supported for LLM-based translation providers, including OpenAI, Gemini, MiniMax, Alibaba Cloud (Bailian), Moonshot AI, and Hugging Face. Users can select their preferred provider and configure provider-specific options, such as temperature, directly from the extension options. The batch queue is compatible with the latest AI SDK (v6), ensuring improved performance and compatibility with new provider features.

Important: Pure translation providers (DeepL, DeepLX, Google Translate, Microsoft Translator) do NOT use the application's batch queue system. While DeepL has native batching capabilities built into its API (accepting text[] arrays), this is handled directly at the adapter level, not through the batch queue architecture.

Availability

  • The BatchQueue feature is enabled by default for LLM translation providers. Batch translation and its configuration options are always visible and active in the extension options for these providers.

Batching Logic

  • Tasks are grouped by provider, language pair, and configuration.
  • A batch is flushed (sent for translation) when either:
    • The number of tasks reaches maxItemsPerBatch, or
    • The total character count reaches maxCharactersPerBatch, or
    • The batch delay timer expires.
  • The batch queue splits and parses results using a special separator (%%) to align input and output paragraphs.
  • If a batch request fails, it is retried with exponential backoff. If all retries fail, the system can fall back to individual requests for each task.
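The three flush conditions above can be sketched as a single predicate (names are illustrative, matching the documented parameters):

```typescript
// Accumulated state of a pending batch.
interface BatchState {
  items: string[] // paragraphs waiting to be sent
  characters: number // total character count of those paragraphs
  timerExpired: boolean // has the batch delay timer fired?
}

// A batch is flushed when any one of the three limits is hit.
function shouldFlush(
  batch: BatchState,
  maxItemsPerBatch: number,
  maxCharactersPerBatch: number,
): boolean {
  return (
    batch.items.length >= maxItemsPerBatch
    || batch.characters >= maxCharactersPerBatch
    || batch.timerExpired
  )
}
```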

Configuring Batch Parameters and Provider Options

Batch queue parameters are exposed on the translate page via the Batch Translation UI component. Users can configure:

  • maxCharactersPerBatch: Maximum number of characters per batch request (default: 1000, minimum: 1)
  • maxItemsPerBatch: Maximum number of paragraphs per batch (default: 4, minimum: 1)
  • Provider-specific options (such as temperature) for supported providers, with recommended options accessible via a sparkles icon for manual review and application

// Example: Setting batch parameters and provider options in the UI
<RequestBatch />
// Users select max characters, max items per batch, and provider options; changes are validated and sent to the batch queue

Validation and Schema

Batch configuration is validated using a schema:

export const batchQueueConfigSchema = z.object({
  maxCharactersPerBatch: z.number().gte(MIN_BATCH_CHARACTERS),
  maxItemsPerBatch: z.number().gte(MIN_BATCH_ITEMS),
  // providerOptions: z.object({ ... }) // validated per provider
})

Batch Prompt Format

The batch translation prompt includes special instructions for the LLM to use the %% separator and preserve paragraph alignment. The system parses the LLM output using this separator to distribute results to the original requests.
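The join/split logic around the %% separator can be sketched as follows (a simplified illustration; the real parser may handle whitespace and malformed output differently):

```typescript
const SEPARATOR = '%%'

// Join input paragraphs into one prompt body, separated by %%.
function joinBatch(paragraphs: string[]): string {
  return paragraphs.join(`\n${SEPARATOR}\n`)
}

// Split the LLM output on %% and check alignment with the inputs.
function splitBatch(output: string, expectedCount: number): string[] {
  const parts = output.split(SEPARATOR).map(p => p.trim())
  if (parts.length !== expectedCount)
    throw new Error(`expected ${expectedCount} translations, got ${parts.length}`)
  return parts
}
```

A count mismatch indicates the model dropped or merged a separator, which is one of the failure modes that triggers the fallback path described below.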

Fallback and Error Handling

  • If a batch request fails after all retries, and fallback is enabled, the system will attempt to translate each paragraph individually.
  • Errors are propagated to the UI and logged for diagnostics.

Testing and Reliability

Comprehensive unit tests verify batching logic, edge cases (e.g., separator handling, batch size limits), error handling, and fallback behavior.


Pre-translate Range (Preload Config)#

Read Frog now supports a configurable Pre-translate Range for page translation. This feature allows users to control how much content below the viewport is pre-translated, helping to reduce API usage and tune scroll performance.

Configuration Options#

  • Preload Distance (margin, px):
    • Determines how far below the viewport content will begin to be translated.
    • Lower values save API costs but may cause delays when scrolling.
    • Range: 0–5000 (default: 1000)
  • Visibility Threshold:
    • Percentage of element visibility required to trigger translation (0–1).
    • Higher values delay translation until more of the element is visible.
    • Range: 0–1 (default: 0)

These options are available on the translation options page under Pre-translate Range. Adjusting these parameters lets you balance between immediate translation and API efficiency, especially on long or infinite-scroll pages.

Technical Details#

  • The content script uses these settings to configure the IntersectionObserver for page translation.
  • The rootMargin and threshold values are set according to user configuration, controlling when translation requests are triggered for elements below the viewport.
  • Defaults and bounds are enforced to prevent excessive API usage or poor user experience.
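The mapping from the preload settings onto IntersectionObserver options might look like this (a sketch; the function name and exact margin format are assumptions):

```typescript
// Translate the user's preload settings into IntersectionObserver options.
function observerOptions(
  preloadDistancePx: number, // Preload Distance (margin, px)
  visibilityThreshold: number, // Visibility Threshold (0–1)
): { rootMargin: string, threshold: number } {
  return {
    // Extend the observation area below the viewport by the preload distance.
    rootMargin: `0px 0px ${preloadDistancePx}px 0px`,
    // Require this fraction of the element to be visible before translating.
    threshold: visibilityThreshold,
  }
}
```

The resulting object would be passed as the second argument to `new IntersectionObserver(callback, options)` in the content script.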

Impact on API Usage and Performance#

  • Setting a larger margin or lower threshold will translate more content in advance, increasing API usage but reducing scroll delays.
  • Setting a smaller margin or higher threshold will save API costs but may cause visible delays as you scroll.

Tip: Tune these values to match your reading habits and API quota. For most users, the default settings provide a good balance between responsiveness and cost.

For more information, see the configuration UI tooltips and the Intersection Observer API documentation.

Batch Request Statistics and Visualization#

Read Frog tracks batch translation requests and their effectiveness. The extension options page now includes a Statistics section, where users can view:

  • The total number of original requests and batch requests over selectable time periods (e.g., last 5, 7, 30, or 60 days)
  • The percentage of requests saved by batching (i.e., reduction in API calls)
  • Trend charts visualizing batch request activity and savings
  • Key metrics and comparisons with previous periods

Batch request records are stored in IndexedDB and cleaned up automatically to maintain performance. The statistics page provides insights into how batching improves efficiency and reduces costs.
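The headline "requests saved" metric reduces to a simple ratio; this sketch assumes the metric compares original request count against the number of batched API calls actually made.

```typescript
// Percentage of API calls avoided by batching:
// e.g. 100 paragraphs sent in 25 batches saves 75% of the calls.
function percentSaved(originalRequests: number, batchRequests: number): number {
  if (originalRequests === 0)
    return 0
  return ((originalRequests - batchRequests) / originalRequests) * 100
}
```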


Summary of Key Parameters#

  • Request Queue

    • rate: Number of requests allowed per second (default: 8, minimum: 1)
    • capacity: Maximum burst size (default: 60, minimum: 1)
    • timeoutMs: Maximum time to wait for a request before timing out
    • maxRetries: Maximum number of retry attempts for failed requests
    • baseRetryDelayMs: Base delay for exponential backoff between retries
    • providerOptions: Custom options for each provider (e.g., temperature for OpenAI and MiniMax)
    • Supported Providers: OpenAI, Gemini, MiniMax, Alibaba Cloud (Bailian), Moonshot AI, Hugging Face (AI SDK v6)
  • Batch Queue (LLM only)

    • maxCharactersPerBatch: Maximum characters per batch (default: 1000, minimum: 1)
    • maxItemsPerBatch: Maximum paragraphs per batch (default: 4, minimum: 1)
    • batchDelay: Maximum time to wait before flushing a batch
    • providerOptions: Custom options for each provider (e.g., temperature)

Adjust these parameters on the translate page to tune queue performance for your use case and translation provider. Batch queue options are visible and active for LLM providers only, including OpenAI, Gemini, MiniMax, Alibaba Cloud (Bailian), Moonshot AI, and Hugging Face. Pure translation providers (DeepL, DeepLX, Google Translate, Microsoft Translator) use direct API calls and do not route through the batch queue.

For more details on the rate limiting algorithm, see the Token Bucket Wikipedia page (linked from the configuration UI).