API Providers Configuration

Type: Document · Status: Published · Created: Aug 13, 2025 · Updated: Apr 1, 2026 by Dosu Bot

Concept of API Providers#

Read Frog uses API Providers to manage connections to external AI and translation services. The system supports over 27 LLM and translation providers, including OpenAI, DeepSeek, Gemini, Anthropic, xAI (Grok), Amazon Bedrock, Groq, DeepInfra, Mistral, Together AI, Cohere, Fireworks, Cerebras, Replicate, Perplexity, Vercel, SiliconFlow, 302.AI, Tensdaq, DeepL, DeepLX, Google, Microsoft, OpenRouter, Ollama, Volcengine (Doubao), MiniMax, Alibaba Cloud (Bailian), Moonshot AI, Hugging Face, and any custom OpenAI-compatible endpoint. Configuration uses an array-based schema that supports an arbitrary number of providers, so users can add, remove, customize, and reorder providers, including custom endpoints. This flexible structure enables advanced integration scenarios such as custom model settings, per-provider model selection, and third-party or proxy endpoints.

Configuration Schema#

API Providers are configured in the providersConfig array within the main configuration. Each provider is represented as an object with the following fields (some are optional, as noted):

  • id: Unique identifier for the provider (used for referencing in config)
  • name: User-facing label (must be unique)
  • provider: Provider type (e.g., openai, deepseek, google, anthropic, xai, bedrock, groq, deepinfra, mistral, togetherai, cohere, fireworks, cerebras, replicate, perplexity, vercel, openrouter, siliconflow, ai302, tensdaq, openai-compatible, deepl, deeplx, google-translate, microsoft-translate, edge-tts, ollama, volcengine, minimax, alibaba, moonshotai, huggingface)
  • enabled: Boolean indicating if the provider is active
  • description: (Optional) Description of the provider
  • apiKey: API key/token (if required)
  • baseURL: Endpoint URL. Can be customized for proxies or compatible services. Optional for non-custom LLM providers. Not used by the deepl provider, whose endpoint is selected automatically based on the API key.
  • model: For LLM providers, an object specifying the selected model and custom model options
  • temperature: (Optional, LLM translation providers only) Number controlling randomness/creativity in AI responses. Lower values (e.g., 0) are more deterministic, higher values are more creative. Leave empty to use provider default.
  • providerOptions: (Optional, LLM translation providers only) Object containing provider-specific advanced options (e.g., for enabling special API features). This is a JSON object passed directly to the provider; see provider documentation for supported keys.
  • connectionOptions: (Optional) Object containing provider-specific connection settings (e.g., AWS region for Bedrock)

Example configuration schema:

providersConfig: [
  {
    id: 'openai-default',
    name: 'OpenAI',
    provider: 'openai',
    enabled: true,
    description: 'Provides models like GPT-4o',
    apiKey: 'sk-xxxx',
    baseURL: 'https://api.openai.com/v1',
    model: { model: 'gpt-4.1-mini', isCustomModel: false, customModel: null },
    temperature: 0.7, // Optional: controls randomness for translation
    providerOptions: { // Optional: provider-specific advanced options
      "reasoning_mode": true
    },
    connectionOptions: { // Optional: provider-specific connection settings
      "region": "us-east-1"
    },
  },
  {
    id: 'minimax-default',
    name: 'MiniMax',
    provider: 'minimax',
    enabled: true,
    description: 'MiniMax AI platform offering advanced MiniMax-M2 models for text generation',
    apiKey: 'your-minimax-api-key',
    model: { model: 'MiniMax-M2.7', isCustomModel: false, customModel: null },
  },
  // ...additional providers
]

The schema is validated using Zod, ensuring each provider's configuration matches the expected structure and that both id and name are unique. Each feature (translate, selection toolbar, input translation, video subtitles) independently selects its own provider via the providerId field in the respective feature configuration. Custom AI actions in the selection toolbar also specify their own providerId to select the LLM provider for processing.
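The uniqueness constraints on id and name can be illustrated with a small check. This is a sketch only; the extension's real validation is a Zod schema, and the ProviderConfig shape below is abbreviated:

```typescript
interface ProviderConfig {
  id: string
  name: string
  provider: string
  enabled: boolean
}

// Returns the duplicate ids and names found, mirroring the schema's
// requirement that both `id` and `name` be unique (sketch only).
function findDuplicates(providers: ProviderConfig[]): { ids: string[], names: string[] } {
  const seenIds = new Set<string>()
  const seenNames = new Set<string>()
  const ids: string[] = []
  const names: string[] = []
  for (const p of providers) {
    if (seenIds.has(p.id)) ids.push(p.id)
    seenIds.add(p.id)
    if (seenNames.has(p.name)) names.push(p.name)
    seenNames.add(p.name)
  }
  return { ids, names }
}
```

A configuration that reuses an id would be rejected by validation even if the names differ.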

Note: As of schema version 54, TTS (text-to-speech) is no longer part of the feature provider system. TTS is configured separately with built-in Edge TTS as the only synthesis engine. See the TTS Configuration section below for details.

Note: The provider type names have changed. Use the updated keys listed above when configuring providers.

Temperature and Provider Options#

  • temperature is only used for LLM translation providers. It controls the creativity of AI-generated translations. If omitted, the provider's default is used.
  • providerOptions is a JSON object for advanced, provider-specific settings (such as enabling special modes or features). When provider options are not explicitly set (undefined), the system automatically applies recommended defaults based on the selected model for better out-of-the-box behavior. An explicit empty object ({}) means "no options" and overrides any defaults. As of version 1.31.1, runtime defaults are applied at execution time when no user options are saved, so recommended settings take effect without manual configuration. The placeholder text in the UI shows the recommended defaults for the currently selected model. See the Vercel AI SDK documentation or your provider's official docs for supported options.
  • Option Aliases for OpenAI-Compatible Providers: For OpenAI-compatible providers (including custom providers, Volcengine, MiniMax, Alibaba Cloud, Moonshot AI, and Hugging Face), common snake_case option aliases are automatically normalized to their camelCase equivalents. Specifically, reasoning_effort is automatically converted to reasoningEffort, and verbosity is automatically converted to textVerbosity. Users can use either the snake_case or camelCase form when configuring provider options, and the system will handle the conversion automatically. If both forms are present in the configuration, the canonical camelCase value takes precedence.
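The alias normalization described above can be sketched as follows. The mapping table reflects the two aliases named in the text; the function shape is illustrative, not the extension's actual code:

```typescript
// snake_case aliases and their canonical camelCase equivalents.
const OPTION_ALIASES: Record<string, string> = {
  reasoning_effort: 'reasoningEffort',
  verbosity: 'textVerbosity',
}

function normalizeProviderOptions(
  options: Record<string, unknown>,
): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(options)) {
    const canonical = OPTION_ALIASES[key] ?? key
    // If both forms are present, the canonical camelCase value wins.
    if (canonical !== key && canonical in options) continue
    result[canonical] = value
  }
  return result
}
```

With this rule, `{ "reasoning_effort": "high" }` and `{ "reasoningEffort": "high" }` produce the same normalized options.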

Referencing Providers#

Providers are referenced by their id (not name) in each feature's configuration. Each feature (translate, selection toolbar, input translation, video subtitles) independently selects its own provider, for example:

translate: {
  providerId: 'openai-default',
  ...
},
selectionToolbar: {
  features: {
    translate: { providerId: 'openai-default' },
  },
  customActions: [
    {
      id: 'custom-action-1',
      name: 'Summarize',
      providerId: 'openai-default',
      icon: 'tabler:sparkles',
      systemPrompt: 'You are a helpful assistant.',
      prompt: 'Summarize the following text: {{selection}}',
      outputSchema: [{ id: 'summary', name: 'Summary', type: 'string' }],
    },
  ],
  ...
},
inputTranslation: {
  providerId: 'google-translate-default',
  ...
},
videoSubtitles: {
  providerId: 'microsoft-translate-default',
  ...
},

Make sure to use the correct provider ID for translation-only providers (e.g., google-translate-default, microsoft-translate-default, deepl-default).

Note: TTS does not require a providerId field. As of schema version 54, TTS is configured separately with built-in Edge TTS. See the TTS Configuration section for details.

Managing API Providers in the UI#

The API Providers configuration is managed through a dedicated page in the extension's options UI. Each provider appears as a configuration card displaying its logo, name, and description. Users can:

  • Reorder providers using drag-and-drop. Drag a provider card up or down to change its order. The UI provides clear visual feedback (the dragged card floats with a shadow), and the list auto-scrolls when dragging near the edges. The order is saved immediately and is reflected in the configuration.
  • Enable or disable each provider using a toggle switch.
  • Enter or update the API key for each provider. The input field supports toggling between masked and visible text, and uses real-time validation (with Zod) for fields like baseURL.
  • Set a custom base URL for the provider. The base URL input is validated and provides immediate feedback if the value is not a valid URL. For non-custom LLM providers, this field is optional and indicated as such in the UI. For the deepl provider, the base URL field is hidden in the UI as the endpoint is automatically determined based on the API key.
  • See provider-specific descriptions and guidance, localized via i18n. Examples:
    • DeepL: "Official DeepL API."
    • Volcengine: "ByteDance's cloud AI platform offering Doubao models for translation."
    • MiniMax: "MiniMax AI platform offering advanced MiniMax-M2 models for text generation."
    • Alibaba Cloud: "Alibaba Cloud's Qwen model series with advanced reasoning and multilingual capabilities."
    • Moonshot AI: "Moonshot AI's Kimi model series with strong reasoning and long-context capabilities."
    • Hugging Face: "Hugging Face Inference API providing access to thousands of open-source models."
  • Test API connectivity directly from the UI using the Test Connection button.
  • Add or remove providers (including Volcengine, MiniMax, Custom Provider, OpenRouter, Ollama, Edge TTS, or any supported type) using the "Add Provider" dialog. Providers are grouped by type (LLM, Custom Provider, pure translation, TTS) and the dialog auto-selects the newly added provider. The dialog ensures unique provider names and provides theme-aware logos.
  • Delete providers with confirmation dialog. When deleting a provider, a confirmation dialog warns: "Features using this provider will be automatically reassigned to another available provider." The system enforces that at least one LLM translation provider must remain. If the deleted provider is assigned to any features (including custom AI actions), the system automatically reassigns those features to the next available compatible enabled provider. If deletion would leave no compatible provider for a required feature, the action is blocked and a clear error message is shown.
  • Select a model per provider, with a "use custom model" toggle and model dropdown. Model lists are specific to each provider and are regularly updated to reflect the latest available models. Default models include:
    • Volcengine: doubao-seed-1-6-flash-250828
    • MiniMax: MiniMax-M2.7, with MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, MiniMax-M2, and MiniMax-M2-Stable also available
    • Alibaba Cloud: qwen3.5-flash
    • Moonshot AI: kimi-k2
    • Hugging Face: Qwen/Qwen3-32B
  • For custom (OpenAI-compatible) providers, a "Fetch available models" button is available next to the custom model input. This button retrieves the list of models from the provider's /models endpoint (requires baseURL and API key), displays them in a popover, and allows the user to select a model, which is then auto-filled into the custom model field. This helps users quickly pick valid models and avoid configuration errors. The fetch flow includes improved error handling and localized messages for loading, errors, and retry actions.
  • Assign features to providers via the "Feature Providers" collapsible section: For each provider, this section lets users toggle which features use the provider.
    • Only features compatible with the provider type are shown (e.g., LLM providers show translate, videoSubtitles, selectionToolbar.translate, inputTranslation, and Language Detection; translation-only providers show only translate and videoSubtitles).
    • Toggles are disabled if the feature is already assigned to this provider (indicating the current assignment). Users can toggle ON to assign a feature to the provider but cannot toggle OFF directly; instead, they must assign the feature to another provider.
    • Feature labels use i18n keys from options.general.featureProviders.features.*. Feature keys containing dots (like "selectionToolbar.translate") are mapped to i18n-safe keys with underscores (like "selectionToolbar_translate") via the FEATURE_KEY_I18N_MAP constant.
  • Feature assignment badge: Each provider card displays a badge showing the number of features assigned to it (e.g., "2 features") using the i18n key options.apiProviders.badges.featureCount. Hovering over the badge displays a tooltip listing all assigned features by their localized names. The badge count includes Language Detection assignments (when mode is LLM and the provider is selected). This badge replaces the previous "Translate" badge that indicated the default translation provider.
  • Configure advanced options for LLM translation providers: For providers that support LLM-based translation, an "Advanced Options" collapsible section is available in the provider config form. This section includes:
    • Temperature: A numeric input controlling the randomness/creativity of AI-generated translations. Lower values (e.g., 0) are more deterministic; higher values are more creative. Leave empty to use the provider's default.
    • Provider Options: A JSON code editor for entering provider-specific advanced options (such as enabling special API features). The editor supports syntax highlighting and validation. Invalid JSON is highlighted and an error message is shown. A link to the Vercel AI SDK documentation is provided for reference.
    • Provider Options Recommendations: For models that have recommended provider options, a sparkles icon appears next to the model selector. Clicking this icon opens a popover showing the recommended JSON configuration for the selected model. Users can review the recommendation and manually click "Apply" to apply these options to their configuration. The sparkles icon briefly highlights when a model is first selected that matches a recommendation. Empty provider options remain empty unless the user explicitly applies a recommendation—the system does not auto-apply recommended options.

Configuration changes are managed reactively using Jotai atoms, ensuring that updates in the UI are immediately reflected in the underlying configuration. All configuration changes, including drag-and-drop reordering, are applied optimistically in the UI and are written to storage in a serialized, race-condition-safe manner. The scroll position remains stable after reordering.
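The deletion-reassignment rule described above can be sketched as below. The `kind` field and function shape are illustrative; the real implementation distinguishes feature compatibility in more detail:

```typescript
interface Provider {
  id: string
  enabled: boolean
  kind: 'llm' | 'translation-only'
}

// Sketch: when a provider is deleted, each feature assigned to it is
// moved to the next enabled provider of a compatible kind. A null
// result means no compatible provider remains, so deletion is blocked.
function pickReplacement(
  deletedId: string,
  providers: Provider[],
  requiredKind: 'llm' | 'translation-only',
): string | null {
  const candidate = providers.find(
    p => p.id !== deletedId && p.enabled && p.kind === requiredKind,
  )
  return candidate ? candidate.id : null
}
```

This matches the documented behavior: reassignment is automatic when a compatible provider exists, and the delete action is blocked otherwise.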

Centralized Feature Provider Management#

In addition to assigning features within each provider's configuration form, users can manage all feature-to-provider assignments from a centralized Feature Providers page and Language Detection page in the General settings.

Feature Providers Configuration#

This page provides a comprehensive view of which provider is assigned to each feature:

  • Page Translation: Select the provider for translating web page content
  • Selection Toolbar Translation: Choose the provider for inline translation via the selection toolbar
  • Input Translation: Select the provider for translating input fields
  • Video Subtitles: Choose the provider for translating video subtitles
  • Custom AI Actions: Select the provider for each user-defined custom AI action in the selection toolbar

Note: TTS is not shown in the Feature Providers page because it is configured separately using the built-in Edge TTS engine, not through the provider selection system.

Each feature selector displays a warning if the selected provider requires an API key that has not been configured. The centralized view makes it easy to review and update feature assignments across all providers in one place. The i18n keys for feature labels are defined under options.general.featureProviders.features.*.

Language Detection Configuration#

The Language Detection configuration in General settings centralizes language detection for auto-translate, skip-languages, and TTS features. Users can:

  • Choose detection mode: Select between Basic (browser-based using the franc library) or LLM (AI-powered for higher accuracy)
  • Select an LLM provider: When LLM mode is enabled, select which LLM provider to use for language detection. The provider must be enabled and configured with an API key
  • Status indicator: A visual indicator shows the current status:
    • Orange: "Configure an LLM provider first" (no LLM providers available)
    • Blue: "Recommend enabling LLM" (basic mode active)
    • Green: "LLM detection enabled" (LLM mode active with a provider selected)

Language Detection is used by auto-translate (to detect page language), skip-languages (to determine if translation should be skipped), and TTS (for voice selection). When LLM mode is enabled, detection runs only when needed. If LLM detection fails, the system automatically falls back to basic detection and notifies the user.
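The fallback behavior can be sketched as follows, where `detectWithLLM` and `detectBasic` are hypothetical stand-ins for the extension's two detectors:

```typescript
// Sketch of the documented fallback: try LLM detection first, and fall
// back to basic (franc-style) detection if the LLM call fails.
async function detectLanguage(
  text: string,
  detectWithLLM: (t: string) => Promise<string>,
  detectBasic: (t: string) => string,
): Promise<string> {
  try {
    return await detectWithLLM(text)
  }
  catch {
    // LLM detection failed; the real extension also notifies the user here.
    return detectBasic(text)
  }
}
```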

Example: Setting an API Key, Base URL, Model, and Testing Connection#

  1. Open the extension's options page and navigate to API Providers.
  2. Locate or add the provider you want to configure (e.g., DeepL, Volcengine, MiniMax, Alibaba Cloud, Moonshot AI, Hugging Face, Gemini, Anthropic, Custom Provider, Tensdaq, OpenRouter, Ollama, Edge TTS).
  3. Enter your API key in the "API Key" field (required for most providers; not required for Ollama or Edge TTS).
  4. (Optional) Set a custom "Base URL". The input is validated in real time; invalid URLs are highlighted and an error message is shown.
    • DeepL: the base URL field is not shown; the endpoint is selected automatically based on whether your API key ends with :fx (free tier uses https://api-free.deepl.com, pro tier uses https://api.deepl.com).
    • Volcengine: the baseURL defaults to https://ark.cn-beijing.volces.com/api/v3.
    • MiniMax, Alibaba Cloud, Moonshot AI, Hugging Face, and Edge TTS: no baseURL is required for standard usage.
  5. Select the desired model from the dropdown, or toggle "use custom model" and enter a custom model name.
    • Custom (OpenAI-compatible) providers: click the "Fetch available models" button next to the custom model input to retrieve and select a model from the provider's /models endpoint. This requires both a valid baseURL and API key. The available models are shown in a popover; selecting one auto-fills the custom model field.
    • MiniMax: select from MiniMax-M2.7, MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, MiniMax-M2, or MiniMax-M2-Stable.
    • Alibaba Cloud: available models include qwen3-max, qwen3.5-plus, qwen3.5-flash, and others.
    • Moonshot AI: available models include kimi-k2, kimi-k2.5, and various thinking models.
    • Hugging Face: popular models include Qwen/Qwen3-32B, meta-llama/Llama-3.3-70B-Instruct, deepseek-ai/DeepSeek-V3.1, and others.
    • Edge TTS does not use models; it uses voice configurations instead.
  6. (For LLM translation providers only): Expand the Advanced Options section to configure:
    • Temperature: Enter a value to control the randomness/creativity of AI-generated translations. Lower values (e.g., 0) are more deterministic; higher values are more creative. Leave empty to use the provider's default.
    • Provider Options: Enter provider-specific advanced options as JSON. The editor validates your input. A link to the Vercel AI SDK documentation is provided for reference.
    • Recommendations: If your selected model has recommended provider options, a sparkles icon appears next to the model selector. Click it to preview the recommended configuration, then manually click "Apply" to use these options.
  7. (Optional): Expand the Feature Providers section to assign features to this provider. Toggle ON any feature you want to use with this provider. Toggles that are disabled indicate features already assigned to this provider.
  8. Click the "Test Connection" button to verify connectivity.
    • For most providers, the button is disabled if the API key is empty.
    • For DeepLX, Ollama, and Edge TTS, the button is enabled even if the API key is empty (DeepLX can work without authentication; Ollama and Edge TTS do not require an API key). DeepL requires an API key, so its button is disabled while the key field is empty.
    • When testing, a loading animation is shown.
    • On success, a green check icon appears.
    • On failure, a red X icon appears and error details are shown via a Toast notification.
  9. Changes are saved automatically, including provider order.

DeepL Provider Details#

  • DeepL is the official DeepL API provider, supporting high-quality neural machine translation.
  • Provider type: Pure translation provider (non-LLM), used for translation features only.
  • Key characteristic: Does NOT require a baseURL field. The endpoint is automatically determined based on the API key:
    • API keys ending with :fx use the free tier endpoint: https://api-free.deepl.com
    • All other API keys use the pro tier endpoint: https://api.deepl.com
  • Native batching: Implements native text[] batching capability, allowing multiple texts to be translated in a single API request for improved performance.
  • Language handling: Automatically normalizes source languages (zh, zh-TW) to ZH and maps Chinese target languages to ZH-HANS (Simplified) or ZH-HANT (Traditional) as required by the DeepL API. Supports auto for automatic source language detection.
  • Background fetch: Uses the extension's background fetch proxy for improved security and CSP compliance.
  • Routing: Routed through the batch queue architecture for optimized request handling.
  • Default configuration: Available as deepl-default in the default provider configuration.
  • To use DeepL, add a provider with provider: 'deepl' and enter your API key from DeepL Pro API. No baseURL configuration is needed.
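The endpoint rule above amounts to a one-line check (sketch, assuming only the :fx suffix distinguishes the two tiers):

```typescript
// Free-tier DeepL API keys end with ':fx'; all other keys are pro-tier.
function deeplEndpoint(apiKey: string): string {
  return apiKey.endsWith(':fx')
    ? 'https://api-free.deepl.com'
    : 'https://api.deepl.com'
}
```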

DeepLX Provider Details#

  • DeepLX is the unofficial/reverse-engineered DeepL API provider.
  • Provider type: Pure translation provider (non-LLM), used for translation features only.
  • Key difference from official DeepL: DeepLX requires a baseURL field and supports custom endpoints, token placement via {{apiKey}} placeholder, and flexible authentication patterns. See the DeepLX Flexible BaseURL and Token Placement section below for details.
  • To use DeepLX, add a provider with provider: 'deeplx', specify a baseURL, and optionally provide an API key if required by your endpoint.

Volcengine (Doubao) Provider Details#

  • Volcengine is ByteDance's official cloud AI platform, offering Doubao models for translation.
  • Default model: doubao-seed-1-6-flash-250828, with lite and flash variants also available.
  • The default base URL is https://ark.cn-beijing.volces.com/api/v3.
  • Volcengine uses an OpenAI-compatible API, so it is wired through the same client as other compatible providers.
  • No migration is needed to enable Volcengine; simply add your API key and use the default base URL.

MiniMax Provider Details#

  • MiniMax is an advanced AI platform offering the MiniMax-M2.7 and related models for translation and text generation.
  • Default model: MiniMax-M2.7, with MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, MiniMax-M2, and MiniMax-M2-Stable also available.
  • No baseURL configuration is required for standard usage; simply provide your API key from the MiniMax Platform.
  • MiniMax supports both Object Generation and Text Generation tasks.
  • The provider is integrated via the vercel-minimax-ai-provider package and follows the standard provider configuration pattern.
  • To use MiniMax, add a provider with provider: 'minimax' and enter your API key. Model selection is available in the UI.

Alibaba Cloud (Bailian) Provider Details#

  • Alibaba Cloud (Bailian) is Alibaba's AI platform offering the Qwen model series with advanced reasoning and multilingual capabilities.
  • Default model: qwen3.5-flash, with other models including qwen3-max, qwen3.5-plus, qwen-plus, qwen-flash, qwen-turbo, qwq-plus, qwen3-coder-plus, and third-party models like deepseek-v3.2, deepseek-v3.1, deepseek-r1, deepseek-v3, kimi-k2.5, MiniMax-M2.5, and glm-5.
  • No baseURL configuration is required for standard usage; simply provide your API key from Alibaba Cloud Model Studio.
  • The provider is integrated via the @ai-sdk/alibaba package and follows the standard provider configuration pattern.
  • To use Alibaba Cloud, add a provider with provider: 'alibaba' and enter your API key. Model selection is available in the UI.

Moonshot AI Provider Details#

  • Moonshot AI offers the Kimi model series with strong reasoning and long-context capabilities.
  • Default model: kimi-k2, with other models including kimi-k2.5, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k, kimi-k2-thinking, kimi-k2-thinking-turbo, and kimi-k2-turbo.
  • No baseURL configuration is required for standard usage; simply provide your API key from the Moonshot AI Platform.
  • The provider is integrated via the @ai-sdk/moonshotai package and follows the standard provider configuration pattern.
  • To use Moonshot AI, add a provider with provider: 'moonshotai' and enter your API key. Model selection is available in the UI.

Hugging Face Provider Details#

  • Hugging Face provides access to thousands of open-source models via the Hugging Face Inference API.
  • Default model: Qwen/Qwen3-32B, with popular models including meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.1-70B-Instruct, meta-llama/Llama-3.3-70B-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct, deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3-0324, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Distill-Llama-70B, Qwen/Qwen3-Coder-480B-A35B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct, google/gemma-3-27b-it, and moonshotai/Kimi-K2-Instruct.
  • No baseURL configuration is required for standard usage; simply provide your API key from Hugging Face.
  • The provider is integrated via the @ai-sdk/huggingface package and follows the standard provider configuration pattern.
  • To use Hugging Face, add a provider with provider: 'huggingface' and enter your API key. Model selection is available in the UI.

Edge TTS Provider Details#

  • Edge TTS is Microsoft's Edge online speech synthesis service, offering text-to-speech functionality without requiring an API key (free service).
  • Key characteristic: Built-in TTS provider; does NOT require an API key or provider configuration
  • Architectural difference: Edge TTS synthesis is executed in the background script (src/entrypoints/background/edge-tts.ts) rather than in the main thread, improving performance and avoiding blocking the user interface
  • WebSocket-based synthesis: Uses WebSocket connections to Microsoft's Edge TTS service with endpoint token management for resilience and automatic token caching
  • SSML generation: Text is converted to SSML format with rate, pitch, and volume parameters applied
  • Chunked text splitting: Automatically splits long text into UTF-8 byte-aligned chunks (max 1024 bytes per chunk, up to 10 chunks) to handle long content and respect API limits
  • Circuit breaker: Includes circuit breaker pattern for handling rate limits and service disruptions. The circuit opens after repeated failures and closes automatically after a cooldown period
  • Per-language voice configuration: The TTS configuration supports per-language voice mapping (languageVoices) with automatic language detection using the franc library, allowing users to assign different voices for different languages. A defaultVoice serves as fallback when language detection fails or when a language is not explicitly mapped.
  • Voice configuration: Configure TTS using:
    • defaultVoice: Fallback voice when language not mapped or detection fails (default: en-US-GuyNeural)
    • languageVoices: Per-language voice mapping object using ISO 639-3 codes (e.g., { eng: 'en-US-GuyNeural', cmn: 'zh-CN-YunxiNeural' })
    • rate: Speech rate from -100 to +100 (default: 0, converted to percentage for SSML)
    • pitch: Voice pitch from -100 to +100 (default: 0, converted to Hz for SSML)
    • volume: Audio volume from -100 to +100 (default: 0, converted to percentage for SSML)
  • Supported voices: Supports 40+ voices across multiple languages, including:
    • Chinese (female voices like 晓晓, 晓伊; male voices like 云希, 云扬)
    • English (Jenny, Aria, Guy, Davis)
    • Japanese (七海, 圭太)
    • Korean (선히, 인준)
  • Default inclusion: Edge TTS is the built-in TTS provider and does not require adding to providersConfig
  • To use Edge TTS, configure voice and speech parameters in the TTS settings page (src/entrypoints/options/pages/text-to-speech/tts-config.tsx)
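The chunking rule above can be sketched as follows. This version splits on code-point boundaries only; the real implementation may also avoid splitting mid-word:

```typescript
// Split text into UTF-8 chunks of at most `maxBytes` bytes without
// cutting a character in half, capped at `maxChunks` chunks (sketch of
// the documented 1024-byte / 10-chunk limits).
function splitUtf8Chunks(text: string, maxBytes = 1024, maxChunks = 10): string[] {
  const encoder = new TextEncoder()
  const chunks: string[] = []
  let current = ''
  let currentBytes = 0
  for (const char of text) { // iterates by code point, not UTF-16 unit
    const charBytes = encoder.encode(char).length
    if (currentBytes + charBytes > maxBytes) {
      chunks.push(current)
      if (chunks.length === maxChunks) return chunks
      current = char
      currentBytes = charBytes
    }
    else {
      current += char
      currentBytes += charBytes
    }
  }
  if (current) chunks.push(current)
  return chunks.slice(0, maxChunks)
}
```

Byte alignment matters for CJK text, where a single character occupies three UTF-8 bytes.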

TTS Configuration#

TTS has been refactored from a provider-based system to a built-in Edge TTS engine. As of schema version 54, TTS does NOT use the feature provider system. Instead, Edge TTS serves as the built-in synthesis engine with the following configuration structure:

tts: {
  defaultVoice: 'en-US-GuyNeural',
  languageVoices: {
    eng: 'en-US-GuyNeural',
    cmn: 'zh-CN-YunxiNeural',
    jpn: 'ja-JP-KeitaNeural',
    // ... per-language voice mappings (ISO 639-3 codes)
  },
  rate: 0, // -100 to +100
  pitch: 0, // -100 to +100
  volume: 0, // -100 to +100
}

Configuration fields:

  • defaultVoice: Fallback voice when language not mapped or detection fails (default: en-US-GuyNeural)
  • languageVoices: Per-language voice mapping object using ISO 639-3 codes (e.g., { eng: 'en-US-GuyNeural', cmn: 'zh-CN-YunxiNeural' })
  • rate: Speech rate from -100 to +100 (default: 0, converted to percentage for SSML)
  • pitch: Voice pitch from -100 to +100 (default: 0, converted to Hz for SSML)
  • volume: Audio volume from -100 to +100 (default: 0, converted to percentage for SSML)
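As a sketch, the three numeric settings map onto SSML `<prosody>` attributes roughly like this. The percentage/Hz units follow the conversions stated above, but the exact string format used internally is an assumption:

```typescript
// Format the -100..+100 settings as SSML prosody attributes: rate and
// volume as signed percentages, pitch as signed Hz (illustrative only).
function prosodyAttrs(rate: number, pitch: number, volume: number): string {
  const signed = (n: number) => (n >= 0 ? `+${n}` : `${n}`)
  return `rate="${signed(rate)}%" pitch="${signed(pitch)}Hz" volume="${signed(volume)}%"`
}
```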

Note: As of schema version 62 (v062), language detection for TTS is configured globally via the Language Detection configuration in General settings, not per-feature. The per-feature detectLanguageMode field was removed in v062. See the Language Detection Configuration section for details.

The TTS settings page provides:

  • Language-voice grid: Select a language and assign a specific voice for when TTS detects that language
  • Default voice selector: Choose a fallback voice when language detection fails or when a language is not explicitly mapped
  • Voice search and preview: Search for voices and preview them with sample text
  • Rate/Pitch/Volume controls: Fine-tune speech characteristics with -100 to +100 range

Migration from v053 to v054: Existing TTS configurations are automatically migrated:

  • Old providerId field is removed (TTS is no longer part of the feature provider system)
  • Old speed parameter (0.25-4.0) is converted to rate (-100 to +100)
  • Old voice becomes the defaultVoice
  • Per-language voices are initialized with sensible defaults for all supported languages
  • New pitch and volume fields are added (default: 0)
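One plausible sketch of the speed-to-rate conversion is below. The actual migration formula is not documented here, so treat this mapping as hypothetical:

```typescript
// Hypothetical v053 → v054 migration: map the old multiplier scale
// (1.0 = normal, range 0.25–4.0) onto the new -100..+100 offset scale,
// clamping values that would fall outside the new range.
function speedToRate(speed: number): number {
  const rate = Math.round((speed - 1) * 100)
  return Math.max(-100, Math.min(100, rate))
}
```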

DeepLX Flexible BaseURL and Token Placement#

Read Frog supports flexible DeepLX baseURL configuration to work with a wide variety of DeepLX providers. You can use the {{apiKey}} placeholder in your DeepLX baseURL to insert your API token at any position (path, query, or subdomain). This enables compatibility with providers that require tokens in different URL formats.

Examples:

  • Token in path: https://api.deeplx.com/{{apiKey}}/translate
  • Token as query parameter: https://api.deeplx.com/v1/translate?token={{apiKey}}
  • Token in subdomain: https://{{apiKey}}.api.deeplx.com/translate

Special logic for https://api.deeplx.org:

  • If you use this endpoint, the API key (if provided) is inserted between .org and /translate.
    • Without token: https://api.deeplx.org/translate
    • With token: https://api.deeplx.org/your-token/translate

Automatic /translate appending:

  • For most DeepLX providers, if your baseURL does not end with /translate, Read Frog will append it automatically.
  • If you provide an API key and no {{apiKey}} placeholder, the key is inserted as a path segment before /translate.

Providers that do not require tokens:

  • You can leave the API key field empty for providers that do not require authentication (e.g., https://api.deeplx.org).

Error handling:

  • If you use the {{apiKey}} placeholder but do not provide an API key, you will see an error: "API key is required when using {{apiKey}} placeholder in DeepLX baseURL".

For more details and configuration examples, see the DeepLX Provider Documentation.

API Connectivity Test Feature#

The UI includes a Test Connection button for each provider. This feature allows users to verify that their API key and base URL are correctly configured and that the extension can successfully connect to the provider. The test uses provider-specific methods to validate connectivity and provides immediate feedback in the UI:

  • Button states:
    • Disabled if the API key is empty. Exceptions: DeepLX, Ollama, and Edge TTS do not require an API key, so the button is always enabled for them; DeepL does require an API key, so its button is disabled when the field is empty
    • Shows "Testing..." and loading animation while in progress
    • Displays success (green check) or error (red X) icon after test
  • Error details:
    • If the test fails, error details are shown via a Toast notification, using the error message from the provider's API response
  • Localization:
    • The button label and status messages are localized in English, Japanese, Korean, Simplified Chinese, and Traditional Chinese
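For OpenAI-compatible providers, a common way to test connectivity is to list models from the provider's /models endpoint. The sketch below is an assumption about how such a request could be built (the function name and shape are illustrative, not Read Frog's actual implementation):

```typescript
// Illustrative sketch: build a GET /models request for an
// OpenAI-compatible provider. Names are assumptions, not the
// extension's actual code.
interface ConnectivityRequest {
  url: string;
  headers: Record<string, string>;
}

function buildModelsRequest(baseURL: string, apiKey?: string): ConnectivityRequest {
  const url = `${baseURL.replace(/\/$/, '')}/models`;
  const headers: Record<string, string> = {};
  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`; // standard OpenAI-style auth
  }
  return { url, headers };
}

// A test would then fetch(url, { headers }) and treat a 2xx
// response as success, surfacing the provider's error message otherwise.
```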

Integrating New Providers (DeepL, Volcengine, Gemini, Custom Provider, Anthropic, Groq, Tensdaq, OpenRouter, Ollama, Alibaba Cloud, Moonshot AI, Hugging Face, Edge TTS, and others)#

To add a new provider, users or developers can:

  • Add a new provider object to the providersConfig array with the appropriate provider type and settings. Supported types now include all providers listed above, including deepl, volcengine, minimax, alibaba, moonshotai, huggingface, tensdaq, openrouter, ollama, and edge-tts.
  • For Custom Provider endpoints, use the dedicated openai-compatible, siliconflow, ai302, tensdaq, volcengine, minimax, alibaba, moonshotai, or huggingface provider type and specify the required baseURL and (optionally) apiKey. For DeepL (deepl), no baseURL is required: the endpoint is selected automatically based on the API key. Volcengine uses an OpenAI-compatible endpoint at https://ark.cn-beijing.volces.com/api/v3 by default, OpenRouter uses https://openrouter.ai/api/v1 by default, and Ollama uses http://127.0.0.1:11434/api by default and does not require an API key. MiniMax, Alibaba Cloud, Moonshot AI, and Hugging Face do not require a baseURL for standard usage; just provide your API key from the respective platform.
  • Note on TTS: Edge TTS is the built-in text-to-speech provider and does not require adding to providersConfig. It is configured directly in the TTS settings page with voice and speech parameters, not via the provider system.
  • The UI will render a configuration card for each provider in the array, including logo and description if recognized.
  • Provider logo assets: When adding a new provider to src/utils/constants/providers.ts, import all provider logo SVGs or images using the ?url&no-inline query parameter. For example:
    import customProviderLogo from "@/assets/providers/custom-provider.svg?url&no-inline"
    import deeplxLogoDark from "@/assets/providers/deeplx-dark.svg?url&no-inline"
    
    This query parameter is required for provider logos to work correctly in content scripts with the background fetch proxy system. The ?url&no-inline ensures the asset is loaded as a URL string rather than being inlined, which allows the provider icon component to normalize the URL via browser.runtime.getURL(), proxy it through background fetch when needed (in content scripts with strict CSP), and render it via canvas with proper CSP compliance.
  • Model selection and custom model names are managed per provider in the model field (for LLM providers). Model lists and default selections are regularly updated and maintained centrally to reflect the latest available models for each provider. Current defaults and notable models:
    • Volcengine: default doubao-seed-1-6-flash-250828
    • MiniMax: default MiniMax-M2.7; also MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, MiniMax-M2, and MiniMax-M2-Stable
    • Alibaba Cloud: default qwen3.5-flash; models include qwen3-max, qwen3.5-plus, qwen-plus, qwen-flash, qwen-turbo, qwq-plus, qwen3-coder-plus, deepseek-v3.2, deepseek-v3.1, deepseek-r1, deepseek-v3, kimi-k2.5, MiniMax-M2.5, and glm-5
    • Moonshot AI: default kimi-k2; models include kimi-k2.5, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k, kimi-k2-thinking, kimi-k2-thinking-turbo, and kimi-k2-turbo
    • Hugging Face: default Qwen/Qwen3-32B; popular models include meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.1-70B-Instruct, meta-llama/Llama-3.3-70B-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct, deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3-0324, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Distill-Llama-70B, Qwen/Qwen3-Coder-480B-A35B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct, google/gemma-3-27b-it, and moonshotai/Kimi-K2-Instruct
    • OpenRouter: defaults deepseek/deepseek-chat-v3.1:free and x-ai/grok-4-fast:free
    • Ollama: defaults deepseek-v3 and gemma3:4b
    • OpenAI: latest models include gpt-5.4-pro, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, and gpt-5.3-chat-latest
    • Google: latest models include gemini-3.1-pro-preview and gemini-3-flash-preview
    • Anthropic: latest models include claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, and claude-sonnet-4-5
    • xAI: latest models include grok-4-1, grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-4-0709, and grok-4-latest
    • AWS Bedrock: latest models include us.anthropic.claude-opus-4-6-v1, us.anthropic.claude-opus-4-5-20251101-v1:0, us.anthropic.claude-haiku-4-5-20251001-v1:0, openai.gpt-oss-120b-1:0, openai.gpt-oss-20b-1:0, meta.llama3-2-11b-instruct-v1:0, meta.llama3-2-90b-instruct-v1:0, us.meta.llama3-2-11b-instruct-v1:0, and us.meta.llama3-2-90b-instruct-v1:0
    • Mistral: latest models include magistral-small-2507, magistral-medium-2507, and mistral-medium-2508
    • Together AI: primary model meta-llama/Llama-3.3-70B-Instruct-Turbo
    • Fireworks: latest models include accounts/fireworks/models/kimi-k2-thinking, accounts/fireworks/models/kimi-k2p5, and accounts/fireworks/models/minimax-m2
    • Cerebras: latest model zai-glm-4.7
  • For custom (OpenAI-compatible) providers, the UI provides a "Fetch available models" button to retrieve and select models directly from the provider's /models endpoint.
  • Configure per-feature provider selection via the options UI and the "Feature Providers" section within each provider's configuration form.
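As a sketch, two providersConfig entries might look like the following. The IDs, key values, and the inner shape of the model object are illustrative assumptions, not defaults shipped with the extension:

```typescript
// Illustrative providersConfig entries; IDs, keys, and the inner
// model field names are examples only.
const providersConfig = [
  {
    id: 'volcengine-default',
    name: 'Volcengine',
    provider: 'volcengine',
    enabled: true,
    apiKey: 'YOUR_ARK_API_KEY',
    baseURL: 'https://ark.cn-beijing.volces.com/api/v3', // documented default
    model: { model: 'doubao-seed-1-6-flash-250828' }, // inner field name assumed
  },
  {
    id: 'deepl-default',
    name: 'DeepL',
    provider: 'deepl',
    enabled: true,
    apiKey: 'YOUR_DEEPL_KEY', // no baseURL: endpoint chosen from the key
  },
];
```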

Custom Provider, OpenRouter, Ollama, Volcengine, Alibaba Cloud, Moonshot AI, and Hugging Face Providers#

Users can configure the extension to use any OpenAI-compatible translation service by adding a provider with provider: 'openai-compatible' (or one of the dedicated siliconflow, ai302, tensdaq, volcengine, minimax, alibaba, moonshotai, or huggingface types) and specifying the required baseURL and (optionally) apiKey. This allows integration with third-party or proxy services that implement the OpenAI API format. The UI provides dedicated options and icons for these providers, labeled "Custom Provider", "Volcengine", "MiniMax", "Alibaba Cloud", "Moonshot AI", or "Hugging Face" in the UI and configuration.
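A minimal custom-endpoint entry might look like this (values are illustrative; the post-migration openai-compatible type id is assumed):

```typescript
// Illustrative custom (OpenAI-compatible) provider entry.
const customProvider = {
  id: 'my-proxy',
  name: 'My LLM Proxy',
  provider: 'openai-compatible',
  enabled: true,
  baseURL: 'http://192.168.31.210:8090/v1', // any OpenAI-compatible endpoint,
                                            // including LAN/local addresses
  apiKey: 'optional-key',                   // optional for some services
};
```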

HTTP Endpoint Support (Firefox):

  • HTTP endpoints (including LAN/local addresses like http://192.168.31.210:8090/v1) are fully supported in Firefox
  • Firefox MV3 extensions include upgrade-insecure-requests in their default Content Security Policy (CSP), which would silently upgrade all HTTP URLs to HTTPS, breaking local HTTP endpoints
  • The extension overrides this default CSP with script-src 'self' 'wasm-unsafe-eval'; object-src 'self' (configured in wxt.config.ts) to allow HTTP requests to work correctly
  • HTTPS endpoints are unaffected and continue to work as expected

OpenRouter can be added as a provider with provider: 'openrouter', allowing access to a unified interface for multiple LLM providers with pay-per-use pricing. OpenRouter supports a wide range of models, and the extension provides default model selections. The API key and base URL for OpenRouter can be configured in the same way as other providers. The default baseURL is https://openrouter.ai/api/v1.

Ollama can be added as a provider with provider: 'ollama', allowing the extension to use locally deployed LLMs for translation. Ollama does not require an API key, and the baseURL defaults to http://127.0.0.1:11434/api. Users can select from available local models (e.g., gemma3:4b, deepseek-v3) and test the connection directly from the UI. Do not run the Ollama Chat graphical client alongside ollama serve to avoid port conflicts.

Tensdaq is also supported as a provider with provider: 'tensdaq', using an OpenAI-compatible endpoint at https://dashboard.x-aio.com/zh/register?ref=c356c1daba9a4641a18e by default.

MiniMax can be added as a provider with provider: 'minimax'. MiniMax requires an API key from the MiniMax Platform but does not require a baseURL for standard usage. The default model is MiniMax-M2.7, with MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, MiniMax-M2, and MiniMax-M2-Stable also available.

Alibaba Cloud can be added as a provider with provider: 'alibaba'. Alibaba Cloud requires an API key from Alibaba Cloud Model Studio but does not require a baseURL for standard usage. The default model is qwen3.5-flash, with a wide range of models including qwen3-max, qwen3.5-plus, and third-party models like deepseek-v3.2, kimi-k2.5, and glm-5.

Moonshot AI can be added as a provider with provider: 'moonshotai'. Moonshot AI requires an API key from the Moonshot AI Platform but does not require a baseURL for standard usage. The default model is kimi-k2, with models including kimi-k2.5, kimi-k2-thinking, and various context-length variants.

Hugging Face can be added as a provider with provider: 'huggingface'. Hugging Face requires an API key from Hugging Face but does not require a baseURL for standard usage. The default model is Qwen/Qwen3-32B, providing access to thousands of open-source models including popular options from Meta Llama, DeepSeek, Qwen, and more.

Provider-Specific Settings#

The primary configurable fields for each provider are:

  • id: Unique identifier (must be unique, used for referencing)
  • name: User-facing label (must be unique)
  • provider: Provider type (see full supported list above, including deepl, volcengine, minimax, alibaba, moonshotai, huggingface, tensdaq, openrouter, ollama, edge-tts, google-translate, microsoft-translate, and openai-compatible)
  • enabled: Boolean indicating if the provider is active
  • description: (Optional) Description of the provider
  • apiKey: API key/token (if required; not required for Ollama or Edge TTS)
  • baseURL: Endpoint URL (can be customized for proxies or compatible services; optional for non-custom LLM providers; defaults to http://127.0.0.1:11434/api for Ollama, https://ark.cn-beijing.volces.com/api/v3 for Volcengine; not required for DeepL, MiniMax, Alibaba Cloud, Moonshot AI, Hugging Face, or Edge TTS)
  • model: For LLM providers, an object specifying the selected model and custom model options
  • connectionOptions: (Optional) Object containing provider-specific connection settings (e.g., AWS region for Bedrock)
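The field list above can be summarized as a simplified type sketch. This is illustrative only; the extension's actual schema is richer and provider-specific, and the inner model field names are assumptions:

```typescript
// Simplified, illustrative sketch of a provider entry's shape.
interface ProviderConfig {
  id: string;                 // unique, used for referencing
  name: string;               // user-facing label, must be unique
  provider: string;           // e.g. 'openai', 'deepl', 'ollama', ...
  enabled: boolean;
  description?: string;
  apiKey?: string;            // omitted for Ollama / Edge TTS
  baseURL?: string;           // optional for most non-custom LLM providers
  model?: { model: string; customModel?: string }; // field names assumed
  connectionOptions?: Record<string, unknown>;     // e.g. { region: 'us-east-1' }
}
```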

Defaults:

  • DeepL: No baseURL required; endpoint automatically selected based on API key (:fx suffix = free tier, otherwise pro tier)
  • Volcengine: baseURL defaults to https://ark.cn-beijing.volces.com/api/v3
  • MiniMax: No baseURL required; just provide your API key from the MiniMax Platform
  • Alibaba Cloud: No baseURL required; just provide your API key from Alibaba Cloud Model Studio
  • Moonshot AI: No baseURL required; just provide your API key from the Moonshot AI Platform
  • Hugging Face: No baseURL required; just provide your API key from Hugging Face
  • OpenRouter: baseURL defaults to https://openrouter.ai/api/v1
  • Ollama: baseURL defaults to http://127.0.0.1:11434/api, no API key required
  • Tensdaq: baseURL defaults to https://dashboard.x-aio.com/zh/register?ref=c356c1daba9a4641a18e
  • Amazon Bedrock: connectionOptions defaults to { region: "us-east-1" } (region can be configured in the provider settings)
  • Google Translate: provider is google-translate, default ID is google-translate-default
  • Microsoft Translator: provider is microsoft-translate, default ID is microsoft-translate-default
  • Edge TTS: provider is edge-tts, default ID is edge-tts-default, no API key or baseURL required
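The DeepL endpoint rule above follows DeepL's documented key convention: free-tier keys end with :fx and use the api-free host. A sketch (the helper name is illustrative):

```typescript
// DeepL's documented convention: free-tier API keys end with ':fx'
// and use the api-free.deepl.com host; other keys use the pro host.
function deeplEndpoint(apiKey: string): string {
  return apiKey.endsWith(':fx')
    ? 'https://api-free.deepl.com/v2'
    : 'https://api.deepl.com/v2';
}
```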

Migration and Backward Compatibility#

When upgrading from previous versions, user configurations are automatically migrated to the new array-based providersConfig schema. The migration script converts legacy provider fields and IDs to the new format, including renaming:

  • geminigoogle
  • grokxai
  • amazonBedrockbedrock
  • openaiCompatibleopenai-compatible
  • google (translation) → google-translate
  • microsoft (translation) → microsoft-translate
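The renames above can be expressed as a lookup table. The structure below is an illustrative sketch, not the extension's actual migration code; the isTranslation flag disambiguates the legacy google/microsoft translation IDs from the LLM rename gemini → google:

```typescript
// Sketch of the legacy-ID renames listed above.
const LEGACY_RENAMES: Record<string, string> = {
  gemini: 'google',
  grok: 'xai',
  amazonBedrock: 'bedrock',
  openaiCompatible: 'openai-compatible',
};

const LEGACY_TRANSLATION_RENAMES: Record<string, string> = {
  google: 'google-translate',
  microsoft: 'microsoft-translate',
};

function migrateProviderType(type: string, isTranslation = false): string {
  if (isTranslation && type in LEGACY_TRANSLATION_RENAMES) {
    return LEGACY_TRANSLATION_RENAMES[type];
  }
  return LEGACY_RENAMES[type] ?? type; // unknown types pass through unchanged
}
```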

The migration ensures that per-feature provider configurations are properly set after upgrade. As of v023, Google Translate and Microsoft Translator providers are seeded automatically in the configuration during migration as google-translate-default and microsoft-translate-default. Volcengine can be added at any time without migration steps.

v052 to v053 migration:

  • Provider model configuration is unified from provider.models.read + provider.models.translate to a single provider.model field
  • The obsolete read config block is removed
  • Selection toolbar now includes per-feature provider configuration for translate feature
  • Input translation and video subtitles features now include their own providerId field

v053 to v054 migration:

  • TTS configuration is refactored from provider-based to built-in Edge TTS
  • Old tts.providerId field is removed (TTS is no longer part of the feature provider system)
  • Old tts.model and tts.voice fields are replaced with defaultVoice and languageVoices
  • Old tts.speed parameter (0.25-4.0) is converted to rate (-100 to +100)
  • New pitch and volume fields are added (default: 0)
  • Per-language voice mappings are initialized with sensible defaults for all supported languages (using ISO 639-3 codes)
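The speed-to-rate conversion is not spelled out here; one plausible piecewise-linear mapping, offered purely as an assumption (the actual migration formula may differ), maps 1.0 to 0 with the range endpoints at -100 and +100:

```typescript
// ASSUMED mapping only: converts speed (0.25-4.0) to rate (-100 to +100)
// piecewise-linearly so that 0.25 -> -100, 1.0 -> 0, and 4.0 -> +100.
// The real migration formula may differ.
function speedToRate(speed: number): number {
  const clamped = Math.min(4.0, Math.max(0.25, speed));
  return clamped >= 1
    ? Math.round(((clamped - 1) / 3) * 100)     // 1..4   -> 0..100
    : Math.round(((clamped - 1) / 0.75) * 100); // 0.25..1 -> -100..0
}
```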

v054 to v055 migration:

  • Adds detectLanguageMode field to TTS configuration (default: 'basic')

v055 to v056 migration:

  • Adds customFeatures array to selectionToolbar configuration (default: empty array)
  • Custom features enable users to create their own AI-powered actions for selected text with configurable name, icon, prompt, output schema, and provider selection

v059 to v060 migration:

  • Renames selectionToolbar.customFeatures to selectionToolbar.customActions
  • This is a terminology change only; all existing custom features are automatically renamed to custom actions during migration

v060 to v061 migration:

  • Adds speaking field to custom action output fields (default: false)
  • Auto-enables speaking for dictionary action fields named "Term" or "Context"

v061 to v062 migration:

  • Unifies language detection into root-level languageDetection configuration with mode ('basic' | 'llm') and providerId fields
  • Removes translate.page.enableLLMDetection and translate.page.enableSkipLanguagesLLMDetection fields
  • Removes tts.detectLanguageMode field
  • Language detection is now centralized for auto-translate, skip-languages, and TTS features

v063 to v064 migration:

  • Renames prompt tokens for clarity:
    • Translate/subtitle prompts: {{targetLang}}{{targetLanguage}}, {{title}}{{webTitle}}, {{summary}}{{webSummary}}
    • Selection toolbar custom action prompts: {{context}}{{paragraphs}}, {{targetLang}}{{targetLanguage}}, {{title}}{{webTitle}}
  • Updates dictionary template wording from "context" to "paragraphs" (both English and Chinese prompts, field names, and descriptions)
  • Auto-migrates existing user configs (translate custom prompts, video subtitle custom prompts, selection toolbar custom actions)

v065 to v066 migration:

  • Removes the deprecated selectionToolbar.features.vocabularyInsight configuration
  • The vocabulary insight feature has been fully removed and replaced by Custom AI Action's Dictionary feature
  • Converts translate.page.shortcut from string[] to portable hotkey string

Custom AI Actions#

As of schema version 56 (v056), users can create Custom AI Actions for the selection toolbar. These user-defined actions appear alongside the built-in translate button when text is selected, providing personalized AI-powered tools for selected text. Custom actions are fully configurable and use the same provider system as built-in features.

Concept#

Custom AI actions allow users to:

  • Define personalized AI actions that appear in the selection toolbar
  • Configure action name, icon (using Iconify icons), system prompt, and user prompt
  • Use token placeholders like {{selection}}, {{paragraphs}}, {{targetLanguage}}, and {{webTitle}} in prompts to dynamically insert selected text, paragraph context, target language, and page title. The {{paragraphs}} token provides intersecting paragraph text captured using a snapshot-based architecture that handles complex web pages, including shadow DOM. For performance and API efficiency, paragraph text is truncated to a maximum of 2000 characters when sent to AI providers.
  • Define structured output schemas (JSON schema) to format AI responses as key-value cards
  • Select any enabled LLM provider for processing (requires providers that support structured object output)
  • Enable or disable custom actions independently

Configuration Schema#

Custom actions are configured in the selectionToolbar.customActions array:

selectionToolbar: {
  features: {
    translate: { providerId: 'openai-default' },
  },
  customActions: [
    {
      id: 'summarize-action',
      name: 'Summarize',
      enabled: true,
      icon: 'tabler:sparkles',
      providerId: 'openai-default',
      systemPrompt: 'You are a helpful assistant that summarizes text.',
      prompt: 'Summarize the following text in {{targetLanguage}}: {{selection}}',
      outputSchema: [
        { id: 'summary', name: 'Summary', type: 'string' },
        { id: 'keyPoints', name: 'Key Points', type: 'string' },
      ],
    },
  ],
}

Required fields:

  • id: Unique identifier for the custom action
  • name: User-facing label shown in the toolbar and options page
  • icon: Iconify icon string (e.g., tabler:sparkles, mdi:lightbulb)
  • providerId: ID of the LLM provider to use (must support structured output)
  • systemPrompt: System-level instructions for the AI. Note: The system prompt is automatically augmented with a "Structured Output Contract" section that includes all output field names, types, descriptions, and nullable requirements. Users only need to provide high-level instructions; the field-level contract is handled automatically
  • prompt: User prompt with token placeholders
  • outputSchema: Array of output fields defining the structured response format

Optional fields:

  • enabled: Boolean to enable/disable the action (default: true)

Output schema fields:

  • id: Unique field identifier
  • name: Field label shown in the UI
  • type: Field data type ('string' or 'number')
  • description: (Optional) Description of what the field should contain. This description is automatically included in the AI's structured output contract to guide the model in generating appropriate values
  • speaking: (Optional) Boolean indicating whether text-to-speech is enabled for this field. When enabled, a Speak button appears next to the field value in the popover, allowing users to hear the content read aloud
  • Note: All field values (both string and number types) are nullable, allowing the AI to return null when a value is unknown or not applicable

Prompt Tokens#

Custom action prompts support the following dynamic tokens:

  • {{selection}}: The text selected by the user
  • {{paragraphs}}: The intersecting paragraph text where the selection appears (provides surrounding context). For performance and API efficiency, this text is truncated to a maximum of 2000 characters when sent to custom AI actions. If the selection or surrounding context exceeds this limit, only the first 2000 characters will be included.
  • {{targetLanguage}}: The user's configured target language (e.g., "English", "Japanese")
  • {{webTitle}}: The title of the current webpage

Note: These tokens were renamed in v064 for clarity (previously {{context}}, {{targetLang}}, and {{title}}). Existing configurations are automatically migrated.

Example prompt using tokens:

Analyze the word "{{selection}}" in the context of: {{paragraphs}}

Provide explanations in {{targetLanguage}}. The source is from the page titled "{{webTitle}}".

Managing Custom Actions in the UI#

The Custom AI Actions configuration page allows users to:

  • Add new custom actions: Click "Add AI Action" to open a template selection dialog. Choose from pre-built templates (dictionary, improve writing) to quickly create an action, or start from scratch with a blank template
  • Edit existing actions: Select an action from the list to edit its configuration
  • Delete actions: Remove custom actions that are no longer needed (with confirmation dialog)
  • Enable/disable actions: Toggle actions on or off without deleting them
  • Reorder custom actions: Custom actions can be reordered via drag-and-drop in the action list. The order determines how custom actions appear in the selection toolbar. The order is automatically saved to the configuration
  • Configure per-action provider: Each custom action independently selects its own LLM provider from enabled providers
  • Define output schemas: Add structured output fields via dedicated dialogs (not inline editing). Each field dialog includes inputs for name, type (string or number), description, and an optional "Enable speaking" toggle. Field descriptions help guide the AI in generating appropriate values. When speaking is enabled for a field, a Speak button appears next to the field value in the popover. Output schema fields can also be reordered via drag-and-drop within the action configuration form
  • Preview and test: Custom actions appear in the selection toolbar when text is selected, showing streaming AI responses

Important notes:

  • Custom actions require at least one enabled LLM provider that supports structured output. If no enabled LLM provider exists, the options page displays a warning: "Enable at least one LLM provider before creating custom AI actions."
  • The selected provider must be enabled and have a valid API key configured
  • Custom actions use the same provider configuration (model, temperature, provider options) as the selected provider
  • Custom action names must be unique
  • Output schema field names must be unique within each action
  • The dialog-based workflow for adding/editing fields makes it easier to create complex structured output actions with clear field semantics

Provider Selection for Custom Actions#

Custom actions use the same provider selection system as built-in features. The ProviderSelector component accepts a providers array directly, allowing the configuration form to pass in a pre-filtered list of enabled LLM providers with structured output support:

<ProviderSelector
  providers={enabledLLMProviders}
  value={action.providerId}
  onChange={(id) => updateActionProvider(action.id, id)}
  className="w-full"
/>

The available providers for custom actions are filtered using helper utilities:

  • filterEnabledProvidersConfig(): Returns only enabled providers
  • isLLMProviderConfig(): Checks if a provider is an LLM provider (vs. pure translation provider)

Custom actions can only use enabled LLM providers that support structured object generation (via the Vercel AI SDK's streamObject API). When a provider is deleted, custom actions using that provider are automatically reassigned to the next available enabled LLM provider.
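Illustrative versions of these helpers, plus the reassignment fallback, might look like the following. The translation-only set and function bodies are assumptions sketching the described behavior, not the extension's real utilities:

```typescript
// Illustrative sketches of the filtering helpers described above.
interface Provider { id: string; provider: string; enabled: boolean }

// Assumed set of pure translation/TTS provider types (not LLMs).
const TRANSLATION_ONLY = new Set([
  'deepl', 'deeplx', 'google-translate', 'microsoft-translate', 'edge-tts',
]);

const filterEnabledProvidersConfig = (ps: Provider[]) =>
  ps.filter(p => p.enabled);

const isLLMProviderConfig = (p: Provider) =>
  !TRANSLATION_ONLY.has(p.provider);

// When an action's provider is deleted, fall back to the next
// available enabled LLM provider.
function reassignProvider(actionProviderId: string, ps: Provider[]): string {
  const llm = filterEnabledProvidersConfig(ps).filter(isLLMProviderConfig);
  return llm.some(p => p.id === actionProviderId)
    ? actionProviderId
    : llm[0]?.id ?? '';
}
```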

Selection Toolbar Integration#

When users select text on a webpage, custom action buttons appear in the selection toolbar alongside the built-in translate button. Each custom action:

  • Displays its configured icon in the toolbar
  • Shows the action name on hover
  • Opens a popover when clicked, showing:
    • The selected text in the "Selection" section
    • A streaming AI response formatted as key-value cards based on the output schema (using the @json-render library for structured object rendering)
    • Loading state during streaming
    • Error messages if the request fails (with improved AI SDK error message extraction)

The custom action popover uses the same UI pattern as built-in features, with streaming responses and graceful error handling.

Selection Popover Component Architecture#

Selection toolbar features (translate, custom AI actions) use a shared SelectionPopover compound component with the following subcomponents:

  • Root: Manages popover state (open/closed, pin state, anchor position)
  • Trigger: Button that opens the popover and captures the anchor position
  • Content: Main popover container with drag-and-resize capabilities (powered by react-rnd)
  • Header: Draggable header bar with title and control buttons
  • Body: Scrollable content area that displays streaming results
  • Footer: Contains provider selector and action buttons (regenerate, copy, speak)
  • Pin: Button to pin/unpin the popover, keeping it open while working with other content
  • Close: Button to close the popover
  • Title: Header title text

Pin/Unpin functionality:

  • Users can pin a popover by clicking the pin button in the header
  • Pinned popovers remain open when clicking outside, allowing users to work with other content while keeping the AI response visible
  • Pinned popovers can be freely dragged and resized using react-rnd
  • The drag handle appears in the header (visible when hovering or during drag)
  • Popovers can be resized from all edges and corners when pinned
  • The popover automatically stays within the viewport bounds during drag and resize operations

Improved user experience:

  • The popover uses smart layout positioning to stay within the viewport and auto-adjust when content grows during streaming
  • Drag position and streaming content growth no longer cause viewport overflow
  • The popover remains anchored within viewport bounds as content streams
  • After resizing, the scrollable body area correctly restores scrolling behavior
  • Selection snapshots ensure that pinned popovers keep displaying the original selection even after the user selects different text, until the user clicks the trigger again to refresh

Translation popover features:

  • The translate popover includes an inline target language selector in the header, allowing users to change the translation target language without navigating to settings
  • The target language selector displays the current target language and opens a searchable dropdown with all available languages
  • Changing the target language immediately triggers a new translation request with the selected language
  • The footer includes action buttons for copying the translation and speaking it aloud via text-to-speech
  • Copy and speak buttons include tooltip feedback showing the current state (e.g., "Copied" after copying)

Popover footer actions:

  • Provider selector: Inline dropdown to switch between enabled providers (LLM or translation providers) for the current feature. Changes are saved immediately
  • Regenerate button: Re-runs the AI request with the same parameters, useful for getting alternative responses or recovering from errors
  • Copy button (translation, custom actions): Copies the result to the clipboard, with visual feedback (checkmark icon and tooltip)
  • Speak button (translation): Triggers text-to-speech for the translated text, with playback controls (play/stop) and loading state

All action buttons include accessible tooltips and keyboard support.

Example: Creating a "Summarize" Custom Action#

  1. Navigate to Settings > Selection Toolbar > Custom AI Actions
  2. Click Add AI Action
  3. Configure the action:
    • Name: "Summarize"
    • Icon: tabler:sparkles
    • Provider: Select an enabled LLM provider (e.g., OpenAI, DeepSeek, Gemini)
    • System Prompt: "You are a helpful assistant that summarizes text concisely."
    • Prompt: "Summarize the following text in {{targetLanguage}}: {{selection}}"
    • Output Schema:
      • Field 1: Name = "Summary", Type = "string"
      • Field 2: Name = "Key Points", Type = "string"
  4. Save the action
  5. Select text on any webpage
  6. Click the Summarize button in the selection toolbar to see the AI-generated summary

Provider Configuration Compatibility#

Custom actions use the same provider configuration system as built-in features. This includes:

  • Model selection: Uses the model configured for the selected provider
  • Temperature: Uses the temperature setting configured for the provider
  • Provider options: Uses the advanced provider options (e.g., reasoning mode) configured for the provider
  • Structured output support: Requires providers that support the Vercel AI SDK's streamObject API

The centralized Feature Providers page (General settings) includes a section for managing provider assignments for all custom actions, making it easy to review and update provider selections across all custom actions in one place.

Save to Notebase Workflow#

Custom AI Actions support saving their structured outputs directly to a notebase table, enabling seamless knowledge capture workflows directly from the browser selection toolbar.

Configuration:

When creating or editing a custom action, users can configure a notebase connection with:

  1. Table Selection: Choose which notebase table will receive the action outputs. The table selector displays the current table name (or "unavailable" if the table no longer exists). A "Refresh" button reloads the table schema and updates field mappings.

  2. Output-to-Column Mappings: Map each output field from the AI action's JSON schema to a column in the selected notebase table:

    • Each mapping specifies which AI output field should be saved to which table column
    • At least one complete mapping (both output field and target column selected) is required to enable saving
    • Mappings are validated in real time for compatibility (e.g., string output fields can only map to string columns)
    • Invalid mappings are preserved in the configuration but prevent saving until fixed
    • Users can add multiple mappings and remove unwanted mappings via the UI
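The per-mapping compatibility check can be sketched as a small validator (names are illustrative, not the extension's actual code):

```typescript
// Sketch of the type-compatibility check for output-to-column mappings.
type FieldType = 'string' | 'number';

interface Field { id: string; type: FieldType }
interface Mapping { outputFieldId: string; columnId: string }

function isValidMapping(
  m: Mapping,
  outputs: Field[],   // fields from the action's output schema
  columns: Field[],   // columns from the notebase table schema
): boolean {
  const out = outputs.find(f => f.id === m.outputFieldId);
  const col = columns.find(f => f.id === m.columnId);
  if (!out) return false;       // missing local field
  if (!col) return false;       // missing remote field
  return out.type === col.type; // string may only map to string, etc.
}
```

A save operation would then filter the configured mappings through a check like this and skip any that fail.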

Validation and Status Messages:

The configuration form validates mappings and displays clear status messages:

  • Login required: Users must be authenticated to configure notebase connections
  • Table unavailable: Shown when the selected table no longer exists
  • Invalid mappings: Displayed when mappings reference missing fields or have incompatible types
  • Missing local field: The mapped action output field no longer exists
  • Missing remote field: The mapped notebase column no longer exists
  • Incompatible types: The field types no longer match (e.g., string vs. number)
  • Schema loading: Shown while fetching the notebase table schema

Usage:

When a custom action with a valid notebase connection is configured:

  1. Users select text and click the custom action button in the selection toolbar
  2. The AI processes the request and generates the structured output
  3. A "Save to Notebase" button appears in the selection toolbar footer
  4. Clicking the button saves the mapped outputs to the configured notebase table via the Notebase API

The save operation:

  • Only saves fields with valid mappings (invalid mappings are skipped)
  • Requires at least one valid mapping to be enabled
  • Shows a success toast notification with the table name on successful save
  • Displays clear error messages for authentication failures, missing tables, or validation errors

Beta Feature:

This feature is currently behind the betaExperience.enabled toggle and requires beta access to configure.


Technical Note:
All configuration changes, including drag-and-drop reordering, are handled optimistically in the UI and are written to storage using a serialized, race-condition-safe queue. This prevents flicker or lost updates across tabs and ensures that the scroll position remains unchanged after reordering. The config atom is synchronized across extension contexts and reloads from storage when a tab becomes visible, ensuring consistency even if a tab was inactive during updates.