Data Source Synchronization
Type: Topic
Status: Published
Created: Apr 13, 2026
Updated: Jun 17, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Data Source Synchronization

Dosu keeps its knowledge base current through two distinct strategies, real-time webhooks and periodic polling, assigned per platform based on the APIs that platform makes available. See "How often are documents kept up to date?" and "Data Source overview" for the full product-level reference.

Sync Strategies by Platform

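| Platform        | Mechanism                      | Interval                  |
|-----------------|--------------------------------|---------------------------|
| GitHub          | Webhook (real-time) or polling | Immediate or 15 min       |
| Slack           | Webhook (real-time)            | Immediate                 |
| GitLab          | Webhook + polling              | Immediate / 5 min         |
| Azure DevOps    | Webhook (real-time)            | Immediate                 |
| Confluence      | Polling                        | Every 5 min               |
| Coda            | Polling                        | Every 5 min               |
| Microsoft Teams | Polling                        | Every 6 hours             |
| Notion          | Polling                        | Every 6 hours             |
| Web             | Real-time search               | Per request (no indexing) |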


Real-Time Webhook Sync

GitHub (when using the Dosu app) syncs code, issues, pull requests, discussions, and repository metadata (renames, transfers) via webhooks as soon as events occur. Slack syncs messages as they are posted. GitLab also receives real-time webhooks: Dosu proactively creates them on GitLab projects when a data source is configured and stores the generated secret for later validation. Azure DevOps uses service hooks to trigger immediate synchronization for repository push events (git.push) and pull request lifecycle events (git.pullrequest.created, git.pullrequest.updated, git.pullrequest.merged). As with GitLab, Dosu creates the service hook subscriptions when a data source is configured and stores a generated secret token for validation. When code is pushed to the default branch, Dosu re-indexes the repository to keep the knowledge base current.
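As a rough illustration of how the Azure DevOps event types listed above might be told apart, the sketch below classifies a service hook payload as a push or a pull request event. The function name is hypothetical, and the assumption that the event type arrives in a top-level "eventType" field is not confirmed by this page.

```python
from typing import Optional

# Event types named in the paragraph above.
PUSH_EVENT = "git.push"
PULL_REQUEST_EVENTS = {
    "git.pullrequest.created",
    "git.pullrequest.updated",
    "git.pullrequest.merged",
}


def classify_azure_devops_event(payload: dict) -> Optional[str]:
    """Return "push", "pull_request", or None for any other event.

    Assumption: the event type is carried in a top-level "eventType" field.
    """
    event_type = payload.get("eventType")
    if event_type == PUSH_EVENT:
        return "push"
    if event_type in PULL_REQUEST_EVENTS:
        return "pull_request"
    return None
```

Returning None for unrecognized types lets the caller acknowledge and ignore events it has no handler for, rather than failing the delivery.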

All webhook-based platforms share the same event pipeline: the endpoint validates the incoming request, then publishes the event to Pub/Sub.

Webhook endpoints:

  • GitHub: /github/webhook — validates HMAC-SHA256 signature, then publishes to Pub/Sub
  • Slack: /slack/events — verifies Slack signature, publishes to Pub/Sub
  • GitLab: /gitlab/webhook — validates stored token, maps event types, publishes to Pub/Sub
  • Azure DevOps: /azure-devops/webhook — validates stored secret token (via X-Dosu-Webhook-Token header), publishes to Pub/Sub
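The validation steps listed above can be sketched with the Python standard library. This is a minimal illustration, not Dosu's implementation: the function names are hypothetical, the GitHub check assumes the standard "sha256=<hex digest>" format of the X-Hub-Signature-256 header, and the Slack check assumes Slack's documented "v0:<timestamp>:<body>" signing scheme.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid timing side channels.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def verify_slack_signature(
    signing_secret: bytes, timestamp: str, body: bytes, signature_header: str
) -> bool:
    """Check an X-Slack-Signature header ("v0=<hex digest>")."""
    base_string = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret, base_string, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def verify_shared_token(stored_token: str, header_value: str) -> bool:
    """Check a shared-secret header such as X-Dosu-Webhook-Token."""
    return hmac.compare_digest(stored_token.encode(), header_value.encode())
```

The signature must be computed over the raw request bytes, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest. A production Slack check would normally also reject stale timestamps to limit replay attacks.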

Events are published to Google Cloud Pub/Sub and consumed asynchronously by platform-specific handlers:

  • GitHub: the sync_github_event_to_db workflow
  • Slack: SlackMessageHandler
  • Azure DevOps pull request events: handle_azure_devops_pull_request_event
  • Azure DevOps push events: handle_azure_devops_push_event

Azure DevOps pull request events sync metadata (title, description, status, author, timestamps) and changed file counts to the azure_devops.pull_request table; the changed files for each PR are retrieved from the Azure DevOps Iteration API during the sync. Azure DevOps push events extract repository information and changed files from the event payload, then trigger a re-index of the repository so the knowledge base reflects the latest code changes.

Deduplication IDs and upsert operations ensure idempotency across retried deliveries.
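To make the idempotency claim concrete, here is a small sketch of how a deduplication ID and an upsert can combine so that a redelivered event has no additional effect. It uses an in-memory SQLite database purely for illustration; the table layout, column names, and the handle_event function are assumptions and do not reflect Dosu's actual schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_event (dedup_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE pull_request ("
        "pr_id INTEGER PRIMARY KEY, title TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def handle_event(
    conn: sqlite3.Connection, dedup_id: str, pr_id: int, title: str, status: str
) -> bool:
    """Apply one delivery; return False if this dedup_id was already processed."""
    with conn:  # one transaction: commit on success, roll back on error
        seen = conn.execute(
            "INSERT OR IGNORE INTO processed_event (dedup_id) VALUES (?)",
            (dedup_id,),
        )
        if seen.rowcount == 0:
            return False  # retried delivery, nothing to do
        # Upsert: insert the row, or update it in place if the PR already exists.
        conn.execute(
            "INSERT INTO pull_request (pr_id, title, status) VALUES (?, ?, ?) "
            "ON CONFLICT(pr_id) DO UPDATE SET "
            "title = excluded.title, status = excluded.status",
            (pr_id, title, status),
        )
    return True
```

Recording the deduplication ID and applying the upsert in the same transaction means a crash between the two steps cannot leave an event marked as processed without its changes being stored.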


Periodic Polling Sync

Confluence, Coda, Microsoft Teams, Notion, and GitLab (as a supplement to webhooks) are kept current by DBOS scheduled workflows using cron expressions. Each platform has a dedicated scheduled entry point that enqueues per-connection sync jobs, using partition keys for sequential processing and enqueue_once() for deduplication.
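The interaction between deduplicated enqueueing and partition keys can be illustrated with a toy in-memory queue. This is a conceptual sketch only and is not the DBOS API: the PartitionedQueue class, its method signatures, and the schedule_tick helper are all hypothetical, and the choice of the connection ID as partition key is an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Optional, Set


class PartitionedQueue:
    """Toy in-memory queue used for illustration; not the DBOS API."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Deque[str]] = defaultdict(deque)
        self._pending: Set[str] = set()

    def enqueue_once(self, job_id: str, partition_key: str) -> bool:
        """Enqueue a job unless one with the same ID is already pending."""
        if job_id in self._pending:
            return False
        self._pending.add(job_id)
        self._partitions[partition_key].append(job_id)
        return True

    def pop_next(self, partition_key: str) -> Optional[str]:
        """Take the oldest job of one partition, so jobs run in FIFO order."""
        jobs = self._partitions[partition_key]
        if not jobs:
            return None
        job_id = jobs.popleft()
        self._pending.discard(job_id)
        return job_id


def schedule_tick(queue: PartitionedQueue, platform: str, connection_ids: Iterable[str]) -> int:
    """One scheduled run: enqueue a job per connection, return how many were new."""
    return sum(
        queue.enqueue_once(f"{platform}:{cid}", partition_key=cid)
        for cid in connection_ids
    )
```

With this shape, a scheduler tick that fires while the previous sync for a connection is still waiting adds nothing for that connection, so slow syncs do not pile up behind a short polling interval.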

Platform sync entry points:

All scheduled polling workflows have DBOS-level recovery with max_recovery_attempts=3, which automatically retries workflows that crash or fail due to infrastructure issues. They also set a 30-minute workflow timeout via SetWorkflowTimeout to prevent hangs, and they retry API calls with exponential backoff. Connections with expired credentials are skipped via reauthorization flag checks.
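The retry and skip behavior described above might look roughly like the following. This is a generic sketch of exponential backoff rather than Dosu's code: the function names, the default of three attempts, the one-second base delay, and the "needs_reauthorization" key are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying failures after base_delay * 2**attempt seconds."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


def connections_to_sync(connections: Iterable[dict]) -> List[dict]:
    """Drop connections flagged as needing reauthorization (hypothetical flag)."""
    return [c for c in connections if not c.get("needs_reauthorization", False)]
```

Passing sleep as a parameter keeps the function testable without real delays. Production variants usually add random jitter and retry only on errors known to be transient.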


Connection Health and Alerts

When a connection's OAuth credentials expire or otherwise require reauthorization, Dosu automatically alerts organization owners and admins:

  • Email notifications: Organization owners and admins (and the user who originally connected the integration) receive an email prompting them to reconnect the integration
  • Internal alerts: For organizations with alert webhooks configured, Dosu also sends Slack notifications to internal monitoring channels

Important: Data syncing stops for connections that need reauthorization. Until the integration is reconnected at Settings > Data Sources, answers and documentation may become stale.

Alerts are debounced: if multiple integrations for the same organization require reauthorization within 24 hours, Dosu sends only one email notification to avoid alert fatigue.
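A per-organization debounce of this kind can be sketched in a few lines. The class below is illustrative only: its name and interface are hypothetical, and it assumes the 24-hour window is measured from the last email actually sent, which this page does not specify.

```python
from datetime import datetime, timedelta
from typing import Dict

DEBOUNCE_WINDOW = timedelta(hours=24)


class ReauthAlertDebouncer:
    """Allow at most one reauthorization email per organization per window."""

    def __init__(self, window: timedelta = DEBOUNCE_WINDOW) -> None:
        self._window = window
        self._last_sent: Dict[str, datetime] = {}

    def should_send(self, organization_id: str, now: datetime) -> bool:
        """Return True and record the send if no email went out within the window."""
        last = self._last_sent.get(organization_id)
        if last is not None and now - last < self._window:
            return False  # an email was already sent recently for this organization
        self._last_sent[organization_id] = now
        return True
```

Keying the state by organization rather than by integration is what collapses several expiring integrations into a single notification.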


Web Sources

Web data sources appear as "synced" in the UI but are never crawled or indexed. Instead, Dosu performs a live web search on every request, avoiding the cost and unreliability of scheduled crawls.
