Initialize a pipeline#

This guide walks you through creating and initializing a dlt pipeline in dltHub Workspace: manually, with agentic help, or from one of the verified sources maintained by the dltHub team.

Overview#

A dlt pipeline moves data from a source (like an API or database) into a destination (like DuckDB, Snowflake, or Iceberg). Initializing a pipeline is the first step in the data workflow.
You can create one in two CLI-based ways:

Method	Command	Best for
Manual	`dlthub pipeline init <source> <destination>`	Developers who prefer manual setup
Verified source	`dlthub pipeline init <verified_source> <destination>`	Prebuilt, tested connectors from the community and dltHub team

Outside of a workspace (plain OSS dlt), the same scaffold is reachable as dlt init <source> <destination>. Inside a dltHub workspace, dlthub pipeline init is the canonical entry point—it adds the pipeline to the current workspace.

Step 0: Install dlt with workspace support#

Before you start, make sure you have followed the installation instructions and initialized a dltHub workspace. The fastest way is:

uvx dlthub-start@latest

This scaffolds a workspace with .dlt/.workspace already set, the AI toolkits vendored, and dlt[hub] synced. See the installation guide for the alternative paths (adding to an existing project, or enabling workspace mode by hand).
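To confirm that a directory was scaffolded as a workspace, a quick check for the marker mentioned above could look like the sketch below. The helper name `has_workspace_marker` is hypothetical, and the sketch only assumes that the presence of `.dlt/.workspace` is what distinguishes a workspace; it does not inspect the marker's contents.

```python
from pathlib import Path


def has_workspace_marker(project_dir: Path) -> bool:
    """Return True if project_dir contains the .dlt/.workspace marker.

    Hypothetical helper: it only tests that the path exists, because this
    guide does not say whether the marker is a file or a directory.
    """
    return (project_dir / ".dlt" / ".workspace").exists()
```

For example, `has_workspace_marker(Path.cwd())` reports whether the current directory carries the marker.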

dltHub Workspace is a unified environment for developing, running, and maintaining data pipelines—from local development to production.

More about dltHub Workspace

Step 1: Initialize a custom pipeline#

Manual setup (standard workflow)#

A lightweight, code-first approach ideal for developers comfortable with Python.

dlthub pipeline init {source_name} duckdb

For example:

dlthub pipeline init my_github_pipeline duckdb

This command scaffolds the pipeline template, a minimal starter project with a single Python script that shows three quick ways to load data into DuckDB using dlt:

fetch JSON from a public REST API (chess.com as an example) with requests,
read a public CSV with pandas, and
pull rows from a SQL database via SQLAlchemy.
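All three variants come down to handing dlt an iterable of records and a destination. The sketch below is not the generated template; it loads a few hard-coded placeholder rows instead of calling an API, and the function names, the `players` table, and the `playground` dataset are made up for illustration. It assumes dlt is installed with DuckDB support.

```python
from typing import Any, Dict, Iterator


def sample_rows() -> Iterator[Dict[str, Any]]:
    """Yield placeholder records standing in for an API, CSV, or SQL result."""
    yield {"id": 1, "name": "alice"}
    yield {"id": 2, "name": "bob"}
    yield {"id": 3, "name": "carol"}


def load_to_duckdb() -> None:
    """Load the placeholder rows into a local DuckDB database with dlt."""
    # Imported here so sample_rows() can be inspected without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="my_github_pipeline",  # name reused from the example above
        destination="duckdb",
        dataset_name="playground",  # illustrative dataset name
    )
    # dlt infers the schema from the records and creates the table if needed.
    load_info = pipeline.run(sample_rows(), table_name="players")
    print(load_info)
```

Calling `load_to_duckdb()` runs the load. Replacing `sample_rows()` with a generator that reads from an API, a CSV file, or a database query leaves the pipeline code unchanged.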

The file also includes an optional GitHub REST client example (a @dlt.resource + @dlt.source) that can use a token from .dlt/secrets.toml but also works unauthenticated, subject to GitHub's lower rate limits for anonymous requests.
It is meant as a hands-on playground that you can run immediately and then adapt into a real pipeline.
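The optional-token behaviour can be pictured with a small helper like the one below. This is a hypothetical sketch rather than code from the template: `github_headers` and its parameter are illustrative names. Inside a dlt resource, the token would not be passed by hand; dlt can inject it from .dlt/secrets.toml through a `dlt.secrets.value` default argument.

```python
from typing import Dict, Optional


def github_headers(access_token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the GitHub REST API.

    Hypothetical helper: the Authorization header is added only when a
    token is supplied, so unauthenticated calls remain possible.
    """
    headers = {"Accept": "application/vnd.github+json"}
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
    return headers
```

With no token the helper returns only the `Accept` header, which matches the unauthenticated mode; passing a token switches the same code path to authenticated requests.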

Learn how to build your own dlt pipeline with the dlt Fundamentals course.

Agentic setup#

A collaborative AI-human workflow that integrates dlt with AI editors and agents like:

Claude
Cursor
Codex
the full list

Start with the /find-source skill to describe your data source in natural language—the assistant identifies a verified source or researches the API, then chains into pipeline scaffolding.

Next steps: Deploy and scale#

Once your pipeline runs locally:

Monitor via the workspace dashboard
Set up Profiles to manage separate dev, prod, and test environments
Deploy to runtime