Type
External
Status
Published
Created
Mar 3, 2026
Updated
May 19, 2026
Updated by
Dosu Bot

Create a pipeline

This guide walks you through creating a pipeline that uses our REST API Client
to fetch data from the GitHub API and load it into DuckDB.

Please make sure you have installed dlt before following the
steps below.

Task overview

Imagine you want to analyze issues from a GitHub project locally.
To achieve this, you need to write code that accomplishes the following:

  1. Constructs a correct request.
  2. Authenticates your request.
  3. Fetches and handles paginated issue data.
  4. Stores the data for analysis.

This may sound complicated, but dlt provides a REST API Client that allows you to focus more on your data rather than on managing API interactions.
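To make the first two steps concrete, here is a minimal sketch of what building and authenticating the request looks like by hand, using only the Python standard library. The helper name build_issues_request is hypothetical and not part of dlt; the request is constructed but never sent, and pagination and storage (steps 3 and 4) are left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_issues_request(owner: str, repo: str, token: str, state: str = "open") -> Request:
    """Hypothetical helper: build (but do not send) an authenticated issues request."""
    # Step 1: construct the request URL with its query parameters.
    query = urlencode({"state": state})
    url = f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"

    # Step 2: authenticate with a bearer token in the Authorization header.
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    return Request(url, headers=headers)


request = build_issues_request("dlt-hub", "dlt", token="<api key value>")
print(request.full_url)
```

The rest of this guide hands these details, plus pagination, over to the REST API Client instead of maintaining them yourself.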

1. Initialize project

Create a new empty directory for your dlt project by running:

mkdir github_api_duckdb && cd github_api_duckdb

Start a dlt project with a pipeline template that loads data to DuckDB by running:

dlt init github_api duckdb

Install the dependencies necessary for DuckDB:

pip install -r requirements.txt

2. Obtain and add API credentials from GitHub

You will need to sign in to your GitHub account and create your access token via the Personal access tokens page.

Copy your new access token over to .dlt/secrets.toml:

[sources]
api_secret_key = '<api key value>'

This token will be used by github_api_source() to authenticate requests.

The secret name corresponds to the argument name in the source function.
In the snippet below, api_secret_key receives its value from secrets.toml
when github_api_source() is called without arguments.

import dlt

@dlt.source
def github_api_source(api_secret_key: str = dlt.secrets.value):
    # dlt.secrets.value tells dlt to inject the value from the configured secrets
    return github_api_resource(api_secret_key=api_secret_key)
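If you would rather not keep the token in a file, dlt can also read secrets from environment variables. The sketch below illustrates the naming convention as we understand it from the dlt configuration docs (sections upper-cased and joined with double underscores). The helper env_var_name is hypothetical and only illustrates the mapping; it is not how dlt itself performs the lookup.

```python
def env_var_name(config_path: str) -> str:
    """Hypothetical helper: map a dotted secrets.toml path to an env var name."""
    # Each section or key is upper-cased and the parts are joined with "__".
    return "__".join(part.upper() for part in config_path.split("."))


# [sources] api_secret_key in secrets.toml maps to this variable name.
print(env_var_name("sources.api_secret_key"))  # prints SOURCES__API_SECRET_KEY
```

Under that assumption, exporting SOURCES__API_SECRET_KEY before running the script would stand in for the entry in .dlt/secrets.toml.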

Run the github_api_pipeline.py pipeline script to test that authentication headers look fine:

python github_api_pipeline.py

Your API key should be printed out to stdout along with some test data.

3. Request project issues from the GitHub API

Modify github_api_resource in github_api_pipeline.py to request issues data from your GitHub project's API:

import dlt
from dlt.sources.helpers.rest_client import paginate
from dlt.sources.helpers.rest_client.auth import BearerTokenAuth
from dlt.sources.helpers.rest_client.paginators import HeaderLinkPaginator

@dlt.resource(write_disposition="replace")
def github_api_resource(api_secret_key: str = dlt.secrets.value):
    # issues endpoint of the repository you want to analyze
    url = "https://api.github.com/repos/dlt-hub/dlt/issues"

    # paginate() requests page after page until the API reports no next page
    for page in paginate(
        url,
        auth=BearerTokenAuth(api_secret_key),  # type: ignore
        paginator=HeaderLinkPaginator(),  # follows the "next" link in the Link header
        params={"state": "open"},  # only request open issues
    ):
        # each page is a list of issue records
        yield page
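GitHub announces the next page in the Link response header, which is the header HeaderLinkPaginator relies on. The sketch below shows the general idea with the standard library only; next_page_url is a hypothetical helper, the example URLs are made up for illustration, and dlt's own implementation may differ in detail.

```python
import re
from typing import Optional

# Matches entries such as: <https://example.com/items?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return the URL tagged rel="next", or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


# Illustrative header value in the format GitHub uses.
header = (
    '<https://api.github.com/repositories/1/issues?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/issues?page=5>; rel="last"'
)
print(next_page_url(header))
```

A paginator of this kind keeps requesting the returned URL until no rel="next" entry is present, which is why the resource above never has to count pages itself.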

4. Load the data

Uncomment the commented-out code in the main function in github_api_pipeline.py, so that running the
python github_api_pipeline.py command will now also run the pipeline:

if __name__ == '__main__':
    # configure the pipeline with your destination details
    pipeline = dlt.pipeline(
        pipeline_name='github_api_pipeline',
        destination='duckdb',
        dataset_name='github_api_data'
    )

    # run the resource on its own to check that the API call works
    data = list(github_api_resource())

    # print the data yielded from resource
    print(data)

    # run the pipeline with your parameters
    load_info = pipeline.run(github_api_source())

    # pretty print the information on data that was loaded
    print(load_info)

Run the github_api_pipeline.py pipeline script to test that the API call works:

python github_api_pipeline.py

This should print out JSON data containing the issues in the GitHub project.

It also prints the load_info object.

Let's explore the loaded data with the command dlt pipeline <pipeline_name> show:

dlt pipeline github_api_pipeline show

This will open the workspace dashboard app that gives you an overview of the data loaded.
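You can also query the loaded data directly from Python. The sketch below rests on several assumptions: that dlt wrote the database to github_api_pipeline.duckdb in the current directory, that the table is named after the resource (github_api_resource) inside the github_api_data dataset, and that the duckdb package is installed. The helper names are hypothetical; adjust the path and names if your setup differs.

```python
from pathlib import Path
from typing import Optional


def count_rows_query(dataset_name: str, table_name: str) -> str:
    """Hypothetical helper: build a row-count query for a dataset (schema) and table."""
    return f'SELECT COUNT(*) AS row_count FROM "{dataset_name}"."{table_name}"'


def count_loaded_issues(db_path: str = "github_api_pipeline.duckdb") -> Optional[int]:
    """Return the number of loaded rows, or None if the database file is missing."""
    if not Path(db_path).exists():
        return None  # the pipeline has not been run here, or the file lives elsewhere

    import duckdb  # imported lazily so the sketch loads even without duckdb installed

    con = duckdb.connect(db_path, read_only=True)
    try:
        query = count_rows_query("github_api_data", "github_api_resource")
        return con.execute(query).fetchone()[0]
    finally:
        con.close()


print(count_loaded_issues())
```

Opening the file read-only avoids interfering with a later pipeline run against the same database.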

5. Next steps

With a functioning pipeline, consider exploring:
