Skip to content

GitLab CI template for dbt

This project implements a GitLab CI/CD template to continuously integrate and deploy your data with dbt

Review environments

The template supports review environments: those are dynamic and ephemeral environments to deploy your ongoing developments (a.k.a. feature or topic branches).

When enabled, it deploys the result from upstream build stages to a dedicated environment. It is only active for non-production, non-integration branches.

It is a strict equivalent of GitLab's Review Apps feature.

Integration environment

If you're using a Git Workflow with an integration branch (such as Gitflow), the template supports an integration environment.

When enabled, it deploys the result from upstream build stages to a dedicated environment. It is only active for your integration branch (develop by default).

Production environments

Lastly, the template supports 2 environments associated to your production branch (main or master by default):

  • a staging environment (an iso-prod environment meant for testing and validation purpose),
  • the production environment.

You're free to enable whichever or both, and you can also choose your deployment-to-production policy:

  • continuous deployment: automatic deployment to production (when the upstream pipeline is successful),
  • continuous delivery: deployment to production can be triggered manually (when the upstream pipeline is successful).

Usage

This template can be used both as a CI/CD component or using the legacy include:project syntax.

Use as a CI/CD component

Add the following to your .gitlab-ci.yml:

include:
  # 1: include the component
  - component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt@4.0.0
    # 2: set/override component inputs
    inputs:
      # ⚠ this is only an example
      staging-target: "nonprod"
      prod-target: "prod"

Use as a CI/CD template (legacy)

Add the following to your .gitlab-ci.yml:

include:
  # 1: include the template
  - project: 'to-be-continuous/dbt'
    ref: '4.0.0'
    file: '/templates/gitlab-ci-dbt.yml'

variables:
  # 2: set/override template variables
  # ⚠ this is only an example
  DBT_STAGING_TARGET: "nonprod"
  DBT_PROD_TARGET: "prod"

You can find a sample of dbt project here : https://gitlab.com/to-be-continuous/samples/dbt-sample/

Global configuration

The dbt template uses some global configuration used throughout all jobs.

Input / Variable Description Default value
image / DBT_IMAGE The Docker image used to run dbt registry.hub.docker.com/library/python:latest
project-dir / DBT_PROJECT_DIR The dbt_project.yml dir .
profiles-dir / DBT_PROFILES_DIR The dbt profile location .
adapter / DBT_ADAPTER The dbt adapter to use none (required)
target / DBT_TARGET The dbt target to use none (required)
build-args / DBT_BUILD_ARGS Arguments used by dbt cli none

Environments configuration

As seen above, the dbt template may support up to 4 environments (review, integration, staging and production).

Each deployment job produces output variables that are propagated to downstream jobs (using dotenv artifacts):

  • environment_type: set to the type of environment (review, integration, staging or production),
  • environment_name: the application name (see below),

They may be freely used in downstream jobs (for instance to run acceptance tests against the latest deployed environment).

Here are configuration details for each environment.

Review environments

Review environments are dynamic and ephemeral environments to deploy your ongoing developments (a.k.a. feature or topic branches).

They are disabled by default and can be enabled by setting the DBT_REVIEW_TARGET variable (see below).

Here are variables supported to configure review environments:

Input / Variable Description Default value
review-target / DBT_REVIEW_TARGET dbt target for review env none (disabled)

Integration environment

The integration environment is the environment associated to your integration branch (develop by default).

It is disabled by default and can be enabled by setting the DBT_INTEG_TARGET variable (see below).

Here are variables supported to configure the integration environment:

Input / Variable Description Default value
integ-target / DBT_INTEG_TARGET dbt target for integration env none (disabled)

Staging environment

The staging environment is an iso-prod environment meant for testing and validation purpose associated to your production branch (main or master by default).

It is disabled by default and can be enabled by setting the DBT_STAGING_TARGET variable (see below).

Here are variables supported to configure the staging environment:

Input / Variable Description Default value
staging-target / DBT_STAGING_TARGET dbt target for staging env none (disabled)

Production environment

The production environment is the final deployment environment associated with your production branch (main or master by default).

It is disabled by default and can be enabled by setting the DBT_PROD_TARGET variable (see below).

Here are variables supported to configure the production environment:

Input / Variable Description Default value
prod-target / DBT_PROD_TARGET dbt target for production env none (disabled)
prod-deploy-strategy / DBT_PROD_DEPLOY_STRATEGY Defines the deployment to production strategy. One of manual (i.e. one-click) or auto. manual

Jobs

dbt-build job

This job performs build, doc generation and documentation coverage.

dbt-build generates executable SQL from source model, test, and analysis files

dbt-sqlfluff-lint job

This job performs SQL Lint.

dbt-sqlfluff-lint execute sqlfluff linter with dbt plugin to lint SQL and uses the following variables:

Input / Variable Description Default value
sqlfluff-enabled / DBT_SQLFLUFF_ENABLED set to true to enable SQLFluff lint none (disabled)
sqlfluff-lint-args / DBT_SQLFLUFF_LINT_ARGS Lint options and arguments none

⚠ this jobs read SQLFluff configuration files in DBT_PROJECT_DIR directory.

dbt-deploy job

This job performs deployment.

dbt-deploy execute generated SQL from models on target and uses the following variables:

Input / Variable Description Default value
deploy-enabled / DBT_DEPLOY_ENABLED set to true to enable deployment none (disabled)

Secrets management

Here are some advices about your secrets (variables marked with a 🔒):

  1. Manage them as project or group CI/CD variables:
    • masked to prevent them from being inadvertently displayed in your job logs,
    • protected if you want to secure some secrets you don't want everyone in the project to have access to (for instance production secrets).
  2. In case a secret contains characters that prevent it from being masked, simply define its value as the Base64 encoded value prefixed with @b64@: it will then be possible to mask it and the template will automatically decode it prior to using it.
  3. Don't forget to escape special characters (ex: $ -> $$).

Variants

GitLab Pages variant

Basically it copies the content of the dbt generated site folder (target by default) to the public folder which is published by GitLab pages.

If you wish to use it, Add the following to your .gitlab-ci.yml:

  # main template
  - component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt@4.0.0
  # GitLab pages variant
  - component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt-pages@4.0.0

Google Cloud variant

This variant allows retrieving an OAuth access token for the dbt BigQuery Adapter (using the GCP Auth Provider as a service container).

Provided you successfully configured the federated authentication using OpenID Connect, this variant automatically obtains a temporary OAuth token and stores it in the $GOOGLE_OAUTH_ACCESS_TOKEN variable (supported by the dbt BigQuery Config Setup as an authentication credential).

Configuration

The variant requires the additional configuration parameters:

Input / Variable Description Default value
TBC_GCP_PROVIDER_IMAGE The GCP Auth Provider image to use (can be overridden) registry.gitlab.com/to-be-continuous/tools/gcp-auth-provider:latest
gcp-oidc-aud / GCP_OIDC_AUD The aud claim for the JWT token $CI_SERVER_URL
gcp-oidc-provider / GCP_OIDC_PROVIDER Default Workload Identity Provider associated with GitLab to authenticate with OpenID Connect none
gcp-oidc-account / GCP_OIDC_ACCOUNT Default Service Account to which impersonate with OpenID Connect authentication none
gcp-review-oidc-provider / GCP_REVIEW_OIDC_PROVIDER Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on review environment (only define to override default) none
gcp-review-oidc-account / GCP_REVIEW_OIDC_ACCOUNT Service Account to which impersonate with OpenID Connect authentication on review environment (only define to override default) none
gcp-integ-oidc-provider / GCP_INTEG_OIDC_PROVIDER Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on integration environment (only define to override default) none
gcp-integ-oidc-account / GCP_INTEG_OIDC_ACCOUNT Service Account to which impersonate with OpenID Connect authentication on integration environment (only define to override default) none
gcp-staging-oidc-provider / GCP_STAGING_OIDC_PROVIDER Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on staging environment (only define to override default) none
gcp-staging-oidc-account / GCP_STAGING_OIDC_ACCOUNT Service Account to which impersonate with OpenID Connect authentication on staging environment (only define to override default) none
gcp-prod-oidc-provider / GCP_PROD_OIDC_PROVIDER Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on production environment (only define to override default) none
gcp-prod-oidc-account / GCP_PROD_OIDC_ACCOUNT Service Account to which impersonate with OpenID Connect authentication on production environment (only define to override default) none

Example

With a common default GCP_OIDC_PROVIDER and GCP_OIDC_ACCOUNT configuration for non-prod environments, and a specific one for production:

  # main template
  - component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt@4.0.0
  # Google Cloud variant
  - component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt-gcp@4.0.0
    inputs:
      # common OIDC config for non-prod envs
      gcp-oidc-provider: "projects/<gcp_nonprod_proj_id>/locations/global/workloadIdentityPools/<pool_id>/providers/<provider_id>"
      gcp-oidc-account: "<name>@$<gcp_nonprod_proj_id>.iam.gserviceaccount.com"
      # specific OIDC config for prod
      gcp-prod-oidc-provider: "projects/<gcp_prod_proj_id>/locations/global/workloadIdentityPools/<pool_id>/providers/<provider_id>"
      gcp-prod-oidc-account: "<name>@$<gcp_prod_proj_id>.iam.gserviceaccount.com"

The variant provide Google Application Default Credentials To setup dbt for bigquery, apply this configuration in dbt profiles.yml as described in dbt documentation:

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: [GCP project id]
      dataset: [the name of your dbt dataset]
      threads: [1 or more]
      <optional_config>: <value>