GitLab CI template for dbt¶
This project implements a GitLab CI/CD template to continuously integrate and deploy your data with dbt
Review environments¶
The template supports review environments: those are dynamic and ephemeral environments to deploy your ongoing developments (a.k.a. feature or topic branches).
When enabled, it deploys the result from upstream build stages to a dedicated environment. It is only active for non-production, non-integration branches.
It is a strict equivalent of GitLab's Review Apps feature.
Integration environment¶
If you're using a Git Workflow with an integration branch (such as Gitflow), the template supports an integration environment.
When enabled, it deploys the result from upstream build stages to a dedicated environment.
It is only active for your integration branch (develop
by default).
Production environments¶
Lastly, the template supports 2 environments associated to your production branch (main
or master
by default):
- a staging environment (an iso-prod environment meant for testing and validation purpose),
- the production environment.
You're free to enable whichever or both, and you can also choose your deployment-to-production policy:
- continuous deployment: automatic deployment to production (when the upstream pipeline is successful),
- continuous delivery: deployment to production can be triggered manually (when the upstream pipeline is successful).
Usage¶
This template can be used both as a CI/CD component
or using the legacy include:project
syntax.
Use as a CI/CD component¶
Add the following to your .gitlab-ci.yml
:
include:
# 1: include the component
- component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt@4.0.0
# 2: set/override component inputs
inputs:
# ⚠ this is only an example
staging-target: "nonprod"
prod-target: "prod"
Use as a CI/CD template (legacy)¶
Add the following to your .gitlab-ci.yml
:
include:
# 1: include the template
- project: 'to-be-continuous/dbt'
ref: '4.0.0'
file: '/templates/gitlab-ci-dbt.yml'
variables:
# 2: set/override template variables
# ⚠ this is only an example
DBT_STAGING_TARGET: "nonprod"
DBT_PROD_TARGET: "prod"
You can find a sample of dbt project here : https://gitlab.com/to-be-continuous/samples/dbt-sample/
Global configuration¶
The dbt template uses some global configuration used throughout all jobs.
Input / Variable | Description | Default value |
---|---|---|
image / DBT_IMAGE |
The Docker image used to run dbt | registry.hub.docker.com/library/python:latest |
project-dir / DBT_PROJECT_DIR |
The dbt_project.yml dir | . |
profiles-dir / DBT_PROFILES_DIR |
The dbt profile location | . |
adapter / DBT_ADAPTER |
The dbt adapter to use | none (required) |
target / DBT_TARGET |
The dbt target to use | none (required) |
build-args / DBT_BUILD_ARGS |
Arguments used by dbt cli |
none |
Environments configuration¶
As seen above, the dbt template may support up to 4 environments (review
, integration
, staging
and production
).
Each deployment job produces output variables that are propagated to downstream jobs (using dotenv artifacts):
environment_type
: set to the type of environment (review
,integration
,staging
orproduction
),environment_name
: the application name (see below),
They may be freely used in downstream jobs (for instance to run acceptance tests against the latest deployed environment).
Here are configuration details for each environment.
Review environments¶
Review environments are dynamic and ephemeral environments to deploy your ongoing developments (a.k.a. feature or topic branches).
They are disabled by default and can be enabled by setting the DBT_REVIEW_TARGET
variable (see below).
Here are variables supported to configure review environments:
Input / Variable | Description | Default value |
---|---|---|
review-target / DBT_REVIEW_TARGET |
dbt target for review env |
none (disabled) |
Integration environment¶
The integration environment is the environment associated to your integration branch (develop
by default).
It is disabled by default and can be enabled by setting the DBT_INTEG_TARGET
variable (see below).
Here are variables supported to configure the integration environment:
Input / Variable | Description | Default value |
---|---|---|
integ-target / DBT_INTEG_TARGET |
dbt target for integration env |
none (disabled) |
Staging environment¶
The staging environment is an iso-prod environment meant for testing and validation purpose associated to your production branch (main
or master
by default).
It is disabled by default and can be enabled by setting the DBT_STAGING_TARGET
variable (see below).
Here are variables supported to configure the staging environment:
Input / Variable | Description | Default value |
---|---|---|
staging-target / DBT_STAGING_TARGET |
dbt target for staging env |
none (disabled) |
Production environment¶
The production environment is the final deployment environment associated with your production branch (main
or master
by default).
It is disabled by default and can be enabled by setting the DBT_PROD_TARGET
variable (see below).
Here are variables supported to configure the production environment:
Input / Variable | Description | Default value |
---|---|---|
prod-target / DBT_PROD_TARGET |
dbt target for production env |
none (disabled) |
prod-deploy-strategy / DBT_PROD_DEPLOY_STRATEGY |
Defines the deployment to production strategy. One of manual (i.e. one-click) or auto . |
manual |
Jobs¶
dbt-build
job¶
This job performs build, doc generation and documentation coverage.
dbt-build
generates executable SQL from source model, test, and analysis files
dbt-sqlfluff-lint
job¶
This job performs SQL Lint.
dbt-sqlfluff-lint
execute sqlfluff linter with dbt plugin to lint SQL and uses the following variables:
Input / Variable | Description | Default value |
---|---|---|
sqlfluff-enabled / DBT_SQLFLUFF_ENABLED |
set to true to enable SQLFluff lint |
none (disabled) |
sqlfluff-lint-args / DBT_SQLFLUFF_LINT_ARGS |
Lint options and arguments | none |
this jobs read SQLFluff configuration files in DBT_PROJECT_DIR
directory.
dbt-deploy
job¶
This job performs deployment.
dbt-deploy
execute generated SQL from models on target and uses the following variables:
Input / Variable | Description | Default value |
---|---|---|
deploy-enabled / DBT_DEPLOY_ENABLED |
set to true to enable deployment |
none (disabled) |
Secrets management¶
Here are some advices about your secrets (variables marked with a ):
- Manage them as project or group CI/CD variables:
- In case a secret contains characters that prevent it from being masked,
simply define its value as the Base64 encoded value prefixed with
@b64@
: it will then be possible to mask it and the template will automatically decode it prior to using it. - Don't forget to escape special characters (ex:
$
->$$
).
Variants¶
GitLab Pages variant¶
Basically it copies the content of the dbt generated site folder (target
by default) to the public
folder which is published by GitLab pages.
If you wish to use it, Add the following to your .gitlab-ci.yml
:
# main template
- component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt@4.0.0
# GitLab pages variant
- component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt-pages@4.0.0
Google Cloud variant¶
This variant allows retrieving an OAuth access token for the dbt BigQuery Adapter (using the GCP Auth Provider as a service container).
Provided you successfully configured the federated authentication using OpenID Connect, this variant automatically obtains a temporary OAuth token and stores it in the $GOOGLE_OAUTH_ACCESS_TOKEN
variable (supported by the dbt BigQuery Config Setup as an authentication credential).
Configuration¶
The variant requires the additional configuration parameters:
Input / Variable | Description | Default value |
---|---|---|
TBC_GCP_PROVIDER_IMAGE |
The GCP Auth Provider image to use (can be overridden) | registry.gitlab.com/to-be-continuous/tools/gcp-auth-provider:latest |
gcp-oidc-aud / GCP_OIDC_AUD |
The aud claim for the JWT token |
$CI_SERVER_URL |
gcp-oidc-provider / GCP_OIDC_PROVIDER |
Default Workload Identity Provider associated with GitLab to authenticate with OpenID Connect | none |
gcp-oidc-account / GCP_OIDC_ACCOUNT |
Default Service Account to which impersonate with OpenID Connect authentication | none |
gcp-review-oidc-provider / GCP_REVIEW_OIDC_PROVIDER |
Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on review environment (only define to override default) |
none |
gcp-review-oidc-account / GCP_REVIEW_OIDC_ACCOUNT |
Service Account to which impersonate with OpenID Connect authentication on review environment (only define to override default) |
none |
gcp-integ-oidc-provider / GCP_INTEG_OIDC_PROVIDER |
Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on integration environment (only define to override default) |
none |
gcp-integ-oidc-account / GCP_INTEG_OIDC_ACCOUNT |
Service Account to which impersonate with OpenID Connect authentication on integration environment (only define to override default) |
none |
gcp-staging-oidc-provider / GCP_STAGING_OIDC_PROVIDER |
Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on staging environment (only define to override default) |
none |
gcp-staging-oidc-account / GCP_STAGING_OIDC_ACCOUNT |
Service Account to which impersonate with OpenID Connect authentication on staging environment (only define to override default) |
none |
gcp-prod-oidc-provider / GCP_PROD_OIDC_PROVIDER |
Workload Identity Provider associated with GitLab to authenticate with OpenID Connect on production environment (only define to override default) |
none |
gcp-prod-oidc-account / GCP_PROD_OIDC_ACCOUNT |
Service Account to which impersonate with OpenID Connect authentication on production environment (only define to override default) |
none |
Example¶
With a common default GCP_OIDC_PROVIDER
and GCP_OIDC_ACCOUNT
configuration for non-prod environments, and a specific one for production:
# main template
- component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt@4.0.0
# Google Cloud variant
- component: $CI_SERVER_FQDN/to-be-continuous/dbt/gitlab-ci-dbt-gcp@4.0.0
inputs:
# common OIDC config for non-prod envs
gcp-oidc-provider: "projects/<gcp_nonprod_proj_id>/locations/global/workloadIdentityPools/<pool_id>/providers/<provider_id>"
gcp-oidc-account: "<name>@$<gcp_nonprod_proj_id>.iam.gserviceaccount.com"
# specific OIDC config for prod
gcp-prod-oidc-provider: "projects/<gcp_prod_proj_id>/locations/global/workloadIdentityPools/<pool_id>/providers/<provider_id>"
gcp-prod-oidc-account: "<name>@$<gcp_prod_proj_id>.iam.gserviceaccount.com"
The variant provide Google Application Default Credentials
To setup dbt for bigquery, apply this configuration in dbt profiles.yml
as described in dbt documentation:
my-bigquery-db:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: [GCP project id]
dataset: [the name of your dbt dataset]
threads: [1 or more]
<optional_config>: <value>