dbt Model Generation

Overview

This section covers the automated generation of dbt models and YAML documentation from schema mappings using Chicory AI agents. The system creates production-ready dbt artifacts following best practices.

Generated Artifacts

1. dbt Model SQL

The Chicory agent generates complete dbt model SQL files with:

  • Proper source references

  • Column transformations

  • Data quality logic

  • Performance optimizations

Example Generated Model:

{{
  config(
    materialized='table',
    tags=['auto-generated', 'dimension', 'customer'],
    description='Customer dimension table with standardized attributes'
  )
}}

with source_data as (
  select
    customer_id,
    first_name,
    last_name,
    email,
    phone,
    created_date,
    updated_date
  from {{ source('raw_crm', 'customers') }}
),

transformed as (
  select
    -- Surrogate key
    {{ dbt_utils.surrogate_key(['customer_id']) }} as customer_sk,

    -- Business key
    cast(customer_id as string) as customer_bk,

    -- Standardized names
    initcap(trim(first_name)) as first_name,
    initcap(trim(last_name)) as last_name,
    trim(concat(
      initcap(trim(first_name)),
      ' ',
      initcap(trim(last_name))
    )) as full_name,

    -- Contact information
    lower(trim(email)) as email_address,
    regexp_replace(phone, r'[^\d]', '') as phone_number,

    -- Temporal attributes
    cast(created_date as date) as customer_created_date,
    cast(updated_date as timestamp) as last_updated_ts,

    -- Audit columns
    current_timestamp() as created_at,
    current_timestamp() as updated_at,
    true as is_active

  from source_data
)

select * from transformed

2. YAML Documentation

Comprehensive YAML documentation with tests and metadata:

Generated Model Types

1. Dimension Tables

For dimension tables, the agent generates:

  • Surrogate keys using dbt_utils

  • SCD Type 2 structure (when applicable)

  • Standardized attribute transformations

  • Comprehensive business key mapping

2. Fact Tables

For fact tables, the agent includes:

  • Foreign key relationships

  • Measure calculations

  • Grain documentation

  • Aggregation logic

3. Staging Tables

For staging models, the agent provides:

  • Basic data type casting

  • Column renaming for consistency

  • Initial data quality checks

  • Source system documentation

Customization Options

1. Model Configuration

Customize generated models through dbt_project.yml:

2. Transformation Templates

Create custom transformation templates for specific patterns:

3. Testing Standards

Define organization-specific testing standards:

Advanced Features

1. Incremental Models

For large datasets, the agent can generate incremental models:

2. Slowly Changing Dimensions

Generate SCD Type 2 logic for dimension tables:

3. Data Quality Monitoring

Generate data quality monitoring models:

Integration with dbt Packages

1. dbt-utils Integration

The generated models leverage dbt-utils for:

  • Surrogate key generation

  • Cross-database compatibility

  • Common transformations

2. dbt-expectations Integration

For advanced data quality testing:

Performance Optimization

1. Partitioning and Clustering

For BigQuery targets:

2. Query Optimization

The agent applies optimization techniques:

Validation and Testing

1. Model Compilation

Test generated models compile correctly:

2. Data Quality Testing

Run comprehensive tests:

3. Documentation Generation

Generate and serve documentation:

Monitoring and Maintenance

1. Model Performance Monitoring

2. Automated Alerts

Set up alerts for:

  • Model compilation failures

  • Test failures

  • Performance degradation

  • Data freshness issues

3. Model Evolution

Track model changes over time:


Next: Testing & Validation

Last updated