GitHub Actions Workflow

Overview

This section covers setting up GitHub Actions workflows to automate the schema mapping and dbt model generation process. We'll create two workflows:

  1. Schema Mapping Workflow: triggered by Airflow; generates the schema mapping and opens a pull request

  2. dbt Generation Workflow: triggered when that pull request is merged; generates dbt artifacts

Schema Mapping Workflow

1. Workflow Configuration

Create .github/workflows/schema-mapping.yml:

name: Automated Schema Mapping

on:
  workflow_dispatch:
    inputs:
      source_system:
        description: 'Source system name'
        required: true
        type: string
      table_name:
        description: 'Source table name'
        required: true
        type: string
      schema_json:
        description: 'JSON schema information'
        required: true
        type: string
      s3_file_path:
        description: 'S3 file path'
        required: true
        type: string

env:
  CHICORY_API_KEY: ${{ secrets.CHICORY_API_KEY }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_REGION: us-east-1

jobs:
  schema-mapping:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          pip install requests boto3 pyyaml

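      - name: Parse schema information
        id: parse-schema
        env:
          # Pass inputs through the environment instead of interpolating them
          # into the script, so quotes in the JSON cannot break or inject code.
          SCHEMA_JSON: ${{ github.event.inputs.schema_json }}
          SOURCE_SYSTEM: ${{ github.event.inputs.source_system }}
          TABLE_NAME: ${{ github.event.inputs.table_name }}
        run: |
          python - <<'PY'
          import json
          import os

          schema_data = json.loads(os.environ["SCHEMA_JSON"])

          # Expose values to later steps through the step outputs file
          with open(os.environ["GITHUB_OUTPUT"], "a") as f:
              f.write(f"source_system={os.environ['SOURCE_SYSTEM']}\n")
              f.write(f"table_name={os.environ['TABLE_NAME']}\n")
              f.write(f"row_count={schema_data.get('row_count', 0)}\n")
              f.write(f"column_count={len(schema_data.get('columns', []))}\n")
          PY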

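      - name: Generate schema mapping
        id: generate-mapping
        env:
          # Inputs are read from the environment so that quotes in the JSON
          # payload cannot terminate the shell string.
          SCHEMA_JSON: ${{ github.event.inputs.schema_json }}
          SOURCE_SYSTEM: ${{ github.event.inputs.source_system }}
          TABLE_NAME: ${{ github.event.inputs.table_name }}
        run: |
          python scripts/generate_schema_mapping.py \
            --source-schema "$SCHEMA_JSON" \
            --source-system "$SOURCE_SYSTEM" \
            --table-name "$TABLE_NAME" \
            --output-file mapping_result.json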

      - name: Validate mapping result
        run: |
          python scripts/validate_mapping.py mapping_result.json

      - name: Create mapping directory
        run: |
          mkdir -p mappings/${{ steps.parse-schema.outputs.source_system }}
          cp mapping_result.json mappings/${{ steps.parse-schema.outputs.source_system }}/${{ steps.parse-schema.outputs.table_name }}_mapping.json

      - name: Generate mapping documentation
        run: |
          python scripts/generate_mapping_docs.py \
            --mapping-file mapping_result.json \
            --output-file mappings/${{ steps.parse-schema.outputs.source_system }}/${{ steps.parse-schema.outputs.table_name }}_mapping.md

      - name: Summarize mapping
        id: mapping-summary
        run: |
          # Action inputs under `with:` are not passed through a shell, so the
          # summary is computed here and exposed as a multiline step output.
          {
            echo 'summary<<MAPPING_SUMMARY_EOF'
            jq '.mapping_metadata + {column_count: (.column_mappings | length)}' mapping_result.json
            echo 'MAPPING_SUMMARY_EOF'
          } >> "$GITHUB_OUTPUT"

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v5
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          commit-message: |
            Add schema mapping for ${{ steps.parse-schema.outputs.source_system }}.${{ steps.parse-schema.outputs.table_name }}

            - Source: ${{ github.event.inputs.s3_file_path }}
            - Columns: ${{ steps.parse-schema.outputs.column_count }}
            - Rows: ${{ steps.parse-schema.outputs.row_count }}

            Auto-generated by Chicory AI schema mapping agent.
          title: 'Schema Mapping: ${{ steps.parse-schema.outputs.source_system }}.${{ steps.parse-schema.outputs.table_name }}'
          body: |
            ## Schema Mapping Summary

            **Source System:** ${{ steps.parse-schema.outputs.source_system }}
            **Table Name:** ${{ steps.parse-schema.outputs.table_name }}
            **S3 File:** `${{ github.event.inputs.s3_file_path }}`
            **Columns:** ${{ steps.parse-schema.outputs.column_count }}
            **Rows:** ${{ steps.parse-schema.outputs.row_count }}

            ### Changes
            - ✅ Generated schema mapping configuration
            - ✅ Created mapping documentation
            - ✅ Validated mapping structure

            ### Next Steps
            1. Review the generated mapping in `mappings/${{ steps.parse-schema.outputs.source_system }}/${{ steps.parse-schema.outputs.table_name }}_mapping.json`
            2. Validate business logic and transformations
            3. Merge this PR to trigger dbt model generation

            ### Mapping Overview
            ```json
            ${{ steps.mapping-summary.outputs.summary }}
            ```

            ---
            🤖 *This PR was automatically created by the Chicory AI schema mapping workflow.*
          branch: feature/schema-mapping-${{ steps.parse-schema.outputs.source_system }}-${{ steps.parse-schema.outputs.table_name }}
          delete-branch: true
          labels: |
            automated
            schema-mapping
            chicory-ai

2. Schema Mapping Script

Create scripts/generate_schema_mapping.py:
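
A minimal sketch of what such a script could look like, offered as a starting point rather than the actual implementation. The command-line flags match the workflow step above, and the `mapping_metadata` and `column_mappings` keys match the `jq` expression used for the PR summary. Everything else is an assumption: the column layout (`name`/`type`), the `source_column`/`target_column` field names, and the snake_case placeholder logic, which stands in for the Chicory mapping call and does not contact any API.

```python
import argparse
import json
import re
import sys
from datetime import datetime, timezone


def to_snake_case(name: str) -> str:
    """Normalise a column name such as 'Customer ID' to 'customer_id'."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)           # other characters -> _
    return name.strip("_").lower()


def build_mapping(schema: dict, source_system: str, table_name: str) -> dict:
    """Placeholder mapping: one entry per source column (not the Chicory call)."""
    column_mappings = []
    for col in schema.get("columns", []):
        # Assumption: each column is {"name": ..., "type": ...} or a bare string.
        name = col["name"] if isinstance(col, dict) else str(col)
        dtype = col.get("type", "unknown") if isinstance(col, dict) else "unknown"
        column_mappings.append(
            {
                "source_column": name,
                "target_column": to_snake_case(name),
                "source_type": dtype,
            }
        )
    return {
        "mapping_metadata": {
            "source_system": source_system,
            "table_name": table_name,
            "row_count": schema.get("row_count", 0),
            "generated_at": datetime.now(timezone.utc).isoformat(),
        },
        "column_mappings": column_mappings,
    }


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Generate a schema mapping file")
    parser.add_argument("--source-schema", required=True, help="schema as a JSON string")
    parser.add_argument("--source-system", required=True)
    parser.add_argument("--table-name", required=True)
    parser.add_argument("--output-file", required=True)
    args = parser.parse_args(argv)

    schema = json.loads(args.source_schema)
    mapping = build_mapping(schema, args.source_system, args.table_name)
    with open(args.output_file, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)
    return 0


# Act as a CLI only when arguments are supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Keeping the mapping logic in `build_mapping` separates it from argument parsing, so the placeholder can later be swapped for a real mapping call without touching the CLI.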

3. Validation Script

Create scripts/validate_mapping.py:
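
One possible shape for the validator, sketched under assumptions. The workflow only shows that the script takes the mapping file path as its single argument and that the file contains `mapping_metadata` and `column_mappings`; the per-column keys `source_column` and `target_column` and the duplicate check are illustrative choices, not a documented schema.

```python
import json
import sys


def validate_mapping(mapping) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(mapping, dict):
        return ["top-level value must be an object"]
    errors = []
    if not isinstance(mapping.get("mapping_metadata"), dict):
        errors.append("mapping_metadata must be an object")
    columns = mapping.get("column_mappings")
    if not isinstance(columns, list) or not columns:
        errors.append("column_mappings must be a non-empty list")
        return errors
    seen = set()
    for i, col in enumerate(columns):
        if not isinstance(col, dict):
            errors.append(f"column_mappings[{i}] must be an object")
            continue
        # Assumption: every entry names a source and a target column.
        for key in ("source_column", "target_column"):
            if not col.get(key):
                errors.append(f"column_mappings[{i}] is missing {key}")
        target = col.get("target_column")
        if target and target in seen:
            errors.append(f"duplicate target_column: {target}")
        if target:
            seen.add(target)
    return errors


def main(argv) -> int:
    with open(argv[0], encoding="utf-8") as f:
        errors = validate_mapping(json.load(f))
    for err in errors:
        print(f"ERROR: {err}", file=sys.stderr)
    # A non-zero exit code fails the workflow step.
    return 1 if errors else 0


# Act as a CLI only when a mapping file path is supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Returning a list of problems instead of raising on the first one lets the workflow log show every issue in a single run.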

dbt Generation Workflow

1. Workflow Configuration

Create .github/workflows/dbt-generation.yml:

2. dbt Generation Script

Create scripts/generate_dbt_artifacts.py:
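
As a rough illustration of the core of such a script, the sketch below renders a dbt staging model from a mapping file. It assumes the mapping carries `source_system` and `table_name` under `mapping_metadata` and `source_column`/`target_column` pairs under `column_mappings`; those field names, the `models/staging/...` path, and the `stg_<source>__<table>` file name are assumptions for this example. Identifier quoting rules also differ between warehouses, so the double-quote handling may need adjusting.

```python
import re


def quote_ident(name: str) -> str:
    """Double-quote a column name unless it is already a plain identifier."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return name
    return '"' + name.replace('"', '""') + '"'


def model_path(mapping: dict) -> str:
    """Hypothetical location for the generated staging model."""
    meta = mapping["mapping_metadata"]
    system, table = meta["source_system"], meta["table_name"]
    return f"models/staging/{system}/stg_{system}__{table}.sql"


def render_staging_model(mapping: dict) -> str:
    """Render a select that renames each source column to its target name."""
    meta = mapping["mapping_metadata"]
    columns = ",\n".join(
        f"    {quote_ident(c['source_column'])} as {c['target_column']}"
        for c in mapping["column_mappings"]
    )
    # dbt resolves source() to the table declared in the project's sources file.
    source_ref = "{{ source('%s', '%s') }}" % (meta["source_system"], meta["table_name"])
    return f"select\n{columns}\nfrom {source_ref}\n"
```

A caller would load the merged `*_mapping.json`, write `render_staging_model(mapping)` to `model_path(mapping)`, and commit the result.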

Repository Setup

1. GitHub Secrets

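Configure these secrets in your GitHub repository (Settings > Secrets and variables > Actions). The schema mapping workflow above reads CHICORY_API_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. GITHUB_TOKEN is provided automatically by GitHub Actions and does not need to be created.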

2. Target Standards Configuration

Create scripts/target_standards.json:

3. Directory Structure

Ensure your repository has the required structure:
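
A small check like the following can confirm that the files named in this section are in place. The path list is collected from the workflow and script references above; the helper name `missing_paths` is an illustrative choice. The `mappings/` directory is left out because the schema mapping workflow creates it.

```python
from pathlib import Path

# Paths referenced by this section.
REQUIRED_PATHS = [
    ".github/workflows/schema-mapping.yml",
    ".github/workflows/dbt-generation.yml",
    "scripts/generate_schema_mapping.py",
    "scripts/validate_mapping.py",
    "scripts/generate_mapping_docs.py",
    "scripts/generate_dbt_artifacts.py",
    "scripts/target_standards.json",
]


def missing_paths(repo_root=".") -> list:
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]
```

Running `missing_paths()` from the repository root returns an empty list once every file is present.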

Testing GitHub Actions

1. Local Testing

Test the scripts locally before deploying:
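
One way to do this is a short driver that runs the generate and validate scripts with the same arguments the workflow passes. The sketch below assumes it is run from the repository root; the sample schema, its column layout, and the `crm`/`customers` names are made-up test values.

```python
import json
import subprocess
import sys

# Hypothetical sample payload; the column layout is an assumption.
SAMPLE_SCHEMA = {
    "row_count": 2,
    "columns": [
        {"name": "id", "type": "int"},
        {"name": "email", "type": "string"},
    ],
}


def build_commands(schema, source_system, table_name, output_file="mapping_result.json"):
    """Return the argv lists for the generate and validate steps, in order."""
    generate = [
        sys.executable, "scripts/generate_schema_mapping.py",
        "--source-schema", json.dumps(schema),
        "--source-system", source_system,
        "--table-name", table_name,
        "--output-file", output_file,
    ]
    validate = [sys.executable, "scripts/validate_mapping.py", output_file]
    return [generate, validate]


def run_local_test(source_system="crm", table_name="customers"):
    """Run both scripts, stopping at the first non-zero exit code."""
    for cmd in build_commands(SAMPLE_SCHEMA, source_system, table_name):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the JSON payload, which is the same concern the workflow handles by reading inputs from environment variables.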

2. Workflow Testing

Use workflow dispatch to test:
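
The workflow can be dispatched from the Actions tab or through GitHub's REST endpoint for workflow dispatch events. The sketch below builds that request with the four inputs the workflow declares. The owner, repository, branch, and input values are placeholders, and the token is assumed to be allowed to trigger workflows in the repository.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, inputs, ref="main",
                           workflow="schema-mapping.yml"):
    """Build the POST request that triggers a workflow_dispatch run."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def dispatch(request) -> int:
    """Send the request and return the HTTP status (204 on success)."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder inputs; every value is passed as a string.
EXAMPLE_INPUTS = {
    "source_system": "crm",
    "table_name": "customers",
    "schema_json": json.dumps({"row_count": 2, "columns": [{"name": "id", "type": "int"}]}),
    "s3_file_path": "s3://example-bucket/crm/customers.csv",
}
```

Calling `dispatch(build_dispatch_request("my-org", "my-repo", token, EXAMPLE_INPUTS))` queues a run, which then appears under the repository's Actions tab.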


Next: dbt Model Generation