GitHub Actions Workflow
Overview
This section covers setting up GitHub Actions workflows to automate the schema mapping and dbt model generation process. We'll create two workflows:
Schema Mapping Workflow: triggered by Airflow through workflow_dispatch; generates the mapping and opens a pull request
dbt Generation Workflow: triggered when that pull request is merged; generates the dbt artifacts
Schema Mapping Workflow
1. Workflow Configuration
Create .github/workflows/schema-mapping.yml:
name: Automated Schema Mapping

on:
  workflow_dispatch:
    inputs:
      source_system:
        description: 'Source system name'
        required: true
        type: string
      table_name:
        description: 'Source table name'
        required: true
        type: string
      schema_json:
        description: 'JSON schema information'
        required: true
        type: string
      s3_file_path:
        description: 'S3 file path'
        required: true
        type: string

env:
  CHICORY_API_KEY: ${{ secrets.CHICORY_API_KEY }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_REGION: us-east-1
  # Pass the dispatch inputs to scripts through the environment instead of
  # splicing them into shell or Python source, so quotes in the JSON payload
  # cannot break a step (or inject commands into it).
  SOURCE_SYSTEM: ${{ github.event.inputs.source_system }}
  TABLE_NAME: ${{ github.event.inputs.table_name }}
  SCHEMA_JSON: ${{ github.event.inputs.schema_json }}

jobs:
  schema-mapping:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          pip install requests boto3 pyyaml

      - name: Parse schema information
        id: parse-schema
        run: |
          python - <<'EOF'
          import json
          import os

          schema_data = json.loads(os.environ['SCHEMA_JSON'])

          # Set step outputs for the steps below
          with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
              f.write(f"source_system={os.environ['SOURCE_SYSTEM']}\n")
              f.write(f"table_name={os.environ['TABLE_NAME']}\n")
              f.write(f"row_count={schema_data.get('row_count', 0)}\n")
              f.write(f"column_count={len(schema_data.get('columns', []))}\n")
          EOF

      - name: Generate schema mapping
        id: generate-mapping
        run: |
          python scripts/generate_schema_mapping.py \
            --source-schema "$SCHEMA_JSON" \
            --source-system "$SOURCE_SYSTEM" \
            --table-name "$TABLE_NAME" \
            --output-file mapping_result.json

      - name: Validate mapping result
        run: |
          python scripts/validate_mapping.py mapping_result.json

      - name: Create mapping directory
        run: |
          mkdir -p "mappings/$SOURCE_SYSTEM"
          cp mapping_result.json "mappings/$SOURCE_SYSTEM/${TABLE_NAME}_mapping.json"

      - name: Generate mapping documentation
        run: |
          python scripts/generate_mapping_docs.py \
            --mapping-file mapping_result.json \
            --output-file "mappings/$SOURCE_SYSTEM/${TABLE_NAME}_mapping.md"

      - name: Summarize mapping for PR body
        id: mapping-overview
        run: |
          # Action inputs under `with:` are not run through a shell, so a
          # `$(...)` in the PR body would be posted literally. Compute the
          # summary here and expose it as a multiline step output.
          {
            echo 'overview<<MAPPING_EOF'
            jq '.mapping_metadata + {column_count: (.column_mappings | length)}' mapping_result.json
            echo 'MAPPING_EOF'
          } >> "$GITHUB_OUTPUT"

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v5
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          # Commit only the mapping artifacts, not the scratch mapping_result.json
          add-paths: |
            mappings/
          commit-message: |
            Add schema mapping for ${{ steps.parse-schema.outputs.source_system }}.${{ steps.parse-schema.outputs.table_name }}

            - Source: ${{ github.event.inputs.s3_file_path }}
            - Columns: ${{ steps.parse-schema.outputs.column_count }}
            - Rows: ${{ steps.parse-schema.outputs.row_count }}

            Auto-generated by Chicory AI schema mapping agent.
          title: 'Schema Mapping: ${{ steps.parse-schema.outputs.source_system }}.${{ steps.parse-schema.outputs.table_name }}'
          body: |
            ## Schema Mapping Summary

            **Source System:** ${{ steps.parse-schema.outputs.source_system }}
            **Table Name:** ${{ steps.parse-schema.outputs.table_name }}
            **S3 File:** `${{ github.event.inputs.s3_file_path }}`
            **Columns:** ${{ steps.parse-schema.outputs.column_count }}
            **Rows:** ${{ steps.parse-schema.outputs.row_count }}

            ### Changes
            - ✅ Generated schema mapping configuration
            - ✅ Created mapping documentation
            - ✅ Validated mapping structure

            ### Next Steps
            1. Review the generated mapping in `mappings/${{ steps.parse-schema.outputs.source_system }}/${{ steps.parse-schema.outputs.table_name }}_mapping.json`
            2. Validate business logic and transformations
            3. Merge this PR to trigger dbt model generation

            ### Mapping Overview
            ```json
            ${{ steps.mapping-overview.outputs.overview }}
            ```

            ---
            🤖 *This PR was automatically created by the Chicory AI schema mapping workflow.*
          branch: feature/schema-mapping-${{ steps.parse-schema.outputs.source_system }}-${{ steps.parse-schema.outputs.table_name }}
          delete-branch: true
          labels: |
            automated
            schema-mapping
            chicory-ai

2. Schema Mapping Script
Create scripts/generate_schema_mapping.py:
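The following is a minimal sketch of this script under several assumptions. The mapping file is assumed to carry the mapping_metadata and column_mappings keys that the workflow's jq summary reads. Each entry in the input schema's columns list is assumed to look like {"name": ..., "type": ...}.

The call to the Chicory AI agent is not shown. The hypothetical propose_target_name function is a rule-based stand-in that converts names to snake_case. The per-column fields source_column, source_type and target_column are illustrative names, not a documented format.

```python
import argparse
import json
import re


def propose_target_name(column_name: str) -> str:
    """Hypothetical stand-in for the Chicory AI call: snake_case the name."""
    # Split camelCase / PascalCase boundaries, e.g. "CustomerID" -> "Customer_ID"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", column_name)
    # Collapse spaces, dashes and other separators into single underscores
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


def build_mapping(schema: dict, source_system: str, table_name: str) -> dict:
    """Build a mapping document with the keys the workflow's jq summary reads."""
    return {
        "mapping_metadata": {
            "source_system": source_system,
            "table_name": table_name,
            "row_count": schema.get("row_count", 0),
        },
        "column_mappings": [
            {
                # Assumed input column shape: {"name": ..., "type": ...}
                "source_column": col["name"],
                "source_type": col.get("type", "unknown"),
                "target_column": propose_target_name(col["name"]),
            }
            for col in schema.get("columns", [])
        ],
    }


def main(argv=None) -> int:
    """Parse the flags the workflow passes and write the mapping file."""
    parser = argparse.ArgumentParser(description="Generate a schema mapping")
    parser.add_argument("--source-schema", required=True, help="schema as a JSON string")
    parser.add_argument("--source-system", required=True)
    parser.add_argument("--table-name", required=True)
    parser.add_argument("--output-file", required=True)
    args = parser.parse_args(argv)

    mapping = build_mapping(json.loads(args.source_schema), args.source_system, args.table_name)
    with open(args.output_file, "w") as f:
        json.dump(mapping, f, indent=2)
    return 0
```

To use it as scripts/generate_schema_mapping.py, add `import sys` and the usual `if __name__ == "__main__": sys.exit(main())` guard. The flag names match the arguments in the "Generate schema mapping" step.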
3. Validation Script
Create scripts/validate_mapping.py:
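A sketch of a structural validator follows. It assumes the mapping has mapping_metadata and column_mappings keys, and that each column entry carries hypothetical source_column and target_column fields; adjust these to the real format. It reports every problem it finds and returns a non-zero exit code, which fails the "Validate mapping result" step before a PR is opened.

```python
import argparse
import json
import sys

REQUIRED_TOP_LEVEL = ("mapping_metadata", "column_mappings")
# Assumed per-column fields; adjust to the real mapping format.
REQUIRED_COLUMN_FIELDS = ("source_column", "target_column")


def validate_mapping(mapping: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in mapping:
            errors.append(f"missing top-level key: {key}")
    columns = mapping.get("column_mappings", [])
    if not isinstance(columns, list) or not columns:
        errors.append("column_mappings must be a non-empty list")
        return errors
    seen_targets = set()
    for i, col in enumerate(columns):
        if not isinstance(col, dict):
            errors.append(f"column_mappings[{i}] is not an object")
            continue
        for field in REQUIRED_COLUMN_FIELDS:
            if not col.get(field):
                errors.append(f"column_mappings[{i}] missing {field}")
        # Two source columns mapped to the same target would collide in dbt
        target = col.get("target_column")
        if target in seen_targets:
            errors.append(f"duplicate target_column: {target}")
        if target:
            seen_targets.add(target)
    return errors


def main(argv=None) -> int:
    """CLI entry point: validate_mapping.py <mapping_file>."""
    parser = argparse.ArgumentParser(description="Validate a schema mapping file")
    parser.add_argument("mapping_file")
    args = parser.parse_args(argv)
    with open(args.mapping_file) as f:
        errors = validate_mapping(json.load(f))
    for err in errors:
        print(f"ERROR: {err}", file=sys.stderr)
    return 1 if errors else 0
```

As a standalone script, finish it with `if __name__ == "__main__": sys.exit(main())`.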
dbt Generation Workflow
1. Workflow Configuration
Create .github/workflows/dbt-generation.yml:
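This workflow is meant to run only when a schema mapping PR is merged. In GitHub Actions that is commonly expressed as a `pull_request` trigger of type `closed`, combined with the condition `github.event.pull_request.merged == true`.

The same decision can be sketched in Python against the event payload that Actions writes to the file named by GITHUB_EVENT_PATH. Filtering on the schema-mapping label, which the schema mapping workflow applies to its PRs, is an assumption about how narrowly you want to scope the trigger.

```python
import json
import os


def is_merged_mapping_pr(event: dict, label: str = "schema-mapping") -> bool:
    """True when the event is a PR closed by merging that carries `label`."""
    pr = event.get("pull_request") or {}
    label_names = {item.get("name") for item in pr.get("labels", [])}
    return (
        event.get("action") == "closed"
        and bool(pr.get("merged"))  # closed without merging -> False
        and label in label_names
    )


def load_event() -> dict:
    """Read the webhook payload Actions stores at $GITHUB_EVENT_PATH."""
    with open(os.environ["GITHUB_EVENT_PATH"]) as f:
        return json.load(f)
```

A generation step could call `is_merged_mapping_pr(load_event())` and exit early when it returns False.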
2. dbt Generation Script
Create scripts/generate_dbt_artifacts.py:
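One piece of such a script is sketched below under stated assumptions: rendering a dbt staging model that renames each source column to its mapped target. It reads the mapping_metadata and column_mappings keys of the mapping file, with hypothetical source_column and target_column fields.

The stg_<source>__<table> file name follows the dbt Labs style-guide convention and is a choice, not a requirement. Source column names are assumed to be valid unquoted SQL identifiers, and the referenced source must be declared in a dbt sources file.

```python
import json
from pathlib import Path


def render_staging_model(mapping: dict) -> str:
    """Render a dbt SELECT that renames each source column to its target."""
    meta = mapping["mapping_metadata"]
    select_lines = [
        f"    {col['source_column']} as {col['target_column']}"
        for col in mapping["column_mappings"]
    ]
    # Plain concatenation keeps dbt's Jinja braces literal
    source_ref = "{{ source('" + meta["source_system"] + "', '" + meta["table_name"] + "') }}"
    return "select\n" + ",\n".join(select_lines) + "\nfrom " + source_ref + "\n"


def write_staging_model(mapping_file: Path, models_dir: Path) -> Path:
    """Write models/staging/<system>/stg_<system>__<table>.sql; return its path."""
    mapping = json.loads(mapping_file.read_text())
    meta = mapping["mapping_metadata"]
    system, table = meta["source_system"], meta["table_name"]
    out = models_dir / "staging" / system / f"stg_{system}__{table}.sql"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_staging_model(mapping))
    return out
```

For example, `write_staging_model(Path("mappings/crm/orders_mapping.json"), Path("models"))` would create models/staging/crm/stg_crm__orders.sql.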
Repository Setup
1. GitHub Secrets
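Configure these secrets in your GitHub repository under Settings → Secrets and variables → Actions. The schema mapping workflow references:
CHICORY_API_KEY: API key for the Chicory AI schema mapping agent
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: AWS credentials exported to every step as environment variables (the workflow sets AWS_REGION to us-east-1)
GITHUB_TOKEN is provided automatically for each workflow run and does not need to be created.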
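To check which secrets the workflows actually reference, a small helper can scan the workflow files for `${{ secrets.NAME }}` expressions. This is a convenience sketch. It skips GITHUB_TOKEN because Actions provides that token automatically.

```python
import re
from pathlib import Path

# Matches ${{ secrets.NAME }} with optional whitespace inside the braces
SECRET_REF = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def referenced_secrets(workflow_text: str) -> list:
    """Sorted, de-duplicated secret names, minus the auto-provided GITHUB_TOKEN."""
    names = set(SECRET_REF.findall(workflow_text))
    names.discard("GITHUB_TOKEN")
    return sorted(names)


def secrets_for_workflows(workflow_dir: Path = Path(".github/workflows")) -> list:
    """Union of secrets referenced across all .yml workflow files."""
    names = set()
    for path in workflow_dir.glob("*.yml"):
        names.update(referenced_secrets(path.read_text()))
    return sorted(names)
```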
2. Target Standards Configuration
Create scripts/target_standards.json:
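The contents of target_standards.json are not reproduced here. As an illustration only, assume the file holds a naming_convention string and a type_mappings object from source types to target types; both keys are hypothetical. A loader that falls back to defaults when the file or a key is missing could look like this.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real file's keys may differ.
DEFAULT_STANDARDS = {
    "naming_convention": "snake_case",
    "type_mappings": {"varchar": "string", "int": "integer"},
}


def load_standards(path: Path) -> dict:
    """Overlay the file's top-level keys on the defaults; no file -> defaults."""
    standards = dict(DEFAULT_STANDARDS)
    if path.exists():
        standards.update(json.loads(path.read_text()))
    return standards


def apply_type_standard(source_type: str, standards: dict) -> str:
    """Map a source type to its target type, keeping unknown types as-is."""
    return standards["type_mappings"].get(source_type.lower(), source_type)
```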
3. Directory Structure
Ensure your repository has the required structure:
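The files this section refers to can be checked with a short script. The list below contains only paths named in this section. The mappings/ directory is left out because the workflow's "Create mapping directory" step creates it with mkdir -p.

```python
from pathlib import Path

# Paths named in this section, relative to the repository root
EXPECTED_FILES = [
    ".github/workflows/schema-mapping.yml",
    ".github/workflows/dbt-generation.yml",
    "scripts/generate_schema_mapping.py",
    "scripts/validate_mapping.py",
    "scripts/generate_mapping_docs.py",
    "scripts/generate_dbt_artifacts.py",
    "scripts/target_standards.json",
]


def missing_files(repo_root: Path) -> list:
    """Return the expected paths that are not regular files under repo_root."""
    return [p for p in EXPECTED_FILES if not (repo_root / p).is_file()]
```

Running `missing_files(Path("."))` from the repository root should return an empty list once setup is complete.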
Testing GitHub Actions
1. Local Testing
Test the scripts locally before deploying:
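One way to run the scripts locally is to mirror the workflow's generate, validate and document steps with a sample payload. The sample schema's column shape and the local Markdown output path are assumptions.

The commands are built as argument lists rather than shell strings, so the JSON payload needs no quoting. `check=True` stops at the first failing script, much as a failed step stops the workflow.

```python
import json
import subprocess
import sys

# Hypothetical sample payload using the row_count / columns keys the workflow reads
SAMPLE_SCHEMA = {
    "row_count": 3,
    "columns": [{"name": "CustomerID", "type": "int"}, {"name": "OrderDate", "type": "date"}],
}


def local_commands(schema: dict, source_system: str, table_name: str) -> list:
    """Build argv lists mirroring the workflow's generate -> validate -> docs steps."""
    py = sys.executable
    return [
        [py, "scripts/generate_schema_mapping.py",
         "--source-schema", json.dumps(schema),
         "--source-system", source_system,
         "--table-name", table_name,
         "--output-file", "mapping_result.json"],
        [py, "scripts/validate_mapping.py", "mapping_result.json"],
        [py, "scripts/generate_mapping_docs.py",
         "--mapping-file", "mapping_result.json",
         "--output-file", f"{table_name}_mapping.md"],
    ]


def run_all(commands: list) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

From the repository root, `run_all(local_commands(SAMPLE_SCHEMA, "crm", "orders"))` runs the three scripts in sequence.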
2. Workflow Testing
Use workflow dispatch to test:
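Besides the Actions tab, the workflow can be dispatched through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`. This is the kind of call an Airflow task can make.

The sketch below uses only the standard library. The owner, repository, token and input values are placeholders, and the token must be allowed to run workflows in the repository. Dispatch inputs are sent as strings, so schema_json carries the schema serialized with json.dumps.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str, inputs: dict,
                           ref: str = "main",
                           workflow: str = "schema-mapping.yml") -> urllib.request.Request:
    """Build the POST that triggers a workflow_dispatch run on `ref`."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def dispatch(req: urllib.request.Request) -> int:
    """Send the request; urlopen raises HTTPError on 4xx/5xx, else returns a 2xx status."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A test dispatch would pass all four workflow inputs (source_system, table_name, schema_json, s3_file_path) and then watch the run appear in the Actions tab.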