Automated Ingestion & Schema Mapping
This cookbook demonstrates how to integrate Chicory AI with your data stack to automate schema mapping and dbt model generation when new CSV files land in your data lake. The workflow combines Airflow orchestration, GitHub Actions, and Chicory agents into an automated ingestion pipeline in which each generated change (the schema mapping, then the dbt artifacts) is delivered as a pull request for review.
Quick Start
Set up S3 bucket monitoring with Airflow
Configure GitHub Actions for schema mapping
Deploy Chicory agent for schema transformation
Set up automated dbt model generation workflow
Test the complete pipeline with a sample CSV file

Architecture Overview
The automated ingestion workflow follows these steps:
New CSV lands in S3 bucket
Airflow S3 Sensor detects file → starts DAG
DAG extracts schema → triggers GitHub Action
GitHub Action + Chicory Agent map source schema → target model, outputs mapping.json
Action raises PR with mapping
Once PR is merged, second GitHub Action calls Chicory Agent → generates dbt model + YAML docs
Final PR adds dbt artifacts → ready to run
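The hand-off between the DAG and the first GitHub Action (the "extracts schema → triggers GitHub Action" step) can be sketched roughly as follows. This is a minimal illustration, not the cookbook's actual implementation: it assumes the DAG has already read the CSV contents as text, infers only three coarse types, and builds a request body in the shape GitHub's repository dispatch endpoint expects (`event_type` plus `client_payload`). The function names, the `schema-mapping` event type, and the bucket and key values are all hypothetical.

```python
import csv
import io
import json


def infer_type(values):
    """Return 'integer', 'float', or 'string' for a column's sampled values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall through on the first value that fails.
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def extract_csv_schema(csv_text, sample_rows=100):
    """Read the header and up to `sample_rows` rows; return column names and types."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty file: nothing to map
        return []
    columns = [[] for _ in header]
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, value in zip(columns, row):
            col.append(value.strip())
    return [
        {"name": name.strip(), "type": infer_type(vals)}
        for name, vals in zip(header, columns)
    ]


def build_dispatch_payload(bucket, key, schema, event_type="schema-mapping"):
    """Build a repository-dispatch style body (event type name is hypothetical)."""
    return {
        "event_type": event_type,
        "client_payload": {"bucket": bucket, "key": key, "schema": schema},
    }


# Example with made-up data: a three-column CSV and a placeholder bucket/key.
sample = "order_id,amount,customer\n1,19.99,alice\n2,5,bob\n"
schema = extract_csv_schema(sample)
payload = build_dispatch_payload("example-bucket", "incoming/orders.csv", schema)
print(json.dumps(payload, indent=2))
```

In a real DAG the payload would be sent to the repository with an authenticated POST, and the workflow listening for that event would pass the schema to the Chicory agent. Sampling only the first rows keeps the task cheap, at the cost of possibly misclassifying a column whose later values differ in type.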
Contents
Introduction – Architecture overview and prerequisites
S3 Bucket Setup – Configure S3 bucket and IAM permissions
Airflow DAG Configuration – Set up S3 sensor and schema extraction
Chicory Agent Creation – Creating schema mapping agents
GitHub Actions Workflow – Automated PR creation and dbt generation
dbt Model Generation – Automated model and documentation creation
Testing & Validation – End-to-end pipeline testing
Troubleshooting – Common issues & fixes