Troubleshooting

Overview

This section provides solutions for common issues encountered when implementing and running the automated ingestion and schema mapping pipeline.

Common Issues

1. S3 and Airflow Issues

Issue: S3 Sensor Not Triggering

Symptoms:

  • DAG not starting when files are uploaded

  • S3KeySensor task stuck in running state

  • No detection of new files

Solutions:

# Check the S3 connection configured in Airflow
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def test_s3_connection():
    hook = S3Hook(aws_conn_id='aws_default')
    try:
        # Test bucket access by listing keys under the expected prefix
        keys = hook.list_keys(bucket_name='your-bucket', prefix='incoming/')
        print(f"S3 connection successful ({len(keys or [])} keys under incoming/)")
    except Exception as e:
        print(f"S3 connection failed: {e}")

# Add to your DAG for debugging
test_connection = PythonOperator(
    task_id='test_s3_connection',
    python_callable=test_s3_connection,
    dag=dag
)

Checklist:
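
A minimal sensor configuration to double-check, assuming the bucket and prefix used above (adjust the names to your environment):

# Verify each of these values against your environment
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

wait_for_file = S3KeySensor(
    task_id='wait_for_incoming_file',
    bucket_name='your-bucket',      # exact bucket name, no s3:// prefix
    bucket_key='incoming/*.csv',    # prefix/pattern the uploads actually match
    wildcard_match=True,            # required when bucket_key contains wildcards
    aws_conn_id='aws_default',      # connection must exist in Admin > Connections
    poke_interval=60,               # seconds between checks
    timeout=60 * 60,                # fail after one hour instead of hanging
    dag=dag
)

# Also confirm the IAM role behind aws_default has s3:ListBucket and s3:GetObject
# on the bucket, and that the DAG is unpaused in the Airflow UI.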

Issue: Schema Extraction Fails

Symptoms:

  • Task fails with pandas or CSV parsing errors

  • Memory issues with large files

  • Encoding problems

Solutions:
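
A minimal sketch of defensive CSV reading that addresses all three symptoms, assuming pandas is already used for schema extraction (the file path and encoding list are illustrative):

import pandas as pd

def extract_schema(path, encodings=('utf-8', 'utf-8-sig', 'latin-1')):
    """Read only enough of the file to infer the schema, trying common encodings."""
    last_error = None
    for encoding in encodings:
        try:
            # nrows keeps memory bounded for large files; dtypes come from the sample
            sample = pd.read_csv(path, nrows=10_000, encoding=encoding, on_bad_lines='skip')
            return {col: str(dtype) for col, dtype in sample.dtypes.items()}
        except (UnicodeDecodeError, pd.errors.ParserError) as e:
            last_error = e
    raise ValueError(f"Could not parse {path}: {last_error}")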

2. Chicory Agent Issues

Issue: Agent API Timeouts

Symptoms:

  • Requests timeout after 60 seconds

  • Agent responses are slow

  • Intermittent connectivity issues

Solutions:
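
A hedged sketch of a more resilient client, assuming the agent is called over HTTP with requests (the endpoint and payload shape are placeholders for whatever your integration already uses):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_agent_session(total_retries=3, backoff_factor=2):
    """Session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

def call_agent(endpoint, payload):
    session = build_agent_session()
    # Raise the read timeout well past 60s so slow mapping responses are not cut off
    response = session.post(endpoint, json=payload, timeout=(10, 180))
    response.raise_for_status()
    return response.json()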

Issue: Poor Quality Agent Responses

Symptoms:

  • Inconsistent mapping quality

  • Missing required fields in responses

  • Incorrect data type mappings

Solutions:
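
A sketch of validating the agent's mapping output before it flows downstream; the required field names used here (source_column, target_column, data_type) are hypothetical and should match whatever schema your prompt asks the agent to return:

REQUIRED_FIELDS = {"source_column", "target_column", "data_type"}
ALLOWED_TYPES = {"string", "integer", "float", "boolean", "timestamp", "date"}

def validate_mapping_response(mappings):
    """Return (valid_rows, problems) so bad mappings can be retried or logged."""
    valid, problems = [], []
    for row in mappings:
        missing = REQUIRED_FIELDS - set(row)
        if missing:
            problems.append(f"{row}: missing {sorted(missing)}")
        elif row["data_type"] not in ALLOWED_TYPES:
            problems.append(f"{row}: unexpected data_type {row['data_type']!r}")
        else:
            valid.append(row)
    return valid, problems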

3. GitHub Actions Issues

Issue: Workflow Not Triggering

Symptoms:

  • GitHub Actions workflow doesn't start

  • No workflow runs showing in GitHub

  • API calls fail with 404

Solutions:

  1. Check Workflow File Location:

  2. Validate Workflow Syntax:

  3. Test API Trigger (see the sketch after this list):
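
For step 3, a minimal sketch of triggering the workflow through the GitHub REST API (the repository, workflow file name, and token are placeholders). A 404 here usually means the workflow file is not on the default branch, the path is wrong, or the token lacks the required scope:

import requests

def trigger_workflow(owner, repo, workflow_file, token, ref="main", inputs=None):
    """Fire a workflow_dispatch event; the workflow must declare that trigger."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    response = requests.post(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        json={"ref": ref, "inputs": inputs or {}},
        timeout=30,
    )
    # GitHub returns 204 No Content on a successful dispatch
    response.raise_for_status()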

Issue: dbt Model Generation Fails

Symptoms:

  • dbt compilation errors

  • Invalid SQL syntax in generated models

  • Missing source references

Solutions:
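
A sketch of a pre-commit sanity check for generated models, assuming the workflow writes SQL under models/staging/ and declares sources in models/sources.yml (both paths are placeholders for your project layout):

import re
import yaml
from pathlib import Path

def check_source_references(project_dir="."):
    """Flag source() calls in generated SQL that are not declared in sources.yml."""
    declared = set()
    sources_file = Path(project_dir, "models", "sources.yml")
    for source in yaml.safe_load(sources_file.read_text()).get("sources", []):
        for table in source.get("tables", []):
            declared.add((source["name"], table["name"]))

    pattern = re.compile(r"source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)")
    problems = []
    for sql_file in Path(project_dir, "models", "staging").glob("*.sql"):
        for ref in pattern.findall(sql_file.read_text()):
            if ref not in declared:
                problems.append(f"{sql_file.name}: undeclared source {ref}")
    return problems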

4. dbt Issues

Issue: dbt Models Don't Compile

Symptoms:

  • dbt compile fails

  • Missing dependencies

  • Invalid references

Solutions:

Debugging Steps:
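
One way to narrow down a compile failure is to run the dbt commands in order of increasing scope and stop at the first one that fails (a sketch, assuming the dbt CLI is available where the pipeline runs):

import subprocess

def debug_dbt(project_dir="."):
    """Run dbt checks from connection to full compile, stopping at the first failure."""
    for args in (["dbt", "debug"],      # connection, profiles.yml, dbt_project.yml
                 ["dbt", "deps"],       # missing packages
                 ["dbt", "parse"],      # bad Jinja / YAML
                 ["dbt", "compile"]):   # invalid refs and SQL
        result = subprocess.run(args, cwd=project_dir, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED at: {' '.join(args)}\n{result.stdout}\n{result.stderr}")
            return False
    print("All dbt checks passed")
    return True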

Issue: Test Failures

Symptoms:

  • Data quality tests fail

  • Relationship tests fail

  • Custom tests error out

Solutions:
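
When a test fails, re-running just that model's tests with --store-failures writes the offending rows to audit tables you can query directly; a sketch, assuming you know which model is failing:

import subprocess

def rerun_failing_tests(model_name, project_dir="."):
    """Re-run one model's tests and persist the failing rows for inspection."""
    result = subprocess.run(
        ["dbt", "test", "--select", model_name, "--store-failures"],
        cwd=project_dir, capture_output=True, text=True,
    )
    print(result.stdout)
    # Failing rows land in audit tables (schema/suffix depends on your project config);
    # query them to see exactly which records break uniqueness, not_null, or relationships.
    return result.returncode == 0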

5. Performance Issues

Issue: Slow Pipeline Execution

Symptoms:

  • Long processing times

  • Memory issues

  • Timeout errors

Solutions:

  1. Optimize Airflow Configuration:

  2. Implement Parallel Processing:

  3. Optimize Large File Processing (see the sketch after this list):
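
For step 3, a sketch of processing a large file in bounded chunks so memory stays flat regardless of file size (the chunk size and the per-batch callable are placeholders for your own transformation logic):

import pandas as pd

def process_large_csv(path, process_chunk, chunk_size=100_000):
    """Stream the file in chunks instead of loading it all at once."""
    rows = 0
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        process_chunk(chunk)   # your existing per-batch transformation
        rows += len(chunk)
    return rows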

Monitoring and Alerting

1. Set Up Comprehensive Monitoring
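
A minimal sketch of failure alerting at the DAG level, assuming alerts go to a Slack incoming webhook (the webhook URL is a placeholder; swap in email or another channel as needed):

import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify_failure(context):
    """Airflow on_failure_callback: post the failing task and its log URL."""
    ti = context["task_instance"]
    message = (
        f":red_circle: {ti.dag_id}.{ti.task_id} failed "
        f"(run {context['run_id']}): {ti.log_url}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)

# Attach to every task via default_args when defining the DAG
default_args = {
    "on_failure_callback": notify_failure,
    "retries": 1,
}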

2. Create Debugging Scripts
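
A sketch of a standalone health-check script that exercises each external dependency in one pass (the bucket name and agent health URL are placeholders):

import requests
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def run_health_checks():
    checks = {
        "S3 bucket": lambda: S3Hook(aws_conn_id="aws_default").list_keys(
            bucket_name="your-bucket", prefix="incoming/"),
        "Agent endpoint": lambda: requests.get(
            "https://agent.example.com/health", timeout=10).raise_for_status(),  # placeholder URL
        "GitHub API": lambda: requests.get(
            "https://api.github.com", timeout=10).raise_for_status(),
    }
    for name, fn in checks.items():
        try:
            fn()
            print(f"OK   {name}")
        except Exception as e:
            print(f"FAIL {name}: {e}")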

Emergency Procedures

1. Pipeline Failure Recovery
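
A sketch of the usual recovery path using the Airflow CLI, wrapped in Python for consistency with the rest of this guide (the DAG id is a placeholder; the dates must cover the failed run):

import subprocess

DAG_ID = "ingestion_schema_mapping"  # placeholder DAG id

def recover_failed_run(start_date, end_date):
    """Clear only the failed task instances, then let the scheduler retry them."""
    subprocess.run(
        ["airflow", "tasks", "clear", DAG_ID,
         "--start-date", start_date, "--end-date", end_date,
         "--only-failed", "--yes"],
        check=True,
    )

def rerun_from_scratch():
    """Or trigger a brand-new DAG run once the inputs themselves are fixed."""
    subprocess.run(["airflow", "dags", "trigger", DAG_ID], check=True)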

2. Data Quality Issues
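
If a bad file has already been picked up, a common containment step is to move it out of the incoming/ prefix so reruns do not reprocess it; a sketch using the same S3Hook as above (the bucket and prefixes are placeholders):

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def quarantine_file(key, bucket="your-bucket"):
    """Copy the offending object to quarantine/ and remove it from incoming/."""
    hook = S3Hook(aws_conn_id="aws_default")
    quarantine_key = key.replace("incoming/", "quarantine/", 1)
    hook.copy_object(
        source_bucket_key=key,
        dest_bucket_key=quarantine_key,
        source_bucket_name=bucket,
        dest_bucket_name=bucket,
    )
    hook.delete_objects(bucket=bucket, keys=[key])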

Getting Help

1. Log Analysis
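
A sketch for pulling recent errors out of the Airflow task logs before opening a ticket, assuming the default log layout under $AIRFLOW_HOME/logs (the path and DAG id are placeholders):

import os
from pathlib import Path

def collect_errors(dag_id="ingestion_schema_mapping", max_lines=50):
    """Scan the task logs for a DAG and return the most recent ERROR lines."""
    log_root = Path(os.environ.get("AIRFLOW_HOME", "~/airflow")).expanduser() / "logs"
    errors = []
    for log_file in sorted(log_root.rglob("*.log"),
                           key=lambda p: p.stat().st_mtime, reverse=True):
        if dag_id not in str(log_file):
            continue
        for line in log_file.read_text(errors="ignore").splitlines():
            if "ERROR" in line:
                errors.append(f"{log_file.name}: {line}")
        if len(errors) >= max_lines:
            break
    return errors[:max_lines]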

2. Support Channels

  • Chicory AI Support: [email protected]

  • GitHub Issues: Create issue in your repository

  • Internal Documentation: Update troubleshooting docs with new solutions

3. Escalation Process

  1. Level 1: Check common issues in this guide

  2. Level 2: Run debugging scripts and check logs

  3. Level 3: Contact system administrators

  4. Level 4: Engage vendor support (Chicory, cloud providers)


This concludes the Automated Ingestion & Schema Mapping cookbook. For additional support, refer to the Chicory AI documentation or contact your system administrator.
