Troubleshooting
Overview
This section provides solutions for common issues encountered when implementing and running the automated ingestion and schema mapping pipeline.
Common Issues
1. S3 and Airflow Issues
Issue: S3 Sensor Not Triggering
Symptoms:
DAG not starting when files are uploaded
S3KeySensor task stuck in running state
No detection of new files
Solutions:
```python
# Check S3 connection in Airflow
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def test_s3_connection():
    hook = S3Hook(aws_conn_id='aws_default')
    try:
        # Test bucket access
        hook.list_keys(bucket_name='your-bucket', prefix='incoming/')
        print("S3 connection successful")
    except Exception as e:
        print(f"S3 connection failed: {e}")

# Add to your DAG for debugging
test_connection = PythonOperator(
    task_id='test_s3_connection',
    python_callable=test_s3_connection,
    dag=dag
)
```
Checklist:
Confirm the aws_conn_id referenced by the sensor exists in Airflow and holds valid credentials
Verify the bucket name and key prefix (or wildcard pattern) match where files are actually uploaded
Check that the sensor's poke_interval and timeout suit your upload cadence
Confirm the IAM role or user has s3:ListBucket and s3:GetObject permissions on the bucket
Issue: Schema Extraction Fails
Symptoms:
Task fails with pandas or CSV parsing errors
Memory issues with large files
Encoding problems
Solutions:
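The exact extraction code isn't shown here, but a defensive pattern like the sketch below addresses all three symptoms: it tries multiple encodings, infers the schema from a sample of rows, and avoids loading the whole file into memory. The file path and sample size are illustrative assumptions.

```python
import pandas as pd

def extract_schema(path, sample_rows=10_000):
    """Infer column names and dtypes from a sample without loading the whole file."""
    # Try UTF-8 first, then fall back to encodings that tolerate odd bytes.
    for encoding in ('utf-8', 'utf-8-sig', 'latin-1'):
        try:
            sample = pd.read_csv(path, nrows=sample_rows, encoding=encoding,
                                 on_bad_lines='skip')
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError(f"Could not decode {path} with any tested encoding")

    return {col: str(dtype) for col, dtype in sample.dtypes.items()}
```

For very large files, pd.read_csv(..., chunksize=...) lets downstream steps process the data in pieces instead of holding it all in memory.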
2. Chicory Agent Issues
Issue: Agent API Timeouts
Symptoms:
Requests timeout after 60 seconds
Agent responses are slow
Intermittent connectivity issues
Solutions:
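The Chicory agent's API surface isn't documented in this section, so treat the following as a generic HTTP-client pattern rather than the official SDK: a session with retries, exponential backoff, and a longer read timeout usually resolves intermittent timeouts. The endpoint URL and payload shape are placeholders.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def call_agent(payload, base_url="https://api.example.com/agent"):  # placeholder URL
    session = requests.Session()
    retries = Retry(
        total=5,                     # retry up to 5 times
        backoff_factor=2,            # 2s, 4s, 8s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST"],
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))
    # (connect timeout, read timeout) -- raise the read timeout above the 60s default
    response = session.post(base_url, json=payload, timeout=(10, 180))
    response.raise_for_status()
    return response.json()
```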
Issue: Poor Quality Agent Responses
Symptoms:
Inconsistent mapping quality
Missing required fields in responses
Incorrect data type mappings
Solutions:
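When mappings come back incomplete or inconsistent, validating the response before it flows downstream makes the failure explicit and gives you a hook for retrying with a more constrained prompt. The required field names and accepted data types below are illustrative assumptions; substitute whatever your mapping contract actually requires.

```python
REQUIRED_FIELDS = {"source_column", "target_column", "data_type"}  # assumed contract

def validate_mapping_response(mappings):
    """Return (valid_rows, errors) so bad mappings never reach dbt model generation."""
    valid, errors = [], []
    for i, mapping in enumerate(mappings):
        missing = REQUIRED_FIELDS - set(mapping)
        if missing:
            errors.append(f"mapping {i}: missing fields {sorted(missing)}")
            continue
        if mapping["data_type"].lower() not in {"string", "integer", "float",
                                                "boolean", "timestamp", "date"}:
            errors.append(f"mapping {i}: unexpected data type {mapping['data_type']!r}")
            continue
        valid.append(mapping)
    return valid, errors
```

If validation fails repeatedly, tightening the prompt (an explicit output schema plus a worked example) usually improves consistency more than retrying the same request.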
3. GitHub Actions Issues
Issue: Workflow Not Triggering
Symptoms:
GitHub Actions workflow doesn't start
No workflow runs showing in GitHub
API calls fail with 404
Solutions:
Check Workflow File Location: the workflow file must live in .github/workflows/ and be present on the branch (ref) you trigger against; a file anywhere else will never register.
Validate Workflow Syntax: a YAML parse error stops the workflow from registering at all, so run the file through a YAML linter or check the repository's Actions tab for syntax errors.
Test API Trigger: confirm the workflow declares a workflow_dispatch (or repository_dispatch) trigger and that the token can access the repository; a 404 from the API usually means a wrong owner/repo/workflow path, the file missing on the target ref, or an under-permissioned token. A minimal dispatch test is sketched below.
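A quick way to isolate the problem is to call the dispatch endpoint directly and inspect the status code. This sketch uses the public GitHub REST API; the owner, repo, workflow file name, and token variable are placeholders.

```python
import os
import requests

OWNER, REPO, WORKFLOW = "your-org", "your-repo", "generate_dbt_models.yml"  # placeholders

def trigger_workflow(ref="main", inputs=None):
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/actions/workflows/{WORKFLOW}/dispatches"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    }
    resp = requests.post(url, headers=headers, json={"ref": ref, "inputs": inputs or {}})
    # 204 No Content means the dispatch was accepted; 404 points at a wrong
    # owner/repo/workflow path, a missing workflow_dispatch trigger on that ref,
    # or a token that cannot see the repository.
    print(resp.status_code, resp.text)
    resp.raise_for_status()
```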
Issue: dbt Model Generation Fails
Symptoms:
dbt compilation errors
Invalid SQL syntax in generated models
Missing source references
Solutions:
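Because the models are generated automatically, it helps to compile them before they are committed so that malformed SQL or a missing source never reaches the main branch. The sketch below shells out to the dbt CLI from the generation step; the project directory path is an assumption.

```python
import subprocess

def validate_generated_models(project_dir="dbt_project"):  # assumed path
    """Fail fast if generated models do not parse or compile."""
    for args in (["dbt", "parse"], ["dbt", "compile"]):
        result = subprocess.run(args, cwd=project_dir, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface dbt's own error message (bad Jinja, unknown source(), etc.)
            raise RuntimeError(f"{' '.join(args)} failed:\n{result.stdout}\n{result.stderr}")
```

A missing source reference usually means a generated model calls source() for a source that was never declared in a sources .yml file; regenerate or update that file in the same workflow step.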
4. dbt Issues
Issue: dbt Models Don't Compile
Symptoms:
dbt compile fails
Missing dependencies
Invalid references
Solutions:
Debugging Steps:
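A reasonable sequence is to check the warehouse connection, reinstall packages, then select and compile the failing model with its children. This is a generic dbt workflow rather than anything specific to this pipeline; the model name is a placeholder.

```python
import subprocess

# Hypothetical debugging sequence; run from the dbt project directory.
for args in (
    ["dbt", "debug"],                        # verify profiles.yml and warehouse connectivity
    ["dbt", "deps"],                         # reinstall packages listed in packages.yml
    ["dbt", "ls", "--select", "my_model+"],  # confirm the model and its children resolve (placeholder name)
    ["dbt", "compile", "--select", "my_model"],
):
    print(">>>", " ".join(args))
    subprocess.run(args, check=True)
```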
Issue: Test Failures
Symptoms:
Data quality tests fail
Relationship tests fail
Custom tests error out
Solutions:
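When tests fail, dbt writes the details to target/run_results.json; parsing that file gives a concise list of which tests failed and why, which is easier to act on than scrolling raw logs. The artifact format is standard dbt behavior, though the summary script itself is just a sketch and the project path is assumed.

```python
import json
from pathlib import Path

def summarize_test_failures(project_dir="dbt_project"):  # assumed path
    results = json.loads(Path(project_dir, "target", "run_results.json").read_text())
    for result in results["results"]:
        if result["status"] in ("fail", "error"):
            print(f"{result['unique_id']}: {result['status']} -- {result.get('message', '')}")
```

Rerunning only the affected tests with dbt test --select <model_name> is faster than a full run, and adding --store-failures persists the failing rows to the warehouse for inspection.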
5. Performance Issues
Issue: Slow Pipeline Execution
Symptoms:
Long processing times
Memory issues
Timeout errors
Solutions:
Optimize Airflow Configuration: raise scheduler parallelism and per-DAG concurrency if tasks queue behind one another, and run long-waiting sensors in reschedule mode so they don't occupy worker slots.
Implement Parallel Processing: split independent files or tables into separate tasks so they run concurrently instead of sequentially.
Optimize Large File Processing: stream files in chunks rather than loading them whole; see the sketch below.
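For the last item, streaming the file in fixed-size chunks keeps memory usage flat regardless of file size. The chunk size and the per-chunk processing function below are placeholders for whatever your pipeline actually does with each batch.

```python
import pandas as pd

def process_large_csv(path, chunk_rows=100_000):
    """Process a large CSV in fixed-size chunks instead of loading it all at once."""
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunk_rows):
        total += len(chunk)
        handle_chunk(chunk)   # placeholder for your per-batch logic
    return total

def handle_chunk(chunk):
    # Hypothetical example: clean, validate, or load the batch here.
    pass
```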
Monitoring and Alerting
1. Set Up Comprehensive Monitoring
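One low-effort starting point is an Airflow failure callback that pushes an alert whenever any task in the pipeline fails. The webhook URL and message format below are placeholders; wire it to whatever alerting channel your team uses.

```python
import requests

ALERT_WEBHOOK = "https://hooks.example.com/pipeline-alerts"  # placeholder URL

def alert_on_failure(context):
    """Airflow on_failure_callback: post a short alert with task and log details."""
    ti = context["task_instance"]
    message = (f"Task {ti.task_id} in DAG {ti.dag_id} failed "
               f"(run {context['run_id']}). Logs: {ti.log_url}")
    requests.post(ALERT_WEBHOOK, json={"text": message}, timeout=10)

# Attach via default_args so every task in the DAG reports failures:
default_args = {"on_failure_callback": alert_on_failure}
```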
2. Create Debugging Scripts
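A single script that checks each external dependency in turn (S3, the agent endpoint, the dbt project) shortens triage considerably. The sketch below reuses the kinds of checks shown earlier; the connection ID, bucket, health-check URL, and project path are placeholders.

```python
import subprocess
import requests
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def run_health_checks():
    checks = {}

    # 1. S3: can we list the landing prefix? (bucket and prefix are placeholders)
    try:
        S3Hook(aws_conn_id="aws_default").list_keys("your-bucket", prefix="incoming/")
        checks["s3"] = "ok"
    except Exception as exc:
        checks["s3"] = f"failed: {exc}"

    # 2. Agent endpoint: placeholder health-check URL
    try:
        requests.get("https://api.example.com/agent/health", timeout=10).raise_for_status()
        checks["agent"] = "ok"
    except Exception as exc:
        checks["agent"] = f"failed: {exc}"

    # 3. dbt: does the project still compile?
    result = subprocess.run(["dbt", "compile"], cwd="dbt_project",
                            capture_output=True, text=True)
    checks["dbt"] = "ok" if result.returncode == 0 else f"failed: {result.stderr[:200]}"

    for name, status in checks.items():
        print(f"{name}: {status}")
    return checks
```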
Emergency Procedures
1. Pipeline Failure Recovery
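Recovery usually means clearing the failed task instances so Airflow re-runs them for the affected date range. The snippet below wraps the standard Airflow 2.x CLI; the DAG ID and dates are placeholders, and the flags may differ slightly by version (check airflow tasks clear --help).

```python
import subprocess

DAG_ID = "ingestion_schema_mapping"  # placeholder DAG id

# Clear failed task instances for a date range so the scheduler retries them.
subprocess.run([
    "airflow", "tasks", "clear", DAG_ID,
    "--start-date", "2024-01-01", "--end-date", "2024-01-02",
    "--only-failed", "--yes",
], check=True)
```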
2. Data Quality Issues
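For data quality incidents, re-running the relevant dbt tests with failures stored in the warehouse gives you the offending rows to inspect directly. --store-failures is a standard dbt flag; the model selector and project path are placeholders.

```python
import subprocess

# Re-run tests for the affected model and persist failing rows to audit tables.
subprocess.run(
    ["dbt", "test", "--select", "my_model", "--store-failures"],  # placeholder selector
    cwd="dbt_project", check=True,
)
```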
Getting Help
1. Log Analysis
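Airflow task logs are the first place to look; with file-based logging they are organized by DAG, run, and task under the logs directory. The exact layout depends on your Airflow version and logging config, so treat the path below as an assumption.

```python
import subprocess
from pathlib import Path

# Assumed default layout for Airflow 2.x file task logs under $AIRFLOW_HOME.
log_root = Path.home() / "airflow" / "logs" / "dag_id=ingestion_schema_mapping"

# Grep recent logs for errors and tracebacks.
subprocess.run(["grep", "-rniE", "error|traceback", str(log_root)])
```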
2. Support Channels
Chicory AI Support: [email protected]
GitHub Issues: Create issue in your repository
Internal Documentation: Update troubleshooting docs with new solutions
3. Escalation Process
Level 1: Check common issues in this guide
Level 2: Run debugging scripts and check logs
Level 3: Contact system administrators
Level 4: Engage vendor support (Chicory, cloud providers)
This concludes the Automated Ingestion & Schema Mapping cookbook. For additional support, refer to the Chicory AI documentation or contact your system administrator.