Getting Started

Onboarding

Prerequisites

Before you begin, ensure you have:

  • A project repository with SparkSQL queries (we support both *.sql and *.py)

  • A GitHub account (for GitHub support)

  • Visual Studio Code (for IDE support)

In the current version of our tooling, SparkSQL queries are extracted from PySpark code only when they are passed to the .sql() method; queries expressed through other DataFrame transformations are not picked up.
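For example, given the following PySpark snippet (table and variable names are illustrative), only the first query would be extracted:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()

# Extracted: the SparkSQL string is passed to .sql()
orders = spark.sql("SELECT order_id, total FROM sales.orders WHERE total > 100")

# Not extracted: equivalent logic expressed through the DataFrame API
orders_df = spark.table("sales.orders").filter("total > 100")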

Step 1: Sign Up for the API Service

  1. Request your API key (keep this key secure, as you'll need it to authenticate requests).

Rate limit: the Standard Tier allows 10 requests per day.
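A common pattern for keeping the key secure is to load it from an environment variable rather than hard-coding it. The endpoint, header, and variable name below are placeholders, not the documented API; consult the API reference for the actual values:

import os

import requests

API_KEY = os.environ["CHICORY_API_KEY"]  # hypothetical variable name

# Placeholder endpoint and auth header -- replace with the values from
# the API reference.
response = requests.get(
    "https://api.chicory.ai/v1/status",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()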

Step 2: Prepare Your Project - Offline Metadata Server Configuration

To improve the precision of data pipeline analysis, Chicory uses context from your metadata service to understand your backend setup. You can also provide this context offline, as a JSON object that maps each table name to its DDL:

{
    "[TABLE_NAME]": "[DDL]",
    ...
}
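For example (table names and DDL are illustrative):

{
    "sales.orders": "CREATE TABLE sales.orders (order_id INT, total NUMERIC(10,2), created_at TIMESTAMP);",
    "sales.customers": "CREATE TABLE sales.customers (customer_id INT, name VARCHAR(255));"
}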

To export your tables' DDL, use one of the following. For PostgreSQL:

pg_dump -s -h [host] -U [username] -d [database_name] > db_schema.sql

# OR, for Redshift:

SELECT 'CREATE TABLE ' || table_name || ' (' ||
       LISTAGG(column_name || ' ' || data_type, ', ')
           WITHIN GROUP (ORDER BY ordinal_position) ||
       ');'
FROM information_schema.columns
WHERE table_schema = 'public' -- Specify your schema name if different
GROUP BY table_name;
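The pg_dump output is a plain SQL file, so it still needs to be converted into the JSON mapping shown above. A minimal sketch of that conversion, assuming one CREATE TABLE statement per table and no semicolons inside the DDL (the output file name is arbitrary):

import json
import re

with open("db_schema.sql") as f:
    schema = f.read()

ddl_map = {}
# Capture each CREATE TABLE statement up to its terminating semicolon.
for stmt in re.findall(r"CREATE TABLE[\s\S]*?;", schema):
    table_name = re.search(r"CREATE TABLE\s+([\w.\"]+)", stmt).group(1)
    ddl_map[table_name] = " ".join(stmt.split())  # collapse whitespace

with open("db_metadata.json", "w") as f:  # illustrative file name
    json.dump(ddl_map, f, indent=2)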

To collect table statistics, use one of the following queries. For PostgreSQL:
SELECT relname AS "Table",
       n_live_tup AS "Live Rows",
       n_dead_tup AS "Dead Rows",
       last_vacuum AS "Last Vacuum",
       last_analyze AS "Last Analyze"
FROM pg_stat_user_tables;

# OR, for Redshift:

SELECT table_id,
       "table",
       size,
       pct_used,
       empty,
       unsorted,
       stats_off
FROM svv_table_info;
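If you want to capture these statistics in a file, here is a minimal sketch using psycopg2 against PostgreSQL (connection details are placeholders; a Redshift variant would run the svv_table_info query instead):

import csv

import psycopg2  # assuming a PostgreSQL target

conn = psycopg2.connect(host="[host]", user="[username]", dbname="[database_name]")
with conn.cursor() as cur:
    cur.execute(
        "SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_analyze "
        "FROM pg_stat_user_tables;"
    )
    with open("table_stats.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur.fetchall())
conn.close()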

Supported Integrations

Support

For support, please open an issue in the project repository or contact us directly at hello@chicory.ai.
