# Data Pipeline Reference

Technical reference for the Starburst Galaxy → Parquet → Ryugraph data pipeline.
## Overview

This document details how relational data flows from Starburst Galaxy (managed Trino SaaS) SQL queries through Parquet files into Ryugraph/KuzuDB graph structures. It documents external system behaviour that our components depend on.
Related documents:
- requirements.md - Mapping definition JSON schema (authoritative source)
- export-worker.design.md - How our worker executes exports
- ryugraph-wrapper.design.md - How our wrapper loads graphs
- control-plane.design.md#mapping-generator-subsystem - How we validate mappings
## Data Flow Pipeline

```mermaid
flowchart LR
    A["Starburst<br/>SQL Query"] -->|UNLOAD| B["GCS<br/>Parquet Files"]
    B -->|Separate files for<br/>nodes and edges| C["Ryugraph<br/>COPY FROM"]
    C -->|CREATE NODE TABLE<br/>CREATE REL TABLE| D["Graph<br/>Instance"]
```

## Two-Tier Export Strategy
The platform implements a two-tier export strategy with automatic fallback:
| Tier | Method | When Used | Characteristics |
|---|---|---|---|
| 1 (Primary) | Server-side (system.unload()) | Starburst Galaxy with GCS catalog | Distributed execution, direct GCS write |
| 2 (Fallback) | Client-side (PyArrow) | When server-side unavailable | Streams through export worker, memory buffered |
### Export Flow with Fallback

```
Export Worker → Starburst Galaxy
                      │
          ┌───────────┴───────────┐
          │                       │
          ▼                       ▼
  system.unload()        SELECT * (fallback)
          │                       │
          │                       ▼
          │                 Export Worker
          │                       │
          │                       ▼
          │                    PyArrow
          │                       │
          └───────────┬───────────┘
                      ▼
                     GCS
```

### Fallback Triggers
The system falls back to PyArrow when any of the following hold (see the sketch after this list):

- `system.unload()` is not available
- GCS catalog is not configured
- Feature flag is disabled
- Permission errors occur
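The tier decision reduces to a small predicate over probed capabilities. A minimal sketch, assuming a hypothetical `ExportCapabilities` structure populated from catalog metadata and feature flags (all names are illustrative, not part of our codebase); permission errors surface only at execution time, so the worker additionally falls back when an UNLOAD attempt fails with one:

```python
from dataclasses import dataclass

@dataclass
class ExportCapabilities:
    """Hypothetical probe results for one Starburst connection."""
    unload_available: bool          # system.unload() callable on this cluster
    gcs_catalog_configured: bool    # catalog can write directly to GCS
    server_side_flag_enabled: bool  # feature flag for Tier 1

def use_server_side_export(caps: ExportCapabilities) -> bool:
    # Tier 1 only when every precondition holds; otherwise Tier 2 (PyArrow).
    return (
        caps.unload_available
        and caps.gcs_catalog_configured
        and caps.server_side_flag_enabled
    )
```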
### Memory Considerations (Fallback Path)

| Dataset Size | Memory Required | Recommendation |
|---|---|---|
| < 100 MB | 256 MB | Safe for fallback |
| 100 MB - 1 GB | 1-2 GB | Monitor closely |
| > 1 GB | 2+ GB | Prefer server-side |
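Batched streaming keeps the fallback path's memory footprint proportional to the batch size rather than the full result set. A minimal sketch of the client-side export, assuming the `trino`, `pyarrow`, and `gcsfs` packages; host, credentials, and the helper name are illustrative:

```python
import gcsfs
import pyarrow as pa
import pyarrow.parquet as pq
import trino

def export_fallback(sql: str, dest: str, batch_rows: int = 50_000) -> None:
    """Stream query results into a Parquet file on GCS in bounded batches."""
    conn = trino.dbapi.connect(
        host="starburst.example.com", port=443,
        user="export-worker", catalog="analytics", schema="public",
    )
    cur = conn.cursor()
    cur.execute(sql)
    columns = [d[0] for d in cur.description]

    fs = gcsfs.GCSFileSystem()
    writer = None
    try:
        while True:
            rows = cur.fetchmany(batch_rows)
            if not rows:
                break
            batch = pa.Table.from_pylist([dict(zip(columns, r)) for r in rows])
            if writer is None:  # infer the schema from the first batch
                writer = pq.ParquetWriter(dest, batch.schema,
                                          compression="SNAPPY", filesystem=fs)
            writer.write_table(batch)
    finally:
        if writer is not None:
            writer.close()
```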
## Starburst Galaxy UNLOAD

### Syntax

The UNLOAD table function exports query results directly to GCS:

```sql
SELECT *
FROM TABLE(
    io.unload(
        input => TABLE(
            SELECT customer_id, name, city, signup_date
            FROM analytics.customers
        ),
        location => 'gs://bucket/{user_id}/{mapping_id}/{snapshot_id}/nodes/customers/',
        format => 'PARQUET',
        compression => 'SNAPPY'
    )
)
```

Parameters:
| Parameter | Required | Description |
|---|---|---|
| input | Yes | TABLE(…) containing the SELECT query |
| location | Yes | GCS destination path |
| format | Yes | Output format (use `'PARQUET'`) |
| compression | No | Compression codec (recommend `'SNAPPY'`) |
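In our pipeline these statements are generated by the export worker rather than hand-written. A minimal sketch of the statement builder, with a hypothetical `build_unload_sql()` helper (the inner SELECT and location are assumed to be validated upstream):

```python
def build_unload_sql(select_sql: str, location: str) -> str:
    """Wrap a SELECT statement in Starburst's io.unload() table function."""
    return (
        "SELECT * FROM TABLE(io.unload(\n"
        f"    input => TABLE({select_sql}),\n"
        f"    location => '{location}',\n"
        "    format => 'PARQUET',\n"
        "    compression => 'SNAPPY'\n"
        "))"
    )

sql = build_unload_sql(
    "SELECT customer_id, name, city, signup_date FROM analytics.customers",
    "gs://bucket/snapshot_123/nodes/customers/",
)
```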
### Parallel Writing Behaviour

Starburst writes in parallel without partitioning. Multiple worker nodes each produce their own output files:

```
gs://bucket/snapshot/nodes/customers/
├── 00000-123-uuid1-0-00001.parquet  ← worker 1
├── 00000-124-uuid2-0-00001.parquet  ← worker 2
├── 00000-125-uuid3-0-00001.parquet  ← worker 3
└── 00000-126-uuid4-0-00001.parquet  ← worker 4
```

File naming pattern: `{sequence}-{task}-{UUID}-{split}-{part}.parquet`
The number of files depends on cluster size and data volume. For typical ≤2GB graphs, this produces a manageable number of files.
### Column Ordering Requirements

The Parquet schema is controlled entirely by the SELECT query. Column ordering is critical:
- For nodes: primary_key must be first column
- For edges: from_key, to_key must be first two columns
- Properties follow in defined order
```sql
SELECT *
FROM TABLE(
    io.unload(
        input => TABLE(
            SELECT
                customer_id,                         -- Primary key (must be first)
                name,                                -- Property
                CAST(balance AS DOUBLE) AS balance,  -- Cast DECIMAL to DOUBLE
                signup_date                          -- Property
            FROM analytics.customers
        ),
        location => 'gs://bucket/path/nodes/customers/',
        format => 'PARQUET',
        compression => 'SNAPPY'
    )
)
```

## Ryugraph Schema
### Node Table Definition

```sql
CREATE NODE TABLE Customer(
    customer_id STRING PRIMARY KEY,
    name STRING,
    city STRING,
    signup_date DATE
);
```

Rules:
- Every node table requires a `PRIMARY KEY`
- Primary key must be unique across all nodes of that type
- Supported PK types: STRING, INT64, DATE, UUID
### Relationship Table Definition

```sql
CREATE REL TABLE PURCHASED(
    FROM Customer TO Product,
    purchase_date DATE,
    amount DOUBLE,
    quantity INT64
);
```

Rules:
- Must specify `FROM NodeTable TO NodeTable`
- First two columns in Parquet must be FROM and TO primary keys
- Relationship properties follow the FROM/TO columns
### Multi-Source/Target Relationships

```sql
CREATE REL TABLE REVIEWED(
    FROM Customer TO Product,
    FROM Customer TO Merchant,
    rating INT64,
    review_text STRING
);
```

When importing, specify which pair:
```sql
COPY REVIEWED FROM 'customer_product_reviews.parquet' (from='Customer', to='Product');
COPY REVIEWED FROM 'customer_merchant_reviews.parquet' (from='Customer', to='Merchant');
```

Design decision: Our mapping schema does not currently support multi-source/target relationships. Each edge definition connects exactly one `from_node` to one `to_node`. For polymorphic relationships, define separate edge types (e.g., `REVIEWED_PRODUCT`, `REVIEWED_MERCHANT`). This simplifies validation, export, and import logic. Revisit if analysts need to query across polymorphic endpoints as a single relationship type; that would require allowing multiple edge definitions with the same `type` and identical `properties`.
### Multiplicity Constraints

```sql
-- Each customer has at most one primary address
CREATE REL TABLE HAS_PRIMARY_ADDRESS(FROM Customer TO Address, MANY_ONE);

-- One-to-one marriage relationship
CREATE REL TABLE MARRIED_TO(FROM Person TO Person, ONE_ONE);
```

Design decision: Our mapping schema does not support multiplicity constraints. All relationships use the default `MANY_MANY`. Since our graphs are read-only analytics snapshots (no inserts/updates after initial load), cardinality enforcement provides no benefit: the source data already defines the actual cardinality.
## Parquet File Requirements

### Node Table Parquet Structure

For a node table:

```sql
CREATE NODE TABLE Customer(
    customer_id STRING PRIMARY KEY,
    name STRING,
    age INT64,
    city STRING
);
```

Parquet columns must be in exact order:
| Column | Type | Notes |
|---|---|---|
| customer_id | STRING | Primary key (required, first) |
| name | STRING | Property |
| age | INT64 | Property |
| city | STRING | Property |
### Relationship Table Parquet Structure

For a relationship table:

```sql
CREATE REL TABLE PURCHASED(
    FROM Customer TO Product,
    purchase_date DATE,
    amount DOUBLE
);
```

Parquet columns must be:
| Column | Type | Notes |
|---|---|---|
| customer_id | STRING | FROM node primary key (required, first) |
| product_id | STRING | TO node primary key (required, second) |
| purchase_date | DATE | Relationship property |
| amount | DOUBLE | Relationship property |
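Because column order is positional, it can be checked cheaply before import. A pre-flight validation sketch using `pyarrow.parquet.read_schema`; the expected column list is illustrative:

```python
import pyarrow.parquet as pq

def validate_column_order(path: str, expected: list[str]) -> None:
    """Fail fast if a Parquet file's column order doesn't match the table schema."""
    actual = pq.read_schema(path).names  # column names in file order
    if actual != expected:
        raise ValueError(
            f"Column mismatch in {path}: expected {expected}, got {actual}"
        )

# Edge files: FROM key, TO key, then properties in declared order.
validate_column_order(
    "purchased-00000.parquet",
    ["customer_id", "product_id", "purchase_date", "amount"],
)
```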
### SERIAL Primary Keys (Auto-Increment)

For large datasets with sequential IDs:

```sql
CREATE NODE TABLE Event(
    id SERIAL PRIMARY KEY,
    event_type STRING,
    timestamp TIMESTAMP
);
```

When using SERIAL:
- Omit the primary key column from Parquet file
- Ryugraph auto-generates IDs starting from 0
- Significantly improves load performance
Parquet columns for SERIAL node:
| Column | Type |
|---|---|
| event_type | STRING |
| timestamp | TIMESTAMP |
Design decision: Our mapping schema does not support SERIAL primary keys. We always use the source database primary key to maintain traceability between graph nodes and source data. At our target scale (≤2GB graphs, ~10M nodes max), the performance benefit of SERIAL is negligible. SERIAL would also complicate edge definitions since edges reference nodes by primary key, and auto-generated IDs aren’t known until after node import. Revisit if scale increases significantly (50M+ nodes)—SERIAL provides fastest graph traversal via sequential INT64 keys with optimal memory layout and cache locality.
## Ryugraph COPY FROM

### Syntax

```sql
COPY TableName FROM 'gs://bucket/path/*.parquet';
```

### Import Order Constraint
Critical: Always import nodes before relationships. Relationships reference node primary keys that must already exist.
```sql
-- STEP 1: Import nodes FIRST (order matters!)
COPY Customer FROM 'gs://bucket/snapshot_123/nodes/customers/*.parquet';
COPY Product FROM 'gs://bucket/snapshot_123/nodes/products/*.parquet';

-- STEP 2: Import relationships AFTER nodes exist
COPY PURCHASED FROM 'gs://bucket/snapshot_123/edges/purchased/*.parquet';
```
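The wrapper can enforce this ordering structurally instead of by convention. A minimal sketch, assuming a connection object with an `execute()` method and a mapping that lists node and edge table names (all names illustrative; the real interface is in ryugraph-wrapper.design.md):

```python
def load_snapshot(conn, snapshot_uri: str,
                  node_tables: list[str], edge_tables: list[str]) -> None:
    """COPY all node tables, then all edge tables, so referenced keys always exist."""
    for table in node_tables:   # nodes first, unconditionally
        conn.execute(f"COPY {table} FROM '{snapshot_uri}/nodes/{table}/*.parquet';")
    for table in edge_tables:   # edges only after every node table is loaded
        conn.execute(f"COPY {table} FROM '{snapshot_uri}/edges/{table}/*.parquet';")

# conn obtained from the wrapper; directory layout mirrors the export paths.
load_snapshot(conn, "gs://bucket/snapshot_123",
              node_tables=["Customer", "Product"], edge_tables=["PURCHASED"])
```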
### Glob Pattern Support

Ryugraph reads all matching files via glob pattern:

```sql
COPY Customer FROM 'gs://bucket/path/*.parquet';  -- reads all parquet files
```
## Type Mapping

### Starburst to Ryugraph Type Mapping

| Starburst/Trino Type | Ryugraph Type | Notes |
|---|---|---|
| VARCHAR, CHAR | STRING | UTF-8 encoded |
| BIGINT | INT64 | 64-bit signed integer |
| INTEGER | INT32 | 32-bit signed integer |
| SMALLINT | INT16 | 16-bit signed integer |
| TINYINT | INT8 | 8-bit signed integer |
| DOUBLE | DOUBLE | 64-bit floating point |
| REAL | FLOAT | 32-bit floating point |
| DECIMAL | DOUBLE | Cast in SQL query |
| DATE | DATE | Calendar date |
| TIMESTAMP | TIMESTAMP | Date and time |
| TIMESTAMP WITH TIME ZONE | TIMESTAMP | Timezone stripped |
| BOOLEAN | BOOL | True/false |
| VARBINARY | BLOB | Binary data |
| ARRAY | LIST | Variable-length array |
| MAP | MAP | Key-value pairs |
| ROW | STRUCT | Structured type |
| UUID | UUID | Universal identifier |
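The mapping generator applies this table mechanically when emitting export queries and DDL. A minimal sketch of the lookup, with a flag marking types that need an explicit CAST (the helper name is illustrative):

```python
# Trino type -> (Ryugraph type, needs explicit CAST in the export query)
TYPE_MAP: dict[str, tuple[str, bool]] = {
    "VARCHAR": ("STRING", False),   "CHAR": ("STRING", False),
    "BIGINT": ("INT64", False),     "INTEGER": ("INT32", False),
    "SMALLINT": ("INT16", False),   "TINYINT": ("INT8", False),
    "DOUBLE": ("DOUBLE", False),    "REAL": ("FLOAT", False),
    "DECIMAL": ("DOUBLE", True),                      # CAST(col AS DOUBLE)
    "DATE": ("DATE", False),        "TIMESTAMP": ("TIMESTAMP", False),
    "TIMESTAMP WITH TIME ZONE": ("TIMESTAMP", True),  # timezone stripped
    "BOOLEAN": ("BOOL", False),     "VARBINARY": ("BLOB", False),
    "UUID": ("UUID", False),
}

def ryugraph_type(trino_type: str) -> str:
    base = trino_type.split("(")[0].strip().upper()  # DECIMAL(10,2) -> DECIMAL
    return TYPE_MAP[base][0]
```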
### Required SQL Casts

Some types require explicit casting in the SQL query:

```sql
SELECT
    customer_id,
    name,
    CAST(price AS DOUBLE) AS price,           -- DECIMAL -> DOUBLE
    CAST(created_at AS TIMESTAMP) AS created, -- Handle timezone
    CAST(is_active AS BOOLEAN) AS is_active   -- Ensure boolean
FROM source_table
```

## Error Handling
Section titled “Error Handling”Duplicate Primary Keys
Section titled “Duplicate Primary Keys”Ryugraph rejects duplicate primary keys by default:
-- Enable error skipping for dirty dataCOPY Customer FROM 'file.parquet' (IGNORE_ERRORS=true);Note: IGNORE_ERRORS=true has performance cost. Prefer clean data.
### Missing Foreign Keys

Relationships referencing non-existent nodes will fail:

```
Error: Node with primary key 'customer_xyz' not found in Customer table
```

Solution: Always load nodes before relationships.
### Schema Mismatch

Parquet columns must match the Ryugraph schema exactly:

```
Error: Column count mismatch. Expected 4 columns, got 3.
```

Solution: Ensure SELECT column order matches CREATE TABLE property order.
## Performance Considerations

### Parallel I/O

| Stage | Parallelism | How |
|---|---|---|
| Starburst UNLOAD | Multiple workers write files concurrently | Automatic based on cluster size |
| Ryugraph COPY FROM | Multiple threads read files concurrently | max_num_threads setting |
Both parallel write and parallel read work without partitioning. The flat file structure is optimal:

```
gs://bucket/snapshot/nodes/customers/
├── file-001.parquet  ← written by worker 1, read by thread A
├── file-002.parquet  ← written by worker 2, read by thread B
├── file-003.parquet  ← written by worker 3, read by thread C
└── ...
```

### Starburst Export Tuning
- Use `compression => 'SNAPPY'` for a balance of speed and size
- Files are automatically parallelised across workers
- No partitioning needed for performance
### Ryugraph Import Tuning

- COPY FROM is fastest for bulk loading (vs individual inserts)
- Use glob patterns: `COPY FROM 'path/*.parquet'` to read all files
- Ryugraph reads multiple files in parallel using available threads
- Set the buffer pool appropriately: `buffer_pool_size=2_147_483_648` (2 GB)
- For consecutive integer IDs, use `SERIAL` and omit the PK column
### Memory Management

For large graphs (approaching the 2GB limit):
- Configure Ryugraph buffer pool to ~80% of pod memory
- Enable disk spilling for larger-than-memory operations
- Use persistent volume for spill files
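A minimal sketch of applying these settings, assuming Ryugraph passes them through to the underlying KuzuDB Python API (the wrapper's actual interface is documented in ryugraph-wrapper.design.md; pod size and paths are illustrative):

```python
import kuzu

# ~80% of a 4 GB pod goes to the buffer pool, per the guidance above.
db = kuzu.Database(
    "/data/graph",                         # persistent volume; also holds spill files
    buffer_pool_size=int(3.2 * 1024**3),
)
conn = kuzu.Connection(db, num_threads=4)  # parallel Parquet reads during COPY FROM
conn.execute("COPY Customer FROM 'gs://bucket/snapshot_123/nodes/customers/*.parquet';")
```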
## Mapping Many-to-Many Relationships

In relational databases, many-to-many relationships are represented using join tables with foreign keys to both entities. In the graph model, these become relationship tables with properties.
### Relational Pattern

```sql
-- Join table with foreign keys to both entities
CREATE TABLE transactions (
    transaction_id VARCHAR PRIMARY KEY,
    customer_id VARCHAR REFERENCES customers(customer_id),
    product_id VARCHAR REFERENCES products(product_id),
    amount DECIMAL(10,2),
    transaction_date TIMESTAMP
);
```

### Graph Pattern
```sql
-- Foreign keys become FROM/TO, other columns become properties
CREATE REL TABLE PURCHASED(
    FROM Customer TO Product,
    transaction_id STRING,
    amount DOUBLE,
    transaction_date TIMESTAMP
);
```

### Mapping Rules
| Relational Concept | Graph Equivalent |
|---|---|
| Join table | REL TABLE |
| FK to source entity | FROM NodeType |
| FK to target entity | TO NodeType |
| Other columns | Relationship properties |
| Join table PK | Optional property (or omit if not needed) |
### Export Query Structure

Critical: Column ordering determines the relationship direction.

```sql
SELECT
    customer_id,                       -- FROM node PK (must be first)
    product_id,                        -- TO node PK (must be second)
    transaction_id,                    -- Property
    CAST(amount AS DOUBLE) AS amount,
    transaction_date
FROM transactions
```

### Import Order
Always load both node tables before the relationship table:

```sql
-- 1. Load nodes first
COPY Customer FROM 'gs://bucket/nodes/Customer/*.parquet';
COPY Product FROM 'gs://bucket/nodes/Product/*.parquet';

-- 2. Load relationships after nodes exist
COPY PURCHASED FROM 'gs://bucket/edges/PURCHASED/*.parquet';
```

## Complete Example
### Source: Relational Schema (in Starburst)

```sql
-- Customers table
CREATE TABLE customers (
    customer_id VARCHAR PRIMARY KEY,
    name VARCHAR,
    email VARCHAR,
    city VARCHAR,
    signup_date DATE
);

-- Products table
CREATE TABLE products (
    product_id VARCHAR PRIMARY KEY,
    name VARCHAR,
    category VARCHAR,
    price DECIMAL(10,2)
);

-- Transactions table (join table with attributes)
CREATE TABLE transactions (
    transaction_id VARCHAR PRIMARY KEY,
    customer_id VARCHAR REFERENCES customers(customer_id),
    product_id VARCHAR REFERENCES products(product_id),
    amount DECIMAL(10,2),
    transaction_date TIMESTAMP
);
```

### Target: Graph Schema (in Ryugraph)
```sql
-- Node tables
CREATE NODE TABLE Customer(
    customer_id STRING PRIMARY KEY,
    name STRING,
    email STRING,
    city STRING,
    signup_date DATE
);

CREATE NODE TABLE Product(
    product_id STRING PRIMARY KEY,
    name STRING,
    category STRING,
    price DOUBLE
);

-- Relationship table (foreign keys become explicit edges)
CREATE REL TABLE PURCHASED(
    FROM Customer TO Product,
    transaction_id STRING,
    amount DOUBLE,
    transaction_date TIMESTAMP
);
```

### Export Queries
```sql
-- Export Customers
SELECT * FROM TABLE(io.unload(
    input => TABLE(
        SELECT customer_id, name, email, city, signup_date
        FROM analytics.customers
    ),
    location => 'gs://bucket/snapshot_123/nodes/Customer/',
    format => 'PARQUET',
    compression => 'SNAPPY'
))

-- Export Products
SELECT * FROM TABLE(io.unload(
    input => TABLE(
        SELECT product_id, name, category, CAST(price AS DOUBLE) AS price
        FROM analytics.products
    ),
    location => 'gs://bucket/snapshot_123/nodes/Product/',
    format => 'PARQUET',
    compression => 'SNAPPY'
))

-- Export PURCHASED relationships
SELECT * FROM TABLE(io.unload(
    input => TABLE(
        SELECT
            t.customer_id,                      -- FROM node PK (first)
            t.product_id,                       -- TO node PK (second)
            t.transaction_id,                   -- Property
            CAST(t.amount AS DOUBLE) AS amount,
            t.transaction_date
        FROM analytics.transactions t
    ),
    location => 'gs://bucket/snapshot_123/edges/PURCHASED/',
    format => 'PARQUET',
    compression => 'SNAPPY'
))
```

### Import Commands
```sql
-- Create schema
CREATE NODE TABLE Customer(customer_id STRING PRIMARY KEY, name STRING, email STRING, city STRING, signup_date DATE);
CREATE NODE TABLE Product(product_id STRING PRIMARY KEY, name STRING, category STRING, price DOUBLE);
CREATE REL TABLE PURCHASED(FROM Customer TO Product, transaction_id STRING, amount DOUBLE, transaction_date TIMESTAMP);

-- Import nodes first
COPY Customer FROM 'gs://bucket/snapshot_123/nodes/Customer/*.parquet';
COPY Product FROM 'gs://bucket/snapshot_123/nodes/Product/*.parquet';

-- Import relationships after nodes exist
COPY PURCHASED FROM 'gs://bucket/snapshot_123/edges/PURCHASED/*.parquet';
```

### GCS File Structure
```
gs://bucket/snapshot_123/
├── nodes/
│   ├── Customer/
│   │   ├── 00000-xxx-uuid1.parquet
│   │   └── 00000-xxx-uuid2.parquet
│   └── Product/
│       └── 00000-xxx-uuid3.parquet
└── edges/
    └── PURCHASED/
        ├── 00000-xxx-uuid4.parquet
        └── 00000-xxx-uuid5.parquet
```

## Starburst/Trino REST API Protocol
This section documents the Starburst REST API used for async query execution. Reference: Trino Client Protocol.
### Query Submission

Submit a query via POST to `/v1/statement`:

```http
POST /v1/statement HTTP/1.1
Host: starburst.example.com
X-Trino-Catalog: analytics
X-Trino-Schema: public
Content-Type: text/plain
Authorization: Basic <base64>

SELECT * FROM TABLE(io.unload(...))
```

Response (success):
{ "id": "20250116_123456_00001_abcde", "infoUri": "http://starburst:8080/ui/query/20250116_123456_00001_abcde", "nextUri": "http://starburst:8080/v1/query/20250116_123456_00001_abcde/1", "stats": { "state": "QUEUED", "queued": true, "scheduled": false, "nodes": 0, "totalSplits": 0, "queuedSplits": 0, "runningSplits": 0, "completedSplits": 0 }}Response (immediate error):
{ "id": "20250116_123456_00001_abcde", "stats": {"state": "FAILED"}, "error": { "message": "line 1:1: Table 'analytics.public.invalid' does not exist", "errorCode": 1, "errorName": "TABLE_NOT_FOUND", "errorType": "USER_ERROR" }}Query Polling
Poll query status via GET to the `nextUri`:

```http
GET /v1/query/20250116_123456_00001_abcde/1 HTTP/1.1
Host: starburst.example.com
Authorization: Basic <base64>
```

Response (in progress):
{ "id": "20250116_123456_00001_abcde", "nextUri": "http://starburst:8080/v1/query/20250116_123456_00001_abcde/2", "stats": { "state": "RUNNING", "queued": false, "scheduled": true, "nodes": 4, "totalSplits": 100, "queuedSplits": 20, "runningSplits": 30, "completedSplits": 50 }}Response (finished - no nextUri):
{ "id": "20250116_123456_00001_abcde", "stats": { "state": "FINISHED", "queued": false, "scheduled": true, "nodes": 4, "totalSplits": 100, "queuedSplits": 0, "runningSplits": 0, "completedSplits": 100 }}Response (failed):
{ "id": "20250116_123456_00001_abcde", "stats": {"state": "FAILED"}, "error": { "message": "Query exceeded maximum time limit of 1.00h", "errorCode": 65540, "errorName": "EXCEEDED_TIME_LIMIT", "errorType": "USER_ERROR" }}Query State Machine
```
QUEUED
   ↓
PLANNING
   ↓
STARTING
   ↓
RUNNING ────→ FAILED
   ↓
FINISHING
   ↓
FINISHED
```

State descriptions:
| State | Description |
|---|---|
| `QUEUED` | Query waiting in queue |
| `PLANNING` | Query plan being generated |
| `STARTING` | Allocating resources |
| `RUNNING` | Executing on workers |
| `FINISHING` | Completing final operations |
| `FINISHED` | Query completed successfully |
| `FAILED` | Query failed (check error field) |
### Protocol Rules

1. Completion detection: Keep polling `nextUri` until it is absent from the response. The `state` field is for humans only.

2. HTTP error handling:

   | Status | Action |
   |---|---|
   | 200 | Process response |
   | 429 | Retry after `Retry-After` header |
   | 502, 503, 504 | Retry in 50-100 ms (load balancer issue) |
   | Other | Query failed |

3. Authentication: Headers are only required on the initial POST, not when following `nextUri`.

4. Query cancellation: Send `DELETE` to `nextUri` to cancel a running query.
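A minimal polling client implementing these rules, sketched with the `requests` library (host, credentials, and sleep intervals are illustrative):

```python
import time
import requests

def run_query(sql: str, host: str, auth, catalog: str, schema: str) -> dict:
    """Submit a statement and poll until nextUri disappears (rule 1)."""
    resp = requests.post(
        f"{host}/v1/statement", data=sql, auth=auth,
        headers={"X-Trino-Catalog": catalog, "X-Trino-Schema": schema,
                 "Content-Type": "text/plain"},
    )
    resp.raise_for_status()
    body = resp.json()

    while "nextUri" in body:                     # absent nextUri == done (rule 1)
        resp = requests.get(body["nextUri"])     # no auth needed here (rule 3)
        if resp.status_code == 429:
            time.sleep(float(resp.headers.get("Retry-After", "1")))
            continue
        if resp.status_code in (502, 503, 504):  # load balancer hiccup (rule 2)
            time.sleep(0.1)
            continue
        resp.raise_for_status()                  # anything else: query failed
        body = resp.json()

    if "error" in body:
        raise RuntimeError(body["error"]["message"])
    return body
```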
### Required Headers

| Header | Required | Description |
|---|---|---|
| `X-Trino-Catalog` | Yes (POST) | Default catalog |
| `X-Trino-Schema` | Yes (POST) | Default schema |
| `Authorization` | Yes | Basic auth or Bearer token |
| `Content-Type` | Yes (POST) | text/plain for query body |
## Appendix: GCS Configuration for Starburst

### Catalog Configuration

```properties
connector.name=hive
hive.metastore.uri=thrift://metastore:9083

# Enable native GCS
fs.native-gcs.enabled=true
gcs.project-id=your-gcp-project
gcs.json-key-file-path=/path/to/service-account.json
```

### UNLOAD Credentials
Section titled “UNLOAD Credentials”{ "id": "gcs-export", "location": "gs://your-bucket/", "configuration": { "fs.native-gcs.enabled": "true", "gcs.json-key": "{...service account key JSON...}" }}Required GCS Permissions
Section titled “Required GCS Permissions”storage.objects.createstorage.objects.delete(for overwrites)storage.objects.getstorage.objects.liststorage.buckets.get
## E2E Testing with Trino

For E2E testing, we use open-source Trino with a translation proxy instead of Starburst Enterprise. This approach replicates production behaviour by executing real SQL queries and writing actual Parquet files.
### Architecture

Starburst's `io.unload()` is proprietary and not available in open-source Trino. The trino-proxy translates `io.unload()` calls to Hive CTAS (CREATE TABLE AS SELECT) with `external_location`:
```sql
-- Input (Starburst io.unload):
SELECT * FROM TABLE(io.unload(
    input => TABLE(SELECT id, name FROM catalog.schema.table),
    location => 'gs://bucket/export/',
    format => 'PARQUET',
    compression => 'SNAPPY'
))

-- Output (Trino Hive CTAS):
CREATE TABLE hive.temp.export_abc123
WITH (format = 'PARQUET', external_location = 'gs://bucket/export/')
AS SELECT id, name FROM catalog.schema.table
```
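The proxy's rewrite can be illustrated as pattern matching over the io.unload() shape shown above. A deliberately simplified sketch (the hypothetical `translate_unload()` below ignores nested parentheses and quoting edge cases, which the real proxy must handle):

```python
import re
import uuid

UNLOAD_RE = re.compile(
    r"SELECT \* FROM TABLE\(io\.unload\(\s*"
    r"input => TABLE\((?P<query>.*)\),\s*"
    r"location => '(?P<location>[^']*)',\s*"
    r"format => '(?P<format>[^']*)'"
    r"(?:,\s*compression => '[^']*')?\s*\)\)",
    re.DOTALL,
)

def translate_unload(sql: str) -> str:
    """Rewrite a Starburst io.unload() call into a Hive CTAS with external_location."""
    m = UNLOAD_RE.search(sql)
    if m is None:
        return sql  # pass non-unload statements through unchanged
    table = f"hive.temp.export_{uuid.uuid4().hex[:8]}"
    return (
        f"CREATE TABLE {table}\n"
        f"WITH (format = '{m['format']}', external_location = '{m['location']}')\n"
        f"AS {m['query']}"
    )
```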
### Components

| Component | Image | Purpose |
|---|---|---|
| trino | trino-gcs:e2e-test | Trino 479 with native GCS (gcs.endpoint for fake-gcs-server) |
| trino-proxy | trino-proxy:e2e-test | io.unload() → Hive CTAS translation |
| hive-metastore | hive-metastore-gcs:e2e-test | Apache Hive 4.1.0 with GCS connector JAR |
| fake-gcs-server | fsouza/fake-gcs-server:1.52.3 | GCS emulator for local testing |
### Why This Approach

- Same execution path - Trino executes queries and writes Parquet, same as production
- Same storage API - Uses GCS URLs (`gs://`), not S3/MinIO
- Same file format - Raw Parquet files, not Iceberg tables
- Export-worker unchanged - No code changes needed for E2E testing
### References

- ADR-029: E2E Trino Stack with Hive CTAS Translation
- e2e-tests.design.md
- `tests/e2e/k8s/` - E2E test Kubernetes configurations