SDK Core Concepts

This guide explains the fundamental concepts for working with the Graph OLAP SDK - the sole user interface for the Graph OLAP Platform. All platform operations are performed through this SDK in Jupyter notebooks. This document covers platform architecture, resource hierarchy, connection lifecycle, and query execution patterns.
1. Platform Architecture Overview
The Graph OLAP Platform follows a control plane + data plane architecture where the Control Plane manages resource lifecycle and the Data Plane (Wrapper Pods) handles graph operations.
Architecture Diagram
```
                        +------------------+
                        |   Jupyter/SDK    |
                        |     (Client)     |
                        +--------+---------+
                                 |
                                 | HTTPS
                                 v
+-------------------------------------------------------------------------+
|                           Ingress Controller                            |
|      /api/* -> Control Plane   |   /{instance-id}/* -> Wrapper Pod      |
+-------------------------------------------------------------------------+
           |                                     |
           v                                     v
+------------------------+            +------------------------+
|     Control Plane      |            |    Wrapper Pod (N)     |
|       (FastAPI)        |            |   (FastAPI + Graph)    |
|                        |            |                        |
| - REST API             |            | - Cypher Queries       |
| - Resource Lifecycle   |            | - Graph Algorithms     |
| - Background Jobs      |            | - NetworkX Support     |
+----------+-------------+            +------------------------+
           |                                     |
           v                                     v
+------------------------+            +------------------------+
|       Cloud SQL        |            |      Google Cloud      |
|      (PostgreSQL)      |            |   Storage (Parquet)    |
+------------------------+            +------------------------+
```

Component Responsibilities

| Component | Role | SDK Interaction |
|---|---|---|
| Control Plane | Manages mappings, snapshots, and instances | SDK calls /api/* endpoints |
| Wrapper Pods | Runs graph database with data | SDK calls /{id}/* for queries |
| Export Workers | Exports SQL results to Parquet | Background (no SDK interaction) |
| Cloud SQL | Stores platform metadata | Internal (no SDK interaction) |
| GCS | Stores exported Parquet files | Internal (no SDK interaction) |
Data Flow:
- User creates a Mapping defining SQL-to-graph schema via Control Plane
- User creates an Instance from a Mapping; system automatically creates a snapshot internally
- Export Workers run UNLOAD queries to GCS (automatic)
- Wrapper Pod loads Parquet data when snapshot is ready (automatic)
- User connects to the Instance and executes Cypher queries/algorithms
SDK as Client Interface
The SDK (`graph-olap` package) provides a unified client interface that abstracts API complexity, manages authentication, provides type safety with Pydantic models, and enables rich Jupyter display.
```python
from graph_olap import GraphOLAPClient
from graph_olap_schemas import WrapperType

client = GraphOLAPClient.from_env()

# Control Plane operations
mapping = client.mappings.get(1)

# Create instance directly from mapping (snapshot managed internally)
instance = client.instances.create_from_mapping_and_wait(
    mapping_id=1,
    name="Analysis",
    wrapper_type=WrapperType.RYUGRAPH,
)

# Data Plane operations (via connection)
conn = client.instances.connect(instance.id)
result = conn.query("MATCH (n) RETURN count(n)")
```

2. Resource Hierarchy
The platform has three core resources: Mapping, Snapshot (internal), and Instance.
Note: Snapshots are now managed internally. Users create instances directly from mappings.
```
Mapping (Schema Definition)
+-- MappingVersion v1, v2, ...
|   +-- NodeDefinition: Customer, Product
|   +-- EdgeDefinition: PURCHASED
+-- Instance (Running Graph Database)
    +-- Snapshot (managed internally)
        +-- Parquet files in GCS
```

Mappings
A Mapping defines how Starburst SQL queries map to graph nodes and edges.
Key Characteristics:
- Owned by a user (`owner_username`)
- Contains multiple immutable versions
- Supports lifecycle settings (TTL, inactivity timeout)
Versioning: Each update creates a new MappingVersion. Versions are immutable, ensuring snapshots reference specific versions and changes are tracked.
```python
# Get a mapping and its versions
mapping = client.mappings.get(mapping_id=1)
versions = client.mappings.list_versions(mapping_id=1)

# Compare versions
diff = client.mappings.diff(mapping_id=1, from_version=1, to_version=2)
print(f"Nodes added: {diff.summary['nodes_added']}")
```

NodeDefinition
Specifies how to create graph nodes from SQL:
| Field | Description | Example |
|---|---|---|
| `label` | Node type label | `"Customer"` |
| `sql` | SQL query to extract data | `"SELECT id, name FROM customers"` |
| `primary_key` | Unique identifier | `{"name": "id", "type": "STRING"}` |
| `properties` | Additional properties | `[{"name": "name", "type": "STRING"}]` |
```python
from graph_olap.models import NodeDefinition, PropertyDefinition

customer_node = NodeDefinition(
    label="Customer",
    sql="SELECT customer_id, name, email FROM analytics.customers",
    primary_key={"name": "customer_id", "type": "STRING"},
    properties=[
        PropertyDefinition(name="name", type="STRING"),
        PropertyDefinition(name="email", type="STRING"),
    ],
)
```

EdgeDefinition
Specifies how to create relationships between nodes:
| Field | Description | Example |
|---|---|---|
| `type` | Relationship type | `"PURCHASED"` |
| `from_node` / `to_node` | Source/target node labels | `"Customer"`, `"Product"` |
| `sql` | SQL query for relationship data | `"SELECT customer_id, product_id FROM orders"` |
| `from_key` / `to_key` | Foreign keys | `"customer_id"`, `"product_id"` |
| `properties` | Edge properties | `[{"name": "quantity", "type": "INT64"}]` |
```python
from graph_olap.models import EdgeDefinition, PropertyDefinition

purchased_edge = EdgeDefinition(
    type="PURCHASED",
    from_node="Customer",
    to_node="Product",
    sql="SELECT customer_id, product_id, quantity FROM orders",
    from_key="customer_id",
    to_key="product_id",
    properties=[PropertyDefinition(name="quantity", type="INT64")],
)
```

Supported Property Types
| Type | Python Equivalent |
|---|---|
| `STRING` | `str` |
| `INT64` | `int` |
| `DOUBLE` | `float` |
| `BOOL` | `bool` |
| `DATE` | `datetime.date` |
| `TIMESTAMP` | `datetime.datetime` |
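The type table above can be expressed directly in Python. A minimal sketch, assuming nothing beyond the table itself; `infer_property_type` is a hypothetical helper, not part of the SDK:

```python
from datetime import date, datetime

# Hypothetical helper: infer the platform property type for a Python value,
# following the table above (STRING, INT64, DOUBLE, BOOL, DATE, TIMESTAMP).
def infer_property_type(value):
    if isinstance(value, bool):       # bool first: bool is a subclass of int
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, datetime):   # datetime first: datetime subclasses date
        return "TIMESTAMP"
    if isinstance(value, date):
        return "DATE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"No platform property type for {type(value).__name__}")
```

The `bool`-before-`int` and `datetime`-before-`date` ordering matters because of Python's subclass relationships.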
Working With Other Users’ Mappings
The platform has no shared ownership, ACLs, grants, or delegation. Every mapping has exactly one owner, set at creation time and immutable. Collaboration happens through three patterns, not through a “share” feature.
Pattern 1: Read Any Mapping
Every analyst can list and inspect every mapping on the platform, regardless of who created it. This makes the mapping catalogue function as a shared reference.
```python
# List every mapping on the platform, not just your own
mappings = client.mappings.list()
for m in mappings.items:
    print(f"{m.id}: {m.name} (owner: {m.owner_username})")

# Inspect a teammate's mapping
colleague_mapping = client.mappings.get(mapping_id=42)
versions = client.mappings.list_versions(mapping_id=42)

# Compare two versions to see how the schema evolved
diff = client.mappings.diff(mapping_id=42, from_version=1, to_version=3)
```

Use this to discover existing mappings before building one from scratch, to audit how a teammate modelled a particular dataset, or to pick a base to fork from.
Pattern 2: Fork by Copy
If you want to build on top of a teammate’s mapping — to modify the schema, try a variant, or take ownership of a version you intend to evolve — use `client.mappings.copy(...)`. The copy is a brand-new mapping owned by you, and its version history starts fresh from the current version of the source mapping.
```python
# Take your own copy of a teammate's mapping
my_copy = client.mappings.copy(
    mapping_id=42,
    new_name="My Copy of Customer Transactions",
)

# Now you own it: update, snapshot, instance, delete — all yours
my_copy = client.mappings.update(
    mapping_id=my_copy.id,
    node_definitions=[...],  # your changes
    edge_definitions=[...],
    change_description="Added Product node",
)
```

The copy does not track the source — there is no “merge back” or “upstream sync.” If the original mapping evolves, your copy will not automatically pick up the changes. If you need to stay in sync, call `client.mappings.copy(...)` again.
Pattern 3: Ask an Admin for In-Place Edits
If a teammate’s mapping needs to be changed and you need the change to apply to everyone already using that mapping (for example, a shared analytical model the whole team snapshots from), you cannot edit it yourself — `client.mappings.update(...)` will raise `PermissionDeniedError` (HTTP 403). An Admin or Ops user can bypass the ownership check and update any mapping on behalf of the team. This is the only path for in-place edits across users.
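The 403 failure mode can also be handled in code by falling back to Pattern 2. A hedged sketch: `update_or_fork` is a hypothetical helper, and `PermissionDeniedError` is redefined locally as a stand-in because this guide does not show its import path.

```python
# Local stand-in for the SDK's PermissionDeniedError (HTTP 403); in real
# code, import it from the graph_olap package instead of redefining it.
class PermissionDeniedError(Exception):
    pass

def update_or_fork(client, mapping_id, fork_name, **changes):
    """Try an in-place update; if ownership blocks it, fork the mapping
    with copy() and apply the same changes to the fork you now own."""
    try:
        return client.mappings.update(mapping_id=mapping_id, **changes)
    except PermissionDeniedError:
        fork = client.mappings.copy(mapping_id=mapping_id, new_name=fork_name)
        return client.mappings.update(mapping_id=fork.id, **changes)
```

This keeps notebooks working whether or not you own the mapping, at the cost of silently diverging from the original when the update is denied.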
What You Cannot Do
- Share a mapping with specific teammates. There is no grant/ACL mechanism.
- Transfer ownership. The `owner_username` column is immutable once set.
- Merge two mappings. Each mapping is a standalone aggregate with its own version history.
- Query across multiple mappings. Queries run against instances, not mappings, and each instance is materialised from exactly one mapping version.
Note on instances and snapshots. Cypher queries against a running instance are not ownership-gated at the wrapper layer — any authenticated user can run read-only Cypher against any instance they can reach. Algorithm execution is owner-only. See Authorization — §2.1 Analyst — Data User for the full cross-user access matrix.
See also:
- API — Mappings — POST /mappings/:id/copy — copy endpoint reference.
- Authorization — §3.1 Data Resources — per-endpoint permission matrix.
- Authorization — §4 Ownership — the no-ACL model and rationale.
Snapshots
SNAPSHOT FUNCTIONALITY DISABLED
Explicit snapshot APIs have been disabled. Instances are now created directly from mappings without requiring explicit snapshot creation. The snapshot layer operates implicitly when instances are created.
Use `client.instances.create_from_mapping()` or `client.instances.create_from_mapping_and_wait()` instead of the snapshot methods described below.
A Snapshot is a point-in-time export of data from Starburst based on a specific mapping version. Snapshots are now managed internally when creating instances from mappings.
Status Lifecycle (Internal)
```
pending --> creating --> ready
                \--> failed --> creating (retry)
```

| Status | Description | Can Create Instance? |
|---|---|---|
| `pending` | Waiting for export workers | No |
| `creating` | UNLOAD queries executing | No |
| `ready` | All Parquet files written | Yes |
| `failed` | Export encountered errors | No |
Recommended: Create Instance Directly from Mapping
```python
from graph_olap_schemas import WrapperType

# Recommended approach - creates snapshot automatically
instance = client.instances.create_from_mapping_and_wait(
    mapping_id=1,
    name="My Analysis",
    wrapper_type=WrapperType.RYUGRAPH,
    ttl=24,  # hours
)

# Instance is ready - snapshot was managed internally
conn = client.instances.connect(instance.id)
```

Instances
An Instance is a running graph database loaded from a snapshot.
Status Lifecycle
```
starting --> running --> stopping --> [deleted]
    \--> failed
```

| Status | Description | Can Query? |
|---|---|---|
| `starting` | Pod initializing, loading data | No |
| `running` | Ready for queries and algorithms | Yes |
| `stopping` | Graceful shutdown in progress | No |
| `failed` | Error during startup | No |
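If you create with the non-blocking `client.instances.create_from_mapping()`, the status table above suggests a simple polling loop. A hedged sketch: `wait_until_running` is hypothetical, and it assumes `client.instances.get(instance_id)` returns an object exposing `current_status` with the statuses listed above.

```python
import time

# Hypothetical polling helper: block until the instance is queryable.
# Only `running` allows queries; `failed` is terminal for startup.
def wait_until_running(client, instance_id, poll_seconds=5.0, timeout_seconds=600.0):
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = client.instances.get(instance_id).current_status
        if status == "running":
            return
        if status == "failed":
            raise RuntimeError(f"Instance {instance_id} failed during startup")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Instance {instance_id} not running after {timeout_seconds}s")
```

The blocking `create_from_mapping_and_wait()` shown in this guide already does this for you; the sketch is only useful when you want to create several instances and wait for them together.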
Creating Instances
```python
from graph_olap_schemas import WrapperType

# Create directly from mapping (recommended) - snapshot managed internally
instance = client.instances.create_from_mapping_and_wait(
    mapping_id=mapping.id,
    name="Analysis Instance",
    wrapper_type=WrapperType.RYUGRAPH,  # or WrapperType.FALKORDB
    ttl=24,                # 24-hour time-to-live
    inactivity_timeout=2,  # 2-hour idle timeout
)
```

Locking Mechanism
Instances use an exclusive lock for algorithm execution:
- Queries: Always allowed concurrently (read-only)
- Algorithms: Require exclusive lock (one at a time)
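Because algorithms need the exclusive lock, a notebook can wait for the lock to free up before launching one. A minimal sketch; `wait_for_lock_release` is a hypothetical helper built only on the `conn.get_lock()` accessor described in this section.

```python
import time

# Hypothetical helper: poll the instance lock until it is free, so an
# algorithm run does not immediately collide with the exclusive lock.
def wait_for_lock_release(conn, poll_seconds=2.0, timeout_seconds=300.0):
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        lock = conn.get_lock()
        if not lock.locked:
            return
        time.sleep(poll_seconds)
    raise TimeoutError("Instance lock still held after waiting")
```

Note the check-then-act gap: another user can grab the lock between your check and your algorithm call, so still handle a lock failure on the call itself.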
```python
lock = conn.get_lock()
if lock.locked:
    print(f"Locked by: {lock.holder_name}, running: {lock.algorithm}")
```

TTL and Lifecycle
| Setting | Description | Format |
|---|---|---|
| `ttl` | Time-to-live from creation | ISO 8601 (e.g., `"PT24H"`) |
| `inactivity_timeout` | Idle time before cleanup | ISO 8601 (e.g., `"PT2H"`) |
```python
# Update lifecycle
instance = client.instances.set_lifecycle(instance_id, ttl="PT12H")

# Extend TTL from current expiry
instance = client.instances.extend_ttl(instance_id, hours=24)
```

3. Connection Lifecycle
The SDK uses a two-phase connection model: client initialization for Control Plane operations and instance connection for Data Plane operations.
Client Initialization
Section titled “Client Initialization”from graph_olap import GraphOLAPClient
# From environment variables (recommended)client = GraphOLAPClient.from_env()
# Or explicit configurationclient = GraphOLAPClient( api_url="https://graph-olap.example.com", api_key="sk-xxx", timeout=30.0, max_retries=3,)Context Manager Pattern (Recommended)
```python
from graph_olap import GraphOLAPClient
from graph_olap_schemas import WrapperType

with GraphOLAPClient.from_env() as client:
    # Create instance directly from mapping (snapshot managed internally)
    instance = client.instances.create_from_mapping_and_wait(
        mapping_id=1,
        name="Analysis",
        wrapper_type=WrapperType.RYUGRAPH,
    )
    conn = client.instances.connect(instance.id)
    result = conn.query("MATCH (n) RETURN count(n)")
    client.instances.terminate(instance.id)
# Client closed automatically
```

Instance Connection
Once an instance is running, connect to it for queries and algorithms:
```python
# Get connection to running instance
conn = client.instances.connect(instance_id)

# Connection properties
print(f"Instance ID: {conn.id}")
print(f"Instance Name: {conn.name}")
print(f"Snapshot ID: {conn.snapshot_id}")
print(f"Status: {conn.current_status}")

# Execute queries
result = conn.query("MATCH (n:Customer) RETURN n LIMIT 10")

# Close when done
conn.close()
```

Connection Context Manager (recommended):
```python
with client.instances.connect(instance_id) as conn:
    df = conn.query_df("MATCH (n:Customer) RETURN n.name, n.email")
# Connection automatically closed after block
```

Quick Start Pattern
For rapid prototyping:
```python
# Create an instance and connect in one call (snapshot managed internally)
conn = client.quick_start(
    mapping_id=1,
    wrapper_type=WrapperType.RYUGRAPH,
)
result = conn.query("MATCH (n) RETURN count(n)")
# Remember to terminate the instance when done!
```

4. Query Execution
The SDK provides multiple methods for executing Cypher queries.
Query Methods Overview
| Method | Returns | Best For |
|---|---|---|
| `query()` | `QueryResult` | Flexible access to results |
| `query_df()` | DataFrame | Data analysis workflows |
| `query_scalar()` | Single value | Counts and aggregations |
| `query_one()` | Dict or `None` | Single record lookup |
query() - Structured Results
```python
result = conn.query("""
    MATCH (c:Customer)-[:PURCHASED]->(p:Product)
    RETURN c.name AS customer, p.name AS product
    LIMIT 100
""")

# Access metadata
print(f"Columns: {result.columns}, Rows: {result.row_count}")

# Iterate over rows
for row in result:
    print(f"{row['customer']} bought {row['product']}")

# Convert to DataFrame
df = result.to_polars()  # or result.to_pandas()
```

query_df() - DataFrame Results
```python
# Polars DataFrame (default)
df = conn.query_df("MATCH (c:Customer) RETURN c.id, c.name")

# Pandas DataFrame
df = conn.query_df("MATCH (c:Customer) RETURN c.*", backend="pandas")
```

query_scalar() - Single Value
```python
count = conn.query_scalar("MATCH (n:Customer) RETURN count(n)")
avg = conn.query_scalar("MATCH (c:Customer) RETURN avg(c.total_purchases)")
```

query_one() - Single Record
```python
customer = conn.query_one(
    "MATCH (c:Customer {id: $id}) RETURN c.*",
    {"id": "C001"},
)
if customer:
    print(f"Name: {customer['name']}")
```

Parameter Substitution
Use parameters to safely inject values:
```python
result = conn.query(
    """
    MATCH (c:Customer)
    WHERE c.total_purchases > $min_purchases
    RETURN c.name, c.total_purchases
    LIMIT $limit
    """,
    parameters={"min_purchases": 1000, "limit": 50},
)
```

Parameter Types:
| Python Type | Cypher Type |
|---|---|
| `str` | String |
| `int` | Integer |
| `float` | Float |
| `bool` | Boolean |
| `list` | List |
| `dict` | Map |
| `None` | Null |
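The table above doubles as a client-side validity check: any parameter whose type is not listed has no Cypher equivalent. A minimal sketch; `validate_parameters` is a hypothetical helper, not an SDK function.

```python
# Python types that have a Cypher equivalent, per the table above.
ALLOWED_PARAM_TYPES = (str, int, float, bool, list, dict, type(None))

def validate_parameters(parameters):
    """Raise TypeError for any parameter without a Cypher equivalent."""
    for name, value in parameters.items():
        if not isinstance(value, ALLOWED_PARAM_TYPES):
            raise TypeError(
                f"Parameter ${name} has unsupported type {type(value).__name__}"
            )
    return parameters
```

Catching a bad type (a `set`, a `datetime`, a custom object) before the round trip gives a clearer error than a server-side query failure.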
Timeout Configuration
```python
# Per-query timeout
result = conn.query(
    "MATCH (n)-[*1..5]-(m) RETURN count(*)",
    timeout=120.0,  # 2 minutes
)
```

Error Handling
```python
from graph_olap.exceptions import QueryError, QueryTimeoutError, ValidationError

try:
    result = conn.query("MATCH (n) RETURN n.invalid")
except QueryTimeoutError:
    print("Query timed out")
except QueryError as e:
    print(f"Query failed: {e}")
```

Schema Inspection
Section titled “Schema Inspection”schema = conn.get_schema()
for label, props in schema.node_labels.items(): print(f":{label} - {[p['name'] for p in props]}")
for rel_type, info in schema.relationship_types.items(): print(f"[:{rel_type}]")Summary
| Concept | Description |
|---|---|
| Architecture | Control Plane manages resources; Data Plane executes queries |
| Mapping | Schema definition with SQL-to-graph mappings and versioning |
| Snapshot | Point-in-time data export (pending -> creating -> ready) |
| Instance | Running graph database with locking for algorithms |
| Client | GraphOLAPClient for Control Plane operations |
| Connection | InstanceConnection for Data Plane queries |
Next Steps:
- 03-algorithms.manual.md - Running graph algorithms
- 04-advanced-patterns.manual.md - Complex workflows
- 05-troubleshooting.manual.md - Debugging tips