
SDK Core Concepts

This guide explains the fundamental concepts for working with the Graph OLAP SDK, the sole user interface for the Graph OLAP Platform. All platform operations are performed through the SDK in Jupyter notebooks. This document covers platform architecture, resource hierarchy, connection lifecycle, and query execution patterns.


The Graph OLAP Platform follows a control plane + data plane architecture where the Control Plane manages resource lifecycle and the Data Plane (Wrapper Pods) handles graph operations.

+------------------+
| Jupyter/SDK |
| (Client) |
+--------+---------+
|
| HTTPS
v
+-------------------------------------------------------------------------+
| Ingress Controller |
| /api/* -> Control Plane | /{instance-id}/* -> Wrapper Pod |
+-------------------------------------------------------------------------+
| |
v v
+------------------------+ +------------------------+
| Control Plane | | Wrapper Pod (N) |
| (FastAPI) | | (FastAPI + Graph) |
| | | |
| - REST API | | - Cypher Queries |
| - Resource Lifecycle | | - Graph Algorithms |
| - Background Jobs | | - NetworkX Support |
+----------+-------------+ +------------------------+
| |
v v
+------------------------+ +------------------------+
| Cloud SQL | | Google Cloud |
| (PostgreSQL) | | Storage (Parquet) |
+------------------------+ +------------------------+
| Component | Role | SDK Interaction |
| --- | --- | --- |
| Control Plane | Manages mappings, snapshots, and instances | SDK calls /api/* endpoints |
| Wrapper Pods | Runs graph database with data | SDK calls /{id}/* for queries |
| Export Workers | Exports SQL results to Parquet | Background (no SDK interaction) |
| Cloud SQL | Stores platform metadata | Internal (no SDK interaction) |
| GCS | Stores exported Parquet files | Internal (no SDK interaction) |
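The ingress routing rule in the diagram can be sketched in a few lines of plain Python. This is an illustrative model of the dispatch logic only, not platform code; the backend names are hypothetical labels.

```python
def route(path: str) -> str:
    """Return the backend a request path would be dispatched to:
    /api/* goes to the Control Plane, /{instance-id}/* to that
    instance's Wrapper Pod."""
    parts = path.lstrip("/").split("/")
    if parts[0] == "api":
        return "control-plane"
    # Anything else is treated as an instance-id prefix
    return f"wrapper-pod:{parts[0]}"
```

For example, `route("/api/mappings/1")` resolves to the Control Plane, while `route("/inst-42/query")` resolves to the Wrapper Pod for instance `inst-42`.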

Data Flow:

  1. User creates a Mapping defining SQL-to-graph schema via Control Plane
  2. User creates an Instance from a Mapping; system automatically creates a snapshot internally
  3. Export Workers run UNLOAD queries to GCS (automatic)
  4. Wrapper Pod loads Parquet data when snapshot is ready (automatic)
  5. User connects to the Instance and executes Cypher queries/algorithms

The SDK (graph-olap package) provides a unified client interface that abstracts API complexity, manages authentication, provides type safety with Pydantic models, and enables rich Jupyter display.

from graph_olap import GraphOLAPClient
from graph_olap_schemas import WrapperType

client = GraphOLAPClient.from_env()

# Control Plane operations
mapping = client.mappings.get(1)

# Create instance directly from mapping (snapshot managed internally)
instance = client.instances.create_from_mapping_and_wait(
    mapping_id=1,
    name="Analysis",
    wrapper_type=WrapperType.RYUGRAPH,
)

# Data Plane operations (via connection)
conn = client.instances.connect(instance.id)
result = conn.query("MATCH (n) RETURN count(n)")

The platform has three core resources: Mapping, Snapshot (internal), and Instance.

Note: Snapshots are now managed internally. Users create instances directly from mappings.

Mapping (Schema Definition)
+-- MappingVersion v1, v2, ...
|   +-- NodeDefinition: Customer, Product
|   +-- EdgeDefinition: PURCHASED
+-- Instance (Running Graph Database)
    +-- Snapshot (managed internally)
        +-- Parquet files in GCS

A Mapping defines how Starburst SQL queries map to graph nodes and edges.

Key Characteristics:

  • Owned by a user (owner_username)
  • Contains multiple immutable versions
  • Supports lifecycle settings (TTL, inactivity timeout)

Versioning: Each update creates a new MappingVersion. Versions are immutable, ensuring snapshots reference specific versions and changes are tracked.
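The shape of a version diff can be illustrated with a simplified sketch: here each version is reduced to a set of node labels, whereas a real MappingVersion also carries SQL, keys, and properties. The function below is not part of the SDK.

```python
def diff_summary(from_nodes: set, to_nodes: set) -> dict:
    """Count label-level additions and removals between two versions
    (simplified model of what mappings.diff() summarizes)."""
    return {
        "nodes_added": len(to_nodes - from_nodes),
        "nodes_removed": len(from_nodes - to_nodes),
    }
```

Because versions are immutable, a diff like this is stable: comparing v1 and v2 always yields the same result.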

# Get a mapping and its versions
mapping = client.mappings.get(mapping_id=1)
versions = client.mappings.list_versions(mapping_id=1)
# Compare versions
diff = client.mappings.diff(mapping_id=1, from_version=1, to_version=2)
print(f"Nodes added: {diff.summary['nodes_added']}")

NodeDefinition

Specifies how to create graph nodes from SQL:

| Field | Description | Example |
| --- | --- | --- |
| label | Node type label | "Customer" |
| sql | SQL query to extract data | "SELECT id, name FROM customers" |
| primary_key | Unique identifier | {"name": "id", "type": "STRING"} |
| properties | Additional properties | [{"name": "name", "type": "STRING"}] |
from graph_olap.models import NodeDefinition, PropertyDefinition

customer_node = NodeDefinition(
    label="Customer",
    sql="SELECT customer_id, name, email FROM analytics.customers",
    primary_key={"name": "customer_id", "type": "STRING"},
    properties=[
        PropertyDefinition(name="name", type="STRING"),
        PropertyDefinition(name="email", type="STRING"),
    ],
)

EdgeDefinition

Specifies how to create relationships between nodes:

| Field | Description | Example |
| --- | --- | --- |
| type | Relationship type | "PURCHASED" |
| from_node / to_node | Source/target node labels | "Customer", "Product" |
| sql | SQL query for relationship data | "SELECT customer_id, product_id FROM orders" |
| from_key / to_key | Foreign keys | "customer_id", "product_id" |
| properties | Edge properties | [{"name": "quantity", "type": "INT64"}] |
from graph_olap.models import EdgeDefinition, PropertyDefinition

purchased_edge = EdgeDefinition(
    type="PURCHASED",
    from_node="Customer",
    to_node="Product",
    sql="SELECT customer_id, product_id, quantity FROM orders",
    from_key="customer_id",
    to_key="product_id",
    properties=[PropertyDefinition(name="quantity", type="INT64")],
)
| Type | Python Equivalent |
| --- | --- |
| STRING | str |
| INT64 | int |
| DOUBLE | float |
| BOOL | bool |
| DATE | datetime.date |
| TIMESTAMP | datetime.datetime |
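The type table above can be kept handy as a plain lookup, for example when building property definitions programmatically. The dictionary below is a convenience sketch, not an official SDK constant.

```python
import datetime

# Platform property type -> Python type, per the table above
PROPERTY_TYPE_TO_PYTHON = {
    "STRING": str,
    "INT64": int,
    "DOUBLE": float,
    "BOOL": bool,
    "DATE": datetime.date,
    "TIMESTAMP": datetime.datetime,
}
```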

The platform has no shared ownership, ACLs, grants, or delegation. Every mapping has exactly one owner, set at creation time and immutable. Collaboration happens through three patterns, not through a “share” feature.

Pattern 1: Browse and Inspect Any Mapping

Every analyst can list and inspect every mapping on the platform, regardless of who created it. This makes the mapping catalogue function as a shared reference.

# List every mapping on the platform, not just your own
mappings = client.mappings.list()
for m in mappings.items:
    print(f"{m.id}: {m.name} (owner: {m.owner_username})")

# Inspect a teammate's mapping
colleague_mapping = client.mappings.get(mapping_id=42)
versions = client.mappings.list_versions(mapping_id=42)

# Compare two versions to see how the schema evolved
diff = client.mappings.diff(mapping_id=42, from_version=1, to_version=3)

Use this to discover existing mappings before building one from scratch, to audit how a teammate modelled a particular dataset, or to pick a base to fork from.

Pattern 2: Copy a Mapping You Want to Change

If you want to build on top of a teammate’s mapping — to modify the schema, try a variant, or take ownership of a version you intend to evolve — use client.mappings.copy(...). The copy is a brand-new mapping owned by you, and its version history starts fresh from the current version of the source mapping.

# Take your own copy of a teammate's mapping
my_copy = client.mappings.copy(
    mapping_id=42,
    new_name="My Copy of Customer Transactions",
)

# Now you own it: update, snapshot, instance, delete — all yours
my_copy = client.mappings.update(
    mapping_id=my_copy.id,
    node_definitions=[...],  # your changes
    edge_definitions=[...],
    change_description="Added Product node",
)

The copy does not track the source — there is no “merge back” or “upstream sync.” If the original mapping evolves, your copy will not automatically pick up the changes. If you need to stay in sync, call client.mappings.copy(...) again.

Pattern 3: Ask an Admin for In-Place Edits

If a teammate’s mapping needs to be changed and you need the change to apply to everyone already using that mapping (for example, a shared analytical model the whole team snapshots from), you cannot edit it yourself — client.mappings.update(...) will raise PermissionDeniedError (HTTP 403). An Admin or Ops user can bypass the ownership check and update any mapping on behalf of the team. This is the only path for in-place edits across users.

Regardless of role, you cannot:

  • Share a mapping with specific teammates. There is no grant/ACL mechanism.
  • Transfer ownership. The owner_username column is immutable once set.
  • Merge two mappings. Each mapping is a standalone aggregate with its own version history.
  • Query across multiple mappings. Queries run against instances, not mappings, and each instance is materialised from exactly one mapping version.

Note on instances and snapshots. Cypher queries against a running instance are not ownership-gated at the wrapper layer — any authenticated user can run read-only Cypher against any instance they can reach. Algorithm execution is owner-only. See Authorization — §2.1 Analyst — Data User for the full cross-user access matrix.


SNAPSHOT FUNCTIONALITY DISABLED

Explicit snapshot APIs have been disabled. Instances are now created directly from mappings without requiring explicit snapshot creation. The snapshot layer operates implicitly when instances are created.

Use client.instances.create_from_mapping() or client.instances.create_from_mapping_and_wait() instead of the snapshot methods described below.

A Snapshot is a point-in-time export of data from Starburst based on a specific mapping version. Snapshots are now managed internally when creating instances from mappings.

pending --> creating --> ready
                \--> failed --> creating (retry)
| Status | Description | Can Create Instance? |
| --- | --- | --- |
| pending | Waiting for export workers | No |
| creating | UNLOAD queries executing | No |
| ready | All Parquet files written | Yes |
| failed | Export encountered errors | No |
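The snapshot state machine above, including the retry path, can be written out as a transition table. This is an illustrative sketch; the platform enforces these transitions server-side, not through the SDK.

```python
# Legal snapshot state transitions, per the diagram above
SNAPSHOT_TRANSITIONS = {
    "pending": {"creating"},
    "creating": {"ready", "failed"},
    "failed": {"creating"},  # retry path
    "ready": set(),          # terminal state
}

def can_transition(src: str, dst: str) -> bool:
    """True if dst is a legal next state from src."""
    return dst in SNAPSHOT_TRANSITIONS.get(src, set())
```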
Recommended: Create Instance Directly from Mapping
from graph_olap_schemas import WrapperType

# Recommended approach - creates snapshot automatically
instance = client.instances.create_from_mapping_and_wait(
    mapping_id=1,
    name="My Analysis",
    wrapper_type=WrapperType.RYUGRAPH,
    ttl=24,  # hours
)

# Instance is ready - snapshot was managed internally
conn = client.instances.connect(instance.id)

An Instance is a running graph database loaded from a snapshot.

starting --> running --> stopping --> [deleted]
        \--> failed
| Status | Description | Can Query? |
| --- | --- | --- |
| starting | Pod initializing, loading data | No |
| running | Ready for queries and algorithms | Yes |
| stopping | Graceful shutdown in progress | No |
| failed | Error during startup | No |
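The instance lifecycle and its query gating can likewise be expressed as data. A sketch under the assumption that only `running` accepts queries, as the table above states; the wrapper enforces this server-side.

```python
# Legal instance state transitions, per the diagram above
INSTANCE_TRANSITIONS = {
    "starting": {"running", "failed"},
    "running": {"stopping"},
    "stopping": set(),  # pod is deleted after graceful shutdown
    "failed": set(),
}

def can_query(status: str) -> bool:
    """Only running instances accept queries and algorithms."""
    return status == "running"
```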
from graph_olap_schemas import WrapperType

# Create directly from mapping (recommended) - snapshot managed internally
instance = client.instances.create_from_mapping_and_wait(
    mapping_id=mapping.id,
    name="Analysis Instance",
    wrapper_type=WrapperType.RYUGRAPH,  # or WrapperType.FALKORDB
    ttl=24,  # 24-hour time-to-live
    inactivity_timeout=2,  # 2-hour idle timeout
)

Instances use an exclusive lock for algorithm execution:

  • Queries: Always allowed concurrently (read-only)
  • Algorithms: Require exclusive lock (one at a time)
lock = conn.get_lock()
if lock.locked:
    print(f"Locked by: {lock.holder_name}, running: {lock.algorithm}")
| Setting | Description | Format |
| --- | --- | --- |
| ttl | Time-to-live from creation | ISO 8601 (e.g., "PT24H") |
| inactivity_timeout | Idle time before cleanup | ISO 8601 (e.g., "PT2H") |
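When building ISO 8601 duration strings for `set_lifecycle`, a small helper like the one below can avoid typos. It is a convenience sketch, not part of the SDK (extend_ttl(hours=...) already takes plain hours for you).

```python
def hours_to_iso8601(hours: int) -> str:
    """Whole hours -> ISO 8601 duration string, e.g. 24 -> "PT24H"."""
    if hours <= 0:
        raise ValueError("duration must be a positive number of hours")
    return f"PT{hours}H"
```

For example, `hours_to_iso8601(12)` produces `"PT12H"`, matching the format in the table above.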
# Update lifecycle
instance = client.instances.set_lifecycle(instance_id, ttl="PT12H")
# Extend TTL from current expiry
instance = client.instances.extend_ttl(instance_id, hours=24)

The SDK uses a two-phase connection model: client initialization for Control Plane operations and instance connection for Data Plane operations.

from graph_olap import GraphOLAPClient

# From environment variables (recommended)
client = GraphOLAPClient.from_env()

# Or explicit configuration
client = GraphOLAPClient(
    api_url="https://graph-olap.example.com",
    api_key="sk-xxx",
    timeout=30.0,
    max_retries=3,
)
from graph_olap import GraphOLAPClient
from graph_olap_schemas import WrapperType

with GraphOLAPClient.from_env() as client:
    # Create instance directly from mapping (snapshot managed internally)
    instance = client.instances.create_from_mapping_and_wait(
        mapping_id=1,
        name="Analysis",
        wrapper_type=WrapperType.RYUGRAPH,
    )
    conn = client.instances.connect(instance.id)
    result = conn.query("MATCH (n) RETURN count(n)")
    client.instances.terminate(instance.id)
# Client closed automatically

Once an instance is running, connect to it for queries and algorithms:

# Get connection to running instance
conn = client.instances.connect(instance_id)
# Connection properties
print(f"Instance ID: {conn.id}")
print(f"Instance Name: {conn.name}")
print(f"Snapshot ID: {conn.snapshot_id}")
print(f"Status: {conn.current_status}")
# Execute queries
result = conn.query("MATCH (n:Customer) RETURN n LIMIT 10")
# Close when done
conn.close()

Connection Context Manager (recommended):

with client.instances.connect(instance_id) as conn:
    df = conn.query_df("MATCH (n:Customer) RETURN n.name, n.email")
# Connection automatically closed after block

For rapid prototyping:

from graph_olap_schemas import WrapperType

# Create an instance from a mapping and connect, all in one call
# (snapshot managed internally)
conn = client.quick_start(
    mapping_id=1,
    wrapper_type=WrapperType.RYUGRAPH,
)
result = conn.query("MATCH (n) RETURN count(n)")
# Remember to terminate the instance when done!

The SDK provides multiple methods for executing Cypher queries.

| Method | Returns | Best For |
| --- | --- | --- |
| query() | QueryResult | Flexible access to results |
| query_df() | DataFrame | Data analysis workflows |
| query_scalar() | Single value | Counts and aggregations |
| query_one() | Dict or None | Single record lookup |
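The semantics of the two convenience methods can be illustrated against a raw result set. The sketch below models rows as plain lists of dicts (the SDK's actual types are QueryResult and InstanceConnection), so it is a conceptual model, not the SDK implementation.

```python
def query_scalar(rows: list) -> object:
    """First column of the first row — how a scalar is extracted
    from a result (mirrors query_scalar())."""
    first = rows[0]
    return next(iter(first.values()))

def query_one(rows: list):
    """First row, or None if the result is empty (mirrors query_one())."""
    return rows[0] if rows else None
```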
result = conn.query("""
    MATCH (c:Customer)-[:PURCHASED]->(p:Product)
    RETURN c.name AS customer, p.name AS product
    LIMIT 100
""")

# Access metadata
print(f"Columns: {result.columns}, Rows: {result.row_count}")

# Iterate over rows
for row in result:
    print(f"{row['customer']} bought {row['product']}")

# Convert to DataFrame
df = result.to_polars()  # or result.to_pandas()
# Polars DataFrame (default)
df = conn.query_df("MATCH (c:Customer) RETURN c.id, c.name")
# Pandas DataFrame
df = conn.query_df("MATCH (c:Customer) RETURN c.*", backend="pandas")
count = conn.query_scalar("MATCH (n:Customer) RETURN count(n)")
avg = conn.query_scalar("MATCH (c:Customer) RETURN avg(c.total_purchases)")
customer = conn.query_one(
    "MATCH (c:Customer {id: $id}) RETURN c.*",
    {"id": "C001"}
)
if customer:
    print(f"Name: {customer['name']}")

Use parameters to safely inject values:

result = conn.query(
    """
    MATCH (c:Customer)
    WHERE c.total_purchases > $min_purchases
    RETURN c.name, c.total_purchases
    LIMIT $limit
    """,
    parameters={"min_purchases": 1000, "limit": 50}
)

Parameter Types:

| Python Type | Cypher Type |
| --- | --- |
| str | String |
| int | Integer |
| float | Float |
| bool | Boolean |
| list | List |
| dict | Map |
| None | Null |
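A client-side check against the mapping above can catch unsupported parameter types before a query is sent. This validator is a sketch, not an SDK feature; the SDK serializes parameters for you.

```python
# Python types the parameter mapping above accepts
ALLOWED_PARAM_TYPES = (str, int, float, bool, list, dict, type(None))

def validate_parameters(parameters: dict) -> None:
    """Raise TypeError for any parameter outside the supported mapping."""
    for name, value in parameters.items():
        if not isinstance(value, ALLOWED_PARAM_TYPES):
            raise TypeError(
                f"unsupported parameter type for ${name}: {type(value).__name__}"
            )
```

Note that nested values inside lists and maps are not checked here; a full implementation would recurse.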
# Per-query timeout
result = conn.query(
    "MATCH (n)-[*1..5]-(m) RETURN count(*)",
    timeout=120.0,  # 2 minutes
)
from graph_olap.exceptions import QueryError, QueryTimeoutError, ValidationError

try:
    result = conn.query("MATCH (n) RETURN n.invalid")
except QueryTimeoutError:
    print("Query timed out")
except QueryError as e:
    print(f"Query failed: {e}")
schema = conn.get_schema()

for label, props in schema.node_labels.items():
    print(f":{label} - {[p['name'] for p in props]}")

for rel_type, info in schema.relationship_types.items():
    print(f"[:{rel_type}]")

| Concept | Description |
| --- | --- |
| Architecture | Control Plane manages resources; Data Plane executes queries |
| Mapping | Schema definition with SQL-to-graph mappings and versioning |
| Snapshot | Point-in-time data export (pending -> creating -> ready) |
| Instance | Running graph database with locking for algorithms |
| Client | GraphOLAPClient for Control Plane operations |
| Connection | InstanceConnection for Data Plane queries |

Next Steps: