Skip to content

Getting Started with the Graph OLAP SDK

This guide helps you install and configure the Graph OLAP Python SDK for Jupyter notebook-based graph analytics.

Graph OLAP Platform is an enterprise analytics solution that enables you to:

  • Transform relational data into graphs: Define mappings from your Starburst data warehouse tables to graph structures with nodes and relationships.
  • Create point-in-time snapshots: Export data from Starburst into optimized graph format for analysis.
  • Run graph instances: Spin up in-memory graph databases loaded with your snapshot data.
  • Query and analyze: Execute Cypher queries and run graph algorithms (PageRank, community detection, shortest paths, and more).

The Graph OLAP SDK (graph-olap-sdk) is the sole user interface for the Graph OLAP Platform. All platform operations - from creating mappings to running graph algorithms - are performed through this Python SDK in Jupyter notebooks. There is no separate web console or GUI; the SDK is the complete interface for analysts.

The SDK provides:

CapabilityDescription
Mapping ManagementFull CRUD operations: create, read, update, delete, copy, and list mappings
Instance LifecycleCreate instances from mappings, terminate, update CPU, monitor status
Graph QueriesExecute Cypher queries with DataFrame results (Polars/Pandas)
Graph AlgorithmsRun native algorithms (PageRank, Louvain) and 500+ NetworkX algorithms
Schema DiscoveryBrowse Starburst catalogs, schemas, tables, and columns to design mappings
VisualizationInteractive graph visualization with PyVis and Plotly
Admin OperationsBulk delete, cluster health monitoring, configuration (role-based)
Zero-Config SetupAuto-discovery from environment variables in JupyterHub

Note: The SDK is notebook-first by design. This ensures all operations are reproducible, version-controllable, and seamlessly integrated with data science workflows.

Before diving into the SDK, understand these core concepts:

┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Mapping │────▶│ Instance │────▶│ Query │
│ (Schema) │ │ (Runtime) │ │ (Analysis) │
└─────────────┘ └─────────────┘ └─────────────┘
  • Mapping: Defines which tables become nodes and which become relationships.
  • Instance: A running graph database created from a mapping (snapshot managed internally).
  • Connection: A handle to an instance for executing Cypher queries and algorithms.

Note: Snapshots are managed internally when creating instances. Use client.instances.create_from_mapping() to create instances directly from mappings.


The SDK offers tiered installation options to match your needs:

OptionDependenciesCommandUse Case
Minimalhttpx, pydantic, tenacitypip install graph-olap-sdkAPI operations only
Analyst+ polars, pandas, pyvis, plotly, networkx, itablespip install graph-olap-sdk[all]Full notebook analytics
Interactive+ ipywidgetspip install graph-olap-sdk[all,interactive]Widget-based UI

For environments where you only need API operations without DataFrame or visualization support:

Terminal window
pip install graph-olap-sdk

Core dependencies:

  • httpx>=0.28.1 - HTTP client with connection pooling
  • pydantic>=2.12.5 - Data validation and serialization
  • tenacity>=9.1.2 - Retry logic for transient failures
  • graph-olap-schemas>=1.0.0 - Shared type definitions

For full notebook analytics with DataFrames and visualization:

Terminal window
pip install graph-olap-sdk[all]

Additional dependencies:

  • polars>=1.36.1 - Fast DataFrame library (default)
  • pandas>=2.3.3 - Traditional DataFrame support
  • networkx>=3.6.1 - Graph analysis library
  • pyvis>=0.3.2 - Interactive graph visualization
  • plotly>=6.5.0 - Interactive charts and graphs
  • itables>=2.6.2 - Interactive DataFrames in notebooks

After installation, verify the SDK is working:

import graph_olap
print(f"Graph OLAP SDK version: {graph_olap.__version__}")

This section demonstrates the most common workflow: connecting to the platform, creating resources, and running queries.

If you’re in a JupyterHub environment with pre-configured environment variables:

from graph_olap.notebook_setup import setup
# Auto-discovers configuration from environment
ctx = setup()
client = ctx.client
# List available mappings
mappings = client.mappings.list()
for mapping in mappings.items:
print(f"- {mapping.name} (ID: {mapping.id})")

For explicit control over connection settings:

from graph_olap import GraphOLAPClient
# Create client with explicit configuration
client = GraphOLAPClient(
username="[email protected]",
api_url="https://graph-olap.example.com",
use_case_id="fraud_analytics", # Optional; sent as X-Use-Case-Id (ADR-102)
)
# Or load from environment with overrides
client = GraphOLAPClient.from_env(
timeout=60.0, # Override default 30s timeout
max_retries=5, # Override default 3 retries
)

Here’s a complete workflow from mapping creation to analysis:

from graph_olap import GraphOLAPClient
from graph_olap_schemas import WrapperType, NodeDefinition, EdgeDefinition
# 1. Connect to the platform
client = GraphOLAPClient.from_env()
# 2. Create a mapping (defines which tables become nodes and edges)
mapping = client.mappings.create(
name="Customer Analysis",
description="Customer purchase relationships",
node_definitions=[
NodeDefinition(
label="Customer",
sql="SELECT id, name, region FROM customers",
primary_key={"name": "id", "type": "STRING"},
properties=[
{"name": "name", "type": "STRING"},
{"name": "region", "type": "STRING"},
],
),
NodeDefinition(
label="Product",
sql="SELECT id, name, category FROM products",
primary_key={"name": "id", "type": "STRING"},
properties=[
{"name": "name", "type": "STRING"},
{"name": "category", "type": "STRING"},
],
),
],
edge_definitions=[
EdgeDefinition(
type="PURCHASED",
from_node="Customer",
to_node="Product",
sql="SELECT customer_id, product_id, amount FROM orders",
from_key="customer_id",
to_key="product_id",
properties=[{"name": "amount", "type": "FLOAT"}],
),
],
)
print(f"Created mapping: {mapping.name} (ID: {mapping.id})")
# Alternatively, use an existing mapping:
# mappings = client.mappings.list(search="customer")
# mapping = mappings.items[0]
# 3. Create an instance directly from mapping (snapshot managed internally)
instance = client.instances.create_from_mapping_and_wait(
mapping_id=mapping.id,
name="Customer Analysis Instance",
wrapper_type=WrapperType.FALKORDB, # or WrapperType.RYUGRAPH
ttl=24, # Auto-terminate after 24 hours
timeout=600, # Wait up to 10 minutes for export + startup
)
print(f"Instance running: {instance.id}")
# 4. Connect and query
conn = client.instances.connect(instance.id)
# Simple query
result = conn.query("MATCH (n:Customer) RETURN n.name, n.id LIMIT 10")
print(result.to_polars())
# 5. Run algorithms
conn.algo.pagerank(
node_label="Customer",
property_name="influence_score",
damping=0.85,
)
# Query the results
top_customers = conn.query_df(
"MATCH (c:Customer) "
"RETURN c.name, c.influence_score "
"ORDER BY c.influence_score DESC "
"LIMIT 10"
)
print(top_customers)
# 6. Clean up when done
client.instances.terminate(instance.id)
client.close()

For rapid prototyping, use the quick_start method:

from graph_olap import GraphOLAPClient
from graph_olap_schemas import WrapperType
client = GraphOLAPClient.from_env()
# Creates instance from mapping in one call (snapshot managed internally)
conn = client.quick_start(
mapping_id=1,
wrapper_type=WrapperType.FALKORDB,
instance_name="Quick Instance",
)
# Ready to query immediately
count = conn.query_scalar("MATCH (n) RETURN count(n)")
print(f"Total nodes: {count}")
# Remember to terminate when done!
client.instances.terminate(conn.instance_id)

The SDK supports multiple authentication modes for different environments.

The recommended approach is to use environment variables:

VariableDescriptionRequired
GRAPH_OLAP_API_URLBase URL for the control plane APIYes
GRAPH_OLAP_INTERNAL_API_KEYInternal API key (X-Internal-Api-Key header)Internal services
GRAPH_OLAP_USERNAMEUsername sent as X-Username header (identity per ADR-104)Yes
GRAPH_OLAP_USE_CASE_IDUse-case identifier sent as X-Use-Case-Id (ADR-102). Defaults to e2e_test_roleNo

Example .env file:

Terminal window
GRAPH_OLAP_API_URL=https://graph-olap.example.com
GRAPH_OLAP_USERNAME=[email protected]

The SDK uses DB-backed user identity per ADR-104. The server looks up the user’s role column from the users table based on the X-Username header — no JWT or Bearer token parsing is performed at the SDK layer.

Identity is established via the X-Username header on every request:

client = GraphOLAPClient(
api_url="https://graph-olap.example.com",
username="[email protected]",
)
# Sends: X-Username: [email protected]

For internal services communicating within the platform:

client = GraphOLAPClient(
api_url="https://graph-olap.internal",
internal_api_key="internal-service-key",
)
# Sends: X-Internal-Api-Key: internal-service-key

Authentication Priority:

When multiple credentials are provided:

  1. internal_api_key takes precedence
  2. username (X-Username) is always sent if provided

Every request carries a use-case identifier in the X-Use-Case-Id header (ADR-102). The control plane records it alongside audit events so that usage can be attributed to a specific enterprise business use case.

Set it in one of three ways, in priority order:

  1. Pass it to the constructor: GraphOLAPClient(username=..., use_case_id="fraud_analytics").
  2. Export the GRAPH_OLAP_USE_CASE_ID environment variable before launching the notebook.
  3. Rely on the built-in default (e2e_test_role), which is only appropriate for test environments.

Production notebooks should always set an explicit use-case ID that matches the approved use case for the analyst’s role. Check with your platform operator for the correct value.

In JupyterHub environments, the SDK auto-discovers configuration:

from graph_olap.notebook_setup import setup
# Environment variables are pre-configured by JupyterHub
ctx = setup()
client = ctx.client

The JupyterHub deployment automatically injects:

  • GRAPH_OLAP_API_URL pointing to the control plane
  • JUPYTERHUB_USER with the authenticated user’s identity

Important: In production environments with authentication gateways (IAP/OIDC), the gateway strips user-supplied X-Username headers and injects the validated identity from the authenticated session. The username parameter is only effective in local development and E2E testing where no gateway is present.


Once connected to an instance, you have several query methods available:

MethodReturnsUse Case
query()QueryResultFull result with multiple format options
query_df()DataFrameDirect DataFrame (Polars or Pandas)
query_scalar()Single valueCOUNT, SUM, or single-value queries
query_one()Dict or NoneSingle row as dictionary
conn = client.instances.connect(instance_id)
# Full QueryResult with format options
result = conn.query("MATCH (n:Customer) RETURN n.name, n.age LIMIT 100")
polars_df = result.to_polars()
pandas_df = result.to_pandas()
# Direct DataFrame
df = conn.query_df(
"MATCH (n)-[r]->(m) RETURN n.name, type(r), m.name",
backend="polars", # or "pandas"
)
# Single scalar value
count = conn.query_scalar("MATCH (n:Customer) RETURN count(n)")
# Single row as dict
customer = conn.query_one(
"MATCH (c:Customer {id: $id}) RETURN c.name, c.email",
parameters={"id": "C001"}
)

Always use parameters for dynamic values to prevent injection:

# Good: Using parameters
result = conn.query(
"MATCH (c:Customer) WHERE c.region = $region RETURN c",
parameters={"region": "APAC"}
)
# Bad: String interpolation (vulnerable to injection)
region = "APAC"
result = conn.query(f"MATCH (c:Customer) WHERE c.region = '{region}' RETURN c")

The SDK raises specific exceptions for different error conditions:

from graph_olap.exceptions import (
NotFoundError,
ValidationError,
InvalidStateError,
TimeoutError,
AuthenticationError,
)
try:
instance = client.instances.get(999)
except NotFoundError:
print("Instance not found")
except AuthenticationError:
print("Invalid API key")
except ValidationError as e:
print(f"Invalid request: {e}")
ExceptionHTTP StatusDescription
NotFoundError404Resource doesn’t exist
ValidationError400/422Invalid request parameters
AuthenticationError401Invalid or missing credentials
PermissionDeniedError403Insufficient permissions
InvalidStateError409Resource in wrong state for operation
TimeoutError-Operation exceeded timeout
InstanceFailedError-Instance startup failed
SnapshotFailedError-Snapshot creation failed

Now that you have the SDK installed and configured, explore these topics:

TopicDocumentDescription
Core Concepts02-core-concepts.manual.mdResource hierarchy and workflows
API Reference03-api-reference.manual.mdFull SDK API documentation
Advanced Topics05-advanced-topics.manual.mdConnection management, error handling
Examples06-examples.manual.mdComplete workflow examples

If you encounter issues:

  1. Check the error message for specific guidance
  2. Review environment variable configuration
  3. Verify network connectivity to the API URL
  4. Check instance status before querying

Complete list of environment variables recognized by the SDK:

VariableDefaultDescription
GRAPH_OLAP_API_URLrequiredControl plane API base URL
GRAPH_OLAP_USERNAMErequiredUsername sent as X-Username header (ADR-104); role resolved from DB
GRAPH_OLAP_INTERNAL_API_KEYNoneInternal service API key
GRAPH_OLAP_USE_CASE_IDe2e_test_roleUse-case identifier sent as X-Use-Case-Id header (ADR-102)
GRAPH_OLAP_IN_CLUSTER_MODEfalseUse in-cluster DNS for wrapper connections
GRAPH_OLAP_NAMESPACEe2e-testKubernetes namespace for in-cluster mode
GRAPH_OLAP_SKIP_HEALTH_CHECKfalseSkip wrapper health checks (port-forward mode)

Document version: 1.0.0 | Last updated: 2025-01