Python SDK Architecture

Document Type: SDK Architecture Specification
Version: 1.1
Status: Ready for Architectural Review
Author: Graph OLAP Platform Team
Last Updated: 2026-02-04

Document Structure

This architecture documentation is organized into five focused documents:

Document	Content
Detailed Architecture	Executive Summary + C4 Architecture Viewpoints + Resource Management
This document	Python SDK, Resource Managers, Authentication
Domain & Data Architecture	Domain Model, State Machines, Data Flows
Platform Operations	Technology, Security, Integration, Operations, NFRs
Authorization & Access Control	RBAC Roles, Permission Matrix, Ownership Model, Enforcement

The Graph OLAP Platform is notebook-first by design. All user interactions happen through the Python SDK in Jupyter notebooks—there is no separate web console or GUI.

1. SDK as Sole User Interface

Operation Category	SDK Resource	Key Methods
Mapping Management	`client.mappings`	`create()`, `list()`, `get()`, `update()`, `delete()`, `copy()`
Instance Lifecycle	`client.instances`	`create_and_wait()`, `terminate()`, `update_cpu()`, `list()`
Graph Queries	`conn`	`query()`, `query_df()`, `query_scalar()`, `query_one()`
Graph Algorithms	`conn.algo` / `conn.networkx`	`pagerank()`, `louvain()`, `connected_components()`, 500+ NetworkX algorithms
Schema Discovery	`client.schema`	`list_catalogs()`, `list_tables()`, `search_tables()`
Favorites	`client.favorites`	`add()`, `remove()`, `list()`
Operations (Ops)	`client.ops`	`get_cluster_health()`, `get_lifecycle_config()`, `trigger_job()`
Administration	`client.admin`	`bulk_delete()`

Why Notebook-First?

Reproducibility: All operations are code, making workflows reproducible and version-controllable
Automation: Scripts can automate common tasks without GUI interaction
Integration: Seamless integration with data science workflows (pandas, polars, visualization)
Auditability: Every operation is logged with the user who executed it

2. SDK Client Architecture

Mermaid Source

---
config:
  layout: elk
---
flowchart TB
    accTitle: Graph OLAP SDK Architecture
    accDescr: Shows SDK components from GraphOLAPClient through resource managers to Control Plane and Wrapper APIs

    classDef user fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
    classDef client fill:#E1F5FE,stroke:#0277BD,stroke-width:2px,color:#01579B
    classDef resource fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    classDef http fill:#FFF8E1,stroke:#F57F17,stroke-width:2px,color:#E65100
    classDef api fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1
    classDef conn fill:#FCE4EC,stroke:#C2185B,stroke-width:2px,color:#880E4F

    Jupyter["Jupyter Notebook<br/>(Analyst)"]:::user

    subgraph SDK["Python SDK (graph-olap-sdk)"]
        Client["GraphOLAPClient<br/>───────────<br/>Main entry point<br/>from_env() / direct init"]:::client

        subgraph Resources["Resource Managers"]
            Mappings["MappingResource<br/>CRUD + versioning"]:::resource
            Instances["InstanceResource<br/>lifecycle + CPU"]:::resource
            Schema["SchemaResource<br/>Starburst metadata"]:::resource
            Ops["OpsResource<br/>cluster config"]:::resource
            Admin["AdminResource<br/>bulk ops"]:::resource
            Health["HealthResource<br/>health checks"]:::resource
        end

        HTTP["HTTPClient<br/>───────────<br/>Retry logic<br/>Auth headers<br/>Error mapping"]:::http

        subgraph Connection["InstanceConnection"]
            Conn["Connection<br/>───────────<br/>Cypher queries<br/>query_df()"]:::conn
            Algo["AlgorithmManager<br/>───────────<br/>Native algorithms<br/>pagerank, louvain"]:::conn
            NX["NetworkXManager<br/>───────────<br/>500+ algorithms<br/>client-side graphs"]:::conn
        end
    end

    CP["Control Plane API<br/>/api/*"]:::api
    Wrapper["Wrapper Pod API<br/>/query, /algorithms"]:::api

    Jupyter --> Client
    Client --> Mappings & Instances & Schema & Ops & Admin & Health
    Mappings & Instances & Schema & Ops & Admin & Health --> HTTP
    HTTP --> CP
    Instances -.->|"connect()"| Conn
    Conn --> Algo & NX
    Conn --> Wrapper
    Algo --> Wrapper

3. API Capabilities Overview

This section provides a scannable reference of SDK capabilities for architects and technical leads. For each resource, method signatures show what operations are available.

3.1 Client Initialization

from graph_olap import GraphOLAPClient

# Production: reads GRAPH_OLAP_* environment variables
client = GraphOLAPClient.from_env()

# Direct configuration (identity is a username; API keys were removed, see section 7)
client = GraphOLAPClient(
    api_url="https://graph.example.com",
    username="analyst@example.com",  # placeholder address
    timeout=60.0,
)
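
How environment-based configuration might work can be sketched as follows. The variable names `GRAPH_OLAP_API_URL` and `GRAPH_OLAP_USERNAME` appear in the Authentication Flow section; everything else here (the `ClientConfig` dataclass, the `config_from_env` helper, the `GRAPH_OLAP_TIMEOUT` variable, and the fail-fast behavior) is an assumption made for illustration and may differ from what `from_env()` actually does.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    api_url: str
    username: str
    timeout: float = 60.0


def config_from_env(environ=None) -> ClientConfig:
    """Build a config from GRAPH_OLAP_* variables, failing fast if one is missing."""
    env = os.environ if environ is None else environ
    required = ("GRAPH_OLAP_API_URL", "GRAPH_OLAP_USERNAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    # GRAPH_OLAP_TIMEOUT is a hypothetical variable used only in this sketch.
    timeout = float(env.get("GRAPH_OLAP_TIMEOUT", "60.0"))
    return ClientConfig(
        api_url=env["GRAPH_OLAP_API_URL"].rstrip("/"),
        username=env["GRAPH_OLAP_USERNAME"],
        timeout=timeout,
    )


# A plain dict stands in for os.environ so the sketch has no side effects.
config = config_from_env({
    "GRAPH_OLAP_API_URL": "https://graph.example.com/",
    "GRAPH_OLAP_USERNAME": "analyst@example.com",
})
```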

3.2 Mappings — Define Your Graph

client.mappings — Mappings define what data to load from Starburst into a graph.

Method	Parameters	Returns	Description
`list`	`*, owner, search, sort_by, offset, limit`	`PaginatedList[Mapping]`	List mappings with filters
`get`	`mapping_id`	`Mapping`	Get mapping by ID
`create`	`name, description, node_definitions, edge_definitions`	`Mapping`	Create new mapping
`update`	`mapping_id, change_description, *, name, description, node_definitions, edge_definitions`	`Mapping`	Update mapping (creates new version)
`delete`	`mapping_id`	`None`	Delete mapping
`copy`	`mapping_id, new_name`	`Mapping`	Copy mapping with new name
`get_version`	`mapping_id, version`	`MappingVersion`	Get specific version
`list_versions`	`mapping_id`	`list[MappingVersion]`	List all versions
`diff`	`mapping_id, from_version, to_version`	`MappingDiff`	Compare two versions
`list_snapshots`	`mapping_id, *, offset, limit`	`PaginatedList[Snapshot]`	List snapshots for mapping
`list_instances`	`mapping_id, *, offset, limit`	`PaginatedList[Instance]`	List instances using mapping
`set_lifecycle`	`mapping_id, *, ttl, inactivity_timeout`	`Mapping`	Set auto-cleanup policy
`get_tree`	`mapping_id, *, include_instances, status`	`dict`	Get hierarchy (mapping → versions → snapshots → instances)

3.3 Instances — Run Your Graph

client.instances — Manage running graph instances (lifecycle, scaling, connectivity).

Method	Parameters	Returns	Description
`list`	`*, snapshot_id, owner, status, search, sort_by, offset, limit`	`PaginatedList[Instance]`	List instances with filters
`get`	`instance_id`	`Instance`	Get instance by ID
`create`	`mapping_id, name, wrapper_type, *, mapping_version, description, ttl, inactivity_timeout, cpu_cores`	`Instance`	Create instance (async)
`create_and_wait`	`mapping_id, name, wrapper_type, *, timeout, poll_interval, on_progress, ...`	`Instance`	Create and wait until running
`update`	`instance_id, *, name, description`	`Instance`	Update instance metadata
`terminate`	`instance_id`	`None`	Terminate and delete instance
`update_cpu`	`instance_id, cpu_cores`	`Instance`	Scale CPU (1-8 cores)
`update_memory`	`instance_id, memory_gb`	`Instance`	Upgrade memory (2-32 GB)
`extend_ttl`	`instance_id, hours=24`	`Instance`	Extend TTL from current expiry
`set_lifecycle`	`instance_id, *, ttl, inactivity_timeout`	`Instance`	Set lifecycle parameters
`get_progress`	`instance_id`	`InstanceProgress`	Get startup progress details
`get_health`	`instance_id, *, timeout`	`dict`	Get wrapper health status
`check_health`	`instance_id, *, timeout`	`bool`	Check if wrapper is healthy
`wait_until_running`	`instance_id, *, timeout, poll_interval`	`Instance`	Wait for running status
`connect`	`instance_id`	`InstanceConnection`	Get connection for queries
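
The `timeout`, `poll_interval`, and `on_progress` parameters of `wait_until_running` and `create_and_wait` suggest a polling loop. One way such a loop could look is sketched below. It is an illustration only: the status names other than `running`, the default values, and the injectable `sleep` and `clock` arguments are assumptions, and `get_status` stands in for a call like `client.instances.get(instance_id)`.

```python
import time
from typing import Callable, Optional


def wait_until_running(
    get_status: Callable[[], str],
    *,
    timeout: float = 600.0,
    poll_interval: float = 5.0,
    on_progress: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports 'running', fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if on_progress is not None:
            on_progress(status)
        if status == "running":
            return status
        if status == "failed":  # hypothetical terminal state
            raise RuntimeError("instance failed to start")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(poll_interval)


# Simulated instance that needs two polls before it is running.
statuses = iter(["starting", "loading", "running"])
seen = []
final = wait_until_running(
    lambda: next(statuses),
    on_progress=seen.append,
    sleep=lambda _seconds: None,  # skip real waiting in the sketch
)
```

Using a monotonic clock for the deadline keeps the timeout correct even if the wall clock is adjusted while the notebook cell is running.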

3.4 Schema Discovery

client.schema — Browse Starburst metadata (cached, refreshed every 24h).

Method	Parameters	Returns	Description
`list_catalogs`	—	`list[Catalog]`	List all Starburst catalogs
`list_schemas`	`catalog`	`list[Schema]`	List schemas in a catalog
`list_tables`	`catalog, schema`	`list[Table]`	List tables in a schema
`list_columns`	`catalog, schema, table`	`list[Column]`	Get columns for a table
`search_tables`	`pattern, limit=100`	`list[Table]`	Search tables by name pattern
`search_columns`	`pattern, limit=100`	`list[Column]`	Search columns by name pattern
`admin_refresh`	—	`dict`	Trigger cache refresh (admin)
`get_stats`	—	`CacheStats`	Get cache statistics (admin)
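
The schema metadata is described as cached and refreshed every 24 hours, with `admin_refresh` forcing a reload. The general pattern is a time-to-live cache, sketched below. Where the real cache lives and how it is implemented is not stated in this document; the `TTLCache` class, its `loads` counter, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Caches one loader result and reloads it after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[], list],
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[list] = None
        self._loaded_at = 0.0
        self.loads = 0  # how many times the loader has run

    def get(self) -> list:
        expired = self._value is None or self._clock() - self._loaded_at >= self._ttl
        if expired:
            self.refresh()
        return self._value

    def refresh(self) -> None:
        """Force a reload, similar in spirit to admin_refresh."""
        self._value = self._loader()
        self._loaded_at = self._clock()
        self.loads += 1


now = [0.0]  # fake clock so the sketch does not wait a day
cache = TTLCache(lambda: ["sales", "crm"], clock=lambda: now[0])
cache.get()          # first call runs the loader
cache.get()          # served from the cache
now[0] = 24 * 3600   # a day later the entry has expired
cache.get()          # runs the loader again
```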

3.5 Operations & Configuration

client.ops — Cluster configuration, jobs, and metrics. Requires Ops role.

Method	Parameters	Returns	Description
`get_lifecycle_config`	—	`LifecycleConfig`	Get TTL defaults for all resource types
`update_lifecycle_config`	`*, mapping, snapshot, instance`	`bool`	Update lifecycle defaults
`get_concurrency_config`	—	`ConcurrencyConfig`	Get instance limits
`update_concurrency_config`	`*, per_analyst, cluster_total`	`ConcurrencyConfig`	Update instance limits
`get_maintenance_mode`	—	`MaintenanceMode`	Get maintenance status
`set_maintenance_mode`	`enabled, message=""`	`MaintenanceMode`	Enable/disable maintenance
`get_export_config`	—	`ExportConfig`	Get export settings
`update_export_config`	`*, max_duration_seconds`	`ExportConfig`	Update export timeout
`get_cluster_health`	—	`ClusterHealth`	Check cluster health
`get_cluster_instances`	—	`ClusterInstances`	Get cluster-wide instance summary
`get_metrics`	—	`str`	Get Prometheus metrics
`trigger_job`	`job_name, reason="manual-trigger"`	`dict`	Trigger background job
`get_job_status`	—	`dict`	Get all job statuses
`get_state`	—	`dict`	Get system state summary
`get_export_jobs`	`status=None, limit=100`	`list[dict]`	Get export jobs for debugging

3.6 Utilities (Favorites, Admin, Health)

client.favorites — User bookmarks for quick access.

Method	Parameters	Returns	Description
`list`	`resource_type=None`	`list[Favorite]`	List favorites
`add`	`resource_type, resource_id`	`Favorite`	Add to favorites
`remove`	`resource_type, resource_id`	`None`	Remove from favorites

client.admin — Privileged operations. Requires Admin role.

Method	Parameters	Returns	Description
`bulk_delete`	`resource_type, filters, reason, expected_count=None, dry_run=False`	`dict`	Bulk delete with safety checks
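
The `expected_count` and `dry_run` parameters hint at how the safety checks might behave. The sketch below is an assumption about those semantics, not the documented behavior: it operates on an in-memory list, treats `filters` as exact-match key/value pairs, and returns a hypothetical result shape.

```python
from typing import Optional


def bulk_delete(
    resources: list,
    filters: dict,
    reason: str,
    expected_count: Optional[int] = None,
    dry_run: bool = False,
) -> dict:
    """Delete matching resources in place, guarded by reason, count, and dry-run checks."""
    if not reason.strip():
        raise ValueError("a reason is required for bulk deletes")
    matched = [
        r for r in resources
        if all(r.get(key) == value for key, value in filters.items())
    ]
    # Abort if the caller's expectation does not match what would be deleted.
    if expected_count is not None and len(matched) != expected_count:
        raise ValueError(f"expected {expected_count} matches, found {len(matched)}")
    if not dry_run:
        resources[:] = [r for r in resources if r not in matched]
    return {
        "matched": len(matched),
        "deleted": 0 if dry_run else len(matched),
        "dry_run": dry_run,
    }


instances = [
    {"id": 1, "owner": "alice", "status": "failed"},
    {"id": 2, "owner": "bob", "status": "running"},
    {"id": 3, "owner": "alice", "status": "failed"},
]
# Preview first, then delete only if the count still matches the preview.
preview = bulk_delete(instances, {"status": "failed"}, "cleanup", dry_run=True)
result = bulk_delete(instances, {"status": "failed"}, "cleanup",
                     expected_count=preview["matched"])
```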

client.health — Health checks (no authentication required).

Method	Parameters	Returns	Description
`check`	—	`HealthStatus`	Basic health check
`ready`	—	`HealthStatus`	Readiness check with DB connectivity

3.7 Querying & Algorithms

conn = client.instances.connect(instance_id) — Query interface to a running instance.

Cypher Queries

Method	Parameters	Returns	Description
`query`	`cypher, parameters=None, *, timeout, coerce_types`	`QueryResult`	Execute Cypher query
`query_df`	`cypher, parameters=None, *, backend="polars"`	`DataFrame`	Query returning DataFrame
`query_scalar`	`cypher, parameters=None`	`Any`	Query returning single value
`query_one`	`cypher, parameters=None`	`dict \| None`	Query returning single row
`get_schema`	—	`Schema`	Get graph schema (labels, types, properties)
`get_lock`	—	`LockStatus`	Get current lock status
`status`	—	`dict`	Get instance status and resource usage
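
Each query method accepts a `parameters` argument alongside the Cypher text. The sketch below shows why that matters: values travel as data next to the query rather than being spliced into it. The `build_query_request` helper is hypothetical, and while the `cypher` key appears in the workflow diagram, the `parameters` and `timeout` keys are assumed names for illustration.

```python
import json
from typing import Optional


def build_query_request(
    cypher: str,
    parameters: Optional[dict] = None,
    *,
    timeout: Optional[float] = None,
) -> str:
    """Serialize a Cypher query and its parameters as a JSON request body."""
    body: dict = {"cypher": cypher, "parameters": parameters or {}}
    if timeout is not None:
        body["timeout"] = timeout
    return json.dumps(body)


# The value is sent separately, so the quote in it cannot alter the query text.
payload = build_query_request(
    "MATCH (c:Customer) WHERE c.name = $name RETURN c.id",
    {"name": "O'Brien"},
)
decoded = json.loads(payload)
```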

Native Algorithms — conn.algo

Method	Parameters	Returns	Description
`algorithms`	`category=None`	`list[dict]`	List available algorithms
`algorithm_info`	`algorithm`	`dict`	Get algorithm parameters
`run`	`algorithm, node_label, property_name, edge_type, *, params, timeout, wait`	`AlgorithmExecution`	Run any native algorithm
`pagerank`	`node_label, property_name, edge_type, *, damping, max_iterations, tolerance`	`AlgorithmExecution`	PageRank centrality
`louvain`	`node_label, property_name, *, edge_type, resolution`	`AlgorithmExecution`	Louvain community detection
`connected_components`	`node_label, property_name, edge_type`	`AlgorithmExecution`	Weakly connected components
`scc`	`node_label, property_name, *, edge_type`	`AlgorithmExecution`	Strongly connected components
`kcore`	`node_label, property_name, *, edge_type`	`AlgorithmExecution`	K-Core decomposition
`label_propagation`	`node_label, property_name, edge_type, *, max_iterations`	`AlgorithmExecution`	Label propagation
`triangle_count`	`node_label, property_name, edge_type`	`AlgorithmExecution`	Triangle count per node
`shortest_path`	`source_id, target_id, *, relationship_types, max_depth`	`AlgorithmExecution`	Find shortest path

NetworkX Algorithms — conn.networkx (500+ algorithms)

Method	Parameters	Returns	Description
`algorithms`	`category=None`	`list[dict]`	List available algorithms
`algorithm_info`	`algorithm`	`dict`	Get algorithm parameters
`run`	`algorithm, node_label, property_name, *, params, timeout, wait`	`AlgorithmExecution`	Run any NetworkX algorithm
`degree_centrality`	`node_label, property_name`	`AlgorithmExecution`	Degree centrality
`betweenness_centrality`	`node_label, property_name, *, k`	`AlgorithmExecution`	Betweenness centrality
`closeness_centrality`	`node_label, property_name`	`AlgorithmExecution`	Closeness centrality
`eigenvector_centrality`	`node_label, property_name, *, max_iter`	`AlgorithmExecution`	Eigenvector centrality
`clustering_coefficient`	`node_label, property_name`	`AlgorithmExecution`	Clustering coefficient

3.8 Query Results

QueryResult — Flexible output from Cypher queries.

Method	Parameters	Returns	Description
`to_polars`	—	`polars.DataFrame`	Convert to Polars DataFrame
`to_pandas`	—	`pandas.DataFrame`	Convert to Pandas DataFrame
`to_networkx`	—	`networkx.DiGraph`	Convert to NetworkX graph
`to_dicts`	—	`list[dict]`	Convert to list of dicts
`scalar`	—	`Any`	Get single scalar value
`to_csv`	`path`	`None`	Export to CSV file
`to_parquet`	`path`	`None`	Export to Parquet file
`show`	`max_rows=20`	—	Display in Jupyter (auto-visualization)

Iteration: looping with for row in result yields one dict[str, Any] per row.
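
The row-iteration and export behavior described above can be modeled with a small stand-in class. This sketch assumes a columnar layout (column names plus row tuples), which the document does not confirm. The class is a hypothetical illustration, and `write_csv` takes a file handle instead of the `path` that `to_csv` accepts, so that the example runs without touching disk.

```python
import csv
import io
from typing import Any, Iterator


class QueryResult:
    """Hypothetical columnar result: column names plus a list of row tuples."""

    def __init__(self, columns: list, rows: list) -> None:
        self.columns = columns
        self.rows = rows

    def __iter__(self) -> Iterator[dict]:
        # Each row is presented as a dict keyed by column name.
        for row in self.rows:
            yield dict(zip(self.columns, row))

    def to_dicts(self) -> list:
        return list(self)

    def scalar(self) -> Any:
        if len(self.rows) != 1 or len(self.columns) != 1:
            raise ValueError("scalar() needs exactly one row and one column")
        return self.rows[0][0]

    def write_csv(self, handle) -> None:
        # Header row first, then one line per result row.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(self.columns)
        writer.writerows(self.rows)


result = QueryResult(["name", "score"], [("alice", 0.42), ("bob", 0.17)])
first_row = next(iter(result))
buffer = io.StringIO()
result.write_csv(buffer)
```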

4. Typical User Workflow

The SDK enables a complete analytical workflow from data discovery to graph analysis:

Mermaid Source

sequenceDiagram
    accTitle: SDK Workflow Sequence
    accDescr: Shows typical user workflow from schema discovery through instance creation to algorithm execution

    participant Analyst as Analyst<br/>(Jupyter)
    participant SDK as Python SDK
    participant CP as Control Plane
    participant Worker as Export Worker
    participant Starburst as Starburst
    participant GCS as GCS
    participant Wrapper as Graph Instance

    Note over Analyst,Wrapper: 1. Schema Discovery
    Analyst->>SDK: client.schema.search_tables("customer")
    SDK->>CP: GET /api/schema/search/tables?q=customer
    CP-->>SDK: Matching tables
    SDK-->>Analyst: Table list with columns

    Note over Analyst,Wrapper: 2. Create Mapping
    Analyst->>SDK: client.mappings.create(name, nodes, edges)
    SDK->>CP: POST /api/mappings
    CP-->>SDK: Mapping created
    SDK-->>Analyst: Mapping object

    Note over Analyst,Wrapper: 3. Create Instance (triggers export)
    Analyst->>SDK: client.instances.create_and_wait(mapping_id)
    SDK->>CP: POST /api/instances {mapping_id}
    CP->>CP: Create snapshot + export job
    Worker->>CP: Claim export job
    Worker->>Starburst: UNLOAD query
    Starburst->>GCS: Write Parquet files
    Worker->>CP: Mark complete
    CP->>Wrapper: Create Pod
    Wrapper->>GCS: COPY FROM Parquet
    Wrapper-->>CP: Ready
    CP-->>SDK: Instance ready
    SDK-->>Analyst: Instance object

    Note over Analyst,Wrapper: 4. Connect and Query
    Analyst->>SDK: conn = client.instances.connect(instance.id)
    Analyst->>SDK: conn.query_df("MATCH (n) RETURN n")
    SDK->>Wrapper: POST /query {cypher}
    Wrapper-->>SDK: Query results
    SDK-->>Analyst: DataFrame (polars by default)

    Note over Analyst,Wrapper: 5. Run Algorithms
    Analyst->>SDK: conn.algo.pagerank("Customer", "score")
    SDK->>Wrapper: POST /algorithms/pagerank
    Wrapper->>Wrapper: Execute algorithm
    Wrapper-->>SDK: Execution complete
    SDK-->>Analyst: AlgorithmExecution

5. Package Structure

graph-olap-sdk/
├── src/graph_olap/
│   ├── client.py            # GraphOLAPClient main entry point
│   ├── config.py            # Configuration and authentication
│   ├── notebook.py          # Jupyter integration (connect(), init())
│   ├── resources/
│   │   ├── mappings.py      # MappingResource
│   │   ├── instances.py     # InstanceResource
│   │   ├── schema.py        # SchemaResource (Starburst metadata)
│   │   ├── ops.py           # OpsResource (config, cluster, jobs)
│   │   ├── favorites.py     # FavoriteResource
│   │   ├── admin.py         # AdminResource (bulk delete)
│   │   └── health.py        # HealthResource
│   ├── instance/
│   │   ├── connection.py    # InstanceConnection class
│   │   └── algorithms.py    # Algorithm execution
│   ├── models/              # Pydantic models
│   ├── exceptions.py        # Exception hierarchy
│   └── http.py              # HTTP client wrapper
└── examples/
    ├── basic_workflow.ipynb
    ├── algorithms.ipynb
    └── visualization.ipynb

6. Key Design Decisions

Decision	Choice	Rationale
HTTP Client	httpx	Modern async support, connection pooling, HTTP/2
Models	Pydantic	Type safety, validation, JSON serialization
DataFrame Support	pandas + polars	Industry standard, analyst familiarity
API Style	Synchronous default	Notebook-friendly, with async support available
Error Handling	Typed exceptions	Clear, actionable error messages
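
The "Typed exceptions" decision, together with the "Error mapping" responsibility of the HTTP client, can be illustrated with a status-code-to-exception table. Every class name and the specific status mapping below are hypothetical; the actual hierarchy lives in `exceptions.py` and is not described in this document.

```python
class GraphOLAPError(Exception):
    """Base class; every name in this hierarchy is illustrative."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


class NotFoundError(GraphOLAPError): ...
class PermissionDeniedError(GraphOLAPError): ...
class ConflictError(GraphOLAPError): ...
class ServerError(GraphOLAPError): ...


_STATUS_TO_ERROR = {
    403: PermissionDeniedError,
    404: NotFoundError,
    409: ConflictError,
}


def raise_for_status(status: int, message: str = "") -> None:
    """Translate an HTTP status code into a typed exception; 2xx/3xx pass through."""
    if status < 400:
        return
    fallback = ServerError if status >= 500 else GraphOLAPError
    raise _STATUS_TO_ERROR.get(status, fallback)(status, message)


# Callers can catch exactly the failure they know how to handle.
try:
    raise_for_status(404, "mapping 42 not found")
except NotFoundError as exc:
    caught = exc
```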

7. Authentication Flow

Per ADR-104 and ADR-105, identity is carried end-to-end by a single canonical header, X-Username. Bearer tokens and internal API keys have been removed from the SDK and from the control-plane auth middleware.

Canonical identity header

Header	Status	Read by
`X-Username`	Canonical (ADR-105)	Control-plane middleware (`packages/control-plane/src/control_plane/middleware/identity.py`); wrapper dependencies (`packages/{ryugraph,falkordb}-wrapper/src/wrapper/dependencies.py`).
`X-User-ID`	Deprecated alias	Accepted only by the wrapper `get_user_id` dependency as a fallback when `X-Username` is absent, for backward compatibility with legacy callers. The control-plane does NOT accept this alias.
`X-User-Name`	Deprecated alias	Accepted only by the wrapper `get_user_name` dependency as a fallback when `X-Username` is absent, for backward compatibility with legacy callers. The control-plane does NOT accept this alias.

The SDK always sends X-Username (see packages/graph-olap-sdk/src/graph_olap/http.py:78 and instance/connection.py:82). New callers MUST send X-Username; the aliases above exist solely so that wrappers do not break during rolling upgrades from older SDK/tool versions.
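
The header precedence described above can be sketched as follows. The header names and the dependency names `get_user_id` and `get_user_name` come from the table; the function bodies, the `_first` helper, and `control_plane_identity` are hypothetical simplifications that operate on a plain dict of headers.

```python
from typing import Mapping, Optional


def _first(headers: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the named headers."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {key.lower(): value for key, value in headers.items()}
    for name in names:
        value = lowered.get(name.lower())
        if value:
            return value
    return None


def control_plane_identity(headers: Mapping[str, str]) -> Optional[str]:
    """Control plane: only the canonical header counts."""
    return _first(headers, "X-Username")


def get_user_id(headers: Mapping[str, str]) -> Optional[str]:
    """Wrapper: canonical header first, then the deprecated X-User-ID alias."""
    return _first(headers, "X-Username", "X-User-ID")


def get_user_name(headers: Mapping[str, str]) -> Optional[str]:
    """Wrapper: canonical header first, then the deprecated X-User-Name alias."""
    return _first(headers, "X-Username", "X-User-Name")


legacy_request = {"X-User-ID": "alice"}
current_request = {"X-Username": "bob", "X-User-ID": "alice"}
```

With these definitions a legacy caller is still identified by the wrapper but not by the control plane, which matches the rolling-upgrade rationale given above.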

Additional headers

Header	Purpose
`X-Use-Case-Id`	Starburst use-case identifier passed through the middleware (ADR-102)
`X-User-Role`	NOT sent by the SDK. Wrappers optionally read it if injected by an upstream component; absent → treated as `analyst` (see ADR-105 §F8).

Role resolution

The SDK does not send the user's role in a header. The control plane resolves the authenticated user’s role from the users.role column after matching X-Username. Role hierarchy: Analyst < Admin < Ops. Each higher role inherits the permissions of the lower roles; Ops has exclusive access to config, cluster, and jobs endpoints.

See Authorization & Access Control for the complete permission matrix.
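
The inheritance rule in the role hierarchy maps naturally onto an ordered enum. The sketch below is illustrative only: the `Role` enum, the `REQUIRED` table, and the `can_access` helper are hypothetical, and the minimum roles shown reflect just the statements in this document (`client.admin` requires Admin, `client.ops` requires Ops).

```python
from enum import IntEnum


class Role(IntEnum):
    ANALYST = 1
    ADMIN = 2
    OPS = 3


def has_role(user_role: Role, required: Role) -> bool:
    """A role satisfies a requirement if it is at or above the required level."""
    return user_role >= required


# Hypothetical mapping from SDK resource to the minimum role it needs.
REQUIRED = {
    "mappings": Role.ANALYST,
    "admin": Role.ADMIN,
    "ops": Role.OPS,
}


def can_access(user_role: Role, resource: str) -> bool:
    return has_role(user_role, REQUIRED[resource])


ops_can_use_admin = can_access(Role.OPS, "admin")   # inherited from Admin
admin_can_use_ops = can_access(Role.ADMIN, "ops")   # Ops endpoints are exclusive
```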

from graph_olap import GraphOLAPClient

# Environment-based configuration (reads GRAPH_OLAP_API_URL, GRAPH_OLAP_USERNAME)
client = GraphOLAPClient.from_env()

# Explicit username (overrides GRAPH_OLAP_USERNAME and identity.DEFAULT_USERNAME)
client = GraphOLAPClient(
    api_url="https://graph.example.com",
    username="alice@example.com",  # placeholder address
    timeout=60.0,
)

# Notebook persona switching (see ADR-105 §F3)
import graph_olap.identity
graph_olap.identity.DEFAULT_USERNAME = "bob@example.com"  # placeholder address
bob_client = GraphOLAPClient(api_url="https://graph.example.com")

Related Documents

Detailed Architecture - Executive Summary + C4 Architecture Viewpoints + Resource Management
Domain & Data Architecture - Domain Model, State Machines, Data Flows
Platform Operations - Technology, Security, Integration, Operations, NFRs
Authorization & Access Control - RBAC Roles, Permission Matrix, Ownership Model, Enforcement

This is part of the Graph OLAP Platform architecture documentation. See also: Detailed Architecture, Domain & Data Architecture, Platform Operations, Authorization.
