Python SDK Architecture
Document Type: SDK Architecture Specification
Version: 1.1
Status: Ready for Architectural Review
Author: Graph OLAP Platform Team
Last Updated: 2026-02-04
Document Structure
This architecture documentation is organized into five focused documents:
| Document | Content |
|---|---|
| Detailed Architecture | Executive Summary + C4 Architecture Viewpoints + Resource Management |
| This document | Python SDK, Resource Managers, Authentication |
| Domain & Data Architecture | Domain Model, State Machines, Data Flows |
| Platform Operations | Technology, Security, Integration, Operations, NFRs |
| Authorization & Access Control | RBAC Roles, Permission Matrix, Ownership Model, Enforcement |
The Graph OLAP Platform is notebook-first by design. All user interactions happen through the Python SDK in Jupyter notebooks—there is no separate web console or GUI.
1. SDK as Sole User Interface
| Operation Category | SDK Resource | Key Methods |
|---|---|---|
| Mapping Management | client.mappings | create(), list(), get(), update(), delete(), copy() |
| Instance Lifecycle | client.instances | create_and_wait(), terminate(), update_cpu(), list() |
| Graph Queries | conn (InstanceConnection) | query(), query_df(), query_scalar(), query_one() |
| Graph Algorithms | conn.algo / conn.networkx | pagerank(), louvain(), connected_components(), 500+ NetworkX algorithms |
| Schema Discovery | client.schema | list_catalogs(), list_tables(), search_tables() |
| Favorites | client.favorites | add(), remove(), list() |
| Operations (Ops) | client.ops | get_cluster_health(), get_lifecycle_config(), trigger_job() |
| Administration | client.admin | bulk_delete() |
Why Notebook-First?
- Reproducibility: All operations are code, making workflows reproducible and version-controllable
- Automation: Scripts can automate common tasks without GUI interaction
- Integration: Seamless integration with data science workflows (pandas, polars, visualization)
- Auditability: Every operation is logged with the user who executed it
2. SDK Client Architecture
```mermaid
---
config:
  layout: elk
---
flowchart TB
    accTitle: Graph OLAP SDK Architecture
    accDescr: Shows SDK components from GraphOLAPClient through resource managers to Control Plane and Wrapper APIs

    classDef user fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
    classDef client fill:#E1F5FE,stroke:#0277BD,stroke-width:2px,color:#01579B
    classDef resource fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    classDef http fill:#FFF8E1,stroke:#F57F17,stroke-width:2px,color:#E65100
    classDef api fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1
    classDef conn fill:#FCE4EC,stroke:#C2185B,stroke-width:2px,color:#880E4F

    Jupyter["Jupyter Notebook<br/>(Analyst)"]:::user

    subgraph SDK["Python SDK (graph-olap-sdk)"]
        Client["GraphOLAPClient<br/>───────────<br/>Main entry point<br/>from_env() / direct init"]:::client

        subgraph Resources["Resource Managers"]
            Mappings["MappingResource<br/>CRUD + versioning"]:::resource
            Instances["InstanceResource<br/>lifecycle + CPU"]:::resource
            Schema["SchemaResource<br/>Starburst metadata"]:::resource
            Ops["OpsResource<br/>cluster config"]:::resource
            Admin["AdminResource<br/>bulk ops"]:::resource
            Health["HealthResource<br/>health checks"]:::resource
        end

        HTTP["HTTPClient<br/>───────────<br/>Retry logic<br/>Auth headers<br/>Error mapping"]:::http

        subgraph Connection["InstanceConnection"]
            Conn["Connection<br/>───────────<br/>Cypher queries<br/>query_df()"]:::conn
            Algo["AlgorithmManager<br/>───────────<br/>Native algorithms<br/>pagerank, louvain"]:::conn
            NX["NetworkXManager<br/>───────────<br/>500+ algorithms<br/>client-side graphs"]:::conn
        end
    end

    CP["Control Plane API<br/>/api/*"]:::api
    Wrapper["Wrapper Pod API<br/>/query, /algorithms"]:::api

    Jupyter --> Client
    Client --> Mappings & Instances & Schema & Ops & Admin & Health
    Mappings & Instances & Schema & Ops & Admin & Health --> HTTP
    HTTP --> CP
    Instances -.->|"connect()"| Conn
    Conn --> Algo & NX
    Conn --> Wrapper
    Algo --> Wrapper
```
3. API Capabilities Overview
This section is a scannable reference of SDK capabilities for architects and technical leads. For each resource, the table lists method signatures, return types, and a short description of each operation.
3.1 Client Initialization
```python
from graph_olap import GraphOLAPClient

# Production: reads GRAPH_OLAP_* environment variables
client = GraphOLAPClient.from_env()

# Direct configuration (identity is sent as X-Username; API keys are no longer used, see Section 7)
client = GraphOLAPClient(
    api_url="https://graph.example.com",
    username="your-username",
    timeout=60.0,
)
```
3.2 Mappings — Define Your Graph
client.mappings — Mappings define what data to load from Starburst into a graph.
| Method | Parameters | Returns | Description |
|---|---|---|---|
| list | *, owner, search, sort_by, offset, limit | PaginatedList[Mapping] | List mappings with filters |
| get | mapping_id | Mapping | Get mapping by ID |
| create | name, description, node_definitions, edge_definitions | Mapping | Create new mapping |
| update | mapping_id, change_description, *, name, description, node_definitions, edge_definitions | Mapping | Update mapping (creates new version) |
| delete | mapping_id | None | Delete mapping |
| copy | mapping_id, new_name | Mapping | Copy mapping with new name |
| get_version | mapping_id, version | MappingVersion | Get specific version |
| list_versions | mapping_id | list[MappingVersion] | List all versions |
| diff | mapping_id, from_version, to_version | MappingDiff | Compare two versions |
| list_snapshots | mapping_id, *, offset, limit | PaginatedList[Snapshot] | List snapshots for mapping |
| list_instances | mapping_id, *, offset, limit | PaginatedList[Instance] | List instances using mapping |
| set_lifecycle | mapping_id, *, ttl, inactivity_timeout | Mapping | Set auto-cleanup policy |
| get_tree | mapping_id, *, include_instances, status | dict | Get hierarchy (mapping → versions → snapshots → instances) |
3.3 Instances — Run Your Graph
client.instances — Manage running graph instances (lifecycle, scaling, connectivity).
| Method | Parameters | Returns | Description |
|---|---|---|---|
| list | *, snapshot_id, owner, status, search, sort_by, offset, limit | PaginatedList[Instance] | List instances with filters |
| get | instance_id | Instance | Get instance by ID |
| create | mapping_id, name, wrapper_type, *, mapping_version, description, ttl, inactivity_timeout, cpu_cores | Instance | Create instance; returns without waiting for it to start |
| create_and_wait | mapping_id, name, wrapper_type, *, timeout, poll_interval, on_progress, ... | Instance | Create and wait until running |
| update | instance_id, *, name, description | Instance | Update instance metadata |
| terminate | instance_id | None | Terminate and delete instance |
| update_cpu | instance_id, cpu_cores | Instance | Scale CPU (1-8 cores) |
| update_memory | instance_id, memory_gb | Instance | Upgrade memory (2-32 GB) |
| extend_ttl | instance_id, hours=24 | Instance | Extend TTL from current expiry |
| set_lifecycle | instance_id, *, ttl, inactivity_timeout | Instance | Set lifecycle parameters |
| get_progress | instance_id | InstanceProgress | Get startup progress details |
| get_health | instance_id, *, timeout | dict | Get wrapper health status |
| check_health | instance_id, *, timeout | bool | Check if wrapper is healthy |
| wait_until_running | instance_id, *, timeout, poll_interval | Instance | Wait for running status |
| connect | instance_id | InstanceConnection | Get connection for queries |
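create_and_wait and wait_until_running both accept timeout and poll_interval. The loop below sketches how such a wait could behave, assuming a callable that returns the current status string. The status values (starting, loading, running, failed), the InstanceTimeoutError class, and the injectable clock and sleep parameters are assumptions for illustration; clock and sleep exist only so the sketch can run without real delays.

```python
import time
from typing import Callable, Optional


class InstanceTimeoutError(TimeoutError):
    """Hypothetical error raised when an instance is not running before the deadline."""


def wait_until_running(
    fetch_status: Callable[[], str],
    *,
    timeout: float = 600.0,
    poll_interval: float = 5.0,
    on_progress: Optional[Callable[[str], None]] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is 'running'; fail fast on 'failed'; give up after timeout."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if on_progress is not None:
            on_progress(status)
        if status == "running":
            return status
        if status == "failed":
            raise RuntimeError("instance entered the 'failed' state")
        if clock() >= deadline:
            raise InstanceTimeoutError(f"instance still {status!r} after {timeout}s")
        sleep(poll_interval)


# Usage with a scripted status sequence and a no-op sleep.
_statuses = iter(["starting", "loading", "running"])
seen: list[str] = []
final = wait_until_running(lambda: next(_statuses), on_progress=seen.append, sleep=lambda s: None)
```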
3.4 Schema Discovery
client.schema — Browse Starburst metadata (cached, refreshed every 24h).
| Method | Parameters | Returns | Description |
|---|---|---|---|
| list_catalogs | — | list[Catalog] | List all Starburst catalogs |
| list_schemas | catalog | list[Schema] | List schemas in a catalog |
| list_tables | catalog, schema | list[Table] | List tables in a schema |
| list_columns | catalog, schema, table | list[Column] | Get columns for a table |
| search_tables | pattern, limit=100 | list[Table] | Search tables by name pattern |
| search_columns | pattern, limit=100 | list[Column] | Search columns by name pattern |
| admin_refresh | — | dict | Trigger cache refresh (admin) |
| get_stats | — | CacheStats | Get cache statistics (admin) |
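Because schema metadata is cached and refreshed every 24 hours, repeated searches within a day need not go back to Starburst. A sketch of that behaviour follows, with an injectable clock so it can be exercised quickly. SchemaCache, the loader callable, and the case-insensitive substring rule used for search_tables are assumptions; the platform's actual matching rules are not specified in this document.

```python
import time
from typing import Callable, Optional

REFRESH_SECONDS = 24 * 60 * 60  # the 24h refresh interval stated above


class SchemaCache:
    """Hypothetical TTL cache of fully qualified table names (catalog.schema.table)."""

    def __init__(self, loader: Callable[[], list[str]], clock: Callable[[], float] = time.time):
        self._loader = loader
        self._clock = clock
        self._tables: list[str] = []
        self._loaded_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= REFRESH_SECONDS:
            self._tables = self._loader()
            self._loaded_at = now

    def search_tables(self, pattern: str, limit: int = 100) -> list[str]:
        """Case-insensitive substring match (an assumed matching rule)."""
        self._refresh_if_stale()
        needle = pattern.lower()
        return [t for t in self._tables if needle in t.lower()][:limit]


# Usage with a controllable clock and a fake loader that records when it ran.
_now = [0.0]
loads: list[float] = []


def _fake_loader() -> list[str]:
    loads.append(_now[0])
    return ["hive.sales.customers", "hive.sales.orders", "iceberg.crm.Customer_Events"]


cache = SchemaCache(_fake_loader, clock=lambda: _now[0])
first = cache.search_tables("customer")  # initial load
_now[0] = 3600.0                          # 1h later: served from the cache
cache.search_tables("orders")
_now[0] = 25 * 3600.0                     # past 24h: reloaded
cache.search_tables("orders")
```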
3.5 Operations & Configuration
client.ops — Cluster configuration, jobs, and metrics. Requires Ops role.
| Method | Parameters | Returns | Description |
|---|---|---|---|
| get_lifecycle_config | — | LifecycleConfig | Get TTL defaults for all resource types |
| update_lifecycle_config | *, mapping, snapshot, instance | bool | Update lifecycle defaults |
| get_concurrency_config | — | ConcurrencyConfig | Get instance limits |
| update_concurrency_config | *, per_analyst, cluster_total | ConcurrencyConfig | Update instance limits |
| get_maintenance_mode | — | MaintenanceMode | Get maintenance status |
| set_maintenance_mode | enabled, message="" | MaintenanceMode | Enable/disable maintenance |
| get_export_config | — | ExportConfig | Get export settings |
| update_export_config | *, max_duration_seconds | ExportConfig | Update export timeout |
| get_cluster_health | — | ClusterHealth | Check cluster health |
| get_cluster_instances | — | ClusterInstances | Get cluster-wide instance summary |
| get_metrics | — | str | Get Prometheus metrics |
| trigger_job | job_name, reason="manual-trigger" | dict | Trigger background job |
| get_job_status | — | dict | Get all job statuses |
| get_state | — | dict | Get system state summary |
| get_export_jobs | status=None, limit=100 | list[dict] | Get export jobs for debugging |
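update_concurrency_config sets two limits, per_analyst and cluster_total. One plausible reading is that a new instance is admitted only if it fits under both; the sketch below encodes that reading. ConcurrencyLimits, admit_new_instance, the order in which the limits are checked, and the reason strings are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcurrencyLimits:
    per_analyst: int    # max active instances one analyst may own
    cluster_total: int  # max active instances across the whole cluster


def admit_new_instance(
    owner: str, active_owners: list[str], limits: ConcurrencyLimits
) -> tuple[bool, str]:
    """Return (allowed, reason); active_owners holds one entry per active instance."""
    if len(active_owners) >= limits.cluster_total:
        return False, "cluster_total limit reached"
    if Counter(active_owners)[owner] >= limits.per_analyst:
        return False, "per_analyst limit reached"
    return True, "ok"


limits = ConcurrencyLimits(per_analyst=2, cluster_total=4)
active = ["alice", "alice", "bob"]
```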
3.6 Utilities (Favorites, Admin, Health)
client.favorites — User bookmarks for quick access.
| Method | Parameters | Returns | Description |
|---|---|---|---|
| list | resource_type=None | list[Favorite] | List favorites |
| add | resource_type, resource_id | Favorite | Add to favorites |
| remove | resource_type, resource_id | None | Remove from favorites |
client.admin — Privileged operations. Requires Admin role.
| Method | Parameters | Returns | Description |
|---|---|---|---|
| bulk_delete | resource_type, filters, reason, expected_count=None, dry_run=False | dict | Bulk delete with safety checks |
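bulk_delete combines two safety checks, dry_run and expected_count. The sketch below shows one way they could compose: a dry run reports what would match without deleting anything, and a real run aborts if the match count differs from the caller's expectation. The in-memory store, the exact-equality filter semantics, and the keys of the returned dict are assumptions, not the AdminResource contract.

```python
from typing import Any, Optional


def bulk_delete(
    store: dict[str, dict[str, Any]],
    filters: dict[str, Any],
    reason: str,
    expected_count: Optional[int] = None,
    dry_run: bool = False,
) -> dict[str, Any]:
    """Delete records whose fields equal every filter value (hypothetical semantics)."""
    matches = [rid for rid, record in store.items()
               if all(record.get(key) == value for key, value in filters.items())]
    # Safety check: refuse to act if the match count differs from what the caller expected.
    if expected_count is not None and len(matches) != expected_count:
        raise ValueError(f"expected {expected_count} matches, found {len(matches)}")
    if not dry_run:
        for rid in matches:
            del store[rid]
    return {
        "matched": len(matches),
        "deleted": 0 if dry_run else len(matches),
        "ids": matches,
        "dry_run": dry_run,
        "reason": reason,  # kept for the audit trail
    }


# Usage: preview with dry_run, then delete with expected_count pinned to the preview.
instances = {
    "i1": {"owner": "alice", "status": "failed"},
    "i2": {"owner": "bob", "status": "failed"},
    "i3": {"owner": "alice", "status": "running"},
}
preview = bulk_delete(instances, {"status": "failed"}, reason="cleanup", dry_run=True)
outcome = bulk_delete(instances, {"status": "failed"}, reason="cleanup",
                      expected_count=preview["matched"])
```

Pinning expected_count to the dry-run count means the real call aborts instead of removing extra records if new matches appear between the preview and the delete.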
client.health — Health checks (no authentication required).
| Method | Parameters | Returns | Description |
|---|---|---|---|
| check | — | HealthStatus | Basic health check |
| ready | — | HealthStatus | Readiness check with DB connectivity |
3.7 Querying & Algorithms
conn = client.instances.connect(instance_id) — Query interface to a running instance.
Cypher Queries
| Method | Parameters | Returns | Description |
|---|---|---|---|
| query | cypher, parameters=None, *, timeout, coerce_types | QueryResult | Execute Cypher query |
| query_df | cypher, parameters=None, *, backend="polars" | DataFrame | Query returning DataFrame |
| query_scalar | cypher, parameters=None | Any | Query returning single value |
| query_one | cypher, parameters=None | dict \| None | Query returning single row |
| get_schema | — | Schema | Get graph schema (labels, types, properties) |
| get_lock | — | LockStatus | Get current lock status |
| status | — | dict | Get instance status and resource usage |
Native Algorithms — conn.algo
| Method | Parameters | Returns | Description |
|---|---|---|---|
| algorithms | category=None | list[dict] | List available algorithms |
| algorithm_info | algorithm | dict | Get algorithm parameters |
| run | algorithm, node_label, property_name, edge_type, *, params, timeout, wait | AlgorithmExecution | Run any native algorithm |
| pagerank | node_label, property_name, edge_type, *, damping, max_iterations, tolerance | AlgorithmExecution | PageRank centrality |
| louvain | node_label, property_name, *, edge_type, resolution | AlgorithmExecution | Louvain community detection |
| connected_components | node_label, property_name, edge_type | AlgorithmExecution | Weakly connected components |
| scc | node_label, property_name, *, edge_type | AlgorithmExecution | Strongly connected components |
| kcore | node_label, property_name, *, edge_type | AlgorithmExecution | K-Core decomposition |
| label_propagation | node_label, property_name, edge_type, *, max_iterations | AlgorithmExecution | Label propagation |
| triangle_count | node_label, property_name, edge_type | AlgorithmExecution | Triangle count per node |
| shortest_path | source_id, target_id, *, relationship_types, max_depth | AlgorithmExecution | Find shortest path |
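To make the pagerank parameters concrete, here is a plain power-iteration PageRank over an in-memory edge list: damping is the probability of following an outgoing edge, max_iterations caps the loop, and tolerance is the convergence threshold on the total change in scores. Scores are written back to each node under property_name, echoing the node_label/property_name style of the native API. This is a reference sketch, not the wrapper's native implementation; spreading the rank of nodes without outgoing edges evenly is one common convention, and the native algorithm may differ.

```python
def pagerank(
    nodes: dict[str, dict],
    edges: list[tuple[str, str]],
    property_name: str,
    *,
    damping: float = 0.85,
    max_iterations: int = 100,
    tolerance: float = 1e-6,
) -> int:
    """Store each node's PageRank under property_name; return the iterations used."""
    ids = list(nodes)
    n = len(ids)
    out_links: dict[str, list[str]] = {v: [] for v in ids}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in ids}
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in ids if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in ids}
        for src in ids:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Converged once the summed absolute change falls below tolerance.
        delta = sum(abs(new_rank[v] - rank[v]) for v in ids)
        rank = new_rank
        if delta < tolerance:
            break
    for v in ids:
        nodes[v][property_name] = rank[v]
    return iterations


# Usage: two people point at alice, so she should end up with the highest score.
people = {"alice": {}, "bob": {}, "carol": {}}
used = pagerank(people, [("bob", "alice"), ("carol", "alice")], "score")
```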
NetworkX Algorithms — conn.networkx (500+ algorithms)
| Method | Parameters | Returns | Description |
|---|---|---|---|
| algorithms | category=None | list[dict] | List available algorithms |
| algorithm_info | algorithm | dict | Get algorithm parameters |
| run | algorithm, node_label, property_name, *, params, timeout, wait | AlgorithmExecution | Run any NetworkX algorithm |
| degree_centrality | node_label, property_name | AlgorithmExecution | Degree centrality |
| betweenness_centrality | node_label, property_name, *, k | AlgorithmExecution | Betweenness centrality |
| closeness_centrality | node_label, property_name | AlgorithmExecution | Closeness centrality |
| eigenvector_centrality | node_label, property_name, *, max_iter | AlgorithmExecution | Eigenvector centrality |
| clustering_coefficient | node_label, property_name | AlgorithmExecution | Clustering coefficient |
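Since conn.networkx exposes NetworkX algorithms, its scores presumably follow NetworkX's definitions. Degree centrality, for example, is a node's degree divided by n - 1, the largest degree possible in a simple graph with n nodes. The dependency-free sketch below, written for an undirected edge list without self-loops, shows that normalisation; on the same graph, networkx.degree_centrality should return the same values.

```python
from collections import defaultdict
from typing import Iterable, Optional


def degree_centrality(
    edges: Iterable[tuple[str, str]], nodes: Optional[set[str]] = None
) -> dict[str, float]:
    """Degree divided by (n - 1) for a simple undirected graph without self-loops."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    all_nodes = set(nodes or ()) | set(neighbours)  # include isolated nodes
    n = len(all_nodes)
    if n <= 1:
        # Same convention as NetworkX: a lone node scores 1.
        return {v: 1.0 for v in all_nodes}
    scale = 1.0 / (n - 1)
    return {v: len(neighbours.get(v, ())) * scale for v in all_nodes}


# Path a-b-c plus an isolated node d (n = 4, so each neighbour is worth 1/3).
scores = degree_centrality([("a", "b"), ("b", "c")], nodes={"d"})
```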
3.8 Query Results
QueryResult — Flexible output from Cypher queries.
| Method | Parameters | Returns | Description |
|---|---|---|---|
| to_polars | — | polars.DataFrame | Convert to Polars DataFrame |
| to_pandas | — | pandas.DataFrame | Convert to Pandas DataFrame |
| to_networkx | — | networkx.DiGraph | Convert to NetworkX graph |
| to_dicts | — | list[dict] | Convert to list of dicts |
| scalar | — | Any | Get single scalar value |
| to_csv | path | None | Export to CSV file |
| to_parquet | path | None | Export to Parquet file |
| show | max_rows=20 | — | Display in Jupyter (auto-visualization) |
Iteration: for row in result: yields dict[str, Any] for each row.
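All of the conversions above start from the same tabular payload. A small stand-in shows how row iteration, to_dicts(), scalar(), and to_csv() can be derived from column names plus row tuples using only the standard library. MiniQueryResult, its constructor, and the rule that scalar() requires exactly one row and one column are assumptions, not the SDK's QueryResult contract.

```python
import csv
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class MiniQueryResult:
    """Hypothetical stand-in for QueryResult: column names plus row tuples."""

    columns: list[str]
    rows: list[tuple[Any, ...]]

    def __iter__(self) -> Iterator[dict[str, Any]]:
        # Each row is yielded as a dict keyed by column name.
        for row in self.rows:
            yield dict(zip(self.columns, row))

    def to_dicts(self) -> list[dict[str, Any]]:
        return list(self)

    def scalar(self) -> Any:
        if len(self.rows) != 1 or len(self.columns) != 1:
            raise ValueError("scalar() needs exactly one row and one column")
        return self.rows[0][0]

    def to_csv(self, path: str) -> None:
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(self.columns)  # header row
            writer.writerows(self.rows)


result = MiniQueryResult(columns=["name", "score"], rows=[("alice", 0.42), ("bob", 0.17)])
count = MiniQueryResult(columns=["n"], rows=[(2,)])
```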
4. Typical User Workflow
The SDK enables a complete analytical workflow from data discovery to graph analysis:
```mermaid
sequenceDiagram
    accTitle: SDK Workflow Sequence
    accDescr: Shows typical user workflow from schema discovery through instance creation to algorithm execution

    participant Analyst as Analyst<br/>(Jupyter)
    participant SDK as Python SDK
    participant CP as Control Plane
    participant Worker as Export Worker
    participant Starburst as Starburst
    participant GCS as GCS
    participant Wrapper as Graph Instance

    Note over Analyst,Wrapper: 1. Schema Discovery
    Analyst->>SDK: client.schema.search_tables("customer")
    SDK->>CP: GET /api/schema/search/tables?q=customer
    CP-->>SDK: Matching tables
    SDK-->>Analyst: Table list with columns

    Note over Analyst,Wrapper: 2. Create Mapping
    Analyst->>SDK: client.mappings.create(name, nodes, edges)
    SDK->>CP: POST /api/mappings
    CP-->>SDK: Mapping created
    SDK-->>Analyst: Mapping object

    Note over Analyst,Wrapper: 3. Create Instance (triggers export)
    Analyst->>SDK: client.instances.create_and_wait(mapping_id)
    SDK->>CP: POST /api/instances {mapping_id}
    CP->>CP: Create snapshot + export job
    Worker->>CP: Claim export job
    Worker->>Starburst: UNLOAD query
    Starburst->>GCS: Write Parquet files
    Worker->>CP: Mark complete
    CP->>Wrapper: Create Pod
    Wrapper->>GCS: COPY FROM Parquet
    Wrapper-->>CP: Ready
    CP-->>SDK: Instance ready
    SDK-->>Analyst: Instance object

    Note over Analyst,Wrapper: 4. Connect and Query
    Analyst->>SDK: conn = client.instances.connect(instance.id)
    Analyst->>SDK: conn.query_df("MATCH (n) RETURN n")
    SDK->>Wrapper: POST /query {cypher}
    Wrapper-->>SDK: Query results
    SDK-->>Analyst: DataFrame (Polars by default)

    Note over Analyst,Wrapper: 5. Run Algorithms
    Analyst->>SDK: conn.algo.pagerank("Customer", "score")
    SDK->>Wrapper: POST /algorithms/pagerank
    Wrapper->>Wrapper: Execute algorithm
    Wrapper-->>SDK: Execution complete
    SDK-->>Analyst: AlgorithmExecution
```
5. Package Structure
```
graph-olap-sdk/
├── src/graph_olap/
│   ├── client.py          # GraphOLAPClient main entry point
│   ├── config.py          # Configuration and authentication
│   ├── notebook.py        # Jupyter integration (connect(), init())
│   ├── resources/
│   │   ├── mappings.py    # MappingResource
│   │   ├── instances.py   # InstanceResource
│   │   ├── schema.py      # SchemaResource (Starburst metadata)
│   │   ├── ops.py         # OpsResource (config, cluster, jobs)
│   │   ├── favorites.py   # FavoriteResource
│   │   ├── admin.py       # AdminResource (bulk delete)
│   │   └── health.py      # HealthResource
│   ├── instance/
│   │   ├── connection.py  # InstanceConnection class
│   │   └── algorithms.py  # Algorithm execution
│   ├── models/            # Pydantic models
│   ├── exceptions.py      # Exception hierarchy
│   └── http.py            # HTTP client wrapper
└── examples/
    ├── basic_workflow.ipynb
    ├── algorithms.ipynb
    └── visualization.ipynb
```
6. Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| HTTP Client | httpx | Modern async support, connection pooling, HTTP/2 |
| Models | Pydantic | Type safety, validation, JSON serialization |
| DataFrame Support | pandas + polars | Industry standard, analyst familiarity |
| API Style | Synchronous default | Notebook-friendly, with async support available |
| Error Handling | Typed exceptions | Clear, actionable error messages |
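The HTTPClient in the architecture diagram is responsible for retry logic, auth headers, and error mapping, and the table above settles on typed exceptions. The sketch below shows how those concerns might fit together around an injected transport function, so it runs without httpx or a live server. The exception names, the retryable status set (429, 502, 503, 504), and the backoff schedule are illustrative assumptions rather than the contents of exceptions.py or http.py; only the X-Username header comes from this document.

```python
import time
from typing import Callable

# (method, url, headers) -> (status_code, body); stands in for a real httpx call.
Transport = Callable[[str, str, dict[str, str]], tuple[int, str]]


class GraphOLAPError(Exception):
    """Hypothetical base class for typed SDK errors."""

    def __init__(self, status: int, body: str) -> None:
        super().__init__(f"HTTP {status}: {body}")
        self.status = status


class ForbiddenError(GraphOLAPError):
    pass


class NotFoundError(GraphOLAPError):
    pass


class ServerError(GraphOLAPError):
    pass


_ERROR_TYPES = {403: ForbiddenError, 404: NotFoundError}
_RETRYABLE = {429, 502, 503, 504}


def request(send: Transport, method: str, url: str, username: str, *,
            max_retries: int = 3, backoff: float = 0.5,
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Send with the X-Username header, retry transient failures, map errors to types."""
    headers = {"X-Username": username}
    attempt = 0
    while True:
        status, body = send(method, url, headers)
        if status < 400:
            return body
        if status in _RETRYABLE and attempt < max_retries:
            sleep(backoff * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
            attempt += 1
            continue
        default = ServerError if status >= 500 else GraphOLAPError
        raise _ERROR_TYPES.get(status, default)(status, body)


# Usage with a fake transport that fails twice with 503 before succeeding.
_replies = iter([(503, "busy"), (503, "busy"), (200, '{"items": []}')])
sent_headers: list[dict[str, str]] = []


def fake_send(method: str, url: str, headers: dict[str, str]) -> tuple[int, str]:
    sent_headers.append(headers)
    return next(_replies)


delays: list[float] = []
body = request(fake_send, "GET", "/api/mappings", "alice", sleep=delays.append)
```

Retrying is only safe for idempotent requests such as the GET shown; a real client would need more care before retrying a POST that creates an instance.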
7. Authentication Flow
Per ADR-104 and ADR-105, identity is carried end-to-end by a single canonical header, X-Username. Bearer tokens and internal API keys have been removed from the SDK and from the control-plane auth middleware.
Canonical identity header
| Header | Status | Read by |
|---|---|---|
| X-Username | Canonical (ADR-105) | Control-plane middleware (packages/control-plane/src/control_plane/middleware/identity.py); wrapper dependencies (packages/{ryugraph,falkordb}-wrapper/src/wrapper/dependencies.py). |
| X-User-ID | Deprecated alias | Accepted only by the wrapper get_user_id dependency as a fallback when X-Username is absent, for backward compatibility with legacy callers. The control plane does NOT accept this alias. |
| X-User-Name | Deprecated alias | Accepted only by the wrapper get_user_name dependency as a fallback when X-Username is absent, for backward compatibility with legacy callers. The control plane does NOT accept this alias. |
The SDK always sends X-Username (see packages/graph-olap-sdk/src/graph_olap/http.py:78 and instance/connection.py:82). New callers MUST send X-Username; the aliases above exist solely so that wrappers do not break during rolling upgrades from older SDK/tool versions.
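The table above gives the control plane and the wrappers different rules for reading identity. Those rules can be sketched as three small functions, with header names matched case-insensitively as HTTP requires. The function names and the choice to treat blank values as absent are assumptions; the real logic lives in the identity middleware and wrapper dependencies referenced above.

```python
from typing import Mapping, Optional


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive lookup; blank values are treated as absent (an assumption)."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted and value.strip():
            return value.strip()
    return None


def control_plane_username(headers: Mapping[str, str]) -> Optional[str]:
    # Control plane: canonical header only; X-User-ID / X-User-Name are ignored.
    return _header(headers, "X-Username")


def wrapper_user_id(headers: Mapping[str, str]) -> Optional[str]:
    # Wrapper get_user_id: prefer X-Username, fall back to the deprecated X-User-ID.
    return _header(headers, "X-Username") or _header(headers, "X-User-ID")


def wrapper_user_name(headers: Mapping[str, str]) -> Optional[str]:
    # Wrapper get_user_name: prefer X-Username, fall back to the deprecated X-User-Name.
    return _header(headers, "X-Username") or _header(headers, "X-User-Name")


legacy_request = {"x-user-id": "legacy-tool"}
current_request = {"X-Username": "alice", "X-User-ID": "stale-id"}
```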
Additional headers
| Header | Purpose |
|---|---|
| X-Use-Case-Id | Starburst use-case identifier passed through the middleware (ADR-102) |
| X-User-Role | NOT sent by the SDK. Wrappers optionally read it if injected by an upstream component; absent → treated as analyst (see ADR-105 §F8). |
Role resolution
Role is not carried in a header. The control plane resolves the authenticated user’s role from the users.role column after matching X-Username. Role hierarchy: Analyst < Admin < Ops. Each higher role inherits the permissions of the lower roles; Ops has exclusive access to config, cluster, and jobs endpoints.
See Authorization & Access Control for the complete permission matrix.
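Because each role inherits the permissions of the roles below it, an authorization check reduces to an ordering comparison. The sketch uses an IntEnum for the Analyst < Admin < Ops hierarchy. Only the Ops-only groups (config, cluster, jobs) and the Admin requirement for admin operations come from this document; the other entries, the lowercase users.role values, and the function names are illustrative assumptions.

```python
from enum import IntEnum


class Role(IntEnum):
    ANALYST = 1
    ADMIN = 2
    OPS = 3


# Minimum role per endpoint group. config/cluster/jobs (Ops) and admin (Admin) follow
# this document; mappings/instances are illustrative placeholders.
MIN_ROLE = {
    "mappings": Role.ANALYST,
    "instances": Role.ANALYST,
    "admin": Role.ADMIN,
    "config": Role.OPS,
    "cluster": Role.OPS,
    "jobs": Role.OPS,
}


def is_allowed(role: Role, endpoint_group: str) -> bool:
    """Higher roles inherit lower-role permissions, so compare against the minimum."""
    return role >= MIN_ROLE[endpoint_group]


def parse_role(db_value: str) -> Role:
    """Map a users.role value (assumed to be e.g. 'analyst') onto the enum."""
    return Role[db_value.strip().upper()]
```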
```python
from graph_olap import GraphOLAPClient

# Environment-based configuration (reads GRAPH_OLAP_API_URL, GRAPH_OLAP_USERNAME)
client = GraphOLAPClient.from_env()

# Explicit username (overrides GRAPH_OLAP_USERNAME and identity.DEFAULT_USERNAME)
client = GraphOLAPClient(
    api_url="https://graph.example.com",
    username="your-username",
    timeout=60.0,
)

# Notebook persona switching (see ADR-105 §F3)
import graph_olap.identity
bob_client = GraphOLAPClient(api_url="https://graph.example.com")
```
Related Documents
- Detailed Architecture - Executive Summary + C4 Architecture Viewpoints + Resource Management
- Domain & Data Architecture - Domain Model, State Machines, Data Flows
- Platform Operations - Technology, Security, Integration, Operations, NFRs
- Authorization & Access Control - RBAC Roles, Permission Matrix, Ownership Model, Enforcement
This is part of the Graph OLAP Platform architecture documentation. See also: Detailed Architecture, Domain & Data Architecture, Platform Operations, Authorization.