System Architecture Design
Overview
The Graph OLAP Platform enables analysts to create ad-hoc graph instances from Starburst SQL queries, run graph algorithms, and share work across teams. Users interact with the platform exclusively through the Jupyter SDK (Python client library). The architecture follows a control plane + data plane pattern where the Control Plane manages resource lifecycle and the Data Plane (Ryugraph pods) handles graph operations.
Prerequisites
- requirements.md - Functional requirements
- architectural.guardrails.md - Technology constraints and patterns
Constraints
- All state managed through Control Plane API (no direct database access from workers/pods)
- Ryugraph runs embedded within Python wrapper (not as separate server)
- Parquet files as the only interchange format between Starburst Galaxy and Ryugraph
- PostgreSQL required in all environments (no SQLite support)
- Pod-per-instance model with ephemeral graph data
- All container images built for AMD64 architecture
Export Platform:
- ADR-070: Starburst Galaxy + BigQuery Export Platform
- ADR-071: PyArrow Fallback Export Strategy
Component Diagram
Mermaid Source
```mermaid
---
config:
  layout: elk
---
flowchart TB
    accTitle: Graph OLAP Platform Component Architecture
    accDescr: Shows Jupyter SDK client, control plane, data plane, async processing, and external systems

    classDef user fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
    classDef service fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    classDef data fill:#FFF8E1,stroke:#F57F17,stroke-width:2px,color:#E65100
    classDef infra fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1
    classDef queue fill:#FCE4EC,stroke:#AD1457,stroke-width:2px,color:#880E4F
    classDef external fill:#ECEFF1,stroke:#455A64,stroke-width:2px,color:#263238

    subgraph Clients["Clients"]
        Jupyter["Jupyter Notebook<br/>(Python SDK)"]:::user
    end

    subgraph Ingress["Ingress Controller"]
        Router["/api/* → CP API | /{id}/* → Wrapper"]:::infra
    end

    subgraph ControlPlane["Control Plane"]
        subgraph PythonBackend["Python Backend (FastAPI)"]
            RESTAPI["REST API"]:::service
            BGJobs["Background Jobs"]:::service
        end
        CloudSQL[("Cloud SQL<br/>PostgreSQL")]:::data
    end

    subgraph DataPlane["Data Plane"]
        subgraph WrapperPod["Ryugraph Wrapper Pod (N)"]
            FastAPI["FastAPI REST API"]:::service
            Ryugraph["Ryugraph (embedded)"]:::data
            NetworkX["NetworkX (in-proc)"]:::service
        end
        PVC["PVC (buffer pool)"]:::data
    end

    subgraph AsyncProc["Async Processing"]
        subgraph ExportWorkers["Export Workers (KEDA-scaled)"]
            Worker1["Export Worker"]:::service
            WorkerN["Export Worker...N"]:::service
        end
        KEDA["KEDA ScaledObject"]:::infra
        subgraph BGJobs2["Background Jobs"]
            ExportReconcile["Export Reconciliation<br/>Every 5 min"]:::service
        end
    end

    GCS[("Google Cloud Storage<br/>gs://bucket/{owner}/{mapping_id}/v{version}/{snapshot_id}/")]:::data
    Starburst["Starburst Galaxy<br/>(Managed Trino SaaS)"]:::external

    Jupyter -->|HTTPS| Ingress
    Ingress --> ControlPlane
    Ingress --> DataPlane
    PythonBackend --> CloudSQL
    KEDA -->|Query pending jobs| CloudSQL
    KEDA -->|Scale 0↔N| ExportWorkers
    Worker1 -->|POST /export-jobs/claim| RESTAPI
    Worker1 -->|Submit UNLOAD| Starburst
    Worker1 -->|Poll status| Starburst
    Worker1 -->|PATCH status| RESTAPI
    WrapperPod --> GCS
    FastAPI --> Ryugraph
    Ryugraph --> NetworkX
    WrapperPod --> PVC
    ExportReconcile --> CloudSQL
    ExportReconcile -->|Check orphans| Starburst
    Starburst -->|Write Parquet| GCS
```

Cross-Component Data Flow Matrix
Consolidated view of all inter-component communication:
Data Flows
| From | To | Data/Endpoint | Trigger | Error Handling |
|---|---|---|---|---|
| Jupyter SDK | Control Plane | HTTPS /api/* | User code | 4xx/5xx → SDK exception |
| Jupyter SDK | Wrapper Pod | HTTPS /{id}/* | User code | 4xx/5xx → SDK exception |
| Control Plane | Cloud SQL | SQL queries | API requests | 503 until recovered |
| Control Plane | Kubernetes | Create/Delete Pod | Instance lifecycle | Mark instance failed |
| Control Plane | Wrapper Pod | HTTP /shutdown | Terminate request | Force delete after timeout |
| KEDA | Cloud SQL | SELECT COUNT(*) pending jobs | Every 30s | Scale decision delayed |
| KEDA | Export Workers | Scale replicas 0↔N | Pending job count | Kubernetes handles restart |
| Export Worker | Control Plane | POST /export-jobs/claim | Worker loop (continuous) | Retry with backoff |
| Export Worker | Starburst Galaxy | POST /v1/statement (UNLOAD) | After claiming jobs | Retry 3x exp → PyArrow fallback → mark failed |
| Export Worker | Starburst Galaxy | GET nextUri (poll) | next_poll_at reached | Retry on next poll cycle |
| Export Worker | GCS | Count Parquet rows | Query FINISHED | Retry 3x exp |
| Export Worker | Control Plane | PATCH /export-jobs/:id | Status change | Retry 5x fixed 1s |
| Export Reconciliation | Cloud SQL | Reset stale claims, finalize | Every 5 min | Log, continue next run |
| Export Reconciliation | Starburst Galaxy | Check orphaned query status | Orphan detected | Update job status |
| Starburst Galaxy | GCS | Write Parquet | UNLOAD execution | Starburst handles |
| Wrapper Pod | GCS | COPY FROM Parquet | Startup | Fail fast → instance failed |
| Wrapper Pod | Control Plane | PUT /api/internal/instances/:id/status | Status change | Retry 5x fixed 1s |
| Wrapper Pod | Control Plane | PUT /api/internal/instances/:id/metrics | Periodic (60s) | Log, continue |
Error Recovery Matrix
| Failure Type | Detection | Recovery Action | Terminal State |
|---|---|---|---|
| Export Worker crash (claimed) | claimed_at > 10 min old | Reconciliation resets to pending | Job re-claimed by another worker |
| Export Worker crash (submitted) | next_poll_at stale > 10 min | Reconciliation checks Starburst | Job completed or reset |
| Starburst submission error | Worker exception | Retry 3x exponential → mark failed | Job status=failed |
| Starburst query timeout | Worker poll returns FAILED | Mark export_job failed | Snapshot status=failed |
| GCS count failure | Worker exception | Retry 3x exponential | Retry on reconciliation |
| All workers down | pending jobs accumulate | KEDA scales when load returns | Delayed export |
| Instance pod crash | CP health check (5 min) | Mark failed, delete pod | Instance status=failed |
| Instance startup timeout | CP reconciliation (10 min) | Mark failed | Instance status=failed |
| Control Plane crash | Kubernetes restart | Stateless recovery from DB | N/A (auto-recovers) |
| Export Worker OOM | Kubernetes restart | Pod restarted, jobs reset by reconciliation | None (auto-recovers) |
| Database unavailable | CP health check | 503 responses, workers pause | Until DB recovers |
Multi-Wrapper Architecture
The platform supports multiple graph database backends through a pluggable wrapper architecture. See ADR-049: Multi-Wrapper Pluggable Architecture for full design rationale.
Wrapper Selection Flow
Mermaid Source
```mermaid
flowchart LR
    accTitle: Wrapper Selection and Configuration Flow
    accDescr: Shows how wrapper_type flows from user request through WrapperFactory to K8s pod creation

    User["User / SDK"]
    API["POST /api/instances<br/>{wrapper_type}"]
    CP["Control Plane<br/>Instance Service"]
    Factory["WrapperFactory"]
    K8s["K8s Service"]
    Pod1["Ryugraph Pod<br/>(8Gi RAM)"]
    Pod2["FalkorDB Pod<br/>(12Gi RAM)"]

    User -->|"wrapper_type: ryugraph"| API
    User -->|"wrapper_type: falkordb"| API
    API --> CP
    CP -->|"wrapper_type"| Factory
    Factory -->|"RyugraphConfig"| K8s
    Factory -->|"FalkorDBConfig"| K8s
    K8s -->|"image: ryugraph-wrapper<br/>memory: 8Gi"| Pod1
    K8s -->|"image: falkordb-wrapper<br/>memory: 12Gi"| Pod2

    classDef user fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px
    classDef service fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px
    classDef pod fill:#E1F5FE,stroke:#0277BD,stroke-width:2px

    class User user
    class API,CP,Factory,K8s service
    class Pod1,Pod2 pod
```

WrapperFactory Pattern
Purpose: Centralize wrapper-specific configuration to avoid scattered conditional logic.
Location: packages/control-plane/src/control_plane/services/wrapper_factory.py
Responsibilities:
- Map `WrapperType` → `WrapperConfig` (image, resources, env vars, ports)
- Query wrapper capabilities from the `WRAPPER_CAPABILITIES` registry
- Provide single source of truth for wrapper configuration
Configuration Returned:
| Field | Ryugraph | FalkorDB |
|---|---|---|
| `image_name` | ryugraph-wrapper | falkordb-wrapper |
| `image_tag` | latest | latest |
| `container_port` | 8000 | 8000 |
| `resource_limits` | {memory: 8Gi, cpu: 4} | {memory: 12Gi, cpu: 4} |
| `resource_requests` | {memory: 4Gi, cpu: 2} | {memory: 6Gi, cpu: 2} |
| `environment_variables` | {WRAPPER_TYPE: ryugraph, BUFFER_POOL_SIZE: 2GB} | {WRAPPER_TYPE: falkordb, PYTHON_VERSION: 3.12} |
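A minimal sketch of how such a factory could produce the configurations above (illustrative only; the real implementation lives in `wrapper_factory.py`, and the dataclass shape shown here is an assumption):

```python
from dataclasses import dataclass

# Hypothetical shape -- the real WrapperConfig lives in
# packages/control-plane/src/control_plane/services/wrapper_factory.py
@dataclass(frozen=True)
class WrapperConfig:
    image_name: str
    image_tag: str
    container_port: int
    resource_limits: dict
    resource_requests: dict
    environment_variables: dict

_CONFIGS = {
    "ryugraph": WrapperConfig(
        image_name="ryugraph-wrapper",
        image_tag="latest",
        container_port=8000,
        resource_limits={"memory": "8Gi", "cpu": "4"},
        resource_requests={"memory": "4Gi", "cpu": "2"},
        environment_variables={"WRAPPER_TYPE": "ryugraph", "BUFFER_POOL_SIZE": "2GB"},
    ),
    "falkordb": WrapperConfig(
        image_name="falkordb-wrapper",
        image_tag="latest",
        container_port=8000,
        resource_limits={"memory": "12Gi", "cpu": "4"},
        resource_requests={"memory": "6Gi", "cpu": "2"},
        environment_variables={"WRAPPER_TYPE": "falkordb", "PYTHON_VERSION": "3.12"},
    ),
}

class WrapperFactory:
    def get_wrapper_config(self, wrapper_type: str) -> WrapperConfig:
        # Single source of truth: unknown types fail fast at the factory boundary
        try:
            return _CONFIGS[wrapper_type]
        except KeyError:
            raise ValueError(f"Unsupported wrapper_type: {wrapper_type}") from None
```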
Usage in K8s Service:
```python
wrapper_config = self._wrapper_factory.get_wrapper_config(wrapper_type)
pod_spec = {
    "containers": [{
        "image": f"{wrapper_config.image_name}:{wrapper_config.image_tag}",
        "resources": {
            "requests": wrapper_config.resource_requests,
            "limits": wrapper_config.resource_limits,
        },
        "env": [{"name": k, "value": v} for k, v in wrapper_config.environment_variables.items()],
    }]
}
```

Wrapper Capabilities Registry
Purpose: Declarative feature discovery; prevents runtime errors from unsupported features.
Location: packages/graph-olap-schemas/src/graph_olap_schemas/wrapper_capabilities.py
Key Differences:
| Capability | Ryugraph | FalkorDB |
|---|---|---|
| NetworkX Support | ✓ Yes | ✗ No |
| Bulk Parquet Import | ✓ Yes (COPY FROM) | ✗ No (row-by-row) |
| Algorithm Invocation | REST API (/algo/*) | Cypher procedures (CALL algo.*) |
| Algorithm Result Mode | Property writeback | Query results |
| Native Algorithms | pagerank, wcc, louvain, kcore | BFS, betweennessCentrality, WCC, CDLP |
| Memory Model | Buffer pool + disk | In-memory only |
| Python Version | 3.11+ | 3.12+ |
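For illustration, the registry could be a frozen dataclass keyed by `WrapperType`; only `supports_networkx` is confirmed by the usage below, and the other field names are assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class WrapperType(str, Enum):
    RYUGRAPH = "ryugraph"
    FALKORDB = "falkordb"

# Hypothetical capability record; only supports_networkx is confirmed by the
# usage example below -- the remaining field names are illustrative.
@dataclass(frozen=True)
class WrapperCapabilities:
    supports_networkx: bool
    supports_bulk_parquet_import: bool
    native_algorithms: tuple[str, ...]

WRAPPER_CAPABILITIES = {
    WrapperType.RYUGRAPH: WrapperCapabilities(
        supports_networkx=True,
        supports_bulk_parquet_import=True,   # COPY FROM
        native_algorithms=("pagerank", "wcc", "louvain", "kcore"),
    ),
    WrapperType.FALKORDB: WrapperCapabilities(
        supports_networkx=False,
        supports_bulk_parquet_import=False,  # row-by-row load
        native_algorithms=("BFS", "betweennessCentrality", "WCC", "CDLP"),
    ),
}

def get_wrapper_capabilities(wrapper_type: WrapperType) -> WrapperCapabilities:
    return WRAPPER_CAPABILITIES[wrapper_type]
```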
Usage:
```python
from graph_olap_schemas import get_wrapper_capabilities, WrapperType

caps = get_wrapper_capabilities(WrapperType.FALKORDB)
if not caps.supports_networkx:
    raise UnsupportedOperationError("FalkorDB does not support NetworkX algorithms")
```

Instance Lifecycle with Wrapper Selection
- User Request: `POST /api/instances {"wrapper_type": "falkordb", ...}`
- Validation: Control Plane validates wrapper_type against the `WrapperType` enum
- Configuration: `WrapperFactory.get_wrapper_config(wrapper_type)` → `WrapperConfig`
- Pod Creation: K8s service creates pod with wrapper-specific image, resources, env vars
- Data Loading: Wrapper loads data from GCS Parquet (row-by-row for FalkorDB, bulk for Ryugraph)
- Ready State: Wrapper reports status to Control Plane, instance becomes queryable
- Query Routing: Ingress routes `/{url_slug}/*` to the correct wrapper pod
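As a concrete illustration of the first step, a raw HTTP equivalent of the SDK call (hedged: the hostname, auth header shape, and payload fields beyond wrapper_type are assumptions, not the published SDK surface):

```python
import requests

API_BASE = "https://api.example.com"  # assumed platform hostname

resp = requests.post(
    f"{API_BASE}/api/instances",
    headers={"X-API-Key": "..."},  # API key header (see Communication Protocols); exact header name assumed
    json={"wrapper_type": "falkordb", "snapshot_id": "snap-123"},  # snapshot_id field assumed
    timeout=30,
)
resp.raise_for_status()
instance = resp.json()
assert instance["status"] == "starting"  # becomes "running" once the pod reports ready
```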
Adding New Wrapper Types
Required Steps:
- Add enum value: `WrapperType.NEWDB = "newdb"` in `graph-olap-schemas`
- Add capabilities: Entry in the `WRAPPER_CAPABILITIES` registry
- Add factory config: Case in `WrapperFactory.get_wrapper_config()`
- Create wrapper package: `packages/newdb-wrapper/` following the wrapper interface
- Create deployment manifests: Kubernetes manifests for the new wrapper under `infrastructure/` with resource specifications (applied via `kubectl apply` from the CD pipeline)
- Update tests: Add wrapper_type to E2E tests, unit tests for WrapperFactory
No changes required to:
- Control Plane business logic
- K8s service (uses WrapperFactory abstraction)
- API contracts (wrapper_type is generic field)
- SDK (wrapper_type is enum value)
Data Flow Diagrams
1. Snapshot Creation Flow
Stateless export workers with KEDA scaling and database polling. See ADR-025 for architecture details.
Mermaid Source
```mermaid
sequenceDiagram
    accTitle: Snapshot Creation Flow
    accDescr: Stateless export with KEDA-scaled workers and database polling

    autonumber
    actor User
    participant CP as Control Plane
    participant DB as Cloud SQL
    participant KEDA
    participant Worker as Export Worker
    participant SB as Starburst
    participant GCS

    %% Phase 1: User Request
    rect rgb(227, 242, 253)
        Note over User,DB: Phase 1: Instance from Mapping
        User->>+CP: POST /api/instances/from-mapping
        CP->>DB: INSERT snapshot (status=pending)
        CP->>DB: INSERT export_jobs (status=pending) × N
        CP->>DB: INSERT instance (status=waiting_for_snapshot)
        CP-->>-User: 201 Created {id, status: waiting_for_snapshot}
    end

    %% Phase 2: KEDA Scaling
    rect rgb(232, 245, 233)
        Note over KEDA,Worker: Phase 2: Worker Scaling
        KEDA->>DB: SELECT COUNT(*) WHERE status='pending'
        KEDA->>Worker: Scale 0 → N pods
    end

    %% Phase 3: Job Claiming
    rect rgb(255, 248, 225)
        Note over Worker,DB: Phase 3: Atomic Job Claiming
        Worker->>+CP: POST /export-jobs/claim {worker_id, limit: 10}
        CP->>DB: SELECT ... FOR UPDATE SKIP LOCKED
        CP->>DB: UPDATE SET claimed_by, claimed_at, status='claimed'
        CP-->>-Worker: [{id, sql, column_names, gcs_path}, ...]
    end

    %% Phase 4: Starburst Submission
    rect rgb(225, 245, 254)
        Note over Worker,SB: Phase 4: Submit to Starburst
        loop For each claimed job
            Worker->>+SB: POST /v1/statement (UNLOAD)<br/>client_tags: [graph-olap-export]
            SB-->>-Worker: {query_id, nextUri}
            Worker->>CP: PATCH {status: submitted, query_id, next_poll_at}
        end
    end

    %% Phase 5: Starburst executes
    Note over SB,GCS: Starburst executes UNLOAD queries (resource group throttled)
    SB->>GCS: Write Parquet files

    %% Phase 6: Polling Loop
    rect rgb(243, 229, 245)
        Note over Worker,GCS: Phase 5: Stateless Polling
        Worker->>+CP: GET /export-jobs/pollable
        CP-->>-Worker: Jobs where next_poll_at <= now
        loop For each pollable job
            Worker->>+SB: GET nextUri
            alt Query FINISHED
                SB-->>Worker: {state: FINISHED}
                Worker->>GCS: Count rows (Parquet metadata)
                Worker->>CP: PATCH {status: completed, row_count}
            else Query RUNNING
                SB-->>-Worker: {state: RUNNING}
                Worker->>CP: PATCH {next_poll_at: +fibonacci, poll_count++}
            end
        end
    end

    %% Phase 7: Snapshot Finalization
    rect rgb(200, 230, 201)
        Note over CP,DB: Phase 6: Snapshot Finalization
        CP->>DB: All jobs completed? → UPDATE snapshot SET status='ready'
    end

    %% Phase 8: Scale Down
    rect rgb(236, 239, 241)
        Note over KEDA,Worker: Phase 7: Scale Down
        KEDA->>DB: SELECT COUNT(*) = 0
        KEDA->>Worker: Scale N → 0 pods
    end
```

2. Instance Startup Flow
Mermaid Source
```mermaid
sequenceDiagram
    accTitle: Instance Startup Flow
    accDescr: Shows pod creation and Ryugraph initialization from snapshot data

    actor User
    participant CP as Control Plane
    participant K8s as Kubernetes
    participant Pod as Wrapper Pod
    participant GCS

    User->>+CP: POST /api/instances
    CP->>CP: Validate snapshot status=ready
    CP->>CP: Validate concurrency limits
    CP->>CP: INSERT instance (status=starting)
    CP->>K8s: Create Pod spec
    CP-->>-User: 201 Created (status=starting)

    K8s->>Pod: Schedule Pod

    Note over Pod: Startup sequence
    Pod->>Pod: 1. Init Ryugraph DB
    Pod->>Pod: 2. CREATE NODE TABLE
    Pod->>Pod: 3. CREATE REL TABLE

    Pod->>+GCS: COPY FROM nodes/*.parquet
    GCS-->>-Pod: Node data
    Pod->>+GCS: COPY FROM edges/*.parquet
    GCS-->>-Pod: Edge data

    Pod->>CP: PUT /instances/:id/status (running, pod_ip)
```

3. Query Execution Flow
Mermaid Source
```mermaid
sequenceDiagram
    accTitle: Query Execution Flow
    accDescr: Shows direct query path from Jupyter SDK through Control Plane to Wrapper Pod

    participant SDK as Jupyter SDK
    participant CP as Control Plane
    participant Pod as Wrapper Pod

    SDK->>+CP: GET /api/instances/:id
    CP-->>-SDK: 200 (instance_url)

    SDK->>+Pod: POST /{instance-id}/query
    Note over Pod: conn = Connection(db)<br/>result = conn.execute(cypher)
    Pod-->>-SDK: 200 (query results as JSON)
```

4. Algorithm Execution Flow (with Implicit Lock)
Mermaid Source
```mermaid
sequenceDiagram
    accTitle: Algorithm Execution Flow with Implicit Lock
    accDescr: Shows async algorithm execution with in-memory lock management in Wrapper Pod

    participant SDK as Jupyter SDK
    participant Pod as Wrapper Pod

    SDK->>+Pod: POST /{instance-id}/algo/pagerank
    Note over Pod: 1. Check lock (in-memory)<br/>→ If locked, return 409<br/>2. Acquire lock (in-memory)<br/>(user_id, algo, time)
    Pod-->>SDK: 202 Accepted (algorithm started)
    Note over Pod: 3. Execute algorithm<br/>- Extract to NetworkX<br/>- Run nx.pagerank()<br/>- Write back to props<br/>4. Release lock (in-memory)
    deactivate Pod

    SDK->>+Pod: GET /{instance-id}/lock (poll)
    Pod-->>-SDK: 200 (lock=null, algorithm complete)

    SDK->>+Pod: MATCH (n) RETURN n.pagerank
    Pod-->>-SDK: 200 (query results)
```

Note: Lock state is managed entirely within the Wrapper Pod (in-memory). Control Plane does not track locks.
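A minimal sketch of this implicit lock as the wrapper might hold it in process memory (illustrative; the 409/202 mapping follows the diagram above, while all names and the threading approach are assumptions):

```python
import threading
import time
from dataclasses import dataclass, field

@dataclass
class AlgoLock:
    """Process-local lock -- workable because each pod hosts exactly one instance."""
    _guard: threading.Lock = field(default_factory=threading.Lock)
    holder: dict | None = None  # {user_id, algo, acquired_at} when held

    def try_acquire(self, user_id: str, algo: str) -> bool:
        with self._guard:
            if self.holder is not None:
                return False  # caller maps this to HTTP 409
            self.holder = {"user_id": user_id, "algo": algo, "acquired_at": time.time()}
            return True

    def release(self) -> None:
        with self._guard:
            self.holder = None

LOCK = AlgoLock()

def start_pagerank(user_id: str) -> int:
    """Returns an HTTP-style status: 202 if started, 409 if already locked."""
    if not LOCK.try_acquire(user_id, "pagerank"):
        return 409

    def work():
        try:
            ...  # extract to NetworkX, run nx.pagerank(), write results back to node properties
        finally:
            LOCK.release()  # lock is released even if the algorithm raises

    threading.Thread(target=work, daemon=True).start()
    return 202
```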
Deployment Topology
GKE Cluster Layout
Mermaid Source
```mermaid
---
config:
  layout: elk
  elk:
    hierarchyHandling: INCLUDE_CHILDREN
---
flowchart TB
    accTitle: GKE Cluster Layout
    accDescr: Shows node pools and their workloads within the GKE cluster

    classDef cluster fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1
    classDef nodepool fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    classDef system fill:#ECEFF1,stroke:#455A64,stroke-width:2px,color:#263238
    classDef cp fill:#E1F5FE,stroke:#0277BD,stroke-width:2px,color:#01579B
    classDef instance fill:#FFF8E1,stroke:#F57F17,stroke-width:2px,color:#E65100

    subgraph GKE["GKE Cluster"]
        direction TB

        subgraph SystemPool["System Node Pool"]
            direction LR
            ingress["ingress-nginx<br/>controller"]:::system
            certmgr["cert-manager"]:::system
            monitoring["prometheus<br/>grafana"]:::system
        end

        subgraph CPPool["Control Plane Node Pool"]
            direction LR
            subgraph CPDeploy["control-plane Deployment (2 replicas)"]
                direction LR
                cp0["control-plane-0<br/>Python/FastAPI<br/>Port 8080"]:::cp
                cp1["control-plane-1<br/>Python/FastAPI<br/>Port 8080"]:::cp
            end
        end

        subgraph InstancePool["Instance Node Pool (Autoscaling)"]
            direction LR
            subgraph inst1["graph-instance-001"]
                pod1["FastAPI + Ryugraph<br/>Port 8000"]:::instance
                pvc1["PVC: 10Gi"]:::instance
            end
            subgraph inst2["graph-instance-002"]
                pod2["FastAPI + Ryugraph<br/>Port 8000"]:::instance
                pvc2["PVC: 10Gi"]:::instance
            end
            subgraph instN["graph-instance-..."]
                podN["FastAPI + Ryugraph<br/>Port 8000"]:::instance
                pvcN["PVC: 10Gi"]:::instance
            end
        end
    end
```

Service Exposure
The platform follows a single architectural principle regardless of environment: only the Control Plane API requires external access. The Jupyter SDK is the sole client interface.
Service Access Matrix
| Service | Cluster-Internal Address | External Access? | Purpose |
|---|---|---|---|
| Control Plane API | http://control-plane:8080 (exposed via LoadBalancer + Ingress at platform hostname) | ✅ YES | Platform entry point for Jupyter SDK |
| Export Worker | http://control-plane:8080 | ❌ NO | Background service (cluster-internal) |
| Trino | Starburst Enterprise (VPC, via Private Service Connect) | ❌ NO | SQL engine (cluster-internal) |
| Postgres | Cloud SQL (Private IP, via Cloud SQL Proxy) | ❌ NO | Database (cluster-internal) |
| Wrapper Instances | http://wrapper-xxx:8000 | ❌ NO | Dynamic pods (cluster-internal) |
What NOT to expose:
- Trino/Starburst — accessed only from Export Workers and the Control Plane over the VPC
- Postgres — accessed only from the Control Plane over the private network
- Internal services — never need external access; use `kubectl port-forward` against the test cluster for debugging only
Production (GKE)
LoadBalancer + Ingress:
```yaml
# Control Plane Service
apiVersion: v1
kind: Service
metadata:
  name: control-plane
spec:
  type: LoadBalancer  # GKE provisions external IP
  selector:
    app: control-plane
  ports:
    - port: 443
      targetPort: 8080
```

```yaml
# Ingress with TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: control-plane-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: control-plane
                port:
                  number: 443
  tls:
    - hosts:
        - api.example.com
      secretName: control-plane-tls
```

Network Security:
- Control Plane: Public LoadBalancer (authenticated via API keys/tokens)
- All other services: ClusterIP only (no external access)
- Starburst: Separate VPC, accessed via Private Service Connect
- Cloud SQL: Private IP only, accessed via Cloud SQL Proxy
Architectural Principle
Single Entry Point Pattern:
- External clients (Jupyter SDK) → Control Plane API only
- Control Plane orchestrates all internal services
- Internal services never exposed externally (defense in depth)
- Same pattern across all deployment environments (consistency)
This minimizes the attack surface and ensures all external access goes through the authenticated Control Plane API.
JupyterHub Notebook Sync
Reference: ADR-088: Notebook Sync Init Container
JupyterHub pods include an init container that synchronizes notebooks at startup:
```yaml
singleuser:
  initContainers:
    - name: notebook-sync
      image: "${REGISTRY}/notebook-sync:${TAG}"
      env:
        - name: NOTEBOOK_REPO_URL
          value: "https://github.com/org/repo.git"
        - name: NOTEBOOK_BRANCH
          value: "main"
      volumeMounts:
        - name: home
          mountPath: /home/jovyan
```

The init container:
- Clones/pulls notebooks from git repository
- Copies CSS from SDK package to JupyterLab custom directory
- Sets proper permissions (jovyan user)
Kubernetes Resources per Instance
Section titled “Kubernetes Resources per Instance”# PodapiVersion: v1kind: Podmetadata: name: graph-instance-{instance_id} labels: app: graph-instance instance-id: "{instance_id}"spec: containers: - name: wrapper image: graph-olap/ryugraph-wrapper:latest ports: - containerPort: 8000 resources: # See ryugraph-performance.reference.md for sizing rationale requests: memory: "3Gi" # Burstable QoS: 3Gi base for buffer pool + overhead cpu: "1000m" # 1 vCPU baseline limits: memory: "8Gi" # Allow burst for NetworkX algorithms cpu: "4000m" # 4 vCPU for parallel operations volumeMounts: - name: data mountPath: /data env: - name: SNAPSHOT_GCS_PATH value: "gs://bucket/{owner}/{mapping_id}/v{mapping_version}/{snapshot_id}/" - name: BUFFER_POOL_SIZE value: "2147483648" # 2GB - see ryugraph-performance.reference.md - name: MAX_THREADS value: "16" # 4x CPU for I/O-bound GCS reads volumes: - name: data persistentVolumeClaim: claimName: graph-instance-{instance_id}-pvc
---# ServiceapiVersion: v1kind: Servicemetadata: name: graph-svc-{instance_id}spec: selector: instance-id: "{instance_id}" ports: - port: 8000 targetPort: 8000
---# Ingress path (added to main Ingress)# path: /{instance_id}/*# backend: graph-svc-{instance_id}:8000State Machine Diagrams
Snapshot States
Mermaid Source
```mermaid
stateDiagram-v2
    accTitle: Snapshot State Machine
    accDescr: Shows snapshot lifecycle from pending through creating to ready or failed

    [*] --> pending: Record created
    pending --> creating: Worker picks up
    creating --> ready: Export complete
    creating --> failed: Error occurred
    failed --> creating: Retry trigger

    note right of pending
        Export jobs created in DB
        Waiting for worker to claim
    end note

    note right of creating
        Worker processing
        UNLOAD queries executing
    end note

    note right of ready
        All Parquet files written
        counts/size recorded
    end note

    note right of failed
        Starburst error
        GCS error
        Timeout
    end note
```

Instance States
Mermaid Source
```mermaid
stateDiagram-v2
    accTitle: Instance State Machine
    accDescr: Shows instance lifecycle from starting through running to stopping or failed

    [*] --> starting: Pod created
    starting --> running: Ready
    starting --> failed: Startup error
    running --> stopping: Terminate request
    stopping --> [*]: Pod deleted

    note right of starting
        Ryugraph initializing
        Loading data from GCS
    end note

    note right of running
        Ready to accept
        queries and algorithms
    end note

    note right of stopping
        Shutdown initiated
        Pod terminating
    end note

    note right of failed
        GCS error
        Schema error
        OOM
    end note
```

Resource Deletion Rules
Resources have dependencies that must be respected during deletion:
Mermaid Source
```mermaid
flowchart TB
    accTitle: Deletion Dependency Chain
    accDescr: Shows resource dependencies and deletion rules for Mapping, Snapshot, and Instance

    classDef resource fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1
    classDef action fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    classDef check fill:#FFF8E1,stroke:#F57F17,stroke-width:2px,color:#E65100
    classDef error fill:#FFEBEE,stroke:#C62828,stroke-width:2px,color:#B71C1C

    subgraph Chain["DELETION DEPENDENCY CHAIN"]
        direction LR
        Mapping["Mapping"]:::resource
        Snapshot["Snapshot"]:::resource
        Instance["Instance"]:::resource
        Snapshot -->|depends on| Mapping
        Instance -->|depends on| Snapshot
    end

    subgraph InstDel["DELETE Instance"]
        instAction["Always allowed"]:::action
    end

    subgraph SnapDel["DELETE Snapshot"]
        snapCheck{"Active instances?<br/>(starting, running)"}:::check
        snapYes["409 RESOURCE_HAS_DEPENDENCIES"]:::error
        snapNo["DELETE + cleanup GCS"]:::action
        snapCheck -->|yes| snapYes
        snapCheck -->|no| snapNo
    end

    subgraph MapDel["DELETE Mapping"]
        mapCheck{"Any snapshots exist?<br/>(any status, any version)"}:::check
        mapYes["409 RESOURCE_HAS_DEPENDENCIES"]:::error
        mapNo["DELETE (versions cascade)"]:::action
        mapCheck -->|yes| mapYes
        mapCheck -->|no| mapNo
    end

    Instance --> InstDel
    Snapshot --> SnapDel
    Mapping --> MapDel
```

Deletion Flows:
User wants to delete a Mapping:
- List snapshots for mapping (all versions)
- If snapshots exist → return 409 with snapshot_count
- User must delete each snapshot first
- For each snapshot:
  - List instances for snapshot
  - If active instances exist → return 409 with instance details
  - User must terminate instances first (or wait for auto-cleanup)
  - Once no active instances → delete snapshot + cleanup favorites
- Once no snapshots → delete mapping (versions cascade automatically) + cleanup favorites
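A sketch of the snapshot-side dependency check (illustrative; assumes a psycopg-style connection and the table/column names used elsewhere in this document):

```python
def delete_gcs_files(snapshot_id: str) -> None:
    ...  # hypothetical helper: remove the snapshot's Parquet objects from GCS

BLOCKING = ("starting", "running")  # stopping/failed do NOT block (see note below)

def delete_snapshot(conn, snapshot_id: str) -> dict:
    # psycopg-style connection assumed
    active = conn.execute(
        "SELECT id FROM instances WHERE snapshot_id = %s AND status = ANY(%s)",
        (snapshot_id, list(BLOCKING)),
    ).fetchall()
    if active:
        # 409 RESOURCE_HAS_DEPENDENCIES with the offending instances
        return {"status": 409, "error": "RESOURCE_HAS_DEPENDENCIES",
                "instances": [row[0] for row in active]}
    delete_gcs_files(snapshot_id)
    conn.execute(
        "DELETE FROM user_favorites WHERE resource_type = 'snapshot' AND resource_id = %s",
        (snapshot_id,),
    )
    conn.execute("DELETE FROM snapshots WHERE id = %s", (snapshot_id,))
    return {"status": 204}
```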
Lifecycle auto-cleanup follows the same rules:
- Expired instances terminated first
- Expired snapshots deleted only after their instances are gone
- Expired mappings deleted only after their snapshots are gone
Favorites Cleanup: When any resource is deleted, also delete all user_favorites referencing that resource:
```sql
DELETE FROM user_favorites WHERE resource_type = :type AND resource_id = :id
```

Note: stopping and failed instances do NOT block snapshot deletion. Only starting and running instances block deletion.
Communication Protocols
Section titled “Communication Protocols”Synchronous (Request-Response)
| Source | Target | Protocol | Authentication |
|---|---|---|---|
| Jupyter SDK | Control Plane | HTTPS | API key header |
| Jupyter SDK | Wrapper Pod | HTTPS | API key header |
| Control Plane | Wrapper Pod | HTTP | Internal (cluster network) |
| Wrapper Pod | GCS | HTTPS | Workload Identity |
Asynchronous (Database Polling)
Export work is coordinated through the export_jobs table rather than a message queue. The Export Worker polls for claimable jobs through the Control Plane claim endpoint, which claims rows atomically using FOR UPDATE SKIP LOCKED. See ADR-025 for the full architecture rationale.
| Mechanism | Publisher | Consumer | Details |
|---|---|---|---|
| `export_jobs` table (status=pending) | Control Plane | Export Worker (APScheduler polling) | Worker claims jobs atomically; KEDA scales workers based on pending-job count |
See data.model.spec.md for the export_jobs schema and claim/poll query patterns.
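For illustration, the claim can be a single atomic statement. A sketch assuming psycopg-style access and the job columns shown in the sequence diagram ({id, sql, column_names, gcs_path}); the ordering clause is an assumption:

```python
CLAIM_SQL = """
WITH picked AS (
    SELECT id FROM export_jobs
    WHERE status = 'pending'
    ORDER BY created_at        -- FIFO ordering assumed
    LIMIT %(limit)s
    FOR UPDATE SKIP LOCKED     -- concurrent workers skip rows already being claimed
)
UPDATE export_jobs j
SET status = 'claimed', claimed_by = %(worker_id)s, claimed_at = now()
FROM picked
WHERE j.id = picked.id
RETURNING j.id, j.sql, j.column_names, j.gcs_path;
"""

def claim_jobs(conn, worker_id: str, limit: int = 10):
    # One transaction makes the SELECT ... FOR UPDATE and the UPDATE atomic;
    # losers of the race simply receive fewer (or zero) rows.
    with conn.transaction():
        return conn.execute(CLAIM_SQL, {"worker_id": worker_id, "limit": limit}).fetchall()
```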
Error Handling
Section titled “Error Handling”Failure Scenarios and Recovery
| Scenario | Detection | Recovery |
|---|---|---|
| Export Worker crash (claimed) | claimed_at > 10 min old | Reconciliation resets job to pending; re-claimed by another worker |
| Starburst submission error | Worker catches exception | Status→failed, error_message set |
| Export Worker crash (submitted) | next_poll_at stale > 10 min | Reconciliation checks Starburst; job completed or reset |
| Starburst query timeout/error | Poller detects FAILED state | export_job→failed, snapshot→failed when all jobs checked |
| GCS count failure | Poller catches exception | Retry on next poll (Fibonacci backoff) |
| Partial export (some jobs complete) | Poller checks all jobs | Mark completed jobs, continue polling running jobs |
| Instance pod OOM | Kubernetes restarts pod | Status→failed (startup OOM) or maintained (runtime OOM with restart) |
| Instance pod crash | Control Plane health check | Status→failed, pod deleted |
| Control Plane crash | Kubernetes restarts pod | Stateless, recovers from database |
| Database unavailable | Control Plane health check fails | 503 responses until recovered |
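The Fibonacci backoff referenced above can be derived from poll_count alone, which keeps pollers stateless. A sketch (the base interval and cap values are assumptions):

```python
from datetime import datetime, timedelta, timezone

def fib(n: int) -> int:
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def next_poll_at(poll_count: int, base_seconds: int = 5, cap_seconds: int = 300) -> datetime:
    """Stateless schedule: 5s, 5s, 10s, 15s, 25s, ... capped at 5 min (values assumed)."""
    delay = min(fib(poll_count) * base_seconds, cap_seconds)
    return datetime.now(timezone.utc) + timedelta(seconds=delay)
```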
Partial Failure Handling
Snapshot Creation (multi-step UNLOAD):
- If any UNLOAD fails, entire snapshot fails
- Partial Parquet files remain in GCS (overwritten on retry)
- Worker updates status to failed with specific error
Instance Startup (multi-step COPY FROM):
- If any COPY FROM fails, instance fails
- Ryugraph database may be partially loaded
- Pod terminated, PVC deleted
Orphan Resource Cleanup
Control Plane runs a reconciliation job on startup and periodically to detect and clean orphan resources.
Startup Reconciliation:
- List all Kubernetes pods with label app=graph-instance
- For each pod, check if matching instance exists in database
- If no DB record: delete orphan pod
- If DB record exists but status='starting' for >10 minutes: mark failed, delete pod
- List all DB instances with status='running' or 'starting'
- If no matching pod exists: mark instance as failed
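A condensed sketch of this startup pass (illustrative; it uses the official kubernetes Python client, while the db helpers are assumed):

```python
from kubernetes import client

def startup_reconciliation(db, namespace: str = "default"):
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector="app=graph-instance").items
    pods_by_id = {p.metadata.labels["instance-id"]: p for p in pods}

    # Pods without a DB record, or stuck in 'starting' too long -> clean up
    for instance_id, pod in pods_by_id.items():
        inst = db.get_instance(instance_id)                          # assumed helper
        if inst is None:
            v1.delete_namespaced_pod(pod.metadata.name, namespace)   # orphan pod
        elif inst.status == "starting" and inst.age_minutes() > 10:  # assumed helper
            db.mark_failed(instance_id, "Startup timeout")
            v1.delete_namespaced_pod(pod.metadata.name, namespace)

    # DB records without a pod -> mark instance as failed
    for inst in db.list_instances(statuses=("running", "starting")):  # assumed helper
        if inst.id not in pods_by_id:
            db.mark_failed(inst.id, "Pod missing")
```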
Periodic Reconciliation (every 5 minutes):
- Find snapshots with status='creating' for >1 hour
  - Mark as failed with error "Export timeout - no progress update"
- Find instances with status='starting' for >10 minutes
  - Mark as failed with error "Startup timeout"
  - Delete orphan pod if exists
- Find instances with status='stopping' for >5 minutes
  - Force delete pod
  - Delete database record
GCS Cleanup (daily):
- List GCS paths for snapshots with status='failed'
- If snapshot failed >7 days ago and no retry attempted:
  - Delete GCS files to reclaim storage
  - Set gcs_path to NULL or mark as cleaned
Lifecycle Expiry Cleanup
Control Plane runs a lifecycle cleanup job every 5 minutes to enforce TTL and inactivity timeouts.
TTL Enforcement:
```sql
-- Find expired instances (TTL elapsed since creation)
-- PostgreSQL interval arithmetic (the platform requires PostgreSQL; see Constraints)
SELECT id FROM instances
WHERE status IN ('starting', 'running')
  AND ttl IS NOT NULL
  AND created_at + make_interval(hours => ttl_hours) < now();
-- Note: ttl_hours computed from ISO 8601 duration

-- For each expired instance:
--   1. Call POST /instances/:id/terminate
```

Inactivity Timeout Enforcement:
```sql
-- Find inactive instances
SELECT id FROM instances
WHERE status = 'running'
  AND inactivity_timeout IS NOT NULL
  AND last_activity_at + make_interval(hours => timeout_hours) < now();

-- For each inactive instance:
--   1. Call POST /instances/:id/terminate
```

Snapshot and Mapping Lifecycle:
- Same pattern for snapshots (check last_used_at for inactivity)
- Same pattern for mappings (less common, typically no TTL)
Cascade considerations:
- When snapshot expires, all its instances must be terminated first
- When mapping expires, all its snapshots must be deleted first
- Cleanup job handles dependencies in correct order:
  - Terminate instances
  - Delete snapshots (after instances gone)
  - Delete mappings (after snapshots gone)
Cleanup Job Sequence:
- Find and terminate expired/inactive instances
- Wait for instances to reach 'stopping' or deleted state
- Find and delete expired/inactive snapshots (where no active instances)
- Find and delete expired/inactive mappings (where no snapshots)
- Log summary: "Cleanup completed: X instances, Y snapshots, Z mappings"
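A sketch of that dependency-ordered pass (illustrative; every helper name here is an assumption):

```python
import logging

log = logging.getLogger("lifecycle-cleanup")

def lifecycle_cleanup(db):
    # 1. Instances first: snapshots/mappings cannot be deleted while instances exist
    instances = db.find_expired_or_inactive_instances()              # assumed helper
    for inst in instances:
        db.terminate_instance(inst.id)                               # drives POST /instances/:id/terminate

    # 2. Snapshots only where no instance is still starting/running
    snapshots = db.find_expired_snapshots_without_active_instances() # assumed helper
    for snap in snapshots:
        db.delete_snapshot(snap.id)

    # 3. Mappings only once all their snapshots are gone
    mappings = db.find_expired_mappings_without_snapshots()          # assumed helper
    for mapping in mappings:
        db.delete_mapping(mapping.id)

    log.info("Cleanup completed: %d instances, %d snapshots, %d mappings",
             len(instances), len(snapshots), len(mappings))
```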
API/Interface Summary
API Classification:
- Client (Jupyter SDK): Consumes `/api/*` endpoints with user credentials via Ingress.
- Platform Components (Wrapper, Worker): Consume `/api/internal/*` endpoints with service accounts via ClusterIP.
Control Plane API (consumed by Jupyter SDK)
| Endpoint Pattern | Purpose |
|---|---|
| `GET/POST /api/mappings` | List, create mappings |
| `GET/PUT/DELETE /api/mappings/:id` | Manage single mapping |
| `GET /api/mappings/:id/versions` | List mapping versions |
| `GET /api/mappings/:id/tree` | Get mapping resource tree |
| `GET /api/snapshots` | List snapshots (read-only) |
| `GET /api/snapshots/:id` | Get single snapshot (read-only) |
| `GET /api/snapshots/:id/progress` | Get snapshot progress |
| `GET/POST /api/instances` | List, create instances |
| `POST /api/instances/from-mapping` | Create instance from mapping (auto-creates snapshot) |
| `GET/PUT/DELETE /api/instances/:id` | Manage single instance |
| `POST /api/instances/:id/terminate` | Terminate instance |
| `GET /api/instances/:id/progress` | Get instance progress |
| `GET/POST/DELETE /api/favorites` | Manage user favorites |
| `GET /api/schema/catalogs, /catalogs/{c}/schemas, /catalogs/{c}/schemas/{s}/tables, /catalogs/{c}/schemas/{s}/tables/{t}/columns` | Schema browser (cached metadata) |
| `GET /api/schema/search/tables, /search/columns` | Search tables and columns by name pattern |
| `POST /api/schema/admin/refresh, GET /api/schema/stats` | Admin-only schema cache refresh and stats |
| `GET/PUT /api/config/*` | Ops configuration |
| `GET /api/cluster/*` | Cluster health/metrics |
| `GET /api/exports/*` | Export queue management |
Wrapper Pod → External
| Endpoint | Purpose |
|---|---|
| `POST /query` | Execute Cypher query |
| `POST /algo/{name}` | Run Ryugraph native algorithm |
| `POST /networkx/{name}` | Run NetworkX algorithm |
| `POST /subgraph` | Extract subgraph |
| `GET /lock` | Check lock status |
| `GET /schema` | Get graph schema |
| `GET /health` | Health check |
| `GET /status` | Detailed status (memory, disk) |
| `POST /shutdown` | Graceful shutdown (internal) |
Wrapper Pod → Control Plane (Internal API, platform component)
| Endpoint | Purpose |
|---|---|
| `PUT /api/internal/instances/:id/status` | Report status changes |
| `PUT /api/internal/instances/:id/metrics` | Report resource usage |
Export Workers → Control Plane (Internal API, platform component)
| Endpoint | Purpose | Component |
|---|---|---|
| `PUT /api/internal/snapshots/:id/status` | Update snapshot status | Both |
| `POST /api/internal/snapshots/:id/export-jobs` | Create export job records | Submitter |
| `GET /api/internal/snapshots/:id/export-jobs` | Get export jobs (optional status filter) | Poller |
| `PATCH /api/internal/export-jobs/:id` | Update single export job | Poller |
Anti-Patterns
See architectural.guardrails.md for the authoritative list of anti-patterns.
Key sections relevant to system architecture:
- Database & Schema - No direct DB access from workers/pods
- Component Communication - All state updates via Control Plane API
- Concurrency & Pod Lifecycle - Single Ryugraph process per pod, implicit locking
- Data Handling & GCS - No GCS access from Control Plane
Open Questions
See decision.log.md for consolidated open questions and architecture decision records.