Skip to content

Data Model Specification

Complete database schema specification for the Graph OLAP Platform Control Plane. PostgreSQL is required in all environments (local dev, DO Demo, GCP staging/production).

  • PostgreSQL is required in all environments (SQLite not supported)
  • Use TEXT for UUIDs, timestamps (ISO 8601), and durations (ISO 8601)
  • Use INTEGER for booleans (0/1)
  • Foreign keys enforced by PostgreSQL
  • PostgreSQL-specific features allowed (e.g., FOR UPDATE SKIP LOCKED)

Logical TypeSQL TypeFormatExample
User IDTEXTUsername from auth headeralice.smith, bob.jones
Resource IDINTEGERAuto-incrementing1, 42, 1337
TimestampTEXTISO 86012025-01-15T10:30:00Z
DurationTEXTISO 8601PT24H, P7D, PT30M
BooleanINTEGER0 or 11
EnumTEXTLowercase valuespending, ready, failed
JSONTEXTValid JSON{"key": "value"}

ID Strategy:

  • User IDs: Username from X-Username header (set by auth middleware, see api.common.spec.md#authentication)
  • Resource IDs: Database-generated auto-incrementing integers (mappings, snapshots, instances, etc.)

graph-olap-platform-entity-relationship-diagram

Mermaid Source
erDiagram
accTitle: Graph OLAP Platform Entity Relationship Diagram
accDescr: Shows database schema with users, mappings, snapshots, instances, and supporting tables
users ||--o{ mappings : owns
users ||--o{ snapshots : owns
users ||--o{ instances : owns
users ||--o{ user_favorites : has
users ||--o{ global_config : updates
mappings ||--|{ mapping_versions : has
mappings ||--o{ snapshots : has
mapping_versions ||--o{ snapshots : references
snapshots ||--o{ instances : has
snapshots ||--|{ export_jobs : has
instances ||--o{ instance_events : has
allowed_catalogs ||--o{ allowed_schemas : contains
allowed_schemas ||--o{ schema_metadata_cache : caches
users {
string username PK
string email UK
string display_name
string role
bool is_active
timestamp created_at
timestamp updated_at
timestamp last_login_at
}
mappings {
int id PK
string owner_username FK
string name
string description
int current_version
string ttl
string inactivity_timeout
timestamp created_at
timestamp updated_at
}
mapping_versions {
int mapping_id PK
int version PK
string change_description
json node_definitions
json edge_definitions
timestamp created_at
string created_by FK
}
snapshots {
int id PK
int mapping_id FK
int mapping_version FK
string owner_username FK
string name
string gcs_path
string status
json node_counts
json edge_counts
bigint size_bytes
string error_message
timestamp last_used_at
}
instances {
int id PK
int snapshot_id FK
int pending_snapshot_id FK
string owner_username FK
string wrapper_type
string name
string status
string instance_url
string pod_name
int cpu_cores
int memory_gb
bigint memory_usage_bytes
bigint disk_usage_bytes
timestamp last_activity_at
}
instance_events {
int id PK
int instance_id FK
string event_type
json details
timestamp created_at
}
global_config {
string key PK
string value
string description
timestamp updated_at
string updated_by FK
}
user_favorites {
string username PK
string resource_type PK
int resource_id PK
timestamp created_at
}
allowed_catalogs {
int id PK
string catalog_name UK
bool enabled
}
allowed_schemas {
int id PK
int catalog_id FK
string schema_name
bool enabled
}
schema_metadata_cache {
int id PK
int schema_id FK
string table_name
json columns
timestamp cached_at
}
export_jobs {
int id PK
int snapshot_id FK
string job_type
string entity_name
string status
string claimed_by
timestamp claimed_at
string starburst_query_id
string next_uri
timestamp next_poll_at
int poll_count
string sql
string gcs_path
int row_count
bigint size_bytes
string error_message
}

Stores user accounts for FK references and tracking. User ID is the username from X-Username header set by auth middleware. Roles are stored in the database (ADR-104) as analyst | admin | ops; the X-User-Role header path was superseded.

CREATE TABLE users (
username TEXT PRIMARY KEY, -- Username from X-Username header (e.g., 'alice.smith')
email TEXT NOT NULL UNIQUE,
display_name TEXT NOT NULL,
role TEXT NOT NULL DEFAULT 'analyst' -- CHECK: 'analyst', 'admin', 'ops' (ADR-104)
CHECK (role IN ('analyst', 'admin', 'ops')),
role_changed_at TEXT, -- ISO 8601 timestamp (nullable)
role_changed_by TEXT, -- Username who last changed role (nullable)
created_at TEXT NOT NULL, -- ISO 8601 timestamp
updated_at TEXT NOT NULL, -- ISO 8601 timestamp
last_login_at TEXT, -- ISO 8601 timestamp (nullable)
is_active INTEGER NOT NULL DEFAULT 1 -- 0 = disabled, 1 = active
);
CREATE INDEX idx_users_email ON users(email);

Notes:

  • username is extracted from X-Username header (primary key)
  • User record auto-created on first request if not exists
  • Role IS stored in database as of ADR-104; values: analyst, admin, ops (default: analyst)
  • is_active allows disabling without deletion
  • Users exist primarily for FK references (ownership) and timestamps

Header table for mapping definitions. Mutable fields separate from immutable versions.

CREATE TABLE mappings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
owner_username TEXT NOT NULL REFERENCES users(username),
name TEXT NOT NULL,
description TEXT,
current_version INTEGER NOT NULL DEFAULT 1,
created_at TEXT NOT NULL, -- ISO 8601 timestamp
updated_at TEXT NOT NULL, -- ISO 8601 timestamp
ttl TEXT, -- ISO 8601 duration (nullable = no expiry)
inactivity_timeout TEXT -- ISO 8601 duration (nullable = no timeout)
);
CREATE INDEX idx_mappings_owner ON mappings(owner_username);
CREATE INDEX idx_mappings_created_at ON mappings(created_at);
CREATE INDEX idx_mappings_name ON mappings(name);

Notes:

  • id is database-generated auto-incrementing integer
  • current_version points to latest version in mapping_versions
  • ttl and inactivity_timeout store durations as ISO 8601 (e.g., P7D, PT24H)
  • Name is not unique (different owners can have same name)

Immutable version records. Created on initial mapping creation and each edit.

CREATE TABLE mapping_versions (
mapping_id INTEGER NOT NULL REFERENCES mappings(id) ON DELETE CASCADE,
version INTEGER NOT NULL,
change_description TEXT, -- NULL for version 1, required for subsequent
node_definitions TEXT NOT NULL, -- JSON array
edge_definitions TEXT NOT NULL, -- JSON array
created_at TEXT NOT NULL, -- ISO 8601 timestamp
created_by TEXT NOT NULL REFERENCES users(username),
PRIMARY KEY (mapping_id, version)
);
CREATE INDEX idx_mapping_versions_created_at ON mapping_versions(created_at);

Notes:

  • Composite primary key (mapping_id, version)
  • change_description required for version > 1
  • node_definitions and edge_definitions are JSON arrays - see requirements.md for complete schema
  • Versions are never deleted individually; cascade on mapping delete

Data snapshots created from mapping exports.

CREATE TABLE snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
mapping_id INTEGER NOT NULL REFERENCES mappings(id),
mapping_version INTEGER NOT NULL,
owner_username TEXT NOT NULL REFERENCES users(username),
name TEXT NOT NULL,
description TEXT,
gcs_path TEXT NOT NULL, -- Full GCS path: gs://bucket/{owner}/{mapping_id}/v{mapping_version}/{snapshot_id}/
size_bytes INTEGER, -- Total storage size (set when ready)
node_counts TEXT, -- JSON object: {"Customer": 1000, "Product": 500}
edge_counts TEXT, -- JSON object: {"PURCHASED": 5000}
status TEXT NOT NULL CHECK (status IN ('pending', 'creating', 'ready', 'failed')),
progress TEXT, -- JSON: current export progress (see schema below)
error_message TEXT, -- Set when status='failed'
created_at TEXT NOT NULL, -- ISO 8601 timestamp
updated_at TEXT NOT NULL, -- ISO 8601 timestamp
ttl TEXT, -- ISO 8601 duration
inactivity_timeout TEXT, -- ISO 8601 duration
last_used_at TEXT, -- Last instance created from this snapshot
FOREIGN KEY (mapping_id, mapping_version) REFERENCES mapping_versions(mapping_id, version)
);
CREATE INDEX idx_snapshots_mapping_id ON snapshots(mapping_id);
CREATE INDEX idx_snapshots_owner ON snapshots(owner_username);
CREATE INDEX idx_snapshots_status ON snapshots(status);
CREATE INDEX idx_snapshots_created_at ON snapshots(created_at);

Notes:

  • id is database-generated auto-incrementing integer
  • mapping_version captures which version was used (immutable after creation)
  • node_counts and edge_counts are JSON for per-type counts
  • gcs_path includes snapshot owner’s username (not mapping owner)
  • last_used_at updated when instance created (for inactivity timeout)
  • Detailed export progress tracked in export_jobs table (one row per node/edge)
  • progress field provides summary view, derived from export_jobs at query time

Status Transitions:

pending → creating → ready
→ failed
→ cancelled
  • pending: Snapshot created, waiting for Export Worker
  • creating: UNLOAD queries submitted to Starburst, polling in progress
  • ready: All export_jobs completed successfully
  • failed: At least one export_job failed
  • cancelled: Export explicitly cancelled before completion

Progress JSON Schema (snapshots):

Derived from export_jobs table for API responses:

{
"phase": "submitting|creating|completed",
"started_at": "2025-01-15T10:30:00Z",
"jobs_total": 5,
"jobs_completed": 3,
"jobs_failed": 0,
"jobs": [
{"name": "Customer", "type": "node", "status": "completed", "row_count": 10000},
{"name": "Product", "type": "node", "status": "completed", "row_count": 500},
{"name": "Supplier", "type": "node", "status": "running", "row_count": null},
{"name": "PURCHASED", "type": "edge", "status": "pending", "row_count": null},
{"name": "SUPPLIES", "type": "edge", "status": "pending", "row_count": null}
]
}

Running graph instance pods.

CREATE TABLE instances (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL REFERENCES snapshots(id),
pending_snapshot_id INTEGER REFERENCES snapshots(id), -- Set when status='waiting_for_snapshot'
owner_username TEXT NOT NULL REFERENCES users(username),
wrapper_type TEXT NOT NULL, -- e.g. 'ryugraph', 'falkordb'
name TEXT NOT NULL,
description TEXT,
url_slug TEXT UNIQUE, -- UUID slug for external URL routing
instance_url TEXT, -- Set when running: https://{domain}/{slug}/
pod_name TEXT, -- Kubernetes pod name
pod_ip TEXT, -- Internal pod IP
status TEXT NOT NULL CHECK (status IN (
'waiting_for_snapshot', 'starting', 'running', 'stopping', 'failed'
)),
progress TEXT, -- JSON: current startup progress (see schema below)
error_message TEXT, -- Human-readable error (set when status='failed')
error_code TEXT, -- Machine-readable error code (set when status='failed')
stack_trace TEXT, -- Stack trace for debugging (set when status='failed')
created_at TEXT NOT NULL, -- ISO 8601 timestamp
updated_at TEXT NOT NULL, -- ISO 8601 timestamp
started_at TEXT, -- When status became 'running'
last_activity_at TEXT, -- Last query or algorithm execution
ttl TEXT, -- ISO 8601 duration
inactivity_timeout TEXT, -- ISO 8601 duration
cpu_cores INTEGER DEFAULT 2, -- Current CPU allocation
memory_gb INTEGER, -- Current memory allocation in GB (dynamic resize)
memory_usage_bytes INTEGER, -- Observed memory (updated periodically)
disk_usage_bytes INTEGER -- Observed disk (updated periodically)
);
CREATE INDEX idx_instances_snapshot_id ON instances(snapshot_id);
CREATE INDEX idx_instances_owner ON instances(owner_username);
CREATE INDEX idx_instances_status ON instances(status);
CREATE INDEX idx_instances_created_at ON instances(created_at);

Notes:

  • id is database-generated auto-incrementing integer
  • pending_snapshot_id is set when an instance is created from a mapping and waits for snapshot export; cleared when instance transitions to starting
  • wrapper_type identifies the graph engine pod type (required)
  • Lock state is NOT stored in Control Plane; query GET /{instance_id}/lock on the Wrapper Pod directly
  • status='stopping' is transient; leads to row deletion
  • instance_url set after pod becomes ready
  • progress updated by Wrapper Pod via internal API during startup

Progress JSON Schema (instances):

{
"phase": "initializing|loading_nodes|loading_edges|ready",
"started_at": "2025-01-15T10:30:00Z",
"steps": [
{"name": "pod_scheduled", "status": "completed", "completed_at": "2025-01-15T10:30:05Z"},
{"name": "schema_created", "status": "completed", "completed_at": "2025-01-15T10:30:10Z"},
{"name": "Customer", "type": "node", "status": "completed", "row_count": 10000},
{"name": "Product", "type": "node", "status": "in_progress", "row_count": null},
{"name": "PURCHASED", "type": "edge", "status": "pending", "row_count": null}
]
}

Resource monitoring events for instances (CPU updates, memory upgrades, OOM recoveries).

CREATE TABLE instance_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
instance_id INTEGER NOT NULL REFERENCES instances(id) ON DELETE CASCADE,
event_type TEXT NOT NULL CHECK (event_type IN (
'memory_upgraded', 'cpu_updated', 'oom_recovered', 'resize_failed'
)),
details TEXT, -- JSON with old/new values
created_at TEXT NOT NULL -- ISO 8601 timestamp
);
CREATE INDEX idx_instance_events_instance_id ON instance_events(instance_id);
CREATE INDEX idx_instance_events_type ON instance_events(event_type);

Notes:

  • Cascade-deleted when the parent instance is deleted
  • details JSON typically records {from: <old_value>, to: <new_value>} for resize events

Ops-managed global configuration.

CREATE TABLE global_config (
key TEXT PRIMARY KEY,
value TEXT NOT NULL, -- JSON or simple value
description TEXT,
updated_at TEXT NOT NULL, -- ISO 8601 timestamp
updated_by TEXT NOT NULL REFERENCES users(username)
);

Configuration Keys:

KeyValue FormatDescription
lifecycle.mapping.default_ttlISO 8601 durationDefault mapping TTL
lifecycle.mapping.default_inactivityISO 8601 durationDefault mapping inactivity timeout
lifecycle.mapping.max_ttlISO 8601 durationMaximum allowed mapping TTL
lifecycle.snapshot.default_ttlISO 8601 durationDefault snapshot TTL (e.g., P7D)
lifecycle.snapshot.default_inactivityISO 8601 durationDefault snapshot inactivity timeout
lifecycle.snapshot.max_ttlISO 8601 durationMaximum allowed snapshot TTL
lifecycle.instance.default_ttlISO 8601 durationDefault instance TTL (e.g., PT24H)
lifecycle.instance.default_inactivityISO 8601 durationDefault instance inactivity timeout
lifecycle.instance.max_ttlISO 8601 durationMaximum allowed instance TTL
concurrency.per_analystINTEGERMax instances per analyst
concurrency.cluster_totalINTEGERMax instances cluster-wide
maintenance.enabledBOOLEANMaintenance mode on/off
maintenance.messageTEXTMessage shown during maintenance
cache.metadata.ttl_hoursINTEGERSchema metadata cache TTL (default: 24)
export.max_duration_secondsINTEGERMax export job duration before timeout (default: 3600)

Starburst catalogs available for Schema Browser.

CREATE TABLE allowed_catalogs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
catalog_name TEXT NOT NULL UNIQUE,
enabled INTEGER NOT NULL DEFAULT 1, -- 0=disabled, 1=enabled
created_at TEXT NOT NULL, -- ISO 8601
updated_at TEXT NOT NULL -- ISO 8601
);

Schemas within allowed catalogs.

CREATE TABLE allowed_schemas (
id INTEGER PRIMARY KEY AUTOINCREMENT,
catalog_id INTEGER NOT NULL REFERENCES allowed_catalogs(id) ON DELETE CASCADE,
schema_name TEXT NOT NULL,
enabled INTEGER NOT NULL DEFAULT 1,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
UNIQUE(catalog_id, schema_name)
);

Cached table/column metadata from Starburst.

CREATE TABLE schema_metadata_cache (
id INTEGER PRIMARY KEY AUTOINCREMENT,
schema_id INTEGER NOT NULL REFERENCES allowed_schemas(id) ON DELETE CASCADE,
table_name TEXT NOT NULL,
columns TEXT NOT NULL, -- JSON: [{name, type, nullable}]
cached_at TEXT NOT NULL, -- ISO 8601
UNIQUE(schema_id, table_name)
);

Note: Audit logging is handled by the external observability stack, not stored in this database. See requirements for audit event categories.


User bookmarks for quick access.

CREATE TABLE user_favorites (
username TEXT NOT NULL REFERENCES users(username) ON DELETE CASCADE,
resource_type TEXT NOT NULL CHECK (resource_type IN ('mapping', 'snapshot', 'instance')),
resource_id INTEGER NOT NULL,
created_at TEXT NOT NULL, -- ISO 8601 timestamp
PRIMARY KEY (username, resource_type, resource_id)
);
CREATE INDEX idx_user_favorites_username ON user_favorites(username);
CREATE INDEX idx_user_favorites_resource ON user_favorites(resource_type, resource_id);

Notes:

  • No foreign key to resource tables (polymorphic reference)
  • When a resource is deleted, application MUST also delete matching favorites: DELETE FROM user_favorites WHERE resource_type = :type AND resource_id = :id

Tracks individual Starburst UNLOAD queries for each snapshot export. See ADR-025 for architecture details.

CREATE TABLE export_jobs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL REFERENCES snapshots(id) ON DELETE CASCADE,
job_type TEXT NOT NULL CHECK (job_type IN ('node', 'edge')),
entity_name TEXT NOT NULL, -- Node label or edge type name
status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'claimed', 'submitted', 'completed', 'failed')),
-- Job claiming (for stateless workers)
claimed_by TEXT, -- Worker ID that claimed this job
claimed_at TEXT, -- When job was claimed (for lease expiry)
-- Starburst tracking
starburst_query_id TEXT, -- Query ID from POST /v1/statement response
next_uri TEXT, -- Current polling URI for this query
-- Stateless polling state (persisted, not in-memory)
next_poll_at TEXT, -- When to poll next (ISO 8601)
poll_count INTEGER DEFAULT 0, -- Number of polls (for Fibonacci backoff)
-- Job definition (denormalized for worker efficiency)
sql TEXT, -- UNLOAD SQL query
column_names TEXT, -- JSON array of column names
starburst_catalog TEXT, -- Starburst catalog name
-- Results (set on completion)
gcs_path TEXT NOT NULL, -- gs://bucket/{owner}/{mapping_id}/v{mapping_version}/{snapshot_id}/nodes/{entity}/
row_count INTEGER, -- Row count from Parquet files
size_bytes INTEGER, -- Total size of Parquet files
-- Timestamps
submitted_at TEXT, -- When UNLOAD query was submitted to Starburst
completed_at TEXT, -- When query finished (success or failure)
error_message TEXT, -- Error details if status='failed'
created_at TEXT NOT NULL, -- ISO 8601 timestamp
updated_at TEXT NOT NULL -- ISO 8601 timestamp
);
CREATE INDEX idx_export_jobs_snapshot_id ON export_jobs(snapshot_id);
CREATE INDEX idx_export_jobs_status ON export_jobs(status);
CREATE INDEX idx_export_jobs_snapshot_status ON export_jobs(snapshot_id, status);
CREATE INDEX idx_export_jobs_claimable ON export_jobs(status, next_poll_at); -- For job claiming queries

Notes:

  • One row per node/edge definition in the mapping
  • Workers are stateless - all polling state persisted in database
  • claimed_by and claimed_at enable lease-based job ownership
  • next_poll_at and poll_count enable stateless Fibonacci backoff
  • sql, column_names, starburst_catalog denormalized so workers don’t need to fetch mapping
  • Snapshot status=‘ready’ only when ALL export_jobs for that snapshot are ‘completed’
  • Snapshot status=‘failed’ if ANY export_job has status=‘failed’

Status Transitions:

pending → claimed → submitted → completed
→ failed
↑__________| (lease expired, reset by reconciliation)
  • pending: Job created, waiting to be claimed
  • claimed: Worker claimed job, preparing to submit to Starburst
  • submitted: UNLOAD query submitted, polling for completion
  • completed: Query finished successfully, rows counted
  • failed: Query failed or error during processing

Job Claiming (Atomic):

-- Worker claims up to N pending jobs atomically
UPDATE export_jobs
SET status = 'claimed',
claimed_by = :worker_id,
claimed_at = :now
WHERE id IN (
SELECT id FROM export_jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT :limit
FOR UPDATE SKIP LOCKED
)
RETURNING *;

Pollable Jobs Query:

-- Get jobs ready for polling (submitted + poll time reached)
SELECT * FROM export_jobs
WHERE status = 'submitted'
AND next_poll_at <= :now
ORDER BY next_poll_at
LIMIT :limit
FOR UPDATE SKIP LOCKED;

Lease Expiry (Reconciliation):

-- Reset stale claimed jobs (lease expired after 10 minutes)
UPDATE export_jobs
SET status = 'pending',
claimed_by = NULL,
claimed_at = NULL
WHERE status = 'claimed'
AND claimed_at < :now - INTERVAL '10 minutes';

TableIndexPurpose
mappingsidx_mappings_ownerFilter by owner
mappingsidx_mappings_created_atSort by creation
snapshotsidx_snapshots_mapping_idList snapshots for mapping
snapshotsidx_snapshots_statusFilter by status
instancesidx_instances_ownerFilter by owner
instancesidx_instances_statusFilter by status
TableIndexPurpose
mappingsidx_mappings_nameText search on name
snapshotsidx_snapshots_ownerFilter by owner
snapshotsidx_snapshots_created_atSort by creation
instancesidx_instances_snapshot_idList instances for snapshot
instancesidx_instances_created_atSort by creation
export_jobsidx_export_jobs_snapshot_statusFind running jobs for snapshot

List mappings for owner with pagination:

SELECT m.*,
(SELECT COUNT(*) FROM snapshots s WHERE s.mapping_id = m.id) as snapshot_count
FROM mappings m
WHERE m.owner_username = ?
ORDER BY m.created_at DESC
LIMIT ? OFFSET ?;
-- Uses: idx_mappings_owner, idx_mappings_created_at

Get mapping with current version:

SELECT m.*, mv.node_definitions, mv.edge_definitions, mv.change_description, mv.created_at as version_created_at
FROM mappings m
JOIN mapping_versions mv ON m.id = mv.mapping_id AND m.current_version = mv.version
WHERE m.id = ?;

List snapshots for mapping (all versions):

SELECT s.*
FROM snapshots s
WHERE s.mapping_id = ?
ORDER BY s.created_at DESC
LIMIT ? OFFSET ?;
-- Uses: idx_snapshots_mapping_id

Check if mapping can be deleted (no snapshots exist):

SELECT COUNT(*) as snapshot_count
FROM snapshots
WHERE mapping_id = ?;
-- Deletion blocked if count > 0

Check if snapshot can be deleted (no active instances exist):

SELECT COUNT(*) as instance_count
FROM instances
WHERE snapshot_id = ?
AND status IN ('starting', 'running');
-- Deletion blocked if count > 0
-- Note: 'stopping' instances are being terminated; 'failed' instances are dead
-- Both can be cleaned up separately, so they don't block snapshot deletion

Count running instances for concurrency check:

-- Per-analyst limit
SELECT COUNT(*) as user_instance_count
FROM instances
WHERE owner_username = ?
AND status IN ('starting', 'running');
-- Cluster-wide limit
SELECT COUNT(*) as total_instance_count
FROM instances
WHERE status IN ('starting', 'running');

Find instances due for inactivity timeout:

SELECT id, name, owner_username, last_activity_at, inactivity_timeout
FROM instances
WHERE status = 'running'
AND inactivity_timeout IS NOT NULL
AND datetime(last_activity_at, '+' ||
CAST(SUBSTR(inactivity_timeout, 3, LENGTH(inactivity_timeout)-3) AS INTEGER) || ' hours')
< datetime('now');
-- Note: Duration parsing simplified; actual implementation needs proper ISO 8601 parsing

Search mappings by name/description:

SELECT *
FROM mappings
WHERE (name LIKE '%' || ? || '%' OR description LIKE '%' || ? || '%')
ORDER BY created_at DESC
LIMIT ? OFFSET ?;

-- migrations/001_initial_schema.sql
-- Users
CREATE TABLE users (...);
-- Mappings and versions
CREATE TABLE mappings (...);
CREATE TABLE mapping_versions (...);
-- Snapshots and instances
CREATE TABLE snapshots (...);
CREATE TABLE instances (...);
-- Configuration
CREATE TABLE global_config (...);
-- Favorites
CREATE TABLE user_favorites (...);
-- Export jobs
CREATE TABLE export_jobs (...);
-- All indexes
CREATE INDEX ...;

Format: {NNN}_{description}.sql

  • 001_initial_schema.sql
  • 002_add_user_favorites.sql
  • 003_add_export_jobs.sql

Each migration should have a corresponding down migration:

  • 001_initial_schema.down.sql - DROP all tables
  • 002_add_user_favorites.down.sql - DROP TABLE user_favorites
-- seeds/001_initial_config.sql
INSERT INTO global_config (key, value, description, updated_at, updated_by) VALUES
('lifecycle.mapping.default_ttl', NULL, 'Default mapping TTL (null = no expiry)', '2025-01-15T00:00:00Z', 'system'),
('lifecycle.mapping.default_inactivity', 'P30D', 'Default mapping inactivity timeout', '2025-01-15T00:00:00Z', 'system'),
('lifecycle.snapshot.default_ttl', 'P7D', 'Default snapshot TTL', '2025-01-15T00:00:00Z', 'system'),
('lifecycle.snapshot.default_inactivity', 'P3D', 'Default snapshot inactivity timeout', '2025-01-15T00:00:00Z', 'system'),
('lifecycle.instance.default_ttl', 'PT24H', 'Default instance TTL', '2025-01-15T00:00:00Z', 'system'),
('lifecycle.instance.default_inactivity', 'PT4H', 'Default instance inactivity timeout', '2025-01-15T00:00:00Z', 'system'),
('concurrency.per_analyst', '5', 'Max instances per analyst', '2025-01-15T00:00:00Z', 'system'),
('concurrency.cluster_total', '50', 'Max instances cluster-wide', '2025-01-15T00:00:00Z', 'system'),
('export.max_duration_seconds', '3600', 'Max export job duration before timeout (1 hour)', '2025-01-15T00:00:00Z', 'system');

These constraints must be enforced in application code:

ConstraintEnforcement
Mapping name max length255 characters
Description max length4000 characters
change_description required for version > 1Check on INSERT
TTL ≤ hard limitCompare against global_config
Cannot delete mapping with snapshotsQuery count before DELETE
Cannot delete snapshot with instancesQuery count before DELETE
ConstraintTypeTable
Primary keysUNIQUE, NOT NULLAll tables
Foreign keysREFERENCESSee table definitions
Status enumCHECKsnapshots, instances
Status enumCHECKinstances, export_jobs

See architectural.guardrails.md for the authoritative list of anti-patterns.

Key sections relevant to data model:

  • Database & Schema - No PostgreSQL-specific features, no algorithm results in DB
  • Query Patterns - No SELECT *, no ORM lazy loading, always use LIMIT
  • Resource Lifecycle - Respect deletion dependencies (instances → snapshots → mappings)
  • Authentication & Authorization - No password storage, encrypt sensitive data

Base-schema bootstrap (authoritative):

The base schema (users, mappings, snapshots, instances, export_jobs, global_config, etc.) is created at control-plane startup by SQLAlchemy’s metadata.create_all(), invoked from packages/control-plane/src/control_plane/infrastructure/database.py (init_database()await conn.run_sync(metadata.create_all)). The same path runs in E2E tests via packages/control-plane/tests/conftest.py.

DatabaseManager.startup() in that module calls init_database() only when settings.debug is true (local and E2E). In production, metadata.create_all is still the first-boot creation path, but Alembic is expected to be run once at install time against an empty database. The Alembic config acknowledges this dual path (packages/control-plane/alembic.ini):

In development/testing, tables are auto-created via metadata.create_all().

Alembic migrations (delta-only):

Alembic migrations under packages/control-plane/alembic/versions/ cover schema deltas on top of the baseline, not the baseline itself. As of ADR-149 the directory contains three migrations (the fresh-install baseline is still the SQLAlchemy metadata):

FileIntent
20250127_1502_440c6421ad9d_add_pending_snapshot_id_and_waiting_for.pyAdds instances.pending_snapshot_id FK and the waiting_for_snapshot status for the “create instance from mapping” feature.
20260204_add_memory_gb_column.pyAdds the nullable instances.memory_gb column for Phase 3 dynamic memory upgrade.
20260330_add_role_column_to_users.pyAdds users.role per ADR-104 (database-backed role management).

Bootstrap order for a fresh database:

  1. Run metadata.create_all() (via init_database() or the control-plane startup path with debug=True on a dedicated bootstrap run) to create the baseline tables.
  2. Run alembic upgrade head so the alembic_version table is stamped at the latest revision and any delta columns/constraints that aren’t in the SQLAlchemy model are applied. (For the three migrations above the columns and role are already in the SQLAlchemy models, so this is primarily to stamp the version table.)
  3. Start the control plane normally.

Known risk — migration-only production deploys would fail to bootstrap a fresh database. Because there is no alembic revision --autogenerate-style baseline migration covering the initial CREATE TABLE statements, running alembic upgrade head against an empty PostgreSQL database will fail: the first migration tries to ALTER TABLE instances ADD COLUMN pending_snapshot_id … before instances exists. Any production deploy path that relies on Alembic alone (for example, a future CI check that forbids metadata.create_all in production) must first introduce an explicit baseline migration that recreates the full schema from infrastructure.tables.metadata. This is tracked as part of ADR-149’s Tier-C.16 follow-ups.


See decision.log.md for consolidated open questions and architecture decision records.