API Specification: Admin and Ops Endpoints
Overview
REST API specification for administrative and operational endpoints in the Graph OLAP Platform Control Plane. Includes configuration management, cluster operations, and export queue management.
Note: Audit logging is handled by the external observability stack, not this API.
Prerequisites
- api.common.spec.md - Authentication, base URL, data formats, response patterns, error codes
- requirements.md - User roles (Admin, Ops)
- data.model.spec.md - Database schema for global_config, user_favorites
Role Requirements
| Endpoint Category | Required Role |
|---|---|
| Config (/config/*) | Ops |
| Cluster (/cluster/*) | Ops |
| Background Jobs (/ops/jobs/*) | Ops |
| System State (/ops/state, /ops/export-jobs) | Ops |
| Schema Metadata (/schema/*) | Any authenticated user |
| Schema Admin (/schema/admin/*, /schema/stats) | Admin |
| Bulk Operations (/admin/resources/*) | Admin or Ops |
| Export Jobs (scoped) (/api/export-jobs) | All authenticated (scoped) |
| E2E Cleanup (/admin/e2e-cleanup) | Admin or Ops |
Note: For Export Jobs (scoped), Analyst sees own jobs only; Admin/Ops see all jobs.
Note: Favorites moved to api.favorites.spec.md (all authenticated users).
See ../authorization.spec.md for the complete RBAC matrix.
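The role table mixes a broad pattern (/schema/*) with narrower ones (/schema/admin/*, /schema/stats), so whatever enforces it has to resolve the most specific match. The sketch below assumes longest-prefix matching, which the spec does not state; `ROLE_RULES` and `required_roles` are hypothetical names, and paths are written as they appear in the table.

```python
# Hypothetical rule list transcribed from the Role Requirements table.
# "any" stands for "any authenticated user".
ROLE_RULES = [
    ("/config/", {"ops"}),
    ("/cluster/", {"ops"}),
    ("/ops/jobs/", {"ops"}),
    ("/ops/state", {"ops"}),
    ("/ops/export-jobs", {"ops"}),
    ("/schema/admin/", {"admin"}),
    ("/schema/stats", {"admin"}),
    ("/schema/", {"any"}),
    ("/admin/resources/", {"admin", "ops"}),
    ("/admin/e2e-cleanup", {"admin", "ops"}),
    ("/api/export-jobs", {"any"}),
]


def required_roles(path):
    """Return the role set of the longest matching prefix, or None if no rule applies."""
    matches = [(prefix, roles) for prefix, roles in ROLE_RULES if path.startswith(prefix)]
    if not matches:
        return None
    # Assumption: the most specific (longest) prefix wins.
    return max(matches, key=lambda match: len(match[0]))[1]
```

With this ordering rule, /schema/stats resolves to the Admin rule even though it also matches the general /schema/ prefix.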
Configuration Endpoints (Ops Only)
Get Lifecycle Configuration
GET /config/lifecycle
Response: 200 OK
{ "data": { "mapping": { "default_ttl": null, "default_inactivity": "P30D", "max_ttl": "P365D" }, "snapshot": { "default_ttl": "P7D", "default_inactivity": "P3D", "max_ttl": "P30D" }, "instance": { "default_ttl": "PT24H", "default_inactivity": "PT4H", "max_ttl": "P7D" } }}
Update Lifecycle Configuration
PUT /config/lifecycle
Request Body:
{ "mapping": { "default_ttl": null, "default_inactivity": "P30D", "max_ttl": "P365D" }, "snapshot": { "default_ttl": "P7D", "default_inactivity": "P3D", "max_ttl": "P30D" }, "instance": { "default_ttl": "PT24H", "default_inactivity": "PT4H", "max_ttl": "P7D" }}
Response: 200 OK
{ "data": { "updated": true, "updated_at": "2025-01-15T10:30:00Z" }}
Get Concurrency Configuration
GET /config/concurrency
Response: 200 OK
{ "data": { "per_analyst": 5, "cluster_total": 50 }}
Update Concurrency Configuration
PUT /config/concurrency
Request Body:
{ "per_analyst": 10, "cluster_total": 100}
Response: 200 OK
{ "data": { "per_analyst": 10, "cluster_total": 100, "updated_at": "2025-01-15T10:30:00Z" }}
Get Maintenance Mode
GET /config/maintenance
Response: 200 OK
{ "data": { "enabled": false, "message": "", "updated_at": "2025-01-15T10:30:00Z", "updated_by": "ops_user" }}
Set Maintenance Mode
PUT /config/maintenance
Request:
{ "enabled": true, "message": "Scheduled maintenance until 14:00 UTC"}
Response: 200 OK - Returns updated status.
Get Export Configuration
GET /config/export
Response: 200 OK
{ "data": { "max_duration_seconds": 3600, "updated_at": "2025-01-15T10:30:00Z", "updated_by": "ops_user" }}
Update Export Configuration
PUT /config/export
Request:
{ "max_duration_seconds": 7200}
| Field | Type | Required | Description |
|---|---|---|---|
| max_duration_seconds | integer | No | Maximum time for a single export job before timeout (default: 3600). Jobs exceeding this are marked failed by reconciliation. |
Response: 200 OK
{ "data": { "max_duration_seconds": 7200, "updated_at": "2025-01-15T10:35:00Z", "updated_by": "ops_user" }}
Cluster Endpoints (Ops Only)
Get Cluster Health
GET /cluster/health
Response: 200 OK
{ "data": { "status": "healthy", "components": { "database": {"status": "connected", "latency_ms": 5}, "kubernetes": {"status": "connected"}, "starburst": {"status": "connected", "latency_ms": 120} }, "checked_at": "2025-01-15T10:30:00Z" }}
Response: 503 Service Unavailable
{ "data": { "status": "degraded", "components": { "database": {"status": "connected", "latency_ms": 5}, "kubernetes": {"status": "connected"}, "starburst": {"status": "unreachable", "error": "connection timeout"} }, "checked_at": "2025-01-15T10:30:00Z" }}
Get Cluster Instances Summary
GET /cluster/instances
Response: 200 OK
{ "data": { "total": 25, "by_status": { "starting": 2, "running": 20, "stopping": 1, "failed": 2 }, "by_owner": [ {"owner_username": "alice", "count": 5}, {"owner_username": "bob", "count": 3} ], "limits": { "per_analyst": 5, "cluster_total": 50, "cluster_used": 25, "cluster_available": 25 } }}
Background Jobs Endpoints (Ops Only)
Trigger Background Job
POST /api/ops/jobs/trigger
Manually triggers a background job for immediate execution. Useful for debugging, smoke tests after deployment, and incident response.
Rate Limiting: 1 request per minute per job (prevents accidental job spam).
Request Body:
{ "job_name": "reconciliation", "reason": "post-deployment smoke test"}
| Field | Type | Required | Description |
|---|---|---|---|
| job_name | string | Yes | Job to trigger: reconciliation, lifecycle, export_reconciliation, schema_cache, resource_monitor |
| reason | string | Yes | Reason for manual trigger (audit log, 1-500 chars) |
Response: 200 OK
{ "data": { "job_name": "reconciliation", "status": "queued", "triggered_at": "2025-01-15T10:30:00Z", "triggered_by": "ops.user", "reason": "post-deployment smoke test" }}
Error: 400 Bad Request - Invalid job name or missing reason
{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid job_name. Must be one of: reconciliation, lifecycle, export_reconciliation, schema_cache, resource_monitor" }}
Error: 429 Too Many Requests - Rate limit exceeded (1 per minute per job)
{ "error": { "code": "RATE_LIMIT_EXCEEDED", "message": "Job 'reconciliation' was triggered 30 seconds ago. Please wait 30 seconds before triggering again.", "details": { "retry_after_seconds": 30 } }}
Error: 403 Forbidden - Requires ops role
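The per-job limit and the 429 payload can be pictured as a simple cooldown keyed by job name. This is an illustrative in-memory sketch only: `JobRateLimiter` and its caller-supplied clock value are hypothetical, and the spec does not describe how the Control Plane actually tracks trigger times.

```python
import math


class JobRateLimiter:
    """Allow one accepted trigger per job per window (hypothetical helper)."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self._last_trigger = {}  # job_name -> timestamp of last accepted trigger

    def check(self, job_name, now):
        """Return None if the trigger is allowed, else a 429-style error payload."""
        last = self._last_trigger.get(job_name)
        if last is not None and now - last < self.window_seconds:
            retry_after = math.ceil(self.window_seconds - (now - last))
            return {
                "error": {
                    "code": "RATE_LIMIT_EXCEEDED",
                    "details": {"retry_after_seconds": retry_after},
                }
            }
        # Only accepted triggers reset the cooldown for this job.
        self._last_trigger[job_name] = now
        return None
```

Because the state is keyed by job name, triggering `lifecycle` is unaffected by a recent `reconciliation` trigger, which matches the "per job" wording above.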
Get Background Job Status
GET /api/ops/jobs/status
Returns status of all background jobs including health and last execution time.
Response: 200 OK
{ "data": { "jobs": [ { "name": "reconciliation", "schedule": "every 5 minutes", "last_success_at": "2025-01-15T10:25:00Z", "last_failure_at": null, "consecutive_failures": 0, "health_status": "healthy" }, { "name": "lifecycle", "schedule": "every 5 minutes", "last_success_at": "2025-01-15T10:24:00Z", "last_failure_at": null, "consecutive_failures": 0, "health_status": "healthy" }, { "name": "export_reconciliation", "schedule": "every 5 seconds", "last_success_at": "2025-01-15T10:23:00Z", "last_failure_at": "2025-01-15T10:18:00Z", "consecutive_failures": 0, "health_status": "healthy" }, { "name": "schema_cache", "schedule": "every 24 hours", "last_success_at": "2025-01-15T02:00:00Z", "last_failure_at": null, "consecutive_failures": 0, "health_status": "healthy" }, { "name": "resource_monitor", "schedule": "every 60 seconds", "last_success_at": "2025-01-15T10:29:00Z", "last_failure_at": null, "consecutive_failures": 0, "health_status": "healthy" } ], "retrieved_at": "2025-01-15T10:30:00Z" }}
Health Status Values:
- healthy - Job is executing successfully (consecutive_failures < 3)
- unhealthy - Job has failed 3+ times consecutively
Error: 403 Forbidden - Requires ops role
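The health rule reduces to a single threshold on `consecutive_failures`. A minimal sketch, assuming a client wants to recompute or filter on it from the status response; `health_status` and `unhealthy_jobs` are hypothetical helper names.

```python
UNHEALTHY_THRESHOLD = 3  # from the spec: 3+ consecutive failures means unhealthy


def health_status(consecutive_failures):
    """Map a consecutive failure count to the documented health value."""
    return "unhealthy" if consecutive_failures >= UNHEALTHY_THRESHOLD else "healthy"


def unhealthy_jobs(status_response):
    """Names of jobs to flag, given a parsed job status response body."""
    return [
        job["name"]
        for job in status_response["data"]["jobs"]
        if health_status(job["consecutive_failures"]) == "unhealthy"
    ]
```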
Get System State
GET /api/ops/state
Returns current system state snapshot including resource counts by status. Useful for operational dashboards and debugging.
Response: 200 OK
{ "data": { "instances": { "total": 25, "by_status": { "starting": 2, "running": 20, "stopping": 1, "stopped": 0, "failed": 2, "terminated": 0 }, "without_pod_name": 0 }, "snapshots": { "total": 150, "by_status": { "pending": 2, "creating": 1, "ready": 140, "failed": 7 } }, "export_jobs": { "by_status": { "pending": 5, "claimed": 2, "completed": 140, "failed": 3 } }, "retrieved_at": "2025-01-15T10:30:00Z" }}
Use Cases:
- Verify lifecycle job enforcement (instances should transition to terminated)
- Verify reconciliation job cleanup (instances.without_pod_name should be 0)
- Monitor export job queue depth
- E2E test assertions
Error: 403 Forbidden - Requires ops role
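The use cases above translate into a few checks over the parsed response. The sketch below is one way an E2E test or dashboard might do that; `check_state` and `export_queue_depth` are hypothetical helpers, and treating pending plus claimed as the "queue depth" is an assumption, not something the spec defines.

```python
def check_state(state):
    """Return a list of human-readable problems found in a system state body."""
    instances = state["data"]["instances"]
    problems = []
    # Reconciliation should leave no instance without a pod name.
    if instances["without_pod_name"] != 0:
        problems.append(f"{instances['without_pod_name']} instances lack a pod name")
    return problems


def export_queue_depth(state):
    """Jobs not yet finished: pending plus claimed (assumed definition)."""
    by_status = state["data"]["export_jobs"]["by_status"]
    return by_status.get("pending", 0) + by_status.get("claimed", 0)
```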
Get Export Jobs for Debugging
GET /api/ops/export-jobs
Returns export jobs for debugging export worker issues. Similar to /exports but optimized for ops troubleshooting.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| status | string | - | Filter: pending, claimed, completed, failed |
| limit | integer | 100 | Max records (max: 100) |
Response: 200 OK
{ "data": [ { "id": 123, "snapshot_id": "snapshot-uuid", "entity_type": "node", "entity_name": "Customer", "status": "pending", "attempts": 0, "created_at": "2025-01-15T10:30:00Z", "claimed_at": null, "completed_at": null, "failed_at": null, "error_message": null }, { "id": 122, "snapshot_id": "snapshot-uuid", "entity_type": "edge", "entity_name": "PURCHASED", "status": "failed", "attempts": 3, "created_at": "2025-01-15T10:25:00Z", "claimed_at": "2025-01-15T10:25:10Z", "completed_at": null, "failed_at": "2025-01-15T10:26:00Z", "error_message": "Table not found: analytics.purchases" } ]}
Error: 403 Forbidden - Requires ops role
Favorites
See api.favorites.spec.md for favorites endpoints (available to all authenticated users).
Admin Bulk Operations (Admin Only)
Bulk Delete Resources
DELETE /api/admin/resources/bulk
Safely deletes multiple resources matching filters. Designed for test cleanup and operational maintenance with comprehensive safety mechanisms.
Safety Features:
- Admin role required
- At least one filter required (prevents accidental “delete all”)
- Max 100 deletions per request
- Expected count validation (confirm you know what you’re deleting)
- Dry run mode (preview before deleting)
- Full audit logging (who, what, when, why)
- Per-resource error tracking (partial failures don’t block others)
- Filter validation (prevent overly broad matches)
- Authorization checks (admin can delete any, owner-only for non-admins)
Implementation Behavior (ADR-043):
For instances, bulk delete performs complete, synchronous resource cleanup:
- Deletes Kubernetes resources (pod, service, ingress) FIRST
- Deletes database record LAST
- Returns 200 OK when resources are GONE, not “eventually gone”
- Parallel execution (10 concurrent deletions) for performance (~3 seconds for 10 instances)
- No orphaned K8s resources left behind (unlike previous lazy cleanup pattern)
For snapshots and mappings, bulk delete performs simple database deletion (no Kubernetes resources to clean up).
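The ordering and concurrency described for instances can be sketched with `asyncio`. This is not the InstanceService.delete() implementation: `delete_k8s` and `delete_db_record` are hypothetical async callables supplied by the caller, and only the "Kubernetes first, database last, 10 at a time, failures isolated" shape is taken from the text above.

```python
import asyncio

MAX_CONCURRENCY = 10  # matches the "10 concurrent deletions" figure above


async def delete_instance(instance_id, delete_k8s, delete_db_record):
    """Delete Kubernetes resources first, then the database record."""
    await delete_k8s(instance_id)        # pod, service, ingress
    await delete_db_record(instance_id)  # only reached if K8s cleanup succeeded


async def bulk_delete(instance_ids, delete_k8s, delete_db_record):
    """Delete instances concurrently, recording per-resource failures."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    deleted, errors = [], []

    async def worker(instance_id):
        async with semaphore:
            try:
                await delete_instance(instance_id, delete_k8s, delete_db_record)
                deleted.append(instance_id)
            except Exception as exc:  # one failure must not block the others
                errors.append({"resource_id": instance_id, "error": str(exc)})

    await asyncio.gather(*(worker(i) for i in instance_ids))
    return {"deleted_ids": sorted(deleted), "errors": errors}
```

Deleting the database record last means a failed Kubernetes cleanup leaves the record in place, so the resource stays visible and can be retried rather than becoming an orphan.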
See Also:
- InstanceService.delete() - Deletion implementation
- ADR-043 - Architecture decision
- Background Jobs - Reconciliation job role change
Request Body:
{ "resource_type": "instance", "filters": { "name_prefix": "E2ETest-", "older_than_hours": 24, "status": "terminated", "created_by": "e2e-test-user" }, "reason": "cleanup old e2e test instances", "expected_count": 15, "dry_run": false}
| Field | Type | Required | Description |
|---|---|---|---|
| resource_type | string | Yes | Resource type: instance, snapshot, mapping |
| filters | object | Yes | Filters (at least one required) |
| filters.name_prefix | string | No | Match resources starting with prefix |
| filters.created_by | string | No | Match resources created by username |
| filters.older_than_hours | integer | No | Match resources older than N hours |
| filters.status | string | No | Match resources with specific status |
| reason | string | Yes | Reason for deletion (audit log, 1-500 chars) |
| expected_count | integer | No | Expected number of matches (safety check) |
| dry_run | boolean | No | If true, return matches without deleting (default: false) |
Recommended Workflow:
- Step 1: Dry run - Preview what would be deleted
- Step 2: Verify - Check matched_ids and matched_count
- Step 3: Delete - Use expected_count from dry run for safety
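The three steps can be expressed as two requests that share everything except `dry_run` and `expected_count`. In this sketch `send` is a hypothetical callable that submits a request body to the bulk delete endpoint and returns the parsed JSON response; the transport, authentication, and error handling are deliberately left out.

```python
def bulk_delete_with_preview(send, resource_type, filters, reason):
    """Dry run first, then delete with expected_count pinned to the preview."""
    base = {"resource_type": resource_type, "filters": filters, "reason": reason}

    # Step 1: preview what would be deleted.
    preview = send({**base, "dry_run": True})["data"]

    # Step 2: a real caller would inspect matched_ids here before continuing.
    if preview["matched_count"] == 0:
        return preview  # nothing to delete

    # Step 3: delete, asking the server to reject the call if the match set changed.
    body = {**base, "dry_run": False, "expected_count": preview["matched_count"]}
    return send(body)["data"]
```

If resources are created or removed between the two calls, the second request should fail with the expected count mismatch error rather than deleting an unreviewed set.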
Response: 200 OK (Dry Run)
{ "data": { "dry_run": true, "matched_count": 15, "matched_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], "deleted_count": 0, "deleted_ids": [], "failed_ids": [], "errors": [] }}
Response: 200 OK (Actual Delete)
{ "data": { "dry_run": false, "matched_count": 15, "matched_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], "deleted_count": 14, "deleted_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], "failed_ids": [15], "errors": [ { "resource_id": 15, "error": "Cannot delete instance with active pod" } ] }}
Error: 400 Bad Request - No filters provided
{ "error": { "code": "VALIDATION_ERROR", "message": "At least one filter is required to prevent accidental bulk deletion", "details": { "available_filters": ["name_prefix", "created_by", "older_than_hours", "status"] } }}
Error: 400 Bad Request - Too many matches (> 100)
{ "error": { "code": "VALIDATION_ERROR", "message": "Matched 250 resources. Bulk delete is limited to 100 resources per request. Use more specific filters.", "details": { "matched_count": 250, "max_allowed": 100 } }}
Error: 400 Bad Request - Expected count mismatch
{ "error": { "code": "VALIDATION_ERROR", "message": "Expected count mismatch. Found 15 resources, but expected 10. Data may have changed since dry run.", "details": { "expected_count": 10, "actual_count": 15 } }}
Error: 403 Forbidden - Requires admin role
{ "error": { "code": "FORBIDDEN", "message": "Bulk delete requires admin role" }}
Error: 422 Unprocessable Entity - Invalid resource type
{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid resource_type. Must be one of: instance, snapshot, mapping" }}
E2E Test Cleanup
DELETE /api/admin/e2e-cleanup
Deletes ALL resources owned by E2E test users. Called before and after E2E test runs to ensure clean state.
Requires: Admin or Ops role
Cleanup Order:
- Instances (including K8s wrapper pods)
- Snapshots (including GCS files)
- Mappings
- Force-terminate any orphaned K8s pods by owner-email label
Response: 200 OK
{ "data": { "instances_deleted": 5, "snapshots_deleted": 3, "mappings_deleted": 2, "pods_terminated": 1, "gcs_files_deleted": 15, "gcs_bytes_deleted": 1073741824, "errors": [], "success": true }}
Response: 200 OK (with partial failures)
{ "data": { "instances_deleted": 4, "snapshots_deleted": 3, "mappings_deleted": 2, "pods_terminated": 0, "gcs_files_deleted": 10, "gcs_bytes_deleted": 536870912, "errors": [ "Failed to delete instance 123: timeout waiting for pod termination" ], "success": false }}
Error: 403 Forbidden - Requires admin or ops role
{ "error": { "code": "FORBIDDEN", "message": "E2E cleanup requires admin or ops role" }}
Notes:
- E2E test users are configured via the E2E_TEST_USER_EMAILS environment variable
- This endpoint is idempotent - safe to call multiple times
- Errors are collected but don’t stop the cleanup process
- GCS cleanup requires configured GCS_BUCKET and GCP_PROJECT
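The "collect errors, keep going" behaviour and the `success` flag can be sketched as a loop over ordered steps. `run_cleanup` is a hypothetical helper: each step is a caller-supplied function returning a count, and a failed step is simply recorded as 0 here, which is coarser than the real endpoint's partial counts.

```python
def run_cleanup(steps):
    """Run (result_key, callable) steps in order, collecting errors instead of stopping."""
    result = {"errors": []}
    for key, step in steps:
        try:
            result[key] = step()
        except Exception as exc:
            # Record the failure and continue with the remaining steps.
            result[key] = 0
            result["errors"].append(f"{key}: {exc}")
    # success mirrors the response shape: true only when no errors were collected.
    result["success"] = not result["errors"]
    return result
```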
Schema Metadata API
The Schema Metadata API provides read-only access to cached Starburst schema metadata for the mapping builder UI. All data is served from an in-memory cache refreshed every 24 hours.
Performance: ~1μs for lookups, ~100μs for searches (in-memory)
List Catalogs
GET /api/schema/catalogs
Returns all cached Starburst catalogs.
Response: 200 OK
{ "data": [ { "catalog_name": "hive", "schema_count": 15, "cached_at": "2025-01-15T02:00:00Z" }, { "catalog_name": "iceberg", "schema_count": 8, "cached_at": "2025-01-15T02:00:00Z" } ]}
List Schemas
GET /api/schema/catalogs/:catalog/schemas
Returns all schemas in a catalog.
Response: 200 OK
{ "data": [ { "catalog_name": "hive", "schema_name": "analytics", "table_count": 45, "cached_at": "2025-01-15T02:00:00Z" }, { "catalog_name": "hive", "schema_name": "raw_data", "table_count": 120, "cached_at": "2025-01-15T02:00:00Z" } ]}
Error: 404 Not Found - Catalog not found in cache
List Tables
GET /api/schema/catalogs/:catalog/schemas/:schema/tables
Returns all tables in a schema.
Response: 200 OK
{ "data": [ { "catalog_name": "hive", "schema_name": "analytics", "table_name": "customers", "table_type": "TABLE", "column_count": 12, "cached_at": "2025-01-15T02:00:00Z" }, { "catalog_name": "hive", "schema_name": "analytics", "table_name": "orders", "table_type": "TABLE", "column_count": 8, "cached_at": "2025-01-15T02:00:00Z" } ]}
Error: 404 Not Found - Schema not found in cache
List Columns
GET /api/schema/catalogs/:catalog/schemas/:schema/tables/:table/columns
Returns all columns for a table.
Response: 200 OK
{ "data": [ { "catalog_name": "hive", "schema_name": "analytics", "table_name": "customers", "column_name": "customer_id", "data_type": "varchar", "is_nullable": false, "ordinal_position": 1, "column_default": null, "cached_at": "2025-01-15T02:00:00Z" }, { "catalog_name": "hive", "schema_name": "analytics", "table_name": "customers", "column_name": "name", "data_type": "varchar", "is_nullable": true, "ordinal_position": 2, "column_default": null, "cached_at": "2025-01-15T02:00:00Z" } ]}
Error: 404 Not Found - Table not found in cache
Search Tables
GET /api/schema/search/tables
Search tables by name pattern (prefix match, case-insensitive).
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | - | Required. Search pattern (prefix match) |
| limit | integer | 100 | Max results (max: 1000) |
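The matching rule and the `limit` parameter can be illustrated over a plain list of names. `search_tables` is a hypothetical helper, not the cache's real index, and clamping an out-of-range `limit` (rather than rejecting it) is an assumption the spec does not confirm.

```python
DEFAULT_LIMIT = 100
MAX_LIMIT = 1000


def search_tables(table_names, q, limit=DEFAULT_LIMIT):
    """Case-insensitive prefix match over an in-memory list of table names."""
    if not q:
        raise ValueError("q is required")
    # Assumption: out-of-range limits are clamped to [1, MAX_LIMIT].
    limit = max(1, min(limit, MAX_LIMIT))
    needle = q.lower()
    matches = [name for name in table_names if name.lower().startswith(needle)]
    return matches[:limit]
```

A query of `cust` would match both `customers` and `customer_orders`, as in the sample response, but not a table such as `vip_customers`, because only the start of the name is compared.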
Response: 200 OK
{ "data": [ { "catalog_name": "hive", "schema_name": "analytics", "table_name": "customers", "table_type": "TABLE", "column_count": 12, "cached_at": "2025-01-15T02:00:00Z" }, { "catalog_name": "hive", "schema_name": "analytics", "table_name": "customer_orders", "table_type": "TABLE", "column_count": 8, "cached_at": "2025-01-15T02:00:00Z" } ]}
Search Columns
GET /api/schema/search/columns
Search columns by name pattern (prefix match, case-insensitive).
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | - | Required. Search pattern (prefix match) |
| limit | integer | 100 | Max results (max: 1000) |
Response: 200 OK
{ "data": [ { "catalog_name": "hive", "schema_name": "analytics", "table_name": "customers", "column_name": "email", "data_type": "varchar", "is_nullable": true, "ordinal_position": 3, "column_default": null, "cached_at": "2025-01-15T02:00:00Z" }, { "catalog_name": "hive", "schema_name": "analytics", "table_name": "users", "column_name": "email_address", "data_type": "varchar", "is_nullable": false, "ordinal_position": 2, "column_default": null, "cached_at": "2025-01-15T02:00:00Z" } ]}
Trigger Cache Refresh (Admin Only)
POST /api/schema/admin/refresh
Manually triggers a schema cache refresh. Starts a background task and returns immediately.
Requires: Admin role
Response: 200 OK
{ "data": { "status": "refresh triggered" }}
Error: 403 Forbidden - Requires admin role
Get Cache Statistics (Admin Only)
GET /api/schema/stats
Returns schema cache statistics.
Requires: Admin role
Response: 200 OK
{ "data": { "total_catalogs": 3, "total_schemas": 25, "total_tables": 450, "total_columns": 3200, "last_refresh": "2025-01-15T02:00:00Z", "index_size_bytes": 1048576 }}
Error: 403 Forbidden - Requires admin role
Error Codes
See api.common.spec.md for the complete error codes reference.