API Specification: Admin and Ops Endpoints

REST API specification for administrative and operational endpoints in the Graph OLAP Platform Control Plane. Includes configuration management, cluster operations, and export queue management.

Note: Audit logging is handled by the external observability stack, not this API.

| Endpoint Category | Required Role |
|---|---|
| Config (/config/*) | Ops |
| Cluster (/cluster/*) | Ops |
| Background Jobs (/ops/jobs/*) | Ops |
| System State (/ops/state, /ops/export-jobs) | Ops |
| Schema Metadata (/schema/*) | Any authenticated user |
| Schema Admin (/schema/admin/*, /schema/stats) | Admin |
| Bulk Operations (/admin/resources/*) | Admin or Ops |
| Export Jobs (scoped) (/api/export-jobs) | All authenticated (scoped) |
| E2E Cleanup (/admin/e2e-cleanup) | Admin or Ops |

Note: For Export Jobs (scoped), Analysts see only their own jobs; Admin and Ops see all jobs.

Note: Favorites moved to api.favorites.spec.md (all authenticated users).

See ../authorization.spec.md for the complete RBAC matrix.


GET /config/lifecycle

Response: 200 OK

{
"data": {
"mapping": {
"default_ttl": null,
"default_inactivity": "P30D",
"max_ttl": "P365D"
},
"snapshot": {
"default_ttl": "P7D",
"default_inactivity": "P3D",
"max_ttl": "P30D"
},
"instance": {
"default_ttl": "PT24H",
"default_inactivity": "PT4H",
"max_ttl": "P7D"
}
}
}

PUT /config/lifecycle

Request Body:

{
"mapping": {
"default_ttl": null,
"default_inactivity": "P30D",
"max_ttl": "P365D"
},
"snapshot": {
"default_ttl": "P7D",
"default_inactivity": "P3D",
"max_ttl": "P30D"
},
"instance": {
"default_ttl": "PT24H",
"default_inactivity": "PT4H",
"max_ttl": "P7D"
}
}

Response: 200 OK

{
"data": {
"updated": true,
"updated_at": "2025-01-15T10:30:00Z"
}
}
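
The lifecycle values above are ISO 8601 duration strings (for example P30D or PT24H), with null meaning no default. Before sending a PUT, a client might want to sanity-check that no default exceeds its max_ttl. The sketch below is one way to do that in Python; the helper names are hypothetical, the parser covers only the day/hour/minute/second subset that appears in the examples, and the "default must not exceed max_ttl" rule is an assumption rather than documented server behavior.

```python
import re
from datetime import timedelta

# Covers only the day/hour/minute/second subset seen in the examples
# (P30D, PT24H). Years, months, and weeks are rejected on purpose.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(value):
    """Convert an ISO 8601 duration string to a timedelta; None passes through."""
    if value is None:
        return None
    match = _DURATION.match(value)
    parts = {}
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**parts)


def check_lifecycle(config):
    """List fields whose default exceeds max_ttl (assumed rule, not from the spec)."""
    problems = []
    for resource, settings in config.items():
        max_ttl = parse_duration(settings.get("max_ttl"))
        if max_ttl is None:
            continue
        for field in ("default_ttl", "default_inactivity"):
            value = parse_duration(settings.get(field))
            if value is not None and value > max_ttl:
                problems.append(f"{resource}.{field} exceeds max_ttl")
    return problems
```

Calling check_lifecycle with the request body shown above returns an empty list, since every default is within its max_ttl.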

GET /config/concurrency

Response: 200 OK

{
"data": {
"per_analyst": 5,
"cluster_total": 50
}
}

PUT /config/concurrency

Request Body:

{
"per_analyst": 10,
"cluster_total": 100
}

Response: 200 OK

{
"data": {
"per_analyst": 10,
"cluster_total": 100,
"updated_at": "2025-01-15T10:30:00Z"
}
}

GET /config/maintenance

Response: 200 OK

{
"data": {
"enabled": false,
"message": "",
"updated_at": "2025-01-15T10:30:00Z",
"updated_by": "ops_user"
}
}

PUT /config/maintenance

Request:

{
"enabled": true,
"message": "Scheduled maintenance until 14:00 UTC"
}

Response: 200 OK - Returns updated status.


GET /config/export

Response: 200 OK

{
"data": {
"max_duration_seconds": 3600,
"updated_at": "2025-01-15T10:30:00Z",
"updated_by": "ops_user"
}
}

PUT /config/export

Request:

{
"max_duration_seconds": 7200
}

| Field | Type | Required | Description |
|---|---|---|---|
| max_duration_seconds | integer | No | Maximum time for a single export job before timeout (default: 3600). Jobs exceeding this are marked failed by reconciliation. |

Response: 200 OK

{
"data": {
"max_duration_seconds": 7200,
"updated_at": "2025-01-15T10:35:00Z",
"updated_by": "ops_user"
}
}

GET /cluster/health

Response: 200 OK

{
"data": {
"status": "healthy",
"components": {
"database": {"status": "connected", "latency_ms": 5},
"kubernetes": {"status": "connected"},
"starburst": {"status": "connected", "latency_ms": 120}
},
"checked_at": "2025-01-15T10:30:00Z"
}
}

Response: 503 Service Unavailable

{
"data": {
"status": "degraded",
"components": {
"database": {"status": "connected", "latency_ms": 5},
"kubernetes": {"status": "connected"},
"starburst": {"status": "unreachable", "error": "connection timeout"}
},
"checked_at": "2025-01-15T10:30:00Z"
}
}

GET /cluster/instances

Response: 200 OK

{
"data": {
"total": 25,
"by_status": {
"starting": 2,
"running": 20,
"stopping": 1,
"failed": 2
},
"by_owner": [
{"owner_username": "alice", "count": 5},
{"owner_username": "bob", "count": 3}
],
"limits": {
"per_analyst": 5,
"cluster_total": 50,
"cluster_used": 25,
"cluster_available": 25
}
}
}
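
A dashboard or pre-flight check could use the limits block to estimate whether a given analyst still has room to start an instance. The following Python sketch reads only fields shown in the response above; the function name is hypothetical, and the assumption that every instance in by_owner counts against per_analyst (regardless of status) is not confirmed by this spec.

```python
def can_start_instance(summary, owner_username):
    """Estimate whether an owner is under both the per-analyst and cluster limits.

    `summary` is the "data" object of GET /cluster/instances. How the server
    actually enforces the limits is an assumption here.
    """
    limits = summary["limits"]
    owned = next(
        (entry["count"] for entry in summary["by_owner"]
         if entry["owner_username"] == owner_username),
        0,  # owners absent from by_owner are treated as having no instances
    )
    under_analyst_limit = owned < limits["per_analyst"]
    under_cluster_limit = limits["cluster_used"] < limits["cluster_total"]
    return under_analyst_limit and under_cluster_limit
```

With the example response, alice (5 of 5) would be at her limit while bob (3 of 5) would not.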

POST /api/ops/jobs/trigger

Manually triggers a background job for immediate execution. Useful for debugging, smoke tests after deployment, and incident response.

Rate Limiting: 1 request per minute per job (prevents accidental job spam).

Request Body:

{
"job_name": "reconciliation",
"reason": "post-deployment smoke test"
}

| Field | Type | Required | Description |
|---|---|---|---|
| job_name | string | Yes | Job to trigger: reconciliation, lifecycle, export_reconciliation, schema_cache, resource_monitor |
| reason | string | Yes | Reason for manual trigger (audit log, 1-500 chars) |

Response: 200 OK

{
"data": {
"job_name": "reconciliation",
"status": "queued",
"triggered_at": "2025-01-15T10:30:00Z",
"triggered_by": "ops.user",
"reason": "post-deployment smoke test"
}
}

Error: 400 Bad Request - Invalid job name or missing reason

{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid job_name. Must be one of: reconciliation, lifecycle, export_reconciliation, schema_cache, resource_monitor"
}
}

Error: 429 Too Many Requests - Rate limit exceeded (1 per minute per job)

{
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "Job 'reconciliation' was triggered 30 seconds ago. Please wait 30 seconds before triggering again.",
"details": {
"retry_after_seconds": 30
}
}
}

Error: 403 Forbidden - Requires ops role
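
A client can catch the 400 cases locally and honor the 429 back-off hint before retrying. The Python sketch below mirrors the validation rules and error shape documented above; the function names are hypothetical and no HTTP library is assumed, so the caller supplies the status code and parsed JSON body.

```python
VALID_JOBS = {
    "reconciliation",
    "lifecycle",
    "export_reconciliation",
    "schema_cache",
    "resource_monitor",
}


def build_trigger_request(job_name, reason):
    """Build the POST body, rejecting input the API would answer with 400."""
    if job_name not in VALID_JOBS:
        raise ValueError(f"invalid job_name: {job_name!r}")
    if not 1 <= len(reason) <= 500:
        raise ValueError("reason must be 1-500 characters")
    return {"job_name": job_name, "reason": reason}


def retry_delay_seconds(status_code, body):
    """Return the wait time suggested by a 429 response, or None otherwise."""
    if status_code != 429:
        return None
    return body["error"]["details"]["retry_after_seconds"]
```

Keeping the check client-side avoids spending the one-per-minute allowance on a request that would fail validation anyway.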


GET /api/ops/jobs/status

Returns status of all background jobs including health and last execution time.

Response: 200 OK

{
"data": {
"jobs": [
{
"name": "reconciliation",
"schedule": "every 5 minutes",
"last_success_at": "2025-01-15T10:25:00Z",
"last_failure_at": null,
"consecutive_failures": 0,
"health_status": "healthy"
},
{
"name": "lifecycle",
"schedule": "every 5 minutes",
"last_success_at": "2025-01-15T10:24:00Z",
"last_failure_at": null,
"consecutive_failures": 0,
"health_status": "healthy"
},
{
"name": "export_reconciliation",
"schedule": "every 5 seconds",
"last_success_at": "2025-01-15T10:23:00Z",
"last_failure_at": "2025-01-15T10:18:00Z",
"consecutive_failures": 0,
"health_status": "healthy"
},
{
"name": "schema_cache",
"schedule": "every 24 hours",
"last_success_at": "2025-01-15T02:00:00Z",
"last_failure_at": null,
"consecutive_failures": 0,
"health_status": "healthy"
},
{
"name": "resource_monitor",
"schedule": "every 60 seconds",
"last_success_at": "2025-01-15T10:29:00Z",
"last_failure_at": null,
"consecutive_failures": 0,
"health_status": "healthy"
}
],
"retrieved_at": "2025-01-15T10:30:00Z"
}
}

Health Status Values:

  • healthy - Job is executing successfully (consecutive_failures < 3)
  • unhealthy - Job has failed 3+ times consecutively
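
The threshold above can be expressed directly, which is handy for alerting scripts that consume the status response. This is a minimal Python sketch with hypothetical function names; it assumes health_status is derived from consecutive_failures alone, as the two bullet points describe.

```python
UNHEALTHY_THRESHOLD = 3  # from the rule above: 3+ consecutive failures


def health_status(consecutive_failures):
    """Map a consecutive failure count to the documented health value."""
    return "healthy" if consecutive_failures < UNHEALTHY_THRESHOLD else "unhealthy"


def unhealthy_jobs(status_data):
    """Return names of jobs reported unhealthy in the "data" object of the response."""
    return [job["name"] for job in status_data["jobs"]
            if job["health_status"] == "unhealthy"]
```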

Error: 403 Forbidden - Requires ops role


GET /api/ops/state

Returns current system state snapshot including resource counts by status. Useful for operational dashboards and debugging.

Response: 200 OK

{
"data": {
"instances": {
"total": 25,
"by_status": {
"starting": 2,
"running": 20,
"stopping": 1,
"stopped": 0,
"failed": 2,
"terminated": 0
},
"without_pod_name": 0
},
"snapshots": {
"total": 150,
"by_status": {
"pending": 2,
"creating": 1,
"ready": 140,
"failed": 7
}
},
"export_jobs": {
"by_status": {
"pending": 5,
"claimed": 2,
"completed": 140,
"failed": 3
}
},
"retrieved_at": "2025-01-15T10:30:00Z"
}
}

Use Cases:

  • Verify lifecycle job enforcement (instances should transition to terminated)
  • Verify reconciliation job cleanup (instances.without_pod_name should be 0)
  • Monitor export job queue depth
  • E2E test assertions
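
The use cases above translate into a few simple checks on the state snapshot. The Python sketch below uses hypothetical helper names and two assumptions that the spec does not state: that export queue depth means pending plus claimed jobs, and that the per-status counts should sum to total.

```python
def export_queue_depth(state):
    """Count export jobs not yet finished (assumed to be pending + claimed)."""
    by_status = state["export_jobs"]["by_status"]
    return by_status.get("pending", 0) + by_status.get("claimed", 0)


def state_problems(state):
    """Return human-readable problems found in the "data" object of /api/ops/state."""
    problems = []
    orphans = state["instances"]["without_pod_name"]
    if orphans != 0:
        problems.append(f"{orphans} instance(s) without pod name")
    for kind in ("instances", "snapshots"):
        counted = sum(state[kind]["by_status"].values())
        if counted != state[kind]["total"]:
            problems.append(f"{kind}: by_status sums to {counted}, total is {state[kind]['total']}")
    return problems
```

An E2E test could assert that state_problems returns an empty list after cleanup and that export_queue_depth stays below a chosen threshold.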

Error: 403 Forbidden - Requires ops role


GET /api/ops/export-jobs

Returns export jobs for debugging export worker issues. Similar to /exports but optimized for ops troubleshooting.

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| status | string | - | Filter: pending, claimed, completed, failed |
| limit | integer | 100 | Max records (max: 100) |

Response: 200 OK

{
"data": [
{
"id": 123,
"snapshot_id": "snapshot-uuid",
"entity_type": "node",
"entity_name": "Customer",
"status": "pending",
"attempts": 0,
"created_at": "2025-01-15T10:30:00Z",
"claimed_at": null,
"completed_at": null,
"failed_at": null,
"error_message": null
},
{
"id": 122,
"snapshot_id": "snapshot-uuid",
"entity_type": "edge",
"entity_name": "PURCHASED",
"status": "failed",
"attempts": 3,
"created_at": "2025-01-15T10:25:00Z",
"claimed_at": "2025-01-15T10:25:10Z",
"completed_at": null,
"failed_at": "2025-01-15T10:26:00Z",
"error_message": "Table not found: analytics.purchases"
}
]
}

Error: 403 Forbidden - Requires ops role


See api.favorites.spec.md for favorites endpoints (available to all authenticated users).


DELETE /api/admin/resources/bulk

Safely deletes multiple resources matching filters. Designed for test cleanup and operational maintenance with comprehensive safety mechanisms.

Safety Features:

  1. Admin role required
  2. At least one filter required (prevents accidental “delete all”)
  3. Max 100 deletions per request
  4. Expected count validation (confirm you know what you’re deleting)
  5. Dry run mode (preview before deleting)
  6. Full audit logging (who, what, when, why)
  7. Per-resource error tracking (partial failures don’t block others)
  8. Filter validation (prevent overly broad matches)
  9. Authorization checks (admin can delete any, owner-only for non-admins)

Implementation Behavior (ADR-043):

For instances, bulk delete performs complete, synchronous resource cleanup:

  • Deletes Kubernetes resources (pod, service, ingress) FIRST
  • Deletes database record LAST
  • Returns 200 OK when resources are GONE, not “eventually gone”
  • Parallel execution (10 concurrent deletions) for performance (~3 seconds for 10 instances)
  • No orphaned K8s resources left behind (unlike previous lazy cleanup pattern)

For snapshots and mappings, bulk delete performs simple database deletion (no Kubernetes resources to clean up).

Request Body:

{
"resource_type": "instance",
"filters": {
"name_prefix": "E2ETest-",
"older_than_hours": 24,
"status": "terminated",
"created_by": "e2e-test-user"
},
"reason": "cleanup old e2e test instances",
"expected_count": 15,
"dry_run": false
}

| Field | Type | Required | Description |
|---|---|---|---|
| resource_type | string | Yes | Resource type: instance, snapshot, mapping |
| filters | object | Yes | Filters (at least one required) |
| filters.name_prefix | string | No | Match resources starting with prefix |
| filters.created_by | string | No | Match resources created by username |
| filters.older_than_hours | integer | No | Match resources older than N hours |
| filters.status | string | No | Match resources with specific status |
| reason | string | Yes | Reason for deletion (audit log, 1-500 chars) |
| expected_count | integer | No | Expected number of matches (safety check) |
| dry_run | boolean | No | If true, return matches without deleting (default: false) |

Recommended Workflow:

  1. Dry run - Preview what would be deleted
  2. Verify - Check matched_ids and matched_count
  3. Delete - Use expected_count from dry run for safety

Response: 200 OK (Dry Run)

{
"data": {
"dry_run": true,
"matched_count": 15,
"matched_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
"deleted_count": 0,
"deleted_ids": [],
"failed_ids": [],
"errors": []
}
}

Response: 200 OK (Actual Delete)

{
"data": {
"dry_run": false,
"matched_count": 15,
"matched_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
"deleted_count": 14,
"deleted_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
"failed_ids": [15],
"errors": [
{
"resource_id": 15,
"error": "Cannot delete instance with active pod"
}
]
}
}
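
The recommended workflow (dry run, verify, delete) can be wrapped in a single helper so that expected_count always comes from a fresh preview. The Python sketch below is illustrative only: bulk_delete and the send callable are hypothetical, with send standing in for whatever HTTP client issues DELETE /api/admin/resources/bulk and returns the parsed JSON response.

```python
MAX_BULK_DELETE = 100  # documented per-request limit


def bulk_delete(send, resource_type, filters, reason):
    """Run a dry run first, then delete with expected_count taken from the preview.

    `send` is a caller-supplied function (hypothetical) that takes the request
    body as a dict and returns the parsed JSON response as a dict.
    """
    if not filters:
        raise ValueError("at least one filter is required")
    base = {"resource_type": resource_type, "filters": filters, "reason": reason}

    # Steps 1 and 2: preview and verify what would be deleted.
    preview = send({**base, "dry_run": True})["data"]
    matched = preview["matched_count"]
    if matched == 0:
        return preview  # nothing to delete
    if matched > MAX_BULK_DELETE:
        # Local guard; the server may already reject such a request with 400.
        raise ValueError(f"matched {matched} resources; narrow the filters")

    # Step 3: delete, letting the server reject the call if the data changed.
    return send({**base, "expected_count": matched, "dry_run": False})["data"]
```

Callers should still inspect failed_ids and errors in the returned object, since per-resource failures do not fail the request as a whole.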

Error: 400 Bad Request - No filters provided

{
"error": {
"code": "VALIDATION_ERROR",
"message": "At least one filter is required to prevent accidental bulk deletion",
"details": {
"available_filters": ["name_prefix", "created_by", "older_than_hours", "status"]
}
}
}

Error: 400 Bad Request - Too many matches (> 100)

{
"error": {
"code": "VALIDATION_ERROR",
"message": "Matched 250 resources. Bulk delete is limited to 100 resources per request. Use more specific filters.",
"details": {
"matched_count": 250,
"max_allowed": 100
}
}
}

Error: 400 Bad Request - Expected count mismatch

{
"error": {
"code": "VALIDATION_ERROR",
"message": "Expected count mismatch. Found 15 resources, but expected 10. Data may have changed since dry run.",
"details": {
"expected_count": 10,
"actual_count": 15
}
}
}

Error: 403 Forbidden - Requires admin role

{
"error": {
"code": "FORBIDDEN",
"message": "Bulk delete requires admin role"
}
}

Error: 422 Unprocessable Entity - Invalid resource type

{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid resource_type. Must be one of: instance, snapshot, mapping"
}
}

DELETE /api/admin/e2e-cleanup

Deletes ALL resources owned by E2E test users. Called before and after E2E test runs to ensure clean state.

Requires: Admin or Ops role

Cleanup Order:

  1. Instances (including K8s wrapper pods)
  2. Snapshots (including GCS files)
  3. Mappings
  4. Force-terminate any orphaned K8s pods by owner-email label

Response: 200 OK

{
"data": {
"users_processed": ["[email protected]"],
"instances_deleted": 5,
"snapshots_deleted": 3,
"mappings_deleted": 2,
"pods_terminated": 1,
"gcs_files_deleted": 15,
"gcs_bytes_deleted": 1073741824,
"errors": [],
"success": true
}
}

Response: 200 OK (with partial failures)

{
"data": {
"users_processed": ["[email protected]"],
"instances_deleted": 4,
"snapshots_deleted": 3,
"mappings_deleted": 2,
"pods_terminated": 0,
"gcs_files_deleted": 10,
"gcs_bytes_deleted": 536870912,
"errors": [
"Failed to delete instance 123: timeout waiting for pod termination"
],
"success": false
}
}

Error: 403 Forbidden - Requires admin or ops role

{
"error": {
"code": "FORBIDDEN",
"message": "E2E cleanup requires admin or ops role"
}
}

Notes:

  • E2E test users are configured via E2E_TEST_USER_EMAILS environment variable
  • This endpoint is idempotent - safe to call multiple times
  • Errors are collected but don’t stop the cleanup process
  • GCS cleanup requires configured GCS_BUCKET and GCP_PROJECT

The Schema Metadata API provides read-only access to cached Starburst schema metadata for the mapping builder UI. All data is served from an in-memory cache refreshed every 24 hours.

Performance: ~1μs for lookups, ~100μs for searches (in-memory)

GET /api/schema/catalogs

Returns all cached Starburst catalogs.

Response: 200 OK

{
"data": [
{
"catalog_name": "hive",
"schema_count": 15,
"cached_at": "2025-01-15T02:00:00Z"
},
{
"catalog_name": "iceberg",
"schema_count": 8,
"cached_at": "2025-01-15T02:00:00Z"
}
]
}

GET /api/schema/catalogs/:catalog/schemas

Returns all schemas in a catalog.

Response: 200 OK

{
"data": [
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_count": 45,
"cached_at": "2025-01-15T02:00:00Z"
},
{
"catalog_name": "hive",
"schema_name": "raw_data",
"table_count": 120,
"cached_at": "2025-01-15T02:00:00Z"
}
]
}

Error: 404 Not Found - Catalog not found in cache


GET /api/schema/catalogs/:catalog/schemas/:schema/tables

Returns all tables in a schema.

Response: 200 OK

{
"data": [
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "customers",
"table_type": "TABLE",
"column_count": 12,
"cached_at": "2025-01-15T02:00:00Z"
},
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "orders",
"table_type": "TABLE",
"column_count": 8,
"cached_at": "2025-01-15T02:00:00Z"
}
]
}

Error: 404 Not Found - Schema not found in cache


GET /api/schema/catalogs/:catalog/schemas/:schema/tables/:table/columns

Returns all columns for a table.

Response: 200 OK

{
"data": [
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "customers",
"column_name": "customer_id",
"data_type": "varchar",
"is_nullable": false,
"ordinal_position": 1,
"column_default": null,
"cached_at": "2025-01-15T02:00:00Z"
},
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "customers",
"column_name": "name",
"data_type": "varchar",
"is_nullable": true,
"ordinal_position": 2,
"column_default": null,
"cached_at": "2025-01-15T02:00:00Z"
}
]
}

Error: 404 Not Found - Table not found in cache


GET /api/schema/search/tables

Search tables by name pattern (prefix match, case-insensitive).

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | - | Required. Search pattern (prefix match) |
| limit | integer | 100 | Max results (max: 1000) |

Response: 200 OK

{
"data": [
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "customers",
"table_type": "TABLE",
"column_count": 12,
"cached_at": "2025-01-15T02:00:00Z"
},
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "customer_orders",
"table_type": "TABLE",
"column_count": 8,
"cached_at": "2025-01-15T02:00:00Z"
}
]
}
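
To illustrate the matching semantics (case-insensitive prefix match with a capped limit), here is a small Python sketch over a sorted in-memory list. It is not the platform's implementation: the spec does not say how the cache is indexed, the function names are hypothetical, and clamping an oversized limit to 1000 (rather than rejecting it) is an assumption.

```python
import bisect

MAX_LIMIT = 1000  # documented maximum for the limit parameter


def build_index(names):
    """Sort (lowercased, original) pairs so names sharing a prefix are contiguous."""
    return sorted((name.lower(), name) for name in names)


def search_prefix(index, q, limit=100):
    """Return up to `limit` original names whose lowercase form starts with q."""
    if not q:
        raise ValueError("q is required")
    limit = max(1, min(limit, MAX_LIMIT))
    prefix = q.lower()
    # A 1-tuple sorts before any 2-tuple with the same first element, so this
    # finds the first entry whose lowercased name is >= prefix.
    start = bisect.bisect_left(index, (prefix,))
    results = []
    for position in range(start, len(index)):
        lowered, original = index[position]
        if not lowered.startswith(prefix) or len(results) >= limit:
            break
        results.append(original)
    return results
```

Because matches form one contiguous run in the sorted list, a lookup costs a binary search plus the number of results returned.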

GET /api/schema/search/columns

Search columns by name pattern (prefix match, case-insensitive).

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | - | Required. Search pattern (prefix match) |
| limit | integer | 100 | Max results (max: 1000) |

Response: 200 OK

{
"data": [
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "customers",
"column_name": "email",
"data_type": "varchar",
"is_nullable": true,
"ordinal_position": 3,
"column_default": null,
"cached_at": "2025-01-15T02:00:00Z"
},
{
"catalog_name": "hive",
"schema_name": "analytics",
"table_name": "users",
"column_name": "email_address",
"data_type": "varchar",
"is_nullable": false,
"ordinal_position": 2,
"column_default": null,
"cached_at": "2025-01-15T02:00:00Z"
}
]
}

POST /api/schema/admin/refresh

Manually triggers a schema cache refresh. Starts a background task and returns immediately.

Requires: Admin role

Response: 200 OK

{
"data": {
"status": "refresh triggered"
}
}

Error: 403 Forbidden - Requires admin role


GET /api/schema/stats

Returns schema cache statistics.

Requires: Admin role

Response: 200 OK

{
"data": {
"total_catalogs": 3,
"total_schemas": 25,
"total_tables": 450,
"total_columns": 3200,
"last_refresh": "2025-01-15T02:00:00Z",
"index_size_bytes": 1048576
}
}

Error: 403 Forbidden - Requires admin role


See api.common.spec.md for the complete error codes reference.