Observability Rail

Platform monitoring, SLO tracking, and distributed tracing.

The Observability Rail provides endpoints for monitoring platform health, SLO tracking, distributed tracing, and metrics collection.

Base path: /api/v1/observability

GET /api/v1/observability/health

Get platform health status.

Response:

{
  "data": {
    "status": "HEALTHY",
    "timestamp": "2025-01-15T10:00:00Z",
    "components": {
      "api": { "status": "HEALTHY", "latency": 45 },
      "database": { "status": "HEALTHY", "latency": 12 },
      "cache": { "status": "HEALTHY", "latency": 2 },
      "queue": { "status": "HEALTHY", "latency": 5 }
    },
    "version": "2.0.0"
  }
}
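
A caller will usually want to reduce this response to a short answer: which components are not healthy, and which one is slowest. The following is a minimal sketch that works on the sample payload above without any network call. The helper names `unhealthy_components` and `slowest_component` are illustrative and not part of the API, and the sketch assumes `latency` is comparable across components (the unit is not stated in the response).

```python
import json

# Sample payload copied from the health response above.
SAMPLE = json.loads("""
{
  "data": {
    "status": "HEALTHY",
    "timestamp": "2025-01-15T10:00:00Z",
    "components": {
      "api": { "status": "HEALTHY", "latency": 45 },
      "database": { "status": "HEALTHY", "latency": 12 },
      "cache": { "status": "HEALTHY", "latency": 2 },
      "queue": { "status": "HEALTHY", "latency": 5 }
    },
    "version": "2.0.0"
  }
}
""")


def unhealthy_components(payload: dict) -> list[str]:
    """Return the names of components whose status is not HEALTHY."""
    components = payload["data"]["components"]
    return sorted(
        name for name, info in components.items() if info["status"] != "HEALTHY"
    )


def slowest_component(payload: dict) -> str:
    """Return the name of the component with the highest reported latency."""
    components = payload["data"]["components"]
    return max(components, key=lambda name: components[name]["latency"])


if __name__ == "__main__":
    print(unhealthy_components(SAMPLE))
    print(slowest_component(SAMPLE))
```

For the sample payload, the first call returns an empty list and the second returns `api`.
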
GET /api/v1/observability/slos

Get Service Level Objective status.

Response:

{
  "data": [
    {
      "sloId": "api_availability",
      "name": "API Availability",
      "target": 0.999,
      "current": 0.9995,
      "status": "MET",
      "period": "30d",
      "errorBudgetRemaining": 0.0005
    },
    {
      "sloId": "api_latency_p99",
      "name": "API Latency P99",
      "target": 500,
      "current": 245,
      "unit": "ms",
      "status": "MET",
      "period": "30d"
    }
  ]
}
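
The `errorBudgetRemaining` value in the availability SLO is consistent with the usual error-budget arithmetic: the budget is `1 - target`, the consumed part is `1 - current`, and the remainder is their difference (equivalently `current - target`). The sketch below reproduces that calculation. Treating the field this way is an assumption based on the sample numbers, not a documented formula, and `error_budget` is a hypothetical helper.

```python
import math


def error_budget(target: float, current: float) -> dict:
    """Error-budget figures for a ratio-style SLO such as availability.

    Assumes target and current are fractions in [0, 1].
    """
    total = 1.0 - target          # allowed failure fraction over the period
    consumed = 1.0 - current      # observed failure fraction over the period
    remaining = total - consumed  # algebraically equal to current - target
    fraction_used = consumed / total if total > 0 else math.inf
    return {
        "total": total,
        "consumed": consumed,
        "remaining": remaining,
        "fraction_used": fraction_used,
    }


if __name__ == "__main__":
    budget = error_budget(target=0.999, current=0.9995)
    # Rounded for display; floating-point subtraction is not exact.
    print(round(budget["remaining"], 6), round(budget["fraction_used"], 3))
```

With the sample values, the remaining budget comes out at about 0.0005, matching the response, and roughly half of the budget has been used.
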
GET /api/v1/observability/metrics

Get platform metrics.

Query Parameters:

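| Parameter | Type | Description |
|-----------|------|-------------|
| metric | string | Metric name |
| period | string | Time period (for example, 24h) |
| aggregation | string | One of avg, sum, max, min, p99 |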

Response:

{
  "data": {
    "metric": "api_requests_total",
    "period": "24h",
    "aggregation": "sum",
    "values": [
      { "timestamp": "2025-01-14T10:00:00Z", "value": 150000 },
      { "timestamp": "2025-01-14T11:00:00Z", "value": 175000 }
    ],
    "total": 3500000
  }
}
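
To show how the three query parameters combine into a request URL, here is a minimal sketch that validates `aggregation` against the documented values before building the query string. `BASE_URL` is a placeholder host and `metrics_url` is a hypothetical helper; authentication is not covered in this section and is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not from the docs
ALLOWED_AGGREGATIONS = {"avg", "sum", "max", "min", "p99"}


def metrics_url(metric: str, period: str, aggregation: str) -> str:
    """Build the GET URL for the metrics endpoint."""
    if aggregation not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"unsupported aggregation: {aggregation!r}")
    query = urlencode(
        {"metric": metric, "period": period, "aggregation": aggregation}
    )
    return f"{BASE_URL}/api/v1/observability/metrics?{query}"


if __name__ == "__main__":
    print(metrics_url("api_requests_total", "24h", "sum"))
```

Rejecting an unknown aggregation on the client side avoids a round trip for a request the server would presumably refuse.
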
GET /api/v1/observability/traces

Get distributed traces.

Query Parameters:

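| Parameter | Type | Description |
|-----------|------|-------------|
| service | string | Filter by service |
| operation | string | Filter by operation |
| minDuration | number | Minimum duration (ms) |
| status | string | Filter by status: OK or ERROR |
| from | string | Start timestamp |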

Response:

{
  "data": [
    {
      "traceId": "trace_abc123",
      "rootSpan": "POST /api/v1/contracts",
      "service": "rail-api",
      "duration": 245,
      "status": "OK",
      "spanCount": 12,
      "timestamp": "2025-01-15T10:00:00Z"
    }
  ]
}
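
The trace filters appear to be optional, and `from` collides with a reserved word in Python, so it cannot be passed as a keyword argument. The sketch below builds the query string from whichever filters are set. It assumes `from` accepts the same ISO 8601 UTC format that the responses use for `timestamp`; that format, like the helper name `traces_query`, is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

ALLOWED_STATUS = {"OK", "ERROR"}


def traces_query(
    service: Optional[str] = None,
    operation: Optional[str] = None,
    min_duration_ms: Optional[int] = None,
    status: Optional[str] = None,
    start: Optional[datetime] = None,  # must be timezone-aware
) -> str:
    """Build the query string for the traces endpoint, skipping unset filters."""
    if status is not None and status not in ALLOWED_STATUS:
        raise ValueError(f"unsupported status: {status!r}")
    params = {
        "service": service,
        "operation": operation,
        "minDuration": min_duration_ms,
        "status": status,
        # "from" is a Python keyword, so it only appears as a dict key.
        "from": (
            start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            if start is not None
            else None
        ),
    }
    return urlencode({k: v for k, v in params.items() if v is not None})


if __name__ == "__main__":
    since = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)
    print(traces_query(service="rail-api", min_duration_ms=200, start=since))
```
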
GET /api/v1/observability/traces/:traceId

Get detailed trace with all spans.

Response:

{
  "data": {
    "traceId": "trace_abc123",
    "spans": [
      {
        "spanId": "span_1",
        "parentSpanId": null,
        "operation": "POST /api/v1/contracts",
        "service": "rail-api",
        "duration": 245,
        "status": "OK",
        "tags": {
          "http.method": "POST",
          "http.status_code": 201
        }
      },
      {
        "spanId": "span_2",
        "parentSpanId": "span_1",
        "operation": "db.insert",
        "service": "postgresql",
        "duration": 45,
        "status": "OK"
      }
    ]
  }
}
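
The `spans` array is flat; the hierarchy is carried by `parentSpanId`, with `null` marking the root. The sketch below rebuilds the tree and computes each span's self time (its duration minus the durations of its direct children). It assumes durations are in milliseconds, as suggested by the `minDuration` filter, and that child spans do not overlap in time. The function names are illustrative.

```python
from collections import defaultdict

# Spans copied from the sample response above (tags omitted for brevity).
SPANS = [
    {"spanId": "span_1", "parentSpanId": None,
     "operation": "POST /api/v1/contracts", "service": "rail-api", "duration": 245},
    {"spanId": "span_2", "parentSpanId": "span_1",
     "operation": "db.insert", "service": "postgresql", "duration": 45},
]


def children_by_parent(spans: list[dict]) -> dict:
    """Group spans by their parentSpanId (None is the key for root spans)."""
    children = defaultdict(list)
    for span in spans:
        children[span["parentSpanId"]].append(span)
    return children


def self_times(spans: list[dict]) -> dict[str, int]:
    """Duration of each span minus the summed duration of its direct children."""
    children = children_by_parent(spans)
    return {
        span["spanId"]: span["duration"]
        - sum(child["duration"] for child in children.get(span["spanId"], []))
        for span in spans
    }


def render_tree(spans: list[dict]) -> list[str]:
    """Return one indented line per span, in depth-first order from the root."""
    children = children_by_parent(spans)
    lines: list[str] = []

    def walk(parent_id, depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(
                f"{'  ' * depth}{span['operation']} "
                f"[{span['service']}] {span['duration']} ms"
            )
            walk(span["spanId"], depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(SPANS)))
    print(self_times(SPANS))
```

For the sample trace, the root span has a self time of 200 and the database span 45, which points at the API layer rather than the insert as the larger contributor.
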
GET /api/v1/observability/alerts

Get active alerts.

Response:

{
  "data": [
    {
      "alertId": "alert_xyz",
      "name": "High Error Rate",
      "severity": "WARNING",
      "status": "FIRING",
      "message": "Error rate > 1% for contracts rail",
      "startedAt": "2025-01-15T09:45:00Z",
      "labels": {
        "rail": "contracts",
        "environment": "production"
      }
    }
  ]
}

POST /api/v1/observability/alerts/:alertId/acknowledge

Acknowledge an alert.
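
A typical flow combines the two alert endpoints: list the active alerts, order the firing ones by severity (CRITICAL, WARNING, INFO), and acknowledge them one by one. The sketch below prepares the POST requests without sending them. `BASE_URL` is a placeholder host, the helper names are hypothetical, and any authentication headers or request body the endpoint may require are not documented here and are omitted.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host, not from the docs
SEVERITY_RANK = {"CRITICAL": 0, "WARNING": 1, "INFO": 2}

# Alert shaped like the sample response above (only the fields used here).
ALERTS = [
    {"alertId": "alert_xyz", "name": "High Error Rate",
     "severity": "WARNING", "status": "FIRING"},
]


def firing_by_severity(alerts: list[dict]) -> list[dict]:
    """Return firing alerts, most severe first; unknown severities sort last."""
    firing = [alert for alert in alerts if alert["status"] == "FIRING"]
    return sorted(
        firing,
        key=lambda alert: SEVERITY_RANK.get(alert["severity"], len(SEVERITY_RANK)),
    )


def acknowledge_request(alert_id: str) -> Request:
    """Prepare (but do not send) the acknowledge call for one alert."""
    url = (
        f"{BASE_URL}/api/v1/observability/alerts/"
        f"{quote(alert_id, safe='')}/acknowledge"
    )
    return Request(url, method="POST")


if __name__ == "__main__":
    for alert in firing_by_severity(ALERTS):
        request = acknowledge_request(alert["alertId"])
        print(request.get_method(), request.full_url)
```

Passing each prepared request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live server.
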

GET /api/v1/observability/errors

Get error rates by rail/endpoint.

Response:

{
  "data": {
    "period": "1h",
    "totalRequests": 150000,
    "totalErrors": 150,
    "errorRate": 0.001,
    "byRail": {
      "contracts": { "requests": 50000, "errors": 50, "rate": 0.001 },
      "kyc": { "requests": 30000, "errors": 30, "rate": 0.001 }
    },
    "topErrors": [
      { "code": "VALIDATION_ERROR", "count": 100 },
      { "code": "NOT_FOUND", "count": 35 }
    ]
  }
}
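
The rates in this response are simple ratios: `errorRate` is `totalErrors / totalRequests`, and each rail's `rate` is its `errors / requests`. The sketch below recomputes the per-rail rates, flags rails above a threshold, and reports each top error's share of all errors. The 1% threshold mirrors the sample alert message shown for the alerts endpoint; the helper names are illustrative.

```python
# Payload copied from the sample response above.
SAMPLE = {
    "data": {
        "period": "1h",
        "totalRequests": 150000,
        "totalErrors": 150,
        "errorRate": 0.001,
        "byRail": {
            "contracts": {"requests": 50000, "errors": 50, "rate": 0.001},
            "kyc": {"requests": 30000, "errors": 30, "rate": 0.001},
        },
        "topErrors": [
            {"code": "VALIDATION_ERROR", "count": 100},
            {"code": "NOT_FOUND", "count": 35},
        ],
    }
}


def rails_over_threshold(payload: dict, threshold: float) -> list[str]:
    """Names of rails whose recomputed error rate exceeds the threshold."""
    by_rail = payload["data"]["byRail"]
    return sorted(
        name
        for name, stats in by_rail.items()
        if stats["requests"] > 0
        and stats["errors"] / stats["requests"] > threshold
    )


def error_shares(payload: dict) -> dict[str, float]:
    """Fraction of all errors attributed to each entry in topErrors."""
    total = payload["data"]["totalErrors"]
    if total == 0:
        return {}
    return {
        entry["code"]: entry["count"] / total
        for entry in payload["data"]["topErrors"]
    }


if __name__ == "__main__":
    print(rails_over_threshold(SAMPLE, threshold=0.01))
    print({code: round(share, 3) for code, share in error_shares(SAMPLE).items()})
```

In the sample, no rail exceeds 1%, and validation errors account for about two thirds of all errors.
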

| Metric | Description |
|--------|-------------|
| api_requests_total | Total API requests |
| api_request_duration_ms | Request duration |
| api_errors_total | Total errors |
| db_connections_active | Active DB connections |
| cache_hit_ratio | Cache hit ratio |
| queue_depth | Message queue depth |

| SLO type | Description |
|----------|-------------|
| Availability | Service availability |
| Latency | Response time |
| Error Rate | Error percentage |
| Throughput | Request volume |

| Alert severity | Description |
|----------------|-------------|
| CRITICAL | Immediate action required |
| WARNING | Attention needed |
| INFO | Informational |

| Event | Description |
|-------|-------------|
| observability.alert.fired | Alert triggered |
| observability.alert.resolved | Alert resolved |
| observability.slo.breach | SLO breached |