Observability Guide
This guide covers the logging, metrics, and distributed tracing infrastructure for the Open Paws API Gateway.
Overview
The API Gateway uses DataDog as the primary observability platform, providing:
- Structured Logging: All request/response logs forwarded to DataDog Logs
- Metrics: Request latency, payload sizes, and status codes tracked in DataDog Metrics
- Distributed Tracing: Correlation IDs propagated across all 14 backend services
Architecture
Configuration
Environment Variables
| Variable | Type | Description |
|---|---|---|
| DATADOG_API_KEY | Secret | DataDog API key for authentication |
| ENVIRONMENT | Config | Environment name for tagging (production, staging, preview) |
| LOG_LEVEL | Config | Optional log level override (error, warn, info, debug) |
The three Auth0 environment variables (ZUPLO_PUBLIC_AUTH0_*) are separate from observability configuration.
Those variables are used for developer portal authentication, not API gateway authentication.
See the Deployment Checklist for Auth0 setup details.
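Checking these variables once at startup makes a missing key fail loudly instead of silently dropping logs. The following is a minimal TypeScript sketch, not the gateway's actual code. It assumes the values arrive as a plain string record (for example process.env in a local test), and the name loadObservabilityConfig is hypothetical.

```typescript
type Environment = "production" | "staging" | "preview";
type LogLevel = "error" | "warn" | "info" | "debug";

interface ObservabilityConfig {
  datadogApiKey: string;
  environment: Environment;
  logLevelOverride?: LogLevel; // undefined means "use the environment default"
}

const ENVIRONMENTS: readonly string[] = ["production", "staging", "preview"];
const LOG_LEVELS: readonly string[] = ["error", "warn", "info", "debug"];

function isEnvironment(value: string): value is Environment {
  return ENVIRONMENTS.includes(value);
}

function isLogLevel(value: string): value is LogLevel {
  return LOG_LEVELS.includes(value);
}

// Hypothetical startup check: reject missing or misspelled values up front.
function loadObservabilityConfig(env: Record<string, string | undefined>): ObservabilityConfig {
  const datadogApiKey = env.DATADOG_API_KEY;
  if (!datadogApiKey) {
    throw new Error("DATADOG_API_KEY is not set");
  }

  const environment = env.ENVIRONMENT;
  if (environment === undefined || !isEnvironment(environment)) {
    throw new Error(`ENVIRONMENT must be one of: ${ENVIRONMENTS.join(", ")}`);
  }

  // LOG_LEVEL is optional, but an unknown value is treated as a mistake.
  let logLevelOverride: LogLevel | undefined;
  const rawLevel = env.LOG_LEVEL;
  if (rawLevel !== undefined) {
    if (!isLogLevel(rawLevel)) {
      throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
    }
    logLevelOverride = rawLevel;
  }

  return { datadogApiKey, environment, logLevelOverride };
}
```

In a local test this could be called as loadObservabilityConfig(process.env); rejecting unknown LOG_LEVEL values is a design choice of the sketch, not documented gateway behaviour.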
Runtime Configuration
The observability stack is configured in modules/zuplo.runtime.ts.
Accessing Logs in DataDog
Basic Log Search
Navigate to Logs → Search and filter on the log fields listed below.
Common Log Fields
| Field | Description |
|---|---|
| requestId | Zuplo's unique request identifier |
| correlationId | Distributed tracing ID (propagated to backends) |
| service | Always "api-gateway" |
| environment | production, staging, or preview |
| method | HTTP method (GET, POST, etc.) |
| path | Request path |
| statusCode | HTTP response status |
| latencyMs | Request duration in milliseconds |
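To show how these fields fit together in one structured JSON entry, here is a hypothetical TypeScript builder. GatewayLogEntry, buildLogEntry, and serializeLogEntry are illustrative names, not part of modules/logging-helper.ts, and the timestamps are assumed to be millisecond values such as those returned by Date.now().

```typescript
// Field names follow the table above; the builder itself is illustrative.
interface GatewayLogEntry {
  requestId: string;
  correlationId: string;
  service: "api-gateway";
  environment: "production" | "staging" | "preview";
  method: string;
  path: string;
  statusCode: number;
  latencyMs: number;
}

// startTime and endTime are millisecond timestamps (e.g. from Date.now()).
function buildLogEntry(
  input: Omit<GatewayLogEntry, "service" | "latencyMs">,
  startTime: number,
  endTime: number,
): GatewayLogEntry {
  return {
    ...input,
    service: "api-gateway", // constant for every gateway log
    latencyMs: endTime - startTime,
  };
}

// One JSON object per entry keeps every field machine-parseable.
function serializeLogEntry(entry: GatewayLogEntry): string {
  return JSON.stringify(entry);
}
```

In practice startTime would be captured when the request arrives and endTime just before the response is returned.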
Filtering by Backend Service
When the logging helpers are called with a backend service tag, logs can also be filtered by that backend service in Logs → Search.
Accessing Metrics in DataDog
Available Metrics
The following metrics are automatically collected:
| Metric | Description | Tags |
|---|---|---|
| zuplo.request.latency | Request duration | method, status_code, path |
| zuplo.request.content_length | Request body size | method, path |
| zuplo.response.content_length | Response body size | method, status_code, path |
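The tag sets in this table differ per metric. The sketch below assembles DataDog-style key:value tag strings to make that mapping explicit. It is hypothetical: the metrics are collected automatically, so the gateway does not need code like this, and metricTags and RequestSummary are invented names. The app:open-paws-api-gateway tag is the one the Troubleshooting section asks you to check.

```typescript
type MetricName =
  | "zuplo.request.latency"
  | "zuplo.request.content_length"
  | "zuplo.response.content_length";

type TagKey = "method" | "status_code" | "path";

interface RequestSummary {
  method: string;
  path: string;
  statusCode: number;
}

// Which tags each metric carries, copied from the table above.
const METRIC_TAG_KEYS: Record<MetricName, readonly TagKey[]> = {
  "zuplo.request.latency": ["method", "status_code", "path"],
  "zuplo.request.content_length": ["method", "path"],
  "zuplo.response.content_length": ["method", "status_code", "path"],
};

// Build "key:value" tags for one metric, prefixed with the app tag.
function metricTags(metric: MetricName, info: RequestSummary): string[] {
  const values: Record<TagKey, string> = {
    method: info.method,
    status_code: String(info.statusCode),
    path: info.path,
  };
  return [
    "app:open-paws-api-gateway",
    ...METRIC_TAG_KEYS[metric].map((key) => `${key}:${values[key]}`),
  ];
}
```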
Creating Dashboards
Navigate to Dashboards → New Dashboard and add widgets for:
- Request Latency (P95)
- Error Rate
- Request Volume
Correlation ID Tracing
How It Works
- Client makes a request to the API Gateway
- Gateway checks for the x-correlation-id header
- If missing, the gateway generates a new UUID
- Correlation ID is stored in context.customProperties.correlationId
- Correlation ID is added to the x-correlation-id header for backend requests
- All logs include the correlation ID for tracing
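The get-or-generate step and the header propagation can be sketched independently of Zuplo. This TypeScript sketch assumes a header object with a get method (the Fetch API Headers class and a Map both qualify) and uses Node's randomUUID. The function names are hypothetical, and storing the value in context.customProperties.correlationId is left to the Zuplo-specific code.

```typescript
import { randomUUID } from "node:crypto";

// Minimal header reader; Headers#get and Map#get both fit this shape.
interface HeaderReader {
  get(name: string): string | null | undefined;
}

const CORRELATION_HEADER = "x-correlation-id";

// Reuse the caller's correlation ID when present, otherwise generate a UUID.
function resolveCorrelationId(
  incoming: HeaderReader,
  generate: () => string = randomUUID,
): string {
  const existing = incoming.get(CORRELATION_HEADER)?.trim();
  return existing ? existing : generate();
}

// Headers for a backend call, with the correlation ID copied forward.
function backendHeaders(
  correlationId: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  return { ...extra, [CORRELATION_HEADER]: correlationId };
}
```

Injecting the generator keeps resolveCorrelationId deterministic in tests while defaulting to a random UUID in normal use.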
Tracing a Request
- Get Correlation ID: Check the x-correlation-id response header
- Search DataDog Logs: @correlationId:<your-id> (custom log attributes take the @ prefix)
- View Full Trace: All logs from the gateway and backends that share the same ID appear together
Implementing in Backend Services
Backend services should:
- Read the x-correlation-id header from incoming requests
- Include the correlation ID in all log entries
- Pass the correlation ID to any downstream services
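For a Node.js backend, the three steps above might look like the following minimal TypeScript sketch. It uses only node:http and node:crypto; the log format, the choice to generate an ID when the header is missing, and the helper names are assumptions rather than a required contract.

```typescript
import { createServer, IncomingMessage, ServerResponse } from "node:http";
import { randomUUID } from "node:crypto";

const CORRELATION_HEADER = "x-correlation-id";

// Node exposes header names in lowercase. The header type also allows
// string[] (used for set-cookie), so normalise to one string first.
function correlationIdFrom(req: IncomingMessage): string {
  const raw = req.headers[CORRELATION_HEADER];
  const value = (Array.isArray(raw) ? raw[0] : raw)?.trim();
  // Assumption: generate a fresh ID if the gateway did not send one.
  return value ? value : randomUUID();
}

// Headers to attach to any downstream call, e.g. fetch(url, { headers }).
function downstreamHeaders(correlationId: string): Record<string, string> {
  return { [CORRELATION_HEADER]: correlationId };
}

// Every log line is a JSON object that carries the correlation ID.
function log(correlationId: string, message: string, extra: Record<string, unknown> = {}): void {
  console.log(JSON.stringify({ correlationId, message, ...extra }));
}

const server = createServer((req: IncomingMessage, res: ServerResponse) => {
  const correlationId = correlationIdFrom(req);
  log(correlationId, "request received", { method: req.method, path: req.url });

  // Echoing the ID back is optional but helps callers report issues.
  res.setHeader(CORRELATION_HEADER, correlationId);
  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ ok: true }));
});
```

Call server.listen(port) to start it; outbound requests would pass downstreamHeaders(correlationId) as their headers so the next service can continue the trace.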
Log Level Configuration
Default Behavior
- Production: Only error and warn logs are sent (reduces costs)
- Staging/Preview: info-level logs are enabled for debugging
Override Log Level
Set the LOG_LEVEL environment variable:
| Value | Logs Included |
|---|---|
| error | Errors only |
| warn | Errors + warnings |
| info | Errors + warnings + info |
| debug | All logs (verbose) |
Note: Higher verbosity increases DataDog costs. In production, use info or debug only while debugging a specific issue.
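The default-plus-override behaviour comes down to two decisions: which threshold applies, and whether a given message clears it. Below is a hypothetical TypeScript sketch. It assumes that an unrecognised LOG_LEVEL value falls back to the environment default, which the rules above do not specify.

```typescript
type LogLevel = "error" | "warn" | "info" | "debug";

// Lower number = more severe. A message is emitted when its number is
// at or below the threshold's number.
const SEVERITY: Record<LogLevel, number> = { error: 0, warn: 1, info: 2, debug: 3 };

function isLogLevel(value: string): value is LogLevel {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(SEVERITY, value);
}

// A valid LOG_LEVEL wins; otherwise production keeps error + warn and
// staging/preview also keep info (assumed fallback for invalid values).
function effectiveLogLevel(environment: string, override?: string): LogLevel {
  const normalized = override?.trim().toLowerCase();
  if (normalized !== undefined && isLogLevel(normalized)) {
    return normalized;
  }
  return environment === "production" ? "warn" : "info";
}

function shouldLog(messageLevel: LogLevel, threshold: LogLevel): boolean {
  return SEVERITY[messageLevel] <= SEVERITY[threshold];
}
```

For example, with ENVIRONMENT=production and no override, shouldLog("info", effectiveLogLevel("production")) returns false, matching the production default.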
Troubleshooting
Logs Not Appearing in DataDog
- Check API Key: Verify DATADOG_API_KEY is correctly set as a Secret in Zuplo
- Check Environment: Ensure ENVIRONMENT is set
- Wait for Propagation: Logs may take 1-2 minutes to appear
- Check Zuplo Logs: View raw Zuplo logs to confirm logging is working
Metrics Not Appearing
- Metrics Plugin is Beta: Ensure your Zuplo plan supports metrics
- Check Tags: Verify metrics are tagged with app:open-paws-api-gateway
- Wait for Aggregation: Metrics may take 5-10 minutes to aggregate
High DataDog Costs
- Reduce Log Volume: Set LOG_LEVEL=error in production
- Review Log Retention: Configure shorter retention for verbose logs
- Use Log Archives: Archive old logs to cheaper storage
Recommended DataDog Monitors
Critical Alerts
- High Error Rate
- Elevated Latency
- Backend Service Down
Warning Alerts
- Unusual Traffic Spike
Using Logging Helpers
The gateway provides structured logging utilities in modules/logging-helper.ts.
Available Functions
| Function | Description |
|---|---|
| logRequest(context, request) | Log incoming request details |
| logResponse(context, response, startTime) | Log response with latency |
| logBackendCall(context, service, endpoint, method) | Log backend service call |
| logBackendResponse(context, service, statusCode, latencyMs) | Log backend response |
| logError(context, error, errorContext) | Log errors with stack traces |
| logWarning(context, message) | Log warning conditions |
| setMetricsTags(context, tags) | Add custom DataDog metric tags |
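As an illustration of how backend entries might be tagged by service, here is a standalone TypeScript sketch of logBackendCall and logBackendResponse using the signatures listed above. It is not the code in modules/logging-helper.ts: LogContext is a hypothetical stand-in for Zuplo's context object, and the emit sink, the backendService field name, and the status-to-level mapping are all assumptions.

```typescript
type Level = "info" | "warn" | "error";

// Hypothetical stand-in for the request context. The real helpers take
// Zuplo's context object; emit() is an assumed sink so this runs anywhere.
interface LogContext {
  requestId: string;
  customProperties: { correlationId?: string };
  emit(level: Level, entry: Record<string, unknown>): void;
}

// Fields shared by every backend entry so it can be traced and filtered.
function baseFields(context: LogContext, service: string): Record<string, unknown> {
  return {
    requestId: context.requestId,
    correlationId: context.customProperties.correlationId,
    backendService: service, // hypothetical field name used for filtering
  };
}

function logBackendCall(context: LogContext, service: string, endpoint: string, method: string): void {
  context.emit("info", { ...baseFields(context, service), event: "backend_call", endpoint, method });
}

function logBackendResponse(context: LogContext, service: string, statusCode: number, latencyMs: number): void {
  // Assumed mapping: 5xx -> error, 4xx -> warn, anything else -> info.
  const level: Level = statusCode >= 500 ? "error" : statusCode >= 400 ? "warn" : "info";
  context.emit(level, { ...baseFields(context, service), event: "backend_response", statusCode, latencyMs });
}
```

Putting the shared fields in one function keeps the correlation ID and service tag identical across the call and response entries for the same backend request.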
Best Practices
- Always include correlation ID: Ensures traceability across services
- Use structured logs: JSON format enables better filtering and analysis
- Tag by backend service: Makes it easy to identify problematic backends
- Set appropriate log levels: Reduce noise and costs in production
- Create focused dashboards: One dashboard per concern (latency, errors, traffic)
- Configure alerts early: Don't wait for production incidents
- Review logs regularly: Identify patterns before they become problems