Observability Guide

This guide covers the logging, metrics, and distributed tracing infrastructure for the Open Paws API Gateway.

Overview

The API Gateway uses DataDog as the primary observability platform, providing:

Structured Logging: All request/response logs forwarded to DataDog Logs
Metrics: Request latency, payload sizes, and status codes tracked in DataDog Metrics
Distributed Tracing: Correlation IDs propagated across all 14 backend services

Architecture

Code
┌─────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Client    │────▶│  Zuplo Gateway  │────▶│ Backend Service │
└─────────────┘     └─────────────────┘     └─────────────────┘
                           │                        │
                           │ x-correlation-id       │
                           ▼                        ▼
                    ┌─────────────────────────────────────┐
                    │            DataDog                   │
                    │  ┌──────────┐    ┌──────────────┐   │
                    │  │   Logs   │    │   Metrics    │   │
                    │  └──────────┘    └──────────────┘   │
                    └─────────────────────────────────────┘

Configuration

Environment Variables

Variable	Type	Description
`DATADOG_API_KEY`	Secret	DataDog API key for authentication
`ENVIRONMENT`	Config	Environment name for tagging (production, staging, preview)
`LOG_LEVEL`	Config	Optional log level override (error, warn, info, debug)

The three Auth0 environment variables (ZUPLO_PUBLIC_AUTH0_*) are separate from observability configuration. Those variables are used for developer portal authentication, not API gateway authentication. See the Deployment Checklist for Auth0 setup details.

Runtime Configuration

The observability stack is configured in modules/zuplo.runtime.ts:


Code
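import {
  DataDogLoggingPlugin,
  DataDogMetricsPlugin,
  RuntimeExtensions,
  environment,
} from "@zuplo/runtime";

// Assumption: the environment tag is derived from the ENVIRONMENT variable
const currentEnvironment = environment.ENVIRONMENT;

export function runtimeInit(runtime: RuntimeExtensions) {
  // DataDog Logging Plugin: forwards request/response logs to DataDog Logs
  runtime.addPlugin(
    new DataDogLoggingPlugin({
      url: "https://http-intake.logs.datadoghq.com/api/v2/logs",
      apiKey: environment.DATADOG_API_KEY,
      source: "OpenPawsAPIGateway",
      tags: [`environment:${currentEnvironment}`],
      fields: {
        service: "api-gateway",
        version: "1.0",
        application: "open-paws",
      },
    })
  );

  // DataDog Metrics Plugin: reports latency and payload sizes
  runtime.addPlugin(
    new DataDogMetricsPlugin({
      apiKey: environment.DATADOG_API_KEY,
      tags: [`app:open-paws-api-gateway`, `environment:${currentEnvironment}`],
      metrics: {
        latency: true,
        requestContentLength: true,
        responseContentLength: true,
      },
    })
  );
}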

Accessing Logs in DataDog

Basic Log Search

Navigate to Logs → Search and use these queries:

Code
# All API Gateway logs
service:api-gateway

# Filter by environment
service:api-gateway environment:production

# Filter by log level
service:api-gateway status:error

# Search by correlation ID
correlationId:abc123-def456-ghi789

# Search by request ID (Zuplo's built-in ID)
requestId:zp-abc123

Common Log Fields

Field	Description
`requestId`	Zuplo's unique request identifier
`correlationId`	Distributed tracing ID (propagated to backends)
`service`	Always "api-gateway"
`environment`	production, staging, or preview
`method`	HTTP method (GET, POST, etc.)
`path`	Request path
`statusCode`	HTTP response status
`latencyMs`	Request duration in milliseconds
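As a rough illustration, the fields in the table above could be modeled as a TypeScript type. The names `GatewayLogEntry` and `buildLogEntry` are hypothetical and are not taken from the gateway source; the sample values are placeholders.

```typescript
// Hypothetical shape of a gateway log entry, based on the field table above.
type Environment = "production" | "staging" | "preview";

interface GatewayLogEntry {
  requestId: string;
  correlationId: string;
  service: "api-gateway";
  environment: Environment;
  method: string;
  path: string;
  statusCode: number;
  latencyMs: number;
}

// Assumed helper: fills in the constant `service` field and derives latency
// from start and end timestamps given in milliseconds.
function buildLogEntry(
  fields: Omit<GatewayLogEntry, "service" | "latencyMs">,
  startTimeMs: number,
  endTimeMs: number
): GatewayLogEntry {
  return { ...fields, service: "api-gateway", latencyMs: endTimeMs - startTimeMs };
}

const entry = buildLogEntry(
  {
    requestId: "zp-abc123",
    correlationId: "abc123-def456-ghi789",
    environment: "staging",
    method: "GET",
    path: "/example",
    statusCode: 200,
  },
  1000,
  1042
);
console.log(JSON.stringify(entry));
```

Keeping `service` as a literal type means a typo in the service name fails at compile time rather than silently breaking the `service:api-gateway` search.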

Filtering by Backend Service

When handlers use the logging helpers with backend service tags, logs can be filtered by backend:

Code
# Logs related to n8n backend
backend.service:n8n

# Logs for failed backend calls
backend.statusCode:>=500

# Slow backend responses
backend.latencyMs:>1000
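The queries above imply that backend details are logged under a nested `backend` object. A minimal sketch of such a structure follows; the type and function names are hypothetical, and the actual helpers in modules/logging-helper.ts may shape their output differently.

```typescript
// Hypothetical nested log shape that would make the backend.* queries work.
interface BackendLogFields {
  backend: { service: string; statusCode: number; latencyMs: number };
}

function backendLogFields(
  service: string,
  statusCode: number,
  latencyMs: number
): BackendLogFields {
  return { backend: { service, statusCode, latencyMs } };
}

// Mirrors the filter backend.statusCode:>=500
function isFailedBackendCall(fields: BackendLogFields): boolean {
  return fields.backend.statusCode >= 500;
}

// Mirrors the filter backend.latencyMs:>1000
function isSlowBackendCall(fields: BackendLogFields): boolean {
  return fields.backend.latencyMs > 1000;
}

const sample = backendLogFields("n8n", 502, 1350);
console.log(JSON.stringify(sample), isFailedBackendCall(sample), isSlowBackendCall(sample));
```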

Accessing Metrics in DataDog

Available Metrics

The following metrics are automatically collected:

Metric	Description	Tags
`zuplo.request.latency`	Request duration in milliseconds	method, status_code, path
`zuplo.request.content_length`	Request body size in bytes	method, path
`zuplo.response.content_length`	Response body size in bytes	method, status_code, path

Creating Dashboards

Navigate to Dashboards → New Dashboard and add widgets for:

Request Latency (average per path)

Code
avg:zuplo.request.latency{app:open-paws-api-gateway} by {path}

Error Rate

Code
count:zuplo.request.latency{status_code:5*,app:open-paws-api-gateway}.as_count() / count:zuplo.request.latency{app:open-paws-api-gateway}.as_count()

Request Volume

Code
count:zuplo.request.latency{app:open-paws-api-gateway} by {path}.as_count()

Correlation ID Tracing

How It Works

Client makes request to API Gateway
Gateway checks for x-correlation-id header
If missing, generates new UUID
Correlation ID stored in context.customProperties.correlationId
Correlation ID added to x-correlation-id header for backend requests
All logs include the correlation ID for tracing
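The steps above can be sketched roughly as follows. This is not the gateway's actual policy code: the function names are hypothetical, headers are modeled as a plain object with lowercase keys, and `node:crypto` is used for UUID generation (the gateway runtime may use a different UUID source).

```typescript
import { randomUUID } from "node:crypto";

const CORRELATION_HEADER = "x-correlation-id";
type HeaderMap = Record<string, string | undefined>;

// Steps 2-3: reuse the incoming ID if present and non-empty, otherwise generate a UUID.
function resolveCorrelationId(
  headers: HeaderMap,
  generate: () => string = randomUUID
): string {
  const incoming = headers[CORRELATION_HEADER]?.trim();
  return incoming ? incoming : generate();
}

// Step 5: copy the headers and set x-correlation-id for the backend request.
function withCorrelationId(headers: HeaderMap, correlationId: string): HeaderMap {
  return { ...headers, [CORRELATION_HEADER]: correlationId };
}

// Stand-in for the Zuplo context object (step 4 stores the ID on customProperties).
const context = { customProperties: {} as Record<string, unknown> };

const incomingHeaders: HeaderMap = { "content-type": "application/json" };
const correlationId = resolveCorrelationId(incomingHeaders);
context.customProperties.correlationId = correlationId;

const backendHeaders = withCorrelationId(incomingHeaders, correlationId);
console.log(backendHeaders[CORRELATION_HEADER] === correlationId);
```

Passing the generator as a parameter keeps the function easy to test with a fixed ID.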

Tracing a Request

Get Correlation ID: Check the x-correlation-id response header
Search DataDog Logs: correlationId:<your-id>
View Full Trace: All logs from gateway and backends with same ID appear together

Implementing in Backend Services

Backend services should:

Read x-correlation-id header from incoming requests
Include correlation ID in all log entries
Pass correlation ID to any downstream services

Example (Node.js):


Code
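import { randomUUID } from 'node:crypto';

app.use((req, res, next) => {
  // Reuse the gateway's ID; fall back to a new one if the request bypassed the gateway
  const correlationId = req.headers['x-correlation-id'] || randomUUID();
  req.correlationId = correlationId;
  logger.info({ correlationId, message: 'Request received' });
  next();
});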
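To cover the other two requirements (including the ID in every log entry and passing it downstream), one possible approach is a small per-request logger. This is a sketch only: `createRequestLogger` and `downstreamHeaders` are hypothetical names, and a real service would plug in its own logging library as the sink.

```typescript
type LogFields = Record<string, unknown>;
type Sink = (entry: LogFields) => void;

// Returns a logger that stamps every entry with the request's correlation ID.
// correlationId is spread last so caller-supplied fields cannot overwrite it.
function createRequestLogger(
  correlationId: string,
  sink: Sink = (entry) => console.log(JSON.stringify(entry))
) {
  const write = (level: string, message: string, fields: LogFields = {}) =>
    sink({ ...fields, level, message, correlationId });
  return {
    info: (message: string, fields?: LogFields) => write("info", message, fields),
    error: (message: string, fields?: LogFields) => write("error", message, fields),
  };
}

// Headers for a call to a downstream service, carrying the same ID forward.
function downstreamHeaders(correlationId: string): Record<string, string> {
  return { "x-correlation-id": correlationId };
}

const requestLogger = createRequestLogger("abc123-def456-ghi789");
requestLogger.info("Request received", { method: "GET" });
console.log(downstreamHeaders("abc123-def456-ghi789"));
```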

Log Level Configuration

Default Behavior

Production: Only error and warn logs are sent (reduces costs)
Staging/Preview: info level logs enabled for debugging

Override Log Level

Set the LOG_LEVEL environment variable:

Value	Logs Included
`error`	Errors only
`warn`	Errors + warnings
`info`	Errors + warnings + info
`debug`	All logs (verbose)

Note: Higher verbosity increases DataDog costs. In production, enable info or debug only temporarily while debugging a specific issue.
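The default and override rules can be summarized in a short sketch. The function names are hypothetical, and falling back to the environment default when `LOG_LEVEL` holds an unrecognized value is an assumption, not documented gateway behavior.

```typescript
type LogLevel = "error" | "warn" | "info" | "debug";
type Environment = "production" | "staging" | "preview";

// Ordered from least to most verbose.
const LEVELS: readonly LogLevel[] = ["error", "warn", "info", "debug"];

// LOG_LEVEL wins when it is a valid level; otherwise use the environment default
// (warn in production, info in staging and preview).
function effectiveLogLevel(environment: Environment, override?: string): LogLevel {
  const match = LEVELS.find((level) => level === override);
  if (match) return match;
  return environment === "production" ? "warn" : "info";
}

// A message is sent when its level is no more verbose than the configured level.
function shouldLog(messageLevel: LogLevel, configured: LogLevel): boolean {
  return LEVELS.indexOf(messageLevel) <= LEVELS.indexOf(configured);
}

const configured = effectiveLogLevel("production");
console.log(configured, shouldLog("error", configured), shouldLog("info", configured));
```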

Troubleshooting

Logs Not Appearing in DataDog

Check API Key: Verify DATADOG_API_KEY is correctly set as a Secret in Zuplo
Check Environment: Ensure ENVIRONMENT is set
Wait for Propagation: Logs may take 1-2 minutes to appear
Check Zuplo Logs: View raw Zuplo logs to confirm logging is working
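To rule out the API key as the cause, it can help to send a single test log directly to the intake URL used in the runtime configuration. The sketch below only builds the request; `buildTestLogRequest` is a hypothetical helper, and the message text is a placeholder.

```typescript
interface TestLogRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a request for the DataDog HTTP log intake, using the same source,
// service, and environment tag as the gateway's logging plugin configuration.
function buildTestLogRequest(apiKey: string, environment: string): TestLogRequest {
  return {
    url: "https://http-intake.logs.datadoghq.com/api/v2/logs",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "DD-API-KEY": apiKey },
      body: JSON.stringify([
        {
          ddsource: "OpenPawsAPIGateway",
          ddtags: `environment:${environment}`,
          service: "api-gateway",
          message: "observability smoke test",
        },
      ]),
    },
  };
}

const testRequest = buildTestLogRequest("<your-api-key>", "staging");
console.log(testRequest.url, testRequest.init.body);

// To send it (requires a valid key and network access):
//   const res = await fetch(testRequest.url, testRequest.init);
//   A 202 response should mean the log was accepted; 403 usually points to a bad key.
```

If the test log shows up under `service:api-gateway` but gateway logs do not, the problem is more likely in the gateway configuration than in the key.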

Metrics Not Appearing

Metrics Plugin is Beta: Ensure your Zuplo plan supports metrics
Check Tags: Verify metrics are tagged with app:open-paws-api-gateway
Wait for Aggregation: Metrics may take 5-10 minutes to aggregate

High DataDog Costs

Reduce Log Volume: Set LOG_LEVEL=error in production
Review Log Retention: Configure shorter retention for verbose logs
Use Log Archives: Archive old logs to cheaper storage

Recommended DataDog Monitors

Critical Alerts

High Error Rate

Code
Monitor: Metric Alert
Query: count:zuplo.request.latency{status_code:5*,app:open-paws-api-gateway}.as_count() /
       count:zuplo.request.latency{app:open-paws-api-gateway}.as_count() > 0.05
Alert: Error rate > 5%
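The 5% threshold amounts to dividing the number of 5xx responses by the total number of requests in the evaluation window. A small worked example, with made-up counts and a hypothetical `errorRate` function:

```typescript
// Requests per status code within one evaluation window (illustrative numbers).
type StatusCounts = Record<number, number>;

function errorRate(counts: StatusCounts): number {
  let total = 0;
  let serverErrors = 0;
  for (const [status, count] of Object.entries(counts)) {
    const code = Number(status);
    total += count;
    if (code >= 500 && code <= 599) serverErrors += count;
  }
  // Avoid dividing by zero when there was no traffic in the window.
  return total === 0 ? 0 : serverErrors / total;
}

const sampleCounts: StatusCounts = { 200: 900, 404: 40, 500: 45, 503: 15 };
const rate = errorRate(sampleCounts); // 60 errors out of 1000 requests
console.log(rate, rate > 0.05 ? "alert" : "ok");
```

Note that 4xx responses count toward the total but not toward the error count, so a burst of client errors lowers the computed rate rather than raising it.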

Elevated Latency

Code
Monitor: Metric Alert
Query: p95:zuplo.request.latency{app:open-paws-api-gateway} by {path} > 2000
Alert: P95 latency > 2 seconds
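For readers less familiar with percentiles: P95 is the latency that 95% of requests stay at or below. The sketch below uses the simple nearest-rank method on made-up samples; DataDog computes percentiles with its own approximation, so treat this only as an illustration of the concept.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// 20 illustrative latencies in milliseconds: 18 fast requests and 2 slow ones.
const latenciesMs: number[] = [...Array(18).fill(100), 1500, 2500];

const p95 = percentile(latenciesMs, 95);
console.log(p95, p95 > 2000 ? "alert" : "ok");
```

With these samples the single 2500 ms request does not trigger the alert, which is the point of alerting on P95 rather than on the maximum.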

Backend Service Down

Code
Monitor: Log Alert
Query: logs("service:api-gateway backend.statusCode:>=500").rollup("count").by("backend.service").last("5m") > 10
Alert: More than 10 backend errors in 5 minutes

Warning Alerts

Unusual Traffic Spike

Code
Monitor: Anomaly Detection
Query: count:zuplo.request.latency{app:open-paws-api-gateway}.as_count()
Alert: Traffic deviates significantly from baseline

Using Logging Helpers

The gateway provides structured logging utilities in modules/logging-helper.ts.

Basic Usage


Code
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";
import {
  logRequest,
  logResponse,
  logBackendCall,
  logError,
  setMetricsTags,
  BackendServices
} from "./logging-helper";

export default async function handler(request: ZuploRequest, context: ZuploContext) {
  const startTime = Date.now();

  // Log incoming request
  logRequest(context, request);

  // Set custom metrics tags
  setMetricsTags(context, { backend: BackendServices.N8N, operation: "investigate" });

  try {
    // Log backend call
    logBackendCall(context, BackendServices.N8N, "/webhook/company", "POST");

    // backendUrl and the remaining fetch options are omitted here
    const response = await fetch(backendUrl, { method: "POST" /* ... */ });

    // Log response
    logResponse(context, response, startTime);

    return response;
  } catch (error) {
    // Log error with full context
    logError(context, error, "Failed to call n8n backend");
    throw error;
  }
}

Available Functions

Function	Description
`logRequest(context, request)`	Log incoming request details
`logResponse(context, response, startTime)`	Log response with latency
`logBackendCall(context, service, endpoint, method)`	Log backend service call
`logBackendResponse(context, service, statusCode, latencyMs)`	Log backend response
`logError(context, error, errorContext)`	Log errors with stack traces
`logWarning(context, message)`	Log warning conditions
`setMetricsTags(context, tags)`	Add custom DataDog metric tags

Best Practices

Always include correlation ID: Ensures traceability across services
Use structured logs: JSON format enables better filtering and analysis
Tag by backend service: Makes it easy to identify problematic backends
Set appropriate log levels: Reduce noise and costs in production
Create focused dashboards: One dashboard per concern (latency, errors, traffic)
Configure alerts early: Don't wait for production incidents
Review logs regularly: Identify patterns before they become problems

Last modified on May 4, 2026
