Monitoring

Monitor coordinator health, execution throughput, and abuse-control behavior.

Base URL

Replace {COORDINATOR_URL} with the root URL of your coordinator service. The commands on this page append the /api/... path themselves, so leave /api off the value.

Local Development:

http://localhost:3001

Production:

https://your-coordinator-domain.com

Health Check

Check coordinator status:

curl {COORDINATOR_URL}/api/health

Response:

{
  "status": "healthy",
  "database": "connected",
  "timestamp": 1777500000000,
  "schedulesRegistered": 4
}

When the database is unavailable, the coordinator responds with 503 and an unhealthy payload.
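The check above can be scripted for a probe or a deploy gate. The following is a minimal sketch, not the coordinator's own client: the names interpretHealth, checkHealth, and FetchLike are hypothetical. It assumes a healthy response uses HTTP 200, and it treats every payload field as optional because the unhealthy payload is not shown here.

```typescript
// Shape of the healthy payload shown above. The unhealthy payload is not
// documented on this page, so every field is treated as optional.
interface HealthPayload {
  status?: string;
  database?: string;
  timestamp?: number;
  schedulesRegistered?: number;
}

interface HealthVerdict {
  ok: boolean;
  reason: string;
}

// Pure helper: decide whether a health response counts as healthy.
// Assumption: a healthy coordinator answers with HTTP 200.
function interpretHealth(httpStatus: number, body: HealthPayload): HealthVerdict {
  if (httpStatus === 503) {
    return { ok: false, reason: "coordinator reported 503 (database unavailable)" };
  }
  if (httpStatus !== 200) {
    return { ok: false, reason: `unexpected HTTP status ${httpStatus}` };
  }
  if (body.status !== "healthy" || body.database !== "connected") {
    return { ok: false, reason: `status=${body.status}, database=${body.database}` };
  }
  return { ok: true, reason: `healthy, ${body.schedulesRegistered ?? 0} schedules registered` };
}

// Minimal fetch-like type so the sketch does not depend on a specific runtime.
type FetchLike = (url: string) => Promise<{ status: number; json(): Promise<unknown> }>;

// Calls GET {COORDINATOR_URL}/api/health and interprets the result.
async function checkHealth(coordinatorUrl: string, fetchFn: FetchLike): Promise<HealthVerdict> {
  const res = await fetchFn(`${coordinatorUrl}/api/health`);
  // The error body may not be JSON; fall back to an empty payload in that case.
  const body = (await res.json().catch(() => ({}))) as HealthPayload;
  return interpretHealth(res.status, body);
}
```

On Node 18 or later, checkHealth("http://localhost:3001", fetch) should work with the built-in fetch. Keeping the decision in a pure function makes it easy to test without a running coordinator.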

Logs

Monitor coordinator logs for:

  • schedule polling
  • execution run creation
  • stage-level attempts (delegate, claim, commit)
  • retries and exhausted runs
  • schedule registration success / rejection
  • rate-limit decisions
  • Redis limiter connectivity and fallback events

Start the coordinator with the command for your environment and watch its log output.

Development:

npm run dev

Production:

npm run start

Database Monitoring

Use Drizzle Studio to inspect the coordinator database:

npm run db:studio

Current tables:

  • schedules - registered schedules, recipient payloads, Merkle proofs
  • execution_runs - execution-run state for each scheduled payout window
  • execution_attempts - stage-level attempt history

Metrics

The coordinator exports Prometheus metrics at GET /api/metrics.

This endpoint is protected by default. While METRICS_PUBLIC=false, callers must send a bearer token:

Authorization: Bearer <METRICS_AUTH_TOKEN>

Example:

curl {COORDINATOR_URL}/api/metrics \
  -H "Authorization: Bearer <METRICS_AUTH_TOKEN>"

Key metrics exposed today:

  • veil_scheduler_polls_total
  • veil_schedules_detected_due_total
  • veil_execution_runs_created_total
  • veil_execution_stage_total
  • veil_claim_results_total
  • veil_api_requests_total
  • veil_api_request_duration_seconds
  • veil_api_rate_limit_decisions_total
  • veil_api_concurrency_limit_decisions_total
  • veil_api_rate_limit_backend_events_total
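For quick checks without a Prometheus server, the scraped text can be summed directly. This is a minimal sketch that assumes the endpoint returns the standard Prometheus text exposition format. The helper name sumMetric is hypothetical, and the label names in the usage note (route, outcome) are assumptions, since the label sets are not listed on this page.

```typescript
// Sum every sample of one metric in Prometheus text exposition format,
// optionally keeping only samples whose label set contains a given pair.
// Limitation: label values containing "}" or spaces after commas are not handled.
function sumMetric(
  text: string,
  name: string,
  label?: { key: string; value: string },
): number {
  let total = 0;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip HELP/TYPE comments
    // metric name, optional {labels}, then the sample value
    // (a trailing timestamp, if present, is ignored)
    const match = /^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)/.exec(line);
    if (match === null || match[1] !== name) continue;
    if (label !== undefined) {
      const labels = match[2] ?? "";
      const pair = `${label.key}="${label.value}"`;
      // Require the pair to start a label, so "outcome" does not match "xoutcome".
      if (!labels.includes(`{${pair}`) && !labels.includes(`,${pair}`)) continue;
    }
    const value = Number(match[3]);
    if (Number.isFinite(value)) total += value;
  }
  return total;
}
```

For example, sumMetric(body, "veil_scheduler_polls_total") would return the total poll count, and sumMetric(body, "veil_api_rate_limit_decisions_total", { key: "outcome", value: "limited" }) would count limited decisions if the metric carries such a label.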

Rate-Limit Monitoring

If rate limiting is enabled, watch for:

  • repeated 429 responses on POST /api/schedules
  • bursts of limited outcomes in veil_api_rate_limit_decisions_total
  • repeated concurrency limiting on registration requests
  • memory-fallback backend events, which indicate Redis is unavailable

In dry-run mode, the coordinator logs what would have been limited without blocking the request.
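One way to spot a burst of limited outcomes is to compare two scrapes of the decision counter. The following is a minimal sketch under stated assumptions: the names DecisionSnapshot, limitedShare, and isBurst are hypothetical, the 20% threshold is an arbitrary illustration, and how you split veil_api_rate_limit_decisions_total into limited and total counts depends on its actual labels.

```typescript
// Cumulative counter values taken from one scrape of /api/metrics.
interface DecisionSnapshot {
  limited: number; // cumulative count of limited decisions
  total: number; // cumulative count of all decisions
}

// Share of decisions that were limited between two scrapes.
// Returns null when there was no traffic or a counter reset
// (for example after a restart) makes the delta unusable.
function limitedShare(prev: DecisionSnapshot, curr: DecisionSnapshot): number | null {
  const limitedDelta = curr.limited - prev.limited;
  const totalDelta = curr.total - prev.total;
  if (totalDelta <= 0 || limitedDelta < 0) return null;
  return limitedDelta / totalDelta;
}

// Flag the interval as a burst when the limited share reaches the threshold.
// The default of 0.2 is an illustrative value, not a recommendation.
function isBurst(prev: DecisionSnapshot, curr: DecisionSnapshot, threshold = 0.2): boolean {
  const share = limitedShare(prev, curr);
  return share !== null && share >= threshold;
}
```

Working on deltas rather than raw totals keeps a long-running coordinator with a large historical count from hiding a recent spike. In dry-run mode the same comparison shows what enforcement would have blocked.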

Troubleshooting

Coordinator not executing schedules

  • check logs for execution-stage failures
  • verify database connectivity
  • verify ER authority keypair loading
  • verify Solana RPC connectivity

/api/metrics returns 403

  • confirm METRICS_PUBLIC=false is intentional
  • send Authorization: Bearer <METRICS_AUTH_TOKEN>
  • verify METRICS_AUTH_TOKEN is set on the deployed service

Database errors

  • check DATABASE_URL
  • verify PostgreSQL is reachable
  • run migrations with npm run db:migrate

Redis limiter errors

  • confirm RATE_LIMIT_REDIS_URL is valid
  • expect the first connection to be slower if your managed Redis instance was sleeping
  • if Redis is unavailable, registration falls back to in-memory limiting and read routes fail open (requests pass without limiting)
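The fallback behavior in the last bullet can be pictured with a small model. This is an illustrative sketch only, not the coordinator's implementation: every name here (Limiter, MemoryLimiter, decide, RouteKind) is hypothetical, and the limiter is synchronous for brevity where a real Redis client would be asynchronous.

```typescript
type RouteKind = "registration" | "read";

interface Decision {
  allowed: boolean;
  backend: "redis" | "memory" | "none";
}

// A limiter returns true when the request is allowed.
interface Limiter {
  check(key: string): boolean;
}

// In-memory counter standing in for the fallback backend.
class MemoryLimiter implements Limiter {
  private counts = new Map<string, number>();
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  check(key: string): boolean {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next <= this.max;
  }

  // Clear all counts, for example at each window boundary.
  reset(): void {
    this.counts.clear();
  }
}

// Try the Redis-backed limiter first. If it throws (Redis unavailable),
// registration falls back to the in-memory limiter and read routes fail open.
function decide(route: RouteKind, key: string, redis: Limiter, memory: Limiter): Decision {
  try {
    return { allowed: redis.check(key), backend: "redis" };
  } catch {
    if (route === "registration") {
      return { allowed: memory.check(key), backend: "memory" };
    }
    return { allowed: true, backend: "none" };
  }
}
```

In this model an outage never blocks reads, while registration stays limited per process. With several coordinator instances, in-memory limits are not shared between them, which is one reason to treat memory-fallback events as a signal worth alerting on.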