Metrics API

The Metrics API provides real-time analytics and aggregated metrics about events flowing through Eventara.

Get Comprehensive Metrics

Retrieve all available metrics in a single response.

Endpoint

GET /api/v1/metrics

Example Request

curl "http://localhost:8080/api/v1/metrics"

Success Response

Status Code: 200 OK

{
  "summary": {
    "totalEvents": 15000,
    "eventsPerSecond": 25.3,
    "avgLatency": 12.5,
    "errorRate": 2.1,
    "uniqueUsers": 1250,
    "uniqueSources": 8,
    "uniqueEventTypes": 15
  },
  "eventsByType": {
    "user.login": 5000,
    "payment.success": 3500,
    "payment.failed": 150,
    "order.created": 2800,
    "user.signup": 1200,
    "api.error": 320
  },
  "eventsBySource": {
    "auth-service": 6200,
    "payment-service": 3650,
    "order-service": 2800,
    "api-gateway": 1500,
    "notification-service": 850
  },
  "eventsBySeverity": {
    "INFO": 13500,
    "WARNING": 980,
    "ERROR": 420,
    "CRITICAL": 100
  },
  "latencyMetrics": {
    "overall": {
      "p50": 8.5,
      "p95": 24.3,
      "p99": 45.2,
      "avg": 12.5,
      "min": 2.1,
      "max": 89.4,
      "sampleCount": 1000
    },
    "byType": {
      "user.login": { "p50": 6.2, "p95": 18.5, "p99": 32.1, "avg": 9.3 },
      "payment.failed": { "p50": 15.3, "p95": 42.7, "p99": 68.2, "avg": 22.4 }
    },
    "bySource": {
      "auth-service": { "p50": 7.1, "p95": 19.8, "p99": 35.6, "avg": 10.5 },
      "payment-service": { "p50": 12.4, "p95": 35.2, "p99": 58.9, "avg": 18.7 }
    }
  },
  "errorMetrics": {
    "totalErrors": 520,
    "errorRate": 2.1,
    "errorsByType": {
      "payment.failed": 150,
      "api.error": 320,
      "database.timeout": 50
    },
    "errorsBySource": {
      "payment-service": 150,
      "api-gateway": 320,
      "database-service": 50
    }
  },
  "throughput": {
    "current": 25.3,
    "peak": 87.5,
    "peakTimestamp": "2026-01-12T14:23:15.789Z",
    "avg": 18.7,
    "trend": "increasing"
  },
  "timeWindows": {
    "last5Minutes": { "events": 7590, "avgLatency": 11.8, "errorRate": 1.9 },
    "last15Minutes": { "events": 22770, "avgLatency": 12.2, "errorRate": 2.0 },
    "last1Hour": { "events": 91080, "avgLatency": 12.5, "errorRate": 2.1 }
  },
  "topUsers": [
    { "userId": "user_1234", "eventCount": 245, "lastSeen": "2026-01-12T15:45:23.456Z" },
    { "userId": "user_5678", "eventCount": 198, "lastSeen": "2026-01-12T15:44:56.789Z" }
  ]
}

Response Structure

Summary Metrics

Field             Type    Description
totalEvents       number  Total number of events processed
eventsPerSecond   number  Current throughput rate
avgLatency        number  Average processing latency in milliseconds
errorRate         number  Percentage of events with ERROR or CRITICAL severity
uniqueUsers       number  Count of distinct user IDs
uniqueSources     number  Count of distinct source services
uniqueEventTypes  number  Count of distinct event types

Events by Type

Map of event types to their occurrence counts.

Record<string, number>

Events by Source

Map of source services to their event counts.

Record<string, number>

Events by Severity

Counts for each severity level.

{
  INFO: number;
  WARNING: number;
  ERROR: number;
  CRITICAL: number;
}
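The summary table describes errorRate as the percentage of events with ERROR or CRITICAL severity. A minimal sketch of that calculation from the severity counts follows; the function name errorRatePercent is hypothetical, and whether the server derives its errorRate in exactly this way is an assumption.

```typescript
// Shape of the eventsBySeverity object described above.
interface SeverityCounts {
  INFO: number;
  WARNING: number;
  ERROR: number;
  CRITICAL: number;
}

// Percentage of events whose severity is ERROR or CRITICAL.
// Returns 0 when nothing has been counted, to avoid dividing by zero.
// Assumption: this mirrors the documented field description, not a confirmed server formula.
function errorRatePercent(counts: SeverityCounts): number {
  const total = counts.INFO + counts.WARNING + counts.ERROR + counts.CRITICAL;
  if (total === 0) {
    return 0;
  }
  return ((counts.ERROR + counts.CRITICAL) * 100) / total;
}
```

Computing the rate on the client can be a useful cross-check against the errorRate value the API reports.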

Latency Metrics

Performance metrics with percentile distributions.

{
  overall: {
    p50: number;          // Median latency
    p95: number;          // 95th percentile
    p99: number;          // 99th percentile
    avg: number;          // Average
    min: number;          // Minimum
    max: number;          // Maximum
    sampleCount: number;  // Number of samples
  };
  // LatencyStats entries in the sample response carry p50, p95, p99, and avg
  byType: Record<string, LatencyStats>;    // Per event type
  bySource: Record<string, LatencyStats>;  // Per source
}
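To make the percentile fields concrete, here is a sketch that derives the same set of statistics from a list of latency samples using the nearest-rank method. The names nearestRank and summarizeLatencies are hypothetical, and the percentile method Eventara actually uses is not stated on this page, so treat this as one common definition rather than the service's implementation.

```typescript
// Field names mirror the "overall" object described above.
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
  avg: number;
  min: number;
  max: number;
  sampleCount: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array:
// the smallest sample such that at least p percent of samples are <= it.
function nearestRank(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize latency samples (milliseconds). Assumption: nearest-rank percentiles.
function summarizeLatencies(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error("at least one latency sample is required");
  }
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  let sum = 0;
  for (const value of sorted) {
    sum += value;
  }
  return {
    p50: nearestRank(sorted, 50),
    p95: nearestRank(sorted, 95),
    p99: nearestRank(sorted, 99),
    avg: sum / sorted.length,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    sampleCount: sorted.length,
  };
}
```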

Error Metrics

Error tracking and analysis.

{
  totalErrors: number;
  errorRate: number;  // Percentage
  errorsByType: Record<string, number>;
  errorsBySource: Record<string, number>;
}

Throughput

Real-time and historical throughput data.

{
  current: number;        // Events per second now
  peak: number;           // Highest recorded throughput
  peakTimestamp: string;  // When peak occurred
  avg: number;            // Average throughput
  trend: "increasing" | "decreasing" | "stable";
}

Time Windows

Metrics aggregated over different time periods.

{
  last5Minutes: WindowMetrics;
  last15Minutes: WindowMetrics;
  last1Hour: WindowMetrics;
}

interface WindowMetrics {
  events: number;
  avgLatency: number;
  errorRate: number;
}
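As an illustration of what a WindowMetrics value represents, the sketch below aggregates a list of event records over a trailing window. The EventRecord shape and the aggregateWindow function are hypothetical; the page does not describe how the server stores or scans its events, so this is only one plausible way to arrive at the three fields.

```typescript
// Hypothetical per-event record; not part of the documented API.
interface EventRecord {
  timestampMs: number;  // Epoch milliseconds when the event was processed
  latencyMs: number;    // Processing latency in milliseconds
  isError: boolean;     // True for ERROR or CRITICAL severity
}

interface WindowMetrics {
  events: number;
  avgLatency: number;
  errorRate: number;
}

// Aggregate events whose timestamp falls in (nowMs - windowMs, nowMs].
function aggregateWindow(events: EventRecord[], nowMs: number, windowMs: number): WindowMetrics {
  const cutoff = nowMs - windowMs;
  let count = 0;
  let latencySum = 0;
  let errors = 0;
  for (const event of events) {
    if (event.timestampMs > cutoff && event.timestampMs <= nowMs) {
      count += 1;
      latencySum += event.latencyMs;
      if (event.isError) {
        errors += 1;
      }
    }
  }
  return {
    events: count,
    avgLatency: count === 0 ? 0 : latencySum / count,
    errorRate: count === 0 ? 0 : (errors * 100) / count,  // Percentage
  };
}
```

Calling aggregateWindow with window lengths of 5, 15, and 60 minutes would yield values shaped like last5Minutes, last15Minutes, and last1Hour.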

Top Users

Most active users by event count.

Array<{
  userId: string;
  eventCount: number;
  lastSeen: string;  // ISO 8601 timestamp
}>
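The ranking behind a list like this can be sketched as a count per user followed by a descending sort. The rankTopUsers function and its input shape are hypothetical, and the sketch assumes every timestamp is an ISO 8601 string in UTC with the same precision, so that plain string comparison orders them correctly.

```typescript
interface UserActivity {
  userId: string;
  eventCount: number;
  lastSeen: string;  // ISO 8601 timestamp
}

// Hypothetical input: one entry per event, with the user it belongs to.
interface UserEvent {
  userId: string;
  timestamp: string;  // Assumed ISO 8601 in UTC, uniform format
}

// Count events per user, track the latest timestamp, and return the
// `limit` most active users in descending order of event count.
function rankTopUsers(events: UserEvent[], limit: number): UserActivity[] {
  const byUser = new Map<string, UserActivity>();
  for (const event of events) {
    const entry = byUser.get(event.userId);
    if (entry === undefined) {
      byUser.set(event.userId, { userId: event.userId, eventCount: 1, lastSeen: event.timestamp });
    } else {
      entry.eventCount += 1;
      if (event.timestamp > entry.lastSeen) {
        entry.lastSeen = event.timestamp;
      }
    }
  }
  return Array.from(byUser.values())
    .sort((a, b) => b.eventCount - a.eventCount)
    .slice(0, limit);
}
```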

Reset Metrics

Reset all in-memory metrics to zero.

Endpoint

POST /api/v1/metrics/reset

Example Request

curl -X POST "http://localhost:8080/api/v1/metrics/reset"

Success Response

Status Code: 200 OK

"Metrics reset successfully"

Important Notes

  • This endpoint resets all counters, latency tracking, and aggregations
  • Historical data in PostgreSQL is NOT affected
  • Use with caution, especially in production
  • Primarily intended for testing and development
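Because a reset discards every in-memory counter, a caller may want a guard in front of the request. The sketch below is one possible client-side safeguard; the buildResetRequest function, the environment label, and the confirmed flag are all hypothetical and not part of the Eventara API.

```typescript
interface ResetRequest {
  method: "POST";
  url: string;
}

// Build the reset request, refusing to do so for a production environment
// unless the caller has explicitly confirmed. The environment names are an
// assumption of this sketch; adapt them to your own deployment labels.
function buildResetRequest(baseUrl: string, environment: string, confirmed: boolean): ResetRequest {
  if (environment === "production" && !confirmed) {
    throw new Error("Refusing to reset metrics in production without explicit confirmation");
  }
  // Strip trailing slashes so the path is not doubled up.
  const trimmedBase = baseUrl.replace(/\/+$/, "");
  return { method: "POST", url: trimmedBase + "/api/v1/metrics/reset" };
}
```

The returned object can be handed to any HTTP client, for example fetch(request.url, { method: request.method }).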

Metrics Update Frequency

Real-Time Updates

Metrics are updated immediately as each event moves through the processing pipeline:

  • Event consumption from Kafka
  • Persistence to PostgreSQL
  • Metrics aggregation

WebSocket Broadcast

Metrics are broadcast to connected dashboard clients:

  • Frequency: Every 1 second
  • Protocol: STOMP over WebSocket
  • Topic: /topic/metrics
  • See WebSocket API for details
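For readers unfamiliar with STOMP, the sketch below shows what a subscription to /topic/metrics looks like at the frame level, following the STOMP 1.2 frame layout (command line, header lines, blank line, body, NUL terminator). The helper names and the subscription id are hypothetical, header escaping and content-length are ignored, and the assumption that each broadcast body is a JSON metrics document should be checked against the WebSocket API page. In practice a STOMP client library would normally handle this.

```typescript
// Serialize a STOMP frame: command, headers, blank line, body, NUL byte.
function buildStompFrame(command: string, headers: Record<string, string>, body: string = ""): string {
  const headerLines = Object.keys(headers).map((key) => key + ":" + headers[key]);
  return [command, ...headerLines, "", body].join("\n") + "\0";
}

// Build a SUBSCRIBE frame for the metrics topic named above.
// "sub-0" is an arbitrary, client-chosen subscription id.
function buildMetricsSubscribeFrame(subscriptionId: string = "sub-0"): string {
  return buildStompFrame("SUBSCRIBE", {
    id: subscriptionId,
    destination: "/topic/metrics",
  });
}

// Extract the body of a received frame (text after the first blank line,
// up to the NUL terminator). Simplified: no content-length handling.
function stompBody(frame: string): string {
  const headerEnd = frame.indexOf("\n\n");
  const bodyEnd = frame.indexOf("\0", headerEnd);
  if (headerEnd === -1 || bodyEnd === -1) {
    throw new Error("malformed STOMP frame");
  }
  return frame.slice(headerEnd + 2, bodyEnd);
}
```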

Scheduled Aggregation

Some aggregations run on a schedule:

  • Frequency: Every 60 seconds
  • Purpose: Time-window calculations, trending analysis

Metrics Retention

In-Memory Storage

  • Duration: Last 24 hours
  • Latency samples: Last 1000 measurements
  • Event timestamps: Sliding window
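Keeping only the last 1000 latency measurements implies some bounded buffer in which new samples displace the oldest. A minimal ring-buffer sketch of that idea follows; the BoundedSamples class is hypothetical and is not taken from the Eventara source.

```typescript
// Fixed-capacity sample store: once full, each new sample overwrites the oldest.
class BoundedSamples {
  private readonly capacity: number;
  private readonly samples: number[] = [];
  private next = 0;  // Index the next sample will be written to

  constructor(capacity: number) {
    if (capacity <= 0) {
      throw new Error("capacity must be positive");
    }
    this.capacity = capacity;
  }

  add(value: number): void {
    if (this.samples.length < this.capacity) {
      this.samples.push(value);
    } else {
      this.samples[this.next] = value;  // Overwrite the oldest sample
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Copy of the retained samples, in storage order (not insertion order).
  values(): number[] {
    return this.samples.slice();
  }

  get size(): number {
    return this.samples.length;
  }
}
```

With a capacity of 1000, memory for latency samples stays constant no matter how many events are processed, which matches the cap mentioned above.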

Persistence

  • All events permanently stored in PostgreSQL
  • Historical metrics can be recalculated from event data
  • In-memory metrics lost on application restart

Performance Considerations

Response Time

  • Typical response: < 10ms
  • All metrics served from memory
  • No database queries required

Data Volume

  • Metrics update on every event (no sampling)
  • Memory usage scales with:
    • Number of unique event types
    • Number of unique sources
    • Number of unique users
    • Latency sample size (capped at 1000)

Scalability Limitations

Current implementation uses in-memory aggregation:

  • Single instance only (not horizontally scalable)
  • Lost on restart
  • Limited by JVM heap size

Future Enhancement: Redis-backed metrics for distributed deployments.


Use Cases

Monitoring Dashboard

Poll this endpoint for periodic updates or use WebSocket for real-time streaming.
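When polling, a dashboard often needs the change between two successive responses rather than the raw totals. The sketch below compares two snapshots and also detects the case where totalEvents has dropped, which would happen after a metrics reset or an application restart. The PolledSnapshot shape, the takenAtMs client-side timestamp, and the diffSnapshots function are hypothetical.

```typescript
// Hypothetical client-side record of one polled response.
interface PolledSnapshot {
  takenAtMs: number;    // Client clock (epoch ms) when the response arrived
  totalEvents: number;  // summary.totalEvents from that response
}

interface PollDelta {
  newEvents: number;
  eventsPerSecond: number;
  counterReset: boolean;
}

// Compare two snapshots. If the counter went backwards, assume the
// metrics were reset and count everything in `next` as new.
function diffSnapshots(prev: PolledSnapshot, next: PolledSnapshot): PollDelta {
  const elapsedSeconds = (next.takenAtMs - prev.takenAtMs) / 1000;
  const counterReset = next.totalEvents < prev.totalEvents;
  const newEvents = counterReset ? next.totalEvents : next.totalEvents - prev.totalEvents;
  return {
    newEvents,
    eventsPerSecond: elapsedSeconds > 0 ? newEvents / elapsedSeconds : 0,
    counterReset,
  };
}
```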

External Monitoring

Integrate with external monitoring systems (Prometheus, Grafana, Datadog):

# Example: fetch the summary metrics every 30 minutes
# (cron cannot schedule intervals shorter than one minute)
*/30 * * * * curl -s http://eventara:8080/api/v1/metrics | jq '.summary'
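The endpoint returns JSON, so a system such as Prometheus would need a small adapter in front of it. The sketch below converts the summary object into the Prometheus text exposition format (HELP, TYPE, and sample lines). The eventara_ metric names and the toPrometheusText function are hypothetical, and every value is exposed as a gauge because the counters can be reset.

```typescript
// Fields of the "summary" object described earlier on this page.
interface SummaryMetrics {
  totalEvents: number;
  eventsPerSecond: number;
  avgLatency: number;
  errorRate: number;
  uniqueUsers: number;
  uniqueSources: number;
  uniqueEventTypes: number;
}

// Render the summary as Prometheus text exposition lines.
// Metric names are illustrative; choose your own naming scheme.
function toPrometheusText(summary: SummaryMetrics): string {
  const gauges: Array<[string, string, number]> = [
    ["eventara_total_events", "Total number of events processed", summary.totalEvents],
    ["eventara_events_per_second", "Current throughput rate", summary.eventsPerSecond],
    ["eventara_avg_latency_ms", "Average processing latency in milliseconds", summary.avgLatency],
    ["eventara_error_rate_percent", "Percentage of events with ERROR or CRITICAL severity", summary.errorRate],
    ["eventara_unique_users", "Count of distinct user IDs", summary.uniqueUsers],
    ["eventara_unique_sources", "Count of distinct source services", summary.uniqueSources],
    ["eventara_unique_event_types", "Count of distinct event types", summary.uniqueEventTypes],
  ];
  const lines: string[] = [];
  for (const [name, help, value] of gauges) {
    lines.push("# HELP " + name + " " + help);
    lines.push("# TYPE " + name + " gauge");
    lines.push(name + " " + String(value));
  }
  return lines.join("\n") + "\n";
}
```

An adapter process could serve this text on its own HTTP endpoint for the monitoring system to scrape.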

Alerting

Use metrics to trigger custom alerts:

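// Placeholder: replace with your own notification function
const sendAlert = (message) => console.error(message);

// Top-level await requires an ES module context
const metrics = await fetch('http://localhost:8080/api/v1/metrics').then(r => r.json());

// Alert when more than 5% of events are errors
if (metrics.summary.errorRate > 5.0) {
  sendAlert('High error rate detected: ' + metrics.summary.errorRate + '%');
}

// Alert on low throughput once enough events have been processed
if (metrics.throughput.current < 10 && metrics.summary.totalEvents > 1000) {
  sendAlert('Low throughput: ' + metrics.throughput.current + ' events/sec');
}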

Capacity Planning

Analyze throughput and latency trends:

# Monitor peak throughput
curl -s http://localhost:8080/api/v1/metrics | jq '.throughput'

# Check p99 latency
curl -s http://localhost:8080/api/v1/metrics | jq '.latencyMetrics.overall.p99'
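One way to turn the throughput object into a capacity figure is to compare the recorded peak against a known ceiling. In the sketch below, capacityEventsPerSecond is a hypothetical number you would obtain from your own load testing; it is not something the Metrics API reports, and the throughputHeadroom function is likewise illustrative.

```typescript
// Subset of the "throughput" object described earlier on this page.
interface ThroughputSnapshot {
  current: number;
  peak: number;
  avg: number;
}

interface Headroom {
  peakUtilization: number;          // Fraction of capacity used at peak (0 to 1 and above)
  headroomEventsPerSecond: number;  // Capacity left over at peak; negative if exceeded
}

// Compare the recorded peak with a capacity figure from load testing.
function throughputHeadroom(throughput: ThroughputSnapshot, capacityEventsPerSecond: number): Headroom {
  if (capacityEventsPerSecond <= 0) {
    throw new Error("capacity must be positive");
  }
  return {
    peakUtilization: throughput.peak / capacityEventsPerSecond,
    headroomEventsPerSecond: capacityEventsPerSecond - throughput.peak,
  };
}
```

Tracking peakUtilization over time gives an early signal that the single-instance deployment is approaching its limit.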