Metrics API

The Metrics API provides real-time analytics and aggregated metrics about events flowing through Eventara.

Get Comprehensive Metrics

Retrieve all available metrics in a single response.

Endpoint


GET /api/v1/metrics

Example Request


curl "http://localhost:8080/api/v1/metrics"

Success Response

Status Code: 200 OK


{
  "summary": {
    "totalEvents": 15000,
    "eventsPerSecond": 25.3,
    "avgLatency": 12.5,
    "errorRate": 2.1,
    "uniqueUsers": 1250,
    "uniqueSources": 8,
    "uniqueEventTypes": 15
  },
  "eventsByType": {
    "user.login": 5000,
    "payment.success": 3500,
    "payment.failed": 150,
    "order.created": 2800,
    "user.signup": 1200,
    "api.error": 320
  },
  "eventsBySource": {
    "auth-service": 6200,
    "payment-service": 3650,
    "order-service": 2800,
    "api-gateway": 1500,
    "notification-service": 850
  },
  "eventsBySeverity": {
    "INFO": 13500,
    "WARNING": 980,
    "ERROR": 420,
    "CRITICAL": 100
  },
  "latencyMetrics": {
    "overall": {
      "p50": 8.5,
      "p95": 24.3,
      "p99": 45.2,
      "avg": 12.5,
      "min": 2.1,
      "max": 89.4,
      "sampleCount": 1000
    },
    "byType": {
      "user.login": {
        "p50": 6.2,
        "p95": 18.5,
        "p99": 32.1,
        "avg": 9.3
      },
      "payment.failed": {
        "p50": 15.3,
        "p95": 42.7,
        "p99": 68.2,
        "avg": 22.4
      }
    },
    "bySource": {
      "auth-service": {
        "p50": 7.1,
        "p95": 19.8,
        "p99": 35.6,
        "avg": 10.5
      },
      "payment-service": {
        "p50": 12.4,
        "p95": 35.2,
        "p99": 58.9,
        "avg": 18.7
      }
    }
  },
  "errorMetrics": {
    "totalErrors": 520,
    "errorRate": 2.1,
    "errorsByType": {
      "payment.failed": 150,
      "api.error": 320,
      "database.timeout": 50
    },
    "errorsBySource": {
      "payment-service": 150,
      "api-gateway": 320,
      "database-service": 50
    }
  },
  "throughput": {
    "current": 25.3,
    "peak": 87.5,
    "peakTimestamp": "2026-01-12T14:23:15.789Z",
    "avg": 18.7,
    "trend": "increasing"
  },
  "timeWindows": {
    "last5Minutes": {
      "events": 7590,
      "avgLatency": 11.8,
      "errorRate": 1.9
    },
    "last15Minutes": {
      "events": 22770,
      "avgLatency": 12.2,
      "errorRate": 2.0
    },
    "last1Hour": {
      "events": 91080,
      "avgLatency": 12.5,
      "errorRate": 2.1
    }
  },
  "topUsers": [
    {
      "userId": "user_1234",
      "eventCount": 245,
      "lastSeen": "2026-01-12T15:45:23.456Z"
    },
    {
      "userId": "user_5678",
      "eventCount": 198,
      "lastSeen": "2026-01-12T15:44:56.789Z"
    }
  ]
}

Response Structure

Summary Metrics

Field	Type	Description
totalEvents	number	Total number of events processed
eventsPerSecond	number	Current throughput in events per second
avgLatency	number	Average processing latency in milliseconds
errorRate	number	Percentage of events with ERROR or CRITICAL severity
uniqueUsers	number	Count of distinct user IDs
uniqueSources	number	Count of distinct source services
uniqueEventTypes	number	Count of distinct event types

Events by Type

Map of event types to their occurrence counts.


Record<string, number>

Events by Source

Map of source services to their event counts.


Record<string, number>

Events by Severity

Counts for each severity level.


{
  INFO: number;
  WARNING: number;
  ERROR: number;
  CRITICAL: number;
}
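
The summary field errorRate is described as the percentage of events with ERROR or CRITICAL severity. As a worked example of that definition, the sketch below derives the percentage from severity counts. The helper name and the sample counts are assumptions made for illustration, and the server may round or window the value differently.

```typescript
// Severity counts in the shape of the eventsBySeverity field.
interface SeverityCounts {
  INFO: number;
  WARNING: number;
  ERROR: number;
  CRITICAL: number;
}

// Hypothetical helper: percentage of events with ERROR or CRITICAL severity.
function errorRatePercent(counts: SeverityCounts): number {
  const total = counts.INFO + counts.WARNING + counts.ERROR + counts.CRITICAL;
  if (total === 0) return 0; // avoid dividing by zero before any events arrive
  return ((counts.ERROR + counts.CRITICAL) / total) * 100;
}

// Made-up counts: 5 of 100 events are ERROR or CRITICAL.
const rate = errorRatePercent({ INFO: 90, WARNING: 5, ERROR: 4, CRITICAL: 1 });
console.log(rate.toFixed(1) + "%"); // 5.0%
```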

Latency Metrics

Performance metrics with percentile distributions.


{
  overall: {
    p50: number;    // Median latency
    p95: number;    // 95th percentile
    p99: number;    // 99th percentile
    avg: number;    // Average
    min: number;    // Minimum
    max: number;    // Maximum
    sampleCount: number;  // Number of samples
  };
  byType: Record<string, LatencyStats>;    // Per event type
  bySource: Record<string, LatencyStats>;  // Per source
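}

interface LatencyStats {  // Shape of the byType and bySource entries in the example response
  p50: number;
  p95: number;
  p99: number;
  avg: number;
}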
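
To make the percentile fields concrete, here is a minimal sketch that summarizes a list of raw latency samples into the fields of the overall object. It assumes the nearest-rank percentile method; this page does not say which method Eventara uses, so treat the helper names and the method as illustrative only.

```typescript
// Fields of latencyMetrics.overall, in milliseconds.
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
  avg: number;
  min: number;
  max: number;
  sampleCount: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array (assumed method).
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical helper: summarize raw latency samples, or null when there are none.
function summarize(samples: number[]): LatencySummary | null {
  if (samples.length === 0) return null;
  const sorted = samples.slice().sort((a, b) => a - b); // copy, then numeric sort
  const sum = sorted.reduce((acc, value) => acc + value, 0);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    avg: sum / sorted.length,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    sampleCount: sorted.length,
  };
}

// Made-up samples: the integers 1 through 100, read as milliseconds.
const samples: number[] = [];
for (let ms = 1; ms <= 100; ms++) samples.push(ms);
const stats = summarize(samples);
if (stats) console.log(stats.p50, stats.p95, stats.p99, stats.avg); // 50 95 99 50.5
```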

Error Metrics

Error tracking and analysis.


{
  totalErrors: number;
  errorRate: number;  // Percentage
  errorsByType: Record<string, number>;
  errorsBySource: Record<string, number>;
}

Throughput

Real-time and historical throughput data.


{
  current: number;     // Events per second now
  peak: number;        // Highest recorded throughput
  peakTimestamp: string;  // When peak occurred
  avg: number;         // Average throughput
  trend: "increasing" | "decreasing" | "stable";
}

Time Windows

Metrics aggregated over different time periods.


{
  last5Minutes: WindowMetrics;
  last15Minutes: WindowMetrics;
  last1Hour: WindowMetrics;
}
 
interface WindowMetrics {
  events: number;
  avgLatency: number;
  errorRate: number;
}
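
A window's event count can be turned into an average rate by dividing by the window length. The sketch below does this for the three window names above; the helper and constant names are assumptions made for illustration.

```typescript
// Same fields as the WindowMetrics interface above.
interface WindowMetrics {
  events: number;
  avgLatency: number;
  errorRate: number;
}

// Window lengths in seconds, keyed by the field names in timeWindows.
const WINDOW_SECONDS = {
  last5Minutes: 5 * 60,
  last15Minutes: 15 * 60,
  last1Hour: 60 * 60,
} as const;

type WindowName = keyof typeof WINDOW_SECONDS;

// Hypothetical helper: average events per second over one window.
function averageRate(name: WindowName, metrics: WindowMetrics): number {
  return metrics.events / WINDOW_SECONDS[name];
}

// Values from the example response: 7590 events in the last 5 minutes.
const rate5m = averageRate("last5Minutes", { events: 7590, avgLatency: 11.8, errorRate: 1.9 });
console.log(rate5m.toFixed(1) + " events/sec"); // 25.3 events/sec
```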

Top Users

Most active users by event count.


Array<{
  userId: string;
  eventCount: number;
  lastSeen: string;  // ISO 8601 timestamp
}>

Reset Metrics

Reset all in-memory metrics to zero.

Endpoint


POST /api/v1/metrics/reset

Example Request


curl -X POST "http://localhost:8080/api/v1/metrics/reset"

Success Response

Status Code: 200 OK


"Metrics reset successfully"

Important Notes

This endpoint resets all counters, latency tracking, and aggregations
Historical data in PostgreSQL is NOT affected
Use with caution, especially in production
Primarily intended for testing and development

Metrics Update Frequency

Real-Time Updates

Metrics are updated immediately as events are processed:

Event consumption from Kafka
Persistence to PostgreSQL
Metrics aggregation

WebSocket Broadcast

Metrics are broadcast to connected dashboard clients:

Frequency: Every 1 second
Protocol: STOMP over WebSocket
Topic: /topic/metrics
See WebSocket API for details
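
For readers new to STOMP, the sketch below builds the SUBSCRIBE frame a client would send for the /topic/metrics destination, following the STOMP 1.2 frame layout (command line, header lines, blank line, body, NUL terminator). The subscription id is an arbitrary value chosen for this example, and the WebSocket URL and CONNECT handshake are omitted because this page does not specify them. In practice a STOMP client library usually handles framing.

```typescript
// Build a STOMP frame: command, header lines, blank line, body, NUL terminator.
function stompFrame(command: string, headers: Record<string, string>, body = ""): string {
  const headerLines = Object.keys(headers).map((key) => `${key}:${headers[key]}`);
  return [command, ...headerLines, "", body].join("\n") + "\0";
}

// SUBSCRIBE needs a client-chosen id and a destination.
// "metrics-sub-0" is an arbitrary id picked for this example.
const subscribeFrame = stompFrame("SUBSCRIBE", {
  id: "metrics-sub-0",
  destination: "/topic/metrics",
});

// JSON.stringify makes the newlines and the trailing NUL visible when printed.
console.log(JSON.stringify(subscribeFrame));
```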

Scheduled Aggregation

Some aggregations run on a schedule:

Frequency: Every 60 seconds
Purpose: Time-window calculations, trending analysis

Metrics Retention

In-Memory Storage

Duration: Last 24 hours
Latency samples: Last 1000 measurements
Event timestamps: Sliding window
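
One common way to keep only the last 1000 latency measurements is a fixed-size ring buffer. The class below is a sketch of that idea, not Eventara's actual data structure, which this page does not describe; the class and method names are assumptions.

```typescript
// Keeps only the most recent `capacity` values; older ones are overwritten.
class RecentSamples {
  private readonly values: number[] = [];
  private next = 0; // index the next write overwrites once the buffer is full

  constructor(private readonly capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
  }

  add(value: number): void {
    if (this.values.length < this.capacity) {
      this.values.push(value);
    } else {
      this.values[this.next] = value; // overwrite the oldest sample
    }
    this.next = (this.next + 1) % this.capacity;
  }

  size(): number {
    return this.values.length;
  }

  // Samples ordered from oldest to newest.
  toArray(): number[] {
    if (this.values.length < this.capacity) return this.values.slice();
    return this.values.slice(this.next).concat(this.values.slice(0, this.next));
  }
}

// The cap of 1000 comes from the retention note above; the samples are made up.
const latencies = new RecentSamples(1000);
for (let i = 1; i <= 1500; i++) latencies.add(i);
console.log(latencies.size());       // 1000
console.log(latencies.toArray()[0]); // 501
```

Memory use stays constant once the buffer is full, which matches the cap described above.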

Persistence

All events permanently stored in PostgreSQL
Historical metrics can be recalculated from event data
In-memory metrics lost on application restart

Performance Considerations

Response Time

Typical response time: under 10 ms
All metrics served from memory
No database queries required

Data Volume

Metrics update on every event (no sampling)
Memory usage scales with:
- Number of unique event types
- Number of unique sources
- Number of unique users
- Latency sample size (capped at 1000)

Scalability Limitations

The current implementation aggregates metrics in memory, which imposes these limits:

Single instance only (not horizontally scalable)
Metrics are lost on restart
Capacity is limited by JVM heap size

Future Enhancement: Redis-backed metrics for distributed deployments.

Use Cases

Monitoring Dashboard

Poll GET /api/v1/metrics for periodic updates, or subscribe to the /topic/metrics WebSocket topic for real-time streaming.
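
When polling, a dashboard often wants the number of events processed since the previous poll. Because in-memory metrics return to zero after a restart or a call to the reset endpoint, a drop in totalEvents has to be handled. The sketch below shows one way to do that; the helper name is an assumption, and it treats any decrease as a reset.

```typescript
// Only the field this sketch needs from the summary object.
interface SummarySnapshot {
  totalEvents: number;
}

// Hypothetical helper: events processed between two polls.
// A drop in totalEvents is treated as a counter reset, in which case
// everything counted since the reset is reported.
function eventsSince(previous: SummarySnapshot, current: SummarySnapshot): number {
  if (current.totalEvents < previous.totalEvents) {
    return current.totalEvents;
  }
  return current.totalEvents - previous.totalEvents;
}

console.log(eventsSince({ totalEvents: 15000 }, { totalEvents: 15120 })); // 120
console.log(eventsSince({ totalEvents: 15000 }, { totalEvents: 40 }));    // 40
```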

External Monitoring

Integrate with external monitoring systems (Prometheus, Grafana, Datadog):


# Example crontab entry: fetch the summary every 30 minutes (cron cannot schedule intervals shorter than one minute)
*/30 * * * * curl -s http://eventara:8080/api/v1/metrics | jq '.summary'
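
Prometheus scrapes a plain-text exposition format rather than JSON, so one option is a small adapter that converts the summary object. The sketch below renders four summary fields as gauges in that text format. The eventara_* metric names and the helper name are made up for this example; Eventara does not document a Prometheus endpoint on this page.

```typescript
// Summary fields used by this sketch.
interface MetricsSummary {
  totalEvents: number;
  eventsPerSecond: number;
  avgLatency: number;
  errorRate: number;
}

// Render summary fields as gauges in the Prometheus text exposition format.
// Metric names are hypothetical; gauges are used because the values can reset.
function toPrometheusText(summary: MetricsSummary): string {
  const gauges: Array<[string, string, number]> = [
    ["eventara_events_processed", "Total number of events processed", summary.totalEvents],
    ["eventara_events_per_second", "Current throughput", summary.eventsPerSecond],
    ["eventara_avg_latency_milliseconds", "Average processing latency", summary.avgLatency],
    ["eventara_error_rate_percent", "Percentage of ERROR or CRITICAL events", summary.errorRate],
  ];
  const lines: string[] = [];
  for (const [name, help, value] of gauges) {
    lines.push(`# HELP ${name} ${help}`);
    lines.push(`# TYPE ${name} gauge`);
    lines.push(`${name} ${value}`);
  }
  return lines.join("\n") + "\n";
}

// Values from the example response.
const exposition = toPrometheusText({
  totalEvents: 15000,
  eventsPerSecond: 25.3,
  avgLatency: 12.5,
  errorRate: 2.1,
});
console.log(exposition);
```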

Alerting

Use metrics to trigger custom alerts:


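// sendAlert is a placeholder; wire it to your own notifier
const sendAlert = (message) => console.warn(message);

async function checkMetrics() {
  const response = await fetch('http://localhost:8080/api/v1/metrics');
  if (!response.ok) throw new Error('Metrics request failed: HTTP ' + response.status);
  const metrics = await response.json();

  if (metrics.summary.errorRate > 5.0) {  // errorRate is a percentage
    sendAlert('High error rate detected: ' + metrics.summary.errorRate + '%');
  }
  if (metrics.throughput.current < 10 && metrics.summary.totalEvents > 1000) {
    sendAlert('Low throughput: ' + metrics.throughput.current + ' events/sec');
  }
}

checkMetrics().catch(console.error);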

Capacity Planning

Analyze throughput and latency trends:


# Monitor peak throughput
curl -s http://localhost:8080/api/v1/metrics | jq '.throughput'
 
# Check p99 latency
curl -s http://localhost:8080/api/v1/metrics | jq '.latencyMetrics.overall.p99'
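
Two simple ratios derived from the throughput object can help with capacity planning: how close current load is to the recorded peak, and how bursty traffic is relative to the average. The sketch below computes both; the helper names are assumptions, and the figures come from the example response.

```typescript
// Numeric fields of the throughput object, in events per second.
interface Throughput {
  current: number;
  peak: number;
  avg: number;
}

// Hypothetical helper: current throughput as a percentage of the recorded peak.
function peakUtilizationPercent(t: Throughput): number {
  if (t.peak <= 0) return 0; // no peak recorded yet
  return (t.current / t.peak) * 100;
}

// Hypothetical helper: how many times larger the peak is than the average.
function peakToAverageRatio(t: Throughput): number {
  if (t.avg <= 0) return 0; // no average available yet
  return t.peak / t.avg;
}

// Values from the example response.
const throughput: Throughput = { current: 25.3, peak: 87.5, avg: 18.7 };
console.log(peakUtilizationPercent(throughput).toFixed(1) + "% of peak"); // 28.9% of peak
console.log(peakToAverageRatio(throughput).toFixed(2) + "x average");     // 4.68x average
```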