Metrics API
The Metrics API provides real-time analytics and aggregated metrics about events flowing through Eventara.
Get Comprehensive Metrics
Retrieve all available metrics in a single response.
Endpoint
GET /api/v1/metrics
Example Request
curl "http://localhost:8080/api/v1/metrics"
Success Response
Status Code: 200 OK
{
  "summary": {
    "totalEvents": 15000,
    "eventsPerSecond": 25.3,
    "avgLatency": 12.5,
    "errorRate": 2.1,
    "uniqueUsers": 1250,
    "uniqueSources": 8,
    "uniqueEventTypes": 15
  },
  "eventsByType": {
    "user.login": 5000,
    "payment.success": 3500,
    "payment.failed": 150,
    "order.created": 2800,
    "user.signup": 1200,
    "api.error": 320
  },
  "eventsBySource": {
    "auth-service": 6200,
    "payment-service": 3650,
    "order-service": 2800,
    "api-gateway": 1500,
    "notification-service": 850
  },
  "eventsBySeverity": {
    "INFO": 13500,
    "WARNING": 980,
    "ERROR": 420,
    "CRITICAL": 100
  },
  "latencyMetrics": {
    "overall": {
      "p50": 8.5,
      "p95": 24.3,
      "p99": 45.2,
      "avg": 12.5,
      "min": 2.1,
      "max": 89.4,
      "sampleCount": 1000
    },
    "byType": {
      "user.login": {
        "p50": 6.2,
        "p95": 18.5,
        "p99": 32.1,
        "avg": 9.3
      },
      "payment.failed": {
        "p50": 15.3,
        "p95": 42.7,
        "p99": 68.2,
        "avg": 22.4
      }
    },
    "bySource": {
      "auth-service": {
        "p50": 7.1,
        "p95": 19.8,
        "p99": 35.6,
        "avg": 10.5
      },
      "payment-service": {
        "p50": 12.4,
        "p95": 35.2,
        "p99": 58.9,
        "avg": 18.7
      }
    }
  },
  "errorMetrics": {
    "totalErrors": 520,
    "errorRate": 2.1,
    "errorsByType": {
      "payment.failed": 150,
      "api.error": 320,
      "database.timeout": 50
    },
    "errorsBySource": {
      "payment-service": 150,
      "api-gateway": 320,
      "database-service": 50
    }
  },
  "throughput": {
    "current": 25.3,
    "peak": 87.5,
    "peakTimestamp": "2026-01-12T14:23:15.789Z",
    "avg": 18.7,
    "trend": "increasing"
  },
  "timeWindows": {
    "last5Minutes": {
      "events": 7590,
      "avgLatency": 11.8,
      "errorRate": 1.9
    },
    "last15Minutes": {
      "events": 22770,
      "avgLatency": 12.2,
      "errorRate": 2.0
    },
    "last1Hour": {
      "events": 91080,
      "avgLatency": 12.5,
      "errorRate": 2.1
    }
  },
  "topUsers": [
    {
      "userId": "user_1234",
      "eventCount": 245,
      "lastSeen": "2026-01-12T15:45:23.456Z"
    },
    {
      "userId": "user_5678",
      "eventCount": 198,
      "lastSeen": "2026-01-12T15:44:56.789Z"
    }
  ]
}
Response Structure
Summary Metrics
| Field | Type | Description |
|---|---|---|
| totalEvents | number | Total number of events processed |
| eventsPerSecond | number | Current throughput rate |
| avgLatency | number | Average processing latency in milliseconds |
| errorRate | number | Percentage of events with ERROR or CRITICAL severity |
| uniqueUsers | number | Count of distinct user IDs |
| uniqueSources | number | Count of distinct source services |
| uniqueEventTypes | number | Count of distinct event types |
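The fields in the table above map directly onto a typed structure. As a minimal sketch, a TypeScript client might validate the `summary` object before relying on it. `SummaryMetrics` and `parseSummary` are hypothetical names used for illustration and are not part of Eventara.

```typescript
// Shape of the `summary` object, taken from the field table above.
interface SummaryMetrics {
  totalEvents: number;
  eventsPerSecond: number;
  avgLatency: number; // milliseconds
  errorRate: number; // percentage
  uniqueUsers: number;
  uniqueSources: number;
  uniqueEventTypes: number;
}

const SUMMARY_FIELDS = [
  "totalEvents",
  "eventsPerSecond",
  "avgLatency",
  "errorRate",
  "uniqueUsers",
  "uniqueSources",
  "uniqueEventTypes",
] as const;

// Hypothetical helper: checks that every documented field is a finite number
// and returns a typed copy; throws on the first field that is not.
function parseSummary(raw: unknown): SummaryMetrics {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("summary must be an object");
  }
  const record = raw as Record<string, unknown>;
  const result = {} as Record<(typeof SUMMARY_FIELDS)[number], number>;
  for (const field of SUMMARY_FIELDS) {
    const value = record[field];
    if (typeof value !== "number" || !isFinite(value)) {
      throw new Error("summary." + field + " must be a finite number");
    }
    result[field] = value;
  }
  return result;
}
```

A caller would pass the `summary` property of the parsed response body, for example `parseSummary(body.summary)`.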
Events by Type
Map of event types to their occurrence counts.
Record<string, number>
Events by Source
Map of source services to their event counts.
Record<string, number>
Events by Severity
Counts for each severity level.
{
  INFO: number;
  WARNING: number;
  ERROR: number;
  CRITICAL: number;
}
Latency Metrics
Performance metrics with percentile distributions.
{
  overall: {
    p50: number; // Median latency
    p95: number; // 95th percentile
    p99: number; // 99th percentile
    avg: number; // Average
    min: number; // Minimum
    max: number; // Maximum
    sampleCount: number; // Number of samples
  };
  byType: Record<string, LatencyStats>; // Per event type
  bySource: Record<string, LatencyStats>; // Per source
}
// Fields as they appear in the byType and bySource entries of the example response
interface LatencyStats {
  p50: number;
  p95: number;
  p99: number;
  avg: number;
}
Error Metrics
Error tracking and analysis.
{
  totalErrors: number;
  errorRate: number; // Percentage
  errorsByType: Record<string, number>;
  errorsBySource: Record<string, number>;
}
Throughput
Real-time and historical throughput data.
{
  current: number; // Events per second now
  peak: number; // Highest recorded throughput
  peakTimestamp: string; // When peak occurred
  avg: number; // Average throughput
  trend: "increasing" | "decreasing" | "stable";
}
Time Windows
Metrics aggregated over different time periods.
{
  last5Minutes: WindowMetrics;
  last15Minutes: WindowMetrics;
  last1Hour: WindowMetrics;
}
interface WindowMetrics {
  events: number;
  avgLatency: number;
  errorRate: number;
}
Top Users
Most active users by event count.
Array<{
  userId: string;
  eventCount: number;
  lastSeen: string; // ISO 8601 timestamp
}>
Reset Metrics
Reset all in-memory metrics to zero.
Endpoint
POST /api/v1/metrics/reset
Example Request
curl -X POST "http://localhost:8080/api/v1/metrics/reset"
Success Response
Status Code: 200 OK
"Metrics reset successfully"
Important Notes
- This endpoint resets all counters, latency tracking, and aggregations
- Historical data in PostgreSQL is NOT affected
- Use with caution, especially in production
- Primarily intended for testing and development
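Because a reset discards all in-memory aggregates, a client-side guard can help avoid accidental calls against production. The sketch below is one possible approach, not an Eventara feature: `buildResetRequest`, the `environment` argument, and the `allowProduction` flag are all hypothetical.

```typescript
// Hypothetical description of the HTTP call to make; not an Eventara type.
interface ResetRequest {
  url: string;
  method: "POST";
}

// Builds the reset request, refusing to do so for a production environment
// unless the caller opts in explicitly.
function buildResetRequest(
  baseUrl: string,
  environment: string,
  allowProduction: boolean = false
): ResetRequest {
  if (environment === "production" && !allowProduction) {
    throw new Error("Refusing to reset metrics in production without allowProduction=true");
  }
  // Strip trailing slashes so the path is not doubled up.
  const root = baseUrl.replace(/\/+$/, "");
  return { url: root + "/api/v1/metrics/reset", method: "POST" };
}
```

The returned object can be handed to any HTTP client, for example `fetch(request.url, { method: request.method })`.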
Metrics Update Frequency
Real-Time Updates
Metrics are updated immediately as each event moves through the processing pipeline:
- Event consumption from Kafka
- Persistence to PostgreSQL
- Metrics aggregation
WebSocket Broadcast
Metrics are broadcast to connected dashboard clients:
- Frequency: Every 1 second
- Protocol: STOMP over WebSocket
- Topic: /topic/metrics
See WebSocket API for details.
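To make the protocol concrete, the sketch below builds the two STOMP 1.2 frames a client would send to start receiving broadcasts on `/topic/metrics`. It only constructs frame text: the WebSocket endpoint URL is not given in this section, so no connection is opened, and the `host` value and subscription id are assumptions chosen by the caller. In practice a STOMP client library would handle this framing.

```typescript
// Serializes one STOMP frame: command line, header lines, a blank line,
// the body, then a NUL terminator. Header values are not escaped here,
// which is acceptable for the fixed values used below.
function stompFrame(
  command: string,
  headers: Record<string, string>,
  body: string = ""
): string {
  const headerLines = Object.keys(headers).map((name) => name + ":" + headers[name]);
  return [command, ...headerLines, "", body].join("\n") + "\0";
}

// CONNECT frame. `host` is whatever virtual host the server expects
// (an assumption; this document does not specify it).
function connectFrame(host: string): string {
  return stompFrame("CONNECT", { "accept-version": "1.2", host: host });
}

// SUBSCRIBE frame for the metrics topic named above.
// The subscription id is chosen by the client.
function subscribeFrame(subscriptionId: string): string {
  return stompFrame("SUBSCRIBE", {
    id: subscriptionId,
    destination: "/topic/metrics",
  });
}
```

After the server answers the CONNECT frame with CONNECTED, sending `subscribeFrame("sub-0")` would register the client for the broadcast sent every second.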
Scheduled Aggregation
Some aggregations run on a schedule:
- Frequency: Every 60 seconds
- Purpose: Time-window calculations, trending analysis
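As an illustration of a time-window calculation, the sketch below computes the three `WindowMetrics` fields from a list of recent events. It is a simplified model and not Eventara's actual aggregation code: the `EventRecord` shape and the `aggregateWindow` function are hypothetical, and the error rate follows the definition used in the summary table (ERROR or CRITICAL severity).

```typescript
// Hypothetical in-memory record of one processed event.
interface EventRecord {
  timestamp: number; // epoch milliseconds
  latencyMs: number;
  severity: "INFO" | "WARNING" | "ERROR" | "CRITICAL";
}

interface WindowMetrics {
  events: number;
  avgLatency: number;
  errorRate: number; // percentage
}

// Aggregates the events whose timestamp falls in (now - windowMs, now].
function aggregateWindow(events: EventRecord[], now: number, windowMs: number): WindowMetrics {
  const recent = events.filter((e) => e.timestamp > now - windowMs && e.timestamp <= now);
  if (recent.length === 0) {
    return { events: 0, avgLatency: 0, errorRate: 0 };
  }
  const totalLatency = recent.reduce((sum, e) => sum + e.latencyMs, 0);
  const errors = recent.filter((e) => e.severity === "ERROR" || e.severity === "CRITICAL").length;
  return {
    events: recent.length,
    avgLatency: totalLatency / recent.length,
    errorRate: (errors / recent.length) * 100,
  };
}
```

Calling it with `5 * 60 * 1000`, `15 * 60 * 1000`, and `60 * 60 * 1000` as `windowMs` would produce values for `last5Minutes`, `last15Minutes`, and `last1Hour` respectively.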
Metrics Retention
In-Memory Storage
- Duration: Last 24 hours
- Latency samples: Last 1000 measurements
- Event timestamps: Sliding window
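A capped sample buffer like the one described for latency can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the `LatencySamples` class is hypothetical, it evicts the oldest sample once the cap is reached, and it uses the nearest-rank method for percentiles, which may differ from what Eventara does.

```typescript
// Hypothetical bounded buffer that keeps only the most recent samples.
class LatencySamples {
  private samples: number[] = [];
  private capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  // Appends a sample and drops the oldest one when the cap is exceeded.
  add(latencyMs: number): void {
    this.samples.push(latencyMs);
    if (this.samples.length > this.capacity) {
      this.samples.shift();
    }
  }

  count(): number {
    return this.samples.length;
  }

  // Nearest-rank percentile over the retained samples; returns 0 when empty.
  percentile(p: number): number {
    if (this.samples.length === 0) {
      return 0;
    }
    const sorted = this.samples.slice().sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }
}
```

With this model, `sampleCount` in the response would never exceed the cap, and percentiles reflect only the most recent measurements.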
Persistence
- All events permanently stored in PostgreSQL
- Historical metrics can be recalculated from event data
- In-memory metrics lost on application restart
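Recalculating count-style metrics from stored events amounts to grouping rows by a field. The sketch below assumes the rows have already been loaded from PostgreSQL into memory; the table and column names are not specified in this document, so the `StoredEvent` shape and `countBy` helper are hypothetical.

```typescript
// Hypothetical shape of an event row after loading it from storage.
interface StoredEvent {
  eventType: string;
  source: string;
}

// Counts rows per distinct value of the chosen field, producing the same
// Record<string, number> shape as eventsByType and eventsBySource.
function countBy(events: StoredEvent[], key: keyof StoredEvent): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const value = event[key];
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}
```

For large tables, the same grouping would more likely be pushed down into a SQL `GROUP BY` query than done in application memory.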
Performance Considerations
Response Time
- Typical response: < 10ms
- All metrics served from memory
- No database queries required
Data Volume
- Metrics update on every event (no sampling)
- Memory usage scales with:
- Number of unique event types
- Number of unique sources
- Number of unique users
- Latency sample size (capped at 1000)
Scalability Limitations
Current implementation uses in-memory aggregation:
- Single instance only (not horizontally scalable)
- Lost on restart
- Limited by JVM heap size
Future Enhancement: Redis-backed metrics for distributed deployments.
Use Cases
Monitoring Dashboard
Poll this endpoint for periodic updates or use WebSocket for real-time streaming.
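For the polling approach, a small helper along these lines may be enough. It is a sketch, not a prescribed client: `startPolling` and its parameters are hypothetical, and the fetch function is injected so the helper does not depend on a particular HTTP client.

```typescript
type MetricsFetcher = () => Promise<unknown>;

// Calls fetchMetrics immediately and then every intervalMs milliseconds.
// A tick is skipped while the previous request is still in flight.
// Returns a function that stops the polling.
function startPolling(
  fetchMetrics: MetricsFetcher,
  intervalMs: number,
  onUpdate: (metrics: unknown) => void,
  onError: (error: unknown) => void = () => {}
): () => void {
  let inFlight = false;

  const tick = async (): Promise<void> => {
    if (inFlight) {
      return;
    }
    inFlight = true;
    try {
      onUpdate(await fetchMetrics());
    } catch (error) {
      onError(error);
    } finally {
      inFlight = false;
    }
  };

  void tick();
  const timer = setInterval(() => {
    void tick();
  }, intervalMs);
  return () => clearInterval(timer);
}
```

A dashboard could call `startPolling(() => fetch("http://localhost:8080/api/v1/metrics").then((r) => r.json()), 5000, render)`, where the 5-second interval and the `render` callback are arbitrary choices for illustration.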
External Monitoring
Integrate with external monitoring systems (Prometheus, Grafana, Datadog):
# Example: fetch the summary every 30 minutes (cron cannot schedule intervals shorter than one minute)
*/30 * * * * curl -s http://eventara:8080/api/v1/metrics | jq '.summary'
Alerting
Use metrics to trigger custom alerts:
// sendAlert is a placeholder for your own notification hook
const metrics = await fetch('http://localhost:8080/api/v1/metrics').then(r => r.json());
if (metrics.summary.errorRate > 5.0) {
  sendAlert('High error rate detected: ' + metrics.summary.errorRate + '%');
}
if (metrics.throughput.current < 10 && metrics.summary.totalEvents > 1000) {
  sendAlert('Low throughput: ' + metrics.throughput.current + ' events/sec');
}
Capacity Planning
Analyze throughput and latency trends:
# Monitor peak throughput
curl -s http://localhost:8080/api/v1/metrics | jq '.throughput'
# Check p99 latency
curl -s http://localhost:8080/api/v1/metrics | jq '.latencyMetrics.overall.p99'
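The same two values can feed a simple headroom check. The sketch below is only an illustration: `capacityWarnings`, the `CapacityLimits` fields, and the default 80% utilization threshold are assumptions to be replaced with limits measured for your own deployment.

```typescript
// Subset of the `throughput` object used by this check.
interface ThroughputSnapshot {
  current: number;
  peak: number;
  avg: number;
}

// Hypothetical limits supplied by the operator; not part of the API.
interface CapacityLimits {
  maxEventsPerSecond: number;
  maxP99LatencyMs: number;
}

// Returns a warning for each limit that the observed metrics approach or exceed.
function capacityWarnings(
  throughput: ThroughputSnapshot,
  p99LatencyMs: number,
  limits: CapacityLimits,
  utilizationThreshold: number = 0.8
): string[] {
  const warnings: string[] = [];
  const peakUtilization = throughput.peak / limits.maxEventsPerSecond;
  if (peakUtilization >= utilizationThreshold) {
    warnings.push("Peak throughput is at " + (peakUtilization * 100).toFixed(1) + "% of capacity");
  }
  if (p99LatencyMs > limits.maxP99LatencyMs) {
    warnings.push("p99 latency " + p99LatencyMs + " ms exceeds the " + limits.maxP99LatencyMs + " ms budget");
  }
  return warnings;
}
```

The inputs would come from `throughput` and `latencyMetrics.overall.p99` in the metrics response.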