Architecture
This document describes the technical architecture of Eventara, including its components, data flow, and design decisions.
System Overview
Eventara follows a streaming architecture pattern, designed for high throughput, reliability, and real-time processing of events.
┌─────────────────────────────────────────────────────────────┐
│ External Applications │
│ (Microservices, APIs, IoT Devices, Mobile Apps) │
└────────────────────────────┬────────────────────────────────┘
│ HTTP POST
▼
┌─────────────────────────────────────────────────────────────┐
│ Eventara Platform │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Ingestion │───▶│ Kafka │───▶│ Analytics │ │
│ │ Service │ │ (Streaming) │ │ Engine │ │
│ │ (Spring Boot)│ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PostgreSQL │ │ Drools │ │ WebSocket │ │
│ │ (Storage) │ │ Rule Engine │ │ (STOMP) │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
└───────────────────────────────────────────────────┼──────────┘
│
▼
┌──────────────┐
│ React │
│ Dashboard │
└──────────────┘

Core Components
1. Ingestion Service
Technology: Spring Boot 3.5.7, Java 21
Responsibilities:
- Expose REST API for event ingestion
- Validate incoming event payloads
- Produce events to Kafka topics
- Provide health and status endpoints
Key Features:
- Request validation using Jakarta Bean Validation
- Automatic timestamp generation
- Unique event ID generation (evt_ prefix)
- Error handling and meaningful error responses
- OpenAPI/Swagger documentation
API Endpoints:
- POST /api/v1/events - Ingest single event
- GET /api/v1/events - Query events with pagination
- GET /api/v1/events/{eventId} - Get event by ID
- GET /api/v1/events/type/{eventType} - Filter by event type
- GET /api/v1/events/health - Health check
- GET /api/v1/events/test - Service status
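To make the validation layer concrete, here is a minimal sketch of what the ingestion request DTO might look like with Jakarta Bean Validation; the class and field names are assumptions based on the API above, not the actual Eventara source.

// Hypothetical ingestion request DTO -- field names mirror the API above,
// but the real classes may differ.
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Pattern;
import java.util.Map;

public record EventRequest(
        @NotBlank(message = "eventType is required")
        String eventType,

        @NotBlank(message = "source is required")
        String source,

        String userId,                      // optional

        @Pattern(regexp = "INFO|WARNING|ERROR|CRITICAL",
                 message = "severity must be INFO, WARNING, ERROR or CRITICAL")
        String severity,                    // optional; defaults to INFO downstream

        Map<String, String> tags,           // optional, stored as JSONB
        Map<String, Object> metadata        // optional, stored as JSONB
) {}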
2. Apache Kafka
Version: Confluent Platform 7.5.0
Configuration:
- Brokers: Single broker (scalable to multiple)
- Topics:
  - eventara.events.raw - Raw events from ingestion
  - eventara.events.processed - Processed events (planned)
- Partitions: 3 per topic
- Replication Factor: 1 (single-node setup)
Purpose:
- Decouple producers from consumers
- Provide event buffering and replay capability
- Enable horizontal scaling of consumers
- Ensure reliable event delivery with acknowledgments
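To make the acknowledgment guarantee concrete, a Spring Kafka producer factory along these lines would wait for all in-sync replicas before treating a write as successful. This is a sketch, not the project's actual configuration.

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, Object> producerFactory() {
        return new DefaultKafkaProducerFactory<>(Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                // "all": the leader waits for the full set of in-sync replicas
                ProducerConfig.ACKS_CONFIG, "all",
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class));
    }

    @Bean
    public KafkaTemplate<String, Object> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}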
3. Event Consumer
Technology: Spring Kafka
Responsibilities:
- Consume events from Kafka topics
- Persist events to PostgreSQL
- Update in-memory metrics
- Handle deduplication
- Manual offset management
Processing Flow:
- Receive event from Kafka
- Check for duplicates using eventId
- Save to PostgreSQL
- Update metrics aggregation
- Manually acknowledge offset
Error Handling:
- Failed events are not acknowledged
- Kafka retries delivery
- Future: Dead-letter queue for permanent failures
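A consumer following this flow might look roughly like the sketch below. EventRepository and its existsByEventId method are assumed from the flow description, and the listener container is assumed to use AckMode.MANUAL with a JSON deserializer configured for Event.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class EventConsumer {

    private final EventRepository eventRepository;            // assumed Spring Data repository
    private final ComprehensiveMetricsService metricsService;

    public EventConsumer(EventRepository eventRepository,
                         ComprehensiveMetricsService metricsService) {
        this.eventRepository = eventRepository;
        this.metricsService = metricsService;
    }

    // Requires a listener container configured with AckMode.MANUAL
    @KafkaListener(topics = "eventara.events.raw", groupId = "eventara-consumer")
    public void onEvent(Event event, Acknowledgment ack) {
        // Deduplication: skip events we have already persisted
        if (eventRepository.existsByEventId(event.getEventId())) {
            ack.acknowledge();
            return;
        }
        eventRepository.save(event);        // persist to PostgreSQL
        metricsService.recordEvent(event);  // update in-memory metrics
        ack.acknowledge();                  // commit the offset only after success
        // If an exception is thrown above, the offset is not acknowledged
        // and Kafka redelivers the event.
    }
}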
4. PostgreSQL Database
Version: PostgreSQL 14
Schema Management: Flyway migrations
Key Tables:
events
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
event_id VARCHAR(50) UNIQUE NOT NULL,
event_type VARCHAR(100) NOT NULL,
timestamp TIMESTAMP NOT NULL,
source VARCHAR(100) NOT NULL,
user_id VARCHAR(100),
session_id VARCHAR(100),
severity VARCHAR(20),
tags JSONB,
metadata JSONB,
received_at TIMESTAMP NOT NULL
);
-- Indexes for query optimization
CREATE INDEX idx_event_type ON events(event_type);
CREATE INDEX idx_timestamp ON events(timestamp);
CREATE INDEX idx_user_id ON events(user_id);
CREATE INDEX idx_source ON events(source);
alert_rules
Stores user-defined alert rules with Drools DRL configuration.
alert_history
Logs of triggered alerts with timestamps and details.
notification_channels
Configuration for notification destinations (email, Slack, webhooks).
rule_execution_log
Audit trail of rule evaluations.
notification_log
Delivery status of notifications.
5. Analytics Engine
Technology: In-memory metrics aggregation
Metrics Tracked:
Summary Metrics:
- Total events
- Events per second
- Average latency
- Error rate
Dimensional Metrics:
- Events by type (count per event type)
- Events by source (count per source service)
- Events by severity (INFO, WARNING, ERROR, CRITICAL)
- Events by user (unique users tracked)
Performance Metrics:
- Latency percentiles (p50, p95, p99)
- Latency by event type
- Latency by source
Error Tracking:
- Total errors
- Errors by type
- Errors by source
- Error rate percentage
Throughput Metrics:
- Current throughput (events/second)
- Peak throughput
- Peak timestamp
Data Retention:
- Sliding window of last 24 hours
- Last 1000 latency measurements per metric
Update Frequency:
- Real-time updates on every event
- Aggregation scheduled every 60 seconds
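The aggregation itself can be as simple as lock-free concurrent counters keyed by dimension. The sketch below illustrates the idea; the class and method names are assumptions, not the actual service.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class MetricsAggregator {

    private final LongAdder totalEvents = new LongAdder();
    private final LongAdder totalErrors = new LongAdder();
    private final Map<String, LongAdder> eventsByType = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> eventsBySource = new ConcurrentHashMap<>();

    /** Called on every consumed event; updates are lock-free. */
    public void recordEvent(String eventType, String source, String severity) {
        totalEvents.increment();
        eventsByType.computeIfAbsent(eventType, k -> new LongAdder()).increment();
        eventsBySource.computeIfAbsent(source, k -> new LongAdder()).increment();
        if ("ERROR".equals(severity) || "CRITICAL".equals(severity)) {
            totalErrors.increment();
        }
    }

    /** Error rate as a percentage of all events seen so far. */
    public double errorRatePercent() {
        long total = totalEvents.sum();
        return total == 0 ? 0.0 : 100.0 * totalErrors.sum() / total;
    }
}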
6. Rule Engine
Technology: Drools 8.44.0 (KIE)
Purpose:
- Evaluate dynamic alert rules against metrics
- Support complex conditions without code changes
- Enable user-defined alerting logic
Rule Types:
- Threshold-based (e.g., error rate > 5%)
- Anomaly detection (planned)
- Pattern matching (planned)
Workflow:
- User creates rule via REST API
- Rule configuration stored in PostgreSQL
- DRL (Drools Rule Language) generated from JSON config
- Rule compiled and loaded into KIE container
- Executed asynchronously against metrics
- Triggers alerts when conditions match
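Steps 3 and 4 of this workflow reduce to a handful of KIE API calls. Here is a minimal sketch, assuming the DRL string has already been generated from the stored JSON config; the RuleCompiler class is hypothetical.

import org.kie.api.KieServices;
import org.kie.api.builder.KieBuilder;
import org.kie.api.builder.KieFileSystem;
import org.kie.api.builder.Message;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class RuleCompiler {

    /** Compiles a generated DRL string into a ready-to-use KieSession. */
    public KieSession compile(String ruleName, String drl) {
        KieServices ks = KieServices.Factory.get();
        KieFileSystem kfs = ks.newKieFileSystem();
        kfs.write("src/main/resources/rules/" + ruleName + ".drl", drl);

        KieBuilder builder = ks.newKieBuilder(kfs).buildAll();
        if (builder.getResults().hasMessages(Message.Level.ERROR)) {
            throw new IllegalStateException("DRL compile failed: " + builder.getResults());
        }
        KieContainer container = ks.newKieContainer(builder.getKieModule().getReleaseId());
        return container.newKieSession();
    }
}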
Example Rule Configuration:
{
"name": "High Error Rate Alert",
"ruleType": "THRESHOLD",
"ruleConfig": {
"metric": "errorRate",
"operator": "GREATER_THAN",
"threshold": 5.0,
"windowMinutes": 5
},
"severity": "CRITICAL",
"priority": 1
}

7. WebSocket Service
Technology: Spring WebSocket with STOMP
Configuration:
- Protocol: STOMP over SockJS
- Endpoint: /ws
- Topic: /topic/metrics
- Broadcast interval: 1 second
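This configuration corresponds to a standard Spring setup along the following lines; a sketch, not necessarily the project's exact code.

import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // Clients connect to /ws; SockJS provides fallbacks for older browsers
        registry.addEndpoint("/ws").setAllowedOriginPatterns("*").withSockJS();
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // Server pushes broadcasts to subscribers of /topic/*
        config.enableSimpleBroker("/topic");
    }
}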
Message Format:
{
"summary": {
"totalEvents": 15000,
"eventsPerSecond": 25.3,
"avgLatency": 12.5,
"errorRate": 2.1
},
"eventsByType": { ... },
"eventsBySource": { ... },
"latencyMetrics": { ... },
"throughput": { ... }
}

Client Reconnection:
- Automatic reconnection with exponential backoff
- Maximum 10 reconnection attempts
- 3-second interval between attempts
8. React Dashboard
Technology: React 19.2, TypeScript, Vite
Features:
- Real-time metric updates via WebSocket
- Interactive charts using Chart.js
- Multiple dashboard views:
- Overview
- Real-Time Monitoring
- Event Analytics
- Source Analytics
- User Analytics
- Performance Metrics
- Error Analysis
- Alerts and Anomalies
State Management:
- WebSocket connection state
- Metrics state updated from WebSocket
- Auto-reconnection logic
Styling:
- Tailwind CSS with blue color theme
- Responsive design
- Clean, professional UI
Data Flow
Event Ingestion Flow
1. Client sends event

POST /api/v1/events
Content-Type: application/json

{
  "eventType": "payment.failed",
  "source": "payment-service",
  "userId": "user_123",
  "severity": "ERROR"
}

2. EventController validates request
- Required fields: eventType, source
- Optional fields: userId, severity, tags, metadata
- Auto-generated: timestamp, eventId, receivedAt

3. EventService processes event
- Creates Event entity
- Generates unique eventId
- Sets default severity (INFO) if not provided

4. KafkaProducer publishes to Kafka
- Topic: eventara.events.raw
- Key: eventId
- Value: Event JSON
- Acknowledgment: all (ensures write to all replicas)

5. Response returned to client

{
  "status": "accepted",
  "eventId": "evt_a1b2c3d4",
  "eventType": "payment.failed",
  "timestamp": "2026-01-12T10:30:45.123Z",
  "message": "Event accepted for processing"
}
Event Processing Flow
1. EventConsumer receives from Kafka
- @KafkaListener on eventara.events.raw
- Manual acknowledgment enabled

2. Deduplication check
- Query: existsByEventId(eventId)
- Skip if already processed

3. Persist to PostgreSQL
- Insert into events table
- Transaction managed by Spring

4. Update metrics
- ComprehensiveMetricsService.recordEvent()
- Increments counters
- Updates latency tracking
- Updates error rates

5. Acknowledge Kafka offset
- Manual acknowledgment.acknowledge()
- Prevents reprocessing
Real-Time Broadcast Flow
1. MetricsWebSocketController scheduled task
- Runs every 1 second (@Scheduled(fixedRate = 1000))

2. Fetch current metrics
- ComprehensiveMetricsService.getComprehensiveMetrics()

3. Broadcast to all connected clients
- SimpMessagingTemplate.convertAndSend("/topic/metrics", metrics)

4. Dashboard receives update
- React WebSocket hook processes message
- Updates state
- Triggers re-render of charts and metrics
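Spelled out in code, the server side of this flow is a small scheduled component; a sketch using the names above, assuming @EnableScheduling is present on a configuration class.

import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class MetricsBroadcaster {

    private final SimpMessagingTemplate template;
    private final ComprehensiveMetricsService metricsService;

    public MetricsBroadcaster(SimpMessagingTemplate template,
                              ComprehensiveMetricsService metricsService) {
        this.template = template;
        this.metricsService = metricsService;
    }

    @Scheduled(fixedRate = 1000)
    public void broadcast() {
        // Every subscriber of /topic/metrics receives the same snapshot
        template.convertAndSend("/topic/metrics",
                metricsService.getComprehensiveMetrics());
    }
}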
Alert Rule Evaluation Flow
1. RuleExecutionService scheduled or event-triggered
- Executes asynchronously

2. Load active rules from database
- Query: findByStatus(RuleStatus.ACTIVE)

3. Build KIE session with rules
- Load DRL files into KieFileSystem
- Compile with KieBuilder
- Create KieSession

4. Insert facts and fire rules
- Insert MetricsFact object
- Call kieSession.fireAllRules()

5. Handle triggered alerts
- AlertTriggerHandler processes matches
- Create AlertHistory record
- Send notifications (if configured)
- Update rule statistics
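Steps 4 and 5 reduce to a few KIE API calls; a sketch, where MetricsFact and AlertTriggerHandler are the names used above and the RuleExecutor class is hypothetical.

import org.kie.api.runtime.KieSession;

public class RuleExecutor {

    /** Evaluates one metrics snapshot against an already-compiled session. */
    public void evaluate(KieSession session, MetricsFact fact, AlertTriggerHandler handler) {
        try {
            // Assumes the generated DRL declares a matching
            // `global AlertTriggerHandler alertHandler`
            session.setGlobal("alertHandler", handler);
            session.insert(fact);       // step 4: insert the MetricsFact
            session.fireAllRules();     // step 4: matching rules call back into
                                        // alertHandler, which creates AlertHistory
                                        // records and sends notifications (step 5)
        } finally {
            session.dispose();          // release the session after each evaluation
        }
    }
}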
Design Decisions
Why Kafka?
- Decoupling: Producers and consumers operate independently
- Scalability: Horizontal scaling of consumers
- Reliability: Durable storage and replay capability
- Backpressure handling: Consumers process at their own pace
Why PostgreSQL with JSONB?
- Structured data: Core fields indexed and queryable
- Flexibility: JSONB for tags and metadata without schema changes
- Performance: GIN indexes on JSONB for fast queries
- Maturity: Battle-tested, reliable, feature-rich
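For example, tags stored as JSONB can be matched with PostgreSQL's containment operator straight from a Spring Data repository; a sketch following the schema above, where findByTag is a hypothetical method.

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface EventRepository extends JpaRepository<Event, Long> {

    // @> is PostgreSQL's JSONB containment operator; a GIN index on the
    // tags column makes this lookup fast.
    @Query(value = "SELECT * FROM events WHERE tags @> CAST(:tag AS jsonb)",
           nativeQuery = true)
    List<Event> findByTag(@Param("tag") String tagJson);
}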
Why In-Memory Metrics?
- Speed: Sub-millisecond metric updates
- Real-time: No database roundtrips for dashboards
- Simplicity: Reduces infrastructure complexity
Trade-offs:
- Not horizontally scalable (single instance)
- Lost on restart (acceptable for real-time metrics)
- Future: Redis for distributed metrics
Why Drools?
- Dynamic rules: Change alert logic without code deployment
- Expressive: Complex conditions in declarative language
- Proven: Enterprise-grade rule engine
- Performance: Optimized pattern matching (Rete algorithm)
Why WebSocket?
- Real-time: Push updates without polling
- Efficiency: Single connection for multiple updates
- Scalability: Server-initiated messages
Scalability Considerations
Current Limitations
- Single Kafka broker
- Single PostgreSQL instance
- In-memory metrics (not distributed)
- Single Spring Boot instance
Scaling Path
Horizontal Scaling:
- Add more Kafka brokers and increase replication
- Add consumer group instances
- Use Redis for distributed metrics
- Load balance multiple API instances
- PostgreSQL read replicas for queries
Vertical Scaling:
- Increase JVM heap for metrics storage
- Larger PostgreSQL instance
- More Kafka partitions
Expected Capacity:
- Current setup: ~1000 events/second
- Optimized: ~10,000+ events/second
Security Considerations
Current State:
- No authentication on API endpoints
- No TLS/SSL encryption
- CORS enabled for dashboard
Production Requirements:
- API authentication (JWT, OAuth2)
- TLS for all HTTP traffic
- Kafka SASL authentication
- PostgreSQL SSL connections
- Network isolation
- Rate limiting
Monitoring and Observability
Currently Available:
- Health check endpoints
- Application logs (SLF4J)
- Kafka UI for topic inspection
- Swagger UI for API exploration
Planned:
- Prometheus metrics export
- Grafana dashboards
- Distributed tracing (OpenTelemetry)
- Alerting on system health
Next Steps
- API Reference - Detailed endpoint documentation
- Configuration - Environment and settings
- Deployment - Production deployment guide