Architecture
This document describes the technical architecture of Eventara, including its components, data flow, and design decisions.
System Overview
Eventara follows a streaming architecture pattern, designed for high throughput, reliability, and real-time processing of events.
┌─────────────────────────────────────────────────────────────┐
│ External Applications │
│ (Microservices, APIs, IoT Devices, Mobile Apps) │
└────────────────────────────┬────────────────────────────────┘
│ HTTP POST
▼
┌─────────────────────────────────────────────────────────────┐
│ Eventara Platform │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Ingestion │───▶│ Kafka │───▶│ Analytics │ │
│ │ Service │ │ (Streaming) │ │ Engine │ │
│ │ (Spring Boot)│ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PostgreSQL │ │ Drools │ │ WebSocket │ │
│ │ (Storage) │ │ Rule Engine │ │ (STOMP) │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
└───────────────────────────────────────────────────┼──────────┘
│
▼
┌──────────────┐
│ React │
│ Dashboard │
└──────────────┘

Core Components
1. Ingestion Service
Technology: Spring Boot 3.5.7, Java 21
Responsibilities:
- Expose REST API for event ingestion
- Validate incoming event payloads
- Produce events to Kafka topics
- Provide health and status endpoints
Key Features:
- Request validation using Jakarta Bean Validation
- Automatic timestamp generation
- Unique event ID generation (evt_ prefix)
- Error handling and meaningful error responses
- OpenAPI/Swagger documentation
API Endpoints:
- POST /api/v1/events - Ingest single event
- GET /api/v1/events - Query events with pagination
- GET /api/v1/events/{eventId} - Get event by ID
- GET /api/v1/events/type/{eventType} - Filter by event type
- GET /api/v1/events/health - Health check
- GET /api/v1/events/test - Service status
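To make the validation layer concrete, here is a minimal sketch of what the ingestion request DTO might look like with Jakarta Bean Validation; the class and field names are assumptions based on the API above, not the actual Eventara source.

// Hypothetical ingestion request DTO -- field names mirror the API above,
// but the real classes may differ.
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Pattern;
import java.util.Map;

public record EventRequest(
        @NotBlank(message = "eventType is required")
        String eventType,

        @NotBlank(message = "source is required")
        String source,

        String userId,                      // optional

        @Pattern(regexp = "INFO|WARNING|ERROR|CRITICAL",
                 message = "severity must be INFO, WARNING, ERROR or CRITICAL")
        String severity,                    // optional; defaults to INFO downstream

        Map<String, String> tags,           // optional, stored as JSONB
        Map<String, Object> metadata        // optional, stored as JSONB
) {}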
2. Apache Kafka
Version: Confluent Platform 7.5.0
Configuration:
- Brokers: Single broker (scalable to multiple)
- Topics:
  - eventara.events.raw - Raw events from ingestion
  - eventara.events.processed - Processed events (planned)
- Partitions: 3 per topic
- Replication Factor: 1 (single-node setup)
Purpose:
- Decouple producers from consumers
- Provide event buffering and replay capability
- Enable horizontal scaling of consumers
- Ensure reliable event delivery with acknowledgments
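To make the acknowledgment guarantee concrete, a Spring Kafka producer factory along these lines would wait for all in-sync replicas before treating a write as successful. This is a sketch, not the project's actual configuration.

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, Object> producerFactory() {
        return new DefaultKafkaProducerFactory<>(Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                // "all": the leader waits for the full set of in-sync replicas
                ProducerConfig.ACKS_CONFIG, "all",
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class));
    }

    @Bean
    public KafkaTemplate<String, Object> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}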
3. Event Consumer
Technology: Spring Kafka
Responsibilities:
- Consume events from Kafka topics
- Persist events to PostgreSQL
- Update in-memory metrics
- Handle deduplication
- Manual offset management
Processing Flow:
- Receive event from Kafka
- Check for duplicates using eventId
- Save to PostgreSQL
- Update metrics aggregation
- Manually acknowledge offset
Error Handling:
- Failed events are not acknowledged
- Kafka retries delivery
- Future: Dead-letter queue for permanent failures
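A consumer following this flow might look roughly like the sketch below. EventRepository and its existsByEventId method are assumed from the flow description, and the listener container is assumed to use AckMode.MANUAL with a JSON deserializer configured for Event.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class EventConsumer {

    private final EventRepository eventRepository;            // assumed Spring Data repository
    private final ComprehensiveMetricsService metricsService;

    public EventConsumer(EventRepository eventRepository,
                         ComprehensiveMetricsService metricsService) {
        this.eventRepository = eventRepository;
        this.metricsService = metricsService;
    }

    // Requires a listener container configured with AckMode.MANUAL
    @KafkaListener(topics = "eventara.events.raw", groupId = "eventara-consumer")
    public void onEvent(Event event, Acknowledgment ack) {
        // Deduplication: skip events we have already persisted
        if (eventRepository.existsByEventId(event.getEventId())) {
            ack.acknowledge();
            return;
        }
        eventRepository.save(event);        // persist to PostgreSQL
        metricsService.recordEvent(event);  // update in-memory metrics
        ack.acknowledge();                  // commit the offset only after success
        // If an exception is thrown above, the offset is not acknowledged
        // and Kafka redelivers the event.
    }
}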
4. PostgreSQL Database
Version: PostgreSQL 14
Schema Management: Flyway migrations
Key Tables:
events
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
event_id VARCHAR(50) UNIQUE NOT NULL,
event_type VARCHAR(100) NOT NULL,
timestamp TIMESTAMP NOT NULL,
source VARCHAR(100) NOT NULL,
user_id VARCHAR(100),
session_id VARCHAR(100),
severity VARCHAR(20),
tags JSONB,
metadata JSONB,
received_at TIMESTAMP NOT NULL
);
-- Indexes for query optimization
CREATE INDEX idx_event_type ON events(event_type);
CREATE INDEX idx_timestamp ON events(timestamp);
CREATE INDEX idx_user_id ON events(user_id);
CREATE INDEX idx_source ON events(source);
alert_rules
Stores user-defined alert rules with Drools DRL configuration.
alert_history
Logs of triggered alerts with timestamps and details.
notification_channels
Configuration for notification destinations (email, Slack, webhooks).
rule_execution_log
Audit trail of rule evaluations.
notification_log
Delivery status of notifications.
5. Analytics Engine
Technology: In-memory metrics aggregation
Metrics Tracked:
Summary Metrics:
- Total events
- Events per second
- Average latency
- Error rate
Dimensional Metrics:
- Events by type (count per event type)
- Events by source (count per source service)
- Events by severity (INFO, WARNING, ERROR, CRITICAL)
- Events by user (unique users tracked)
Performance Metrics:
- Latency percentiles (p50, p95, p99)
- Latency by event type
- Latency by source
Error Tracking:
- Total errors
- Errors by type
- Errors by source
- Error rate percentage
Throughput Metrics:
- Current throughput (events/second)
- Peak throughput
- Peak timestamp
Data Retention:
- Sliding window of last 24 hours
- Last 1000 latency measurements per metric
Update Frequency:
- Real-time updates on every event
- Aggregation scheduled every 60 seconds
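The aggregation itself can be as simple as lock-free concurrent counters keyed by dimension. The sketch below illustrates the idea; the class and method names are assumptions, not the actual service.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class MetricsAggregator {

    private final LongAdder totalEvents = new LongAdder();
    private final LongAdder totalErrors = new LongAdder();
    private final Map<String, LongAdder> eventsByType = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> eventsBySource = new ConcurrentHashMap<>();

    /** Called on every consumed event; updates are lock-free. */
    public void recordEvent(String eventType, String source, String severity) {
        totalEvents.increment();
        eventsByType.computeIfAbsent(eventType, k -> new LongAdder()).increment();
        eventsBySource.computeIfAbsent(source, k -> new LongAdder()).increment();
        if ("ERROR".equals(severity) || "CRITICAL".equals(severity)) {
            totalErrors.increment();
        }
    }

    /** Error rate as a percentage of all events seen so far. */
    public double errorRatePercent() {
        long total = totalEvents.sum();
        return total == 0 ? 0.0 : 100.0 * totalErrors.sum() / total;
    }
}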
6. Rule Engine
Technology: Drools 8.44.0 (KIE)
Purpose:
- Evaluate dynamic alert rules against metrics
- Support complex conditions without code changes
- Enable user-defined alerting logic
Rule Types:
- Threshold-based (e.g., error rate > 5%)
- Anomaly detection (planned)
- Pattern matching (planned)
Workflow:
- User creates rule via REST API
- Rule configuration stored in PostgreSQL
- DRL (Drools Rule Language) generated from JSON config
- Rule compiled and loaded into KIE container
- Executed asynchronously against metrics
- Triggers alerts when conditions match
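Steps 3 and 4 of this workflow reduce to a handful of KIE API calls. Here is a minimal sketch, assuming the DRL string has already been generated from the stored JSON config; the RuleCompiler class is hypothetical.

import org.kie.api.KieServices;
import org.kie.api.builder.KieBuilder;
import org.kie.api.builder.KieFileSystem;
import org.kie.api.builder.Message;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class RuleCompiler {

    /** Compiles a generated DRL string into a ready-to-use KieSession. */
    public KieSession compile(String ruleName, String drl) {
        KieServices ks = KieServices.Factory.get();
        KieFileSystem kfs = ks.newKieFileSystem();
        kfs.write("src/main/resources/rules/" + ruleName + ".drl", drl);

        KieBuilder builder = ks.newKieBuilder(kfs).buildAll();
        if (builder.getResults().hasMessages(Message.Level.ERROR)) {
            throw new IllegalStateException("DRL compile failed: " + builder.getResults());
        }
        KieContainer container = ks.newKieContainer(builder.getKieModule().getReleaseId());
        return container.newKieSession();
    }
}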
Example Rule Configuration:
{
"name": "High Error Rate Alert",
"ruleType": "THRESHOLD",
"ruleConfig": {
"metric": "errorRate",
"operator": "GREATER_THAN",
"threshold": 5.0,
"windowMinutes": 5
},
"severity": "CRITICAL",
"priority": 1
}

7. WebSocket Service
Technology: Spring WebSocket with STOMP
Configuration:
- Protocol: STOMP over SockJS
- Endpoint: /ws
- Topic: /topic/metrics
- Broadcast interval: 1 second
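This configuration corresponds to a standard Spring setup along the following lines; a sketch, not necessarily the project's exact code.

import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // Clients connect to /ws; SockJS provides fallbacks for older browsers
        registry.addEndpoint("/ws").setAllowedOriginPatterns("*").withSockJS();
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // Server pushes broadcasts to subscribers of /topic/*
        config.enableSimpleBroker("/topic");
    }
}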
Message Format:
{
"summary": {
"totalEvents": 15000,
"eventsPerSecond": 25.3,
"avgLatency": 12.5,
"errorRate": 2.1
},
"eventsByType": { ... },
"eventsBySource": { ... },
"latencyMetrics": { ... },
"throughput": { ... }
}

Client Reconnection:
- Automatic reconnection with exponential backoff
- Maximum 10 reconnection attempts
- 3-second interval between attempts
8. React Dashboard
Technology: React 19.2, TypeScript, Vite
Features:
- Real-time metric updates via WebSocket
- Interactive charts using Chart.js
- Multiple dashboard views:
- Overview
- Real-Time Monitoring
- Event Analytics
- Source Analytics
- User Analytics
- Performance Metrics
- Error Analysis
- Alerts and Anomalies
State Management:
- WebSocket connection state
- Metrics state updated from WebSocket
- Auto-reconnection logic
Styling:
- Tailwind CSS with blue color theme
- Responsive design
- Clean, professional UI
Data Flow
Event Ingestion Flow
1. Client sends event

POST /api/v1/events
Content-Type: application/json

{
  "eventType": "payment.failed",
  "source": "payment-service",
  "userId": "user_123",
  "severity": "ERROR"
}

2. EventController validates request
- Required fields: eventType, source
- Optional fields: userId, severity, tags, metadata
- Auto-generated: timestamp, eventId, receivedAt

3. EventService processes event
- Creates Event entity
- Generates unique eventId
- Sets default severity (INFO) if not provided

4. KafkaProducer publishes to Kafka
- Topic: eventara.events.raw
- Key: eventId
- Value: Event JSON
- Acknowledgment: all (ensures write to all replicas)

5. Response returned to client

{
  "status": "accepted",
  "eventId": "evt_a1b2c3d4",
  "eventType": "payment.failed",
  "timestamp": "2026-01-12T10:30:45.123Z",
  "message": "Event accepted for processing"
}
Event Processing Flow
1. EventConsumer receives from Kafka
- @KafkaListener on eventara.events.raw
- Manual acknowledgment enabled

2. Deduplication check
- Query: existsByEventId(eventId)
- Skip if already processed

3. Persist to PostgreSQL
- Insert into events table
- Transaction managed by Spring

4. Update metrics
- ComprehensiveMetricsService.recordEvent()
- Increments counters
- Updates latency tracking
- Updates error rates

5. Acknowledge Kafka offset
- Manual acknowledgment.acknowledge()
- Prevents reprocessing
Real-Time Broadcast Flow
1. MetricsWebSocketController scheduled task
- Runs every 1 second (@Scheduled(fixedRate = 1000))

2. Fetch current metrics
- ComprehensiveMetricsService.getComprehensiveMetrics()

3. Broadcast to all connected clients
- SimpMessagingTemplate.convertAndSend("/topic/metrics", metrics)

4. Dashboard receives update
- React WebSocket hook processes message
- Updates state
- Triggers re-render of charts and metrics
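Spelled out in code, the server side of this flow is a small scheduled component; a sketch using the names above, assuming @EnableScheduling is present on a configuration class.

import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class MetricsBroadcaster {

    private final SimpMessagingTemplate template;
    private final ComprehensiveMetricsService metricsService;

    public MetricsBroadcaster(SimpMessagingTemplate template,
                              ComprehensiveMetricsService metricsService) {
        this.template = template;
        this.metricsService = metricsService;
    }

    @Scheduled(fixedRate = 1000)
    public void broadcast() {
        // Every subscriber of /topic/metrics receives the same snapshot
        template.convertAndSend("/topic/metrics",
                metricsService.getComprehensiveMetrics());
    }
}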
Alert Rule Evaluation Flow
1. RuleExecutionService scheduled or event-triggered
- Executes asynchronously

2. Load active rules from database
- Query: findByStatus(RuleStatus.ACTIVE)

3. Build KIE session with rules
- Load DRL files into KieFileSystem
- Compile with KieBuilder
- Create KieSession

4. Insert facts and fire rules
- Insert MetricsFact object
- Call kieSession.fireAllRules()

5. Handle triggered alerts
- AlertTriggerHandler processes matches
- Create AlertHistory record
- Send notifications (if configured)
- Update rule statistics
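Steps 4 and 5 reduce to a few KIE API calls; a sketch, where MetricsFact and AlertTriggerHandler are the names used above and the RuleExecutor class is hypothetical.

import org.kie.api.runtime.KieSession;

public class RuleExecutor {

    /** Evaluates one metrics snapshot against an already-compiled session. */
    public void evaluate(KieSession session, MetricsFact fact, AlertTriggerHandler handler) {
        try {
            // Assumes the generated DRL declares a matching
            // `global AlertTriggerHandler alertHandler`
            session.setGlobal("alertHandler", handler);
            session.insert(fact);       // step 4: insert the MetricsFact
            session.fireAllRules();     // step 4: matching rules call back into
                                        // alertHandler, which creates AlertHistory
                                        // records and sends notifications (step 5)
        } finally {
            session.dispose();          // release the session after each evaluation
        }
    }
}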
Design Decisions
Why Kafka?
- Decoupling: Producers and consumers operate independently
- Scalability: Horizontal scaling of consumers
- Reliability: Durable storage and replay capability
- Backpressure handling: Consumers process at their own pace
Why PostgreSQL with JSONB?
- Structured data: Core fields indexed and queryable
- Flexibility: JSONB for tags and metadata without schema changes
- Performance: GIN indexes on JSONB for fast queries
- Maturity: Battle-tested, reliable, feature-rich
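For example, tags stored as JSONB can be matched with PostgreSQL's containment operator straight from a Spring Data repository; a sketch following the schema above, where findByTag is a hypothetical method.

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface EventRepository extends JpaRepository<Event, Long> {

    // @> is PostgreSQL's JSONB containment operator; a GIN index on the
    // tags column makes this lookup fast.
    @Query(value = "SELECT * FROM events WHERE tags @> CAST(:tag AS jsonb)",
           nativeQuery = true)
    List<Event> findByTag(@Param("tag") String tagJson);
}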
Why In-Memory Metrics?
- Speed: Sub-millisecond metric updates
- Real-time: No database roundtrips for dashboards
- Simplicity: Reduces infrastructure complexity
Trade-offs:
- Not horizontally scalable (single instance)
- Lost on restart (acceptable for real-time metrics)
- Future: Redis for distributed metrics
Why Drools?
- Dynamic rules: Change alert logic without code deployment
- Expressive: Complex conditions in declarative language
- Proven: Enterprise-grade rule engine
- Performance: Optimized pattern matching (Rete algorithm)
Why WebSocket?
- Real-time: Push updates without polling
- Efficiency: Single connection for multiple updates
- Scalability: Server-initiated messages
Scalability Considerations
Current Limitations
- Single Kafka broker
- Single PostgreSQL instance
- In-memory metrics (not distributed)
- Single Spring Boot instance
Scaling Path
Horizontal Scaling:
- Add more Kafka brokers and increase replication
- Add consumer group instances
- Use Redis for distributed metrics
- Load balance multiple API instances
- PostgreSQL read replicas for queries
Vertical Scaling:
- Increase JVM heap for metrics storage
- Larger PostgreSQL instance
- More Kafka partitions
Expected Capacity:
- Current setup: ~1000 events/second
- Optimized: ~10,000+ events/second
Security Considerations
Current State:
- No authentication on API endpoints
- No TLS/SSL encryption
- CORS enabled for dashboard
Production Requirements:
- API authentication (JWT, OAuth2)
- TLS for all HTTP traffic
- Kafka SASL authentication
- PostgreSQL SSL connections
- Network isolation
- Rate limiting
Monitoring and Observability
Currently Available:
- Health check endpoints
- Application logs (SLF4J)
- Kafka UI for topic inspection
- Swagger UI for API exploration
Planned:
- Prometheus metrics export
- Grafana dashboards
- Distributed tracing (OpenTelemetry)
- Alerting on system health
Next Steps
- API Reference - Detailed endpoint documentation
- Configuration - Environment and settings
- Deployment - Production deployment guide