
Architecture

This document describes the technical architecture of Eventara, including its components, data flow, and design decisions.

System Overview

Eventara follows a streaming architecture pattern, designed for high throughput, reliability, and real-time processing of events.

┌─────────────────────────────────────────────────────────────┐
│                    External Applications                     │
│       (Microservices, APIs, IoT Devices, Mobile Apps)        │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP POST
                             ▼
┌─────────────────────────────────────────────────────────────┐
│                      Eventara Platform                       │
│                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │  Ingestion   │───▶│    Kafka     │───▶│  Analytics   │   │
│  │   Service    │    │ (Streaming)  │    │    Engine    │   │
│  │ (Spring Boot)│    │              │    │              │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│         │                   │                   │            │
│         ▼                   ▼                   ▼            │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │  PostgreSQL  │    │    Drools    │    │  WebSocket   │   │
│  │  (Storage)   │    │ Rule Engine  │    │   (STOMP)    │   │
│  └──────────────┘    └──────────────┘    └──────┬───────┘   │
│                                                 │            │
└─────────────────────────────────────────────────┼────────────┘
                                                  ▼
                                          ┌──────────────┐
                                          │    React     │
                                          │  Dashboard   │
                                          └──────────────┘

Core Components

1. Ingestion Service

Technology: Spring Boot 3.5.7, Java 21

Responsibilities:

  • Expose REST API for event ingestion
  • Validate incoming event payloads
  • Produce events to Kafka topics
  • Provide health and status endpoints

Key Features:

  • Request validation using Jakarta Bean Validation
  • Automatic timestamp generation
  • Unique event ID generation (evt_ prefix)
  • Error handling and meaningful error responses
  • OpenAPI/Swagger documentation

API Endpoints:

  • POST /api/v1/events - Ingest single event
  • GET /api/v1/events - Query events with pagination
  • GET /api/v1/events/{eventId} - Get event by ID
  • GET /api/v1/events/type/{eventType} - Filter by event type
  • GET /api/v1/events/health - Health check
  • GET /api/v1/events/test - Service status
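
To make the ingestion path concrete, here is a minimal sketch of what the POST endpoint could look like in Spring Boot 3 with Jakarta Bean Validation. The class, record, and field names are assumptions for illustration, not the actual Eventara source:

// Hypothetical sketch of the ingestion endpoint described above.
import jakarta.validation.Valid;
import jakarta.validation.constraints.NotBlank;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.time.Instant;
import java.util.Map;
import java.util.UUID;

@RestController
@RequestMapping("/api/v1/events")
public class EventController {

    // Request body: eventType and source are required, the rest optional,
    // mirroring the validation rules listed above.
    public record EventRequest(
            @NotBlank String eventType,
            @NotBlank String source,
            String userId,
            String severity) {
    }

    @PostMapping
    public ResponseEntity<Map<String, Object>> ingest(@Valid @RequestBody EventRequest request) {
        // Auto-generated fields: evt_-prefixed ID and server-side timestamp.
        String eventId = "evt_" + UUID.randomUUID().toString().substring(0, 8);
        Instant timestamp = Instant.now();

        // In the real service this is where the event would be handed to the
        // Kafka producer (topic eventara.events.raw, key = eventId).

        return ResponseEntity.status(HttpStatus.ACCEPTED).body(Map.of(
                "status", "accepted",
                "eventId", eventId,
                "eventType", request.eventType(),
                "timestamp", timestamp.toString(),
                "message", "Event accepted for processing"));
    }
}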

2. Apache Kafka

Version: Confluent Platform 7.5.0

Configuration:

  • Brokers: Single broker (scalable to multiple)
  • Topics:
    • eventara.events.raw - Raw events from ingestion
    • eventara.events.processed - Processed events (planned)
  • Partitions: 3 per topic
  • Replication Factor: 1 (single-node setup)

Purpose:

  • Decouple producers from consumers
  • Provide event buffering and replay capability
  • Enable horizontal scaling of consumers
  • Ensure reliable event delivery with acknowledgments
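
As a sketch of how the delivery guarantees above might be wired up in Spring Kafka, the producer could be configured along these lines; the bootstrap address and serializers are illustrative assumptions:

// Minimal producer configuration matching the acknowledgment behavior
// described above (acks=all); property values are assumptions.
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configs = Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                // Wait for all in-sync replicas before acknowledging a write.
                ProducerConfig.ACKS_CONFIG, "all",
                // Retry transient failures without producing duplicates.
                ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true,
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configs);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}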

3. Event Consumer

Technology: Spring Kafka

Responsibilities:

  • Consume events from Kafka topics
  • Persist events to PostgreSQL
  • Update in-memory metrics
  • Handle deduplication
  • Manual offset management

Processing Flow:

  1. Receive event from Kafka
  2. Check for duplicates using eventId
  3. Save to PostgreSQL
  4. Update metrics aggregation
  5. Manually acknowledge offset

Error Handling:

  • Failed events are not acknowledged
  • Kafka retries delivery
  • Future: Dead-letter queue for permanent failures
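
A hypothetical consumer matching this flow might look like the sketch below. Event, EventRepository, and ComprehensiveMetricsService are assumed names based on the methods referenced elsewhere in this document:

// Sketch of the consume -> dedupe -> persist -> metrics -> ack flow.
// Assumes a JSON deserializer for the Event payload and a listener container
// configured for MANUAL acknowledgment mode.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class EventConsumer {

    private final EventRepository eventRepository;       // assumed Spring Data repository
    private final ComprehensiveMetricsService metrics;   // named later in this document

    public EventConsumer(EventRepository eventRepository, ComprehensiveMetricsService metrics) {
        this.eventRepository = eventRepository;
        this.metrics = metrics;
    }

    @KafkaListener(topics = "eventara.events.raw", groupId = "eventara-consumer")
    public void onEvent(Event event, Acknowledgment ack) {
        // 1-2. Deduplicate on the unique eventId before doing any work.
        if (eventRepository.existsByEventId(event.getEventId())) {
            ack.acknowledge(); // already processed; skip, but commit the offset
            return;
        }

        // 3-4. Persist, then fold the event into the in-memory metrics.
        eventRepository.save(event);
        metrics.recordEvent(event);

        // 5. Commit the offset only after the event is safely stored. If
        // anything above throws, the offset stays uncommitted and Kafka
        // redelivers the record.
        ack.acknowledge();
    }
}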

4. PostgreSQL Database

Version: PostgreSQL 14

Schema Management: Flyway migrations

Key Tables:

events

CREATE TABLE events (
    id          BIGSERIAL PRIMARY KEY,
    event_id    VARCHAR(50) UNIQUE NOT NULL,
    event_type  VARCHAR(100) NOT NULL,
    timestamp   TIMESTAMP NOT NULL,
    source      VARCHAR(100) NOT NULL,
    user_id     VARCHAR(100),
    session_id  VARCHAR(100),
    severity    VARCHAR(20),
    tags        JSONB,
    metadata    JSONB,
    received_at TIMESTAMP NOT NULL
);

-- Indexes for query optimization
CREATE INDEX idx_event_type ON events(event_type);
CREATE INDEX idx_timestamp ON events(timestamp);
CREATE INDEX idx_user_id ON events(user_id);
CREATE INDEX idx_source ON events(source);
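
For illustration, this table could map to a JPA entity roughly as follows, using Hibernate 6's @JdbcTypeCode(SqlTypes.JSON) (bundled with Spring Boot 3) for the JSONB columns; the class shape is an assumption, not the actual entity:

// Sketch of a JPA mapping for the events table above.
import jakarta.persistence.*;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

import java.time.Instant;
import java.util.List;
import java.util.Map;

@Entity
@Table(name = "events")
public class Event {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // BIGSERIAL primary key
    private Long id;

    @Column(name = "event_id", unique = true, nullable = false, length = 50)
    private String eventId;

    @Column(name = "event_type", nullable = false, length = 100)
    private String eventType;

    @Column(nullable = false)
    private Instant timestamp;

    @Column(nullable = false, length = 100)
    private String source;

    @JdbcTypeCode(SqlTypes.JSON) // stored as JSONB in PostgreSQL
    private List<String> tags;

    @JdbcTypeCode(SqlTypes.JSON) // free-form metadata without schema changes
    private Map<String, Object> metadata;

    @Column(name = "received_at", nullable = false)
    private Instant receivedAt;

    // Remaining columns and getters/setters omitted for brevity.
    public String getEventId() { return eventId; }
}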

alert_rules

Stores user-defined alert rules with Drools DRL configuration.

alert_history

Logs of triggered alerts with timestamps and details.

notification_channels

Configuration for notification destinations (email, Slack, webhooks).

rule_execution_log

Audit trail of rule evaluations.

notification_log

Delivery status of notifications.

5. Analytics Engine

Technology: In-memory metrics aggregation

Metrics Tracked:

Summary Metrics:

  • Total events
  • Events per second
  • Average latency
  • Error rate

Dimensional Metrics:

  • Events by type (count per event type)
  • Events by source (count per source service)
  • Events by severity (INFO, WARNING, ERROR, CRITICAL)
  • Events by user (unique users tracked)

Performance Metrics:

  • Latency percentiles (p50, p95, p99)
  • Latency by event type
  • Latency by source

Error Tracking:

  • Total errors
  • Errors by type
  • Errors by source
  • Error rate percentage

Throughput Metrics:

  • Current throughput (events/second)
  • Peak throughput
  • Peak timestamp

Data Retention:

  • Sliding window of last 24 hours
  • Last 1000 latency measurements per metric

Update Frequency:

  • Real-time updates on every event
  • Aggregation scheduled every 60 seconds
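
As a rough sketch of this kind of aggregator, the following self-contained example keeps per-dimension counters and a bounded latency window, and computes nearest-rank percentiles over it. It illustrates the approach only; the real service's internals are not shown in this document:

// Illustrative in-memory aggregator: lock-free counters plus a bounded
// sliding window of the last 1000 latency measurements.
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class InMemoryMetrics {

    private static final int LATENCY_WINDOW = 1000; // last 1000 measurements

    private final LongAdder totalEvents = new LongAdder();
    private final Map<String, LongAdder> eventsByType = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> eventsBySource = new ConcurrentHashMap<>();
    private final Deque<Long> latenciesMs = new ArrayDeque<>();

    public void recordEvent(String type, String source, long latencyMs) {
        totalEvents.increment();
        eventsByType.computeIfAbsent(type, k -> new LongAdder()).increment();
        eventsBySource.computeIfAbsent(source, k -> new LongAdder()).increment();

        // Keep only the most recent LATENCY_WINDOW measurements.
        synchronized (latenciesMs) {
            latenciesMs.addLast(latencyMs);
            if (latenciesMs.size() > LATENCY_WINDOW) {
                latenciesMs.removeFirst();
            }
        }
    }

    // Nearest-rank percentile over the sliding window (p between 0 and 100),
    // e.g. latencyPercentile(95) for p95.
    public long latencyPercentile(double p) {
        long[] snapshot;
        synchronized (latenciesMs) {
            snapshot = latenciesMs.stream().mapToLong(Long::longValue).toArray();
        }
        if (snapshot.length == 0) return 0;
        Arrays.sort(snapshot);
        int idx = (int) Math.ceil(p / 100.0 * snapshot.length) - 1;
        return snapshot[Math.max(idx, 0)];
    }
}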

6. Rule Engine

Technology: Drools 8.44.0 (KIE)

Purpose:

  • Evaluate dynamic alert rules against metrics
  • Support complex conditions without code changes
  • Enable user-defined alerting logic

Rule Types:

  • Threshold-based (e.g., error rate > 5%)
  • Anomaly detection (planned)
  • Pattern matching (planned)

Workflow:

  1. User creates rule via REST API
  2. Rule configuration stored in PostgreSQL
  3. DRL (Drools Rule Language) generated from JSON config
  4. Rule compiled and loaded into KIE container
  5. Executed asynchronously against metrics
  6. Triggers alerts when conditions match

Example Rule Configuration:

{ "name": "High Error Rate Alert", "ruleType": "THRESHOLD", "ruleConfig": { "metric": "errorRate", "operator": "GREATER_THAN", "threshold": 5.0, "windowMinutes": 5 }, "severity": "CRITICAL", "priority": 1 }

7. WebSocket Service

Technology: Spring WebSocket with STOMP

Configuration:

  • Protocol: STOMP over SockJS
  • Endpoint: /ws
  • Topic: /topic/metrics
  • Broadcast interval: 1 second
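
A minimal Spring configuration for this endpoint and topic might look like the following sketch (standard Spring WebSocket/STOMP APIs; the class itself is an assumption):

// Registers the /ws STOMP endpoint with SockJS fallback and an in-memory
// broker serving /topic destinations.
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // CORS is open in the current setup (see Security Considerations).
        registry.addEndpoint("/ws")
                .setAllowedOriginPatterns("*")
                .withSockJS();
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Simple in-memory broker; /topic/metrics subscribers receive the
        // scheduled broadcasts described in the Real-Time Broadcast Flow.
        registry.enableSimpleBroker("/topic");
    }
}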

Message Format:

{ "summary": { "totalEvents": 15000, "eventsPerSecond": 25.3, "avgLatency": 12.5, "errorRate": 2.1 }, "eventsByType": { ... }, "eventsBySource": { ... }, "latencyMetrics": { ... }, "throughput": { ... } }

Client Reconnection:

  • Automatic reconnection with exponential backoff
  • Maximum 10 reconnection attempts
  • 3-second interval between attempts

8. React Dashboard

Technology: React 19.2, TypeScript, Vite

Features:

  • Real-time metric updates via WebSocket
  • Interactive charts using Chart.js
  • Multiple dashboard views:
    • Overview
    • Real-Time Monitoring
    • Event Analytics
    • Source Analytics
    • User Analytics
    • Performance Metrics
    • Error Analysis
    • Alerts and Anomalies

State Management:

  • WebSocket connection state
  • Metrics state updated from WebSocket
  • Auto-reconnection logic

Styling:

  • Tailwind CSS with blue color theme
  • Responsive design
  • Clean, professional UI

Data Flow

Event Ingestion Flow

  1. Client sends event

    POST /api/v1/events
    Content-Type: application/json

    {
      "eventType": "payment.failed",
      "source": "payment-service",
      "userId": "user_123",
      "severity": "ERROR"
    }
  2. EventController validates request

    • Required fields: eventType, source
    • Optional fields: userId, severity, tags, metadata
    • Auto-generated: timestamp, eventId, receivedAt
  3. EventService processes event

    • Creates Event entity
    • Generates unique eventId
    • Sets default severity (INFO) if not provided
  4. KafkaProducer publishes to Kafka

    • Topic: eventara.events.raw
    • Key: eventId
    • Value: Event JSON
    • Acknowledgment: all (ensures the write reaches all in-sync replicas)
  5. Response returned to client

    { "status": "accepted", "eventId": "evt_a1b2c3d4", "eventType": "payment.failed", "timestamp": "2026-01-12T10:30:45.123Z", "message": "Event accepted for processing" }

Event Processing Flow

  1. EventConsumer receives from Kafka

    • @KafkaListener on eventara.events.raw
    • Manual acknowledgment enabled
  2. Deduplication check

    • Query: existsByEventId(eventId)
    • Skip if already processed
  3. Persist to PostgreSQL

    • Insert into events table
    • Transaction managed by Spring
  4. Update metrics

    • ComprehensiveMetricsService.recordEvent()
    • Increments counters
    • Updates latency tracking
    • Updates error rates
  5. Acknowledge Kafka offset

    • Manual acknowledgment.acknowledge()
    • Prevents reprocessing

Real-Time Broadcast Flow

  1. MetricsWebSocketController scheduled task

    • Runs every second (@Scheduled(fixedRate = 1000))
  2. Fetch current metrics

    • ComprehensiveMetricsService.getComprehensiveMetrics()
  3. Broadcast to all connected clients

    • SimpMessagingTemplate.convertAndSend("/topic/metrics", metrics)
  4. Dashboard receives update

    • React WebSocket hook processes message
    • Updates state
    • Triggers re-render of charts and metrics
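
Steps 1-3 could be implemented with a scheduled broadcaster along these lines; the class name follows the one used above, but the body is an illustrative sketch (it also assumes @EnableScheduling is active on a configuration class):

// Pushes the current metrics snapshot to every /topic/metrics subscriber
// once per second.
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Controller;

@Controller
public class MetricsWebSocketController {

    private final SimpMessagingTemplate messagingTemplate;
    private final ComprehensiveMetricsService metricsService; // assumed bean

    public MetricsWebSocketController(SimpMessagingTemplate messagingTemplate,
                                      ComprehensiveMetricsService metricsService) {
        this.messagingTemplate = messagingTemplate;
        this.metricsService = metricsService;
    }

    @Scheduled(fixedRate = 1000)
    public void broadcastMetrics() {
        messagingTemplate.convertAndSend(
                "/topic/metrics", metricsService.getComprehensiveMetrics());
    }
}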

Alert Rule Evaluation Flow

  1. RuleExecutionService scheduled or event-triggered

    • Executes asynchronously
  2. Load active rules from database

    • Query: findByStatus(RuleStatus.ACTIVE)
  3. Build KIE session with rules

    • Load DRL files into KieFileSystem
    • Compile with KieBuilder
    • Create KieSession
  4. Insert facts and fire rules

    • Insert MetricsFact object
    • Call kieSession.fireAllRules()
  5. Handle triggered alerts

    • AlertTriggerHandler processes matches
    • Create AlertHistory record
    • Send notifications (if configured)
    • Update rule statistics
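
Steps 3-4 might translate into KIE API calls roughly as follows. The KIE calls mirror the standard Drools API; the method shape and error handling are assumptions:

// Sketch of compiling generated DRL and firing rules against a metrics fact.
import org.kie.api.KieServices;
import org.kie.api.builder.KieBuilder;
import org.kie.api.builder.KieFileSystem;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class RuleEvaluationSketch {

    // drl is the generated rule text (see the example in the Rule Engine
    // section); metricsFact is the current metrics snapshot.
    public void evaluate(String drl, Object metricsFact) {
        KieServices ks = KieServices.Factory.get();

        // Step 3: load the generated DRL and compile it. A real
        // implementation would inspect builder.getResults() for errors.
        KieFileSystem kfs = ks.newKieFileSystem()
                .write("src/main/resources/rules/generated.drl", drl);
        KieBuilder builder = ks.newKieBuilder(kfs).buildAll();

        KieContainer container =
                ks.newKieContainer(ks.getRepository().getDefaultReleaseId());
        KieSession session = container.newKieSession();
        try {
            // Step 4: insert the metrics fact and evaluate; matching rules
            // fire their consequences, which is where alerts are triggered.
            session.insert(metricsFact);
            session.fireAllRules();
        } finally {
            session.dispose();
        }
    }
}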

Design Decisions

Why Kafka?

  • Decoupling: Producers and consumers operate independently
  • Scalability: Horizontal scaling of consumers
  • Reliability: Durable storage and replay capability
  • Backpressure handling: Consumers process at their own pace

Why PostgreSQL with JSONB?

  • Structured data: Core fields indexed and queryable
  • Flexibility: JSONB for tags and metadata without schema changes
  • Performance: GIN indexes on JSONB for fast queries
  • Maturity: Battle-tested, reliable, feature-rich

Why In-Memory Metrics?

  • Speed: Sub-millisecond metric updates
  • Real-time: No database roundtrips for dashboards
  • Simplicity: Reduces infrastructure complexity

Trade-offs:

  • Not horizontally scalable (single instance)
  • Lost on restart (acceptable for real-time metrics)
  • Future: Redis for distributed metrics

Why Drools?

  • Dynamic rules: Change alert logic without code deployment
  • Expressive: Complex conditions in declarative language
  • Proven: Enterprise-grade rule engine
  • Performance: Optimized pattern matching (Rete algorithm)

Why WebSocket?

  • Real-time: Push updates without polling
  • Efficiency: Single connection for multiple updates
  • Scalability: Server-initiated messages

Scalability Considerations

Current Limitations

  • Single Kafka broker
  • Single PostgreSQL instance
  • In-memory metrics (not distributed)
  • Single Spring Boot instance

Scaling Path

Horizontal Scaling:

  1. Add more Kafka brokers and increase replication
  2. Add consumer group instances
  3. Use Redis for distributed metrics
  4. Load balance multiple API instances
  5. PostgreSQL read replicas for queries

Vertical Scaling:

  • Increase JVM heap for metrics storage
  • Larger PostgreSQL instance
  • More Kafka partitions

Expected Capacity:

  • Current setup: ~1000 events/second
  • Optimized: ~10,000+ events/second

Security Considerations

Current State:

  • No authentication on API endpoints
  • No TLS/SSL encryption
  • CORS enabled for dashboard

Production Requirements:

  • API authentication (JWT, OAuth2)
  • TLS for all HTTP traffic
  • Kafka SASL authentication
  • PostgreSQL SSL connections
  • Network isolation
  • Rate limiting

Monitoring and Observability

Currently Available:

  • Health check endpoints
  • Application logs (SLF4J)
  • Kafka UI for topic inspection
  • Swagger UI for API exploration

Planned:

  • Prometheus metrics export
  • Grafana dashboards
  • Distributed tracing (OpenTelemetry)
  • Alerting on system health
