Observability Stack
Overview
The NeuronDB observability stack provides comprehensive monitoring, visualization, and distributed tracing for the entire ecosystem. The stack includes:
- Prometheus - Metrics collection, alerting, and querying
- Grafana - Pre-configured dashboards and visualization
- Jaeger - Distributed tracing for request flows
- Alertmanager - Alert routing and notification management
Key Features
- Complete Coverage: All modules and variants monitored (NeuronDB, NeuronAgent, NeuronMCP, NeuronDesktop)
- Detailed Metrics: Module-specific metrics with proper labeling
- Comprehensive Alerts: 40+ alert rules for all critical failure modes
- Performance Optimization: Recording rules for common queries
- Production Ready: Alertmanager integration with notification routing
- Pre-configured: Grafana dashboards and Prometheus rules included
Prometheus
Prometheus collects metrics from all NeuronDB ecosystem components and provides a query language (PromQL) for monitoring and alerting.
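As a minimal sketch of querying Prometheus programmatically, the snippet below builds a URL for the standard Prometheus HTTP API instant-query endpoint (/api/v1/query) and parses the JSON shape that endpoint returns. The base URL assumes the local Docker Compose setup described in this guide; the helper names, the job label values, and the sample response are illustrative assumptions, not part of NeuronDB.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumed: local Docker Compose setup


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Map each series' 'job' label to its float value from an instant-query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        series["metric"].get("job", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


# Hypothetical response for the query `up`; the job names are made up for illustration.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "neuronagent"}, "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "job": "neurondb"}, "value": [1700000000, "0"]},
        ],
    },
})

url = build_query_url("up")
targets = parse_instant_vector(sample)
```

In a live setup, the body passed to parse_instant_vector would come from an HTTP GET on the generated URL rather than from the hard-coded sample.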
Configuration Files
The Prometheus configuration files are located in the prometheus/ directory:
- prometheus.yml - Main Prometheus configuration
- alerts.yml - Alert rules (organized by module)
- recording_rules.yml - Pre-computed metrics for performance
- alertmanager.yml - Alertmanager configuration
- postgres_exporter.yml - PostgreSQL exporter custom queries
- service_discovery.yml - Service discovery reference
Quick Start
Start Prometheus with Docker Compose
# Start Prometheus
docker compose -f docker-compose.observability.yml up -d prometheus
# Access Prometheus UI
# http://localhost:9090
# Check targets
# http://localhost:9090/targets
Metrics Endpoints
All services expose Prometheus-compatible metrics:
- NeuronDB: Via PostgreSQL exporter at :9187/metrics
- NeuronAgent: :8080/metrics
- NeuronDesktop API: :8081/metrics
- Infrastructure: Node exporter (:9100/metrics), cAdvisor (:8080/metrics)
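Each of these endpoints returns plain text in the Prometheus exposition format. The sketch below shows one simplified way to parse that text into (name, labels, value) tuples using only the standard library. The metric names come from this guide, but the label values and numbers in the sample are invented, and the regular expressions deliberately ignore escaped quotes inside label values.

```python
import re

# Matches `name{labels} value` or `name value`; escaped quotes inside label
# values are not handled in this simplified sketch.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output; label values and numbers are invented for illustration.
example = """
# HELP neurondb_cache_hits_total Cache hits
# TYPE neurondb_cache_hits_total counter
neurondb_cache_hits_total{cache_type="embedding"} 940
neurondb_cache_misses_total{cache_type="embedding"} 60
neurondesktop_active_connections 12
"""

parsed = parse_metrics(example)
```

For anything beyond quick inspection, a maintained client library is likely a better choice than hand-rolled parsing.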
📋 Complete Prometheus Documentation: See prometheus/README.md for detailed configuration, metrics reference, and alert rules.
Grafana
Grafana provides pre-configured dashboards for visualizing NeuronDB ecosystem metrics, performance data, and health status.
Quick Start
Start Grafana with Docker Compose
# Start Grafana
docker compose -f docker-compose.observability.yml up -d grafana
# Access Grafana UI
# http://localhost:3001
# Default credentials: admin/admin
# Grafana will automatically provision:
# - Prometheus datasource
# - Pre-configured dashboards
Pre-configured Dashboards
Grafana includes dashboards for:
- NeuronDB: Database health, query performance, index health, cache metrics
- NeuronAgent: Service availability, error rates, latency, execution metrics
- NeuronDesktop: API availability, error rates, connection metrics
- NeuronMCP: Service availability, tool execution, connection pool
- Infrastructure: System resources, container health, network metrics
Dashboard Provisioning
Grafana dashboards are automatically provisioned from the grafana/provisioning/dashboards/ directory. The Prometheus datasource is configured in grafana/provisioning/datasources/prometheus.yml.
Custom Dashboards
Create custom dashboards in the Grafana UI, or add dashboard JSON files to the grafana/dashboards/ directory.
Jaeger
Jaeger provides distributed tracing for request flows across all NeuronDB ecosystem components.
Quick Start
Start Jaeger with Docker Compose
# Start Jaeger
docker compose -f docker-compose.observability.yml up -d jaeger
# Access Jaeger UI
# http://localhost:16686
# Jaeger endpoints:
# - UI: :16686
# - OTLP gRPC: :4317
# - OTLP HTTP: :4318
Features
- Distributed Tracing: Track requests across all services
- Service Map: Visualize service dependencies
- Trace Analysis: Identify bottlenecks and slow operations
- Performance Insights: Understand request latency breakdown
Docker Compose Setup
Use the docker-compose.observability.yml file to run the complete observability stack:
Start observability stack
# Start all observability services
docker compose -f docker-compose.observability.yml up -d
# Check status
docker compose -f docker-compose.observability.yml ps
# View logs
docker compose -f docker-compose.observability.yml logs -f
# Stop services
docker compose -f docker-compose.observability.yml down
Access URLs
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001 (admin/admin)
- Jaeger: http://localhost:16686
- Alertmanager: http://localhost:9093 (if enabled)
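To confirm that the stack came up, a small reachability check against these URLs can help. The sketch below is one possible approach using only the standard library. The /-/healthy paths for Prometheus and Alertmanager and /api/health for Grafana are those projects' standard health endpoints; probing the Jaeger UI root is an assumption made here for simplicity, and the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Base URLs from the list above, combined with each service's health path.
HEALTH_URLS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3001/api/health",
    "jaeger": "http://localhost:16686/",  # UI root; assumed adequate as a liveness probe
    "alertmanager": "http://localhost:9093/-/healthy",
}


def is_up(url, timeout=2.0):
    """Return True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False


def check_stack(urls=HEALTH_URLS):
    """Return a {service: bool} map of reachability results."""
    return {name: is_up(url) for name, url in urls.items()}
```

Calling check_stack() after the services start returns a map of service names to booleans; Alertmanager is expected to report False when it is not enabled.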
Kubernetes Setup
The Helm chart includes the complete observability stack. Enable it in your values file:
Enable observability in Helm values
# values.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    retention: "30d"
    persistence:
      enabled: true
      size: "20Gi"
  grafana:
    enabled: true
    adminPassword: "change-me" # Change in production!
    persistence:
      enabled: true
      size: "10Gi"
  jaeger:
    enabled: true
Access Services in Kubernetes
Port-forward to observability services
# Grafana
kubectl port-forward svc/neurondb-grafana 3001:3000 -n neurondb
# Access at: http://localhost:3001
# Prometheus
kubectl port-forward svc/neurondb-prometheus 9090:9090 -n neurondb
# Access at: http://localhost:9090
# Jaeger
kubectl port-forward svc/neurondb-jaeger 16686:16686 -n neurondb
# Access at: http://localhost:16686
Service Discovery
Kubernetes deployments use ServiceMonitors for automatic service discovery. Prometheus automatically discovers and scrapes all NeuronDB ecosystem services.
Metrics Reference
Key metrics exposed by each component:
NeuronDB Metrics
- neurondb_queries_total - Total number of queries (by query_type, index_type)
- neurondb_query_duration_seconds - Query duration histogram (by query_type)
- neurondb_index_size_bytes - Index size in bytes (by index_name, index_type)
- neurondb_vector_count - Number of vectors (by table_name)
- neurondb_cache_hits_total - Cache hits (by cache_type)
- neurondb_cache_misses_total - Cache misses (by cache_type)
- neurondb_worker_status - Worker status (by worker_id, status)
- neurondb_errors_total - Total errors (by error_type)
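Because neurondb_query_duration_seconds is a histogram, percentiles such as P95 are estimated from cumulative buckets rather than read directly. The sketch below illustrates the linear-interpolation approach that PromQL's histogram_quantile function is based on, simplified to raw counts (Prometheus normally applies it to rate() of the bucket counters). The bucket boundaries and counts are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(upper) or in_bucket == 0:
                # Rank falls in the +Inf bucket (or an empty one):
                # report the highest bound seen so far.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = upper, count
    return math.nan


# Hypothetical cumulative bucket counts, keyed by upper bound in seconds.
buckets = [(0.1, 50), (0.5, 80), (1.0, 90), (2.5, 98), (math.inf, 100)]

p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
```

With these sample counts the estimated P95 lands between the 1.0 s and 2.5 s bounds, which shows why bucket boundaries near the latency threshold of interest matter for accuracy.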
NeuronAgent Metrics
- neurondb_agent_http_requests_total - Total HTTP requests (by method, endpoint, status)
- neurondb_agent_http_request_duration_seconds - HTTP request duration (by method, endpoint)
- neurondb_agent_executions_total - Agent executions (by agent_id, status)
- neurondb_agent_execution_duration_seconds - Execution duration (by agent_id)
- neurondb_agent_llm_calls_total - LLM API calls (by model, status)
- neurondb_agent_llm_tokens_total - LLM tokens (by model, type)
- neurondb_agent_memory_chunks_stored_total - Memory chunks stored (by agent_id)
- neurondb_agent_tool_executions_total - Tool executions (by tool_name, status)
- neurondb_agent_database_connections_active - Active DB connections
NeuronDesktop Metrics
- neurondesktop_api_requests_total - Total API requests (by endpoint, method)
- neurondesktop_api_errors_total - API errors (by endpoint, error_type)
- neurondesktop_api_request_duration_seconds - Request duration (by endpoint)
- neurondesktop_active_connections - Active connections
- neurondesktop_active_mcp_connections - Active MCP connections
- neurondesktop_active_neurondb_connections - Active NeuronDB connections
- neurondesktop_active_agent_connections - Active agent connections
📋 Complete Metrics Reference: See Prometheus README for all available metrics with descriptions and labels.
Alert Rules
Prometheus includes 40+ alert rules organized by module, covering all critical failure modes:
NeuronDB Alerts
- NeuronDBServiceDown (Critical) - Service down > 1m
- NeuronDBConnectionFailure (Critical) - >5 failures in 5m
- NeuronDBHighQueryLatency (Warning) - P95 > 1s for 5m
- NeuronDBIndexHealthDegraded (Warning) - Health < 80% for 5m
- NeuronDBCacheHitRateLow (Warning) - Hit rate < 70% for 5m
- NeuronDBConnectionPoolExhausted (Critical) - Utilization > 90% for 5m
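To make a threshold such as the one in NeuronDBCacheHitRateLow concrete, the sketch below computes a hit rate from the increase in neurondb_cache_hits_total and neurondb_cache_misses_total over a window and compares it with 70%. This is an assumption about how such a condition could be evaluated, not the expression from alerts.yml, and it checks a single window, whereas the alert's "for 5m" clause requires the condition to hold continuously.

```python
def hit_rate(hits_delta, misses_delta):
    """Cache hit ratio over a window, or None when there was no traffic."""
    total = hits_delta + misses_delta
    if total == 0:
        return None
    return hits_delta / total


def cache_hit_rate_low(hits_delta, misses_delta, threshold=0.70):
    """True when the windowed hit rate is below the threshold."""
    rate = hit_rate(hits_delta, misses_delta)
    # No traffic means no evidence of a problem, so the check does not trigger.
    return rate is not None and rate < threshold


# Hypothetical counter increases over a 5-minute window.
healthy = cache_hit_rate_low(hits_delta=940, misses_delta=60)
degraded = cache_hit_rate_low(hits_delta=60, misses_delta=40)
```

Treating an idle window as "not low" avoids a division by zero and spurious alerts on quiet systems; whether the real rule behaves this way depends on how its expression is written.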
NeuronAgent Alerts
- NeuronAgentServiceDown (Critical) - Service down > 1m
- NeuronAgentHighErrorRate (Critical) - Error rate > 5% for 5m
- NeuronAgentHighLatency (Warning) - P95 > 1s for 5m
- NeuronAgentExecutionFailure (Critical) - >10 failures in 5m
- NeuronAgentDatabaseConnectionIssue (Warning) - >5 errors in 5m
Infrastructure Alerts
- HighCPUUsage (Warning) - CPU > 80% for 5m
- HighMemoryUsage (Warning) - Memory > 85% for 5m
- HighDiskUsage (Warning) - Disk > 85% for 5m
- PrometheusTargetDown (Critical) - Target down > 2m
📋 Complete Alert Rules: See alerts.yml for all alert rules with conditions and descriptions.
Additional Resources
- Prometheus README - Complete Prometheus documentation
- Alert Rules - All alert definitions
- Prometheus Config - Main configuration file
- Prometheus Documentation - Official Prometheus docs
- Grafana Documentation - Official Grafana docs
- Jaeger Documentation - Official Jaeger docs