Audit Logging for ML and RAG Operations
Overview
NeuronDB provides comprehensive audit logging for ML inference and RAG operations, enabling compliance monitoring, security analysis, and usage tracking.
Features
- Audit logging for ML model inference calls
- Audit logging for RAG retrieve/generate operations
- Configurable retention periods
- Query interface for audit logs
- Input/output hashing for integrity verification
Configuration
Enable audit logging:
-- Enable ML inference audit logging
SET neurondb.audit_ml_enabled = true;

-- Enable RAG operation audit logging
SET neurondb.audit_rag_enabled = true;

-- Set retention period (days)
SET neurondb.audit_retention_days = 365;
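A plain SET in PostgreSQL lasts only for the current session. The sketch below shows one way to check the active values and to persist them for new sessions. It assumes these parameters may be set at the database level, which this page does not confirm, and `mydb` is a placeholder database name.

```sql
-- Inspect the values active in the current session.
SHOW neurondb.audit_ml_enabled;
SELECT current_setting('neurondb.audit_retention_days') AS retention_days;

-- Persist the settings for new sessions on one database.
-- Assumption: the parameters are accepted at database level;
-- "mydb" is a hypothetical database name.
ALTER DATABASE mydb SET neurondb.audit_ml_enabled = true;
ALTER DATABASE mydb SET neurondb.audit_rag_enabled = true;
ALTER DATABASE mydb SET neurondb.audit_retention_days = 365;
```

Settings applied with ALTER DATABASE take effect for sessions opened afterwards, not for sessions that are already connected.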
ML Inference Audit Logging
Automatic Logging
When neurondb.audit_ml_enabled is set to true, ML inference operations are logged automatically.
Manual Logging
-- Log ML inference operation.
-- digest() is provided by the pgcrypto extension; input_data and
-- output_data stand for your own input and output values.
SELECT log_ml_inference(
    model_id       := 1,
    operation_type := 'predict',
    input_hash     := encode(digest(input_data::text, 'sha256'), 'hex'),
    output_hash    := encode(digest(output_data::text, 'sha256'), 'hex'),
    metadata       := '{"batch_size": 100, "latency_ms": 45}'::jsonb
);
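The hash arguments above can be produced in more than one way. The following sketch computes a SHA-256 hex digest of a sample payload, first with pgcrypto's digest() and then with the built-in sha256() function available in PostgreSQL 11 and later. The JSON payload is an invented example value, not data from NeuronDB.

```sql
-- Option 1: pgcrypto, as used in the manual logging call.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

SELECT encode(digest('{"feature_a": 1.5}'::text, 'sha256'), 'hex') AS input_hash;

-- Option 2: built-in sha256() (PostgreSQL 11+), no extension required.
-- convert_to() turns the text into bytea using an explicit encoding.
SELECT encode(sha256(convert_to('{"feature_a": 1.5}', 'UTF8')), 'hex') AS input_hash;
```

Both queries return a 64-character hexadecimal string. Whichever form is chosen, hash the same textual representation every time, since any difference in whitespace or key order produces a different digest.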
Querying ML Audit Logs
-- Query ML inference audit logs
SELECT *
FROM query_audit_log(
    'ml_inference',
    start_time     := '2024-01-01'::timestamptz,
    end_time       := '2024-12-31'::timestamptz,
    user_id        := 'admin',
    operation_type := 'predict'
);
RAG Operation Audit Logging
Automatic Logging
When neurondb.audit_rag_enabled is set to true, RAG operations are logged automatically.
Manual Logging
-- Log RAG operation
SELECT log_rag_operation(
    pipeline_name  := 'documents_rag',
    operation_type := 'retrieve',
    query_hash     := encode(digest('What is machine learning?'::text, 'sha256'), 'hex'),
    result_count   := 5,
    metadata       := '{"k": 5, "rerank": true}'::jsonb
);
Querying RAG Audit Logs
-- Query RAG operation audit logs
SELECT *
FROM query_audit_log(
    'rag_operation',
    start_time     := '2024-01-01'::timestamptz,
    end_time       := '2024-12-31'::timestamptz,
    user_id        := 'user123',
    operation_type := 'generate'
);
Audit Log Schema
ML Inference Audit Log
CREATE TABLE neurondb.ml_inference_audit_log (
    audit_id       BIGSERIAL PRIMARY KEY,
    model_id       INTEGER,
    operation_type TEXT NOT NULL,
    user_id        TEXT DEFAULT CURRENT_USER,
    input_hash     TEXT,
    output_hash    TEXT,
    metadata       JSONB,
    timestamp      TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
RAG Operation Audit Log
CREATE TABLE neurondb.rag_operation_audit_log (
    audit_id       BIGSERIAL PRIMARY KEY,
    pipeline_name  TEXT NOT NULL,
    operation_type TEXT NOT NULL,
    user_id        TEXT DEFAULT CURRENT_USER,
    query_hash     TEXT,
    result_count   INTEGER DEFAULT 0,
    metadata       JSONB,
    timestamp      TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
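Given the two schemas above, usage tracking can also be done with ordinary SQL against the tables. The sketch below assumes your role can read the audit tables directly and that metadata contains a numeric latency_ms key, as in the manual logging example; rows without that key are ignored by avg().

```sql
-- Daily ML inference volume per user and operation, last 30 days.
SELECT date_trunc('day', timestamp)            AS log_day,
       user_id,
       operation_type,
       count(*)                                AS calls,
       -- Assumption: latency_ms, when present, is numeric.
       avg((metadata->>'latency_ms')::numeric) AS avg_latency_ms
FROM neurondb.ml_inference_audit_log
WHERE timestamp >= CURRENT_TIMESTAMP - interval '30 days'
GROUP BY 1, 2, 3
ORDER BY log_day, calls DESC;

-- RAG retrievals per pipeline with the average number of results.
SELECT pipeline_name,
       count(*)          AS retrievals,
       avg(result_count) AS avg_results
FROM neurondb.rag_operation_audit_log
WHERE operation_type = 'retrieve'
GROUP BY pipeline_name
ORDER BY retrievals DESC;
```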
Compliance Considerations
GDPR
- User IDs are logged for data access tracking
- Input/output hashes enable integrity verification without storing sensitive data
- Retention periods can be configured per compliance requirements
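As an illustration of hash-based verification, the sketch below checks whether a payload held outside the audit log matches any logged inference. It assumes the stored input_hash was produced with the same encode(digest(..., 'sha256'), 'hex') expression shown under Manual Logging, that pgcrypto is installed, and that the sample payload is a stand-in for your own data.

```sql
-- Find audit rows whose input hash matches a known payload.
-- The payload text must be byte-for-byte identical to what was hashed
-- at logging time; the literal below is a hypothetical example.
SELECT audit_id, user_id, model_id, timestamp
FROM neurondb.ml_inference_audit_log
WHERE input_hash = encode(digest('{"feature_a": 1.5}'::text, 'sha256'), 'hex')
ORDER BY timestamp;
```

An empty result means no logged call used that exact payload text; it does not rule out a call with an equivalent payload serialized differently.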
HIPAA
- Audit logs track all access to PHI-related ML models
- Query hashes enable compliance reporting
- Timestamps enable audit trail reconstruction
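A rough sketch of reconstructing an audit trail from the two tables described under Audit Log Schema: it merges ML and RAG events for one user into a single chronological list. The user name and date range are example values, and direct read access to the tables is assumed.

```sql
-- Chronological trail of all logged operations for one user in January 2024.
SELECT timestamp,
       'ml_inference'  AS source,
       operation_type,
       model_id::text  AS resource,
       input_hash      AS request_hash
FROM neurondb.ml_inference_audit_log
WHERE user_id = 'user123'
  AND timestamp >= '2024-01-01'::timestamptz
  AND timestamp <  '2024-02-01'::timestamptz

UNION ALL

SELECT timestamp,
       'rag_operation',
       operation_type,
       pipeline_name,
       query_hash
FROM neurondb.rag_operation_audit_log
WHERE user_id = 'user123'
  AND timestamp >= '2024-01-01'::timestamptz
  AND timestamp <  '2024-02-01'::timestamptz

ORDER BY timestamp;
```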
SOC 2
- Comprehensive logging of all ML/RAG operations
- User attribution for all operations
- Configurable retention for audit requirements
Best Practices
- Retention Management: Regularly archive or delete old audit logs based on retention policies.
- Indexing: Audit tables are indexed for efficient querying. Monitor table sizes and partition if needed.
- Performance: Audit logging is asynchronous where possible to minimize impact on operations.
- Monitoring: Set up alerts for unusual patterns in audit logs (e.g., excessive access, failed operations).
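The monitoring and sizing practices above could be approximated with queries like the following, run on a schedule by whatever alerting tool you use. The threshold of 1000 calls per hour is an arbitrary example, not a NeuronDB default.

```sql
-- Users with unusually high inference activity in the last hour.
-- The threshold (1000) is a hypothetical value; tune it to your workload.
SELECT user_id, count(*) AS calls
FROM neurondb.ml_inference_audit_log
WHERE timestamp >= CURRENT_TIMESTAMP - interval '1 hour'
GROUP BY user_id
HAVING count(*) > 1000
ORDER BY calls DESC;

-- Current on-disk size of each audit table, including indexes.
SELECT pg_size_pretty(pg_total_relation_size('neurondb.ml_inference_audit_log'))  AS ml_log_size,
       pg_size_pretty(pg_total_relation_size('neurondb.rag_operation_audit_log')) AS rag_log_size;
```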
Log Rotation
Audit logs should be periodically rotated or archived based on retention policies:
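-- Delete logs older than the retention period.
-- The retention setting is read with current_setting(); a configuration
-- parameter cannot be referenced like a column.
DELETE FROM neurondb.ml_inference_audit_log
WHERE timestamp < CURRENT_TIMESTAMP
      - (current_setting('neurondb.audit_retention_days') || ' days')::interval;

DELETE FROM neurondb.rag_operation_audit_log
WHERE timestamp < CURRENT_TIMESTAMP
      - (current_setting('neurondb.audit_retention_days') || ' days')::interval;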
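Where expired rows must be archived rather than discarded, deletion and archiving can happen in one statement. This is a minimal sketch: the archive table name is hypothetical and not part of NeuronDB, and it assumes your role may create tables in the neurondb schema.

```sql
-- Hypothetical archive table with the same columns as the live log.
CREATE TABLE IF NOT EXISTS neurondb.ml_inference_audit_log_archive
    (LIKE neurondb.ml_inference_audit_log);

-- Move expired rows: DELETE ... RETURNING feeds the INSERT, so each row
-- is either moved or left in place, never lost in between.
WITH expired AS (
    DELETE FROM neurondb.ml_inference_audit_log
    WHERE timestamp < CURRENT_TIMESTAMP
          - make_interval(days => current_setting('neurondb.audit_retention_days')::int)
    RETURNING *
)
INSERT INTO neurondb.ml_inference_audit_log_archive
SELECT * FROM expired;
```

The same pattern applies to neurondb.rag_operation_audit_log with its own archive table. On very large tables, consider running the move in smaller time slices to keep each transaction short.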