NeuronDB Documentation
This documentation is for the main branch (3.0.0-devel).
Serve ONNX models directly from PostgreSQL
Load ONNX models
Register models once, version them, and share them across schemas. Use GitHub releases or object-storage URLs for centralized distribution.
Register a model
SELECT neurondb_register_model(
    name        => 'text-embedding-3-small',
    version     => '1.0.0',
    storage_url => 'https://github.com/neurondb-ai/neurondb/releases/download/models/text-embedding-3-small.onnx',
    runtime     => 'onnx',
    device      => 'auto'
);
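New builds of the same model are registered under the same name with a bumped version string; a minimal sketch reusing only the parameters shown above, assuming re-registration with a new version adds a registry row rather than replacing the old one (the 1.1.0 version and its release URL are hypothetical):

```sql
-- Register a newer build under the same name; the prior version stays in the registry
SELECT neurondb_register_model(
    name        => 'text-embedding-3-small',
    version     => '1.1.0',  -- hypothetical follow-up release
    storage_url => 'https://github.com/neurondb-ai/neurondb/releases/download/models/text-embedding-3-small-1.1.0.onnx',
    runtime     => 'onnx',
    device      => 'auto'
);
```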
Inspect registry
SELECT name,
       version,
       metadata ->> 'owner'      AS owner,
       metadata ->> 'git_commit' AS git_commit,
       created_at,
       status
FROM neurondb_model_registry
ORDER BY created_at DESC;
GPU batching & scheduling
NeuronDB orchestrates micro-batches per GPU worker. Configure queue sizes, max latency, and fallbacks.
PostgreSQL configuration
postgresql.conf
neurondb.gpu_enabled = on
neurondb.gpu_device_ids = '0,1'
neurondb.inference_batch_size = 32
neurondb.inference_max_latency_ms = 25
neurondb.inference_timeout_ms = 1000
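Because these are ordinary PostgreSQL configuration parameters, they can also be changed without editing postgresql.conf, using standard `ALTER SYSTEM` and `ALTER DATABASE`; a sketch, assuming the parameters above are reloadable (the `analytics` database name is hypothetical, and settings such as `neurondb.gpu_enabled` may still require a server restart):

```sql
-- Cluster-wide override, persisted to postgresql.auto.conf (standard PostgreSQL)
ALTER SYSTEM SET neurondb.inference_batch_size = 64;
SELECT pg_reload_conf();  -- signal running backends to pick up reloadable changes

-- Per-database default, applied to new sessions in that database
ALTER DATABASE analytics SET neurondb.inference_max_latency_ms = 50;
```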
Session-level overrides
SET neurondb.session_inference_batch_size = 16;
SET neurondb.session_inference_max_latency_ms = 15;
SELECT neurondb_embed_batch(
model_name => 'text-embedding-3-small',
inputs => ARRAY['vector search', 'pg extension', 'gpu batching']
);
Model caching
Models are automatically cached in shared memory for fast access across sessions.
Cache statistics
SELECT * FROM neurondb_model_cache_stats();
Next Steps
- Embedding Generation - Generate embeddings
- Performance Tuning - Optimize inference
- Model Management - Version and deploy models