Advanced Features
Overview
Advanced Features provide production-ready capabilities for building sophisticated AI applications. These features combine multiple techniques to deliver superior search relevance, intelligent caching, and end-to-end RAG (Retrieval Augmented Generation) pipelines.
Key Capabilities
- Hybrid Search: Combine vector similarity with full-text search for best results
- Reranking: Improve relevance with cross-encoder and LLM-based reranking
- RAG Pipelines: End-to-end retrieval augmented generation workflows
- Intelligent Caching: Semantic caching for embeddings, queries, and results
Production Benefits
- Higher Relevance: Hybrid search and reranking improve result quality
- Lower Latency: Intelligent caching reduces computation and API calls
- Better UX: RAG pipelines provide accurate, context-aware responses
- Cost Efficiency: Caching reduces API costs and improves throughput
Hybrid Search
Hybrid search combines vector similarity search (semantic understanding) with full-text search (keyword matching) to deliver superior search results that understand both context and exact terms.
Why Hybrid Search?
- Vector Search Alone: Understands meaning but may miss exact keyword matches
- Full-Text Search Alone: Precise keywords but no semantic understanding
- Hybrid Search: Best of both worlds - semantic understanding + exact matching
Implementation
Hybrid search query
-- Combine vector and full-text search with weighted scoring
WITH vector_results AS (
SELECT id, content,
embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002') AS distance
FROM documents
ORDER BY distance
LIMIT 20
),
text_results AS (
SELECT id, content,
ts_rank(to_tsvector('english', content),
to_tsquery('english', 'PostgreSQL & replication')) AS rank
FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'PostgreSQL & replication')
ORDER BY rank DESC
LIMIT 20
)
SELECT
COALESCE(v.id, t.id) AS id,
COALESCE(v.content, t.content) AS content,
-- Weighted combination: 70% vector, 30% text
COALESCE(1.0 - v.distance, 0.0) * 0.7 + COALESCE(t.rank, 0.0) * 0.3 AS hybrid_score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
ORDER BY hybrid_score DESC
LIMIT 10;
Reciprocal Rank Fusion (RRF)
RRF combines rankings from multiple sources without requiring score normalization.
RRF hybrid search
WITH vector_ranked AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> embed_text('query', 'text-embedding-ada-002')) AS v_rank
FROM documents
ORDER BY v_rank
LIMIT 100
),
text_ranked AS (
SELECT id, ROW_NUMBER() OVER (
ORDER BY ts_rank(to_tsvector('english', content),
to_tsquery('english', 'query')) DESC) AS t_rank
FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'query')
ORDER BY t_rank
LIMIT 100
)
SELECT v.id,
-- RRF formula: 1/(60 + rank) for each source
1.0 / (60 + v.v_rank) + 1.0 / (60 + COALESCE(t.t_rank, 1000)) AS rrf_score
FROM vector_ranked v
LEFT JOIN text_ranked t ON v.id = t.id
ORDER BY rrf_score DESC
LIMIT 10;
Best Practices
- Use 60-70% weight for vector search, 30-40% for full-text search
- Normalize scores from both sources before combining
- Consider query type: semantic queries favor vector, exact terms favor text
- Monitor recall and precision metrics to tune weights
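The two fusion strategies above, weighted score combination (with normalization, as the best practices recommend) and Reciprocal Rank Fusion, can be sketched outside the database. This is an illustrative plain-Python sketch with toy scores; the helper names are not part of NeuronDB:

```python
def minmax(scores):
    """Min-max normalize a {doc: score} dict to [0, 1] before fusion."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def weighted_fusion(vector_scores, text_scores, w_vec=0.7, w_text=0.3):
    """70/30 weighted combination of normalized vector and text scores."""
    v, t = minmax(vector_scores), minmax(text_scores)
    docs = set(v) | set(t)
    return {d: w_vec * v.get(d, 0.0) + w_text * t.get(d, 0.0) for d in docs}

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) across ranked lists."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused

# Toy data: d2 appears in both result lists, so both methods boost it.
vector_scores = {'d1': 0.92, 'd2': 0.85, 'd3': 0.40}
text_scores = {'d2': 12.0, 'd4': 9.5}
scores = weighted_fusion(vector_scores, text_scores)
fused = rrf([['d1', 'd2', 'd3'], ['d2', 'd4']])
```

Note that RRF needs only rank positions, which is why it skips the normalization step the weighted approach requires.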
Reranking
Reranking improves search relevance by re-scoring initial results using more sophisticated models. This two-stage approach (retrieve then rerank) provides better accuracy than single-stage retrieval.
Reranking Approaches
Cross-Encoder Reranking
Cross-encoders process query-document pairs together, providing more accurate relevance scores than bi-encoders (used in initial retrieval).
Cross-encoder reranking
-- Retrieve initial candidates
WITH candidates AS (
SELECT id, content
FROM documents
ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
LIMIT 50
)
-- Rerank with cross-encoder
SELECT
id,
content,
neurondb_rerank(
'cross-encoder/ms-marco-MiniLM-L-6-v2',
content,
'PostgreSQL replication'
) AS relevance_score
FROM candidates
ORDER BY relevance_score DESC
LIMIT 10;
LLM-Based Reranking
Use large language models to rerank results based on semantic understanding and reasoning.
LLM reranking
-- Rerank with LLM
SELECT
id,
content,
neurondb_llm_rerank(
'gpt-4',
content,
'PostgreSQL replication',
criteria => 'relevance, accuracy, completeness'
) AS llm_score
FROM (
SELECT id, content
FROM documents
ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
LIMIT 20
) candidates
ORDER BY llm_score DESC
LIMIT 5;
Ensemble Reranking
Combine multiple reranking models for improved accuracy and robustness.
Ensemble reranking
WITH candidates AS (
SELECT id, content
FROM documents
ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
LIMIT 50
)
SELECT
id,
content,
-- Weighted combination of multiple rerankers (weights sum to 1.0)
neurondb_rerank('cross-encoder/model1', content, 'PostgreSQL replication') * 0.5 +
neurondb_rerank('cross-encoder/model2', content, 'PostgreSQL replication') * 0.3 +
neurondb_llm_rerank('gpt-4', content, 'PostgreSQL replication') * 0.2 AS ensemble_score
FROM candidates
ORDER BY ensemble_score DESC
LIMIT 10;
Reranking Best Practices
- Retrieve 20-100 candidates, rerank top 10-20
- Use cross-encoders for speed, LLMs for accuracy
- Cache reranking results for common queries
- Monitor reranking latency and adjust candidate count
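The retrieve-then-rerank pattern behind these best practices can be sketched in plain Python. Here `cheap_score` stands in for the bi-encoder/vector distance used at retrieval time and `expensive_score` for a cross-encoder or LLM call; both are illustrative stubs, not NeuronDB functions:

```python
def two_stage_search(query, docs, cheap_score, expensive_score,
                     n_candidates=50, n_final=10):
    # Stage 1: fast, approximate retrieval over the whole corpus.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d),
                        reverse=True)[:n_candidates]
    # Stage 2: accurate (but slow) reranking over the small candidate set only.
    return sorted(candidates, key=lambda d: expensive_score(query, d),
                  reverse=True)[:n_final]

# Toy scorers: word overlap as the cheap signal; the "expensive" scorer
# here is just overlap weighted by length. Real systems call models here.
def overlap(q, d):
    return len(set(q.split()) & set(d.split()))

docs = ["postgres streaming replication", "postgres backup",
        "mysql replication", "weather today"]
top = two_stage_search("postgres replication", docs, overlap,
                       lambda q, d: overlap(q, d) * len(d))
```

The cost trade-off is visible in the structure: the expensive scorer runs `n_candidates` times rather than once per document in the corpus, which is why the candidate count (20-100 above) directly controls reranking latency.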
RAG Pipelines
Retrieval Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating answers. This grounds LLM outputs in your actual data, reducing hallucinations and improving accuracy.
RAG Workflow
- User Question: "What is PostgreSQL replication?"
- Retrieve: Find relevant documents using hybrid search
- Rerank: Score and sort results by relevance
- Generate: LLM creates answer using retrieved context
- Response: Return answer with source citations
Complete RAG Pipeline
End-to-end RAG pipeline
-- 1. Document ingestion with embeddings
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536),
metadata JSONB
);
INSERT INTO documents (content, embedding, metadata)
SELECT
content,
embed_text(content, 'text-embedding-ada-002'),
jsonb_build_object('source', 'docs', 'timestamp', now())
FROM source_documents;
-- 2. RAG query function
CREATE OR REPLACE FUNCTION rag_query(user_query TEXT)
RETURNS TABLE(answer TEXT, sources JSONB) AS $$
DECLARE
query_embedding vector(1536);
retrieved_docs TEXT;
BEGIN
-- Generate query embedding
query_embedding := embed_text(user_query, 'text-embedding-ada-002');
-- Retrieve relevant documents (hybrid search)
SELECT string_agg(content, E'\n')
INTO retrieved_docs
FROM (
SELECT content
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5
) top_docs;
-- Generate answer with context
RETURN QUERY
SELECT
neurondb_llm_generate(
'gpt-4',
format(E'Context: %s\n\nQuestion: %s\n\nAnswer:', retrieved_docs, user_query)
) AS answer,
(
SELECT jsonb_agg(jsonb_build_object('id', id, 'content', content))
FROM (
SELECT id, content
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5
) sources
) AS sources;
END;
$$ LANGUAGE plpgsql;
-- 3. Use RAG pipeline
SELECT * FROM rag_query('What is PostgreSQL replication?');
RAG Best Practices
- Retrieve 3-10 relevant documents for context
- Use hybrid search for better retrieval quality
- Rerank results before passing to LLM
- Include source citations in responses
- Monitor answer quality and adjust retrieval parameters
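The retrieve, ground, generate flow that the rag_query() function implements in SQL can be sketched in plain Python. The `search` and `generate` callables are hypothetical stand-ins for the database retrieval and LLM calls:

```python
def build_rag_prompt(question, context_docs):
    """Assemble the grounding prompt passed to the LLM."""
    context = "\n\n".join(context_docs)
    return f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"

def rag_answer(question, search, generate, top_k=5):
    docs = search(question, top_k)             # 1-3. retrieve (and rerank)
    prompt = build_rag_prompt(question, docs)  # 4. ground the LLM in context
    return generate(prompt), docs              # 5. answer plus source citations

# Stub search/generate so the flow runs end to end without any services.
answer, sources = rag_answer(
    "What is PostgreSQL replication?",
    search=lambda q, k: ["Replication copies data between servers."],
    generate=lambda prompt: "stubbed LLM answer")
prompt = build_rag_prompt("What is PostgreSQL replication?",
                          ["Replication copies data between servers."])
```

Returning `docs` alongside the answer is what makes source citations possible: the caller always knows exactly which rows grounded the response.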
Intelligent Caching
Intelligent caching reduces computation, API calls, and latency by storing frequently accessed data, embeddings, and query results. NeuronDB provides multiple caching layers for optimal performance.
Embedding Cache
Cache generated embeddings to avoid redundant API calls and computation.
Embedding cache
-- Configure embedding cache
SET neurondb.embedding_cache_size = 10000; -- Cache 10K embeddings
SET neurondb.embedding_cache_ttl = 86400; -- 24 hour TTL
-- Cached embeddings are automatically used
SELECT embed_text('sample text', 'text-embedding-ada-002'); -- First call: API
SELECT embed_text('sample text', 'text-embedding-ada-002'); -- Second call: Cache
-- Check cache statistics
SELECT * FROM neurondb_embedding_cache_stats();
Query Result Cache
Cache query results for identical or similar queries using semantic similarity.
Query result cache
-- Enable query result cache
SET neurondb.query_cache_enabled = on;
SET neurondb.query_cache_size = 1000;
SET neurondb.query_cache_ttl = 3600; -- 1 hour
-- Cache is automatically used for identical queries
SELECT * FROM documents
WHERE embedding <=> embed_text('PostgreSQL', 'text-embedding-ada-002') < 0.3
ORDER BY embedding <=> embed_text('PostgreSQL', 'text-embedding-ada-002')
LIMIT 10;
Semantic Cache
Cache results for semantically similar queries, not just identical ones.
Semantic cache
-- Enable semantic cache
SET neurondb.semantic_cache_enabled = on;
SET neurondb.semantic_cache_threshold = 0.95; -- 95% similarity threshold
-- Similar queries use cached results
SELECT * FROM documents
WHERE embedding <=> embed_text('PostgreSQL database', 'text-embedding-ada-002') < 0.3
ORDER BY embedding <=> embed_text('PostgreSQL database', 'text-embedding-ada-002')
LIMIT 10;
-- If 'PostgreSQL' was cached and similarity > 0.95, the cached result is used
Model Output Cache
Cache model inference results to avoid redundant computation.
Model output cache
-- Cache ONNX model outputs
SET neurondb.model_cache_enabled = on;
SET neurondb.model_cache_size = 5000;
-- Cached predictions are reused
SELECT predict_classification('model', features) FROM data;
SELECT predict_classification('model', features) FROM data; -- Uses cache
Caching Best Practices
- Enable caching for frequently accessed embeddings and queries
- Set appropriate TTLs based on data freshness requirements
- Monitor cache hit rates and adjust sizes accordingly
- Use semantic caching for similar but not identical queries
- Clear cache when data or models are updated
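The semantic-cache lookup behind `neurondb.semantic_cache_threshold` can be sketched in plain Python: reuse a cached answer when a new query's embedding is close enough to a cached one. The function names and toy vectors are illustrative, not part of NeuronDB:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_cache_get(query_emb, cache, threshold=0.95):
    """Return the cached answer for the closest entry above threshold, else None."""
    best, best_sim = None, threshold
    for cached_emb, answer in cache:
        sim = cosine_sim(query_emb, cached_emb)
        if sim >= best_sim:
            best, best_sim = answer, sim
    return best

cache = [([1.0, 0.0, 0.1], "cached answer about PostgreSQL")]
hit = semantic_cache_get([1.0, 0.02, 0.1], cache)   # near-duplicate query
miss = semantic_cache_get([0.0, 1.0, 0.0], cache)   # unrelated query
```

The threshold is the key tuning knob: too low and unrelated queries return stale answers, too high and paraphrases miss the cache entirely.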
Production Patterns
Complete Search Pipeline
Production search pipeline
-- 1. Hybrid retrieval
WITH hybrid_results AS (
SELECT id, content, SUM(hybrid_score) AS hybrid_score
FROM (
-- Vector search (70%)
SELECT id, content, (1.0 - distance) * 0.7 AS hybrid_score
FROM (
SELECT id, content, embedding <=> embed_text('query', 'text-embedding-ada-002') AS distance
FROM documents
ORDER BY distance
LIMIT 50
) vector_results
UNION ALL
-- Full-text search (30%)
SELECT id, content, text_score * 0.3 AS hybrid_score
FROM (
SELECT id, content,
ts_rank(to_tsvector('english', content),
to_tsquery('english', 'query')) AS text_score
FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'query')
ORDER BY text_score DESC
LIMIT 50
) text_results
) combined
-- Sum partial scores so documents matched by both searches rank higher
GROUP BY id, content
ORDER BY hybrid_score DESC
LIMIT 20
)
)
-- 2. Reranking
SELECT
id,
content,
neurondb_rerank('cross-encoder/model', content, 'query') AS final_score
FROM hybrid_results
ORDER BY final_score DESC
LIMIT 10;
RAG with Caching
Cached RAG pipeline
-- RAG with intelligent caching
CREATE OR REPLACE FUNCTION cached_rag_query(user_query TEXT)
RETURNS TABLE(answer TEXT, sources JSONB) AS $$
DECLARE
cached_answer TEXT;
BEGIN
-- Check semantic cache first
SELECT answer INTO cached_answer
FROM rag_cache
WHERE query_embedding <=> embed_text(user_query, 'text-embedding-ada-002') < 0.05 -- distance below 0.05, i.e. similarity above 0.95
ORDER BY created_at DESC
LIMIT 1;
IF cached_answer IS NOT NULL THEN
RETURN QUERY SELECT cached_answer, '[]'::jsonb;
RETURN;
END IF;
-- Generate new answer and cache it
RETURN QUERY
WITH generated AS (
SELECT * FROM rag_query(user_query)
)
INSERT INTO rag_cache (query_embedding, answer, sources)
SELECT embed_text(user_query, 'text-embedding-ada-002'), answer, sources
FROM generated
RETURNING answer, sources;
END;
$$ LANGUAGE plpgsql;
Related Documentation
- Hybrid Search - Detailed hybrid search guide
- Reranking - Complete reranking documentation
- RAG Pipelines - RAG implementation guide
- Vector Engine - Vector search capabilities
- Embedding Engine - Embedding generation
- Performance Tuning - Optimize advanced features