NeuronDB Documentation

Advanced Features

Overview

Advanced Features provide production-ready capabilities for building sophisticated AI applications. These features combine multiple techniques to deliver superior search relevance, intelligent caching, and end-to-end RAG (Retrieval Augmented Generation) pipelines.

Key Capabilities

  • Hybrid Search: Combine vector similarity with full-text search for best results
  • Reranking: Improve relevance with cross-encoder and LLM-based reranking
  • RAG Pipelines: End-to-end retrieval augmented generation workflows
  • Intelligent Caching: Semantic caching for embeddings, queries, and results

Production Benefits

  • Higher Relevance: Hybrid search and reranking improve result quality
  • Lower Latency: Intelligent caching reduces computation and API calls
  • Better UX: RAG pipelines provide accurate, context-aware responses
  • Cost Efficiency: Caching reduces API costs and improves throughput

Reranking

Reranking improves search relevance by re-scoring initial results using more sophisticated models. This two-stage approach (retrieve then rerank) provides better accuracy than single-stage retrieval.

Reranking Approaches

Cross-Encoder Reranking

Cross-encoders process query-document pairs together, providing more accurate relevance scores than bi-encoders (used in initial retrieval).

Cross-encoder reranking

-- Retrieve initial candidates
WITH candidates AS (
  SELECT id, content
  FROM documents
  ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
  LIMIT 50
)
-- Rerank with cross-encoder
SELECT 
  id,
  content,
  neurondb_rerank(
    'cross-encoder/ms-marco-MiniLM-L-6-v2',
    content,
    'PostgreSQL replication'
  ) AS relevance_score
FROM candidates
ORDER BY relevance_score DESC
LIMIT 10;

LLM-Based Reranking

Use large language models to rerank results based on semantic understanding and reasoning.

LLM reranking

-- Rerank with LLM
SELECT 
  id,
  content,
  neurondb_llm_rerank(
    'gpt-4',
    content,
    'PostgreSQL replication',
    criteria => 'relevance, accuracy, completeness'
  ) AS llm_score
FROM (
  SELECT id, content
  FROM documents
  ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
  LIMIT 20
) candidates
ORDER BY llm_score DESC
LIMIT 5;

Ensemble Reranking

Combine multiple reranking models for improved accuracy and robustness.

Ensemble reranking

-- Retrieve initial candidates
WITH candidates AS (
  SELECT id, content
  FROM documents
  ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
  LIMIT 50
)
SELECT 
  id,
  content,
  -- Weighted combination of multiple rerankers
  neurondb_rerank('cross-encoder/model1', content, 'PostgreSQL replication') * 0.5 +
  neurondb_rerank('cross-encoder/model2', content, 'PostgreSQL replication') * 0.3 +
  neurondb_llm_rerank('gpt-4', content, 'PostgreSQL replication') * 0.2 AS ensemble_score
FROM candidates
ORDER BY ensemble_score DESC
LIMIT 10;

Reranking Best Practices

  • Retrieve 20-100 candidates, then rerank them to select the final top 10-20
  • Use cross-encoders for speed, LLMs for accuracy
  • Cache reranking results for common queries, as sketched below
  • Monitor reranking latency and adjust candidate count
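
A minimal sketch of result caching, assuming a hypothetical rerank_cache table (not a built-in NeuronDB object); the schema and TTL policy are illustrative only:

-- Hypothetical cache table for cross-encoder scores
CREATE TABLE IF NOT EXISTS rerank_cache (
  query      TEXT,
  doc_id     INT,
  score      FLOAT8,
  created_at TIMESTAMPTZ DEFAULT now(),
  PRIMARY KEY (query, doc_id)
);

-- Score the top candidates once and upsert into the cache;
-- repeat queries can then read scores from rerank_cache instead of rescoring
INSERT INTO rerank_cache (query, doc_id, score)
SELECT
  'PostgreSQL replication',
  id,
  neurondb_rerank('cross-encoder/ms-marco-MiniLM-L-6-v2', content, 'PostgreSQL replication')
FROM (
  SELECT id, content
  FROM documents
  ORDER BY embedding <=> embed_text('PostgreSQL replication', 'text-embedding-ada-002')
  LIMIT 50
) candidates
ON CONFLICT (query, doc_id)
  DO UPDATE SET score = EXCLUDED.score, created_at = now();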

RAG Pipelines

Retrieval Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating answers. This grounds LLM outputs in your actual data, reducing hallucinations and improving accuracy.

RAG Workflow

  1. User Question: "What is PostgreSQL replication?"
  2. Retrieve: Find relevant documents using hybrid search
  3. Rerank: Score and sort results by relevance
  4. Generate: LLM creates answer using retrieved context
  5. Response: Return answer with source citations

Complete RAG Pipeline

End-to-end RAG pipeline

-- 1. Document ingestion with embeddings
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),
  metadata JSONB
);

INSERT INTO documents (content, embedding, metadata)
SELECT 
  content,
  embed_text(content, 'text-embedding-ada-002'),
  jsonb_build_object('source', 'docs', 'timestamp', now())
FROM source_documents;

-- 2. RAG query function
CREATE OR REPLACE FUNCTION rag_query(user_query TEXT)
RETURNS TABLE(answer TEXT, sources JSONB) AS $$
DECLARE
  query_embedding vector(1536);
  retrieved_docs TEXT;
BEGIN
  -- Generate query embedding
  query_embedding := embed_text(user_query, 'text-embedding-ada-002');
  
  -- Retrieve relevant documents (hybrid search)
  SELECT string_agg(content, E'\n\n')
  INTO retrieved_docs
  FROM (
    SELECT content
    FROM documents
    ORDER BY embedding <=> query_embedding
    LIMIT 5
  ) top_docs;
  
  -- Generate answer with context
  RETURN QUERY
  SELECT 
    neurondb_llm_generate(
      'gpt-4',
      format(E'Context: %s\n\nQuestion: %s\n\nAnswer:', retrieved_docs, user_query)
    ) AS answer,
    (
      SELECT jsonb_agg(jsonb_build_object('id', id, 'content', content))
      FROM (
        SELECT id, content
        FROM documents
        ORDER BY embedding <=> query_embedding
        LIMIT 5
      ) sources
    ) AS sources;
END;
$$ LANGUAGE plpgsql;

-- 3. Use RAG pipeline
SELECT * FROM rag_query('What is PostgreSQL replication?');

RAG Best Practices

  • Retrieve 3-10 relevant documents for context
  • Use hybrid search for better retrieval quality
  • Rerank results before passing them to the LLM, as shown in the sketch after this list
  • Include source citations in responses
  • Monitor answer quality and adjust retrieval parameters
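
As a sketch of the rerank-before-generation step, the query below reuses the functions from the examples above; the prompt format and model names are illustrative assumptions:

-- Retrieve a wide candidate set, rerank it, then pass only the top results to the LLM
WITH candidates AS (
  SELECT id, content
  FROM documents
  ORDER BY embedding <=> embed_text('What is PostgreSQL replication?', 'text-embedding-ada-002')
  LIMIT 20
),
reranked AS (
  SELECT content,
         neurondb_rerank('cross-encoder/ms-marco-MiniLM-L-6-v2', content,
                         'What is PostgreSQL replication?') AS score
  FROM candidates
  ORDER BY score DESC
  LIMIT 5
)
SELECT neurondb_llm_generate(
  'gpt-4',
  format(E'Context: %s\n\nQuestion: %s\n\nAnswer:',
         string_agg(content, E'\n\n' ORDER BY score DESC),
         'What is PostgreSQL replication?')
) AS answer
FROM reranked;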

Intelligent Caching

Intelligent caching reduces computation, API calls, and latency by storing frequently accessed data, embeddings, and query results. NeuronDB provides multiple caching layers for optimal performance.

Embedding Cache

Cache generated embeddings to avoid redundant API calls and computation.

Embedding cache

-- Configure embedding cache
SET neurondb.embedding_cache_size = 10000;  -- Cache 10K embeddings
SET neurondb.embedding_cache_ttl = 86400;   -- 24 hour TTL

-- Cached embeddings are automatically used
SELECT embed_text('sample text', 'text-embedding-ada-002');  -- First call: API
SELECT embed_text('sample text', 'text-embedding-ada-002');  -- Second call: Cache

-- Check cache statistics
SELECT * FROM neurondb_embedding_cache_stats();

Query Result Cache

Cache query results for identical or similar queries using semantic similarity.

Query result cache

-- Enable query result cache
SET neurondb.query_cache_enabled = on;
SET neurondb.query_cache_size = 1000;
SET neurondb.query_cache_ttl = 3600;  -- 1 hour

-- Cache is automatically used for identical queries
SELECT * FROM documents
WHERE embedding <=> embed_text('PostgreSQL', 'text-embedding-ada-002') < 0.3
ORDER BY embedding <=> embed_text('PostgreSQL', 'text-embedding-ada-002')
LIMIT 10;

Semantic Cache

Cache results for semantically similar queries, not just identical ones.

Semantic cache

-- Enable semantic cache
SET neurondb.semantic_cache_enabled = on;
SET neurondb.semantic_cache_threshold = 0.95;  -- 95% similarity threshold

-- Similar queries use cached results
SELECT * FROM documents
WHERE embedding <=> embed_text('PostgreSQL database', 'text-embedding-ada-002') < 0.3
ORDER BY embedding <=> embed_text('PostgreSQL database', 'text-embedding-ada-002')
LIMIT 10;
-- If results for 'PostgreSQL' are already cached and query similarity exceeds 0.95, the cache is used

Model Output Cache

Cache model inference results to avoid redundant computation.

Model output cache

-- Cache ONNX model outputs
SET neurondb.model_cache_enabled = on;
SET neurondb.model_cache_size = 5000;

-- Cached predictions are reused
SELECT predict_classification('model', features) FROM data;
SELECT predict_classification('model', features) FROM data;  -- Uses cache

Caching Best Practices

  • Enable caching for frequently accessed embeddings and queries
  • Set appropriate TTLs based on data freshness requirements
  • Monitor cache hit rates and adjust sizes accordingly (see the sketch below)
  • Use semantic caching for similar but not identical queries
  • Clear cache when data or models are updated
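
As a monitoring sketch, the query below derives a hit rate from neurondb_embedding_cache_stats(); the hits and misses column names are assumptions, so check the function's actual output on your installation:

-- Assumes the stats function exposes hits and misses counters
SELECT hits,
       misses,
       round(hits::numeric / NULLIF(hits + misses, 0), 3) AS hit_rate
FROM neurondb_embedding_cache_stats();

A persistently low hit rate usually means the cache is too small or the TTL too short for the workload.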

Production Patterns

Complete Search Pipeline

Production search pipeline

-- 1. Hybrid retrieval
WITH vector_results AS (
  -- Vector search (70% weight)
  SELECT id, content,
         (1.0 - (embedding <=> embed_text('query', 'text-embedding-ada-002'))) * 0.7 AS score
  FROM documents
  ORDER BY embedding <=> embed_text('query', 'text-embedding-ada-002')
  LIMIT 50
),
text_results AS (
  -- Full-text search (30% weight)
  SELECT id, content,
         ts_rank(to_tsvector('english', content), to_tsquery('english', 'query')) * 0.3 AS score
  FROM documents
  WHERE to_tsvector('english', content) @@ to_tsquery('english', 'query')
  LIMIT 50
),
hybrid_results AS (
  -- Sum the scores of documents found by both methods
  -- (content is identical per id, so min() simply picks it)
  SELECT id, min(content) AS content, sum(score) AS hybrid_score
  FROM (SELECT * FROM vector_results UNION ALL SELECT * FROM text_results) combined
  GROUP BY id
  ORDER BY hybrid_score DESC
  LIMIT 20
)
-- 2. Reranking
SELECT 
  id,
  content,
  neurondb_rerank('cross-encoder/model', content, 'query') AS final_score
FROM hybrid_results
ORDER BY final_score DESC
LIMIT 10;

RAG with Caching

Cached RAG pipeline

-- Cache table the function below expects (schema shown here for completeness)
CREATE TABLE IF NOT EXISTS rag_cache (
  query_embedding vector(1536),
  answer          TEXT,
  sources         JSONB,
  created_at      TIMESTAMPTZ DEFAULT now()
);

-- RAG with intelligent caching
CREATE OR REPLACE FUNCTION cached_rag_query(user_query TEXT)
RETURNS TABLE(answer TEXT, sources JSONB) AS $$
DECLARE
  cached_answer TEXT;
BEGIN
  -- Check semantic cache first; <=> is cosine distance, so a distance
  -- below 0.05 corresponds to roughly 95% similarity
  SELECT rag_cache.answer INTO cached_answer
  FROM rag_cache
  WHERE rag_cache.query_embedding <=> embed_text(user_query, 'text-embedding-ada-002') < 0.05
  ORDER BY created_at DESC
  LIMIT 1;

  IF cached_answer IS NOT NULL THEN
    RETURN QUERY SELECT cached_answer, '[]'::jsonb;
    RETURN;
  END IF;

  -- Generate a new answer and cache it
  -- (column references are table-qualified to avoid clashes with the OUT parameters)
  RETURN QUERY
  WITH generated AS (
    SELECT * FROM rag_query(user_query)
  )
  INSERT INTO rag_cache (query_embedding, answer, sources)
  SELECT embed_text(user_query, 'text-embedding-ada-002'), g.answer, g.sources
  FROM generated g
  RETURNING rag_cache.answer, rag_cache.sources;
END;
$$ LANGUAGE plpgsql;

Related Documentation