⚡ Quick Start Guide

Get started with NeuronDB in minutes!

[!TIP] Start with the Simple Start Guide for a beginner-friendly walkthrough with detailed explanations.

For complete NeuronDB ecosystem setup, see the root QUICKSTART.md.


Minimal setup (copy-paste)

Docker (from repository root):

# 1. Start NeuronDB
docker compose -f docker/docker-compose.yml up -d neurondb

# 2. Wait until healthy (about 30–60 seconds)
docker compose -f docker/docker-compose.yml ps

# 3. Verify
docker compose -f docker/docker-compose.yml exec neurondb psql -U neurondb -d neurondb -c "CREATE EXTENSION IF NOT EXISTS neurondb; SELECT neurondb.version();"

Optional: load quickstart data (requires DB running on port 5433):

./scripts/neurondb-quickstart-data.sh

🎯 Goal

What you'll accomplish:

  • Install NeuronDB extension
  • Load sample data
  • Run your first vector search query
  • Understand basic concepts

Time required: 5-10 minutes


Prerequisites

Before you begin, make sure you have:

  • NeuronDB installed - See Installation Guide for setup instructions
  • PostgreSQL client - psql (or any SQL client)
  • 5-10 minutes - For complete quickstart

Verify Prerequisites
# Check if psql is installed
psql --version

# Check if Docker is installed (if using Docker)
docker --version
docker compose version

Step 1: Install NeuronDB

If you haven't installed NeuronDB yet, choose your method:

Option A: Docker Compose

Fastest way to get started:

# From repository root (compose file is in docker/)
docker compose -f docker/docker-compose.yml up -d neurondb

# Wait for service to be healthy (30-60 seconds)
docker compose -f docker/docker-compose.yml ps neurondb

Expected output:

NAME                STATUS
neurondb-cpu        healthy

[!NOTE] Docker starts a PostgreSQL container with NeuronDB pre-installed. The first run takes 2 to 5 minutes to download images.

Option B: Native Installation

For production or custom setups:

Follow the detailed Installation Guide for native PostgreSQL installation.


Verify Installation

Test the NeuronDB installation:

# With Docker Compose (from repo root)
docker compose -f docker/docker-compose.yml exec neurondb psql -U neurondb -d neurondb -c "CREATE EXTENSION IF NOT EXISTS neurondb;"

# Or with native PostgreSQL
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS neurondb;"

Check the version:

# With Docker Compose
docker compose -f docker/docker-compose.yml exec neurondb psql -U neurondb -d neurondb -c "SELECT neurondb.version();"

# Or with native PostgreSQL
psql -d your_database -c "SELECT neurondb.version();"

Expected output:

 version
---------
 2.0
(1 row)

[!SUCCESS] If you see version 2.0 or similar, NeuronDB is installed and working correctly.


Step 2: Load Quickstart Data Pack

The quickstart data pack provides ~500 sample documents with pre-generated embeddings, ready for immediate use.

What's in the Data Pack?
  • ~500 documents - Sample text documents
  • Pre-generated embeddings - Vector representations (384 dimensions)
  • HNSW index - Pre-built index for fast search
  • Ready to query - No setup required

Option 1: Using the CLI Script

Easiest method - handles everything automatically:

# From repository root
./scripts/neurondb-cli.sh quickstart

What it does:

  1. Creates the quickstart_documents table
  2. Loads ~500 sample documents
  3. Creates HNSW index
  4. Verifies data is loaded

Option 2: Using the Loader Script

Manual control over the process:

# From repository root
./src/examples/quickstart/load_quickstart.sh

Option 3: Using psql Directly

For maximum control (from repository root):

# With native PostgreSQL
psql -d your_database -f src/examples/quickstart/quickstart_data.sql

# With Docker (connect from host; file on host)
psql "postgresql://neurondb:neurondb@localhost:5433/neurondb" -f src/examples/quickstart/quickstart_data.sql

Verify Data Loaded

Check that data was loaded successfully:

# Count documents
psql "postgresql://neurondb:neurondb@localhost:5433/neurondb" -c "SELECT COUNT(*) FROM quickstart_documents;"

Expected output:

 count
-------
   500
(1 row)

Check the table structure (from a psql session):

\d quickstart_documents

Expected columns:

  • id - Document ID
  • title - Document title
  • content - Document content
  • embedding - Vector embedding (384 dimensions)
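
If your SQL client does not support psql meta-commands like \d, a standard information_schema query shows the same structure (plain PostgreSQL, nothing NeuronDB-specific):

-- List the quickstart table's columns without psql's \d
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'quickstart_documents'
ORDER BY ordinal_position;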

[!SUCCESS] Perfect! Your data is loaded and ready to query.


Step 3: Try SQL Recipes

The SQL recipe library provides ready-to-run queries for common operations.

Example 1: Basic Similarity Search 🎯

Find documents similar to a specific document:

-- Find documents similar to document #1
SELECT 
    id,
    title,
    embedding <=> (SELECT embedding FROM quickstart_documents WHERE id = 1) AS distance
FROM quickstart_documents
WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM quickstart_documents WHERE id = 1)
LIMIT 10;

What this does:

  1. Gets embedding of document #1
  2. Calculates cosine distance to all other documents
  3. Returns top 10 most similar documents

Expected output:

 id  | title                    |     distance      
-----+--------------------------+-------------------
  42 | Related Document Title   | 0.123456789012345
  87 | Another Similar Doc      | 0.234567890123456
  ...
(10 rows)

[!NOTE] Understanding distance: Lower distance = more similar. Cosine distance ranges from 0 (identical) to 2 (opposite).
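
If you prefer to think in terms of similarity, you can convert cosine distance to cosine similarity inline (a small variation on the query above; it assumes <=> is cosine distance, as described in the note):

-- Cosine similarity = 1 - cosine distance (higher = more similar)
SELECT
    id,
    title,
    1 - (embedding <=> (SELECT embedding FROM quickstart_documents WHERE id = 1)) AS similarity
FROM quickstart_documents
WHERE id != 1
ORDER BY similarity DESC
LIMIT 10;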


Example 2: Query with Text Embedding 🔤

Search using a text query:

-- Generate embedding for query text
WITH query AS (
  SELECT embed_text('machine learning algorithms', 'all-MiniLM-L6-v2') AS q_vec
)
-- Find similar documents
SELECT 
    id,
    title,
    embedding <=> q.q_vec AS distance
FROM quickstart_documents, query q
ORDER BY embedding <=> q.q_vec
LIMIT 10;

What this does:

  1. Generates embedding for "machine learning algorithms"
  2. Searches for documents with similar embeddings
  3. Returns top 10 results

[!TIP] Embedding models: The all-MiniLM-L6-v2 model is fast and works well for general text. See Embedding generation for more options.
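
Different models produce vectors of different dimensions, and the query vector must match the dimension of the stored embeddings (384 here). A quick sanity check, assuming the pgvector-compatible vector_dims() helper is available:

-- Verify a model's output dimension before querying against stored vectors
SELECT vector_dims(embed_text('hello world', 'all-MiniLM-L6-v2'));  -- expect 384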


Example 3: Hybrid Search (Vector + Full-Text)

Combine vector similarity with PostgreSQL full-text search:

-- Hybrid search: vector + full-text
WITH query AS (
  SELECT 
    embed_text('machine learning', 'all-MiniLM-L6-v2') AS q_vec,
    to_tsquery('english', 'machine & learning') AS q_tsquery
)
SELECT 
    id,
    title,
    content,
    -- Combined score: 70% vector similarity, 30% full-text rank
    -- (1 - cosine distance converts to similarity, so higher = better for both terms)
    (1 - (embedding <=> q.q_vec)) * 0.7 +
    (ts_rank(to_tsvector('english', content), q.q_tsquery) * 0.3) AS combined_score
FROM quickstart_documents, query q
WHERE to_tsvector('english', content) @@ q.q_tsquery
ORDER BY combined_score DESC
LIMIT 10;

What this does:

  1. Generates vector embedding for query
  2. Creates full-text search query
  3. Converts distance to similarity, then combines both scores (70% vector, 30% text)
  4. Returns top 10 results

[!NOTE] Why hybrid search? Vector search finds semantically similar content, while full-text search finds exact keyword matches. Combining both gives better results.
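
Weighted sums like the one above are sensitive to the scales of the two scores. A common alternative is Reciprocal Rank Fusion (RRF), which combines rank positions instead of raw scores. This is a sketch built only from standard PostgreSQL features plus embed_text(); the constant 60 is the conventional RRF damping factor:

-- Hybrid search via Reciprocal Rank Fusion (rank-based, scale-free)
WITH query AS (
  SELECT
    embed_text('machine learning', 'all-MiniLM-L6-v2') AS q_vec,
    to_tsquery('english', 'machine & learning') AS q_tsquery
),
vec_ranked AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> q.q_vec) AS rnk
  FROM quickstart_documents, query q
  ORDER BY rnk
  LIMIT 50
),
txt_ranked AS (
  SELECT id, ROW_NUMBER() OVER (
           ORDER BY ts_rank(to_tsvector('english', content), q.q_tsquery) DESC
         ) AS rnk
  FROM quickstart_documents, query q
  WHERE to_tsvector('english', content) @@ q.q_tsquery
  ORDER BY rnk
  LIMIT 50
)
SELECT id,
       COALESCE(1.0 / (60 + v.rnk), 0) + COALESCE(1.0 / (60 + t.rnk), 0) AS rrf_score
FROM vec_ranked v
FULL OUTER JOIN txt_ranked t USING (id)
ORDER BY rrf_score DESC
LIMIT 10;

Because RRF only looks at rank positions, it needs no weight tuning and is robust when the two scores live on different scales.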


Example 4: Filtered Search

Add metadata filters to vector search:

-- Search with filters
WITH query AS (
  SELECT embed_text('technology', 'all-MiniLM-L6-v2') AS q_vec
)
SELECT 
    id,
    title,
    embedding <=> q.q_vec AS distance
FROM quickstart_documents, query q
WHERE id > 100  -- Example filter
  AND id < 200  -- Example filter
ORDER BY embedding <=> q.q_vec
LIMIT 10;

What this does:

  1. Generates query embedding
  2. Applies filters (here a simple id range, standing in for metadata such as date range or category)
  3. Searches only within filtered subset
  4. Returns top 10 results

[!TIP] Filtering tips: Apply filters BEFORE vector search for better performance. PostgreSQL will use indexes on filter columns.
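
You can confirm that the planner applies the filter before the vector comparison with a standard EXPLAIN (plan output will vary with your setup and indexes):

-- Inspect the query plan for the filtered search
EXPLAIN ANALYZE
SELECT id
FROM quickstart_documents
WHERE id > 100 AND id < 200
ORDER BY embedding <=> (SELECT embedding FROM quickstart_documents WHERE id = 1)
LIMIT 10;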


More SQL Recipes


Reranking

Use MMR (Maximal Marginal Relevance) for diverse results:

SELECT * FROM neurondb.mmr_rerank(
  'quickstart_documents', 'embedding', 
  (SELECT embed_text('query text', 'all-MiniLM-L6-v2')),
  10,  -- top k
  0.7   -- lambda (diversity vs relevance)
);

Batch Embedding

Generate embeddings for multiple texts at once:

SELECT embed_text_batch(
  ARRAY['text1', 'text2', 'text3'],
  'all-MiniLM-L6-v2'
);

RAG Context Retrieval

Retrieve context for RAG pipelines:

SELECT * FROM neurondb.retrieve_context(
  (SELECT embed_text('query', 'all-MiniLM-L6-v2')),
  'quickstart_documents', 'embedding',
  10,  -- top k
  NULL  -- optional filters
);

🎓 Understanding the Results

Key Concepts

What is an Embedding?

An embedding is a vector that represents the semantic meaning of text. Similar texts have similar embeddings.

Example:

  • "machine learning" → [0.1, 0.2, 0.3, ...] (384 numbers)
  • "artificial intelligence" → [0.12, 0.19, 0.31, ...] (similar numbers)
  • "banana" → [0.9, 0.1, 0.2, ...] (different numbers)
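
You can observe this directly by embedding two phrases and comparing them (a quick illustration; the exact values depend on the model):

-- Semantically related phrases should yield a smaller cosine distance
SELECT embed_text('machine learning', 'all-MiniLM-L6-v2')
       <=> embed_text('artificial intelligence', 'all-MiniLM-L6-v2') AS related,
       embed_text('machine learning', 'all-MiniLM-L6-v2')
       <=> embed_text('banana', 'all-MiniLM-L6-v2') AS unrelated;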

What is Distance?

Distance measures how similar two vectors are:

  • Lower distance = more similar
  • Higher distance = less similar

Distance metrics:

  • <=> - Cosine distance (0 = identical, 2 = opposite)
  • <-> - L2/Euclidean distance (0 = identical, unbounded above)
  • <#> - Inner product (higher = more similar)
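
A small comparison on toy vectors (this assumes pgvector-compatible vector literals and operator semantics; note that in pgvector, <#> returns the negated inner product, so ascending order still means "most similar first"):

-- The three operators on a pair of orthogonal unit vectors
SELECT
    '[1,0]'::vector <=> '[0,1]'::vector AS cosine_distance,    -- 1 (orthogonal)
    '[1,0]'::vector <-> '[0,1]'::vector AS l2_distance,        -- ~1.414
    '[1,0]'::vector <#> '[0,1]'::vector AS neg_inner_product;  -- 0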

What is HNSW Index?

HNSW (Hierarchical Navigable Small World) is an index structure that makes vector search fast.

  • Without index: O(n) - checks every vector
  • With HNSW: O(log n) - checks only a few vectors

Trade-off: Slightly less accurate but much faster.
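
If you need to build or tune the index yourself, the syntax looks like this (a sketch assuming pgvector-compatible HNSW support; the parameter names and defaults shown are pgvector's):

-- Build an HNSW index with explicit construction parameters
CREATE INDEX ON quickstart_documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- At query time, a larger ef_search improves recall at the cost of speed
SET hnsw.ef_search = 100;

Higher m and ef_construction improve recall but make the index larger and slower to build.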


Next Steps

Continue your journey:

  • Simple Start Guide - Beginner-friendly walkthrough
  • Architecture Guide - Understand components
  • Installation Guide - Detailed installation options
  • SQL Recipes - Ready-to-run SQL examples
  • Complete Documentation - Full documentation index


Tips for Success


Performance Tips

  • Use indexes - HNSW indexes make search 100x faster
  • Filter first - Apply WHERE clauses before vector search
  • Limit results - Use LIMIT to avoid processing too many rows
  • Batch operations - Use embed_text_batch for multiple embeddings

Development Tips

  • Start simple - Get basic search working first
  • Add complexity gradually - Try hybrid search after basic search works
  • Use examples - Copy working examples from recipes
  • Check logs - Use docker compose logs to debug issues

Learning Tips

  • Read the docs - Comprehensive documentation available
  • Try examples - Hands-on learning is best
  • Experiment - Try different queries and see what happens
  • Ask questions - Check troubleshooting or community

❓ Frequently Asked Questions

Q: Why is my search slow?

A: Make sure you have an HNSW index:

CREATE INDEX ON quickstart_documents USING hnsw (embedding vector_cosine_ops);

Q: How do I change the embedding model?

A: Use a different model name in embed_text():

SELECT embed_text('text', 'sentence-transformers/all-mpnet-base-v2');

Q: How do I use my own data?

A: Create your own table and load your data:

CREATE TABLE my_docs (id SERIAL, content TEXT, embedding vector(384));

Q: How do I generate embeddings for my data?

A: Use embed_text() or embed_text_batch():

UPDATE my_docs SET embedding = embed_text(content, 'all-MiniLM-L6-v2');
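
For large tables, it can be worth embedding only the rows that still need it, so the statement stays cheap to re-run (same functions as above; the WHERE clause is the only addition):

-- Only touch rows that don't have an embedding yet
UPDATE my_docs
SET embedding = embed_text(content, 'all-MiniLM-L6-v2')
WHERE embedding IS NULL;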
