NeuronDB Documentation

Documentation branch: main (3.0.0-devel)

Document Processing

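Text processing and NLP capabilities for preparing documents: cleaning and normalizing text, splitting it into chunks, and tokenizing it.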

Text Processing

Clean and normalize text with process_text, passing the processing options as a jsonb object:


-- Clean and normalize text
SELECT process_text(
    'Raw text with   multiple   spaces',
    '{"lowercase": true, "remove_extra_spaces": true}'::jsonb
) AS processed_text;
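
The internals of process_text are not shown on this page. As a rough illustration of what the two options above plausibly do, the sketch below uses only built-in PostgreSQL functions. Mapping "lowercase" to lower() and "remove_extra_spaces" to collapsing whitespace runs is an assumption, not NeuronDB's documented behavior.

```sql
-- Plain-PostgreSQL approximation (assumed semantics, not NeuronDB code):
-- 1. lower() folds the text to lowercase
-- 2. regexp_replace(..., '\s+', ' ', 'g') collapses each whitespace run to one space
-- 3. btrim() strips leading and trailing spaces
SELECT btrim(
    regexp_replace(
        lower('Raw text with   multiple   spaces'),
        '\s+', ' ', 'g'
    )
) AS processed_text;
-- processed_text: raw text with multiple spaces
```

Comparing this result with the output of process_text on the same input is a quick way to check whether the assumed semantics match your installation.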

Chunking

Split a document into chunks with chunk_text, giving the chunk size and the overlap between consecutive chunks:


-- Chunk text
SELECT chunk_text(
    'long document text...',
    500,  -- chunk size
    50    -- overlap
) AS chunks;
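
To make the interaction between chunk size and overlap concrete, here is a minimal sketch of fixed-size chunking with overlap written in plain PostgreSQL. It assumes both values are counted in characters and that each chunk starts (chunk size - overlap) characters after the previous one. This page does not state the units or the strategy chunk_text uses, so treat this as an illustration of the general technique, not of NeuronDB's implementation. The doc and params CTEs are hypothetical names used only for this example.

```sql
-- Fixed-size character chunking with overlap (illustrative assumption).
WITH doc AS (
    SELECT repeat('abcdefghij', 120) AS body   -- 1200-character sample text
),
params AS (
    SELECT 500 AS chunk_size, 50 AS overlap
)
SELECT
    row_number() OVER (ORDER BY start_pos) AS chunk_no,
    start_pos,
    length(substr(body, start_pos, chunk_size)) AS chunk_length,
    substr(body, start_pos, chunk_size) AS chunk
FROM doc, params,
     -- each chunk starts (chunk_size - overlap) characters after the previous one
     generate_series(1, length(body), chunk_size - overlap) AS start_pos
ORDER BY start_pos;
-- Start positions: 1, 451, 901; chunk lengths: 500, 500, 300
```

With a step of 450 characters, the last 50 characters of each chunk repeat as the first 50 of the next, which keeps sentences that straddle a boundary visible in both chunks. Note that this simple version can emit a short final chunk that is already fully covered by the previous one.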

Tokenization

Split text into tokens with tokenize_text; the second argument ('whitespace' in this example) names the tokenization method:


-- Tokenize text
SELECT tokenize_text('Hello world', 'whitespace') AS tokens;
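
For comparison, whitespace tokenization can be approximated with a built-in PostgreSQL function. The sketch below assumes that the 'whitespace' method simply splits on runs of whitespace. The return type of tokenize_text is not shown on this page, so the text[] result here may differ in shape from what NeuronDB returns.

```sql
-- Plain-PostgreSQL approximation of whitespace tokenization (assumed semantics).
-- btrim() avoids empty tokens from leading or trailing spaces.
SELECT regexp_split_to_array(btrim('Hello world'), '\s+') AS tokens;
-- tokens: {Hello,world}
```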

Learn More

For detailed documentation on document processing, chunking strategies, tokenization, and NLP features, visit:

Document Processing Documentation

RAG Overview - RAG pipeline
Embedding Generation - Generate embeddings
