NeuronDB Documentation (main branch, 3.0.0-devel)

Document Processing

Text processing and NLP capabilities for documents: cleaning and normalizing text, splitting documents into chunks, and tokenizing text.

Text Processing

Process and clean text:

-- Clean and normalize text
SELECT process_text(
    'Raw text with   multiple   spaces',
    '{"lowercase": true, "remove_extra_spaces": true}'::jsonb
) AS processed_text;
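
The exact behavior of the lowercase and remove_extra_spaces options is not spelled out on this page. As a rough illustration, assuming they lowercase the input and collapse each run of whitespace into a single space, the same effect can be sketched with built-in PostgreSQL functions only. This is not NeuronDB's implementation, just a way to see what the options are presumed to do:

```sql
-- Sketch with PostgreSQL built-ins only (assumed semantics, not process_text itself):
--   lower()          stands in for "lowercase": true
--   regexp_replace() stands in for "remove_extra_spaces": true
SELECT regexp_replace(
    lower('Raw text with   multiple   spaces'),
    '\s+',   -- one or more whitespace characters
    ' ',     -- replaced by a single space
    'g'      -- apply to every match, not just the first
) AS processed_text;
-- processed_text: raw text with multiple spaces
```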

Chunking

Split documents into chunks:

-- Chunk text
SELECT chunk_text(
    'long document text...',
    500,  -- chunk size
    50    -- overlap
) AS chunks;
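
To make the relationship between chunk size and overlap concrete, here is a minimal sketch of a sliding window written with built-in PostgreSQL functions. It assumes sizes are counted in characters and that each chunk starts chunk size minus overlap characters after the previous one. Whether chunk_text counts characters or tokens, and how it treats the final chunk, is not stated here, so treat the window logic as an assumption. Small numbers (10 and 3) are used so the overlap is easy to see:

```sql
-- Sliding-window chunking sketch (character-based; assumed semantics, not chunk_text itself)
WITH doc AS (
    SELECT 'abcdefghijklmnopqrstuvwxyz'::text AS body
),
params AS (
    SELECT 10 AS chunk_size, 3 AS overlap
)
SELECT
    row_number() OVER (ORDER BY start_pos) AS chunk_no,
    substr(body, start_pos, chunk_size)    AS chunk
FROM doc,
     params,
     -- each window starts (chunk_size - overlap) characters after the previous one
     generate_series(1, length(body), chunk_size - overlap) AS start_pos
ORDER BY start_pos;
-- chunk_no | chunk
--        1 | abcdefghij
--        2 | hijklmnopq
--        3 | opqrstuvwx
--        4 | vwxyz
```

Each chunk repeats the last three characters of the previous one, which is what an overlap is for: text that straddles a chunk boundary still appears intact in at least one chunk. The overlap must be smaller than the chunk size, otherwise the step is zero or negative and the series does not advance.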

Tokenization

Split text into tokens:

-- Tokenize text
SELECT tokenize_text('Hello world', 'whitespace') AS tokens;
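
For comparison, whitespace tokenization can be approximated with a PostgreSQL built-in. This sketch assumes the 'whitespace' argument means "split on runs of whitespace". The return type of tokenize_text is not stated here, so the text array below is only the built-in's result, not a claim about NeuronDB's output:

```sql
-- Whitespace tokenization sketch using a PostgreSQL built-in (not tokenize_text itself)
SELECT regexp_split_to_array(
    trim('Hello world'),  -- trim first so leading/trailing spaces do not yield empty tokens
    '\s+'                 -- split on one or more whitespace characters
) AS tokens;
-- tokens: {Hello,world}
```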

Learn More

For detailed documentation on document processing, chunking strategies, tokenization, and NLP features, visit:

Document Processing Documentation