Document Processing
Text processing and NLP capabilities.
Text Processing
Process and clean text:
-- Clean and normalize text SELECT process_text( 'Raw text with multiple spaces', '{"lowercase": true, "remove_extra_spaces": true}'::jsonb ) AS processed_text;
Chunking
Split documents into chunks:
-- Chunk text SELECT chunk_text( 'long document text...', 500, -- chunk size 50 -- overlap ) AS chunks;
Tokenization
-- Tokenize text SELECT tokenize_text('Hello world', 'whitespace') AS tokens;
Learn More
For detailed documentation on document processing, chunking strategies, tokenization, and NLP features, visit:
Document Processing Documentation
Related Topics
- RAG Overview - RAG pipeline
- Embedding Generation - Generate embeddings