Data Types Reference | NeuronDB Vector Types and Structures

Overview

Complete reference for the data types, internal structures, and type system of NeuronDB.

PostgreSQL Compatibility: 16, 17, 18

Vector Types

vector

PostgreSQL Type: vector
C Structure: Vector
Storage: Extended (varlena)
Base Type: Float32 (4 bytes per dimension)

The primary vector type in NeuronDB, used for storing embeddings and performing vector operations. Each dimension is stored as a float32 value.

Limits:

Maximum Dimensions: 16,000
Minimum Dimensions: 1
Storage Overhead: 8 bytes (4-byte varlena header + 2-byte dimension count + 2 bytes of padding)

Example usage

-- Create a vector
SELECT '[1.0, 2.0, 3.0]'::vector;

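-- Create with dimension constraint
-- (3 dimensions keeps the example short; real embedding columns are usually larger, e.g. vector(384))
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    embedding vector(3)  -- Fixed 3 dimensions
);

-- Insert a vector; its dimension count must match the column's
INSERT INTO embeddings (embedding) VALUES ('[0.1, 0.2, 0.3]'::vector);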

halfvec

PostgreSQL Type: halfvec
C Structure: VectorF16
Base Type: Float16 (2 bytes per dimension)

Half-precision vector type providing 2x compression. Uses IEEE 754 half-precision floating point format.

Limits:

Maximum Dimensions: 4,000
Compression Ratio: 2x (compared to vector)
Precision: ~3 decimal digits

Example usage

-- Convert vector to halfvec
SELECT vector_to_halfvec('[1.0, 2.0, 3.0]'::vector);

-- Cast between types
SELECT '[1.0, 2.0, 3.0]'::vector::halfvec;

-- Create table with halfvec
CREATE TABLE embeddings_fp16 (
    id SERIAL PRIMARY KEY,
    embedding halfvec(384)
);
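
To make the precision trade-off of halfvec concrete, the following is a minimal C sketch of converting between float32 and IEEE 754 half precision. It is not NeuronDB's conversion code: the function names float_to_half and half_to_float are hypothetical, and the encoder flushes values below the smallest normal half (2^-14) to zero for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Encode a float32 as IEEE 754 binary16, rounding to nearest even.
 * Simplification for this sketch: magnitudes below 2^-14 are flushed
 * to signed zero instead of being encoded as subnormals. */
uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t abs_bits = x & 0x7FFFFFFFu;

    if (abs_bits > 0x7F800000u)                 /* NaN */
        return (uint16_t)(sign | 0x7E00u);
    if (abs_bits >= 0x47800000u)                /* |f| >= 65536, or infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (abs_bits < 0x38800000u)                 /* |f| < 2^-14 */
        return sign;

    uint32_t rebased = abs_bits - 0x38000000u;  /* rebias exponent from 127 to 15 */
    uint32_t half = rebased >> 13;              /* 5 exponent bits + 10 mantissa bits */
    uint32_t rem = rebased & 0x1FFFu;           /* the 13 discarded mantissa bits */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                                 /* a carry into the exponent is intended */
    return (uint16_t)(sign | half);
}

/* Decode an IEEE 754 binary16 value to float32 (exact for every input). */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0 && mant == 0) {
        bits = sign;                            /* signed zero */
    } else if (exp == 0) {
        int shift = -1;                         /* subnormal: normalize the mantissa */
        do { shift++; mant <<= 1; } while ((mant & 0x400u) == 0);
        bits = sign | ((uint32_t)(112 - shift) << 23) | ((mant & 0x3FFu) << 13);
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13);   /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Round-tripping 0.1 through these two functions gives a value close to, but not exactly, 0.1, which is the roughly three-decimal-digit precision listed above.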

sparsevec

PostgreSQL Type: sparsevec
C Structure: SparseVector
Base Type: Sparse representation

Sparse vector type storing only non-zero values. Optimized for high-dimensional vectors with many zeros.

Limits:

Maximum Non-Zero Entries: 1,000
Maximum Dimensions: 1,000,000
Model Types: BM25 (0), SPLADE (1), ColBERTv2 (2)
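
The idea of storing only non-zero values can be sketched in C as a list of (index, value) pairs. This is an illustration under assumptions, not the actual SparseVector layout: the SparseEntry type and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sparse entry: a dimension index and its non-zero value. */
typedef struct {
    uint32_t index;
    float    value;
} SparseEntry;

/* Copy the non-zero entries of a dense vector into out, in index order.
 * Returns the number of entries written, or -1 if more than cap are needed. */
int dense_to_sparse(const float *dense, size_t dim, SparseEntry *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < dim; i++) {
        if (dense[i] == 0.0f)
            continue;
        if (n == cap)
            return -1;
        out[n].index = (uint32_t)i;
        out[n].value = dense[i];
        n++;
    }
    return (int)n;
}

/* Dot product of two sparse vectors whose entries are sorted by index.
 * Only indices present in both vectors contribute to the sum. */
float sparse_dot(const SparseEntry *a, size_t na, const SparseEntry *b, size_t nb)
{
    float sum = 0.0f;
    size_t i = 0, j = 0;
    while (i < na && j < nb) {
        if (a[i].index == b[j].index) {
            sum += a[i].value * b[j].value;
            i++;
            j++;
        } else if (a[i].index < b[j].index) {
            i++;
        } else {
            j++;
        }
    }
    return sum;
}
```

The merge-style loop in sparse_dot touches each stored entry at most once, so its cost depends on the number of non-zero entries rather than on the nominal dimension count.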

binaryvec

PostgreSQL Type: binaryvec
Base Type: Binary (1 bit per dimension)

Binary vector type that stores one bit per dimension, giving 32x compression compared to vector. Distances are computed with Hamming distance.

Features:

Compression Ratio: 32x (compared to vector)
Distance Metric: Hamming distance only
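
A minimal C sketch of how one bit per dimension and Hamming distance fit together is shown below. The thresholding rule (a bit is set when the component is greater than zero), the bit order, and the function names binarize and hamming_distance are assumptions for illustration; the source does not specify how NeuronDB derives the bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack one bit per dimension: bit i is set when v[i] > 0 (assumed rule).
 * out must hold (dim + 7) / 8 bytes; unused trailing bits stay zero. */
void binarize(const float *v, size_t dim, uint8_t *out)
{
    memset(out, 0, (dim + 7) / 8);
    for (size_t i = 0; i < dim; i++) {
        if (v[i] > 0.0f)
            out[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

/* Hamming distance: the number of bit positions where a and b differ. */
unsigned hamming_distance(const uint8_t *a, const uint8_t *b, size_t dim)
{
    unsigned dist = 0;
    size_t nbytes = (dim + 7) / 8;
    for (size_t i = 0; i < nbytes; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);
        while (diff) {                      /* clear the lowest set bit each pass */
            diff &= (uint8_t)(diff - 1);
            dist++;
        }
    }
    return dist;
}
```

Under these assumptions, [1.0, -1.0, 0.5] and [1.0, 1.0, 1.0] differ only in the second dimension, so their Hamming distance is 1.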

Internal C Structures

Vector Structure

typedef struct Vector {
    int32  vl_len_;     /* varlena header (required) */
    int16  dim;         /* number of dimensions */
    int16  unused;      /* padding for alignment */
    float4 data[FLEXIBLE_ARRAY_MEMBER];  /* vector data */
} Vector;

Memory Layout

Offset (bytes)	Size (bytes)	Field
0	4	vl_len_ (varlena header)
4	2	dim (dimension count)
6	2	unused (padding)
8	4*dim	data[] (float32 array)

Total Size: offsetof(Vector, data) + sizeof(float4) * dim
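
The layout and size formula above can be checked outside PostgreSQL with a stand-in struct. In this sketch, DemoVector, demo_vector_size, and demo_vector_new are hypothetical names; the struct mirrors Vector using fixed-width C types (int32_t for int32, float for float4), and plain calloc with a directly stored length stands in for PostgreSQL's palloc0 and SET_VARSIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in that mirrors the Vector struct with standard C types. */
typedef struct DemoVector {
    int32_t vl_len_;   /* varlena header */
    int16_t dim;       /* number of dimensions */
    int16_t unused;    /* padding for alignment */
    float   data[];    /* vector data (flexible array member) */
} DemoVector;

/* Total size in bytes: header fields plus 4 bytes per dimension. */
size_t demo_vector_size(int dim)
{
    return offsetof(DemoVector, data) + sizeof(float) * (size_t)dim;
}

/* Allocate a zero-filled vector. The length is stored directly in vl_len_,
 * a simplification of what SET_VARSIZE does inside PostgreSQL. */
DemoVector *demo_vector_new(int dim)
{
    size_t size = demo_vector_size(dim);
    DemoVector *v = calloc(1, size);
    if (v == NULL)
        return NULL;
    v->vl_len_ = (int32_t)size;
    v->dim = (int16_t)dim;
    return v;
}
```

With this stand-in, a 384-dimension vector occupies 8 + 4 * 384 = 1,544 bytes.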

Type Storage Formats

Storage Size Calculation

Type	Bytes per Dimension	Overhead
`vector`	4	8 bytes
`halfvec`	2	8 bytes
`sparsevec`	Variable	16 bytes
`binaryvec`	0.125 (1 bit)	8 bytes
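
The table translates directly into arithmetic. The helper names below are hypothetical, and the sketch assumes binaryvec rounds its bit count up to whole bytes; sparsevec is left out because the source does not state its per-entry size.

```c
#include <stddef.h>

/* Bytes needed for a value of each dense type, per the table above:
 * a fixed 8-byte overhead plus the per-dimension payload. */
size_t vector_bytes(size_t dim)    { return 8 + 4 * dim; }
size_t halfvec_bytes(size_t dim)   { return 8 + 2 * dim; }

/* Assumption: one bit per dimension, rounded up to a whole byte. */
size_t binaryvec_bytes(size_t dim) { return 8 + (dim + 7) / 8; }
```

For 384 dimensions these give 1,544, 776, and 56 bytes respectively. Because the overhead is fixed, the effective ratios are slightly below the nominal 2x and 32x.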

Type Casting Rules

Implicit Casts

vector → halfvec (implicit)

Explicit Casts

vector → sparsevec (explicit only)

Type casting examples

-- Vector to halfvec
SELECT '[1.0, 2.0, 3.0]'::vector::halfvec;

-- Vector to sparsevec
SELECT vector_to_sparsevec('[0, 0, 1.5, 0]'::vector);

-- Vector to binary
SELECT vector_to_binary('[1.0, -1.0, 0.5]'::vector);

Memory Layout

In-Memory Representation

Vectors stored as contiguous float32 arrays
Aligned to 8-byte boundaries for SIMD operations
GPU transfers use same layout (zero-copy when possible)

TOAST Behavior

PostgreSQL automatically uses TOAST for large values:

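Inline storage: Vectors < 2KB (fewer than about 512 dimensions; 512 float32 values alone occupy 2,048 bytes, before the 8-byte overhead)
Extended storage: Vectors ≥ 2KB (about 512 dimensions and up)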
Compression: Enabled by default for extended storage
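
The dimension count at which a vector crosses a size threshold follows from the 8-byte overhead and 4 bytes per dimension. The helper below is a hypothetical sketch of that arithmetic only. PostgreSQL's real TOAST decision is made per row against a threshold of normally about 2 kB, so the exact cutoff also depends on the other columns.

```c
#include <stddef.h>

/* Largest dimension count whose vector value (8-byte overhead plus
 * 4 bytes per dimension) stays strictly below threshold_bytes.
 * Returns 0 when not even a 1-dimension vector fits. */
size_t max_inline_dim(size_t threshold_bytes)
{
    const size_t overhead = 8;
    const size_t per_dim = 4;
    if (threshold_bytes <= overhead + per_dim)
        return 0;
    return (threshold_bytes - overhead - 1) / per_dim;
}
```

For a 2,048-byte threshold this returns 509, since a 510-dimension vector is exactly 2,048 bytes once the overhead is counted.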

Quantization Formats

Quantization Types

Scalar Quantization: int8, uint8 (4x compression)
Product Quantization (PQ): 8x-16x compression
Binary Quantization: 32x compression (Hamming distance)
Ternary Quantization: 16x compression
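
As an illustration of the 4x figure for scalar quantization, here is a minimal C sketch of symmetric int8 quantization, where each float32 component is replaced by one signed byte plus a shared scale. The function name quantize_int8 and the max-absolute-value scaling rule are assumptions; the source does not describe NeuronDB's actual scheme.

```c
#include <stddef.h>
#include <stdint.h>

/* Quantize v into q using a single scale so that v[i] is approximately
 * q[i] * scale. Returns the scale (1.0 for an all-zero vector). */
float quantize_int8(const float *v, size_t dim, int8_t *q)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float a = v[i] < 0.0f ? -v[i] : v[i];
        if (a > max_abs)
            max_abs = a;
    }
    if (max_abs == 0.0f) {
        for (size_t i = 0; i < dim; i++)
            q[i] = 0;
        return 1.0f;
    }

    float scale = max_abs / 127.0f;     /* largest magnitude maps to 127 */
    for (size_t i = 0; i < dim; i++) {
        float x = v[i] / scale;
        /* round half away from zero, then truncate toward zero */
        q[i] = (int8_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
    }
    return scale;
}
```

Multiplying each q[i] by the returned scale recovers an approximation of the original value; the error per component is at most half the scale.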

When to Use Quantization

Large datasets where storage is a concern
Read-heavy workloads
Acceptable precision loss for speed/storage trade-offs