Documentation branch: main (3.0.0-devel)

Data Types Reference

Overview

Complete reference for the data types, internal structures, and type system in NeuronDB.

PostgreSQL Compatibility: 16, 17, 18

Vector Types

vector

PostgreSQL Type: vector
C Structure: Vector
Storage: Extended (varlena)
Base Type: Float32 (4 bytes per dimension)

The primary vector type in NeuronDB, used for storing embeddings and performing vector operations. Each element is stored as a 32-bit float.

Limits:

  • Maximum Dimensions: 16,000
  • Minimum Dimensions: 1
  • Storage Overhead: 8 bytes (header + dimension)

Example usage

-- Create a vector
SELECT '[1.0, 2.0, 3.0]'::vector;

-- Create with dimension constraint
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    embedding vector(3)  -- Fixed 3 dimensions (use e.g. 384 for a 384-dimensional embedding model)
);

-- Insert vector (its dimension must match the column's declared dimension)
INSERT INTO embeddings (embedding) VALUES ('[0.1, 0.2, 0.3]'::vector);

halfvec

PostgreSQL Type: halfvec
C Structure: VectorF16
Base Type: Float16 (2 bytes per dimension)

Half-precision vector type providing 2x compression. Uses IEEE 754 half-precision floating point format.

Limits:

  • Maximum Dimensions: 4,000
  • Compression Ratio: 2x (compared to vector)
  • Precision: ~3 decimal digits

Example usage

-- Convert vector to halfvec
SELECT vector_to_halfvec('[1.0, 2.0, 3.0]'::vector);

-- Cast between types
SELECT '[1.0, 2.0, 3.0]'::vector::halfvec;

-- Create table with halfvec
CREATE TABLE embeddings_fp16 (
    id SERIAL PRIMARY KEY,
    embedding halfvec(384)
);
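
To make the "~3 decimal digits" figure concrete, the following is a minimal sketch of a float32 to float16 round trip. The helper names float_to_half and half_to_float are illustrative and are not NeuronDB functions. The conversion is simplified: it truncates the mantissa instead of rounding, flushes values too small for a normal half to zero, and maps overflow and NaN to infinity. NeuronDB's actual conversion may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative helper (not a NeuronDB function): float32 -> IEEE 754 binary16.
 * Simplified: truncates the mantissa, flushes tiny values to zero,
 * and maps overflow and NaN to infinity. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof bits);

    uint32_t sign = (bits >> 16) & 0x8000u;
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15; /* rebias exponent */
    uint32_t mant = bits & 0x7FFFFFu;

    if (exp <= 0)
        return (uint16_t)sign;                 /* too small: signed zero */
    if (exp >= 31)
        return (uint16_t)(sign | 0x7C00u);     /* too large: infinity */
    /* keep the top 10 of the 23 mantissa bits */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}

/* Illustrative helper: IEEE 754 binary16 -> float32 (subnormals read as zero). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float    f;

    if (exp == 0)
        bits = sign;                                   /* zero (subnormals ignored) */
    else if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);      /* infinity or NaN */
    else
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With this sketch, 3.14159f round-trips to 3.140625f: only about three significant decimal digits survive, which is the precision loss to expect when storing embeddings as halfvec.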

sparsevec

PostgreSQL Type: sparsevec
C Structure: SparseVector
Base Type: Sparse representation

Sparse vector type storing only non-zero values. Optimized for high-dimensional vectors with many zeros.

Limits:

  • Maximum Non-Zero Entries: 1,000
  • Maximum Dimensions: 1,000,000
  • Model Types: BM25 (0), SPLADE (1), ColBERTv2 (2)
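
The benefit of storing only non-zero values can be sketched with a dot product that walks two sorted index lists. The SparseSketch struct and sparse_dot function below are hypothetical and chosen for illustration; the real SparseVector layout is not shown in this reference and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: parallel arrays of sorted indices and their values.
 * This is not the actual SparseVector structure. */
typedef struct {
    int32_t        dim;      /* logical dimension count */
    int32_t        nnz;      /* number of stored non-zero entries */
    const int32_t *indices;  /* strictly increasing positions, each < dim */
    const float   *values;   /* values[i] belongs to position indices[i] */
} SparseSketch;

/* Dot product in O(nnz_a + nnz_b): only positions present in both
 * vectors contribute, so zeros are never touched. */
static float sparse_dot(const SparseSketch *a, const SparseSketch *b)
{
    float   sum = 0.0f;
    int32_t i = 0, j = 0;

    assert(a != NULL && b != NULL && a->dim == b->dim);
    while (i < a->nnz && j < b->nnz) {
        if (a->indices[i] == b->indices[j]) {
            sum += a->values[i] * b->values[j];
            i++;
            j++;
        } else if (a->indices[i] < b->indices[j]) {
            i++;
        } else {
            j++;
        }
    }
    return sum;
}
```

The cost depends on the number of stored entries rather than on the declared dimension, which is why a 1,000,000-dimensional vector with at most 1,000 non-zero entries remains cheap to compare.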

binaryvec

PostgreSQL Type: binaryvec
Base Type: Binary (1 bit per dimension)

Binary vector type that stores one bit per dimension, giving 32x compression relative to vector. Distances are computed with Hamming distance.

Features:

  • Compression Ratio: 32x (compared to vector)
  • Distance Metric: Hamming distance only
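
As a rough illustration of one bit per dimension and Hamming distance, here is a minimal sketch. The function names are hypothetical, and two details are assumptions rather than documented NeuronDB behavior: a dimension is encoded as 1 when its value is positive, and bits are packed least significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a float vector into bits.
 * Assumed rule: bit i is 1 when v[i] > 0, packed LSB-first.
 * out must hold at least (dim + 7) / 8 bytes. */
static void binarize_by_sign(const float *v, size_t dim, uint8_t *out)
{
    assert(v != NULL && out != NULL);
    for (size_t i = 0; i < (dim + 7) / 8; i++)
        out[i] = 0;
    for (size_t i = 0; i < dim; i++)
        if (v[i] > 0.0f)
            out[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Hamming distance: number of bit positions where a and b differ. */
static unsigned hamming_distance(const uint8_t *a, const uint8_t *b, size_t nbytes)
{
    unsigned dist = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < nbytes; i++) {
        uint8_t x = (uint8_t)(a[i] ^ b[i]);   /* 1 bits mark differences */
        while (x) {
            x &= (uint8_t)(x - 1);            /* clear the lowest set bit */
            dist++;
        }
    }
    return dist;
}
```

For example, [1.0, -1.0, 0.5] and [1.0, 1.0, -0.5] differ in sign at two positions, so their Hamming distance under this encoding is 2.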

Internal C Structures

Vector Structure

typedef struct Vector {
    int32  vl_len_;     /* varlena header (required) */
    int16  dim;         /* number of dimensions */
    int16  unused;      /* padding for alignment */
    float4 data[FLEXIBLE_ARRAY_MEMBER];  /* vector data */
} Vector;

Memory Layout

Offset  Size    Field
0       4       vl_len_ (varlena header)
4       2       dim (dimension count)
6       2       unused (padding)
8       4*dim   data[] (float32 array)

Total Size: offsetof(Vector, data) + sizeof(float4) * dim
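
The offsets and the total-size formula can be checked outside PostgreSQL with a stand-in struct. This is a sketch only: VectorSketch replaces PostgreSQL's int32, int16, and float4 typedefs with fixed-width C types, uses a C99 flexible array member in place of FLEXIBLE_ARRAY_MEMBER, and vector_size_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the Vector struct, using standard fixed-width types. */
typedef struct VectorSketch {
    int32_t vl_len_;  /* varlena header */
    int16_t dim;      /* number of dimensions */
    int16_t unused;   /* padding for alignment */
    float   data[];   /* C99 flexible array member */
} VectorSketch;

/* Hypothetical helper: bytes needed for a vector with dim dimensions,
 * following the formula offsetof(Vector, data) + sizeof(float4) * dim. */
static size_t vector_size_bytes(int dim)
{
    assert(dim >= 1 && dim <= 16000);   /* limits stated for the vector type */
    return offsetof(VectorSketch, data) + sizeof(float) * (size_t)dim;
}
```

On common platforms offsetof(VectorSketch, data) evaluates to 8, so a 384-dimensional vector needs 8 + 4 * 384 = 1544 bytes.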

Type Storage Formats

Storage Size Calculation

Type       Bytes per Dimension  Overhead
vector     4                    8 bytes
halfvec    2                    8 bytes
sparsevec  Variable             16 bytes
binaryvec  0.125 (1 bit)        8 bytes
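
The dense rows of the table translate into a small size estimator. This is a sketch under stated assumptions: DenseKind and dense_storage_bytes are hypothetical names, binaryvec bits are assumed to round up to whole bytes, and sparsevec is left out because its per-entry cost is listed only as variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag for the three dense types in the table. */
typedef enum { DENSE_VECTOR, DENSE_HALFVEC, DENSE_BINARYVEC } DenseKind;

/* Estimated bytes for one value: 8-byte overhead plus per-dimension payload. */
static size_t dense_storage_bytes(DenseKind kind, size_t dim)
{
    const size_t overhead = 8;

    assert(dim >= 1);
    switch (kind) {
    case DENSE_VECTOR:
        return overhead + 4 * dim;          /* float32 per dimension */
    case DENSE_HALFVEC:
        return overhead + 2 * dim;          /* float16 per dimension */
    case DENSE_BINARYVEC:
        return overhead + (dim + 7) / 8;    /* 1 bit per dimension, rounded up (assumed) */
    }
    return 0;   /* unreachable for valid kinds */
}
```

For 384 dimensions this gives 1544 bytes for vector, 776 for halfvec, and 56 for binaryvec. The payloads (1536, 768, and 48 bytes) show the 2x and 32x ratios; the fixed overhead makes the end-to-end ratio slightly smaller for short vectors.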

Type Casting Rules

Implicit Casts

  • vector → halfvec (implicit)
  • vector → sparsevec (explicit only)

Explicit Casts

Type casting examples

-- Vector to halfvec
SELECT '[1.0, 2.0, 3.0]'::vector::halfvec;

-- Vector to sparsevec
SELECT vector_to_sparsevec('[0, 0, 1.5, 0]'::vector);

-- Vector to binary
SELECT vector_to_binary('[1.0, -1.0, 0.5]'::vector);

Memory Layout

In-Memory Representation

  • Vectors stored as contiguous float32 arrays
  • Aligned to 8-byte boundaries for SIMD operations
  • GPU transfers use same layout (zero-copy when possible)

TOAST Behavior

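PostgreSQL automatically applies TOAST to large values. The figures below are approximate, because TOAST is triggered by the total row size (about 2 KB by default) rather than by a single column:

  • Inline storage: Vectors < 2KB (roughly fewer than 512 dimensions for vector)
  • Extended storage: Vectors ≥ 2KB (roughly 512 or more dimensions for vector)
  • Compression: Enabled by default for extended storage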
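
The arithmetic behind the 2KB figure can be sketched as follows. The constant and function names are hypothetical, and 2048 is the approximate figure from the text: PostgreSQL makes the actual TOAST decision per row, so other columns and tuple overhead shift the real crossover.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate threshold taken from the "2KB" figure in the text. */
#define APPROX_TOAST_THRESHOLD 2048

/* Size of one vector value: 8-byte header plus 4 bytes per dimension. */
static size_t vector_value_bytes(int dim)
{
    assert(dim >= 1);
    return 8 + 4 * (size_t)dim;
}

/* Returns 1 when the vector alone reaches the approximate threshold. */
static int vector_reaches_threshold(int dim)
{
    return vector_value_bytes(dim) >= APPROX_TOAST_THRESHOLD;
}
```

Under these assumptions a 384-dimensional vector (1544 bytes) stays below the threshold, while the 8-byte header puts the crossover at 510 dimensions, slightly under the rounded figure of 512.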

Quantization Formats

Quantization Types

  • Scalar Quantization: int8, uint8 (4x compression)
  • Product Quantization (PQ): 8x-16x compression
  • Binary Quantization: 32x compression (Hamming distance)
  • Ternary Quantization: 16x compression
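
Scalar quantization to uint8 can be sketched as a min/max affine mapping, which stores one byte per dimension instead of four (the 4x figure above). The SqParams struct and the sq_fit, sq_encode, and sq_decode names are hypothetical; NeuronDB's own quantization functions and parameter choices are not shown in this reference and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters: value = min + scale * code. */
typedef struct {
    float min;
    float scale;
} SqParams;

/* Derive parameters from the value range of one vector. */
static SqParams sq_fit(const float *v, size_t n)
{
    SqParams p;
    float    lo, hi;

    assert(v != NULL && n > 0);
    lo = hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    p.min = lo;
    p.scale = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;  /* avoid divide by zero */
    return p;
}

/* Map each float to the nearest of 256 evenly spaced levels. */
static void sq_encode(const float *v, size_t n, SqParams p, uint8_t *out)
{
    assert(v != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        float q = (v[i] - p.min) / p.scale + 0.5f;    /* +0.5 rounds to nearest */
        if (q < 0.0f)   q = 0.0f;
        if (q > 255.0f) q = 255.0f;
        out[i] = (uint8_t)q;
    }
}

/* Reconstruct an approximate float from its code. */
static float sq_decode(uint8_t code, SqParams p)
{
    return p.min + p.scale * (float)code;
}
```

The reconstruction error per dimension is at most about half of scale, so vectors with a narrow value range lose less precision than vectors with a few large outliers.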

When to Use Quantization

  • Large datasets where storage is a concern
  • Read-heavy workloads
  • Acceptable precision loss for speed/storage trade-offs