Embeddings represent data as dense vectors that capture semantic meaning: similar items map to nearby vectors, which enables similarity search and arithmetic on meaning. Word embeddings represent words, sentence embeddings represent sentences, and document embeddings represent entire documents.
Embeddings transform discrete tokens into continuous vectors while preserving semantic relationships. Words with similar meanings receive similar vectors, so mathematical operations on vectors approximate operations on meaning.
The diagram shows embedding space. Similar words cluster together. Relationships appear as vector differences. King - Man + Woman approximates Queen.
Word Embeddings
Word embeddings map words to vectors. Word2Vec learns from context. GloVe learns from co-occurrence statistics. Both capture semantic relationships. Pre-trained embeddings work well for many tasks.
Word2Vec has two architectures. Skip-gram predicts context words from the target word; CBOW predicts the target word from its context. Both learn useful representations. Training uses neural networks on large text corpora.
# Word Embeddings with Word2Vec
from gensim.models import Word2Vec
# Toy corpus: each inner list is one tokenized "sentence" (a real corpus needs far more text)
sentences = [
    ['king', 'queen', 'royal'],
    ['man', 'woman', 'person'],
    ['paris', 'france', 'city'],
    ['london', 'england', 'city']
]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
word_vectors = model.wv
# Find similar words
similar = word_vectors.most_similar('king', topn=3)
print(f"Similar to 'king': {similar}")
# Vector arithmetic: king - man + woman
result = word_vectors['king'] - word_vectors['man'] + word_vectors['woman']
print(word_vectors.similar_by_vector(result, topn=1))  # nearest word to the resulting vector
-- Nearest neighbors to 'king' using pgvector's cosine-distance operator (<=>)
SELECT word,
       1 - (embedding <=> (SELECT embedding FROM word_embeddings WHERE word = 'king')) AS similarity
FROM word_embeddings
WHERE word != 'king'
ORDER BY similarity DESC
LIMIT 5;
Word embeddings capture semantic relationships. They enable similarity search. They support arithmetic operations. They are foundational for NLP.
Detailed Word Embedding Training Methods
Word2Vec uses two architectures. Skip-gram predicts context words from the target word. Continuous Bag of Words (CBOW) predicts the target word from its context. Both learn embeddings by predicting word co-occurrences.
Skip-gram maximizes the probability of the context words given the target, P(w_{i-k}, ..., w_{i+k} | w_i). It works well for rare words, requires more training data, and captures multiple contexts per word.
CBOW averages the context word embeddings and predicts the target word from that average. It trains faster than skip-gram, works well for frequent words, and uses less memory.
Training uses negative sampling. Instead of computing probabilities over the full vocabulary, sample a small set of negative examples. This reduces the cost per update from O(V) to O(k), where k is the number of negative samples (typically 5-20), and speeds up training significantly.
# Assumes Word2VecDetailed is a custom skip-gram implementation with negative sampling
corpus = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]  # sentences as word indices
model = Word2VecDetailed(vocab_size=10, embedding_dim=50)
model.train(corpus, epochs=5)
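For comparison, gensim's Word2Vec exposes these choices directly. The sketch below is a minimal illustration with placeholder data and parameter values: sg=1 selects skip-gram, sg=0 selects CBOW, and negative sets the number of negative samples.
# Skip-gram vs. CBOW with negative sampling in gensim (toy corpus, illustrative values)
from gensim.models import Word2Vec
corpus = [['king', 'queen', 'royal'], ['man', 'woman', 'person'], ['paris', 'france', 'city']]
skipgram = Word2Vec(corpus, vector_size=50, sg=1, negative=10, min_count=1, epochs=5)
cbow = Word2Vec(corpus, vector_size=50, sg=0, negative=10, min_count=1, epochs=5)
print(skipgram.wv.most_similar('king', topn=2))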
Embedding Quality Evaluation
Evaluate embeddings using intrinsic and extrinsic tasks. Intrinsic tasks test embedding properties directly. Extrinsic tasks test downstream performance.
Intrinsic tasks include word similarity and word analogy. Word similarity compares embedding similarity to human judgments. Word analogy tests relationships like king - man + woman ≈ queen. These tasks measure embedding quality directly.
Extrinsic tasks test embeddings in applications. Text classification uses embeddings as features. Named entity recognition uses embeddings for sequence labeling. Machine translation uses embeddings for alignment. Performance on these tasks measures practical value.
# Embedding Quality Evaluation
from sklearn.metrics.pairwise import cosine_similarity
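# A minimal intrinsic-evaluation sketch using hand-made toy vectors (real evaluations
# use benchmark sets such as WordSim-353 for similarity and the Google analogy set);
# the words and vectors below are illustrative placeholders only.
import numpy as np
toy_vectors = {
    'king': np.array([0.9, 0.8, 0.1]),
    'queen': np.array([0.85, 0.75, 0.2]),
    'man': np.array([0.7, 0.2, 0.1]),
    'woman': np.array([0.65, 0.15, 0.2]),
}
# Word similarity: cosine similarity between embedding pairs
sim = cosine_similarity([toy_vectors['king']], [toy_vectors['queen']])[0][0]
print(f"king/queen similarity: {sim:.3f}")
# Word analogy: king - man + woman should land closest to queen
target = toy_vectors['king'] - toy_vectors['man'] + toy_vectors['woman']
best = max(toy_vectors, key=lambda w: cosine_similarity([target], [toy_vectors[w]])[0][0])
print(f"king - man + woman ≈ {best}")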
The diagram shows word embedding space. Related words cluster together. Vector differences encode relationships.
Sentence Embeddings
Sentence embeddings represent entire sentences. They capture sentence meaning. They enable sentence similarity search. They work well for semantic search and clustering.
Detailed Sentence Embedding Methods
Averaging word embeddings is simple but limited. It computes the mean of the word vectors, losing word-order information. It works for short sentences but fails for complex semantics.
Sentence encoders use neural networks. They process entire sentences. They preserve word order. They capture sentence structure. They work better than averaging.
Transformer-based encoders use BERT or similar models. They process sentences through transformer layers. They use [CLS] token or mean pooling. They capture rich semantic information. They work well for many tasks.
# Detailed Sentence Embedding Methods
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
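# A minimal sketch of the word-order limitation of averaging: two sentences with the
# same words in different order get identical averaged vectors, while a transformer
# encoder ('all-MiniLM-L6-v2', the same model used later in this section) distinguishes
# them. The toy word vectors here are random placeholders, for illustration only.
s1, s2 = "the cat chased the dog", "the dog chased the cat"
toy_word_vectors = {w: np.random.rand(50) for w in set(s1.split())}
avg1 = np.mean([toy_word_vectors[w] for w in s1.split()], axis=0)
avg2 = np.mean([toy_word_vectors[w] for w in s2.split()], axis=0)
print("Averaged vectors identical:", np.allclose(avg1, avg2))    # True: order is lost
encoder = SentenceTransformer('all-MiniLM-L6-v2')
e1, e2 = encoder.encode([s1, s2])
print("Transformer embeddings identical:", np.allclose(e1, e2))  # False: order is preserved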
Evaluate embeddings using multiple metrics. Intrinsic metrics test embedding properties. Extrinsic metrics test application performance. Both are important for assessment.
Intrinsic metrics include similarity correlation and analogy accuracy. Similarity correlation compares embedding similarity to human judgments. Higher correlation indicates better embeddings. Analogy accuracy tests word relationships. Higher accuracy indicates better structure.
Methods include averaging word embeddings, training sentence encoders, and using transformer models. Averaging is simple but loses word order. Sentence encoders preserve structure. Transformers capture complex relationships.
# Sentence Embeddings
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = [
    "The cat sits on the mat",
    "A feline is on the rug",
    "The weather is sunny today"
]
# Normalize so that the dot product equals cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
# Compute similarity between the first two sentences
similarity = np.dot(embeddings[0], embeddings[1])
print(f"Similarity between sentence 1 and 2: {similarity:.3f}")
# High similarity indicates similar meaning
Sentence embeddings enable semantic search. They find sentences with similar meaning. They work regardless of exact word matches.
The diagram shows sentence embedding space. Semantically similar sentences cluster together.
Document Embeddings
Document embeddings represent entire documents. They capture document topics and themes. They enable document similarity and clustering. They work well for information retrieval.
Methods include averaging sentence embeddings, training document encoders, and using transformer models with pooling. Document encoders preserve document structure. Transformers capture long-range dependencies.
# Document Embeddings
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
model = SentenceTransformer('all-MiniLM-L6-v2')
documents = [
    "Machine learning is a subset of artificial intelligence...",
    "Deep learning uses neural networks with multiple layers...",
    "The weather forecast predicts rain tomorrow..."
]
doc_embeddings = model.encode(documents)
# Find similar documents: pairwise cosine similarity between all documents
similarities = cosine_similarity(doc_embeddings)
print(similarities)  # the ML-related documents should score higher with each other than with the weather one
Document embeddings enable semantic document search. They find documents with similar topics. They work for large document collections.
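As a minimal semantic-search sketch (reusing model, documents, and doc_embeddings from the example above; the query text is made up for illustration), encode the query and rank documents by cosine similarity:
# Semantic document search: rank documents by cosine similarity to a query
query = "How do neural networks learn?"
query_embedding = model.encode([query])
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.3f}  {doc}")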
Embedding Similarity and Distance
Similarity measures compare embeddings. Cosine similarity measures angle between vectors. Euclidean distance measures straight-line distance. Dot product measures alignment. Each suits different use cases.
Cosine similarity is cos(θ) = (A·B) / (||A|| × ||B||). It ranges from -1 to 1. Higher values mean more similar. It ignores vector magnitudes. It works well for embeddings.
# Embedding Similarity
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0]
])
# Cosine similarity
cos_sim = cosine_similarity(embeddings)
print("Cosine similarity:")
print(cos_sim)
# Euclidean distance
euc_dist = euclidean_distances(embeddings)
print("Euclidean distance:")
print(euc_dist)
Choose similarity measures based on needs. Cosine similarity works well for embeddings. Euclidean distance works for spatial data.
The diagram shows similarity computation. Vectors with small angles have high cosine similarity.
Embedding Arithmetic
Embedding arithmetic performs operations on meaning. King - Man + Woman approximates Queen. It demonstrates captured relationships. It enables analogy solving.
Arithmetic works because vector differences encode relationships: the offset from man to woman resembles the offset from king to queen, so adding that difference applies the relationship, and the result approximates the semantic operation.
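A minimal sketch of this arithmetic, assuming gensim's downloader and the pre-trained 'glove-wiki-gigaword-100' vectors are available (any pre-trained word vectors would do): most_similar accepts positive and negative terms and performs the addition and subtraction internally.
# Analogy by vector arithmetic: king - man + woman ≈ queen
import gensim.downloader as api
glove = api.load('glove-wiki-gigaword-100')  # downloads pre-trained GloVe vectors on first use
result = glove.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
print(result)  # 'queen' is expected to rank first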
The diagram shows embedding arithmetic. Vector operations approximate semantic relationships.
Pre-trained Embeddings
Pre-trained embeddings are trained on large corpora. They capture general language patterns. They work well for many tasks. They save training time and data.
Common pre-trained embeddings include Word2Vec, GloVe, FastText, and transformer embeddings. Word2Vec and GloVe are word-level. FastText handles subwords. Transformers provide contextual embeddings.
import gensim.downloader as api
word_vectors = api.load('glove-wiki-gigaword-100')  # pre-trained GloVe vectors, one option of many
similar = word_vectors.most_similar('computer', topn=5)
print(f"Similar to 'computer': {similar}")
Pre-trained embeddings provide strong baselines. They work well without fine-tuning. They enable quick prototyping.
Fine-tuning Embeddings
Fine-tuning adapts pre-trained embeddings to specific tasks. It improves performance on domain data. It requires task-specific training data. It balances general and specific knowledge.
Fine-tuning updates embedding weights. It preserves general knowledge. It learns task-specific patterns. It improves performance on target tasks.
# Fine-tuning Embeddings
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
model = SentenceTransformer('all-MiniLM-L6-v2')
# Task-specific examples: each pair is a matching query/document
examples = [
    InputExample(texts=['query about machine learning', 'document about ML']),
    InputExample(texts=['query about weather', 'weather forecast document'])
]
train_dataloader = DataLoader(examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
Fine-tuning improves task performance. It adapts general embeddings to specific needs. It requires labeled task data.
Summary
Embeddings represent data as dense vectors. Word embeddings capture word meaning. Sentence embeddings capture sentence meaning. Document embeddings capture document topics. Similarity measures compare embeddings. Cosine similarity works well for embeddings. Embedding arithmetic performs operations on meaning. Pre-trained embeddings provide strong baselines. Fine-tuning adapts embeddings to tasks. Embeddings enable semantic search and similarity operations.