Advanced RAG Overview
Advanced RAG extends basic RAG with hybrid search, reranking, multi-vector representations, and temporal awareness. Hybrid search combines semantic and keyword matching, reranking reorders candidates for higher quality, multi-vector approaches store several embeddings per document, and temporal scoring accounts for when information was published.
Together, these techniques improve retrieval quality, increase answer accuracy, reduce hallucinations, and make complex queries tractable.
The diagram shows the advanced RAG flow: hybrid search merges retrieval methods, reranking refines the ordering, and multi-vector retrieval handles document complexity.
Hybrid Search
Hybrid search combines semantic and keyword retrieval, using both embeddings and exact term matching. It improves recall and precision and handles a wider range of query types than either method alone.
Hybrid methods include score fusion, reciprocal rank fusion (RRF), and weighted combination. Score fusion averages normalized scores, RRF combines rank positions rather than raw scores, and weighted combination blends scores with a configurable weight. The code below implements weighted combination; an RRF sketch follows it.
# Hybrid Search
import numpy as np

# semantic_search, keyword_search, and normalize_scores are helpers assumed
# to be defined elsewhere; both searches must return one score per document,
# aligned by index.
def hybrid_search(query, documents, embeddings, index, alpha=0.5, top_k=10):
    # Semantic search over embeddings
    semantic_scores = semantic_search(query, embeddings, index, top_k * 2)
    # Keyword search over raw text
    keyword_scores = keyword_search(query, documents, top_k * 2)
    # Normalize both score sets to a comparable range
    semantic_scores = normalize_scores(semantic_scores)
    keyword_scores = normalize_scores(keyword_scores)
    # Weighted combination: alpha controls the semantic/keyword balance
    hybrid_scores = alpha * semantic_scores + (1 - alpha) * keyword_scores
    # Rank by combined score
    ranked_indices = np.argsort(hybrid_scores)[::-1][:top_k]
    return ranked_indices

# Example
results = hybrid_search("machine learning", documents, embeddings, index, alpha=0.6)
print("Hybrid search results: " + str(results))
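Reciprocal rank fusion sidesteps score normalization by combining rank positions instead of raw scores. A minimal sketch of the standard RRF formula, using the conventional smoothing constant k=60; the document ids are illustrative:

# Reciprocal Rank Fusion (RRF) sketch
def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    # rankings: one ranked list of document ids per retriever
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes 1 / (k + rank) for every document
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_k]

# Example: fuse a semantic ranking with a keyword ranking
semantic_ranking = ["doc3", "doc1", "doc2"]
keyword_ranking = ["doc1", "doc3", "doc4"]
print("RRF results: " + str(reciprocal_rank_fusion([semantic_ranking, keyword_ranking])))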
Hybrid search improves retrieval quality by combining the strengths of both methods and handling diverse queries.
The diagram shows hybrid search: semantic and keyword results are merged into a single, higher-quality result set.
Reranking Strategies
Reranking improves the ordering of retrieved results. It applies a more sophisticated model that considers query-document relationships directly, improving precision at the top ranks.
Reranking methods include cross-encoders, learning-to-rank, and LLM-based reranking. Cross-encoders score each query-document pair jointly, learning-to-rank trains a model over ranking features, and LLM-based reranking prompts a language model to judge relevance.
# Reranking
import numpy as np
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_results(query, documents, top_k=5):
    # Score every query-document pair jointly with the cross-encoder
    pairs = [[query, doc] for doc in documents]
    scores = reranker.predict(pairs)
    ranked_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in ranked_indices]

# Example
query = "machine learning tutorial"
documents = ["ML guide", "Deep learning", "AI basics"]
reranked = rerank_results(query, documents)
print("Reranked results: " + str(reranked))
Reranking improves result quality at the cost of extra computation, yielding better precision at the top of the list.
The diagram shows the reranking process: initial results are retrieved, a cross-encoder rescores the candidates, and the top results are selected for the final output.
Detailed Reranking Strategies
Cross-encoders process the query and document together, computing attention across query and document tokens. This captures fine-grained interactions and makes them more accurate than bi-encoders, but slower, since every query-document pair must be scored individually.
Learning-to-rank uses machine learning models over ranking features such as query-document similarity, document length, and position in the initial ranking. The model learns the optimal feature combination, which improves ranking quality but requires labeled training data.
LLM-based reranking prompts a language model to score each document's relevance to the query. It exploits the model's contextual understanding and produces high-quality rankings, but it is the most expensive option. A rough sketch follows; the detailed implementation after it covers the cross-encoder and learning-to-rank variants.
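As an illustration of LLM-based reranking, the sketch below asks the model to rate each document's relevance on a 0-10 scale and sorts by the returned number. call_llm is a hypothetical placeholder for your provider's completion call, and the prompt wording and scale are assumptions, not a fixed recipe.

# LLM-based reranking sketch (call_llm is a hypothetical stub)
def call_llm(prompt):
    # Placeholder: replace with your LLM provider's completion call.
    # Returns a dummy score here so the sketch runs end to end.
    return "5"

def llm_rerank(query, documents, top_k=5):
    scores = []
    for doc in documents:
        prompt = (
            "Rate the relevance of the document to the query on a scale of 0-10.\n"
            "Query: " + query + "\nDocument: " + doc + "\n"
            "Answer with a single number."
        )
        response = call_llm(prompt)
        try:
            scores.append(float(response.strip()))
        except ValueError:
            scores.append(0.0)  # unparseable responses count as irrelevant
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Example
print(llm_rerank("machine learning tutorial", ["ML guide", "Deep learning", "AI basics"]))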
# Detailed Reranking Implementation
from sentence_transformers import CrossEncoder
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class RerankingSystem:
    def __init__(self, method='cross_encoder'):
        self.method = method
        if method == 'cross_encoder':
            self.reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
        elif method == 'learned_to_rank':
            self.ltr_model = RandomForestRegressor(n_estimators=100)
            self.feature_names = ['similarity', 'doc_length', 'position', 'query_length']

    def cross_encoder_rerank(self, query, documents, top_k=5):
        # Create query-document pairs and score them jointly
        pairs = [[query, doc] for doc in documents]
        scores = self.reranker.predict(pairs)
        # Rank by score
        ranked_indices = np.argsort(scores)[::-1][:top_k]
        ranked_docs = [documents[i] for i in ranked_indices]
        ranked_scores = scores[ranked_indices]
        return ranked_docs, ranked_scores

    def learned_to_rank_rerank(self, query, documents, initial_scores, top_k=5):
        # Extract features for each candidate document
        features = []
        for i, doc in enumerate(documents):
            feature_vector = [
                initial_scores[i],  # initial similarity score
                len(doc),           # document length
                i,                  # position in initial ranking
                len(query)          # query length
            ]
            features.append(feature_vector)
        # Predict reranking scores
        rerank_scores = self.ltr_model.predict(features)
        # Rank by reranking scores
        ranked_indices = np.argsort(rerank_scores)[::-1][:top_k]
        ranked_docs = [documents[i] for i in ranked_indices]
        return ranked_docs, rerank_scores[ranked_indices]

    def train_ltr_model(self, queries, documents_list, initial_scores_list, relevance_labels):
        """Train the learning-to-rank model on labeled examples."""
        X_train = []
        y_train = []
        for query, docs, scores, labels in zip(queries, documents_list,
                                               initial_scores_list, relevance_labels):
            for i, (doc, score, label) in enumerate(zip(docs, scores, labels)):
                features = [score, len(doc), i, len(query)]
                X_train.append(features)
                y_train.append(label)
        self.ltr_model.fit(X_train, y_train)
        return self.ltr_model

# Example
reranker = RerankingSystem(method='cross_encoder')
query = "machine learning tutorial"
documents = ["ML guide", "Deep learning basics", "AI introduction", "Neural networks explained"]
reranked, scores = reranker.cross_encoder_rerank(query, documents, top_k=3)
print("Reranked documents: " + str(reranked))
print("Reranking scores: " + str(scores))
Reranking Performance Optimization
Optimize reranking for production use: cache scores for frequent query-document pairs, use approximate reranking for large candidate sets, and batch multiple queries together.
Two-stage reranking runs a fast model first to filter the candidates down to a short list, then runs a slower, more accurate model on the filtered set. This balances accuracy and speed.
# Optimized Reranking Pipeline
import numpy as np
from sentence_transformers import CrossEncoder

class OptimizedReranking:
    def __init__(self):
        self.fast_reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-2-v2')   # faster, smaller
        self.slow_reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')  # slower, better
        self.cache = {}

    def rerank_optimized(self, query, documents, top_k=5, use_cache=True):
        # Return cached results when the same query arrives with the same
        # number of candidates (a rough staleness check)
        if use_cache and query in self.cache:
            cached = self.cache[query]
            if cached['n_candidates'] == len(documents):
                return cached['docs'][:top_k], cached['scores'][:top_k]
        # Stage 1: fast reranking on all candidates
        pairs = [[query, doc] for doc in documents]
        fast_scores = self.fast_reranker.predict(pairs)
        # Filter to top candidates (e.g., top 20)
        filter_k = min(20, len(documents))
        top_indices = np.argsort(fast_scores)[::-1][:filter_k]
        top_docs = [documents[i] for i in top_indices]
        # Stage 2: slow reranking on the filtered set
        top_pairs = [[query, doc] for doc in top_docs]
        slow_scores = self.slow_reranker.predict(top_pairs)
        # Final ranking
        final_indices = np.argsort(slow_scores)[::-1][:top_k]
        final_docs = [top_docs[i] for i in final_indices]
        final_scores = slow_scores[final_indices]
        # Cache results
        if use_cache:
            self.cache[query] = {'docs': final_docs, 'scores': final_scores,
                                 'n_candidates': len(documents)}
        return final_docs, final_scores

# Batch processing
def batch_rerank(queries, documents_list, reranker, batch_size=32):
    """Process multiple queries in batches."""
    all_results = []
    for i in range(0, len(queries), batch_size):
        batch_queries = queries[i:i+batch_size]
        batch_docs = documents_list[i:i+batch_size]
        for query, docs in zip(batch_queries, batch_docs):
            reranked, scores = reranker.rerank_optimized(query, docs, top_k=5)
            all_results.append((reranked, scores))
    return all_results
Multi-vector Approaches
Multi-vector approaches store multiple embeddings per document, each capturing a different aspect of its content. This improves retrieval coverage and handles long or complex documents better than a single vector.
Methods include sentence-level, chunk-level, and aspect-based embeddings. Sentence-level embeddings capture fine-grained information, chunk-level embeddings preserve surrounding context, and aspect-based embeddings target specific facets of the document.
# Multi-vector Approach
from sklearn.metrics.pairwise import cosine_similarity

# split_sentences, chunk_document, extract_aspects, and embedder are
# helpers assumed to be defined elsewhere in the pipeline.
def create_multi_vectors(document):
    # Sentence embeddings: fine-grained information
    sentences = split_sentences(document)
    sentence_embs = embedder.encode(sentences)
    # Chunk embeddings: surrounding context
    chunks = chunk_document(document)
    chunk_embs = embedder.encode(chunks)
    # Aspect embeddings: specific facets
    aspects = extract_aspects(document)
    aspect_embs = embedder.encode(aspects)
    return {
        'sentences': sentence_embs,
        'chunks': chunk_embs,
        'aspects': aspect_embs
    }

# Search across all vectors of all documents
def multi_vector_search(query, multi_vectors):
    query_emb = embedder.encode([query])
    all_scores = []
    for doc_id, vectors in multi_vectors.items():
        for vec_type, embs in vectors.items():
            # Score the query against every vector; keep the best match
            scores = cosine_similarity(query_emb, embs)[0]
            all_scores.append((doc_id, vec_type, max(scores)))
    return sorted(all_scores, key=lambda x: x[2], reverse=True)
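A short usage sketch, assuming the helper functions above are implemented and documents is a dict mapping document ids to text (both names are illustrative):

# Hypothetical usage: embed each document's vectors, then search across them
documents = {
    "doc1": "Machine learning models learn patterns from data.",
    "doc2": "Stock markets react to quarterly earnings reports.",
}
multi_vectors = {doc_id: create_multi_vectors(text) for doc_id, text in documents.items()}
for doc_id, vec_type, score in multi_vector_search("how do models learn?", multi_vectors)[:3]:
    print(doc_id + " (" + vec_type + "): " + str(score))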
Multi-vector approaches improve coverage by capturing document complexity, enabling better retrieval than a single vector per document.
The diagram shows the multi-vector process: each document contributes several embeddings, the query is scored against all of them, and the best matches are aggregated and reranked, improving coverage and recall.
Temporal Search Patterns
Temporal search handles time-sensitive information. It considers document timestamps, prioritizes recent information, and enables time-based filtering.
Temporal methods include time-weighted scoring, recency boosting, and time-based filtering. Time-weighted scoring blends relevance with recency, recency boosting up-weights recent documents, and time-based filtering restricts results to a time range. The code below implements time-weighted scoring; sketches of the other two follow it.
# Temporal Search
import numpy as np
from datetime import datetime

# compute_relevance and normalize are helpers assumed to be defined elsewhere
def temporal_search(query, documents, timestamps, alpha=0.7, top_k=10):
    # Relevance scores
    relevance_scores = compute_relevance(query, documents)
    # Recency scores: invert normalized age so newer documents score higher
    max_time = max(timestamps)
    ages = np.array([(max_time - t).days for t in timestamps], dtype=float)
    recency_scores = 1.0 - normalize(ages)
    # Combine relevance and recency
    combined_scores = alpha * np.asarray(relevance_scores) + (1 - alpha) * recency_scores
    ranked_indices = np.argsort(combined_scores)[::-1][:top_k]
    return ranked_indices

# Example
timestamps = [datetime(2024, 1, 1), datetime(2024, 2, 1), datetime(2024, 3, 1)]
results = temporal_search("AI news", documents, timestamps)
print("Temporal search results: " + str(results))
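The other two methods are lighter-weight. A minimal sketch, assuming a base relevance score is already computed; the 30-day half-life is an illustrative choice, not a recommendation:

# Recency boosting and time-based filtering sketches
from datetime import datetime

def recency_boost(score, timestamp, now, half_life_days=30):
    # Exponential decay: the boost halves every half_life_days
    age_days = (now - timestamp).days
    return score * 0.5 ** (age_days / half_life_days)

def time_filter(documents, timestamps, start, end):
    # Keep only documents whose timestamp falls inside [start, end]
    return [doc for doc, t in zip(documents, timestamps) if start <= t <= end]

# Example
now = datetime(2024, 3, 15)
print(recency_boost(0.9, datetime(2024, 3, 1), now))  # recent: mild decay
print(recency_boost(0.9, datetime(2023, 3, 1), now))  # year old: heavy decay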
Temporal search handles time-sensitive queries by prioritizing recent information, improving relevance for time-dependent topics such as news.
Query Routing
Query routing directs each query to the appropriate retriever. It analyzes the query's characteristics, selects the best retrieval method, and improves both efficiency and quality.
Routing methods include rule-based, learned, and hybrid routing. Rule-based routing uses heuristics, learned routing trains a model on past queries, and hybrid routing combines the two. The code below shows rule-based routing; a learned-router sketch follows it.
# Query Routing
# has_exact_terms and is_meaning_based are heuristic helpers assumed to be
# defined elsewhere (e.g., quoted phrases, identifiers, question phrasing).
def route_query(query):
    # Analyze query characteristics
    has_keywords = has_exact_terms(query)
    is_semantic = is_meaning_based(query)
    if has_keywords and is_semantic:
        return 'hybrid'
    elif has_keywords:
        return 'keyword'
    else:
        return 'semantic'

def routed_search(query, documents, embeddings, index):
    route = route_query(query)
    if route == 'hybrid':
        return hybrid_search(query, documents, embeddings, index)
    elif route == 'keyword':
        return keyword_search(query, documents)
    else:
        return semantic_search(query, embeddings, index)

# Example
results = routed_search("machine learning tutorial", documents, embeddings, index)
print("Routed search results: " + str(results))
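A learned router replaces the heuristics with a trained classifier. A minimal sketch using scikit-learn, assuming you have past queries labeled with the route that served them best; the training examples here are illustrative:

# Learned query router sketch (training data is hypothetical)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_queries = ["error code 0x80070057", "why do transformers generalize", "python sort list of dicts"]
train_routes = ["keyword", "semantic", "hybrid"]

router = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
router.fit(train_queries, train_routes)

def route_query_learned(query):
    # Predict the best retrieval route for a new query
    return router.predict([query])[0]

print("Route: " + route_query_learned("machine learning tutorial"))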
Query routing improves efficiency by selecting the appropriate method for each query instead of running every retriever, optimizing overall retrieval.
Summary
Advanced RAG extends basic RAG with hybrid search, reranking, and multi-vector approaches. Hybrid search combines semantic and keyword methods, reranking refines result order, multi-vector approaches handle document complexity, temporal search accounts for time-sensitive information, and query routing selects the right retriever for each query. Together, these techniques substantially improve RAG quality.