Fine-tuning Overview
Fine-tuning adapts a pre-trained model to a specific task by updating its weights on task data. Because the model already encodes general knowledge, fine-tuning needs far less data than training from scratch and typically improves task performance significantly.
Fine-tuning leverages pre-trained knowledge while learning task-specific patterns, balancing general and specific knowledge for efficient task adaptation.
The diagram shows the fine-tuning process: a pre-trained model provides the base, task data adapts it, and the fine-tuned model performs the task.
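The core idea of continuing gradient descent from pre-trained weights rather than from scratch can be sketched with a toy linear model. This is purely illustrative (a least-squares stand-in for a real network, not a transformer); all names and numbers here are invented for the example.

```python
import numpy as np

# Toy illustration of fine-tuning: start from "pre-trained" weights and take a
# few gradient steps on task data, instead of starting from random weights.
# (Linear least-squares stand-in for a real network; names are illustrative.)

def mse_loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def fine_tune(w_pretrained, X_task, y_task, lr=0.1, steps=50):
    w = w_pretrained.copy()          # begin from pre-trained weights
    for _ in range(steps):
        grad = 2 * X_task.T @ (X_task @ w - y_task) / len(y_task)
        w -= lr * grad               # small update driven by task data
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_pre = w_true + 0.3                 # "pre-trained": near, but not at, the task optimum
w_ft = fine_tune(w_pre, X, y)
print(mse_loss(w_pre, X, y) > mse_loss(w_ft, X, y))  # fine-tuning reduced task loss
```

Starting close to the optimum is exactly why fine-tuning converges with little data: most of the work was already done during pre-training.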
Transfer Learning Concepts
Transfer learning reuses knowledge learned on one task for another: a pre-trained model provides general knowledge, and fine-tuning adapts it to the specific task, reducing both data and compute requirements.
Transfer learning works because models learn general patterns that transfer across tasks; fine-tuning adapts those general patterns while learning task-specific details.
```python
# Transfer Learning
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=3  # Adapt to 3-class classification
)
# Fine-tune on task data
# Model already knows language, learns task-specific patterns
```
Transfer learning enables efficient adaptation, reducing training requirements while improving performance.
The diagram shows the transfer learning concept: the source task provides a pre-trained model, its knowledge transfers to the target task, and fine-tuning adapts it to the specific domain.
Dataset Preparation
Dataset preparation creates task-specific training data. It covers data collection, labeling, and formatting, and it ensures the data is of sufficient quality for training.
Preparation includes cleaning, formatting, and splitting: cleaning removes errors, formatting matches the model's input requirements, and splitting creates train, validation, and test sets.
```python
# Dataset Preparation
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

def prepare_dataset(texts, labels):
    # Tokenize
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=512)
    # Create dataset
    dataset = Dataset.from_dict({
        'input_ids': encodings['input_ids'],
        'attention_mask': encodings['attention_mask'],
        'labels': labels
    })
    # Split
    dataset = dataset.train_test_split(test_size=0.2)
    return dataset

# Example
texts = ["Text 1", "Text 2", "Text 3"]
labels = [0, 1, 0]
dataset = prepare_dataset(texts, labels)
print(f"Dataset prepared: {len(dataset['train'])} train, {len(dataset['test'])} test")
```
Dataset preparation directly affects fine-tuning quality: good data improves performance, and proper formatting makes training possible.
Training Procedures
Training procedures fine-tune models effectively: they use appropriate learning rates, monitor validation performance, prevent overfitting, and save the best models.
Key elements are learning rate selection, early stopping, and checkpointing. Fine-tuning learning rates are typically smaller than those used in pre-training, early stopping prevents overfitting, and checkpointing saves progress.
```python
# Fine-tuning Training
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,  # Smaller than pre-training
    weight_decay=0.01,
    logging_dir='./logs',
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```
Training procedures balance adaptation against overfitting to get the best performance from fine-tuning.
Evaluation During Fine-tuning
Evaluation monitors fine-tuning progress: it measures validation performance, detects overfitting, and guides training decisions.
It relies on validation metrics, learning curves, and early stopping: validation metrics measure performance, learning curves show progress over time, and early stopping halts training before overfitting sets in.
```python
# Evaluation During Training
def evaluate_during_training(trainer, eval_dataset):
    # Evaluate on the validation set
    eval_results = trainer.evaluate(eval_dataset)
    # Log metrics ('eval_accuracy' is only present if the Trainer was given
    # a compute_metrics function that reports accuracy)
    print(f"Validation Loss: {eval_results['eval_loss']}")
    print(f"Validation Accuracy: {eval_results['eval_accuracy']}")
    return eval_results

# With evaluation_strategy='epoch', Trainer evaluates automatically after each
# epoch inside trainer.train() -- a manual per-epoch loop around train() would
# restart training. Call evaluate_during_training() afterwards for a final check.
trainer.train()
eval_results = evaluate_during_training(trainer, eval_dataset)
```
Evaluation guides fine-tuning by detecting issues early and keeping training on track.
Hyperparameter Tuning
Hyperparameter tuning finds optimal settings: it tests different configurations, measures their performance, and selects the best one.
The diagram shows common hyperparameter tuning methods: grid search tests all combinations, random search samples configurations randomly, and Bayesian optimization uses prior results to choose the next trial. Each method trades off coverage against efficiency.
The main hyperparameters are the learning rate, batch size, and number of epochs: the learning rate affects adaptation speed, batch size affects stability, and epochs determine training duration.
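Grid search and random search are simple enough to sketch directly. Here `score()` is an invented stand-in for "validation accuracy after fine-tuning", and the grids are illustrative, not prescriptive:

```python
import itertools
import random

# Sketch of the two simplest search strategies over a toy objective.
# score() is a stand-in for "validation accuracy after fine-tuning".

def score(learning_rate, batch_size):
    # Pretend accuracy peaks near lr=2e-5 and batch_size=16
    return 1.0 - abs(learning_rate - 2e-5) * 1e4 - abs(batch_size - 16) / 100

lr_grid = [1e-5, 2e-5, 5e-5]
bs_grid = [8, 16, 32]

# Grid search: evaluate every combination
grid_best = max(itertools.product(lr_grid, bs_grid), key=lambda c: score(*c))
print("grid search best:", grid_best)  # (2e-05, 16)

# Random search: sample a fixed budget of random configurations
random.seed(0)
samples = [(random.choice(lr_grid), random.choice(bs_grid)) for _ in range(5)]
random_best = max(samples, key=lambda c: score(*c))
print("random search best:", random_best)
```

Grid search is exhaustive but grows exponentially with the number of hyperparameters; random search covers the same space with a fixed budget, which is why it is often preferred in practice.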
```python
# Hyperparameter Tuning
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
import optuna

def objective(trial):
    # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
    learning_rate = trial.suggest_float('learning_rate', 1e-6, 1e-4, log=True)
    batch_size = trial.suggest_categorical('batch_size', [8, 16, 32])
    training_args = TrainingArguments(
        learning_rate=learning_rate,
        per_device_train_batch_size=batch_size,
        # ... other args
    )
    # Load a fresh model each trial so trials do not share fine-tuned weights
    model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
    trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
    trainer.train()
    eval_results = trainer.evaluate(eval_dataset)
    return eval_results['eval_accuracy']

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10)
print(f"Best hyperparameters: {study.best_params}")
```
Hyperparameter tuning finds effective settings and measurably improves results.
Detailed Fine-tuning Workflow
The fine-tuning workflow includes data preparation, model setup, training, and evaluation. Each step requires careful attention; a proper workflow is what makes fine-tuning succeed.
Data preparation means collecting clean, labeled, task-specific data and splitting it into train, validation, and test sets. Typical splits are 70-15-15 or 80-10-10. The validation set guides training; the test set evaluates final performance.
Model setup means loading the pre-trained model, adding a task-specific head if needed, and deciding which layers to freeze or unfreeze. Choose the learning rate carefully: pre-trained layers need smaller learning rates, while newly initialized layers can use larger ones.
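The "small rate for pre-trained layers, larger rate for the new head" idea is usually expressed as optimizer parameter groups. The sketch below builds the groups as plain dicts so the logic is easy to follow; in PyTorch the same list would be passed directly to `torch.optim.AdamW`. The parameter names and rates are illustrative.

```python
# Sketch of discriminative learning rates via optimizer parameter groups:
# pre-trained ("base") parameters get a small learning rate, the freshly
# initialized task head gets a larger one.

def build_param_groups(named_params, base_lr=2e-5, head_lr=1e-4, head_prefix='classifier'):
    base, head = [], []
    for name, param in named_params:
        (head if name.startswith(head_prefix) else base).append(param)
    return [
        {'params': base, 'lr': base_lr},   # pre-trained layers: small steps
        {'params': head, 'lr': head_lr},   # new head: larger steps
    ]

# Example with stand-in names, as yielded by model.named_parameters()
named = [('bert.encoder.layer.0.weight', 'p0'),
         ('bert.pooler.dense.weight', 'p1'),
         ('classifier.weight', 'p2'),
         ('classifier.bias', 'p3')]
groups = build_param_groups(named)
print([(g['lr'], len(g['params'])) for g in groups])  # [(2e-05, 2), (0.0001, 2)]
```

The head can tolerate larger steps because it starts from random initialization and has no pre-trained knowledge to destroy.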
```python
# Detailed Fine-tuning Workflow
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import Dataset
import numpy as np

def compute_metrics(eval_pred):
    # Needed so metric_for_best_model='accuracy' has a metric to track
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {'accuracy': (predictions == labels).mean()}

class FineTuningWorkflow:
    def __init__(self, model_name='bert-base-uncased', num_labels=2):
        self.model_name = model_name
        self.num_labels = num_labels
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = None

    def prepare_data(self, texts, labels, test_size=0.2, val_size=0.1):
        """Prepare and split data"""
        from sklearn.model_selection import train_test_split
        # First split: train + val vs test
        X_temp, X_test, y_temp, y_test = train_test_split(
            texts, labels, test_size=test_size, random_state=42, stratify=labels)
        # Second split: train vs val
        X_train, X_val, y_train, y_val = train_test_split(
            X_temp, y_temp, test_size=val_size / (1 - test_size),
            random_state=42, stratify=y_temp)
        # Tokenize
        train_encodings = self.tokenizer(X_train, truncation=True, padding=True, max_length=512)
        val_encodings = self.tokenizer(X_val, truncation=True, padding=True, max_length=512)
        test_encodings = self.tokenizer(X_test, truncation=True, padding=True, max_length=512)
        # Create datasets
        train_dataset = Dataset.from_dict({
            'input_ids': train_encodings['input_ids'],
            'attention_mask': train_encodings['attention_mask'],
            'labels': y_train})
        val_dataset = Dataset.from_dict({
            'input_ids': val_encodings['input_ids'],
            'attention_mask': val_encodings['attention_mask'],
            'labels': y_val})
        test_dataset = Dataset.from_dict({
            'input_ids': test_encodings['input_ids'],
            'attention_mask': test_encodings['attention_mask'],
            'labels': y_test})
        return train_dataset, val_dataset, test_dataset

    def setup_model(self, freeze_base=False, custom_head=None):
        """Setup model for fine-tuning"""
        self.model = AutoModelForSequenceClassification.from_pretrained(
            self.model_name, num_labels=self.num_labels)
        if freeze_base:
            # Freeze base model parameters
            for param in self.model.base_model.parameters():
                param.requires_grad = False
        if custom_head:
            # Replace classification head
            self.model.classifier = custom_head

    def fine_tune(self, train_dataset, val_dataset, learning_rate=2e-5, num_epochs=3):
        """Fine-tune model"""
        training_args = TrainingArguments(
            output_dir='./results',
            num_train_epochs=num_epochs,
            per_device_train_batch_size=16,
            per_device_eval_batch_size=16,
            learning_rate=learning_rate,
            weight_decay=0.01,
            logging_dir='./logs',
            evaluation_strategy='epoch',
            save_strategy='epoch',
            load_best_model_at_end=True,
            metric_for_best_model='accuracy',
            greater_is_better=True)
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            compute_metrics=compute_metrics)  # supplies the 'accuracy' metric
        trainer.train()
        return trainer

    def evaluate(self, test_dataset, trainer):
        """Evaluate on test set"""
        results = trainer.evaluate(test_dataset)
        return results

# Example workflow (toy data; stratified splitting needs several examples per class)
workflow = FineTuningWorkflow(model_name='bert-base-uncased', num_labels=2)
texts = [f"great movie number {i}" for i in range(10)] + [f"terrible film number {i}" for i in range(10)]
labels = [1] * 10 + [0] * 10
train_ds, val_ds, test_ds = workflow.prepare_data(texts, labels)
workflow.setup_model(freeze_base=False)
trainer = workflow.fine_tune(train_ds, val_ds, learning_rate=2e-5, num_epochs=3)
results = workflow.evaluate(test_ds, trainer)
print(f"Test results: {results}")
```
Fine-tuning Best Practices
Use appropriate learning rates: pre-trained layers need small rates (1e-5 to 5e-5), while new layers can use larger rates (1e-4 to 1e-3). Use a learning rate schedule, and start with warmup.
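The warmup-then-decay schedule can be written out explicitly. This mirrors the linear warmup and linear decay that `transformers.get_linear_schedule_with_warmup` computes; the peak rate and step counts below are illustrative:

```python
# Linear warmup then linear decay: the learning rate ramps from 0 to its peak
# over warmup_steps, then decays linearly back to 0 at total_steps.

def lr_at_step(step, peak_lr=2e-5, warmup_steps=100, total_steps=1000):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps              # warmup ramp
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)  # linear decay

print(lr_at_step(0))     # 0.0 (start of warmup)
print(lr_at_step(100))   # peak rate, end of warmup
print(lr_at_step(1000))  # 0.0 (end of training)
```

Warmup protects the pre-trained weights early in training, when the new head's gradients are large and noisy.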
Monitor training carefully. Track training and validation loss. Watch for overfitting. Use early stopping. Save best checkpoints. Evaluate on test set only at end.
Handle class imbalance. Use weighted loss functions. Oversample minority classes. Use F1 score instead of accuracy. Adjust decision thresholds.
```python
# Fine-tuning Best Practices
from transformers import Trainer, TrainingArguments
from torch.nn import CrossEntropyLoss
import torch
import numpy as np

class BestPracticeFineTuning:
    def __init__(self):
        self.class_weights = None

    def compute_class_weights(self, labels):
        """Compute class weights for imbalanced data"""
        from sklearn.utils.class_weight import compute_class_weight
        unique_labels = np.unique(labels)
        weights = compute_class_weight('balanced', classes=unique_labels, y=labels)
        self.class_weights = torch.tensor(weights, dtype=torch.float)
        return self.class_weights

    def create_weighted_loss(self):
        """Create weighted loss function.
        To use it with Trainer, subclass Trainer and call it from compute_loss."""
        def weighted_loss(predictions, labels):
            loss_fct = CrossEntropyLoss(weight=self.class_weights)
            return loss_fct(predictions, labels)
        return weighted_loss

    def setup_training_args(self, learning_rate=2e-5, warmup_steps=100):
        """Setup training arguments with best practices"""
        return TrainingArguments(
            output_dir='./results',
            num_train_epochs=3,
            per_device_train_batch_size=16,
            per_device_eval_batch_size=16,
            learning_rate=learning_rate,
            warmup_steps=warmup_steps,  # Learning rate warmup
            weight_decay=0.01,
            logging_dir='./logs',
            logging_steps=10,
            evaluation_strategy='steps',
            eval_steps=100,
            save_strategy='steps',
            save_steps=100,
            load_best_model_at_end=True,
            metric_for_best_model='f1',  # Use F1 for imbalanced data (needs a compute_metrics that reports 'f1')
            greater_is_better=True,
            save_total_limit=3,  # Keep only the 3 most recent checkpoints
            fp16=True,  # Mixed precision for speed; requires a CUDA GPU
        )

    def early_stopping_callback(self, patience=3):
        """Early stopping to prevent overfitting"""
        from transformers import EarlyStoppingCallback
        return EarlyStoppingCallback(early_stopping_patience=patience)

# Example
best_practice = BestPracticeFineTuning()
labels = [0, 0, 0, 1, 1]  # Imbalanced
weights = best_practice.compute_class_weights(labels)
print(f"Class weights: {weights}")
training_args = best_practice.setup_training_args(learning_rate=2e-5, warmup_steps=50)
print("Training arguments configured with best practices")
```
Summary
Fine-tuning adapts pre-trained models to specific tasks. Transfer learning makes this adaptation efficient; dataset preparation supplies the task data; sound training procedures and ongoing evaluation keep fine-tuning on track; and hyperparameter tuning optimizes the settings. Together, these steps deliver strong task-specific performance.