How to Fine-Tune an LLM Using LoRA
Introduction
Pre-trained Language Models (PLMs) are trained on a large corpus of data collected up to a specific point in time. They perform very well on general-purpose tasks like reasoning and language generation. However, they often do not give acceptable results when applied to a particular business domain or dataset. For example, LLMs like Mistral-7B or DeepSeek-R1, when applied to an Indian legal dataset, may hallucinate and produce undesirable outcomes. Such a legal dataset could be a corpus of billions of tokens of court judgements, agreements, PILs and laws.
PLMs that are trained on general-purpose data are unable to capture the finer nuances of very specific legal data, as in our case.
What do we do in such scenarios? How do we solve this problem?
One option is to train the PLM again from the ground up on the new dataset. However, training a model from scratch is a resource- and time-intensive process, so it is not a good option.
In the world of AI, there are two common ways to overcome this problem:
- Using Retrieval Augmented Generation (RAG)
- Using Finetuning
In RAG, we provide an external corpus of data (in our case, the legal data) to a pre-trained model so it can answer the user's query, as sketched below.
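To make the RAG idea concrete, here is a minimal, self-contained sketch. The sample documents and the word-overlap scoring are hypothetical simplifications; a real system would use an embedding model and a vector store.

# A toy RAG loop: retrieve the most relevant document, then build an augmented prompt.
# The corpus and the overlap-based retriever below are illustrative placeholders.
corpus = [
    "Section 302 of the Indian Penal Code deals with punishment for murder.",
    "A Public Interest Litigation (PIL) can be filed by any citizen in the public interest.",
    "Ahimsa (non-violence) is the first of the yamas in Yoga philosophy.",
]

def retrieve(query, documents, top_k=1):
    """Score each document by word overlap with the query and return the best ones."""
    query_words = set(query.lower().split())
    scored = sorted(documents, key=lambda d: len(query_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

query = "What is ahimsa in Yoga philosophy?"
context = "\n".join(retrieve(query, corpus))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(augmented_prompt)  # this augmented prompt would then be sent to the pre-trained model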
In fine-tuning, we continue training the model on the new dataset so that its weights adapt to the domain, tuning hyper-parameters such as the number of epochs, the learning rate and the batch size so as to increase the model's accuracy and lower its loss.
What is LoRA?
LoRA stands for Low-Rank Adaptation, and the PyTorch documentation defines it as follows:
“LoRA is an adapter-based method for parameter-efficient finetuning that adds trainable low-rank decomposition matrices to different layers of a neural network, then freezes the network’s remaining parameters. LoRA is most commonly applied to transformer models, in which case it is common to add the low-rank matrices to some of the linear projections in each transformer layer’s self-attention”
LoRA is a technique used in fine-tuning large AI models that makes the process much more efficient.
In simple terms:
Instead of adjusting all the parameters of a large model (which could be billions of numbers), LoRA adds small, trainable "adapter" matrices that modify how the model behaves for a specific task, while the original weights stay frozen.
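Here is a minimal sketch of the idea for a single linear layer (the shapes and variable names are illustrative, not taken from any particular library): the frozen weight W is left untouched, and a low-rank update B @ A of rank r is learned instead, scaled by alpha / r.

import torch

d_out, d_in, r, alpha = 512, 512, 16, 32

W = torch.randn(d_out, d_in)      # frozen pre-trained weight (never updated)
A = torch.randn(r, d_in) * 0.01   # trainable low-rank factor
B = torch.zeros(d_out, r)         # trainable low-rank factor, initialised to zero

x = torch.randn(d_in)

# Original layer output plus the scaled low-rank correction.
y = W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: 2 * r * d_in parameters instead of d_out * d_in.
print(W.numel(), A.numel() + B.numel())   # 262144 vs 16384

With r = 16 on a 512 x 512 layer, the trainable parameters shrink from roughly 262k to about 16k, which is why LoRA fine-tuning fits comfortably on a single consumer GPU.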
Finetuning
In this post, we will fine-tune the deepseek-r1 model (DeepSeek-R1-Distill-Qwen-1.5B) on an Ancient Indian Wisdom dataset hosted on Hugging Face.
This dataset contains 616 instructions, each mapped to an expected output. Our objective is to fine-tune the deepseek-r1 model so that it gives acceptable answers to questions from this dataset.
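As a quick sanity check, you can peek at the dataset before training. This is a small sketch; the instruction and output field names match what we use later in the fine-tuning code, and the exact record printed depends on the dataset itself.

from datasets import load_dataset

dataset = load_dataset("Abhaykoul/Ancient-Indian-Wisdom")
print(dataset)                   # splits and row counts
sample = dataset["train"][0]
print(sample["instruction"])     # the question / instruction
print(sample["output"])          # the expected answer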
We approach this in 3 major parts as given below.
Part A: Use the off-the-shelf vanilla deepseek-r1 model to answer our prompt (without any fine-tuning)
Part B: Fine-tune the model on our dataset & save it for Part C
Part C: Use the fine-tuned model from Part B to answer our same prompt to check the results
Prompt and Process
Our prompt for this will be -
Prompt = "In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?"
Part A: Check the prompt answer (without fine-tuning) on deepseek-r1
We use Google Colab to write our PyTorch code, as below:
!pip install transformers datasets torch trl peft bitsandbytes
# Load required libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
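# Optional (assumes a CUDA GPU is available in your Colab session): the model above
# loads on CPU by default; you could move it to the GPU for faster generation, e.g.
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   model = model.to(device)
# (the encoded inputs in generate_text below would then also need .to(device))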
def generate_text(prompt, max_length=500, temperature=0.1):
    """
    Generate text using deepseek-r1.
    Args:
        prompt (str): Input text to generate from
        max_length (int): Maximum length of generated text
        temperature (float): Controls randomness in generation (0.0-1.0)
    Returns:
        str: Generated text
    """
    # Encode the input text
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            inputs.input_ids,
            max_length=max_length,
            temperature=temperature,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            num_return_sequences=1
        )
    # Decode and return the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text
# Example usage
if __name__ == "__main__":
    # Example prompts to test the model
    prompts = [
        "In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?"
    ]
    print("Generating text from different prompts:\n")
    for prompt in prompts:
        print(f"Prompt: {prompt}")
        generated = generate_text(prompt)
        print(f"Generated text: {generated}\n")
Here is the output -
As you can see from the above output, the model does not give a desirable answer.
Part B: Fine-tune the deepseek-r1 model
We add the following fine-tuning code in the next cell of Colab.
from datasets import load_dataset
from transformers import (
AutoTokenizer,
AutoModelForCausalLM,
TrainingArguments,
)
from trl import SFTTrainer
import torch
from peft import LoraConfig, get_peft_model
# Step 1: Load the dataset
dataset = load_dataset("Abhaykoul/Ancient-Indian-Wisdom")
# Step 2: Format the dataset into instruction-response pairs
def format_dataset(examples):
    """Format the dataset into instruction-response pairs."""
    texts = []
    for instruction, response in zip(examples["instruction"], examples["output"]):
        # Combine instruction and response into a single text
        formatted_text = f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
        texts.append(formatted_text)
    return {"text": texts}
# Apply formatting
dataset = dataset.map(format_dataset, batched=True, remove_columns=dataset["train"].column_names)
# Step 3: Load model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
# Step 4: Configure LoRA
peft_config = LoraConfig(
    r=16,                    # Rank of the low-rank matrices
    lora_alpha=32,           # Scaling factor
    lora_dropout=0.1,        # Dropout for LoRA layers
    bias="none",             # No bias parameters are trained
    task_type="CAUSAL_LM",   # Task type
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]  # Attention projections to adapt
)
model = get_peft_model(model, peft_config)
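# Sanity check: with LoRA applied, only a small fraction of the weights is trainable.
# peft's print_trainable_parameters() reports trainable vs. total parameter counts.
model.print_trainable_parameters()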
# Step 5: Define training arguments
training_args = TrainingArguments(
    output_dir="./results",              # Directory to save results
    num_train_epochs=20,                 # Number of training epochs
    per_device_train_batch_size=4,       # Batch size per device
    per_device_eval_batch_size=4,        # Evaluation batch size
    gradient_accumulation_steps=4,       # Gradient accumulation steps
    gradient_checkpointing=False,        # Disable gradient checkpointing for debugging
    optim="adamw_torch",                 # Optimizer
    learning_rate=1e-4,                  # Learning rate
    warmup_ratio=0.1,                    # Warmup ratio
    fp16=True,                           # Use mixed precision (FP16)
    logging_steps=10,                    # Log every 10 steps
    save_strategy="steps",               # Save model at specific steps
    save_steps=100,                      # Save every 100 steps
    eval_strategy="steps",               # Evaluate at specific steps
    eval_steps=100,                      # Evaluate every 100 steps
    eval_accumulation_steps=1,           # Accumulate evaluation steps
    load_best_model_at_end=True,         # Load the best model at the end
    metric_for_best_model="eval_loss",   # Metric for best model
    greater_is_better=False,             # Lower eval_loss is better
    remove_unused_columns=True,          # Remove unused columns
    report_to="none",                    # Disable external logging
)
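# Note: with 616 training examples, a per-device batch size of 4 and 4 gradient
# accumulation steps on a single GPU, one epoch is roughly 616 / (4 * 4) ≈ 38
# optimizer steps, so save_steps=100 and eval_steps=100 fire about once every
# 2-3 epochs.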
# Step 6: Initialize the trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["train"].select(range(120)),  # Small evaluation set (reused from train)
    # Depending on your trl version, you may also need to pass the tokenizer
    # and/or the name of the text column (here "text") explicitly.
)
# Step 7: Train the model
trainer.train()
When the training process completes, we can save the model with the following code. Please note that the training process took about 30 minutes on the T4 GPU that Colab offers in the free account.
model.save_pretrained("fine-tuned-deepseek-r1-1.5b")
tokenizer.save_pretrained("fine-tuned-deepseek-r1-1.5b")
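Note that calling save_pretrained on a PEFT model stores only the small LoRA adapter weights alongside the adapter config. If you prefer a single standalone checkpoint, one option (a sketch using peft's merge_and_unload; the output directory name here is arbitrary) is to merge the adapters back into the base model before saving:

# Merge the LoRA adapters into the base weights and save a standalone model.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("fine-tuned-deepseek-r1-1.5b-merged")
tokenizer.save_pretrained("fine-tuned-deepseek-r1-1.5b-merged")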
Part C: Check the prompt answer with the fine-tuned deepseek-r1 model
We repeat Part A, but now with the fine-tuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_path = "fine-tuned-deepseek-r1-1.5b"
# Load model with optimizations
model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.float16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
def generate_text(prompt, max_new_tokens=1000):
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.5,
            top_k=50,
            top_p=0.9,
            use_cache=True
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)
# Test
prompt = "In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?"
output = generate_text(prompt)
print(output)
Here is the output of the fine-tuned model with the number of epochs set to 20:
As can be seen, the model's answer to our prompt is still quite poor. In fact, it produces wildly creative, off-topic text.
How can we improve this?
We have a few options: during the training process we could increase the number of epochs, change the learning rate or the batch size, and try again.
We changed the number of epochs to 200 and checked the results. Please note that with 200 epochs, training can take many hours on a T4 GPU, and the Colab session could time out for several reasons. Ideally, you may have to move up to a better GPU like an A100.
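The only change needed for this second run is in the training arguments; everything else in Part B stays the same. A sketch of the relevant lines:

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=200,            # increased from 20
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    # ... remaining arguments unchanged from Part B
)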
The result was much better, as can be seen in the screenshot below.
Conclusion
This post demonstrated how fine-tuning an LLM with LoRA can improve the model's answers on a custom dataset.
If you liked my post, consider giving a few claps and following me on LinkedIn.
