I trained my own LLM and published it on HuggingFace

Published: May 5, 2026 at 05:11 AM EDT
2 min read
Source: Dev.to

Overview

This post documents the process of fine‑tuning a language model on medical data and publishing it to Hugging Face.

Model Choice

  • Base model: facebook/opt-1.3b – 1.3 billion parameters, open‑source, no usage restrictions.

Technique: LoRA (Low‑Rank Adaptation)

LoRA freezes the base model and injects small trainable low-rank adapter matrices into selected weight layers, reducing the number of trainable parameters from 1.3 B to roughly 4 M (a reduction of well over 100×).
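As a minimal sketch of the idea (plain NumPy, with illustrative dimensions; this is not the actual PEFT implementation): the frozen weight W is combined with a trainable update B @ A whose rank r is far smaller than the weight dimensions, so only A and B are learned.

import numpy as np

d, r = 2048, 8                     # opt-1.3b hidden size, LoRA rank (r=8 as in the config below)
alpha = 16                         # lora_alpha scaling factor
W = np.random.randn(d, d)          # frozen pretrained weight (never updated)
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # trainable low-rank factor, initialized to zero

# Effective weight used in the forward pass: base weight plus scaled low-rank update
W_adapted = W + (alpha / r) * (B @ A)

# Trainable parameters per adapted layer: 2*d*r instead of d*d
print(2 * d * r, "trainable vs", d * d, "frozen")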

Training Environment

  • Hardware: Free Google Colab Tesla T4 GPU (15 GB VRAM), 30 hours/week.
  • Constraints: No local GPU; CPU training would take days (a quick check of the Colab runtime is sketched below).
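Before launching a multi-hour fine-tune, it is worth confirming the runtime actually has the T4 attached; a minimal check (assuming PyTorch, which Colab ships with) might look like:

import torch

# Verify the Colab runtime has a GPU before training
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
else:
    print("No GPU attached -- switch the Colab runtime type to GPU")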

Key Code

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer, SFTConfig

# Load the frozen base model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

# Add LoRA adapters to the attention projections (only these ~4 M parameters are trained)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check trainable vs. frozen parameter counts

# Train on the prepared medical dataset (train_dataset is built earlier in the notebook)
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    args=SFTConfig(num_train_epochs=3, learning_rate=2e-4)
)
trainer.train()

Training Results

Training completed in 1.5 hours on the free T4 GPU. Loss progression:

  • Step 100: 1.163
  • Step 500: 0.994
  • Step 1000: 0.967
  • Step 1700: 0.944 ← training complete

Both training and validation loss decreased together, indicating genuine learning rather than memorization.
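One way to inspect this after training is to read the trainer's log history (the exact keys present depend on the logging and evaluation settings used):

# After trainer.train(), logged metrics are available on the trainer state
for entry in trainer.state.log_history:
    if "loss" in entry:
        print(f"step {entry['step']}: train loss {entry['loss']:.3f}")
    if "eval_loss" in entry:
        print(f"step {entry['step']}: eval loss {entry['eval_loss']:.3f}")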

Publishing to Hugging Face

# Requires an authenticated Hugging Face session (e.g. huggingface_hub.notebook_login() in Colab)
model.push_to_hub("Yakhilesh/medmind-opt-medical")
tokenizer.push_to_hub("Yakhilesh/medmind-opt-medical")

Because only the LoRA adapter weights are uploaded (≈ 12.6 MB, rather than the full 1.3 B-parameter model), the model is now publicly available at Yakhilesh/medmind-opt-medical, and anyone can download and use it.
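To run the published adapter, load the base model and attach the adapter with PEFT; a minimal sketch (the prompt and generation settings are illustrative) looks like:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model, then attach the published LoRA adapter from the Hub
base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
model = PeftModel.from_pretrained(base, "Yakhilesh/medmind-opt-medical")
tokenizer = AutoTokenizer.from_pretrained("Yakhilesh/medmind-opt-medical")

prompt = "What are the common symptoms of iron-deficiency anemia?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))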

Takeaways

  • Fine‑tuning is more dependent on data quality than on model size.
  • LoRA enables efficient adaptation of large models with minimal compute cost.
  • Even a short 1.5‑hour fine‑tune can capture meaningful medical patterns, as reflected by the loss curve.