ritwikraha / Pretraining-LLM.md
Last active December 13, 2025 21:43
Pretraining of Large Language Models

Pretraining


A Map for Studying Pretraining in LLMs

  • Data Collection
    • General Text Data
    • Specialized Data
  • Data Preprocessing
    • Quality Filtering
    • Deduplication