pre-training

an archive of posts with this tag

Feb 18, 2025	DeepSeek v3
Jun 04, 2024	Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
May 07, 2024	How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel