Step Law · Hyperparameter contour plot of LR × batch size with optima
@paper Predictable Scale: Part I, Step Law — Optimal Hyperparameter Scaling Law in Large Language Model Pre-training · arXiv · 2025
arXiv:2503.04715 · #contour #log-log-axes #optimum-overlay
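A plot like this draws loss contours over a learning-rate × batch-size sweep on log axes and marks the lowest-loss point. The sketch below is a minimal illustration of locating that optimum marker on a log-spaced grid. It does not use the paper's data or fitted law. The helper names (`log_grid`, `toy_loss`, `grid_optimum`) and every constant, including the toy bowl centered at lr = 3e-3 and batch size 512, are hypothetical.

```python
import math


def log_grid(lo, hi, n):
    """Return n values spaced evenly in log10 between lo and hi (inclusive)."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + (b - a) * i / (n - 1)) for i in range(n)]


def toy_loss(lr, bs, lr_star=3e-3, bs_star=512.0):
    """Hypothetical loss: a convex bowl in log-log space, minimal at (lr_star, bs_star).

    Stand-in for measured final losses; NOT the paper's data or fitted law.
    """
    dx = math.log10(lr) - math.log10(lr_star)   # distance along the log-LR axis
    dy = math.log2(bs) - math.log2(bs_star)     # distance along the log-batch axis
    # Positive-definite quadratic with a small LR/batch interaction term.
    return 2.0 + 0.5 * dx * dx + 0.05 * dy * dy + 0.1 * dx * dy


def grid_optimum(lrs, bss, loss_fn):
    """Evaluate loss_fn on every (lr, bs) pair and return (lr, bs, loss) with the lowest loss."""
    best = None
    for lr in lrs:
        for bs in bss:
            val = loss_fn(lr, bs)
            if best is None or val < best[2]:
                best = (lr, bs, val)
    return best


if __name__ == "__main__":
    lrs = log_grid(3e-4, 3e-2, 9)            # 9 learning rates, 0.25 decades apart
    bss = [2 ** k for k in range(6, 13)]     # batch sizes 64 .. 4096
    lr, bs, val = grid_optimum(lrs, bss, toy_loss)
    print(f"optimum marker at lr={lr:.2e}, batch size={bs}, loss={val:.4f}")
```

With this grid, the toy bowl's center falls exactly on a grid point, so the marker lands at lr ≈ 3e-3 and batch size 512. With real sweeps, the grid argmin only approximates the true optimum, to within one grid step on each log axis.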