Step Law · 3D loss-landscape surface (LR vs. batch-size slices)
@paper Predictable Scale: Part I, Step Law - Optimal Hyperparameter Scaling Law in Large Language Model Pre-training · arXiv · 2025
arXiv:2503.04715 · #3d-surface #loss-landscape #twin-panel