By the zero-shot hyperparameter work, do you mean https://arxiv.org/abs/2203.03466 “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”? I’ve been sceptical of NTK-based theory, but it seems I should update.