The Karpathy constant is one of the best learning rates for Adam, a popular optimizer for deep neural networks. It is defined as α_k = 3e-4. The value is often written with the generic learning-rate symbol η, but the constant's own symbol is α_k.
What is the correct learning rate for Adam in this case?
Just use the Karpathy constant, dude.
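
Taking that advice literally, here is a rough sketch of what "using the Karpathy constant" looks like in code. It is a plain-Python, single-parameter version of the Adam update with the learning rate set to 3e-4. The names `KARPATHY_CONSTANT` and `adam_step`, the toy objective, and the other hyperparameters (the commonly used Adam defaults) are assumptions for illustration, not part of the definition.

```python
import math

# The value from the definition above; the name is made up for this sketch.
KARPATHY_CONSTANT = 3e-4


def adam_step(param, grad, m, v, t, lr=KARPATHY_CONSTANT,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (assumed for illustration): minimize f(x) = (x - 1)^2.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * (x - 1)                          # derivative of (x - 1)^2
    x, m, v = adam_step(x, grad, m, v, t)
print(x)  # moves from 0 toward the minimum at 1
```

With a real framework you would normally just pass the value to the optimizer, for example `torch.optim.Adam(model.parameters(), lr=3e-4)` in PyTorch, where `model` is whatever network you are training.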