Vanishing Gradients in RNNs: Causes and Fixes | NanoGPT