Gradient Clipping in RNN Training: Why It Matters | NanoGPT