Batch vs. Layer Normalization in RNNs | NanoGPT