Scaling Depth vs. Width: Cost Trade-Offs | NanoGPT