Latency-Aware Partitioning for AI Workflows | NanoGPT