Multimodal Data Pipelines: Scalability Best Practices | NanoGPT