How Vision-Language Models Generate Text | NanoGPT