Unless multimodal GPT-4 can actually infer from images? I think it’s just a vision model that converts the image to text (which makes it seem fancy).