Research & Theory


Andrej's take: https://x.com/karpathy/status/1894923254864978091

> This is interesting as a first large diffusion-based LLM.
>
> Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different - it doesn't go left to right, but all at once. You start with noise and gradually denoise into a token stream.
>
> Most of the image / video generation AI tools actually work this way and use Diffusion, not Autoregression. It's only text (and sometimes audio!) that have resisted. So it's been a bit of a mystery to me and many others why, for some reason, text prefers Autoregression, but images/videos prefer Diffusion. This turns out to be a fairly deep rabbit hole that has to do with the distribution of information and noise and our own perception of them, in these domains. If you look close enough, a lot of interesting connections emerge between the two as well.
>
> All that to say that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!

