Engineering
^ "MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens. We demonstrate generations as long as 84k tokens on a single node of 8 A100-80GB GPUs"
Infra