7B vs 70B parameters, I think. The small ones fit in the memory of consumer-grade cards. That's more or less all I know (still waiting for my new computer to arrive this week)
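For a rough sense of why the small ones fit: a back-of-the-envelope sketch, counting weights only (real usage adds KV cache and runtime overhead; the bytes-per-weight figures are the usual assumptions for each precision):

    # Rough VRAM needed just to hold the weights, in GiB.
    def vram_gb(params_billion, bytes_per_param):
        return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

    for params in (7, 70):
        for label, bpp in (("fp16", 2), ("int8", 1), ("int4", 0.5)):
            print(f"{params}B @ {label}: ~{vram_gb(params, bpp):.0f} GB")

    # 7B:  ~13 GB fp16, ~7 GB int8, ~3 GB int4 -> fits a single consumer GPU
    # 70B: ~130 GB fp16, ~33 GB int4 -> multiple GPUs or heavy quantization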


How many parameters did ChatGPT have in Dec 2022 when it first broke into mainstream news?


GPT-3 had 175B parameters, and the original ChatGPT was probably just a GPT-3 finetune (although they called it GPT-3.5, so it could have been different). However, it was severely undertrained. Llama-3.1-8B is better in most ways than the original ChatGPT, and a well-trained ~70B usually feels GPT-4-level. The latest Llama release, Llama-3.3-70B, goes toe-to-toe even with much larger models (albeit bad at coding, like all Llama models so far; that's not inherent to the size, since Qwen is good at it, so I'm hoping the Llama 4 series is trained on more coding tokens).


> However, it was severely undertrained

By modern standards. At the time, it was trained according to the neural scaling laws OpenAI believed to hold.


Sure, at the time Chinchilla-style scaling wasn't yet understood. Nonetheless it was severely undertrained, even if they didn't know it back then.
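Back-of-the-envelope, using the published GPT-3 numbers (~175B parameters, ~300B training tokens) and Chinchilla's rough 20-tokens-per-parameter rule of thumb:

    params = 175e9                 # GPT-3 parameter count
    tokens_seen = 300e9            # tokens GPT-3 was actually trained on (per the paper)
    tokens_optimal = 20 * params   # Chinchilla rule of thumb: ~20 tokens per parameter

    print(f"Chinchilla-optimal: ~{tokens_optimal / 1e12:.1f}T tokens")
    print(f"Actually trained on: ~{tokens_seen / 1e9:.0f}B tokens "
          f"(~{tokens_optimal / tokens_seen:.0f}x less than optimal)")

    # -> roughly 3.5T tokens optimal vs ~300B seen, i.e. about 12x short.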


I don't think that's ever been shared, but its predecessor, GPT-3 Da Vinci, was 175B.

One of the most exciting trends of the past year has been models getting dramatically smaller while maintaining similar levels of capability.



