Nice. What scale does this realistically reach on a single machine? | Hacker News


hiroakiaizawa 29 days ago | on: Train Your Own LLM from Scratch

Nice. What scale does this realistically reach on a single machine?

lynx97 29 days ago

Model: 36 layers, 36 attention heads, 576-dim hidden size (36L/36H/576D), 144.2M params
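As a sanity check on the 144.2M figure, here is a rough back-of-the-envelope sketch. It assumes a standard GPT-style decoder block (Q/K/V/output projections plus a two-layer MLP with a 4x hidden ratio), which the comment does not actually confirm. The helper name `approx_transformer_params` is hypothetical, and biases, norms and embeddings are ignored.

```python
def approx_transformer_params(n_layers: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Rough non-embedding parameter count for a GPT-style decoder (assumed layout).

    Per block: attention Q, K, V and output projections = 4 * d^2,
    MLP up- and down-projections = 2 * mlp_ratio * d^2.
    Biases, LayerNorm weights and token/position embeddings are ignored.
    """
    attn = 4 * d_model * d_model
    mlp = 2 * mlp_ratio * d_model * d_model
    return n_layers * (attn + mlp)


# 36 layers, 576-dim model. The head count (36) does not change the total,
# because the heads split d_model between them (576 / 36 = 16 dims per head).
estimate = approx_transformer_params(n_layers=36, d_model=576)
print(f"{estimate:,} (~{estimate / 1e6:.1f}M)")  # prints "143,327,232 (~143.3M)"
```

If the model does follow this layout, the remaining ~0.9M parameters would have to come from embeddings, norms and biases, whose size depends on details (vocabulary size, weight tying) that the comment doesn't give.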

Runs on a Blackwell 6000 Max-Q, using 86 GB of VRAM. Training supposedly takes 3h40m.
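For context on the 86 GB figure, here is a hedged estimate of how much of it the model itself would need. It assumes AdamW (the optimizer isn't stated) at about 16 bytes per parameter, which holds for both plain fp32 training (weights, gradients and two Adam moments at 4 B each) and the common mixed-precision layout (2 B half-precision weights + 2 B gradients + 12 B fp32 master weights and moments). The function name `training_state_bytes` is hypothetical.

```python
N_PARAMS = 144_200_000  # parameter count reported in the comment
REPORTED_VRAM_GB = 86   # VRAM usage reported in the comment


def training_state_bytes(n_params: int, bytes_per_param: int = 16) -> int:
    """Memory for weights, gradients and AdamW moments (assumed 16 B/param).

    Activations, framework/allocator overhead and the CUDA context are NOT
    included; those depend on batch size and sequence length.
    """
    return n_params * bytes_per_param


state_gb = training_state_bytes(N_PARAMS) / 1e9
print(f"weights + grads + Adam state: ~{state_gb:.1f} GB")                 # ~2.3 GB
print(f"share of reported VRAM: ~{state_gb / REPORTED_VRAM_GB:.0%}")       # ~3%
print(f"left for activations etc.: ~{REPORTED_VRAM_GB - state_gb:.1f} GB")  # ~83.7 GB
```

Under these assumptions, only a few GB go to the model and optimizer state. That suggests most of the usage comes from activations and allocator overhead, which scale with batch size and context length (neither is given). The reported number may also be memory the framework reserved rather than memory actually in use.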
