The Chinchilla scaling laws give you a minimum for the number of tokens you should be using for a given size: if you can't meet what they suggest for that size, you should shrink the size, as, otherwise, the capacity of the model is going to waste.
I do agree that it is a datapoint, but GP's point is that this model was undertrained, so it's hard to draw the same conclusions from it that we would from other research.
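For a rough sense of what "undertrained" means here, a minimal sketch assuming the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter. That ratio is an approximation, not an exact constant, and the function names are made up for illustration:

```python
# Rule-of-thumb ratio often quoted from the Chinchilla results.
# This is an approximation and an assumption of this sketch.
TOKENS_PER_PARAM = 20.0


def min_tokens_for(params: float) -> float:
    """Rough minimum number of training tokens for a model with `params` parameters."""
    return TOKENS_PER_PARAM * params


def max_params_for(tokens: float) -> float:
    """Largest model size that a budget of `tokens` can feed under the same rule."""
    return tokens / TOKENS_PER_PARAM


def is_undertrained(params: float, tokens: float) -> bool:
    """True if the token budget falls short of the rule-of-thumb minimum."""
    return tokens < min_tokens_for(params)


if __name__ == "__main__":
    params, tokens = 7e9, 1e11  # hypothetical 7B model trained on 100B tokens
    print(is_undertrained(params, tokens))  # budget is below 20 tokens per parameter
    print(f"{max_params_for(tokens):.2e}")  # size this budget would support
```

Under this rule, a model that fails the check should be shrunk to roughly `max_params_for(tokens)` rather than trained at its original size.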
The irony of a post about a port primarily written by Claude having been primarily written by Claude on a website primarily designed by Claude. Come on.
Claude (and Codex) designed the site, mostly because I'm not a UI coder; if I'd designed the site nobody'd want to read it but me, simply thanks to the UX.
And I have a full-time job and more; I draft with an LLM's assistance and revise with another LLM (and other humans where possible) because I'm just arrogant enough to think that what I think might be useful to others. If it's not useful to you, I get it. Such is life.
Why can't you just publish the prompt instead? Do you not see how LLMs subtly alter your original message and erase your voice? They fill gaps that didn't exist, they create syllogisms that make no sense, and the voice is now so ridiculously AIdiosyncratic that it makes my eyes boil!
If you have a message that takes 100 words to say, do not use an LLM to add 400 words to it; this isn't a school assignment! Stretching a spaghetti does not yield more spaghetti, it just makes a mess!
Where is the value the LLM adds? Grammar? Vocabulary? The price you pay is you sound like everyone else and your original message is lost in the noise.
As far as publishing "the prompt" - there's no "the prompt." The draft was put together and expanded over a set of interactions with an LLM and other people over the space of about six hours. "The prompt" would have been about twelve pages long and unreadable. Funny as heck, but unreadable.
(If you're really interested, you can check the logs in the site and find the actual interaction that started the article out. It was a comment from someone else, and it got me thinking.)
Heh. Funny thing: I've been writing online and professionally for literal decades, since around 2002 or so, and the LLMs tend to change my actual writing voice relatively little and usually in positive ways, since they say I meander too much.
Your process loses your unique voice. The content was OK, but too verbose, and needed data on other Rust ports of similar scope.
The issue is that the quippy titles, “something - aside - continue” phrasing, and other constructions feel like they are, or actually are, wholly LLM-written. I find a high correlation between this and low-density fluff. The author did not have 10 paragraphs of things to say, but used an LLM to inflate a short outline to that. We would all have been better off with a tighter document - either human-written or better prompted.
It's not about that. As a reader, I want to read you as a human, with all of your colors and nuances.
These days due to usage of LLMs I developed (unknowingly) an LLM detector when reading these. I actually get distracted.
So please, I believe you do have something to tell to the world, but please take it slower. No need to rush. I'd rather have something to read uniquely made by you.
I agree with the others - I'm sure that you've provided your own input, but Claude's writing and design style is so overwhelmingly dominant that those who have spent time with it can immediately recognise it, and it makes it hard to take at face value that you were the primary author, even if you were.
For your workflow, I'd suggest drafting with an LLM to help you find the right balance of content, and then throwing all of that out and writing it yourself. Otherwise, it won't sound like you.
To be fair, who cares about ai slop websites? To be honest, they're often better than the average webdev garbage. Language runtimes are held to a much, much, higher standard.
Hot-reloading. You can edit your logic without rebuilding and restarting the host application; this cuts your iteration time from minutes to seconds, especially if the application is in a state that would need to be recreated.
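To make the idea concrete, here is a minimal sketch of hot-reloading in Python using the standard `importlib.reload`. The module name `hot_logic_demo` and the `step` function are hypothetical, and a real host would watch the file for changes rather than rewrite it itself; the point is only that the host's `state` survives while the logic is swapped:

```python
import importlib
import pathlib
import sys
import tempfile

MODULE_NAME = "hot_logic_demo"  # hypothetical module name, for illustration only


def demo_hot_reload():
    """Run one step with the old logic, edit the file, reload, run again."""
    old_flag = sys.dont_write_bytecode
    # Do not write .pyc files, so the reload compiles the edited source.
    sys.dont_write_bytecode = True
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / (MODULE_NAME + ".py")
        source.write_text("def step(state):\n    state['ticks'] += 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            logic = importlib.import_module(MODULE_NAME)

            state = {"ticks": 0}  # host state that must survive the reload
            logic.step(state)
            before = state["ticks"]

            # "Edit" the logic while the host keeps running.
            source.write_text("def step(state):\n    state['ticks'] += 10\n")
            importlib.reload(logic)

            logic.step(state)  # same state object, new logic
            after = state["ticks"]
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = old_flag
    return before, after


if __name__ == "__main__":
    print(demo_hot_reload())
```

The second call to `step` runs the edited code against the same `state` dictionary, which is the part that would otherwise have to be recreated after a restart.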
I would be very surprised to see a large Rust codebase being harder to maintain than a large Zig codebase. The former makes it much easier to maintain invariants at scale.
Well, you could go ask Richard Feldman, who I believe cited that reason to rewrite the nascent Roc language from being implemented in Rust to Zig, or anyone else who is moving from Rust to anything else. I've seen multiple people at this point complain about the scaling issue with Rust; the larger the codebase, the more you end up fighting the compiler before anything will actually build.
Note that it doesn't matter if the compiler is correct about its claims; if the language doesn't actively discourage patterns that produce this outcome at scale, then the language does not scale, end of story.
The trend is basically either linear or exponential: as more LOC of Rust are added, the greater the percent of total time you spend fighting the compiler to get a successful build, especially in a team context (which is exactly what gets you to >1M LOC). Solo devs can contain the whole design in their minds and may not run into this issue as much; the problem specifically occurs on teams where the mental model MUST be fractured by necessity, and this results in "distributed knowledge of magic" that ends up constantly breaking.
Perhaps this explains WHY there aren't that many Rust projects done by more than 1 developer that approach that many LOC.
Unless you enforce those macros somehow in a team setting, someone's going to forget to use them, and then you're still stuck with the original problem.
By the time C++ and Java were as old as Rust is today, there were thousands of programs of over 1MLOC that had been maintained for at least five years. Rust is a rather old language, yet I doubt there are even hundreds of Rust programs over 1MLOC.
It wouldn't be data distillation: instead, it would be teacher-student distillation. The teacher model has stronger representations that the student can mimic, which would give it more capability over training on the data itself.
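As a rough illustration of what "mimic the teacher" can mean, here is a minimal sketch of one common form of teacher-student distillation: matching temperature-softened output distributions with a KL-divergence loss. That this is the variant meant above is an assumption; the comment could equally be about matching intermediate representations. The function names and the temperature value are illustrative:

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher_logits = [1.0, 2.0, 3.0]  # made-up numbers, not real model outputs
    print(distillation_loss([1.0, 2.0, 3.0], teacher_logits))  # student agrees with teacher
    print(distillation_loss([3.0, 2.0, 1.0], teacher_logits))  # student disagrees
```

The student is trained against the teacher's full probability distribution rather than a single hard label per example, which is where the extra signal over training on the raw data would come from.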
“Grown” is a highly apt metaphor, IMO. It quite succinctly captures some of the most fundamental differences between building Claude and building an Ikea desk, for example.
> AI is grown, not built, and like with anything you grow, you'll never be able to predict exactly how it will turn out.
Remember when the frontier labs found out that curated high-quality training was critical to making better models?
Basically, just like high-quality and more education tends to make better humans, on average, I think we can expect quality education to turn out better ai, on average, and with better repeatability than with humans because of better control over the initial conditions and environment.
> Basically, just like high-quality and more education tends to make better humans, on average
Much like these models seem to be plateauing, I think there is a cap to the whole “more education makes better humans” idea, and it can’t be more apparent than in the US Congress and the boatload of C-suites not actually being very good humans.
Seems to me the venn diagram of "congress and c-suites" vs "educated people" would have one circle wholly inside the other.
I know people without a college education that would give you the shirt off their back, and educated people that rewrite wills while their parents are on their deathbed.
What we call education today is a problem, and one need look no further than the massive amount of debt we saddle on kids. For what? So they can pay for the privilege of being told what books to read, what topics to write about, and a rubber stamp? I didn't learn a _thing_ in college that I haven't learned better either at $dayjob, or from reading.
Most of my math profs didn't speak English well, and none of the TAs did. Any math I've since forgotten from college was self-taught. Calc I/II/III, diff eq, linear, stat.
College/education lost the plot. The sooner we admit it, the sooner we can fix it.
> Sadly, education does not correct psychopathic traits, which might be overrepresented in c-suites, and selected for in politicians.
>> Seems to me the venn diagram of "congress and c-suites" vs "educated people" would have one circle wholly inside the other.
Both things can be true.
> look no further than the massive amount of debt we saddle on kids.
See politicians and c-suites populated by psychopaths for the origins of this problem.
> I didn't learn a _thing_ in college that I haven't learned better either at $dayjob, or from reading.
Putting it a bit bluntly, like any other activity, one gets out of it what one puts into it. I had a very different experience from yours, accents and language skills notwithstanding. But there is so much variation in a domain so broad in our country that is so big, it doesn't necessarily invalidate your experience.
> College/education lost the plot. The sooner we admit it, the sooner we can fix it.
There is a long list/tradition of higher education through thousands of years of human history, with Harvard/MIT/Oxford being the pre-eminent ones today. [1][2]
What alternative do you propose? For humans, and AI?
Except in this case we actually understand and know how these models work. They aren't some unknown construct of the universe. They are human-made with particular goals in mind.
There is no mysticism behind the curtains, just computer science + math.
We do not understand and know how these models work. We know what their architectures are and how to create them, but we cannot explain their behaviours at a fundamental level. There is no definitive way for us to answer the question of "how did it produce response X for query Y?" - we're only grazing the surface with mechanistic interpretability.
I would love for this to be more public knowledge. I think the general public (and myself for a long time) believes the AI people know how this stuff works end to end, and so it must be trustworthy. But if we told the public "Look, we know if you put this thing in one end, you'll get something that looks similar to this out the other, but we don't really know what happens in between" I think we'd be able to have a more honest discussion about the relationship between AI, productivity and ongoing employment.
That’s not a refutation because this problem is not a logical problem, it is a scale problem.
We can’t explain it because we distilled so many inputs into matrices and transformed them over and over again. If we had all the time and computing power in the universe to do so, we could trace through it bit by bit and eventually answer that question.
It is correct to say that it is just science and math, the same way we can say that gravity is just science and math even if we have only recently begun to understand how it truly functions.
If you had some time and computing power (not even all that much, in the large scale of things), you could simulate perfectly how a human grows from an embryo to an adult, or how an entire human brain processes some incoming signal, and yet this wouldn't give you the understanding to design a human or human brain from scratch.
You call this a "scale problem" as if there's some scalable way such as an algorithm to resolve arbitrary scientific questions and we simply haven't done it, but of course no such algorithm exists, which is why there's plenty of science that's still not settled.
It's a refutation that we know how they work now. In the limit, though, yes, we are likely to be able to trace the process: it is possible, though, that understanding remains inaccessible because the trace is beyond comprehension.
If you can distil the model's reasoning for a decision into a billion yes/no questions, each covering largely-independent areas, can you really say you understand what its overall reasoning was?
Isn't this fundamentally because it's all probabilities and weights? It would be like asking how did a pair of dice produce the response 4:3 on the last roll?
You could say something similar about biology—just physics behind the curtains, and we understand a lot of the basics. The difficulty comes from complexity, not mysticism.
To be clear I don't think that LLMs are sentient, but the appeal in studying them is similar to biology in that you get to dissect a highly complex system with comparatively crude tools.
It took significant research effort just to understand how these models learn to multiply two numbers. The fact that we know how they operate doesn't mean we understand them.
His argument is not that the existing global poor are going to be automated by AI, but that a great many people are going to join the global poor as their current livelihoods are automated.
A statistically average representative of the "Global Poor" -- e.g. the farmer working a smallholding in India or the DRC -- is unlikely to have his day-to-day activities affected by AI on any foreseeable time horizon, nor is his wealth likely to meaningfully increase or decrease.
The speech should have referenced the poor in industrialized nations, who are very likely to be affected, though I doubt they'll join the ranks of the global poor in most circumstances.