
This model is exactly what you'd want for your resources. GPU for prompt processing, RAM for model weights and context length, and it being MoE makes it fairly zippy. Q4 is decent; Q5-6 is even better, assuming you can spare the resources. Going past Q6 runs into heavily diminishing returns.
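If you want to try that split yourself, here's a minimal sketch using llama-cpp-python (GPU-enabled build assumed); the model path, layer count, context size, and thread count are placeholders to tune for your own hardware:

  # Minimal sketch, assuming a GPU-enabled llama-cpp-python build.
  # The model path and numbers below are placeholders, not recommendations.
  from llama_cpp import Llama

  llm = Llama(
      model_path="models/my-moe-model-Q4_K_M.gguf",  # Q4-Q6 is the sweet spot discussed above
      n_gpu_layers=-1,   # offload as many layers as fit; prompt processing benefits the most
      n_ctx=32768,       # context lives in RAM/VRAM, so size it to what you can spare
      n_threads=16,      # CPU threads pick up whatever wasn't offloaded
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "Explain what an MoE model is in two sentences."}]
  )
  print(out["choices"][0]["message"]["content"])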

Anthropic might have the best product for coding, but good god the experience is awful. Hitting rate limits when you _know_ you shouldn't be anywhere near them, the jankiness of their client, the service being down semi-frequently. It feels like the whole infra is a house of cards that badly struggles 70% of the time.

I think my $20 OpenAI sub gets me more tokens than Claude's $100 plan. I can't wait until Google or OpenAI overtake them.


Because they update it every day and the team has not heard of something called stability. This is a direct result of "move fast and break too many things all at once."


I think it depends on what you use it for. Coding, where time is money? You probably want the Good Shit, but you also want decent open-weights models to keep prices sane rather than sama's $20k/month nonsense. Something like basic sentiment analysis? You can get good results out of a 30B MoE that runs at a good pace on a midrange laptop. Researching things online across many sources with decent results I'd expect to be doable locally by the end of 2026 if you have 128GB of RAM, although it'll take a while to finish.
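To make the sentiment-analysis case concrete, here's a sketch against a local OpenAI-compatible server (llama.cpp, Ollama, vLLM, etc.); the endpoint and model name are placeholders for whatever 30B-class model you're serving:

  # Minimal sketch: sentiment analysis against a locally served 30B-class model.
  # Assumes an OpenAI-compatible server on localhost; names are placeholders.
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

  def sentiment(text: str) -> str:
      resp = client.chat.completions.create(
          model="local-30b-moe",  # whatever your server calls the loaded model
          messages=[
              {"role": "system",
               "content": "Classify the sentiment of the user's text as exactly one word: positive, negative, or neutral."},
              {"role": "user", "content": text},
          ],
          temperature=0,
      )
      return resp.choices[0].message.content.strip().lower()

  print(sentiment("The checkout flow kept erroring out, but support fixed it within minutes."))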

What does it mean for U.S. AI firms if the new equilibrium is devs running open models on local hardware?

OpenAI isn’t cornering the market on DRAM for kicks…

If it sounds too good to be true…

There have been advances recently (in the last year) in scaling deep RL by a significant amount; their announcement is in line with the timeline you'd expect for running enough experiments to figure out how to leverage that in post-training.

Importantly, this isn't just throwing more data at the problem in an unstructured way. AFAIK companies are getting as many git histories as they can and doing something along the lines of: get an LLM to checkpoint pull requests, features, etc. and convert those into plausible input prompts, then run deep RL with passing the acceptance criteria / tests as the reward signal.
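A rough sketch of how I picture that pipeline (this is my reading, not any lab's actual setup; every name below is made up for illustration):

  # Sketch: mine merged PRs into (prompt, test command) tasks, then use
  # "do the PR's tests pass" as a binary reward for RL post-training.
  import subprocess
  from dataclasses import dataclass

  @dataclass
  class Task:
      prompt: str      # e.g. an LLM-reconstructed feature request from the PR description
      workdir: str     # checkout of the repo just before the PR was merged
      test_cmd: list   # acceptance tests the PR introduced or updated

  def reward(task: Task, candidate_patch: str) -> float:
      """Apply the model's patch and return 1.0 iff the PR's tests pass."""
      apply = subprocess.run(["git", "apply", "-"], input=candidate_patch,
                             text=True, cwd=task.workdir)
      if apply.returncode != 0:
          return 0.0
      tests = subprocess.run(task.test_cmd, cwd=task.workdir)
      return 1.0 if tests.returncode == 0 else 0.0

  def rl_epoch(policy, tasks, update_fn):
      # policy: prompt -> unified diff; update_fn: a PPO/GRPO-style gradient step.
      scored = []
      for task in tasks:
          patch = policy(task.prompt)
          scored.append((task, patch, reward(task, patch)))
      update_fn(scored)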


Should be possible with optimised models: just drop all the "generic" stuff and focus on coding performance.

There's no reason for a coding model to contain all of ao3 and wikipedia =)


Now I wonder how strong the correlation between coding performance and ao3 knowledge is in human programmers. Maybe we are on to something here /s

There is: It works (even if we can't explain why right now).

If we knew how to create a SOTA coding model by just putting coding stuff in there, that is how we would build SOTA coding models.


I think I like coding models that know a lot about the world. They can disambiguate my requirements and build better products.

I generally prefer a coding model that can google for the docs, but separate models for /plan and /build are also a thing.

> separate models for /plan and /build

I had not considered that, seems like a great solution for local models that may be more resource-constrained.


You can configure aider that way. You get three, in fact: an architect model, a code editor model, and a quick model for things like commit messages. Although I'm not sure if it's got doc searching capabilities.

That's what Meta thought initially too, training Code Llama and chat Llama separately, and then they realized they were idiots and that adding the other half of the data vastly improves both models. As long as it's quality data, more of it doesn't do harm.

Besides, programming is far from just knowing how to autocomplete syntax; you need a model that's proficient in the fields the automation is placed in, otherwise it'll be no help in actually automating anything.


But as far as I know, that was way before tool calling was a thing.

I'm more bullish about small- and medium-sized models + efficient tool calling than I am about LLMs too large to be run at home without $20k of hardware.

The model doesn't need to have the full knowledge of everything built into it when it has the toolset to fetch, cache and read any information available.
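A rough sketch of that fetch-and-read loop, using the OpenAI Python client pointed at a local OpenAI-compatible server (the fetch_docs tool, its schema, and the endpoint are my own placeholders, not anything standard):

  # Sketch: a small local model that looks things up instead of memorizing them.
  import json
  import urllib.request
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

  def fetch_docs(url: str) -> str:
      """Tiny 'tool': fetch a docs page so the model can read rather than recall."""
      with urllib.request.urlopen(url) as page:
          return page.read().decode("utf-8", errors="replace")[:8000]

  tools = [{
      "type": "function",
      "function": {
          "name": "fetch_docs",
          "description": "Fetch documentation from a URL",
          "parameters": {
              "type": "object",
              "properties": {"url": {"type": "string"}},
              "required": ["url"],
          },
      },
  }]

  messages = [{"role": "user", "content": "How do I stream responses with this library? Check its docs."}]
  resp = client.chat.completions.create(model="local-model", messages=messages, tools=tools)
  msg = resp.choices[0].message

  # If the model asks for the tool, run it and feed the result back for a second pass.
  if msg.tool_calls:
      call = msg.tool_calls[0]
      args = json.loads(call.function.arguments)
      messages.append(msg)
      messages.append({"role": "tool", "tool_call_id": call.id,
                       "content": fetch_docs(args["url"])})
      resp = client.chat.completions.create(model="local-model", messages=messages, tools=tools)

  print(resp.choices[0].message.content)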


But... but... I need my coding model to be able to write fanfiction in the comments...

It literally always is. HN thought DeepSeek and every version of Kimi would finally dethrone the bigger models from Anthropic, OpenAI, and Google. They're literally always wrong, and the average knowledge of LLMs here is shockingly low.

Nobody has been saying they'd be dethroned. We're saying they're often "good enough" for many use cases, and that they're doing a good job of stopping the Big Guys from creating a giant expensive moat around their businesses.

Chinese labs are acting as a disruptive force against Altman and co.'s attempts to create big-tech monopolies, and that's why some of us cheer for them.


I find it really surprising that you're fine with low-end models for coding - I went through a lot of open-weights models, local and "local", and I consistently found the results underwhelming. GLM-4.7 was the smallest model I found to be somewhat reliable, but that's a sizable 350B and stretches the definition of local-as-in-at-home.

You're replying to a bot, fyi :)

If it weren't for the single em-dash (really an en-dash, used as if it were an em-dash), how am I supposed to know that?

And at the end of the day, does it matter?


Some people reply for their own happiness, some reply to communicate with another person. The AI won't remember or care about the reply.

"Is they key unlock here"

Yeah, that hits different.

Huh? What prevents you from installing them "all at once"? The downside is obviously a long stretch of no sun, and for Europe winter means both low solar production and high energy demand due to heating, which the soon-to-be-cheap grid-scale batteries don't really fix. The logistics of PV don't seem difficult though - it seems by far the easiest of the power generation methods, even if synchronization can get a bit tricky in a large grid.

Because manufacturing capacity isn't there to do everything at once. You install, say, 50 GW per year, every year. In 30 years you need to replace the first 50 GW batch, and so on.

I get a slow-but-usable ~10 tok/s on a 2-bit-ish quant of Kimi 2.5 on a high-end gaming slash low-end workstation desktop (RTX 4090, 256 GB RAM, Ryzen 7950). Right now the price of RAM is silly, but when I built it, it was similar in price to a high-end MacBook - which is to say it isn't cheap, but it's available to just about everybody in western countries. The quality is of course worse than what the bleeding-edge labs offer, especially since heavy quants are particularly bad for coding, but it is good enough for many tasks: an intelligent duck that helps with planning, generating bog-standard boilerplate, google-less interactive search/stackoverflow ("I ran flamegraph and X is an issue, what are my options here?" etc).

My point is, I can get a somewhat-useful AI model running at slow-but-usable speed on a random desktop I've had lying around since 2024. Barring nuclear war, there's just no way that AI won't be at least _somewhat_ beneficial to the average dev. All the AI companies could vanish tomorrow and you'd still have a bunch of inference-as-a-service shops appearing in places where electricity is borderline free, like Straya when the sun is out.


Then you're missing my point.

Yes, you, a hobbyist, can make that work, and keep being useful for the foreseeable future. I don't doubt that.

But either a majority or large plurality of programmers work in some kind of large institution where they don't have full control over the tools they use. Some percentage of those will never even be allowed to use LLM coding tools, because they're not working in tech and their bosses are in the portion of the non-tech public that thinks "AI" is scary, rather than the portion that thinks it's magic. (Or, their bosses have actually done some research, and don't want to risk handing their internal code over to LLMs to train on—whether they're actually doing that now or not, the chances that they won't in future approach nil.)

And even those who might not be outright forbidden to use such tools for specific reasons like the above will never be able to get authorization to use them on their company workstations, because they're not approved tools, because they require a subscription the company won't pay for, because etc etc.

So the claim that coding with LLM assistance is clearly the future, and that it would be irresponsible not to teach current CS students how to code that way, is patently false. It is a possible future, but the volatility in the AI space right now is much, much too high to be able to predict just what the future will bring.


I never understand anyone's push to throw around AI slop coding everywhere. Do they think in the back of their heads that this means coding jobs are going to come back on-shore? Because AI is going to make up for the savings? No, what it means is tech bro CEOs are going to replace you even more and replace at least a portion of the off-shore folks that they're paying.

The promise of AI is a capitalist's dream, which is why it's being pushed so much: do more with less investment. But the reality of AI coding is significantly more nuanced, particularly so outside of the SRE/devops space. I highly doubt you could realistically use AI to code the majority of significant software products (like, say, an entire operating system). You might be able to use AI to add additional functionality you otherwise couldn't have, but that's not really what the capitalists desire.

Not to mention, the models have to be continually retrained; otherwise their knowledge goes stale. Is AI as useful for Rust as it is for Python? Doubtful. What about the programming languages created 10-15 years from now? What about when everyone starts hoarding their information away from the prying eyes of AI scraper bots to keep competitive knowledge in-house, both from a user perspective and a business perspective?

There's a lot of variability here, and literally nobody has any idea how any of it's going to go.


> but I have to baby sit the process and think whether I want to skip or retry a failed copy

Do you import originals or do you have the "most compatible" setting turned on?

I always assumed Apple simply hated people who use Windows/Linux desktops, so the occasional broken file was caused by the driver only sort-of working, and if people complain, well, they can fuck off and pay for iCloud or a Mac. After upgrading to a 15 Pro, which has 10 Gbps USB-C, it still took forever to import photos and the occasional broken photo kept happening. After some research, it turns out the speed was limited by the phone converting the .heic originals into .jpg when transferring to a desktop. Not only does that limit the speed, it also degrades the quality of the photos and deletes a bunch of metadata.

After changing the setting to export original files, the transfer is much faster and I haven't had a single broken file or video. The files are also higher quality and smaller, although .heic is fairly computationally demanding.

Idk about Android but I suspect it might have a similar behavior


