
otabdeveloper4 · 2026-06-05T18:27:40 1780684060

I don't even know what "just placement" is.

(I need a better model to translate from llmese.)

grey-area · 2026-06-05T18:37:07 1780684627

Sometimes the things word generators say just don’t make sense.

otabdeveloper4 · 2026-06-05T18:25:55 1780683955

> I decided to rewrite all of the prose in my own voice

"Claude, rewrite all of the prose in my own voice."

The funny part is that it probably works.

otabdeveloper4 · 2026-06-05T10:07:15 1780654035

> using lisp machines, running smalltalk on microkernels that put the HURD to shame

That future is not different from this future. That road leads down to Javascript and React anyways. (Perhaps with a slightly different syntax.)

otabdeveloper4 · 2026-06-04T17:00:24 1780592424

There is literally nothing wrong in being biased against AI.

Being biased against AI is like being biased against war or ethnic cleansing. Like, why would you ever not be?

otabdeveloper4 · 2026-06-02T09:45:37 1780393537

Oh yes, this time it will be different, of course. (Like the last time.)

otabdeveloper4 · 2026-06-01T14:54:39 1780325679

> the spicy autocomplete can solve difficult open math problems

No it can't. It can't even solve my son's 4th grade math homework. (This is a real use case for me, not a dumb benchmark.)

You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

ConceptJunkie · 2026-06-01T16:10:32 1780330232

Terrence Tao disagrees with what you're saying. I think he's in a slightly better position to speak on the subject.

otabdeveloper4 · 2026-06-04T20:39:46 1780605586

Terrence Tao is an expert on 4th grade homework? TIL!

skinner_ · 2026-06-01T15:09:08 1780326548

> You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

Not the parent poster here. I do know things about math. I wrote a few papers related to the unit distance problem (https://arxiv.org/abs/2311.10069, https://arxiv.org/abs/2406.15317) and spent quite some time trying to solve it. I had no chance of coming up with the proof that the spicy autocomplete came up with. Dumb benchmark, sure.

otabdeveloper4 · 2026-06-01T15:31:00 1780327860

LLMs are good with symbolic manipulation but can't reason.

You can skirt around not reasoning in research math because so much of it is just extremely tedious symbolic manipulation.

You can't cheat with advanced fourth grade math, though. They don't know algebra yet and can't substitute verbosity for reasoning.

skinner_ · 2026-06-01T16:52:06 1780332726

> You can skirt around not reasoning in research math because so much of it is just extremely tedious symbolic manipulation.

LOL

threatofrain · 2026-06-01T15:11:57 1780326717

We've long since passed that threshold.

sanderjd · 2026-06-01T15:15:48 1780326948

I would genuinely be interested in knowing what you're doing that led you to this conclusion.

I would be shocked if I was unable to solve 4th grade math homework with any of the contemporary frontier models. I spend most days using them to do significantly more complex things than that.

margalabargala · 2026-06-01T15:19:39 1780327179

If they took a blurry photo of the piece of paper and uploaded it to ChatGPT saying "solve this", then I would totally believe it. The frontier models are mostly obnoxiously bad at OCR and at properly ingesting what's on an image of a page.

If you write out the 4th grade math problem, they would have no trouble.

sanderjd · 2026-06-02T15:15:18 1780413318

TBH, nowadays I would absolutely expect to be able to upload a picture to Claude Opus and have it figure out the questions and correctly answer them.

margalabargala · 2026-06-02T17:36:19 1780421779

I sent this worksheet to Opus 4.8 High:

https://www.k5learning.com/sites/all/files/worksheets/math/g...

Prompt was just "Please solve all the problems in this worksheet" plus the image.

I got this:

  1. 86 → 1, 2, 43, 86
  2. 7 → 1, 7 (prime)
  3. 12 → 1, 2, 3, 4, 6, 12
  4. 38 → 1, 2, 19, 38
  5. 52 → 1, 2, 4, 13, 26, 52
  6. 9 → 1, 3, 9
  7. 73 → 1, 73 (prime)
  8. 98 → 1, 2, 7, 14, 49, 98
  9. 24 → 1, 2, 3, 4, 6, 8, 12, 24
  10. 6 → 1, 2, 3, 6
  11. 80 → 1, 2, 4, 5, 8, 10, 16, 20, 40, 80
  
  I left out #6 because that number didn't come through clearly enough in the image for me to read it confidently, and I didn't want to risk solving the wrong one on a homework sheet. If you let me know what it is, I'll factor it right away.

It failed to read the "77", it reported the line item it failed to read as #6 rather than #4, and it numbered the output incorrectly: it should have left a gap in the list for the one it failed on rather than having the second half of the answers be off by one. It did actually factor everything correctly, though.
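Checking the factor lists is easy to do mechanically. A minimal sketch using trial division up to the square root of n; `divisors` is a hypothetical helper, and the numbers are the ones from the transcript above plus the 77 the model failed to read.

```python
def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order (trial division up to sqrt(n))."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:  # avoid listing a square root twice (e.g. 3 for 9)
                large.append(n // d)
        d += 1
    return small + large[::-1]


# Factor lists exactly as the model reported them in the transcript above.
reported = {
    86: [1, 2, 43, 86],
    7: [1, 7],
    12: [1, 2, 3, 4, 6, 12],
    38: [1, 2, 19, 38],
    52: [1, 2, 4, 13, 26, 52],
    9: [1, 3, 9],
    73: [1, 73],
    98: [1, 2, 7, 14, 49, 98],
    24: [1, 2, 3, 4, 6, 8, 12, 24],
    6: [1, 2, 3, 6],
    80: [1, 2, 4, 5, 8, 10, 16, 20, 40, 80],
}

# Any number whose reported list differs from the computed one.
mismatches = {n: f for n, f in reported.items() if divisors(n) != f}
print("mismatches:", mismatches)
print("77 ->", divisors(77))  # the item the model skipped
```

An empty `mismatches` dict agrees with the observation that the arithmetic was fine and the failure was in reading the image.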

sanderjd · 2026-06-02T20:33:59 1780432439

Yep, fair enough. So pretty far from perfect still! But quite good. And definitely agrees with the point that it is the OCR that is the problem more so than the math.

otabdeveloper4 · 2026-06-01T15:27:49 1780327669

No, LLMs just can't do math.

minimaxir · 2026-06-01T15:32:55 1780327975

If your math does not involve multiplying 20 digit numbers, modern LLMs can "do" math even without a Python tool despite the counterintuition of next token prediction.

DiogenesKynikos · 2026-06-02T07:17:52 1780384672

And if you give your LLM access to a calculator, it will have no problem multiplying 20-digit numbers.
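What such a "calculator" can look like is tiny. A minimal sketch, assuming a harness that lets the model call a function with two decimal strings; `multiply_tool` is a hypothetical name, not any vendor's actual tool API. Python integers are arbitrary precision, so the product is exact at any length, unlike a 64-bit float.

```python
def multiply_tool(a: str, b: str) -> str:
    """Hypothetical calculator tool: exact product of two decimal integer strings."""
    # Python ints have no fixed width, so 20-digit (or 2000-digit) inputs are exact.
    return str(int(a) * int(b))


x = "99999999999999999999"       # a 20-digit number
print(multiply_tool("12", "13"))  # small case, for comparison
print(multiply_tool(x, x))        # exact 40-digit product
print(int(float(x) ** 2))         # float result: only ~16 significant digits survive
```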

bdamm · 2026-06-01T15:40:47 1780328447

They can definitely recognize the problem class and build programs to do math. So what's the difference?

It's like saying that people can't turn high torque nuts on machine bolts, because you can't use your fingers to do it. But you can use a wrench, so effectively, we can turn high torque nuts on machine bolts even though it isn't something we can natively do unaided.

sanderjd · 2026-06-02T15:17:52 1780413472

Again, I'm very interested in your methodology here. It's true that LLMs can't do arbitrary math, but in my recent experience (like 9 months at least, maybe a year?), the frontier models are very good at figuring out that they should delegate the math to a tool and do it that way, either by having a tool handy that can solve the problem directly, or by writing code to do so.

margalabargala · 2026-06-01T16:55:19 1780332919

The neat thing about that claim is that it's easily falsifiable.

I asked Opus 4.8 "What is 12 times 13" and it gave me "156".

So it would appear that your statement is no longer true.

simonw · 2026-06-01T15:03:42 1780326222

Reasoning models with access to Python have been able to solve 4th grade math homework for over a year now. Prove me wrong: show me a 4th grade math problem they can't handle.

tomjakubowski · 2026-06-03T05:54:22 1780466062

The images you can't see in the chats are the question sheet from here, which was the first fourth grade math homework assignment I tried. https://www.k5learning.com/worksheets/math/data-graphing/gra...

Fourth graders typically don't have access to Python for their homework assignments. To be fair to the kids, I tried it first without Python: Opus 4.6 (Feb 2026) with default Medium effort. https://claude.ai/share/1533a3e4-6757-4614-b95d-0743350a6598

pastebin of the reasoning section (no Python): https://pastebin.com/zZeG5ZnJ

It got questions 2 (Shop D) and 5 (280) wrong. It got question 3 right but the work it showed has the numbers for each shop wrong. My fourth grade teacher would have taken off points for that (shout out Mrs. Van Bladel).

Here it is again with a prompted nudge to use Python: https://claude.ai/share/e1265efb-0988-40ac-90ac-c76225b67e98

pastebin of the reasoning section (with Python): https://pastebin.com/KsP0xxZL

This time it used Python to "check its work", and answered the same questions incorrectly (2 and 5). To the model's credit, it did show the correct work on answer 3 this time.

simonw · 2026-06-03T11:09:07 1780484947

That's more of a test of vision LLM ability to correctly identify and count things in an image than it is of mathematical reasoning.

If you look at the working of your non-Python example it gets most of the counts wrong - identifying shop A as two full notebooks plus one half notebook when it's actually three full notebooks, for example. The numeric answers it then gives would be correct if it hadn't made those vision mistakes.

I've been testing vision LLMs on counting the number of pelicans in a photo for a while, they're very unreliable at that.

The best I've seen is Google Gemini 2.5 if you have it output image segmentation masks (a feature they have not included in the Gemini 3 series yet): https://simonwillison.net/2025/Apr/18/gemini-image-segmentat... - but that requires additional harness engineering, you need to explicitly cause it to use its image segmentation mechanism.

tomjakubowski · 2026-06-04T00:28:36 1780532916

Fourth grade math's† students are learning geometry and how to draw simple plots. Vision ability (or tactile ability, for visually impaired students) is pretty important to understanding and solving those homework problems.

†: think "bo's'n"

otabdeveloper4 · 2026-06-01T15:21:00 1780327260

> show me a 4th grade math problem they can't handle

Sure.

"8 7 6 5 4 3 2 1 - add minus signs and parenthesis to get 31."

P.S. There is an answer online and some LLMs will just copy it verbatim. This doesn't count.

sanderjd · 2026-06-02T15:23:15 1780413795

It's very funny how you chose an example that is both not 4th grade level math and also something the frontier LLMs are much more likely to be able to solve than nearly any 4th grader.

This is a counterexample to your argument, not evidence for your claim. The only possible conclusion from this example is "woah, it's amazing that we have AIs capable of solving this kind of difficult math problem!", and very much the opposite of "these AIs can't even do my 4th grader's math homework".

simonw · 2026-06-01T15:33:51 1780328031

Whoa, 4th grade math problems got hard! I'm not sure how I'd tackle that one myself.

simonw · 2026-06-01T16:39:22 1780331962

GPT-5.5 found a solution only after assuming that you're allowed to concatenate numbers together e.g. 8 7 becomes 87 (it complained at first that it was "under-specified") - using Python it brute-forced a solution (actually finding 13): https://chatgpt.com/share/6a1db54f-7ab8-8333-9218-86a469c284...
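A brute-force search like the one described can be sketched in a few lines. This is not the code from the linked chat; it assumes, as GPT-5.5 did, that adjacent digits may be concatenated (8 7 → 87), and additionally that no leading unary minus is allowed. `groupings`, `expressions` and `solve` are hypothetical names. It keeps one expression per way of grouping the digits, so its count need not match the 13 solutions mentioned above.

```python
from functools import lru_cache
from itertools import product


def groupings(digits, allow_concat=True):
    """Yield every way to cut the digit string into consecutive numbers."""
    choices = (True, False) if allow_concat else (False,)
    for joins in product(choices, repeat=len(digits) - 1):
        numbers, current = [], digits[0]
        for join, d in zip(joins, digits[1:]):
            if join:
                current += d  # glue this digit onto the current number
            else:
                numbers.append(int(current))
                current = d
        numbers.append(int(current))
        yield tuple(numbers)


def expressions(nums):
    """Map each value reachable by parenthesizing nums[0] - nums[1] - ... to one expression."""

    @lru_cache(maxsize=None)
    def build(i, j):
        if j - i == 1:
            return {nums[i]: str(nums[i])}
        out = {}
        for k in range(i + 1, j):  # split into a left and a right operand of '-'
            for lv, ls in build(i, k).items():
                for rv, rs in build(k, j).items():
                    # Subtraction is left-associative, so only a compound right side needs brackets.
                    rtext = f"({rs})" if j - k > 1 else rs
                    out.setdefault(lv - rv, f"{ls} - {rtext}")
        return out

    return build(0, len(nums))


def solve(digits="87654321", target=31, allow_concat=True):
    found = []
    for nums in groupings(digits, allow_concat):
        exprs = expressions(nums)
        if target in exprs:
            found.append(exprs[target])
    return found


print(solve())
print(solve(allow_concat=False))  # no concatenation: no solution
```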

Are you sure this is 4th grade level?

minimaxir · 2026-06-01T17:28:39 1780334919

I questioned OP's "there is an answer online" claim so I checked and the only source found for the original question was a 5th grade Russian school for mathematics.

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

MichaelNolan · 2026-06-01T17:33:54 1780335234

Apparently there is a way to solve this without brute forcing all the combinations. It has to do with looking at how many even and odd numbers there are, and taking into account that the goal number is odd. And then thinking through the combinations [even-even=even, even-odd=odd,…]
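The parity idea can be written out as a short derivation. This is a sketch of the standard parity argument, not necessarily the exact solution being referred to. If the digits are grouped into numbers n_1, …, n_k, minus signs and parentheses only decide which n_i are added and which are subtracted, and subtracting has the same parity as adding:

```latex
v = \sum_{i=1}^{k} s_i\, n_i,\qquad s_i \in \{+1,-1\}
\quad\Longrightarrow\quad
v \equiv \sum_{i=1}^{k} n_i \pmod{2}

\text{No concatenation: } \sum_{i=1}^{8} i = 36 \equiv 0 \pmod{2},\ \text{but } 31 \equiv 1 \pmod{2}
\;\Rightarrow\; 31 \text{ is unreachable.}

\text{With concatenation, e.g. } 87 - 6 - (54 - 3 - 2) - 1 = 31,\quad
87 + 6 + 54 + 3 + 2 + 1 = 153 \equiv 1 \pmod{2}.
```

So any solution must group the digits so that the numbers sum to an odd value, which prunes the search before any signs are tried.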

Though this is obviously not something I would expect a 4th grader to solve.

DiogenesKynikos · 2026-06-02T08:07:03 1780387623

> 4th grade math problem

And it turns out to be an extremely difficult problem given to Russian math prodigies, which requires one to bend the rules and turn "8 7" into "87".

otabdeveloper4 · 2026-06-04T20:41:54 1780605714

It's a standard "Russian math" problem. There's boatloads more where that came from, and none of them are solved by LLMs.

otabdeveloper4 · 2026-06-01T07:48:20 1780300100

No.

TL;DR - "cleaner and more polished" implies no vibe coded slop, and the only reason people start these projects in 2026 is because they're addicted to vibe slop coding.

otabdeveloper4 · 2026-06-01T07:28:10 1780298890

> And containers were supposed to make things safer ...

No. Containers are a slight improvement over the .tar.gz software distribution method we had a few decades ago.

(And I mean "slight" literally - a Docker container is just a .tar.gz with a bundled bash script that runs in a chroot.)

otabdeveloper4 · 2026-06-01T07:26:08 1780298768

> Most of us install Docker just to run a project locally

There's your mistake.

(Akshually using Docker is the real mistake, but that ship has sailed, no fixing these people now.)

otabdeveloper4 · 2026-05-31T10:01:28 1780221688

> Why do we need AI here?

AI psychosis is a real thing and an actual mental health issue.

pibaker · 2026-05-31T23:01:08 1780268468

The only psychotic behavior I see in the linked issue comes from the anti AI people.

rf15 · 2026-05-31T10:19:11 1780222751

funny speculative question: psychosis is evidently a gradient. Does AI just highlight latent general psychosis (i.e. in the simplified interpretation of a worldview shaped more by unchecked belief and fantasy than observation) in otherwise largely functional people?

What if the problem is that we train people too much to take things that are being said at face value without questioning/observing them, increasing the psychosis problem?

ethbr1 · 2026-05-31T15:17:43 1780240663

Everyone is susceptible to addictions or psychosis to some degree.

What matters is when the stimulus presented exceeds their resistance.

Extended AI use is a highly attractive stimulus that exceeds most people's resistance, especially when sycophantically interacted with in an echo chamber (human-AI, with no other humans in the room).

So yes, it's dangerous in the same way that cigarettes and social media are.

Just because some people can avoid slipping into it doesn't mean we should ignore population-as-a-whole outcomes.

xnx · 2026-05-31T10:32:28 1780223548

Similar to anti-AI derangement

nbaugh1 · 2026-05-31T11:28:54 1780226934

It really is an odd feeling to be in between the 2 extremes

ethbr1 · 2026-05-31T15:21:46 1780240906

The problem is that both camps take their positions as religious righteousness, which lobotomizes their ability to have productive pros-and-cons discussions about the matters at hand.

The internet/apps of the last 20 years have not exactly boosted people's ability to think critically and set aside their passions though.

Much easier to keep eyeballs glued and sell them ads if you encourage their baser impulses.

weakfish · 2026-06-01T16:04:24 1780329864

take heart knowing at least I'm there with ya.

otabdeveloper4 · 2026-05-31T11:28:41 1780226921

No. There is no "anti-AI derangement", the reaction to slop is normal.

akerl_ · 2026-05-31T12:09:56 1780229396

Spamming an open source developer with angry comments because they decided to use a new tool for the code they write and publish freely is not normal.

dragontamer · 2026-05-31T12:32:57 1780230777

This is rsync we are talking about. A bug in rsync basically means lost data and/or unreliable backups.

I think it's normal to be pissed at lost data. Maybe it's not socially acceptable to spit in the face of a volunteer but it's 100% human to feel annoyed by an obvious drop in code quality.

lukaslalinsky · 2026-05-31T13:44:10 1780235050

The thing is, showing the annoyance to the volunteer, who is already doing their best, has two possible outcomes:

1) they stop volunteering

2) they will ignore you

In neither case is your issue solved. So maybe it's better to deal with the frustration on your own and then file a bug report.

dragontamer · 2026-05-31T14:56:21 1780239381

There must be some degree of communication from customers to developers. Even if it is a free volunteer service.

Poor communication results in professionals firing the customer as well. None of this is exclusive to OSS or volunteer effort. But communication in general is necessary.

This is just product management and communication issues. There is a perceived problem and the problem MUST be communicated.

Problems aren't solved by shutting up and ignoring things. And based on the discussion in this topic, it's clear there's a lot of people who are worried about rsync code quality here.

lukaslalinsky · 2026-05-31T16:24:55 1780244695

Look, it wasn't that long ago that we had the xz malware. The pattern is always the same. The maintainer of the project is doing X, people start to pressure them to do something else, the maintainer gives up and opens the project up to other maintainers, and then many things can happen. If there is any lesson from the incident, it's that open source maintainers should never allow the pressure to happen: ignore it if it's too strong, block people. Rsync has been maintained for a very long time. Bugs happen, even regression bugs happen. People don't get to dictate how the volunteer should do development.

akerl_ · 2026-05-31T16:28:35 1780244915

If I were the rsync maintainer I’d probably set the repo to only allow issues and PRs from prior contributors until people learn to behave.

llbbdd · 2026-05-31T18:25:23 1780251923

If I were the rsync maintainer after this I'd unpublish it everywhere I had control over, delete the repo and turn off my computer to go walk in the park. The linked thread is insane.

lukaslalinsky · 2026-05-31T18:32:11 1780252331

Just going away from computers for a few days should be enough, the mob will get tired soon.

dragontamer · 2026-05-31T18:14:20 1780251260

Yes. That's called firing the customer in my line of work.

This doesn't seem egregious enough to fire the customer.

akerl_ · 2026-05-31T18:18:01 1780251481

Again, this is not work and they are not customers.

This is somebody spending their free time on code they enjoy and then putting the result online.

The reason businesses are careful about which customers they fire is because they want to keep having customers. Open source maintainers have no reason to deal with that shit.

account42 · 2026-06-03T14:06:55 1780495615

No this is part of the foundation that many open computing systems are built on. It's long past being just someone's experimental personal repo.

akerl_ · 2026-06-03T16:26:29 1780503989

How many people have to star your repo before it's no longer yours?

dragontamer · 2026-05-31T18:21:42 1780251702

Then he can fire the customer. By simply closing the issue.

dragontamer · 2026-05-31T19:18:30 1780255110

And it seems like regressions that lead to rsync losing data are just as serious.

Again: we are talking about rsync here. This new methodology being used this year seems to be associated with a regression (i.e. data loss, since this is rsync after all...) that likely wouldn't have happened any other year.

Or at least: the regressions at play consist of thousands of lines of changes that were only navigated by Claude later in the discussion.

We are reaching the point of AI developed code that requires AI itself to analyze. One step at a time. It's right for the open source customers who are used to understanding changes and smaller patches than this.

pibaker · 2026-05-31T23:02:49 1780268569

Before you call yourself a customer of an FOSS project, perhaps show us the receipt that a monetary transaction had actually taken place between you and the developer.

Otherwise, you're just a beggar. And beggars don't get to choose.

akerl_ · 2026-05-31T15:13:22 1780240402

These are not customers.

306bobby · 2026-05-31T16:18:47 1780244327

[flagged]

akerl_ · 2026-05-31T16:24:14 1780244654

That’s exactly what I was calling out.

Customers pay money for goods and services. They thus get a bunch of social, ethical, and legal positions in terms of their relationship with the seller.

Rsync is an open source project that its maintainers put onto the Internet. People who use it are not customers, and they do not have the right to expectations around how the maintainers will change the software or change how they develop it.

dragontamer · 2026-05-31T17:05:45 1780247145

You've never had a customer in your professional setting who didn't pay money for goods and/or services? Yet it was very important for your boss (and therefore you, as a programmer) to service their every whim?

Customers are customers. Whether they're paying or not. Not all customers are worth servicing (even with infinite money offered, "firing a customer" is important to keep the community in check).

But this isn't a situation where the RSYNC maintainer should fire the customer. There's a LOT of backlash to this release. Even if this one particular customer is a bit of an ass, there's plenty of good users in that 90+ comment chain (hundreds now?) where this regression has clearly struck a nerve.

akerl_ · 2026-05-31T17:21:02 1780248062

This is not a professional setting. This is an open source project that somebody published to the internet. Using it does not make you a customer, and it doesn’t matter if it “struck a nerve” with users.

dragontamer · 2026-05-31T17:52:57 1780249977

Well in my professional setting, I deal with non-paying customers all the time. They're still customers and I'm still expected to listen to them.

It was better when a dedicated PM was shielding me from this crap but here we are. Deciding whom to listen to and whom not to is just part of project management.

akerl_ · 2026-05-31T17:54:43 1780250083

Sure. But an open source project is not a professional setting.

weakfish · 2026-06-01T16:05:25 1780329925

Right. Sure. But this isn't a professional setting.

account42 · 2026-06-03T14:02:57 1780495377

No longer volunteering is not an obviously worse outcome than volunteering negative value contributions.

bakugo · 2026-05-31T18:50:22 1780253422

If committing thousands of lines of unreviewed AI generated code is "doing their best", I'd argue that them not contributing anymore would be a net benefit for the project.

lukaslalinsky · 2026-05-31T19:31:47 1780255907

That's possible, but who are you to tell a person what they should and shouldn't do in their free time.

bakugo · 2026-05-31T19:41:15 1780256475

I could ask you the same thing. Who are you to tell a person they're not allowed to criticize someone else's public actions?

akerl_ · 2026-05-31T19:50:04 1780257004

Sorry, but working on projects that interest you and going online to tell somebody they fucked up are not equivalent social behaviors.

akerl_ · 2026-05-31T12:48:09 1780231689

> Maybe it's not socially acceptable to spit in the face of a volunteer

Why are you hedging this? Do you think maybe it is socially acceptable?

dragontamer · 2026-05-31T14:48:58 1780238938

This isn't a hedge at all. There is likely an English mistake/misinterpretation being made here. I am a native English speaker btw.

akerl_ · 2026-05-31T15:14:06 1780240446

Do you believe the comments in this GitHub issue are acceptable social discourse towards an open source maintainer?

dragontamer · 2026-05-31T15:23:52 1780241032

The first comment, which is a screenshot from Mastodon, is perfectly acceptable. There is a clear regression between newer versions of rsync.

Then egos got bruised and things leave the realm of reason soon after. But coming with a request saying "Version X worked while version Y doesn't", with maybe some degree of annoyance, is fine.

akerl_ · 2026-05-31T15:39:00 1780241940

The title of the original issue, by the original issue submitter, is “Please Do Not Vibe Fuck Up This Software”.

dragontamer · 2026-05-31T15:42:48 1780242168

[flagged]

akerl_ · 2026-05-31T15:52:58 1780242778

The target of the issue title is pretty clearly the maintainer. They’re the one being told to stop using AI.

I guess this answers the question of whether you think maybe it’s ok to spit in the face of open source developers.

jodrellblank · 2026-05-31T16:37:10 1780245430

“Maybe don’t do that?” does not mean “I support you doing that” no matter how unfamiliar you are with it as an idiom.

“I cut my finger with the kitchen knife”

“Maybe don’t hold it by the blade”

It’s something along the lines of sarcastic and deliberately unhelpful because “duh, of course don’t do that”.

akerl_ · 2026-05-31T16:50:03 1780246203

This doesn’t seem related to my comment. Did you mean to reply to me upthread?

Saying ~“maybe it’s not ok to do <thing> but <reasons they might do thing>” is nothing like your example and does imply it’s acceptable to the speaker to sometimes do that thing.

But we’re past that now because the person I was discussing this with has gone ahead and clarified that telling an open source maintainer to please stop fucking up isn’t an angry comment.

otabdeveloper4 · 2026-06-01T07:17:20 1780298240

> because they decided to use a new tool for the code they write

That's not why the comments are angry. The anger is directed at the slop approach to code review.

I guarantee you the same anti-AI people wouldn't give a shit if the author used an AI-enabled IDE and personally vetted all commits.

watwut · 2026-05-31T14:09:43 1780236583

You might want to use a different term. After all, Trump derangement syndrome turned out to be "people who actually listen to him and say the truth about him".

The X-derangement thing is not used in reference to people who are wrong or lying, but in reference to people who are making correct observations.