The Codex App (openai.com)
772 points by meetpateltech 1 day ago | 586 comments




It is baffling how these AI companies, with billions of dollars, cannot build native applications, even with the help of AI. From a UI perspective, these are mostly just chat apps, which are not particularly difficult to code from scratch. Before the usual excuses come about how it is impossible to build a custom UI, consider software that is orders of magnitude more complex, such as raddbg, 10x, Superluminal, Blender, Godot, Unity, and UE5, or any video game with a UI. On top of that, programs like Claude Cowork or Codex should, by design, integrate as deeply with the OS as possible. This requires calling native APIs (e.g., Win32), which is not feasible from Electron.

>This requires calling native APIs (e.g., Win32), which is not feasible from Electron.

Who told you that? You can write entire C libraries and call them from Electron just fine. The browser is a native application, after all. All this "native applications" debate boils down to the UI implementation strategy. Maintaining three separate UI stacks (WinUI, SwiftUI, GTK/Qt) is dramatically more expensive and slower to iterate on than a single web-based UI with shared logic.
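
To make that concrete, here is a minimal hedged sketch of calling a Win32 API straight from Electron's main process, using the third-party koffi FFI package (one option among several; assumes `npm install koffi`):

    // Sketch: calling Win32 directly from Electron's main process.
    // An N-API C++ addon would work the same way.
    import koffi from 'koffi';

    const user32 = koffi.load('user32.dll');

    // Declare the native function with koffi's C-like prototype syntax.
    const MessageBoxA = user32.func(
      'int __stdcall MessageBoxA(void *hwnd, str text, str caption, uint type)'
    );

    // Call it like any other JS function; no separate UI stack required.
    MessageBoxA(null, 'Hello from Electron', 'Native call', 0);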

We already have three major OSes, all doing things differently. The browsers, on the other hand, use the same language, same rendering model, same layout system, and same accessibility layer everywhere, which is a massive abstraction win.

You don't casually give up massive abstraction wins just to say "it's native". If "just build it natively" were actually easier, faster, or cheaper at scale, everyone would do just that.


It baffles me how much the discourse over native apps rarely takes this into consideration.

You reduce development effort by a third. It is OK to debate whether a company so big should invest in a better product anyway, but it is pretty clear why they are doing this.


That might be true (although you do add in the mess of web frameworks), but I strongly believe that resource usage must factor into these calculations too. It's a net negative to end users if you can develop an app a bit quicker but require the end users to have multiple times more RAM, CPU, etc.

> multiple times more RAM, CPU, etc.

Part of this (especially the CPU) is teams under-optimizing their Electron apps. See the multi-X speedup examples when they look into it and move hot code to C et al.
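
One flavor of that fix, sketched here without the C rewrite: get the hot, CPU-bound loop off the thread the UI depends on. A hedged Node/Electron-style example with worker_threads (numbers are illustrative):

    // Run a hot loop in a worker so the main process, and the renderer
    // it services, stays responsive.
    import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

    if (isMainThread) {
      const worker = new Worker(__filename, { workerData: 50_000_000 });
      worker.on('message', (sum: number) => console.log('done:', sum));
    } else {
      let sum = 0;
      for (let i = 0; i < (workerData as number); i++) sum += i; // the hot loop
      parentPort!.postMessage(sum);
    }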


It might be a cynical take, but I don't think there is a single person in these companies that cares about end user resource usage. They might care if the target were less tech savvy people that are likely to have some laptop barely holding up with just Win11. But for a developer with a MacBook, what is one more electron window?

Especially given how fast things progress, timeline and performance are a tradeoff, and I'd say swaying things in favour of the latter is not by definition a net positive.

There's another benefit - you don't have to keep refactoring to keep up with "progress"!

Of course you do!

Microsoft makes a new UI framework every couple of years, Apple ships Liquid Glass, and GNOME has a new GTK version every so often.


The real question is how much better are native apps compared to Electron apps.

Yes, it takes more disk space, but whether it takes 50 MB or 500 MB isn't noticeable for most users. Same goes for memory: there is a gain for sure, but unless you open your system monitor you wouldn't know.

So even if it's something the company could afford, is it even worth it?

Also it's not just about cost but opportunity cost. If a feature takes longer to implement natively compared to Electron, that can cause costly delays.


It absolutely is noticeable the moment you have to run several of these electron “apps” at once.

I have a MacBook with 16GB of RAM and I routinely run out of memory from just having Slack, Discord, Cursor, Figma, Spotify and a couple of Firefox tabs open. I went back to listening to mp3s with a native app to have enough memory to run Docker containers for my dev server.

Come on, I could listen to music, program, chat on IRC or Skype, do graphic design, etc. with 512MB of DDR2 back in 2006, and now you couldn’t run a single one of those Electron apps with that amount of memory. How can a billion dollar corporation doing music streaming not have the resources to make a native app, but the Songbird team could do it for free back in 2006?

I’ve shipped cross platform native UIs by myself. It’s not that hard, and with skyrocketing RAM prices, users might be coming back to 8GB laptops. There’s no justification for a big corporation not to have a native app other than developer negligence.


On that note, I could also comfortably fit a couple of chat windows (skype) on a 17'' CRT (1024x768) back in those days. It's not just the "browser-based resource hog" bit that sucks - non-touch UIs have generally become way less space-efficient.

Also, modern native UIs have started looking like garbage on desktops and laptops, where you usually want high information density.

Just look at this TreeView in WinUI2 (w/ fluent design) vs a TreeView in the good old event viewer. It just wastes SO MUCH space!

https://f003.backblazeb2.com/file/sharexxx/ShareX/2026/02/mm...

And imo it's just so much easier to write a webapp, than fiddle with WinUI. Of course you can still build on MFC or Win32, but meh.


There are cross-platform GUI toolkits out there. I am on team web for lots of reasons, but generally it's because web apps are faster and cheaper to iterate on.

Cross-platform GUIs might not have the same level of support and distributed knowledge as HTML/CSS/JS. If that vendor goes away or the OSS maintainers go a different direction, now you have an unsupported GUI platform.

I mean the initial release of Qt predates JavaScript by a few months and CSS by more than a year. GTK is only younger by a few years and both remain actively maintained.

Argument feels more like FUD than something rooted in factual reality.


>You reduce development effort by a third

Done by the company which sells software which is supposed to reduce it tenfold?


> You don't casually give up massive abstraction wins

Value is value, and levers are levers, regardless of the resources you have or the difficulty of the problem you're solving.

If they can save effort with Electron and put that effort into things their research says users care about more, everyone wins.


Every time I read "save effort with Electron", I go back to a Win2K VM, poke around, and realize how much faster everything is than on an M4 Max, just because value is value and Electron saves some effort.

That's like a luxury lumber company stuffing its showrooms full of ikea furniture.

> You reduce development effort by a third

Sorry to nitpick, but this should be "by three" or "by two thirds", right?


> If "just build it natively" were actually easier, faster, or cheaper at scale, everyone would do just that

Value prop of product quality aside, isn't the AI claim that it helps you be more productive? I would expect that OpenAI would run multiple frontends and that they'd use Codex to do it.

I.e., are they using their own AI (I would assume the app is semi-vibe-coded) just to get a new product out, or are they using AI's productivity gains to create a new product of higher quality?


On a side note, the company I work for (RemObjects, not speaking on their behalf) has a value ethos specifically about using the native UI layers, and encouraging our customers to do the same. (We make dev tools, a compiler supporting six languages (C#, Java, Go, etc) plus IDEs.)

Our IDE does this: common code / logic, then a native macOS layer and a WPF layer. Yes, it takes a little more work (less than you'd think!) but we think it is the right way to do it.
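
A rough sketch of that split (names are illustrative, not our actual code): the shared core owns all state and logic, and each platform layer only renders it.

    // Illustrative TypeScript sketch of "shared core, thin native shells".
    interface AppState { items: string[] }

    interface PlatformShell {
      // Implemented once per platform (AppKit/SwiftUI, WPF, ...).
      render(state: AppState): void;
      onUserInput(handler: (input: string) => void): void;
    }

    // The shared core has no UI framework imports at all.
    class SharedCore {
      private state: AppState = { items: [] };
      constructor(shell: PlatformShell) {
        shell.onUserInput((input) => {
          this.state = { items: [...this.state.items, input] };
          shell.render(this.state);
        });
      }
    }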

And what I hope is that AI will let people do the same -- lower the cost and effort to do things like this. If Electron was used because it was a cheap way to get cross-platform apps out, AI should now be the same layer, the same intermediate 'get stuff done' layer, but done better. And I don't think this prevents doing things faster because AI can work in parallel. Instead of one agent to update the frontend, you have two to update both frontends, you know?

We're building an AI agent, btw. Initially targeting Delphi, which is a third party's product we try to support and provide modern solutions for. We'll be adding support for our own toolchains too.

What I fear is that people will apply AI at the wrong level: that they'll produce the same things, but faster, when they could produce better things (and faster).


It's about consistency: you want to build an app that looks and functions the same on all platforms as much as possible. Regardless of whether you are hand-coding or vibe-coding 3 entirely separate software stacks, getting everything consistent is going to be a challenge, and subtle inconsistencies will sneak in.

It comes back to fundamental programming guidelines like DRY (Don't Repeat Yourself): if you have three separate implementations in different languages for everything, changes will become harder and you will move slower. These golden guidelines still stand in a vibe-code world.


Wouldn’t maintaining the different UI stacks be something a language model could handle? Creating a new front end where the core logic is already defined or making a new one from an existing example has gone pretty fast for me. The “maintenance“ cost might not be as high as you think.

The gap here is that the company has the money and native apps are so clearly better. With an interactive app, a company like OpenAI could really tweak the experience for Android and iOS, which have different UX philosophies and feature sets, in order to give the best experience possible. It's really a no-brainer imho.

> the company has the money

It's not about money. It's not a tradeoff in cost vs quality - it's a tradeoff in development speed. Shipping N separate native versions requires more development time for any given change: you must implement everything (at least every UI) N times, which drastically increases the design & planning & coordination required vs just building and shipping one implementation.

Do you want to move slower to get "native feel", or do you want to ship fast and get N times as much feature dev done? In a competitive race while the new features are flowing, development speed always wins.

Once feature development settles down, polish starts to matter more and the slowdown becomes less important, and then you can refocus.


Yeah that's why startups often pick iOS first, get product-market fit, and then do Android. The fallacy that abstractions tout is that Android and iOS are the same.

They are not.

A good iOS app is not 1:1 equivalent to what a good Android app would be for the same goal. Treating them as such just gives users a lowest common denominator product.


> it's a tradeoff in development speed

Doesn't this get thrown out the window now that everyone claims you can be 10x, 50x, 100x more productive with AI? Hell people were claiming you can ask a bunch of AI agents to build a browser from scratch, so surely the dev speed argument no longer applies.


Even if we assume a developer is actually 10x more productive with AI, if you triple their workload by having them build 3 native apps now you're only 3.33x more productive.

No, you would be ten times as productive. You would ship three different apps 3.3 times faster than you previously shipped one.

The productivity comparison must be made between how long it takes to ship a certain amount of stuff.
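
A toy calculation makes the disagreement concrete (all numbers assumed for illustration):

    // Time to ship one feature, in arbitrary units.
    const baseline = 1;      // 1 platform, no AI
    const aiSpeedup = 10;    // assumed 10x productivity from AI
    const platforms = 3;

    const threeNativeWithAI = (baseline * platforms) / aiSpeedup; // 0.3
    const oneElectronWithAI = baseline / aiSpeedup;               // 0.1

    // Shipping natively everywhere is 10/3 ≈ 3.33x faster than the old
    // single-platform baseline, but still 3x slower than Electron + AI.

So both framings are right: you are more productive than before in absolute terms, yet still slower than the single-codebase alternative.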


So, this certainly was a valid argument. But it seems to me that the whole value proposition behind these agentic AI coding tools is to be able to move beyond this. Are we very far from being able to define some Figmas and technical specs and have Codex generate the UIs in 5 different stacks? If that isn't a reality in the near future, then why should we buy AI Tools?

>If "just build it natively" were actually easier, faster, or cheaper at scale, everyone would do just that.

Exactly. Years go by and HN keeps crying about this despite it being extremely easy to understand for anyone. For such a smart community, it's baffling how some debates are so dumb.

The only metrics really worth reviewing are resource usage (and perhaps appearance). These factors aren't that relevant to the general population; otherwise, most people wouldn't use these apps (which clearly isn't the case).


React Native is able to build abstractions on top of both Android and iOS that use native UI. Microsoft even has a package for doing a "React Native" for Windows: https://github.com/microsoft/react-native-windows

It's weird that we don't have a unified "React Native Desktop" that would build upon the react-native-windows package and add similar backends for macOS and Linux. That way we could build native apps while keeping the stuff developers like from React.
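
The appeal is that one TSX component tree can drive native widgets on each platform. A minimal hedged sketch, assuming a working React Native / react-native-windows setup:

    import React, { useState } from 'react';
    import { Button, Text, View } from 'react-native';

    // The same component renders native controls on iOS, Android, and
    // (via react-native-windows) Windows.
    export function Counter() {
      const [count, setCount] = useState(0);
      return (
        <View>
          <Text>Pressed {count} times</Text>
          <Button title="Press me" onPress={() => setCount(count + 1)} />
        </View>
      );
    }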


There are such implementations for React Native: https://reactnative.dev/docs/out-of-tree-platforms

React Native desktop on Linux isn't a thing; the GTK backend is abandoned.

So if you want a multiplatform desktop app also supporting Linux, React Native isn't going to cut it.


https://reactnative.dev/docs/out-of-tree-platforms says otherwise

React Native Skia allegedly runs on Linux too


React Native Skia seems abandoned. But maybe this will make React Native on Linux viable

https://github.com/gtkx-org/gtkx


React Native Skia's last commit was three years ago.

The "three OSes" thing is BS; none of them cares about Linux.

This is such a toy webdev take. It's like you guys forget that the web-browser wouldn't work at all if not for the server half, all compiled to native code.

The browser is compiled to native code. It wasn't that long ago that we had three separate web browsers that couldn't agree on the same set of standards either.

Try porting your browser to Java or C# and see how much faster it is then. The OS that the browser and the server run on is compiled to native code. Sun gave up on the HotJava web browser in the 1990s because it couldn't do 10% or 20% of what Netscape or IE could do, and was 10x slower.

Not everybody is running a website selling internet widgets. Some of us actually have more on the line if our systems fail or are not performant than "oooh our shareholders are gonna be pissed".

People running critical emergency response systems day in, day out.

The very system you typed this bullshit on is running native code. But oh no, that's "too hard" for the webdev crowd. Everyone should bend to accommodate them. The OS should be ported to run inside the browser, because the browser is "so good".

Good one. It's hilarious to see this Silicon Valley/Bay Area, chia-seed-eating bullshit in cheap spandex riding their bikes while the trucks shipping shit from coast to coast pass them by.


The situation for desktop development is nasty. Microsoft has shipped so many half-assed frameworks that nobody knows which one to use. The de facto platform on Windows is probably Electron, and Microsoft uses it often, too.

On macOS it's much better. But most teams either end up locked into Mac-only or go cross-platform with Electron.


I guess it shows how geriatric I am with desktop app development these days, but does no one use Qt anymore? Wasn't the dream for that to be a portable and native platform to write GUI apps? Presumably that could abstract away which bullshit Microsoft framework they came out with this week.

I haven't touched desktop application programming in a very long time and I have no desire to ever do so again after trying to learn raw GTK a million years ago, so I'm admittedly kind of speaking out of my ass here.


Qt is still used, but I think part of the reason it is less used is that C++ isn't always the right language anymore for building GUI applications.

That’s actually why we're working on Slint (https://slint.dev): It's a cross-platform native UI toolkit where the UI layer is decoupled from the application language, so you can use Rust, JavaScript, Python, etc. for the logic depending on what fits the project better.


How can C++ not be the "right" language? It seems to meet all the requirements for event-driven GUIs - event handlers are function callbacks after all...

C++ works, but compared to other languages it's often no longer the most productive choice for UI work. Modern UI code is mostly glue and state management, where fast iteration matters more than squeezing out maximum performance. And when performance does matter, there are also newer, safer languages.

For teams comfortable with C++ or with existing C++ libraries to integrate, it can of course still be a strong choice, just not the preferred one for most current teams.


But desktop C++ isn't difficult or slow to write...

It seems odd to me that the software world has gone in the direction of "quick to write - slow to run". It should be the other way around. Things of quality (eg. paintings by Renaissance masters) took time to create, despite being quick to observe.

It also seems proven that releasing software quickly ("fast iteration") doesn't lead to quality - see how many releases of the YouTube app or Netflix there are on iOS or Android; if speedy releases are important, it is valuing rush to production over quality, much like a processed food version of edible content.

In a world that is also facing energy issues, sluggish and inefficient performance should be shunned, not welcomed?

I suppose this mentality is endemic, and why we see a raft of cruddy slow software these days, where upcoming developers ("current teams") no longer value performance over ease of their job. It can only get worse if the "it's good enough" mentality persists. It's quite sad.


The part that takes time in UI isn't wiring up components; it's the small changes, like nudging something a pixel to the right or making a gap two pixels wider. Changing those in a C++ project means recompiling, and that adds up to significant overhead over a day of polishing the UI. If C++ were able to get builds out in less than a second, this wouldn't be an issue. People value performance in their own tools more than in the tools of their customers.

In modern Qt you don't write UI in C++ anymore - you do that in QML. It is far simpler to create amazing pixel perfect UIs with drooling-inducing animations in QML. I wrote a blog post that talks a bit about this[1].

[1] https://rubymamistvalove.com/block-editor


Qt means C++. I'll take Typescript over C++ for a GUI task any day.

Qt is also pretty memory-hungry; maybe rich declarative (QML) skinnable adaptable UIs with full a11y support, etc just require some RAM no matter what. And it also looks a wee bit "non-native" to purists, except on Windows, where the art of uniform native look is lost.

Also, if you ever plan extensions / plugin support, you already basically have it built-in.

Yes, a Qt-based program may be wonderfully responsive. But an Electron-based app can be wonderfully responsive, too. And both can feel sluggish, even on great hardware. It all depends on the right architecture, mostly on not doing any (not even "guaranteed fast") I/O in the GUI thread. This takes a bit of skill and, most importantly, consideration; both are in short supply, as usual.

The biggest problem with Electron apps is their size. Tauri, which relies on the system-provided web view component, is the reasonable way.
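
The "no I/O in the GUI thread" point is easy to illustrate. A minimal Electron-flavored sketch: keep file access async in the main process and ship only results to the renderer.

    import { ipcMain } from 'electron';
    import { readFile } from 'node:fs/promises';

    // The renderer asks for data over IPC; the main process awaits the
    // I/O, so nothing servicing the UI ever blocks on the disk.
    ipcMain.handle('load-file', async (_event, path: string) => {
      return readFile(path, 'utf8');
    });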


I don't get this HN worship of Qt. Have you ever used Qt apps on macOS? They don't feel native at all. They feel sort-of native-emulating in the same way wxWidgets apps on macOS feel: they use native controls but all the little details including design language are off.

I'm not saying this is a huge problem for me even if it bothers me personally. But if you're here on HN advocating native over Electron, then it seems logical to me that you would care about being truly native instead of merely "using native controls while feeling off".

This is even before getting to the point that Qt isn't truly native. They just draw controls in a style that looks native, they don't actually use native controls. wxWidgets uses native controls but they don't behave better despite that.


This is not because of Qt; it is due to some (most) Qt developers not caring enough. I made my Qt app feel native on both macOS and Windows[1]. It did require a lot of tuning, but those are things I'll reuse across other apps.

[1] https://get-notes.com/


They don’t look native on Windows, either.

And GTK4 is very usable from Rust, too. It's not a bad development experience, but these companies can probably find 100 webdevs for every systems programmer.

Come on, GUI apps are not systems programming. What's with this title inflation?

One reason why I personally never bothered is the licensing of some of its important parts, which is a choice of either GPL or commercial. Which is fair, but too bothersome for some use-cases (e.g. mobile apps which are inherently GPL-unfriendly). Electron and the likes are typically MIT/BSD/etc licensed.

Qt is still pretty good, but it's dated in comparison to newer frameworks like Flutter and React Native. No hot reloading of changes, manual widget management vs. React where you just re-define the whole UI every frame and it handles changes magically, no single source of truth for state, etc.


That's a third party paid addon. Hardly a fair comparison.

I built my Block Editor (Notion-style) in Qt C++ and QML[1].

[1] https://get-notes.com


This is another common excuse.

You don't need to use microsoft's or apple's or google's shit UI frameworks. E.g. see https://filepilot.tech/

You can just write all the rendering yourself using metal/gl/dx. If you don't want to write the rendering yourself, there are plenty of libraries like Skia, Flutter's renderer, NanoVG, etc.
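
To make "write the rendering yourself" concrete, here is a toy immediate-mode button drawn on a plain canvas; real apps do the same against Metal/GL/DX, and they also own hit-testing, focus, and accessibility (which is exactly what the replies below warn about):

    const canvas = document.createElement('canvas');
    document.body.appendChild(canvas);
    const ctx = canvas.getContext('2d')!;

    let hovered = false;
    canvas.addEventListener('mousemove', (e) => {
      hovered = e.offsetX < 120 && e.offsetY < 40;  // manual hit-testing
      draw();
    });
    canvas.addEventListener('click', () => { if (hovered) console.log('clicked'); });

    function draw(): void {
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      ctx.fillStyle = hovered ? '#cbd5e1' : '#e2e8f0';  // you own all styling
      ctx.fillRect(0, 0, 120, 40);
      ctx.fillStyle = '#000';
      ctx.fillText('Click me', 30, 25);
    }
    draw();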


Customers simply don't care. I don't recall a single complaint about RAM or disk usage of my Electron-based app being reported in the past 10 years.

You will be outcompeted if you waste your time reinventing the wheel and optimizing for stuff that doesn't matter. There is some market for highly optimized apps like e.g. Sublime Text, but you can clearly see that the companies behind them are struggling.


>Customers simply don't care. I don't recall a single complaint about RAM or disk usage of my Electron-based app being reported in the past 10 years.

I see complaints about RAM and sluggishness against Slack and countless other Electron apps every fucking day, same as with Adobe forcing web-rendered UI parts into Photoshop, and other such cases. Forums are full of them; colleagues complain about it all the time.


Of course they complain about them, but those are the users, not the purchasers.

How are Adobe and Slack/Salesforce doing?

Are they hurting for customers?


the people that USE the software the most are not the people BUYING the software. it’s why all enterprise software has trash UX.

do you think I, as a software engineer, like using Jira? Outlook? etc.? Heck, even the trendy stuff is broken. Anthropic took 6 months to fix a flickering Claude Code. -_-


Yes that was my point.

Not a relevant point, though. I was responding to "I don't recall a single complaint about RAM or disk usage of my Electron-based app being reported in the past 10 years"; I wasn't arguing that such apps don't make money.


McDonald’s isn’t hurting for customers either. Doesn’t mean their food is anything a chef ought to aspire to.

I'm loving it

McDonald's is renowned for speed of service; a bit ironic to compare that to slow apps.

Maybe 40 years ago.

Nor does it mean that McDonald's should aspire to be a chef.

Sure, aspiring to mediocrity at a cost to others is a choice.

> Customers simply don't care. I don't recall a single complaint about RAM or disk usage of my Electron-based app being reported in the past 10 years.

Nothing is worse than reading something like this. A good software developer cares. It’s wrong to assume customers don't care simply because they don't know what's going on under the hood. Considering the downsides and the resulting side effects (latency, more CPU and RAM consumption, fans spinning etc.), they definitely do care. For example, Microsoft has been using React components in their UI, thinking customers wouldn’t care, but as we have been seeing lately, they do care.


Not seeing complaints doesn't mean they don't exist. Not to mention the UI latency that is common in Electron apps, which is a constant low-level annoyance.

People absolutely care, but the issue is that no single company/app is really responsible. It's the tragedy of the commons, but for users' RAM. No one Electron app uses all the RAM, but just a couple are enough to make a common 16GB machine slow down massively.

I have complained about literally every Electron based app I have ever used. How would you know there are no complaints?

There are complaints and then users keep using these super popular and bloated apps. Techies make it seem like bloat is a capital sin but it isn't.

That just means your feedback system is trash if it fails to surface such an obvious and common pain point in user experience. Though that's an extremely common state for feedback systems. Also, general computer knowledge isn't high enough for every end user to connect sluggishness in one app to your app wasting RAM and causing disk swaps, which eliminates a lot of end-user complaints.

> reinventing the wheel

what exactly are you inventing by using a framework "invented" decades ago and used by countless apps in all those years?


I don’t complain about Electron because I didn’t install the app if I could avoid it.

> I don't recall a single complaint about RAM or disk usage of my Electron-based app being reported in the past 10 years.

When was the last time complaining about this did anything?


Even with Sublime Text around, the most popular IDE is VSCode, the most popular interface design tool is Figma, and all the popular chat platforms and so on are Electron-based. If people were desperate for faster platforms, they'd be migrating to them.

> Even with Sublime Text around, the most popular IDE is VSCode

What a weird comparison: one is free, the other is a premium app. Of course a lot of people prefer some suffering over paying money.


You're mistaking supply-side, path-dependent outcomes that produce a lack of consumer choice for consumer preference. No consumer prefers slow, bloated, non-native software, but they're stuck with what they can get.

There is competition for Figma. Sketch.

There's plenty of competition for VSCode too.

Don't forget that these Electron apps outcompeted native apps. Figma and VSCode were underdogs to native apps at one point. This is why your supply side argument doesn't make any sense.


> There's plenty of competition for VSCode too.

But there isn't, not if you include all the extensions and remember the price


So an Electron app won. Seems like Electron wasn't a hindrance.

Sure, you can ignore that it was a hindrance, just like you ignored the previous point.

Like how you ignored my point too?

If it was a hindrance, why did it win?

Seems clear to me that Electron's higher RAM usage did not affect adoption. Instead, Electron's ability to write once and ship in any platform is what allowed VSCode to win.


> Like how you ignored my point too?

No, differently

> If it was a hindrance, why did it win?

Because reality is not as primitive as you portray it to be: you can have hindrances and boosts with an overall positive, even winning, effect. That shouldn't be that hard!

> Seems clear to me that Electron's higher RAM usage did not affect adoption.

Again, it only seems clear because you ignore all the dirt, including basic things (like here: it's not just RAM, it's disk use and startup speed; and, as before, the competition) and strangely don't consider many factors.

> Instead, Electron's ability to write once and ship in any platform is what allowed VSCode to win.

So it has nothing to do with it using the most popular web stack, meaning the largest pool of potential contributors to the editor or extensions??? What about other cross-platform frameworks that also allowed that??? (And of course it's not "any platform", just the 3 desktop ones where VSCode runs.)


> So it has nothing to do with it using the most popular web stack, meaning the largest pool of potential contributors to the editor or extensions??? What about other cross-platform frameworks that also allowed that??? (And of course it's not "any platform", just the 3 desktop ones where VSCode runs.)

I'm not even sure what you're arguing at this point.

Are you arguing that Electron helped VSCode win or what? Because Electron being able to use a popular web stack is also a benefit.

What is your point?


Do a search for "Microsoft teams slow crash" and you'll find a billion complaints by normies.

They're only doing well because of their illegal monopolistic practices not being cracked down on.


Because there is no point in reporting such complaints. It's just a waste of time.

They don’t care, or they don’t know? What they do know is that their computer that’s only 5 years old goes to shit with only a few apps open. Time for a new laptop.

Thanks for contributing to the obsolescence cycle.


The various GPU-accelerated terminal projects always make me chuckle

Not sure why; terminals have literally been GPU-accelerated text-rendering solutions since the very beginning of rendering text.

Heck, not even just a separate card or whatever: back in the terminal days you practically had a whole separate small computer just to display the output of the bigger computer on a screen instead of on paper.

> Customers simply don't care.

They do, but they don't know what's causing it. 8GB of RAM usage for Codex App is clown-level ridiculous.


I don't bother complaining about Electron-based applications to the developer, and I expect that's not an unusual position. It's not like the downsides are hidden, unique, or a surprise, and if the developers' priorities aligned with ours, they wouldn't have picked electron in the first place.

I use web-tech apps because I have to, and because they're adequate, not because it's an optimal user experience.


I care. I refuse to use Electron slop unless it is literally the only option available (usually due to some proprietary locked-in platform eg Discord). I will happily pay significant sums of money for well-made native apps that often have fewer features than the Electron versions, simply for the pleasure of using tools that integrate seamlessly with my operating system. Not all of us have given up on software quality.

> Customers simply don't care. I don't recall a single complaint about microplastics in the past 10 years.

> You will be outcompeted if you waste your time reinventing the wheel and optimizing for stuff that doesn't matter. There is some market for safe, environmentally-friendly products, but you can clearly see that the companies that make them are struggling.

ok.


That'll work great until your first customer from a CJK or RTL language writes in, "Hey, how come I can't type in your app?", or a blind user writes in, "Hey, how come your app is completely blank?" Then you'll be right in the middle of the "Find Out" phase.

These strategies are fine for toy apps but you cannot ship a production app to millions or even thousands of people without these basics.


How is File Pilot for accessibility and for all the little niceties like native scrolling, clipboard interaction, drag and drop, and so on? My impression is that the creator has expertly focused on most/all of these details, but I don't have Windows to test.

I insist on good UI as well, and, as a web developer, have spent many hours hand rolling web components that use <canvas>. The most complicated one is a spreadsheet/data grid component that can handle millions of rows, basically a reproduction of Google Sheets tailored to my app's needs. I insist on not bloating the front-end package with a whole graph of dependencies. I enjoy my NIH syndrome. So I know quality when I see it (File Pilot). But I also know how tedious reinventing the wheel is, and there are certain corners that I regularly cut. For example there's no way a blind user could use my spreadsheet-based web app (https://github.com/glideapps/glide-data-grid is better than me in this aspect, but there's no way I'm bringing in a million dependencies just to use someone else's attempt to reinvent the wheel and get stuck with all of their compromises).
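
The core trick behind that kind of grid is virtualization: drawing work is proportional to the viewport, not the row count. A stripped-down sketch of the idea (getRow is a hypothetical row accessor, not code from my app):

    const ROW_HEIGHT = 24;

    function drawVisibleRows(
      ctx: CanvasRenderingContext2D,
      scrollTop: number,
      viewportHeight: number,
      totalRows: number,
      getRow: (i: number) => string,  // hypothetical accessor
    ): void {
      const first = Math.floor(scrollTop / ROW_HEIGHT);
      const last = Math.min(
        totalRows - 1,
        Math.ceil((scrollTop + viewportHeight) / ROW_HEIGHT),
      );
      for (let i = first; i <= last; i++) {
        // Millions of rows, but only ~viewportHeight / ROW_HEIGHT draws.
        ctx.fillText(getRow(i), 4, (i - first) * ROW_HEIGHT + 16);
      }
    }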

The answer to your original question about why these billion dollar companies don't create artisanal software is pretty straightforward and bleak, I imagine. But there are a few actually good reasons not to take the artisanal path.


File pilot is extremely good in my experience, literally the only issue is it doesn't display the sync status on icons in a Dropbox folder.

I'd love to see some opensource projects actually do a good job of this. Its a lot of work, especially if you want:

- Good cross platform support (missing in filepilot)

- Want applications to feel native everywhere. For example, all the obscure keyboard shortcuts for moving around a text input box on mac and windows should work. iOS and Android should use their native keyboards. IME needs to work. Etc

- Accessibility support for people who are blind and low vision. (Screen readers, font scaling, etc)

- Ergonomic language bindings

Hitting these features is more or less a requirement if you want to unseat electron.

I think this would be a wonderful project for a person or a small, dedicated team to take on. It's easier than it ever used to be thanks to improvements in font rendering, cross-platform graphics libraries (like WebGPU, Vulkan, etc.), and improvements in layout engines (Clay). It also helps how much users have dropped their standards for UI consistency ever since Electron got popular and Microsoft gave up on having a consistent UI toolkit in Windows.

There are a few examples of teams doing this in house (eg Zed). But we need a good opensource project.


We're actually working on a native open source cross-platform UI toolkit called Slint that’s trying to do exactly that. https://slint.dev

But Electron doesn't even hit that bar.

yep, you're right to call that.

> You don't need to use microsoft's or apple's or google's shit UI frameworks. E.g. see https://filepilot.tech/

That's only for Windows though, it seems? Maybe the whole "just write all the rendering yourself using metal/gl/dx" is slightly harder than you think.


Proof that rendering is not _that_ hard: the Flutter team did it when they switched off Skia (although technically they still use Skia for text rendering; I'll admit that text rendering and layout is hard).

How is a fact that someone did something proof that it isn’t hard?

I mean, every cross-platform commercial DAW manages to do it? Bitwig, Renoise, Reaper, even VCV.

Every space company manages to shoot spacecraft into space; does that mean it's easy? Obviously not :)

Cross-platform native GUIs are still hard, although maybe not rocket science, but there is a reason most individuals/companies don't go for that by default and reach for other solutions.


“Render yourself with GPU APIs” has all the same problems with a11y, compatibility, and inconsistent behaviour that Electron has - the only one it might fix is performance, and plenty of apps have messed that one up too.

They’re all iterating products really fast. This Codex is already different than the last Codex app. This is all disposable software until the landscape settles.

It's essentially asking application developers to wipe asses for OS developers like Microsoft. It's laudable when you do it, understandable when you don't.

Even though OpenAI has a lot of cash to burn, they're not in a good position now and getting butchered by Anthropic and possibly Gemini later.

If any major player in this AI field has the power to do it, it's probably Google. But again, they've done the Flutter part, and the result is somewhat mixed.

At the end of the day, it's only HN people and a fraction of Redditors who care. Electron is tolerated by the silent majority. Nice native or local-first alternatives are often separate, niche value propositions when developers can squeeze themselves in over-saturated markets. There's a long way before the AI stuff loses novelty and becomes saturated.


"native" is used for different things, from "use the platform's default gui toolkit" to "compile to a machine code binary". the former is a bit of a mess, but the latter is strictly better than wrapping a web view and shipping an entire chrome fork to display and interpret it. just write something in qt and forget about native look and feel, and the performance gain will be enough to greatly improve the user experience.

Should just use JavaFX or Swing. Take a leaf out of IntelliJ's book, which, while it has its own performance problems (although not from the UI framework), has a fantastic UI across Mac/Windows/nix.

Java Swing is way underrated despite being very complex. It baffles me why it just sort of withered on the vine.

(I was a swing developer for several years)


The web sucked all the oxygen out of the room.

It really was Oracle’s fault – they neglected deployment for too long. Deploying Java applications was simply too painful, and neither JLink nor JPackage existed.

Non-native UI widgets, non-native runtime, non-native language

I can’t tell if this word salad is sarcasm or genuine.

From the suggestions it looks like sarcasm, but you never can tell these days

Qt with QML works fine. The real reason is that companies can't hire enough native developers, because the skill is comparatively rare.

These companies have BILLIONS of dollars and some of the smartest people in the world and access to bleeding edge AI

There should be no excuses! Figure it out!


it'll be the least important thing to do

As I outlined in a sibling comment, you can still use React and your JS developers. Just don't ship a whole browser with your app.

Maybe an app as complex as Outlook needs pixel-perfect tweaking of every little button, so they need to ship their own browser for an exact version match. But everything else can use the *system native browser*. Use Tauri or Wails or many other solutions like these.

That said, I do agree with the other comments about TUIs etc. Yes, nobody cares about the right abstractions, not even the companies that literally depend on automating these applications.
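
The frontend side of that approach looks almost identical to Electron code; the difference is what hosts it. A hedged sketch against Tauri v2's TypeScript API (read_config is a hypothetical command that would be defined in the Rust backend):

    import { invoke } from '@tauri-apps/api/core';

    async function loadConfig(): Promise<void> {
      // Runs in the system webview; the backend is a small compiled
      // binary, not a bundled Chromium.
      const config = await invoke<string>('read_config', { path: 'settings.json' });
      console.log(config);
    }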


Microsoft also uses React Native for the Start menu, and apparently bricked it during a recent upgrade... along with breaking other stuff.

I do not give a shit about how they excuse doing a bad job. Their tools supposedly make them that much more productive, and being the developer of those tools should let you make great use of them.

Use native for macOS. Use .NET for Windows. Use whatever on Linux.

It's just being lazy and ineffective. I also do not care about whatever "business" justification anyone can come up with for half-assing it.


Win32 is the platform to use on Microsoft Windows. Everything else is built on top of it. So it will (a) work (b) be there forever.

This. Even Linux is nasty. Qt and GTK are both horrible messes to use.

It would be nice if someone made a way to write desktop apps in JavaScript with a consistent, cross-platform modern UI (i.e. swipe to refresh, tabs, beautiful toggle switches, not microscopic check boxes) but without resorting to rendering everything inside a bloated WebKit browser.


Qt is not a horrible mess to use; the problem is just that people don't bother to learn any tech stack outside the web. It's so obvious that this is the issue to anybody who actually does native development.

That’s what React Native is. But JavaScript is the problem.

Can you explain why GTK is a mess?

Just jumping on the thread. I think the conversation is conflating two very different things:

1. Turing-test UXs, where a chat app is the product and the feature (Electron is fine)

2. The class of things LLMs are good at that often do not need a UI, let alone a chat app, and need automation glue (Electron may cause friction)

Personally, I feel like we're jumping on capabilities and missing a much larger issue of permissioning and security.

In an API or MCP context, permissions may be scoped via tokens at the very least, but within an OS context, that boundary is not necessarily present. Once an agent can read and write files or execute commands as the logged-in user, there's a level of trust and access that goes against most best practices.

This is probably a startup to be hatched, but it seems to me that getting agents scoped properly so they stay in bounds, just like Cursor has rules, would be a prereq before giving access to an OS at all.
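
To make the idea concrete, here is a purely hypothetical sketch of the kind of scope such a layer could enforce; none of these names are a real Codex or Claude API:

    // Hypothetical allowlist an agent runtime would check before any
    // file or exec call, instead of inheriting the full user account.
    interface AgentScope {
      allowRead: string[];   // path globs the agent may read
      allowWrite: string[];  // path globs the agent may write
      allowExec: string[];   // binaries the agent may spawn
    }

    const scope: AgentScope = {
      allowRead: ['./src/**', './package.json'],
      allowWrite: ['./src/**'],
      allowExec: ['node', 'npm'],
    };

    function checkExec(binary: string, s: AgentScope): void {
      if (!s.allowExec.includes(binary)) {
        throw new Error(`agent not permitted to run ${binary}`);
      }
    }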


It’s just irrelevant for most users. These companies are getting more adoption than they can handle, no matter how clunky their desktop apps are. They’re optimizing for experimentation. Not performance.

While this may be true for casual users, for dev-native products like Codex the desktop experience actually matters a lot. When you are living in the tool for hours, latency, keyboard handling, file system access, and OS-level integration stop being "nice to have" and start affecting real productivity. Web or Electron apps are fine for experimentation, but they hit a ceiling fast for serious workflows, especially if the ICP is mostly technical users.

VSCode is arguably one of the most, if not the most, popular code editors these days…

And they're pretty much the only example of an embedded browser architecture actually performing tolerably and integrating well with the native environment.

Still good enough for the majority of the users.

Fair, I think I'm certainly in the minority. Especially now, more than ever, with an increasing number of non-technical people exploring vibe coding, 'good enough' really is good enough for most users.

[flagged]


Well unfortunately, that’s just how I write. None of my posts are LLM-generated, so I'm sorry they come across that way.

Apologies.

It's not irrelevant for developers, nor for users. TikTok has shown that users deeply care about the experience, and they'll flock en masse to something that has a good experience.

The experience in the claude app is fine.

More adoption? I don't think so... It feels to me that these models and tools are getting more verbose/consuming more tokens to compensate for a decrease in usage. I know my usage of these tools has fallen off a cliff as it became glaringly obvious they're useful in very limited scopes.

I think most people start off overusing these tools, then they find the few small things that genuinely improve their workflows which tend to be isolated and small tasks.

Moltbot et al, to me, seems like a psyop by these companies to get token consumption back to levels that justify the investments they need. The clock is ticking, they need more money.

I'd put my money on token prices doubling to tripling over the next 12-24 months.


> I'd put my money on token prices doubling to tripling over the next 12-24 months.

Chinese open weights models make this completely infeasible.


What do weights have to do with how much it costs to run inference? Inference is heavily subsidized, the economics of it don't make any sense.

Anthropic and OpenAI could open source their models and it wouldn't make it any cheaper to run those models. You still need $500k in GPUs and a boatload of electricity to serve like 3 concurrent sessions at a decent tok/s.

There are no open source models, Chinese or otherwise, that can be run profitably and give you productivity gains comparable to a foundation model. No matter what, running LLMs is expensive; the capex required per tok/s is only increasing, and the models are only getting more compute-intensive.

The hardware market literally has to crash for this to make any sense from a profitability standpoint, and I don't see that happening; therefore prices have to go up. You can't just lose billions year after year forever. None of this makes sense to me. This is simple math, but everyone is literally delusional atm.


Open weights mean that the current prices for inference of Chinese models are indicative of their cost to run, because independent providers compete to serve the same weights.

https://openrouter.ai/moonshotai/kimi-k2.5

It's a fantasy to believe that every single one of these 8 providers is serving at incredibly subsidized dumping prices 50% below cost, and that once that runs out you'll suddenly pay double for 1M tokens of this model. It's incredibly competitive with Sonnet 4.5 for coding at 20% of the token price.

I encourage you to become more familiar with the market and stop overextrapolating purely based on rumored OpenAI numbers.


I'm not making any guesses, I happen to know for a fact what it costs. Please go try to sell inference and compete on price. You actually have no clue what you're talking about. I knew when I sent that response I was going to get "but Kimi!"

The numbers you stated sound off ($500k capex + electricity per 3 concurrent requests?). Especially now that the frontier has moved to ultra sparse MoE architectures. I’ve also read a couple of commodity inference providers claiming that their unit economics are profitable.

You're delusional; I didn't even include the labor to install and run the damn thing. More than $500k.

Okay, so you are claiming "every single one of those 8 providers, along with all others who don't serve openrouter but are at similar price points, are subsidizing by more than 50%".

That's an incredibly bold claim that would need quite a bit of evidence, and just waving "$500k in gpus" isn't it. Especially when individuals are reporting more than enough tps at native int4 with <$80k setups, without any of the scaling benefits that commercial inference providers have.


Imagine thinking that $80k setups to run Kimi and serve a single user session is evidence that inference providers are running at cost, or even close to it. Or that this fact is some sort of proof that token pricing will come down. All you one-shotted llm dependents said the same thing about Deepseek.

I know you need to cope because your competency is 1:1 correlated to the quality and quantity of tokens you can afford, so have fun with your Think for me SaaS while you can afford it. You have no clue the amount of engineering that goes into provide inference at scale. I wasn't even including the cost of labor.


It really is insane how far it's gone. All of the subsidization and free usage is deeply anticompetitive, and it is only a profitable decision if they can recoup all the losses. It's either a bubble and everything will crash, or within a few years once the supplier market settles, they will eventually start engaging in cartel-like behavior and ratchet up the price level to turn on the profits.

I suspect making the models more verbose is also a source of inflation. You’d expect an advanced model to nail down the problem succinctly, rather than spawning a swarm of agents that brute force something resembling an answer. Biggest scam ever.

- Video games often use HTML/JS-based UI these days.

- UE5 has its own custom UI framework, which definitely does not feel "native" on any platform. Not really any better than Electron.

- You can easily call native APIs from Electron.

I agree that Electron apps that feel "web-y" or hog resources unnecessarily are distasteful, but most people don't know or care whether the apps they're running use native UI frameworks, and being able to reassign web developers to work on desktop apps is a significant selling point that will keep companies coming back to Electron instead of native.


I have been building Desktop apps with Go + Wails[1]. I happen to know Go, but if you are ai-coding even that is not necessary.

A full-fledged app that does everything I want is ~10 MB. I know Tauri + Rust can get it to probably 1 MB, but that is a far cry from these Electron-based apps shipping 140 MB+. My app at 10 MB does a lot more and has tons of screens.

Yes, it can be vibe-coded, so there is especially no excuse these days.

[1] https://wails.io/

Microsoft Teams, Outlook, Slack, Spotify? Cursor? VSCode? I have like 10 copies of Chrome on my machine!


I've looked into Tauri and Wails, but they don't seem realistic for a cross-platform app with wide distribution across multiple platforms and platform versions.

One of Electron's main selling points is that you control the browser version. Anything that relies on the system web view (like Tauri and Wails) will either force you to aggressively drop support for out-of-date OS versions, or constantly check caniuse.com and ship polyfills like you're writing a normal web app. It also forces you to test CSS that touches form controls or window chrome on every supported major version of every browser, which is just a huge pain. And you'll inevitably run into bugs with the native -> web glue that you wouldn't hit with Electron.

It is absolutely wasteful to ship a copy of Chrome with every desktop app, but Tauri/Wails don't seem like viable alternatives at the moment. As far as I can tell, there aren't really any popular commercial apps using them, so I imagine others have come to the same conclusion.
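
The "you don't control the browser" tax is the ordinary web one: feature-detect and polyfill. A small hedged example of the pattern:

    // With Electron you pin Chromium and skip this; with a system webview
    // you guard newer APIs just like on the open web.
    if (!('structuredClone' in globalThis)) {
      // crude fallback for an older webview (illustration only)
      (globalThis as any).structuredClone = (v: unknown) =>
        JSON.parse(JSON.stringify(v));
    }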


If the web interface of a website can serve the same HTML to all browsers, these UIs can as well. I don't think we have IE6-level incompatibility these days. I have no idea what specific incompatibility you are talking about. I am writing my 4th desktop app since early 2025.

But sure, you could have some specific need, but I find it hard to believe for these simple apps.


Yes, but most people don’t do that. Companies are optimizing to ship features fast, not trying to min/max resource usage when majority don’t care.

This is a new era where “if it works more or less well, ux/dx is fine, let’s ship it” has more moat than ever. Everything else is really secondary.


I am in love with Wails. Having a Python and JS background with no Go experience, I PM'ed AI agents to create a fairly complex desktop app for my own use. It is night and day in terms of performance compared to the lightest Electron app.

Wails is pretty good. I wrote a couple of apps but since I'm on macOS I ended up rewriting them in SwiftUI and that's much lighter of course since it uses all native APIs.

Interesting. Does the Wails renderer support the full set of what Webkit/Chromium supports?

Wow Wails looks interesting! Hadn't heard of it before.

Given that OpenAI managed to single-handedly triple the price of RAM globally, people will very much care about their chat application consuming what little scraps are left over for them, even if they don't know enough about anything to know why their system is running poorly.

It's baffling to me that people still throw around the word "native" like it means anything. Go use VSCode or Obsidian, then go use Apple Music. Electron can be so much better than anything native. The problem isn't that macOS ChatGPT, Codex, or Claude isn't native. Their apps just really suck. They're poorly engineered and bad.

Apple Music isn't native either.

Oh right, I forgot, "native" just means "good". So if an app is bad, it can't be native, and if an Electron app is actually good, it's because they're doing crazy optimizations that aren't feasible for mortal souls, so don't even think about it. This is the "Hacker News Law of Application Nativeness".

My main take is exactly the opposite: why not build everything with a simple text interface (shell commands) so the models learn to use these tools natively in pretraining? Even TUIs like codex-cli or claude code are needless abstractions for such use cases and make full automation hard. You could add as many observability or input layers for humans as you want, but the core should be simple calls that are saved in history and documentation logs. [The headless/noninteractive modes come close, as do the session logs.]

TUI is easy to train on, but hard to use for users. Part of the reason it’s easier to have LLMs use a bunch of Unix tools for us is that their text interface is tedious and hard to remember. If you’re a top 5% expert in those tools it doesn’t matter as much I guess but most people aren’t.

Even a full-featured TUI like Claude Code is highly limited compared to a visual UI. Conversation branching, selectively applying edits, flipping between files, all are things visual UI does fine that are extremely tedious in TUI.

Overall it comes down to the fact that people have to use TUI and that’s more important than it being easy to train, and there’s a reason we use websites and not terminals for rich applications these days.


I use headless mode (-p) and keep my named shell histories as journals (so no curses/TUI or GUI). But session management and branching could improve at the tool level and allow seamless integration with completion tools, which could be a simple, fast AI looking at recent sessions, or even be visual, say for navigating and extending a particularly complex branch of a past session. It is not too hard for me to work with such shell-based flows within Emacs, but it would be nice if there were some standardization effort on the tool front and some additional care for future automation. I don't want my AI clicking buttons if it can be precise instead. And I certainly want multithreading. I think of AI more as an OS; it needs a shell more than it needs windows at this point in time.

All the examples for visual UI are tasks which are already (or will soon be) done by the agent, not the human. Hence not needed.

I suspect that the final(*) UI is much more similar to a TUI: kind of conversational (human <> AI). Current GUIs provided by your bank etc. are much less effective/useful for us compared to the conversational way: 'show/do the thing I just need'. Not to mention the walled-garden effect (or the lack of it), and attention-grabbing not in the user's interest (popups, self-promo, nagging). Also factor in age, and that we do not have to learn yet another GUI (teach a new bank app to your mom ;). So at least 4 distinct and important advantages for the TUI.

My bet: TUI/conversation win (*).

*) There will be some UIs where graphical information density is important (air traffic control?), especially in time-critical environments. Yet even there I suspect it's more like a conversation with dynamic images/reports/graphs generated on the go, not the UI per se.


It would be cool if I didn't have to worry about whether I was "in" or "out" of the AI TUI. Right now, I need at least two terminals open: one running my shell, where I issue shell commands, and one running Claude, where I can't. It would be nice if it could just be my shell, and when I wanted to invoke Claude, I'd just type:

   c Do this programming task for me.
Right in the shell.

Most AI agents have a 'bash mode', and you can use the Warp terminal, which is terminal-first but makes it easy to invoke the AI from the terminal. For example, if you mangle a jq command, it will use AI to suggest the right way to do it.

isn't that what Simon Willison's `llm` does?

edit: https://github.com/simonw/llm



Oh wow, nice. Does it remember context from run to run?

I agree. I like using Claude or Codex in a VM on top of tmux. Much more flexibility that way. I open a new tmux window for each issue/task big enough to warrant it, issue a prompt to create a worktree and agents, and let them go to town. I actually use Claude and Codex at the same time. I still get observability because of tmux, and I can close my laptop and let them cook for a while in yolo mode, since the VM is frequently backed up in Proxmox PBS. I am a retired hobbyist, but this has been a nice force multiplier without devolving into a complete vibey mess. I hope these new orchestration tools support this like VS Code remote development does. Same for cloud: I want them to support my personal "cloud" instead of the laggy GitHub mess.

> even with the help of AI.

This is what you get when you build with AI, an electron app with an input field.


Doesn't have to be. I just revived one of my C++ GLFW apps from 2009. Codex was able to help me get it running again and added some nice new features.

I guess you get an Electron app if you don't prompt it otherwise. Probably because it's learned from what all the humans are putting out there these days.

That said.. unless you know better, it's going to keep happening. Even moreso when folks aren't learning the fundamentals anymore.


I've done way, way more than that, as I'm sure others have too.

This is just bad product management.


So where is all this amazing software that you and others built with AI?

All I see is hype blog posts and pre-IPO marketing by AI companies, not much being shipped though.


You won't see it because it's mostly personal software for personal computers.

I've got a medical-doctor-handwriting decipherer, a board game simulator that takes a PDF of the rulebooks as input, and accounting/budgeting software that can interface with my bank via email because my bank doesn't have an API.

None of that is of any use to you. If you happen to need similar software, it will be easier for you to ask your own AI to make a custom one for you rather than adapt the ones I had my AI make for me.

Under the circumstances, I would feel bad shipping anything. My users would be legitimately better off just vibe coding their own versions.


I disagree. There is a tier of people who can't vibe code what you've vibe coded, but also might not trust your app (especially the bank one). There is still a real gap here to be filled by professional work or fakers.

Professionals are doing what I am doing, only inside companies. They make custom software that solves ultra-specific problems of that one company.

I don't quite understand the obsession with shipping fancy enterprise b2b saas solutions. That was the correct paradigm for back when developing custom code was expensive. Now it is cheap.

Why pay for Salesforce when you only use 1% of Salesforce's features? Just vibe code the 1% of features that you actually need, plus some custom parts to handle some cursed bespoke business logic that would be a pain in the ass to do in Salesforce anyway.


Let me know when Stacy from HR vibe codes her own salesforce alternative, sounds very cool!

Quality takes time. I'm still in stealth.

There is a guy on twitter documenting his progress with moltbot/openclaw: https://x.com/Austen/status/2018371289468072219. Apparently he has already registered an LLC for his bot so he can make money w/ it.

Some guy on Twitter selling an AI coding boot camp is an interesting example. Also it's literally just a post of him looking for a real developer to review his vibe coded bot???

What does the bootcamp have to do w/ anything? He is using AI slop to make money, that's all that matters in a socio-economic system wherein everyone & everything must make profits to persist. Edit: found another example from coinbase: https://x.com/0xEricBrown/status/2018082458143699035.

Edit: I'm not going to keep addressing your comment if you keep editing it. You asked for an example & I found two very easily. I am certain there are many others so at this point the onus is on you to figure out what exactly it is you are actually arguing.


Your first example is just a twitter post of a guy asking for a developer to review his vibe coded bot. Nothing shipped.

The second example is twitter post of a crypto bro asking people to build something using his crypto API. Nothing shipped.

Literally nothing shipped, just twitter posts of people selling a coding bootcamp and crypto.


You should make a note & revisit this in a few days to figure out whether your assessment was correct or not. I've already wasted enough time here so good luck.

Their goal is to ship as fast as possible b/c they don't care about what you care about. Their objective is to gather as much data as possible & electron is good enough for that.

I work at OpenAI, and I get the concern. From our side, this was a careful tradeoff: Electron lets us iterate faster and makes it possible to bring the Codex app to Windows and Linux very soon. That doesn’t mean performance or UX don’t matter—we’re actively paying attention to both.

Would genuinely love your thoughts if you try it. Early users have been surprised by how native it feels!


the problem is that getting this out in this shape the week after Cursor made $100M ARR would have made sense

getting it out now suggests there are structural problems about how decisions get made and code gets shipped—and the "iterate faster" line feels misplaced


Your AI can’t make a UI for 3 platforms? Seems pretty worthless.

I use Google's antigravity so I personally have no problem w/ electron applications. At the end of the day UI performance is not a bottleneck for me.

Aaaaand that is why we, as end users, get machines which are sluggish: because literally every. Single. Application. is taking this attitude.

Shock horror, the waste adds up, and it adds up extremely quickly.


I don't really care about memory or how much of it is taken up by the editor. I have enough memory for the work that I do & UI performance would make no difference to my workflow.

Claude Code is perfectly capable of writing low-level rendering and input code, and it can be equally as mindless as vibe coding web apps.

E.g. just say "write a c++ gui widget library using dx11 and win32 and copy flutters layout philosophy, use harfbuzz for shaping, etc etc"


At the end of the day LLMs just reproduce the consensus of the internet, so it makes sense that a coding agent would spit out software that looks like most of what's on the internet.

LLM output is called slop for a reason.


It's amazing to me that in the official Gemini app on Android, the hamburger menu on the left does not open on swipe - only if you click the hamburger button. A basic native UI interaction in the home screen of one of Google's flagship apps.

Is there any actual problem you have with the application? If not, who cares? Is this the 10 year old Electron is slow and bad meme being trotted out again?

I used wxWidgets to build a native GUI for Windows + Mac 10+ years ago and implemented all GUI-drawing (it was an audio signal processor control software so included meters, faders, knobs, audio spectrum and I even incorporated Horde3D OpenGL interface for visualising an arena [sadly never fully finished to full potential as my modelling abilities in Blender simply wasn't good enough]). I wrote that, and another guy wrote the network library in C that sent signals to the network devices, and received them. I responded to the incoming network info to draw appropriate parts of the UI like meters/scopes at 50ms minimum.

The fact that we did this as a 1-man team for the GUI and that I can still compile it today (if I had the code) against wxWidgets, to then run on macOS and Windows simply shows the lazy nature of (most/all?) desktop apps by big companies these days.

I utterly detest using them, but it seems customers think an app that takes 5 seconds to launch with a spinning whirly wheel and horizontal gradient animation over list views for 5+ seconds before content is loaded is perfectly acceptable. Quality with a capital K!


Desktop GUI is a lost art. Gen X were the last people to master it.

I am a gen X, this hurts, but it might be true :(

Anthropic bought Bun, and its creator relentlessly optimizes Claude Code now.

So, there seems to be light.


What features are they missing that a native app would allow for?

No one outside of a small sliver of the tech community cares if an app is built with web tech.

Electron also opens up easier porting to Linux which almost certainly wouldn't happen if companies insist on native only


Users care about performance and jank, it’s just that they’ve been successfully forced to shut-up-and-deal-with-it. They’re not involved in purchasing or feedback, and the people that are don’t use it enough to care, or just don’t care. Users who complain about it may as well shout into the void for how much companies take note, but hey, at least we got an ai button now!

Atlassian products are a great example of this. Everyone knows Atlassian has garbage performance. Everyone complains about it. Never gets fixed though. Everyone I know could write customer complaints about its performance in every feedback box for a year, and the only thing that would happen is that we’d have wasted our own time.

Users _care_ about this stuff. They just aren’t empowered to feedback about it, or are trained to just sigh and put up with it.


I find that, outside of specific use cases, performance and jank are down to the developers and not whether it's native or not.

Obsidian is an Electron app which is pretty much universally loved. We can both give single examples


I think you have to be more nuanced here: perf becomes important only at the extremes. I think there are compromises to be made between perf and go-to-market.

“They just aren’t empowered to feedback about it, or are trained to just sigh and put up with it” is a roundabout way of saying users don’t care about it enough.

Software decisions are often not made by who will use said software.

It is not that easy to build such an app from scratch... it all requires a lot of work, even with AI help. I think the most important thing is to provide an easy-to-use UI first; if speed or missing features become blockers for a further innovation step, then maybe a native app will be created at some point.

Why should they? It's not like these apps will be around for long

The brief video OpenAI did sets the stage as I see it: for an evolving kind of engineer, one who will think and design as much as write code.

AI has more training on web apps. I think the real answer is that these guys are told they need to ship in 2 months and have a huge team of web devs.

Devex tooling is so much better on the web; you can ship much faster, and speed to market matters more than making a native app. Apple dev tooling and build speed suck a ton in comparison, and don't get me started on Windows.

This is the reason. When an app changes so often, across different platforms, you don't want to support people lagging behind on old versions.

I'm not saying native is better or worse, but this will be why.


Why would widgets and buttons be better than a console, and or voice?

Because not everything can be described in code, language, or speech. If you're iterating on anything that requires refinement in terms of perception, you may need real-time feedback.

Because you see stuff before you decide what to invoke?

I feel like it's because of the way they are internally structured. They have some people with machine-learning backgrounds working on post-training, and other people who have nice product-management resumes. The post-training people want more compute, well-formatted data, and, honestly, more ways to try whatever reinforcement-learning technique they want.

The product people are building things, but OpenAI has literally been throwing stuff at the wall and it hasn't been sticking. They seem to be behind in terms of everything user interface. Canvas came after Anthropic had artifacts. Codex came after Anthropic had Claude Code.

Some of their researchers (okay one) have (has) stated they believe in interface work. That's because GUIs help engage the person beyond thought, and help the person work with more complexity (perception, physics, form, 3D). But they're playing catchup, or they're trying to incubate wins in science / math.


> This requires calling native APIs (e.g., Win32), which is not feasible from Electron.

That is not correct. One of the major selling points of Electron is precisely that you can call native APIs.
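For example, something like this works from Electron's Node runtime. A hedged sketch, assuming the third-party koffi FFI package (not necessarily how any shipping app does it):

    # Call a Win32 API (MessageBoxA from user32.dll) directly from Node.
    npm install koffi
    node -e "
      const koffi = require('koffi');
      const user32 = koffi.load('user32.dll');
      // Declare the C prototype once, then call it like a JS function.
      const MessageBoxA = user32.func(
        'int MessageBoxA(void*, const char*, const char*, unsigned int)');
      MessageBoxA(null, 'Hello from Node', 'Win32 via FFI', 0);
    "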


Despite being an Electron app, it doesn't work on Intel macs...

This is what happens when an entire generation of programmers doesn’t know how to write code in any language other than JavaScript or maybe Python.

How far from grace we have fallen :sob:


It’s less that and more that we’re still figuring out the best interface for using them.

For coding, interacting with the agent is best done via chat; especially if you’re trying to run teams of agents, you’re not going to be looking at code all the time. The agent will summarize the changes, you will click on the diffs, approve them, and move on. So it’s a very different experience from being the only one coding.

Edit:

Here’s a hot take -

A quick note on SwiftUI, it’s a piece of garbage that’s so hard to use that native devs despise it. So far no AI has been able to one shot it for me.

Blender most likely uses immediate mode - which is more resource intensive and less efficient than a stateful object oriented interface. But Zed uses a similar approach with (I think) success.

Then think about this: pre-AI, Google, with all its billions, used web interfaces in all its desktop product GUIs :)

Apple, with all of its billions, created Xcode, which is inferior to every other IDE I have ever used. They still haven’t learned from Visual Studio. Microsoft is bad at a lot of things, but developer tooling isn’t one of them.

All that to say, even if you knew what you wanted, taking that vision to reality is a difficult challenge. At least with AI, they could create 100 different prototypes cheaply and pick the best direction based on that. That's what they should do, and they probably aren’t.


Citing Unity, Unreal Engine, and Blender as good examples of UI is a joke in itself; I can't take the comment seriously after that.

CLI > all. Don't be an application peasant.

What if they kept the 'good stuff' from us? Does that seem likely here?

LLMs are trained on web slop, not engineering

Unpopular opinion: why would you want a "native app"? On macOS, basically every app Apple makes themselves is worse in terms of design, usability, and performance than a popular Electron-based alternative.

For example, I tried opening a 200MB log file in Apple's Console.app and it hung. Opened right up in VS Code.


What is the superior Electron based alternative to Pixelmator Pro? Final Cut? Logic? Pages?

People's mileage may vary, but in my instance, this was so bad that I actually got angry while trying to use it.

It's slow and stupid. It does not do proper research. It does not follow instructions. It randomly decides to stop being agentic and instead just dumps the code for me to paste. It has the extremely annoying habit of just doing stuff without understanding what I meant, making a mess, then claiming everything is fine. The outdated training data is extremely annoying when working with Nuxt 4+. It is not creative at solving problems. It doesn't show the thinking. The Undo feature does not give proper feedback on the diff or on whether it actually did undo. And I hate the personality. It HAS to be better than it comes off for me, because I am actually in a bad mood after having worked with it. I would rather YOLO code with Gemini 3 Flash, since it's actually smarter in my assessment, and at least I can iterate faster, and it feels like it has better common sense.

Just as an example, I found an old, terrible app I made years ago for our firm that handles room reservations. I told it to update from Bootstrap to Flowbite UI. Codex just took forever to make a mess, installed version 2.7 when 4.0.1 is the latest, even when I explicitly stated that it should use the absolute latest version. Then it tried to install it and failed, so it reverted to the outdated CDN.

I gave the same task to Claude Code. Same prompt. It one-shotted it quickly. Then I asked it to swap out ALL the fetch logic for SPA-like functionality with the new beta 4 version of HTMX, and it one-shotted that too, in the time Codex spent just trying to read a few files in the project.

This reminds me of the feeling I had when I got the Nokia N800. It was so promising on paper, but the product was so bad and terrible to use that I knew Nokia was done for. If this was their take on what an acceptable smartphone could be, it proves that the whole foundation is doomed. If this is OpenAI's take on what an agentic coding assistant should be (something that can run by itself and iterate until it completes its task in an intelligent and creative way), then OpenAI is doomed.


If you're using 5.2 high, with all due respect, this has to be a skill issue. If you're using 5.2 Codex high — use 5.2 high. gpt-5.2 is slow, yes (ok, keeping it real, it's excruciatingly slow). But it's not the moronic caricature you're saying it is.

If you need it to be up to date with your version of a framework, then ask it to use the context7 mcp server. Expecting training data to be up to date is unreasonable for any LLM and we now have useful solutions to the training data issue.
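If I recall the syntax correctly, registering it with Claude Code is a one-liner (Codex takes MCP servers via ~/.codex/config.toml instead); double-check the endpoint before trusting this:

    # Register the Context7 MCP server so the agent can pull current docs.
    claude mcp add --transport http context7 https://mcp.context7.com/mcp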

If you need it to specify the latest version, don't say "latest". That word would be interpreted differently by humans as well.

Claude is well known for its one-shotting skills. But that comes at the expense of strict instruction-following adherence and thinner context (it doesn't spend as much time gathering context in larger codebases).


I am using GPT-5.2 Codex with reasoning set to high via OpenCode and Codex and when I ask it to fix an E2E test it tells me that it fixed it and prints a command I can run to test the changes, instead of checking whether it fixed the test and looping until it did. This is just one example of how lazy/stupid the model is. It _is_ a skill issue, on the model's part.

Non-Codex GPT-5.2 is much better than Codex GPT-5.2 for me. It does everything better.

Yup, I find it very counter-intuitive that this would be the case, but I switched today and I can already see a massive difference.

Codex runs in a stupidly tight sandbox and because of that it refuses to run anything.

But using the same model through pi, for example, it's super smart because pi just doesn't have ANY safeguards :D


I'll take this as my sign to give Pi a shot then :D Edit: I don't want to speak too soon, but this Pi thing is really growing on me so far… Thank you!

I refuse to defend the 5.2-codex models. They are awful.

Perhaps if he was able to get Claude Code to do what he wanted in less time, and with a better experience, then maybe that's not a skill he (or the rest of us) want to develop.

Talking LLMs off a ledge is a skill we will all need going forward.

Sure, that's fine. I wrote my comment for the people who don't get angry at AI agents after using them for the first time within five hours of their release. For those who aren't interested in portending doom for OpenAI. (I have elaborate setups for Codex/Claude btw; there's no fanboying in this space.)

Some things aren't common sense yet, so I'm trying to do my part to make them so.


Feelings are information with just as much value as biased intellectualizing, or more.

Ask Linus Torvalds.


TBH, "use a package manager, don't specify versions manually unless necessary, don't edit package files manually" is an instructions that most agents still need to be given explicitly. They love manually editing package.json / cargo.toml / pyproject.toml / what have you, and using whatever version is given in their training data. They still don't have an intuition for which files should be manually written and which files should be generated by a command.

Agree, especially if they're not given access to the web, or if they're not strongly prompted to use the web to gather context. It's tough to judge models and harnesses by pure feel until you understand their proclivities.

Ty for the tip on context7 mcp btw

Ok. You do you. I'll stick with the models that understand what latest version of a framework means.

How would a person interpret the latest version of flowbite?

Agreed, had the same experience. Codex feels lazy - I have to explicitly tell it to research existing code before it stops giving hand-wavy answers. Doc lookup is particularly bad; I even gave it access to a Context7 MCP server for documentation and it barely made a difference. The personality also feels off-putting, even after tweaking the experimental flag settings to make it friendlier.

For people suggesting it’s a skill issue: I’ve been using Claude Code for the past 6 months and I genuinely want to make Codex work - it was highly recommended by peers and friends. I’ve tried different model settings, explicitly instructed it to plan first and only execute after my approval, tested it on both Python and TypeScript backend codebases. Results are consistently underwhelming compared to Claude Code.

Claude Code just works for me out of the box. My default workflow is plan mode - a few iterations to nail the approach, then Claude one-shots the implementation after I approve. Haven’t been able to replicate anything close to that with Codex


Curious, are you doing the same planning with Codex, out-of-band or otherwise? To have the same measurable outcome you'd perhaps need to use Codex in a plan state (there are experimental settings - not recommended) or by other means (an explicit, detailed, reusable prompt for planning a change). It's a missing feature if your preference is planning in the CLI (I do not prefer this).

You are correct in that this mode isn't "out of the box" as it is with Claude (but I don't use it in Claude either).

My preference is to have smart models generate a plan with provided source. I wrote (with AI) a simple Python tool that filters a codebase and lets me select all files or just a subset. I then attach that as context and have smart models with large context (usually Opus, GPT-5.2, and Gemini 3 Pro in parallel) each give me their version of a plan. I then take the best parts of each plan, slap them into a single markdown file, and have Codex execute in a phased manner. I usually specify that the plan should be phased.
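The tool is basically the following, minus the interactive file picker (globs are illustrative):

    # Bundle a subset of the repo into one markdown file to hand to the
    # planning models as context.
    git ls-files 'src/*.ts' 'docs/*.md' | while read -r f; do
        printf '\n## %s\n\n' "$f"
        sed 's/^/    /' "$f"    # indent the file body as a code block
    done > plan-context.md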

I prefer out-of-CLI planning because, frankly, it doesn't matter how well Codex or Claude Code dive in; they always miss something unless they read every single file and config. And if they do that, they tip over. Doing it out of band with specialized tools, I can ensure they give me a high-quality plan that aligns with the code and expectations, in a single shot (much faster).

Then Claude/Codex/Gemini implement the phased plan - either all at once - or stepwise with me testing the app at each stage.

But yeah, it's not a skill issue on your part if you're used to Plan -> Implement within Claude Code. The Experimental /collab feature does this but it's not supported and more experimental than even the experimental settings.


I just want Anthropic to spend like two weeks making their own "Codex app", but with Opus.

I'm not taking OpenAI's side here, but have you reviewed what Claude did?

I only use Claude through the chat UI because it’s faster and it gives me more control. I read most of it, and the code is almost always better than what I would do, simply because lazy-ass me likes to take shortcuts way too often.

Genuinely excited to try this out. I've started using Codex much more heavily in the past two months and honestly, it's been shockingly good. Not perfect mind you, but it keeps impressing me with what it's able to "get". It often gets stuff wrong, and at times runs with faulty assumptions, but overall it's no worse than having average L3-L4 engs at your disposal.

That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...

Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.

Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.


I have the $20 a month subscription for ChatGPT and the $200/year subscription to Claude (company reimbursed).

I have yet to hit usage limits with Codex. I continuously reach it with Claude. I use them both the same way - hands on the wheel and very interactive, small changes and tell them both to update a file to keep up with what’s done and what to do as I test.

Codex gets caught in a loop more often trying to fix an issue. I tell it to summarize the issue, what it’s tried and then I throw Claude at it.

Claude can usually fix it. Once it is fixed, I tell Claude to note it in the same file, and then I go back to Codex.


The trick to reach the usage limit is to run many agents in parallel. Not that it’s an explicit goal of mine but I keep thinking of this blog post [0] and then try to get Codex to do as much for me as possible in parallel

[0]: http://theoryofconstraints.blogspot.com/2007/06/toc-stories-...
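Concretely, the fan-out can be as dumb as this (a sketch; the task names are invented, and `codex exec` is the CLI's non-interactive mode):

    # One background headless agent per independent task, joined at the end.
    for task in fix-login add-tests update-deps; do
        git worktree add "../wt-$task" -b "$task"
        ( cd "../wt-$task" && codex exec "Do the $task task per TODO.md" ) \
            > "$task.log" 2>&1 &
    done
    wait    # the bottleneck becomes review bandwidth, not agent count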


Telling a bunch of agents to do stuff is like treating the AI as a senior developer you trust to take an ambiguous business requirement, use their best judgment, and ask you if they have a question.

But doing that with AI feels like hiring an outsourcing firm for a project and having them come back with an unmaintainable mess that’s hard to reason through 5 weeks later.

I very much micro manage my AI agents and test and validate its output. I treat it like a mid level ticket taker code monkey.


My experience with good outsourcing firms is that they come back with heavily-documented solutions that are 95% of what you actually wanted, leaving you uncomfortably wondering if doing it yourself woulda been better.

I’m not fully sure what’s worse, something close to garbage with a short shelf life anyone can see, or something so close to usable that it can fully bite me in the ass…


I fully believe that if I didn’t review its output and ask it to clean it up, it would become unmaintainable real quick. The trick I’ve found, though, is to be detailed enough in the design on both a technical and non-technical level, sometimes iterating a few times on it with the agent before telling it to go for it (which can easily take 30 minutes).

That’s how I used to deal with L4, except codex codes much faster (but sometimes in the wrong direction)


It’s funny over the years I went from

1. I like being hands on keyboard and picking up a slice of work I can do by myself with a clean interface that others can use - a ticket taking code monkey.

2. I like being a team lead /architect where my vision can be larger than what I can do in 40 hours a week even if I hate the communication and coordination overhead of dealing with two or three other people

3. I love being able to do large projects by myself, including dealing with the customer, where the AI can do the grunt work I used to have to depend on ticket-taking code monkeys to do.

Moral of the story: if you are a ticket-taking “I codez real gud” developer, you are going to be screwed no matter how many B-trees you can reverse on the whiteboard.


Moral of your story.

Each and everyone of us is able to write their own story, and come up with their own 'Moral'.

Settling for less (if AI is a productivity booster, which is debatable) doesn't equal being screwed. There is wisdom in reaching your 'enough' point.


If you look at the current hiring trends and how much longer it is taking developers to get jobs these days, a mid level ticket taker is definitely screwed between a flooded market, layoffs and AI.

By definition, this is the worst AI coding will ever be, and it’s pretty good now.


> By definition, this is the worst AI coding will ever be

This may be true, but it's not necessarily true, and certainly not by definition. For example, formal verification by deductive methods has improved over the past four decades, and yet, by the most important measures, it's got worse. That's because the size of software it can be used to verify affordably has grown, but significantly slower than the growth in the size of the average software project. I.e. it can be used on a smaller portion of software than it could be used on decades ago.

Perhaps ironically, some people believe that the solution to this problem is AI coding agents that will write correctness proofs, but that is based on the hope that their fate will be different, i.e. that their improvement will outpace the growth in software size.

Indeed, it's possible that AI coding will make some kinds of software so cheap that their value will drop to close to zero, and the primary software creation activity by professionals will shift precisely to those programs that agents can't (yet) write.


I am really not convinced yet.

From all the data I have seen, the software industry is poised for a lot more growth in the foreseeable future.

I wonder if we are experiencing a local minimum on a longer upward trend.

Those that do find a job in a few days aren't online writing about it, so based on what is online we are led to believe that it's all doom and gloom.

We also come out of a silly growth period where anyone who could sort a list and build a button in React would get hired.

My point is not that AI-coding is to be avoided at all costs, it's more about taming the fear-mongering of "you must use AI or will fall behind". I believe it's unfounded - use it as much or as little as you feel the need to.

P.S.: I do think that for juniors it's currently harder and requires intentional effort to land that first job - but that is the case in many other industries. It's not impossible, but it won't come on a silver plate like it did 5-7 years ago.


I mean, it is online that major tech companies have laid off a couple of hundred thousand people. What companies are going to absorb all of these people?

Anyone who hires can tell you one open req gets hundreds of applicants within 24 hours. LinkedIn easy apply backs that up.

I have two anecdotes from both sides. I applied for 200 jobs for a bog standard “C#/Python/Typescript” enterprise developer who had AWS experience. I heard crickets and every application had hundreds of applicants - LinkedIn shows you.

Did I mention according to my resume (I only went back 10 years) I had 10 years of experience as a developer including 2.5 leading AWS architecture at a startup and 3.5 actually working at AWS (ProServe)?

I had 8 jobs since 1996 and I’ve always been able to throw my resume up in the air and by the time it landed I would have three offers. LinkedIn showed that my application had hardly been viewed and my resume only downloaded twice.

Well, everything I said above is true. But it was really just an experiment while I was waiting for my plan A outreach to work - targeting companies in an AWS niche where, at the time, I could reasonably claim to be one of the industry experts, with major open source contributions to a popular official “AWS Solution”, and leaning on the network of directors, CTOs, etc. that I had established over the years.

None of them were looking for “human LLM code monkeys” that are a dime a dozen.

On the other hand, I’m in the hiring loop at my company. Last year we had over 6000 applicants and a 4% offer rate.

Who is going to absorb or need a bunch of mid level ticket takers in the future with AI improving? Or at least enough to absorb all of the ones who are currently being laid off and the ones coming in?


I will say that doing small modifications or asking a bunch of stuff fills the context the same in my observations. It depends on your codebase and the rest of stuff you use (sub agents, skills, etc)

I was once minimising the changes and trying to get the most out of it. I did an uncountable number of tests and variations. It didn't really matter much whether I told it to do it all or change one line. I feel Claude Code tries to fill the context as fast as possible anyway.

I am not sure how worthwhile Claude is right now. I still prefer it to Codex, but I am starting to feel that's just a bias.


I don’t think it’s bias: I have no love for any of these tools, but in every evaluation we’ve done at work, Opus 4.5 continually comes out ahead in real world performance

Codex and Gemini are both good, but slower and less “smart” when it comes to our code base


I hit the Claude limit within an hour.

Most of my tokens are used arguing with the hallucinations.

I’ve given up on it.


Do you use Claude Code, or do you use the models from some other tool?

I find it quite hard to hit the limits with Claude Code, but I have several colleagues complaining a lot about hitting limits and they use Cursor. Recently they also seem to be dealing with poor results (context rot?) a lot, which I haven't really encountered yet.

I wonder if Claude Code is doing something smart/special


In my case I've had it (Opus Thinking in CC) hit 80% of the 5-hour limit and 100% of the context window with one single tricky prompt, only to end up with worthless output.

Codex at least 'knows' to give up in half the time and 1/10th of the limits when that happens.


I don't want to be That Guy, but if you're "arguing with hallucinations" with an AI Agent in 2026 you're either holding it wrong or you're working on something highly nonstandard.

I have found Codex to be an exceptional code reviewer of Claude's work.

Your goal should be to run agents all the time, all in parallel. If you’re not hitting limits, you’re massively underutilizing the VC intelligence subsidy

https://hyperengineering.bottlenecklabs.com/p/the-infinite-m...


Hey thank you for calling out the broken link. That should be fixed now. Will make sure to track down the other broken links. We'll track down why loading is taking a while for you. Should definitely be snappier.

Is this the only announcement for Apple platform devs?

I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...


Same here. From my experience, Codex usually knocks backend/highly "logical" tasks out of the park, while it stumbles at times over fairly basic front-end/UI tasks.

But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.


Backends, regardless of language or framework, are often set in stone: there's a well-defined, most-used way to do everything, especially since most apps reduce to CRUD. Frontend, by the nature of how frontend works, can be completely different from project to project if one wants to architect it efficiently.

Cool, looks like I'll stay on Cursor. All the alternatives come out buggy; Cursor cares a lot about developer experience.

BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.


(I work on Codex) One detail you might appreciate is that we built the app with a ton of code sharing with the CLI (as core agent harness) and the VSCode extension (UI layer), so that as we improve any of those, we polish them all.

Any chance you'll enable remote development on a self-hosted machine with this app?

I.e., I think the Codex webapp on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).
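In the meantime, the CLI side of this already works over plain SSH (hostname invented):

    # Run codex on the beefy box inside tmux; -A attaches to the session
    # if it already exists, so you can reconnect from any client.
    ssh gpu-box -t 'tmux new-session -A -s codex codex'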


Not going to solve your exact problem but I started this project with this approach in mind

https://github.com/jgbrwn/vibebin


This should be table stakes by now. That's the beauty of these cli tools and how they scale so well.

What are the benefits of using the codex webapp?

Working remotely with the app would truly be great

Interested in this as well.

Any reason to switch from vscode with codex to this app? To me it looks like this app is more for non-developers but maybe I’m missing something

Good question! VS Code is still a great place for deep, hands-on coding with the Codex IDE extension.

We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!


I already have multiple projects that I manage in full screen via VS Code. I just move from one to the other using “cmd” + “->”. You should be aware that the Claude Code extension for VS Code is way better than the Codex extension, so perhaps you should work a bit on that as well. Even if the agents do 80% of the work, I still need to check what they do, and a familiar IDE seems the first choice of an existing/old-school developer.

OK, 'projects', but this would make a lot more sense if we could connect remotely to the projects, which already works without a problem using the IDE plugin. So right now I don't see any advantage of using this.

Awesome. Any chance we will see a phone app?

I know coding on a phone sounds stupid, but with an agent it’s mostly approvals and small comments.


The ChatGPT app on iOS has a Codex page, though it only seems to be for the "cloud" version.

Looks like another Claude App/Cowork-type competitor with slightly different tradeoffs (Cowork just calls Claude Code in a VM, this just calls Codex CLI with OS sandboxing).

Here's the Codex tech stack in case anyone was interested like me.

Framework: Electron 40.0.0

Frontend:

- React 19.2.0

- Jotai (state management)

- TanStack React Form

- Vite (bundler)

- TypeScript

Backend/Main Process:

- Node.js

- better-sqlite3 (local database)

- node-pty (terminal emulation)

- Zod (validation)

- Immer (immutable state)

Build & Dev:

- pnpm (package manager)

- Electron Forge

- Vitest (testing)

- ESLint + Prettier

Native/macOS:

- Sparkle (auto-updates)

- Squirrel (installer)

- electron-liquid-glass (macOS vibrancy effects)

- Sentry (error tracking)


They have the same stack as a boot camper; quite telling.

The use of the name Codex and the focus on diffs and worktrees suggest this is still more dev-focused than Cowork.

It's a smart move – while Codex has the same aspirations, limiting it to savvy power users will likely lead to better feedback, and less catastrophic misuse.

> this just calls Codex CLI with OS sandboxing

The git and terminal views are a big plus for me. I usually have those open and active in addition to my codex CLI sessions.

Excited to try skills, too.


Is the integration with Sentry native or via MCP?

What does Sentry via MCP even mean? You want the LLM to call Sentry itself whenever it encounters an error?

Meaning Sentry exposes an MCP layer with tool calls and a tool registry. In this case, the layer is provided by Sentry. Native would mean that calling specific Sentry APIs is provided as a dedicated integration path depending on the context. At least that's how I categorize it.

I'm so confused. Sentry is a native client crash reporting tool. What does this have to do with MCP or the LLM itself? Do you mean when interpreting the crash data?

Sentry provides an MCP server that your LLM can call to answer questions like the number of crashes in the last X days, etc.

The LLM gets the data from Sentry using Sentry MCP.


It's basically what Emdash (https://www.emdash.sh/), Conductor (https://www.conductor.build/) & CO have been building but as first class product from OpenAI.

Raises the question of whether Anthropic will follow up with a first-class Claude Code "multi-agent" (git worktree) app themselves.



Oh, I didn't know that Claude Code has a desktop app already.

And it uses worktrees.

It isn’t its own app, but it’s built into their desktop, mobile, and web apps.

I am not sure the multi-agent approach is what it is hyped up to be. As long as we are working on parallel work streams with defined contracts (say, an agreed-upon API definition that the backend implements and the frontend uses), I'd assume that running independent agent coding sessions is faster and in fact more desirable, so that neither side bends the code to comply with under-specified contracts.

Usually I find the hype is centered around creating software no one cares about. If you're creating a prototype for dozens of investors to demo - I seriously doubt you'd take the "mainstream" approach.

I had never heard of Emdash before, and I follow AI tools closely. It just shows how much noise there is and how hard it is to promote these apps. Emdash looks solid. I almost went and built something similar because I wasn't aware of it.

Maybe a dumb question on my side, but if you are using a GUI like Emdash with Claude Code, are you getting the full Claude Code harness under the hood, or are you "just" leveraging the model?

I can answer for Conductor: you're getting the full Claude Code, it's just a GUI wrapper on top of CC. It makes it easy to create worktrees (1 click) and manage them.

I don't think this is true. Try running `/skills` or `/context` in both and you'll see.

Hey, Conductor founder here. Conductor is built on Anthropic's Agents SDK, which exposes most (but not all) of Claude Code's features.

https://platform.claude.com/docs/en/agent-sdk/overview


Thanks for clarifying, just wanted to point out it's not 1 to 1 with CC. Happy user of Conductor here btw, great product!

Yeah, I wanted a better terminal for operating many TUI agents at once, and none of these worked because they all want to own the agent.

I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.

0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20


looks like we both did haha: https://github.com/saadnvd1/aTerm

Emdash is invoking CC, Codex, etc. natively. Therefore users are getting the raw version of each agent.

They have Claude Code web in research preview

It still doesn't support plan mode... I'm really confused why that's so hard to do

The landing page for the demo game "Voxel Velocity" mentions "<Enter> start" at the bottom, but <Enter> actually changes selection. One would think that after 7mm tokens and use of a QA agent, they would catch something like this.

It's interesting, isn't it? On the one hand the game is quite impressive. Although it doesn't have anything particularly novel (and it shouldn't, given the prompt), it still would have taken me several days, probably a week, working nonstop. On the other hand, there's plenty of paper cuts.

I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.


It's also interesting how the functionality of the game barely changes between 60k tokens, 800k tokens, and 7MM tokens. It seems like the additional tokens made the game look more finished, but it plays almost exactly the same in all of them.

I wonder what it was doing with all those tokens?


Sadly, my own small game-dev adventures look similar: I can implement the core mechanics fairly quickly, but polishing the game takes ages.

UPDATE: without AI usage at all (just to clarify).


I'd bet the initial token usage is all net new while the later token usage probably has reading+regenerating significant portions of the project for individual minor changes/fixes.

E.g. I wouldn't be surprised if identifying the lack of touch screen support on the menu, feeding it in, and then regenerating the menu code sometime between 800k and 7MM took a lot of tokens.


I'm a Claude Code user primarily. The best UI based orchestrator I've used is Zenflow by Zencoder.ai -- I am in no way affiliated with them, but their UI / tool can connect to any model or service you have. They offer their own model but I've not used it.

What I like is that the sessions are highly configurable from their plan.md which translates a md document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen around hooks and such -- but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes to dynamically add steps and even add "hooks" as needed based on the problem.


Always sounds so interesting, and then I do a search only to find out it's another product trying to sell you your 20th "AI credit package." I really don't see how these apps will last that long. I pay for the big three already - and no, I don't want to cancel them just so I can use your product.

Aren't there 500+ aggregator services?

It seems the big feature is working agents in parallel? I've been working agents in parallel in Claude Code for almost 9 months now. Just create a command in .claude/commands that references an agent in .claude/agents. You can also just call parallel default Task agents to work concurrently.

Using slash commands and agents has been a game changer for me for anything from creating and executing on plans to following proper CI/CD policies when I commit changes.
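For reference, the setup is just two markdown files; the contents below are illustrative, not my real ones:

    mkdir -p .claude/commands .claude/agents
    # A subagent definition the command can reference:
    cat > .claude/agents/reviewer.md <<'EOF'
    ---
    name: reviewer
    description: Reviews a diff for bugs, regressions, and style issues
    ---
    You review diffs carefully and report concrete problems.
    EOF
    # A slash command (/review) that fans work out to that agent:
    cat > .claude/commands/review.md <<'EOF'
    Use the reviewer agent to review the diff against main,
    running parallel Task agents if the diff is large.
    EOF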

To Codex more generally, I love it for surgical changes or whenever Claude chases its tail. It's also very, very good at finding Claude's blindspots on plans. Using AI tools adversarially is another big win in terms of getting things 90% right the first time. Once you get the right execution plan with the right code snippets, Claude is essentially a very fast typer. That's how I prefer to do AI-assisted development personally.

That said, I agree with the comments on tokens. I can use Codex until the sun goes down on $20/month. I use the $200/month pro plan with Claude and have only maxed out a couple of times, but I do find the volume-to-quality ratio to be better with Claude. So far it's worth the money.


How about us Linux users? This is Mac-only. Do they plan to support the CLI version with all the features they are adding to the desktop app?

Hi! Romain here, I work at OpenAI. The team actually built the Codex app in Electron so we can support both Windows and Linux very soon. Stay tuned!

Do you plan to release a build for Mac Intel?

Nice, thank you for sharing!

Are you planning to open-source it?

Let me guess, you use MacOS yourself?

Not only is it Mac-only, it appears to be ARM-only as well. The app won't launch on my Intel Mac.

Yeah, I'm having the same issue. Disappointing limitations.

Guess macOS gives you a pass for early-access stuff, right? /s

From a developer's perspective it makes sense, though. You can test experimental stuff where configurations are almost the same in terms of OS and underlying hardware, so no weird, edge-case bugs at this stage.


- looks like OpenAIs answer to Claude Code Desktop / Cowork

- workspace agent runner apps (like Conductor) get more and more obsolete

- "vibe working" is becoming a thing - people use folder based agents to do their work (not just coding)

- new workflows seem to be evolving into folder based workspaces, where agents can self-configure MCP servers and skills + memory files and instructions

kinda interested to see if openai has the ideas & shipping power to compete with anthropic going forward; anthropic does not only have an edge over openai because of how op their models are at coding, but also because they innovate on workflows and ai tooling standards; openai so far has only followed in adoption (mcp, skills, now codex desktop) but rarely pushed the SOTA themselves.


Also interesting that they are both only for macOS. I’m feeling a bit left out on the Windows and Linux side, but this seems like an ongoing trend.

my guess is that openai/anthropic employees work on macOS and mostly vibe code these new applications (be it Atlas browser or now Codex Desktop); i wouldn't be surprised if Codex Desktop was built in a month or less;

linux / windows requires extra testing as well as some adjustments to the software stack (e.g. liquid glass only works on mac); to get the thing out the door ASAP, they release macos first.


We did train Codex models natively on Windows - https://openai.com/index/introducing-gpt-5-2-codex/ (and even 5.1-codex-max)

I appreciate this (as a Windows user) but I'm also curious how necessary this was.

Like I notice in Codex in PhpStorm it uses Get-Whatever style PowerShell commands but firstly, I have a perfectly working Git-Bash installed that's like 98% compatible with Linux and Mac. Could it not use that instead of being retrained on Windows-centric commands?

But better yet, probably 95% of the commands it actually needs to run are like cat and ripgrep. Can't you just bundle the top 20 commands, make them OS-agnostic and train on that?

The last tiny bit of the puzzle I would think is the stuff that actually is OS-specific, but I don't know what that would be. Maybe some differences in file systems, sandboxing, networking.


A lot of companies that use Windows are likely to use Microsoft Office products, and they were all basically forced to sign a non-compete where they can't run other models, just Copilot.

I'm so sick and tired of the macOS elitism in the AI/LLM world.

It's just realism.

macOS is Unix under the hood, so the models can just use bash and CLI tools easily instead of dealing with WSL or PowerShell.

macOS has built-in sandboxing at a better level than Windows (afaik the Codex app is delayed for Windows due to sandboxing complexities).

Also the vast majority of devs use MacBooks unless they work for Microsoft or are in a company where the vast majority of employees are locked to Windows for some reason (usually software related).


To me, the obvious next step for these companies is to integrate their products with web hosting. At this point, the remaining hurdle for non-developers is deploying their creations to the cloud with built-in monetization.

I think deploying can already be done rather easily with the help of LLMs using Docker and VPSes (e.g. Hetzner and co.).

What I struggle with is the legal overhead of e.g. collecting money for an app/website. I have a semi-finished app which I know I could deploy within a few hours, but collecting money while living in Germany is a minefield from what I understand. I don't want my name made public with the app. A GmbH (LLC) costs thousands (?). The whole GDPR minefield, the Google-Fonts usage scam, etc. make me hold back.

Googling/reddit only gives so much insights.

If someone has a good reference about starting a SaaS/App from within EU/Germany with all the legalities etc. I'd be super interested!


Just tell it to use your gcp/aws account via the CLI; that makes it infinitely powerful in terms of deployment. (Also, while I might miss some parts of programming that I have handed over to AI, I certainly don't miss working with clouds.)

> Just tell it to use your gcp/aws account using the cli

Please don't.

People burning through their tokens allowance on Claude Code is one thing.

People having their agent unknowingly provisioning thousands of $ of cloud resources is something completely different.


This is also on the cloud providers for not giving us good tools to manage costs.

How about, "tell the agent to write instructions for cloud deployment with a cost estimate"

and specifically, the big companies, in a way that people notice. Claude Artifacts, AI Studio, etc. all kinda suck. If you have used Manus or connected your own CF, GCP, AWS, etc. you see how easy it could be if one of the big guys wanted it to be (or could get out of their own way).

the big boys probably don't want people who don't know sec deploying on their infra lol.


Deploying from Antigravity is as easy as, say, connecting the Firebase MCP [1] and asking it to "deploy my app to firebase".

[1] https://firebase.google.com/docs/ai-assistance/mcp-server


I don't think these are made for non-devs; Lovable and others, which are built for non-devs, already provide hosting.

We have been working on this, letting any coding agent define infrastructure so we can define it effortlessly: https://specific.dev. We aren't just targeting non-developers though, we think this is useful to anyone building primarily through coding agents.

Replit already does this

Interestingly, opencode's first product was an IaC platform... seems to be where this is all going.

Mac only. Again.

Apple is great, but this is OpenAI devs showing their disconnect from the mainstream. It's complacent at best, contemptuous at worst.

SamA or somebody really needs to give the product managers here a kick up the arse.


Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!

Electron? Why can't Codex write, or at least translate, your application to native code instead of using a multi-hundred-MB browser wrapper to display text? Is this the future of software engineering Codex is promising me?

Only thing i'd add re windows is it's taking us some time to get really solid sandboxing working on Windows, where there are fewer OS-level primitives for it. There's some more at https://developers.openai.com/codex/windows and we'd love help with testing and feedback to make it robust.

Curious why Electron and not native?

Wouldn’t native give better performance and more system integration?


He literally says why electron in his comment that you are replying to

Going cross-platform doesn’t sound like the main reason (or I hope not). For a company that size, is it really hard to hire a small specialised team?! It would be a good showcase for their Codex too.

It is hard because this product will likely be obsolete next year based on how quickly AI is changing and evolving. Speed is king when you're on the frontier

Yes, exactly. "Just hire a small team and build native apps for another OS" is the recipe for taking a year to deliver instead of a month

They presumably use Codex to build this. LLM output is non-deterministic, so it's harder to keep the same logic across platforms.

Would I love to see SwiftUI on macOS, WPF/WinUI on Windows, and whatever Qt hell it is on Linux? Sure. But it is what it is.

I am glad the codex-cli is Rust and native. Because Claude Code and opencode are not: React, SolidJS, and what have you for a tree layer.

Then again, if codex builds codex, let it cook and port if AI is great. Otherwise, it’s claim chowder


When you're a trillion dollar company that burns more coal than Bangladesh in order to harness a hyperintelligent Machine God to serve your whims, you don't have the resources to maintain native clients for three separate targets.

If you were going to release a product for developers as soon as it was ready for developers to try, such that you could only launch on one platform and then follow up later with the rest, macOS is the obvious choice. There's nothing contemptuous about that.


Kudos to the OpenAI reps for responding to my comment and doing so politely.

My ire was provoked by this following on from the Windows ChatGPT app, which was just a container for the webpage, compared to the earlier bells-and-whistles Mac app. Perceptions are built on those sorts of decisions.


Because of that, Windows had thinking-budget selectors for months before iOS and macOS (those got them only last week).

Windows is almost ready. It's already running but we are solving a few more things before the release to make sure it works well.

OpenAI, ChatGPT, Codex

So many of the things they did pioneered the way for the truly good (Claude, Gemini) to evolve. I am thankful for what they have done.

But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.

They launch something new, flashy, should get the attention of all of us. And yet, they only launch to Apple devices?

Then, there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, a couple of examples: "7MM Tokens", "...this prompt initial prompt..."

And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.

Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.


Not sure when you last evaluated the tools, but I strongly prefer Codex to Claude Code and Gemini.

Codex gets complex tasks right and I don't keep hitting usage limits constantly. (this is comparing the 20$ ChatGPT to the 200$ Claude Pro Max plans fwiw)

The tooling around ChatGPT and Codex is thinner, but their models are far more dependable imo than Anthropic's at this very moment.


I don’t hit Codex limits because it’s so much slower, is what I’ve found personally.

I am not sure how those TUIs are going to fare against multi-provider ones like opencode.

The main thing I noticed in the video is that they have heavily sped up all the code generation sections... seems to be on 5x speed or more. (because people got used to how fast and good Sonnet, and especially Gemini 3.0 Flash, are)

> truly good (Claude, Gemini) to evolve

Claude yes, but Codex is much better than Gemini in every way that matters except speed in my experience.

Gemini 3 Flash is an amazing model, but Gemini 3 Pro isn't great. It can do good work, but it's pretty random if it will or it will go off the rails and do completely the wrong thing. OTOH GPT 5.2 Codex with high thinking is the best model currently available (slightly better than Opus 4.5)


I can't speak to the typos, but launching first for macOS is not something new for OpenAI. They did the same with their dedicated desktop client.

What’s the “7MM Tokens” typo?

Looks like they forgot the part of the code editor where you can… edit code. Claude Code in Zed is about the most optimal experience I can imagine. I want the agent on the side and a code editor in the middle.

That’s not really a negative for me as I can easily jump into vscode where I already have my workspace for coding set up exactly as I like it. This being a completely separate app just to get the agentic work right is a good direction imo

Yeah, but it's annoying to find the file the agent just edited without any IDE/editor integration; you have to add a command that opens the file in vscode after editing.

It would be nice to have an integrated development environment.

How dare you want your IDE to be integrated!

Usage like this is becoming a rarity. Most people are editing significantly less and "agent interfaces" are slowly taking the focus.

"most" people aren't even using AI yet

Of those that are, most are not vibe coding, so an editor is still required at many points


For greenfield apps you can vibecode it. For existing complex apps (established products where customers pay us a lot of money for working software), understanding the changes and the context surrounding them in the code is critical, or else nobody knows how the system works anymore and maintenance and support become impossible.

I had been procrastinating putting in the effort to find a decent web designer to redesign our company’s website because I couldn’t stomach the hours I would need to put in to educate them about our messaging and to slowly go around and around iteratively to get the design nailed.

Last week, I decided to try building the site myself using Codex (the terminal one). I chose Astro as the framework because I wanted to learn about it. I fed it some marketing framework materials (positioning statements and whatnot) and showed it some website designs that we like. I then asked it to produce a first cut and it one-shotted a pretty decent bit of output.

AGI is definitely a few more years away, because I’ve since invested probably 30 hours of iteration to make the site into something that is closer to what I eventually want to launch. But here’s the thing: I never intended for Codex to produce THE final website. But now I’m thinking, “maybe we can?” On my team, we have just enough expertise and design know-how to at least know what looks good and we are developers so we definitely know what good code looks like. And Codex is nailing it on both those fronts.

As I said, we’re far from AGI. There’s no way I can one-shot something like this. It requires iteration with humans who have years of “context” built up. But maybe the days of hiring a designer and just praying that they somehow get it right are behind us.


PS: Yes, I spent several hours on the weekend getting Codex to add animations, sound effects, and a mini game to our home page hero graphic. That was fun. I look forward to the creativity that people unleash with tools like this in the coming months.

I think that's the real enabler: iterating on polish details where we know the shape but it's hard to get the nuances right.

Link to the site?

Bit of a buried lede:

> For a limited time we're including Codex with ChatGPT Free

Is this the first free frontier coding agent? (I know there have been OSS coding agents for years, but not Codex/Claude Code.)


That depends on whether Gemini CLI counts. I've had generally bad experiences with it, but it is free for at least some usage.

Google also has aistudio.google.com, which is a Lovable competitor and is free for unlimited use. That seems to work much better than Gemini CLI, even on similar tasks.

This will actually work well with my current workflow: dictation for prompts, parallel execution, and working on multiple bigger and smaller projects so that waiting times while Codex is coding are fully utilized, plus easy commits with auto-generated commit messages. Wow, thank you for this. Since skills are now first-class tools, I will give it a try and see what I can accomplish with them.

I know/hope some OpenAI people are lurking in the comments and perhaps they will implement this, or at least consider it, but I would love to be able to use @ to add files via voice input as if I had typed it. So when I say "change the thingy at route slash to slash somewhere slash page dot tsx", I will get the same prompt as if I had typed it on my keyboard, including the file pill UI element shown in the input box. Same for slash commands. Voice is a great input modality, please make it a first class input. You are 90% there, this way I don't need my dictation app (Handy, highly recommended) anymore.

Also, I see myself often using the built-in console to ls, cat, and rg, following old habits. I would love to pin the console to a specific side of the screen instead of having it at the bottom, and please support terminal tabs, or I'll need to learn tmux.


So much this. I'm eagerly waiting to see what Anthropic and OpenAI do to make dictation-first interaction a first-class citizen instead of requiring me to use a separate app like Super Whisper. It would dramatically improve complex, flow-breaking interactions when adding files, referencing users or commands, etc.

Importantly I want full voice control over the app and interactions not just dictating prompts.


This looks interesting, and I already use Codex a fair bit in vscode etc., but I'm having trouble moving from a 'code editor with AI' to an environment that looks like it treats the code as a hidden, secondary artefact. I guess the key thing is the multi-agent spinning-plates part.

(I work on Codex) I think for us the big unlock was GPT-5.2 and GPT-5.2-Codex, where we found ourselves needing to make many fewer manual edits.

I find that's the case too. For more complex things, my future ask would be something that formalizes verification/testing into the AI dev cycle. My confidence in not needing to see code is directly proportional to my level of comfort with test coverage (even if via quite high-level UI/integration mechanisms rather than 1 != 0 unit stuff).

> "Localize my app and add the option to change units"

To me this still feels like the wrong way to interact with a coding agent. Does this lead people to success? I've never seen it not go off the rails in some way unless you provide clear boundaries as to what the scope of the expected change is. It's gonna write code if you don't even want it to yet, it's gonna write the test first or the logic first, whichever you don't want it to do. It'll be much too verbose or much too hacky, etc.


The better models can handle that prompt assuming there is an existing clean codebase and the scope of the task is not too large. The existing code can act as an implicit boundary.

Weaker models give you that experience; and when working in a 100% LLM-written codebase, I think it can end up in a hall of mirrors.

Now I have an idea to try: a second LLM pass that normalizes the vibe-code to a personal style and standard, to break it out of the Stack Overflow snippet maze it can get itself into.


I've had no issues with prompts like that. I use Cursor with their plan mode, so I get a nice markdown file to iterate on or edit myself before it actually does anything.

100%

First phase: Plan. Mandatory to complete, as well as get AI feedback from a separate context or model. Iterate until complete.

Only then move on to the Second Phase: make edits.

Better planning == Better execution


Until a few days ago (when I switched to Codex), I would have agreed. My workflow was "thoroughly written issues" -> plan -> implement. Without the plan step, there is a high likelihood that Claude Code (either vanilla or with GLM-4.7) or Cursor drifts off in a wrong direction.

With Codex, I increasingly can skip the plan step, and it just toils along until it has finished the issue. It can be more "lazy" at times and ask before going ahead more often, but usually in a reasonable scope (and sometimes at points where I think other services would have gone ahead on a wrong tangent and burnt more tokens of their more limited usage).

I wouldn't be surprised if, within the next 1-2 model iterations, a plan step won't be worth the effort anymore, given a good enough initial written issue.


I still use tons of non-plan-mode edits with Cursor too. The example prompt above I'd plan out first, just to make sure it does it the way I want, since I know there are tons of ways to implement it. But for simple changes, or when I don't want a plan on purpose, I just use a normal agent.

And then

> gh-address-comments address comments

Inspiring stuff. I would love to be the one writing GH comments here. /s

But maybe there's a complementary gh-leave-comments to have it review PRs for you too.


Somewhat underwhelmed. I consider agents to be a sidetrack. The key insight from the Recursive Language Models paper is that requirements, implementation plans, and other types of core information should not be part of context but exist as immutable objects that can be referenced as a source of truth. In practice this just means creating an .md file per stage (spec, analysis, implementation plan, implementation summary, verification and test plan, manual qa plan, global state reference doc).

I created this using PLANS.md, and it basically replicates a kanban/scrum process with gated approvals per stage, locked artifacts when work moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on this today means that future requirements can be implemented 10x faster than with a messy codebase.


This is what I've been doing. Iterating on specs is better than iterating on code. More token efficient and easier to review. Good code effortlessly follows from good specs. It's also a good way to stop the code turning into quicksand (aside from constraining the code with e2e tests, CLI shape, etc).

But what is your concept of "stages"? For me, the spec files are a MECE decomposition, each file is responsible for its unique silo (one file owns repo layout, etc), with cross references between them if needed to eliminate redundancy. There's no hierarchy between them. But I'm open to new approaches.


The stages are modelled after a kanban board. So you can have whichever stages you think are important for your LLM development workflow. These are mine:

00: Iterate on requirements with ChatGPT outside of the IDE. Save as a markdown requirements doc in the repo

01: Inside the IDE; Analysis of current codebase based on the scope of the requirements

02: Based on 00 and 01, write the implementation plan. Implement the plan

03: Verification of implementation coverage and testing

04: Implementation summary

05: Manual QA based on generated doc

06: Update the global STATE.md and DECISIONS.md that document the app, and the what and why of every requirement

Every stage has a single .md as output and after the stage is finished the doc is locked. Every stage takes the previous stages' docs as input.
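For what it's worth, here is a minimal sketch of how that stage gating could be enforced by a harness. The stage filenames follow the list above, but the locking mechanism (read-only permission bits) and the file layout are my own assumptions, not necessarily how the parent commenter does it:

    import stat
    from pathlib import Path

    # Stage artifacts, in pipeline order (names mirror the stages above).
    STAGES = [
        "00-requirements.md", "01-analysis.md", "02-implementation-plan.md",
        "03-verification.md", "04-implementation-summary.md",
        "05-manual-qa.md", "06-state-and-decisions.md",
    ]

    def advance(docs_dir: Path, finished_stage: int) -> list[Path]:
        """Lock the finished stage's doc and return the inputs for the next stage."""
        done = docs_dir / STAGES[finished_stage]
        if not done.exists():
            raise FileNotFoundError(f"stage {finished_stage} has no output doc yet")
        # "Lock" the artifact: read-only bits stand in for real immutability
        # (a stricter harness could commit the file or pin its hash instead).
        done.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
        # Every stage takes all previous stages' docs as input.
        return [docs_dir / name for name in STAGES[: finished_stage + 1]]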

I have a half-finished draft with more details and a benchmark (need to re-run it since a missing dependency interrupted the runs)

https://dilemmaworks.com/implementing-recursive-language-mod...


An idea just came to my mind: what if an agent could spawn other agents, providing each with immutable resource files and a 'chrooted' mutable directory, which those spawned agents could use in turn to prepare immutable resources for further recursively called sub-agents? The immutability and chrooting could be enforced by the harness.
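A rough sketch of what that harness contract could look like. Everything here is hypothetical (the agent command, the RESOURCES env-var convention), and permission bits merely stand in for true chroot-level enforcement:

    import os
    import stat
    import subprocess
    import tempfile
    from pathlib import Path

    def spawn_subagent(agent_cmd: list[str], resources: Path) -> Path:
        """Run a sub-agent with read-only resources and a private scratch dir."""
        # Make the resource files read-only so the sub-agent can't mutate them.
        for f in resources.rglob("*"):
            if f.is_file():
                f.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
        # Give the sub-agent its own scratch directory as its working dir.
        scratch = Path(tempfile.mkdtemp(prefix="subagent-"))
        subprocess.run(
            agent_cmd,  # hypothetical CLI that launches the sub-agent
            cwd=scratch,
            env={**os.environ, "RESOURCES": str(resources)},
            check=True,
        )
        # Whatever the sub-agent wrote to scratch can be frozen and handed
        # down as the immutable resources of the next recursion level.
        return scratch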

Which paper?

Recursive Language Models by Alex Zhang/MIT

@dworks: Good insights. Thanks!

If you add a dialectic between Opus 4.5 and GPT 5.2 (not the Codex variant), your workflow - which I use as well, albeit slightly differently [1] - may work even better.

This dialectic also has the happy side-effect of being fairly token efficient.

IME, Claude Code employs much better CLI tooling+sandboxing when implementing while GPT 5.2 does excellent multifaceted critique even in complex situations.

[1]

- spec requirement / iterate spec until dialectic is exhausted, then markdown

- plan / iterate plan until dialectic is exhausted, then markdown

- implement / curl-test + manual test / code review until dialectic is exhausted

- update previous repo context checkpoint (plus README.md and AGENTS.md) in markdown


Adding another external model/agent is exactly what I have been planning as the next step. In fact, I already paste the implementation and test summaries into ChatGPT, and it is extremely helpful in hardening requirements, making them more extensible, or picking up gaps between the implementations and the initial specs. It would be very useful to have this in the workflow itself, rather than the coding agent reviewing its own work; there is a sense that it is getting tunnel-visioned.

I agree that CC seems like a better harness, but I think GPT is a better model, so I will keep it all inside the Codex VSCode plugin workflow.


Genuinely curious if people would just let this rip with no obvious isolation?

I'm aware macOS has some isolation/sandboxes, but without running Codex via Docker I wouldn't be running Codex at all.

(Appreciate there are still risks)


(I work on Codex) We have a robust sandbox for macOS and Linux. Not quite yet for Windows, but working on that! Docs: https://developers.openai.com/codex/security

Shameless plug, but you can sandbox codex cli without a container using my macOS app: https://multitui.com

This is a really nice tool! (Also, I love the old school animated GIFs in the site's footer.)

I wouldn't trust it. I'm moving to always running AI coding in a full VM.

I really look forward to using this. I tried Codex first time yesterday and it was able to complete a task (i.e. drawing Penrose tilings) that Claude Code previously failed at. Also a little overwhelmed by all the new features that this app brings. I feel that I'm behind all the fancy new tools.

I'm still waiting for the big pivotal moment in this space. I think there is a lot of potential in rethinking the IDE to be agent-first, and much of what is out there is still lacking. (It's like we all don't know what we don't know, so we are just recycling UX around trying to solve it.)

I keep coming back to my basic terminal with tmux running multiple sessions. Recently, though, I forked this https://github.com/tiann/hapi and have been loving using Tailscale to expose my setup to my mobile device for convenience (plus the voice input there).


There is little to no integration between deterministic IDE features (like refactorings) and LLMs. For example, I don't want a statistical tool to rename a method by predicting tokens; I want it to use IDE features, and not via another higher-abstraction protocol like MCP. I want deeper integration. Sometimes I look at comments in code and think, "why can't I have an agent checking whether the content of a comment actually reflects the code below?" I feel like we're light-years away from a killer integration.

This might actually be another area language servers shine. As I understand it, the TS Language Server can do renames. Ergo, we ought to be able to have the LLM ask the lang server to do the rename instead of trying to do it itself. That'd be easier than trying to integrate with each IDE individually. (Whereby "IDE" seems to be synonymous with "VSCode" lately...)
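For reference, the protocol already has exactly this: LSP's textDocument/rename request (a real method in the spec) returns a WorkspaceEdit that the client applies mechanically. A sketch of the JSON-RPC payload as a Python literal, with a made-up file and position:

    # textDocument/rename is a real LSP method; the URI, position, and
    # new name below are invented for illustration.
    rename_request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "textDocument/rename",
        "params": {
            "textDocument": {"uri": "file:///src/user.ts"},
            "position": {"line": 12, "character": 8},  # cursor on the old name
            "newName": "getUserById",
        },
    }
    # The server answers with a WorkspaceEdit listing every file and range
    # to change; the harness applies it deterministically, no token guessing.

An agent that emits this request instead of rewriting files gets project-wide, type-aware renames for free.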

Agree. Another improvement I'd like, along the lines of renames, is LSP suggestions for method names, enums, functions, etc. The LLM should be able to autocomplete from LSP-available symbols; that way it would produce far fewer hallucinated methods.

Claude, at least, already supports LSP servers though. https://code.claude.com/docs/en/plugins-reference#lsp-server...

After using CC in VSCode a bit, I find it makes liberal use of Pylance, both proactively (in-thread) and during the lifecycle (guessing via hooks). It's almost annoying how much, and I'd like to figure out how to get it to use the repo's lint rules.

Deep integrations are hard and the AI companies are just winging it when it comes to eating their own dog food. Their apps are bare bones, somewhat flaky, and overall not that impressive from a UX point of view.

It's very obvious that while their AI teams are top notch, their product teams are very middle of the road, including design. Even though they apparently engaged Jony Ive, I can't actually see his 'touch' on anything they have. You'd expect them to have a much higher level of ambition when it comes to their own products, but they seem stuck getting even the basics shipping. I use ChatGPT for Desktop; it's alright, but it seems to have stagnated a bit, and it has some annoying bugs and flakiness. Random shit seems to break regularly with releases as well.

Another good example of the lack of vision/product management is the #1 and oldest use case for LLMs since day 1: generating text. You'd expect somebody to maybe have come up with the genius idea of "eh, hmm, you know, I wonder if we can do better than pasting blobs of markdown rendered HTML to and/from a word processor from a f**ing sidebar".

Where's the ultimate agentic word processor? The ultimate writing experience? It's not there. Chat GPT is hopelessly clumsy doing even the most basic things in word processors. It can't restructure your document. It can't insert/delete bits of text. It can't use any of the formatting and styling controls. It can't do that in the UI. It can't do that at the file level. It's just not very good at doing anything more than generating bits of text with very basic markdown styling that you might copy paste to your word processor. It won't match the styling you have. Last time I checked Gemini in Google docs it was equally useless. I don't have MS Office but I haven't heard anything that suggests it is better.

For whatever reason, this has not been a priority (bad product management?) or they simply don't have the creativity to see the rather obvious integration issues in front of them.

Yes making those is a lot of work and requires a bit of planning. But wasn't the point of agentic coding that that's now easy? Apparently not.


Or maybe, hear me out, we don't need any of this ""agent"" first shiny thingy

Yeah, TUIs for AI are just lazy work imho. I'm glad at least this time it's a macOS app, but it's still just a shitty chat interface.

Also, this feels like a unique opportunity to take some of that astronomical funding and point it towards building a performant cross-platform UI toolkit in a memory-safe language, not to mention a great way for these companies to earn some goodwill from the FOSS community.


> For a limited time, Codex will also be available to ChatGPT Free and Go users to help build more with agents. We’re also doubling rate limits for existing Codex users across all paid plans during this period.

Is there more information about it? For how long and what are the limits?


They are probably providing it for free for 1 month.


OT: something I never liked about Codex is that it didn't ask for confirmation before editing. Claude has auto-accept off by default; I never understood why Codex didn't. I want to iterate on the LLM's edit suggestions.

Did they fix it?

Otherwise I'm not interested.


At least Codex inside PyCharm has auto-edit off by default.

Why do I have to manually switch between medium, high, and extra high?? This is comedy at this point. Claude Code "just works" without having to think about obscure stuff like this, and it still produces better output. OpenAI is just embarrassing at this point; ChatGPT also got ruined with gpt-5.2, which talks and has the brain of a 2-year-old. Even gpt-3.5-turbo was better. Do they still have no successful training run?

Claude responds differently to "think", "think hard", and "think very hard". Just because it's hidden from you doesn't mean a user doesn't have a choice.

Saying gpt-3.5-turbo is better than gpt-5.2 makes me think you've got some hidden motives.

https://code.claude.com/docs/en/common-workflows#use-extende...


> Phrases like “think”, “think hard”, “ultrathink”, and “think more” are interpreted as regular prompt instructions and don’t allocate thinking tokens.

They don't allocate thinking tokens, but they do change model behavior.

I was getting this in my Claude Code app; it seems clear to me that they didn't want users doing that anymore, and it was deprecated. https://i.redd.it/jvemmk1wdndg1.jpeg

Thx for the correction. Changed a couple weeks ago. https://decodeclaude.com/ultrathink-deprecated/

Nice blog, this post is interesting: https://decodeclaude.com/compaction-deep-dive/ Didn't know about Microcompaction!

If you're a big context/compaction fan and want another fun fact: did you know that instead of doing regular compaction (prompting the agent to summarize the conversation in a particular way and starting the new conversation with that), Codex passes around a compressed, encrypted object that supposedly preserves the latent space of the previous conversation in the new one?

https://openai.com/index/unrolling-the-codex-agent-loop/
https://platform.openai.com/docs/guides/conversation-state#c...

Context management is the new frontier for these labs.
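The second link describes the documented, non-Codex-specific version of this: the Responses API lets you chain turns by ID instead of replaying the transcript. A minimal sketch with the official openai Python SDK (model name and prompts are illustrative; this shows previous_response_id chaining, not the encrypted compaction blob itself):

    from openai import OpenAI

    client = OpenAI()

    # First turn: an ordinary Responses API call; the server stores the state.
    first = client.responses.create(
        model="gpt-5.2",  # illustrative
        input="Summarize the failing test in tests/test_auth.py",
    )

    # Follow-up turn: reference the prior response instead of resending
    # the whole conversation; the server carries the context forward.
    followup = client.responses.create(
        model="gpt-5.2",
        input="Now propose a fix.",
        previous_response_id=first.id,
    )
    print(followup.output_text)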


Crazy free tier. I reckon I used ~2 weeks' worth of the Claude $20 subscription within an hour. I spawned like 12 semi-big tasks and still saw no warnings.

Not totally understanding this. A crazy free tier, but it used a crazy amount of tokens that cost you? Sorry, just trying to see what you're saying.

My bad then. I meant that it's "Crazy Good" as in that the free tier gave me a tremendous amount of tokens.

What I didn't realize, though, is that the limit doesn't reset every 5 hours as is the case for Claude. I hit the limit of the free tier about 2 hours in, and while I was expecting to be able to continue later today, it tells me I can continue in a week.

So my hype for the amount of tokens one gets compared to Claude was a bit too eager. Hitting the limit and having to wait a week probably means we get a comparable token amount vs. the $20 Claude plan. I wonder how much more I'd get when buying the $20 Plus package. The pricing page doesn't make that clear (since there was no free plan before yesterday, I guess): https://developers.openai.com/codex/pricing/


Why make a desktop windowing system app, for a user group who runs a bunch of simultaneous terminal sessions with tear-off tabs or tmux panels, and then force everything into one window that can only display a single session at a time?

The Open button and then codex resume --last is good, but it's a waste and The Wrong Abstraction not to make instantiable conversation windows from the get-go.


Bugs me that they treat macOS as first class. Do people actually develop on a Mac in 2026? Why not just start with Linux?

The majority of devs (>60%) in the Valley use macOS for development. Apple hardware is best in class. It's Unix-like. Cost isn't an issue. MacBooks look cool. Many reasons.

Can't build iOS apps on anything else sadly.

I mean if they were targeting "software engineers" in general then Windows would be the obvious choice in 2026 as much as in 2006. But these early releases are all about the SF bubble where Mac is very much dominant.

Really? I frankly don’t know anyone who’s not on Linux. If you do any AI/ML you basically find yourself on a Linux box eventually. Perhaps I live in a bubble.

Surely it varies a lot and everyone is in an industry bubble to some extent, but from my experience in some non-tech industries (healthcare, manufacturing), Linux workstations were nonexistent and working with the Linux servers was more a sign of an ops role. People who wrote code for a living didn't touch them directly. Last StackOverflow survey [1] puts it at something like 50% use Windows at work, 30% Mac, 20-40% Linux (breakdown of WSL and Ubuntu as categories seems confusing, maybe the raw data is better).

[1] https://survey.stackoverflow.co/2025/technology/#1-computer-...


Yes, you live in a bubble.

How does the Codex Mac app compare with Cursor? Can anyone who has tried both explain?

My experience with Cursor is generally good. I like that it gives me the UX of VS Code and also lets me choose among multiple models if one model is stuck on a prompt and doesn't work.


Coding agents with full automation like this require a different workflow that is almost purely conversational compared to Cursor/Windsurf/VS Code. It requires more trust in the model (but you can always keep Cursor open off to the side and verify its edits). But, once you get into the right rhythm with it, it can be _really_ powerful.

Doubling the Codex limits for 2 months is very compelling. The limits are already generous.

AI companies are trying way too hard to make us think that we can be the copilot now. 90% of the time (if not more) you definitely need to recode what the model spits out at you. Vibe coders with no experience in coding are just creating work for the coders who do know what they are doing.

The best part of the Codex app launch is that OpenAI has opened the whole Codex ecosystem (CLI, Web, IDE extensions) to free ChatGPT users, plus 2x usage for Plus and Pro. This is, I think, to win developers' attention away from Claude Code.

Is it open source? Do they disclose which framework they use for the GUI? Is it Electron or Tauri?

lol ofc not

looks like the same framework they used to build chatgpt desktop (electron)

edit - from another comment:

> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!


I guess it was meant to happen next... I tried Google's Antigravity and found it quite buggy.

I may give this a go, and Claude Code desktop as well, but the Cursor guys are still working the hardest to keep themselves alive.


This is an ode to opencode, and to how OpenAI, very strangely, is just porting the layout and features of real open source.

So much valuation, so much internal competition and shenanigans, that the creatives left.


Interesting timing for me personally as I just switched from running Codex in multiple tabs in Cursor to Ghostty. It had nicer fonts by default, better tab switching that was consistent with the keyboard shortcut to switch to any tab on Mac, and it had native notifications that would ping when Codex had finished. Worktrees requiring manual configuration was probably the one sticking point, so definitely looking forward to this.

No.

I am glad not to depend on AI. It would annoy me to no end how it tries to assimilate everything. It's like systemd on 'roids in this respect: it will swallow up more and more tasks. Granted, in a way this is saying "then it was not necessary to have these things anymore now that AI solves it all", but I am skeptical of the promised land here. Skynet was not trusted back in 1984; I don't trust AI either.


I'm the same way but I've got the gloomy sense that folks like us are about to be swept aside by the flood if we don't "adapt."

I got invites to seven AI-centered meetings late last week.


Same. And indeed, it's here. The genie is not going back into the bottle, so we have to learn how to live in this new world.

Eric Schmidt has spoken a lot recently about how it's one of the biggest advances in human history and it's hard to disagree with him, even if some aspects make me anxious.


One of the biggest advances in human history, and yet the owners of the technology, with access to an unlimited number of "agents" using frontier models, still can't release a desktop chat application without using Electron to bring in several hundred MB of bloat for displaying text. Someone's going to have to explain this one to me, because the math is not mathing.

Exactly. If AI really worked, they would've released a native app. And it wouldn't take much to also get native Windows and Linux apps, would it?

Apparently, the Codex app itself is proof that AI is not that good at doing what people think it does.


How come if I download code from GitHub, rename some stuff, and republish it under another license I’m a bad guy, but if I ask ChatGFY to do it for me I’m a 10x Chad? … someone is gonna figure that part out in court. I remember what code SCO used to make hay, and I know what side the MPAA, RIAA, Google, and NVidia Are gonna be on at the end of the day.

Replacing workers with things you can’t beat, sue, intimidate, or cajole? Someone is gonna do something to make that not cool in MBA land. I think if one of my employees LL-MessedUp something, and I were upset, watching that same person stealing my money haplessly turn to an LLM for help might land me in jail.

I kinda love LLMs, I’ve always struggled to write emails without calling people names. There’s some clear coding tooling utility. But some of this current hype wave is insano-balls from a business perspective. Pets.com X here’s-my-ssh-keys. Just wild.


I think a lot of AI talk doesn't explain where it shines the brightest (imo): Write the code you don't want to write.

I recently had an issue, "add VNC authentication", which covers adding VNC password auth to our in-house VNC server at work.

This is not hard, but just a bit of tedious work getting the plumbing done, adding some UI for the settings, fiddle with some bits according to the spec.

But it's (at least to me) not very enjoyable: there is nothing to learn, nothing new to discover, not much creativity necessary, etc., and this is where Codex comes in. As long as you give it clearly scoped tasks in an environment where it can use existing structures and conventions, it will deliver. In this case it implemented 85% of the feature perfectly, and I only had to tweak minor things like refactoring 1-2 functions. Obviously I read, understood, and checked everything it wrote; that is an absolute must for serious work.

So my point is: use AI as the "code monkey". I believe most developers enjoy the creative aspects of the job, but not the "type C++ on your keyboard" part. AI can help with the latter; it will type what you tell it, and you can focus on the architecture and the creative part of the whole thing.

You don't have to trust AI in that sense, use it like autocompletion, you can program perfectly fine without it but it makes your fingers hurt more.


I wonder if the skills will divide a bit: there will be those who still program by hand - and this will be a needed skill - though AI will be a part of their daily toolset to a greater or lesser degree.

Then there will be the AI wranglers who act almost like DevOps engineers for the AI - producing software in a different way ...


Good luck.

I feel the same way about using the Internet or books to code. I'd rather just have the source code so that I'm not dependent on anything other than my own brain.

It would be nice if it didn't have to be all local. I'd love a managed cluster feature where you could just blast some workloads off to some designated server or cluster and manage them remotely, share progress with teammates, etc. (Not "cloud" though; I'd still want them on the internal network). I imagine something like that is in the works.

Not going to solve your exact problem, but I started this project with that approach in mind. It is exposed to the Internet, though, on a VPS or server, but with Caddy basic auth in front of the coding URL.

https://github.com/jgbrwn/vibebin


I do it with ssh and tmux. I suppose tools could make it better.

Is there any marked difference or benefit over Claude Code?

It’s possible to run up to 4 agents at once vs. Claude Code’s single thread. Sometimes I’ll find meaningful quality differences between what agents produce.

Interesting. Has anyone found running multiple parallel agents useful in practice?

You'd save time compared with running them in serial obviously?

How is this better than vscode with the codex extension?

Thanks for asking this - I had the same question.

No access via the paid API? C'mon, guys. Big enterprises consume Codex and Claude via Vertex and Azure Foundry, because those are already available on their contracts.

And they don’t mind the usage cost. Please let me spend 5k monthly on this.


I've been using Codex regularly, and it's pretty good with the model on extra high and a pretty generous context.

From the video, I can see how this app would be useful in:

- Creating branches without having to open another terminal, or creating a new branch before the session.

- Seeing diff in the same app.

- Working on multiple sessions at once without switching CLIs

- I quite like the “address the comments”, I can see how this would be valuable

I will give it a try for sure


Maybe it's because I'm not used to the flow, but I prefer to work directly on the machine where I'm logged in via ssh, instead of working "somewhere in a git tree" and then having to deploy/test/etc.

Once this app (or a similar app by Anthropic) allows me to have the same level of "orchestration" on a remote machine, I'll test it.


Not going to solve your exact problem but I started this project with this approach in mind

https://github.com/jgbrwn/vibebin


I would like to see non-dev workflows that benefit from this app.

I typically bounce between Claude Code and Codex for the same project, and generally enjoy using both to check each other.

One cool thing about this: upon installing, it immediately found all the projects I've previously used with Codex, and it shows them in the sidebar with all of the "threads" (sessions) I've had on each!


I don't know about you, but apart from AI-tools-race fatigue (it feels pretty much like framework fatigue), all I see is the mouse traveling a lot between small, far-apart elements, buttons, and textareas. AI should have brought innovation even to UIs; we basically stopped innovating there.

I really like it!

Does anybody know when Codex is going to roll out subagent support? That has been an absolute game changer in Claude Code. It lets me run with a single session for so much longer and chip away at much more complex tasks. This was my biggest pain point when I used Codex last week.

It's already out.

Can you explain how to use it? I’ve tried asking it to do “create 3 files using multiple sub agents” and other similar wording. It never works.

Is it in the main Codex build? There doesn’t seem to be an experiment for it.

https://github.com/openai/codex/issues/2604


When can I get remote access in the iPhone app? Start on my laptop, check results using Tailscale/VPN, and add follow-ups on mobile to run on the computer. I know many who would love this feature.

Right? Why not focus on a nice mobile handoff experience? Not being tethered to your desktop or laptop for work is such a game changer! This Codex app is exactly the workflow I use, but the fact that I can't pick up where I left off on my phone is just braindead stupid!

Can this interact with remote (over ssh) codex agents?

Every day: waking up, reading HN, and thinking "this new AI XY is insane - it will change everything", then starting my daily work in low-code integration and thinking "how long will this last?"

Something similar but for any ACP server: https://github.com/inercia/mitto

Is everything OpenAI do/release now a response to something Anthropic have recently released?

I remember the days when it was worth reading about their latest research/release. Halcyon days indeed.


I'm excited to try this out, it seems like it would solve a lot of my workflow issues. I hope there is the ability to review/edit research docs and plans it generates and not just code.

Having dictation and worktree support built in is nice. Currently there is a whole ecosystem of tools implementing similar functionality for Claude Code. The automations look cool too!

Is this not just a skinned version of Goose: https://block.github.io/goose/

They are all copies of each other. Did you expect them to build something completely new? Software development is stuck in an AI hole where we only build AI features.

In the end, this and all the other 89372304 AI projects are just OpenAI/Anthropic API wrappers, but at least this one has first-party support, which maybe gives it a slight advantage?

I'm managing context with codex inside VSCode using different threads. I'm trying to figure out if there are use cases where I'd rather be in this app.

Has anyone figured out how to set it to YOLO mode yet? Super unusable with this constant asking...

I've been using AI vibe-coding tools since Copilot was basically spicy autocomplete, and this feels like the next obvious step: less "help me type" and more "please do this while I watch nervously." The agent model sounds powerful, but in practice it's still a lot of supervision, retries, and quiet hope it doesn't hallucinate itself into a refactor I didn't ask for.

So if the agent struggles even when you are working with it, how will it do better working alone? This is why I never let agents work by themselves. I'm very opinionated about my code.

Does it somehow gain some superpower from being left alone?


A simpler, similar app: vibe-kanban

https://www.vibekanban.com/


We are certainly approaching the point where a high-end MacBook Pro isn't required for development. It feels very close to just being able to use an iPad. My current workplace deploys on Vercel; we already test actively on feature branches, and the models have gotten so good that you can reliably just commit what they've done (with linting and type-check hooks etc.), and in the rare event something is broken, follow up with a new commit.

Judging from the performance issues with Claude Code, you won't be able to run a decent agentic CLI or desktop workflow orchestrator on anything other than an MBP.

* Replace MacBook with any high-end machine that requires a bunch of configuration for local dev. And replace iPad with any thin client that’s much more portable and convenient, eg glasses that display a screen and Bluetooth keyboard. Why shouldn’t I be able to describe the changes I want to an LLM from a coffee shop?

I did a whole feature in a game I'm building, in Claude, while riding in a car to dinner this weekend. It could do everything itself except the PR (it kicked me over to the GitHub app for that).

I do wonder how that impacted my usage vs. doing it locally, and I do wish usage were more visible in the mobile app (at least give me my statusline). But it worked.


is this bye bye v0, bolt, lovable, base 44 and the 45456856468943659874651654896795256674 competitors that let you vibe code apps from their webapp?

ChatGPT can’t even write me a simple working AutoHotKey script so I’m not sure why I’d trust it with any actual coding. As I’ve done for about the past year with OpenAI showcases like this, this elicited an ‘Oh, that’s kinda neat, I’ll just wait for Gemini to do something similar so it will actually work’ from me.

Wow, this is nearly an exact copy of Codex Monitor[1]: voice mode, project + threads/agents, git panel, PR button, terminal drawer, IDE integrations, local/worktree/cloud edits, archiving threads, etc.

[1] https://github.com/Dimillian/CodexMonitor


Codex Monitor seems like an Antigravity Agent Manager clone. It came out after, too.

A bunch of the features you listed were already in the Codex extension too. False outrage at its finest.


I have both Codex Monitor and this new Codex app open side by side right now; aside from the theme, I struggle to tell them apart. Antigravity's Agent Manager is obviously different, but these two are twins.

I have a very hard time getting worked up over this. There are a ton of entrants in this category, they all generally look the same. Cribbing features seems par for the course.

Antigravity is a white-labeled $2B pork of Windsurf, so it really starts there, but maybe someone knows what Windsurf derived from, to keep the chain going?

cursor?

from what I can tell, the people behind windsurf were at it first

oh, codeium? that was them?

Maybe github copilot then


This is the 5th OpenAI product called Codex if I'm counting correctly

These paid offerings geared toward software development must be a hell of a lot "smarter" than the regular chatbots. The amount of nonsense and bad or outright wrong code Gemini and ChatGPT throw at me lately is off the charts. I feel like they are getting dumber.

Yes they are, the fact that the agents have full access to your local project files makes a gigantic difference.

They do *very* well at things like: "Explain what this class does" or "Find the biggest pain points of the project architecture".

No comparison to regular ChatGPT when it comes to software development. I suggest trying it out, and not by saying "implement game", but by giving it clearly scoped tasks where the AI doesn't have to think or abstract/generalize; so, as some kind of code monkey.


I don't understand why we are getting these software products that want vendor lock-in when the underlying system isn't being improved. I prefer Claude Code right now because it's a better product. Gemini just has a weird context window that poisons the rest of the generated code (when online). Between ChatGPT Codex and Claude, I feel Claude is the better product, but I don't use enough tokens to justify Claude Pro at $100, so I just have a regular ChatGPT subscription for productivity tasks.

> I don’t understand why we are getting these software products that want to have vendor lock in when the underlying system isn’t being improved.

I think it's clear now that the pace of model improvements is asymptotic (or has at least reached a local maximum), and the model itself provides no moat. (Every few weeks last year, the perception of "the best model" changed, based on basically nothing other than random vibes and hearsay.)

As a result, the labs are starting to focus on vertical integration (that is, building up the product stack) to deepen their moat.


> I think it's clear now that the pace of model improvements is asymptotic

As much as I wish it were, I don't think this is clear at all... it's only been a couple months since Opus 4.5, after all, which many developers state was a major change compared to previous models.


Like I said, lots of vibes and hearsay! :)

The models are definitely continuing to improve; it's more of a question of whether we're reaching diminishing returns. It might make sense to spend $X billion to train a new model that's 100% better, but it makes much less sense to spend $X0 billion to train a new model that's 10% better. (Numbers all made up, obviously.)


It’s the inconsistency that gets me. Very similar tasks, similar complexity, same code base, same prompting:

Session A knocks it out of the park. Chef’s kiss.

Session B just does some random vandalism.


This does look like it would simplify some aspects of using Codex on Mac, however, when I first saw the headline I thought this was going to be a phone app. And that started running a whole list of ideas through my brain... :(

But overall, looks very nice and I'm looking forward to giving it a try.


I don't know why no frontier model lab can ship a mobile app that doesn't use a cloud VM but instead connects to your laptop/server and works against the local files there when on the same network (e.g., on Tailscale). Or, even better, one that acts as a remote control for a harness running on that remote device, so you could seamlessly switch between phone and laptop/server.

I'm also baffled by this. I had to write my own app to be able to do seamless handoff between my laptop/desktop/phone, and it works for me (https://github.com/kzahel/yepanywhere - a nice web interface for Claude using their SDK, MIT, E2E relay included, no Tailscale required), but I don't understand why this isn't a first priority. Why all these desktop apps?

This looks awesome! And incredibly polished. Exactly the approach I take to vibebin-- I may have to integrate yep anywhere into it (if that's ok) as an additional webui!

https://github.com/jgbrwn/vibebin

Although I would need it to listen on 0.0.0.0 instead of localhost because I use LXC containers so caddy on the host proxies to the container 10.x address. Hopefully yep has a startup flag for that. I saw that you can specify the port but didn't see listening address mentioned.


Cool! Your project sounds really interesting. I would love to try it out, especially if you integrated yep! Yes it has yepanywhere --host 0.0.0.0 or you can use HOST env var.

Currently using opencode with Codex 5.2 and wondering why I should switch.

seems like I need to update my toolset for the 3rd time this week

Does this support users who access Codex via Azure OpenAI API keys?

Not to rain on the parade, but this app feels... unpolished to me. Some of the options in the demo feel less thought out and just thrown together.

I will try it out, but is it just me, or is the product/UX side of recent OpenAI products sort of... skipped over? It is good that agents help ship software quickly, but please, no half-baked stuff like Atlas 2.0 again...


I don’t get why they announce it as a “Mac app” when the UI looks and feels nothing like a Mac app. Also electron… again.

Why not flex some of those codex skills for a proper native app…


What else do you expect from vibecoding? Even the announcement for this app is LLM generated.

This is true. The font and animation feel basic to me, even for a programmer-focused app.

Built an open-source, lightweight version of this that works with any CLI agent: https://github.com/built-by-as/FleetCode

This is so garbage. OpenAI is never catching up.

I'm waiting for the iOS version.

The inclusion of a live vibe-coded game on the webpage is fun, except the game barely works, and it's odd they didn't attempt any polish/QA for what is ostensibly a PR announcement. It just adds more fuel to the argument that vibecoding results in AI slop.

To be fair, the premise is that they one-shotted it. I'd just be suspicious if it were any better (the proof of concept is that it just about works).

I agree: if it had been polished, I would not have trusted the demo at all. The fact that it shows what you can potentially expect from a one-shot is cooler.

Hey, that's great OpenAI. Now add about 6 zeroes to the end of the weekly token limit for your customers and maybe we could use the app

Maybe I'm just not getting it, but I just don't give a flying fuck about any of this crap.

Like, seriously, this is the grand new vision of using a computer, this is the interface to these LLMs we're settling on? This is the best we could come up with? Having an army of chatbots chatting to each other running basic build commands in a terminal while we what? Supervise them? Yell at them? When am I getting manager pay bumps then?

Sorry. I'll stick with occasionally chatting with one of these things in a sandboxed web browser on a single difficult problem I'm having. I just don't see literally any value in using them this way. More power to the rest of you.



Given the prevalence of opencode and its ability to use any model and provider, I don't see a reason why anyone would bother with a random vendor's half-assed tools.

For starters, money. There is no better value out there that I'm aware of than Claude Code Max. Claude Code also just works way better than Opencode, in my experience. Though I know there are those that have experienced the exact opposite.

I find Claude Code bloated and a bit clunky. Those same Claude models work better in Opencode, where I can also combine them with other providers.

The fact that Anthropic recently started blocking their coding plans from being used through other tools is telling. They are in the phase where they realize they can't compete in an open field and need to go back behind their fortress gates and hope to endure a siege from stronger opponents.


Are you calling OpenAI a random vendor?

That's like calling Coca Cola a random beverage vendor


Yes, OpenAI is a random development tool vendor. In the same way Volkswagen is a random sausage vendor.

Do you drink your Coca Cola directly from the Coca Cola packaged bottle?

Or do you prefer to sip it in the cup of your choice and drink it from there? The same cup you use to drink Pepsi, Fanta, milk, and other beverages.


Why would it need local network access, though, I wonder?

No Linux support? :(

> and we're doubling the rate limits on Plus, Pro, Business, Enterprise, and Edu plans.

I love competition


Tried it, not impressed. Terrible UX and generally just confusing. Didn't really intuitively know where to go and why. The thing that made me most mad was the clunkiness around pulling specific files into the chat input as context. Like wtf, bad, bad, bad!

I'm sorry, but the music on the demo video is the most atrocious nonsense. I know it's crazy, but it makes me hate the app. OpenAI is falling off hard.

For pure code generation, is GPT-5.2 so much better than Claude Opus 4.5 thinking that it would make me switch? I'm basically all in on Claude.

Sure, I could move to opencode and use them as commodities, but I've gotten used to Claude Code and like using the vendor's first-party app.


I really want to like the native Mac app aesthetic but I kinda hate it. It screams minimalist but also clearly tells me it’s not meant for a power user. That ruggedness and sensitivity is missing.

What are the max context sizes?

Kind of embarrassing to demo "Please change this string to gpt-5.2". Presumably the diff UI doesn't let you edit the text manually? Or are they demonstrating being so AI-brained you refuse to type anything yourself?

It keeps offering me "Get Plus" even though I am signed in and already have a Plus plan.

Codex has really grown on me lately. I re-subscribed to try it out on a project of mine, and it turned out to be a great addition to my toolkit.

It isn't always perfect, and its CLI (how I mostly use it) isn't as sophisticated as OpenCode, which is my default.

I am happy with this app. I am using Superset, a terminal app which, surprisingly, is well positioned to help if you work in the CLI like I do. But like I said, the new desktop app seems like a solid addition.


> Work with multiple agents in parallel

But you can already do that, in the terminal. Open your favourite terminal, use splits or tmux and spin up as many claude code or codex instances as you want. In parallel. I do it constantly. For all kinds of tasks, not only coding.


But they don't communicate. These do.

Does the Codex app host MCP Apps?

Eh. Kicked the tires for a few minutes. Back to the old clunker app.

No worries. I'm not their target demographic, anyway.


> We're also excited to show more people what's now possible with Codex . For a limited time we're including Codex with ChatGPT Free and Go, and we're doubling the rate limits on Plus, Pro, Business, Enterprise, and Edu plans.

Translated from Marketingspeak, this is presumably "we're also desperate for some people to actually use it because everyone shrugged and went back to Claude Code when we released it".


I dunno; the models seem to have different weak/strong points. Sometimes I can sit with Claude Code for an hour on some issue, try it with Codex, and have it solved in five minutes, and the opposite also happens. I tend to use Codex when I care more about correctness and not missing anything, and Claude when it's more important to move fast and I know exactly what it needs to do; Codex seems to require less hand-holding. Of course, just anecdotal.

GPT models definitely seem stronger when they "get it" and in the types of problems they "get", while Claude seems more holistic but not "as smart" as some of the spikes GPT can reach.

Yeah, this is clearly just a marketing re-release, but if they've executed well, I'm happy to try it.

They also claim 2x usage from December (though 2x a tiny amount is still a tiny amount)

cool, another mac app, fuck windows users i guess

It's a Mac-only Electron app.

Another boring update from OpenAI. Why would I want an orchestration tool tied to one model? Part of the value of orchestration tools is using the most appropriate and/or cost-effective model for each task, sub-task, etc.


