
Optane is magical and was mismanaged by both Micron and Intel. The article doesn't really express what we have lost.

The DIMMs were expensive and didn't have broad compatibility. They should have made them in a regular DDR3/4/5 format at half the cost of main memory, or at 2x to 4x the density for the same price. The density roadmap had them doubling in capacity with each generation: 1TB per DIMM slot.

The Register is right that it was the first genuinely new class of memory to come along in 50 years. Reliability, performance, flexibility: all could have made huge strides under PMEM/Optane architectures. Intel thought it had a golden goose (it did) and that the top profit margin enterprise workloads would flock to it.

Intel has everything it needs to succeed except for internal politics. But that is true of all large corpse these days. FPGA fabric built into the Optane DIMM would have been magic.



The article does, however, mention the basic issue which led to Optane's failure (at least if I understand it correctly): all current OSes have a concept of primary and secondary storage, whereas to get optimal performance out of Optane you need several levels of memory storage, which apparently have to be managed by the OS (as opposed to CPU caches): a program's code can stay in the Optane memory, while the stack and heap have to live in "real" RAM to reduce wear on the Optane memory while the program runs. So Intel would have had to develop an OS for it, or work on extending e.g. the Linux kernel in that direction.


FWIW this article, like The Register articles of olde, did bring this point up.

Multics had this concept 60 years ago; like many useful Multics ideas (cough, security) it was jettisoned in Unix. Most interestingly, Organick’s book describes a hierarchy from fast to slow, with the slowest being tape backup :-).

More recently HP’s “The Machine” was designed this way — the article touches on this but if you don’t know what they are talking about that HP paragraph looks like it was accidentally jammed in.

With the increased use of databases and key stores plus the dominance of mobile platforms, this single store model will probably be back.


> like The Register articles of olde

Coo, thanks!

That is what I'm going for, yes. :-) Or at least, when I have the time, it is, anyway...

(If it's not obvious, I wrote the linked article.)


El reg has had two truly great writers in its history; perhaps you'll be the third!


Oh my word. I am honoured.

JOOI: who are you thinking of? I rate Orlowski, even though I disagree with Andrew about a lot of stuff IRL.


> or worked on extending e.g. the Linux kernel in that direction.

IIRC Intel was the biggest code contributor to the kernel in 2021. So they already have the expertise.

Running code from Optane would likely be a mistake, though. Jumps would be very expensive due to the extra latency (unless they hit the cache). Perhaps it can work with page swaps but still.


There's already plenty of code to handle NUMA, it shouldn't have been too hard to add proper support for that.


It adds another layer - while NUMA is about latencies, Optane memory is about latency AND how frequently the information is changed (to minimize wear on the device). There are some obvious signals (disk caches would be one), but richer semantics would help a lot to make it perform.


I am a bit ignorant about OS stuff, but isn't that managed by the filesystem? I think some people were using them as log (SLOG) devices in ZFS.


Be wary of mixing up Optane drives (what you seem to be referring to) and Optane memory (what the parent commenter means).


That's described in the article, that they could be used in Linux, but only as a volume. Apparently (if I read the article correctly) that wasn't optimal for exploiting the higher performance.


As I understand it Linux's VFS and Block layer aren't really designed to exploit the type of throughput and latency that Optane provided.

What I'm curious about is what happened to all the PMem (Persistent Memory) and NVRAM (Non-volatile RAM) work that went into recent Linux kernels and libraries.


I thought that with DAX, it was possible to have a real file system, but access to the actual bits goes straight to the backing storage, not the page cache (as it would with traditional block devices). So the block layer and VFS inefficiencies would not necessarily matter, at least if you can use DAX.


> Apparently (if I read the article correctly) that wasn't optimal for exploiting the higher performance.

That can be compensated by tuning the price point. If the performance isn't there, make it cheaper until it's better than the alternatives.


This is Itanium all over again


Itanium was a bet on compilers being able to extract parallelism from the source code. Maybe JIT optimizations, where code is rewritten on the fly, more or less the same way as a reordering buffer works, would have done the trick, but, in the end, Itanium fell short of its performance promises. I'm not sure compilers alone can do it.


The trouble with Itanium was that memory latency is a limiting factor. A CPU knows what is in the cache but the compiler doesn’t. In fact, a major use of parallelism in CPUs is to hide latency and that involves a flexibility of scheduling greater than bundling groups of instructions together. (Think of 3 instructions hanging because 1 needs to load something as opposed to a system which might be able to barrel on and run a few non-dependent instructions.)


> A CPU knows what is in the cache but the compiler doesn’t.

You could kind of try to sidestep that by inserting preload instructions ahead of the instructions that would use the values but I agree that doing that at runtime would probably be better. Maybe a JIT runtime could help, but, still, that's beyond the control of the compiler.


https://www.researchgate.net/publication/3044999_Beating_in-...

This one was never realized AFAIK, but it addresses a lot of the latency issues of the IPF designs due to their relatively static nature, while still avoiding the massive cost of reservation stations. I think being open will be a major requirement going forward. It is hard to build trust in a black box, and RISC-V is about to become a real alternative. Being modular, open, and growing up from smaller systems seems like a winning strategy to me.


I can't wait to have a completely Windows-proof desktop again (even though my last SPARC still has an x86 coprocessor board, and my IBM PPC machine had Windows NT for it).


I did some research on

https://en.wikipedia.org/wiki/Transport_triggered_architectu...

which is an approach to CPU design that puts specialized CPU design in the reach of quite a few people (say on an FPGA) but the main weakness of it is that it is not smart at all about fetching. Although it is not hard at all to make that kind of CPU N-wide you are certainly going to have all N lines wait for a fetch.

Seems to me though that that kind of system could be built with a custom fetcher that would let you work around some of the challenges.


Also a major issue was the big iron shared memory systems that Itanium targeted. They collectively fell out of favor with the trend towards Virtualised x64 commodity clusters and later the cloud. I’d love to see an EPIC strategy for a CPU targeted at smaller systems. Perhaps implementing webassembly based actors in hardware and support for messaging and pmem. Wasmcloud in hardware.


The way I see it, disaggregation will be a thing and we may be back to large shared-memory racks in cloud datacenters. While an individual cloud machine can range from small to very large (effectively one fully dedicated host), a rack-sized server enables much larger (and more profitable) offerings. Scaling up has limits, but it's very comfortable to have the option when needed.


Hardly an Intel specific problem (frowns at Alpha)


"corpse" - vs "corps" - kinda funny if it was intentional; maybe funnier if it wasn't


I mean, corpses do have everything in them to succeed; their cells just can't agree on actually doing that anymore.


Not to be pedantic but most corpses alive today don't have anything resembling cell walls let alone cells.


To be pedantic, most corpses aren't alive today.


To further the pedantry, parent didn't make any claims about most corpses. Just most corpses that are alive today.


I saw the typo, chuckled and left it in.


I'm not sure I really understand why Optane went in the DIMM slots instead of the PCIe slots. I know PCIe storage is usually NVMe, but you could also expose the storage as a large memory-mapped region. You wouldn't need explicit support from the CPU/chipset/motherboard, although you'd probably want resizable BAR support, because the region would presumably be pretty large, and maybe you'd want to do something for older systems that can't handle a giant region. From the software side of things, PCIe-mapped memory is more or less the same as physical memory; from the hardware side, you'd be taking a different path to get there, so you'd have different bottlenecks, but PCIe 5 is here with 2x the bandwidth, and there's a limited number of applications that really max out their PCIe lanes.


PCIe Optane was a thing and it achieved 10us latency whereas today's fastest SSDs get 40us. IIRC the DIMM version of Optane was <1us, literally an order of magnitude faster!

I'd expect PCIe to contribute ~3us latency and interrupt/context-switch/kernel to contribute ~5us. You could eliminate the latter, but 3us is still kinda slow.


An order of magnitude faster, but still a few cache hierarchies away from where the actual computing is happening.

But the real dealbreaker was that there are also similar indirections in the other direction. It's just very, very common in modern architectures not to permanently store data on the first drive it hits. That first level of persistence is usually just some queue log where the write takes a short breath, safe from power failure for the first time, before getting passed on to some more distributed persistence story. Optane wouldn't change this general setup at all, because those layers exist for more reasons than just the power-failure scenario. Optane might change implementation details of that first layer, but it wouldn't question the layers, because Optane is only an answer to power failure, not to the data center burning down.


Optane is either the fastest SSD or the slowest RAM. If it is doing the job of RAM it might hurt performance, not help it. It’s not clear adding another level to the cache is really going to help.


For consumer notebooks, Optane could have been a fast swap drive. It's a pity manufacturers didn't get this done. Swap just has to be fast enough to make switching applications feel quick, and Optane could do that.


Far too specific, plenty of work loads where that wouldn't make a difference at all. And of those remaining, chances are that a bump to the next main memory price tier would completely change things, more so than optane swap.

Where I'd expect the technology to really shine is in flash controllers: give that thing a generous serving of optane cache to play with in its fancy wear leveling rites, power-loss-persistent and sufficiently closely integrated (same board) to allow software to confidently assume "committed" without the flash even touched if the write happens within a tight working set (you might want to set certain file system parameters up for "it's ok to write the same address many times"). You could even include the optane in the advertised capacity! And the swap space use case would be included without even trying.


They kinda did that with Optane+flash 1TB M.2 consumer drives like the H10 and H20 series. Unfortunately they never merged the controllers so each one only gets 2 PCIe lanes and you need Intel's software (which is only compatible with Intel CPUs because why not) to do software RAID (which has a terrible reputation on Windows for losing data)

It achieved 14us latency (by acking writes before they hit flash), which is extremely good (better than today's fastest SSDs), but the 32GB Optane cache doesn't go very far when most enthusiast consumers are gamers who blow the cache every time they download an update. A lot of games are also larger than 32GB, so if you use more than a few large programs, the cache just isn't large enough. So you end up with a product that's kinda expensive, but still not quite good enough to take the performance crown.


Wow, so close, yet still a total failure. So the data would have to hit PCIe when copying/moving between optane and flash?


Yep! With the older CPUs it was released alongside, I believe it was even worse: The data would go through PCIe into the CPU cache, then into DRAM, and then back down PCIe to flash while wasting a few CPU cycles along the way. It looked terrible on benchmarks (literally half the MB/s compared to cheaper flash SSDs that used all 4 lanes) but wasn't too bad in practice since NVME drives can do many requests in parallel.

The real issue was size. I've got 100M CPU cache (5800x3d), 24G VRAM (3090), 64G DRAM, and 2TB flash. I'd need at least 128G of Optane for it to make sense in my cache hierarchy and improve game load times. I could get the $3000 data center Optane SSDs, but that's kinda hard to justify when it's as expensive as my entire PC.


I mean... with modern memory compression and SSD speeds, is this really necessary? If you've got an NVMe SSD, you've most likely got enough bandwidth to never notice the swap kicking in (especially with a generous swappiness value). Sure, there might be a couple of workloads that could benefit from that kind of upgrade, but I think most people building memory-optimized rigs would rather just buy more memory, since it goes in the same DIMM slots anyway.


I mentioned consumer notebooks as the main target for this kind of optimization. Lots of "memory" for apps, plus real hibernation, are the main benefits.



Sure, but looks like that was NVMe, not just a big bunch of bytes, so whatever magical future was enabled by having a big bunch of non-volatile bytes wasn't enabled by this. (unless there was a way to change it into big bunch of bytes mode?)


My guess is that to get low latency you have to be interacting with a memory controller that expects low latency (like DRAM). I don't know how many cycles of difference it makes, but my impression is that PCIe trades latency for throughput compared to DRAM.


IIRC permissible lane skew in PCIe is like 30 ns, which means that the latency due to deskewing alone in PCIe already approaches main memory latency.


Probably because the way they did it allowed it to participate in the caching hierarchy with MESI and all the rest of it.


[Article author here]

FWIW I tried to explain why that was the important aspect of the tech in the article.

I also covered it in some depth in my most recent FOSDEM talk, which is linked in the article. As is the script if you don't have time for the video.


I guess it's because DIMM is physically closer to CPU than PCIe, so you get better bandwidth thanks to that (think that the data travels at the speed of light - the closer the faster).


That is just not true, in any shape or form.

It wasn't mismanaged at all. Neither Intel nor Micron could make it cost-competitive. Optane had its shining moment when DRAM and NAND (both of which are commodity-like) skyrocketed to 3x the price, at one point making Samsung the most profitable company, surpassing even Apple. There were roadmaps for power, reliability, performance and density improvement, literally everything except cost. Micron wasn't even the cheerleader; they were only happy to play along with Intel because Intel made the commitment to buy enough capacity for Micron to sustain the business.

And Optane could barely compete even when DRAM and NAND prices were at their peak, making zero profit on most of its revenue. Imagine when DRAM and NAND prices are back to normal.

The future of memory is either extremely low power on mobile SoCs or ultra-high bandwidth in servers with 128 cores. Neither of these fits into the Optane roadmap.

The market is so small that even though Intel was giving its Optane products away at zero margin, they still couldn't fill the minimum order commitment to Micron and had to pay the penalty.


[Article author here]

> Neither Intel or Micron could make it cost competitive.

Cost-competitive with what?

I think there are two misconceptions here.

1. It is a kind of memory, not disk. But it is non-volatile large-scale RAM. There isn't any other terabyte-scale non-volatile memory for it to compete against. So how can it fail to compete on cost when there was nothing for it to compete with?

2. As memory it was extremely competitive with conventional volatile memory, being much larger. Somewhere in the region of n times (for some integer n, whatever that value may be) more GB/$ is being competitive, IMHO.

Against flash storage: sure, more expensive, but orders of magnitude faster and orders of magnitude more write cycles -- that is highly competitive, isn't it?


I am very late to this reply.

At its peak, Optane memory had double (or nearly triple) the capacity for the same price, at practically zero profit margin. Once DRAM prices fall, this advantage is gone. Remember this was compared to ECC high-capacity DRAM modules, the highest-margin part of the whole DRAM market.

It was (and still is) a wonderful technology, but Optane memory didn't serve a large enough market where non-volatile memory had an obvious advantage. As memory it offers much lower bandwidth and much higher response times than DDR4, which means a substantial share of server workloads don't fit Optane memory's performance characteristics.

And again, neither Intel nor Micron was making any profit out of it, with no clear roadmap for cost reduction, compared to conventional DRAM with DDR5 and NAND with Z-NAND.

The common misconception is that high-write-cycle, random read/write, non-volatile memory had a place in the market at its price. Turns out this is a classic market misfit.


Um. I have to ask: did you read the article? All of it?

Because what you're offering as a comment is sort of a backwards version of the argument I'm making in the article.

My argument is that it flopped because of the monoculture of C21 OS design that means we lack OSes that can take full advantage of persistent-memory computers.

Yours seems to be that the kit wasn't competitive. I think that's a conflation of multiple category errors.

[1] It is not in the same product category as either DRAM or Flash.

It's not RAM: RAM is volatile, Optane is nonvolatile. But it can be used as primary storage, that is, appearing in the CPU memory map. It is not secondary storage, that is, storage requiring any kind of block-based controller handling.

It's not Flash: Flash is not word-writable and thus cannot be primary storage.

Because of decades of technical debt in C21 IT, people do not know this vital primary/secondary distinction well. As a result they can only think in terms of two different types. Optane is neither type: it's both. It blurs the distinction. That's why it was important tech.

[2] Thus it is a fallacy to compare the price, or performance, or price:performance of a new tech that eliminates the primary/secondary split with either primary or secondary storage.

It is traditional in IT to make comparisons with automobiles.

It's like criticising cars because they are not good bicycles and they are not good aeroplanes.

These are different vehicles with different characteristics for different types of transport. To say that cars are bad bikes, or bad aeroplanes, means that you have failed to understand that cars are not either.

Cars are better for some things than bicycles. They are better for totally different things than aeroplanes.

You are trying to judge cars by the criteria of bicycles and aeroplanes, because you've never seen a car before and you're not used to thinking about cars. You're used to 2 categories and this is not either. It's not in the middle. It's a different category.

That is what the article was about.


Optane was still cheaper than DRAM modules of the same capacity, and twice as large for the same price.

Really, I dream of Optane as a "super-fast swap drive" that "extends" RAM (which is what some manufacturers do with NVMe these days). Then it would be easy to have a notebook with 256GB of "RAM" made of 16GB of actual DRAM and 256GB of Optane swap. Most users would not notice any difference compared to an actual 256GB of DRAM, since Optane is fast enough to make application switching feel quick. At least my Google Chrome and IntelliJ IDEA could live in peace in such a setup, each eating many gigabytes of "RAM", since I don't code and browse sites in the same microsecond.


16GB optanes with an ordinary SSD interface are pretty widespread in China and DIYers do use them for swap/pagefile. The problem is that they take up a whole M.2 or NVMe slot with the corresponding number of PCIe lanes…

These drives generally come from budget laptops where the optane is originally used as the system disk. Not a good idea anyways and computer shops usually swap them out. You pay about 28 CNY for 16 GB, which is 2x the average SSD price-per-gig.


DRAM-interfaced Optane wouldn't consume NVMe lanes. It would be soldered the same way as DRAM is in most non-professional (home or business) notebooks these days.

For example: MacBook M1/M2 machines have fixed, limited RAM. But with Optane as swap it could be four to eight times larger without users noticing the difference.


> The DIMMs were expensive and didn't have broad compatibility. They should have made them in a regular DDR3/4/5 format and half the cost of main memory, or 2x to 4x the density for the same price.

Were the prices intentionally high or was that just because they couldn't compete with the economies of scale of DRAM manufacturers?


The prices were low if you view them as RAM, the densities were crazy. The prices were extremely high if you view them as a small cache SSD in front of your spinning rust, which was how they were marketed and sold.


You're asking for density or cheaper prices, but neither happened. The endurance improved, the IOPS improved, even the latency a bit, but density didn't; otherwise it would've been cheaper than it was. I want to relate this to other forms of storage that went by the wayside, but I hope someone else will try again with a different roadmap and make it work.

FPGAs or SoCs with Optane for instant-on/low-latency applications would've been amazing.


> FPGAs or SoCs with Optane for instant-on/low-latency applications would've been amazing.

Looks like a company called Everspin is working on exactly that:

> "will provide FPGA system designers with extremely fast configuration, instant-on boot capability, and rapid updates of critical application parameters such as weighting tables in AI applications."

https://www.theregister.com/2022/08/02/kioxia_everspin_persi...


I don't think it's "gone forever"; it's just gonna sleep for a while, like "dumb terminals" did: now we have a hybrid of tons of cloud apps (and most things headed that way), local apps, and mixes of the two where it makes more sense. The same will happen with something like Optane. I'm surprised they didn't push it more in "larger" embedded systems, since those would have benefited well from that NVRAM goodness, probably more so than PCs and cloud compute.


Is it possible to have DRAM connected to the bus, but without resetting the DRAM upon reboot? If that is possible, then you would have some form of persistence, provided that power never goes off.
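(Aside: Linux can already approximate this by carving emulated persistent memory out of ordinary RAM with the memmap kernel parameter; the contents survive a warm reboot as long as power stays on. The sizes and offset below are illustrative.)

```
# Kernel command line: reserve 4 GiB of RAM at the 12 GiB physical
# offset and expose it as emulated persistent memory (/dev/pmem0)
memmap=4G!12G

# After boot it can host a DAX filesystem:
#   mkfs.xfs /dev/pmem0 && mount -o dax /dev/pmem0 /mnt/pmem
```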


Sadly, users usually reboot because some poorly written piece of software (probably written in C/C++ or Java) is out of whack and resetting RAM is the only way to fix it.

Also keeping RAM running all the time would eat electricity and generate heat.

But I like the general idea. It definitely would be a good option.


> users usually reboot because some poorly written piece of software

You can have that in Java, or in any runtime environment: if your Java processes keep consuming too much memory, swapping starts, on Linux the OOM killer kicks in, and in the end the thing gets rebooted. Also, operating systems can have bugs, hardware has bugs, nobody is perfect.

> Also keeping RAM running all the time would eat electricity and generate heat.

In a server room you wouldn't turn out the lights, ever.


Honestly, people don't reboot because something is poorly written; they reboot because it is too difficult to understand how to solve the problem, so they go with the short-term solution instead.


> out of wack and resetting RAM is the only way to fix it.

This is not how this works. It doesn’t matter what language an application is written in, if a process is terminated, that frees all RAM consumed on any commonly used operating system.


>They should have made them in a regular DDR3/4/5 format

There was/is CXL


"large corpse" !!


> all large corpse

Freudian slip? :)



