I don't quite understand how 'retain' and 'release' can be more memory efficient on Apple Silicon than x86.... I can understand how they can be more efficient from a performance standpoint in terms of more efficient reference counting, but I don't understand how that translates to less memory usage which is apparently what's being argued... ?
Unless on x86 some of the 'free's when the ref counts hit 0 were being batched up and deferred, and that doesn't need to happen now?
I don't think retain/release perf has anything to do with memory consumption, but I have seen a bunch of reviews claiming that 8GB is perfectly fine.
This is fascinating to me, because:
(a) every 8GB Mac I've used in the past has been unusably slow
(b) since upgrading my 32GB Hackintosh to Big Sur, my usual 40GB working set is only about 20GB.
(c) My 2015 16GB MBPr with Big Sur is also using about half as much physical memory on the same workload. Swappiness is up a little, but I haven't noticed.
So my guess is that something in Big Sur has dramatically reduced memory consumption and that fix is being commingled with the M1 announce.
Seriously, I'm utterly baffled by all the people claiming that 8 GB isn't enough for the average user.
The only situation I ever ran into where it was a problem was in trying to run multiple VM's at once.
Otherwise it's just a non-issue. Programs often reserve a lot more memory than they actually use (zero hit in performance) so memory stats are misleading, and the OS is really good at swapping memory not touched in a while to the SSD without you noticing.
Yes, sometimes it takes a couple seconds to switch to a tab I haven't touched in Chrome in days because it's got to swap it back in from the SSD. Who cares?
> people claiming that 8 GB isn't enough for the average user
I'm not claiming anything of the sort.
My point is that memory consumption seems to be greatly reduced in Big Sur, and that might make 8GB machines much better to use than before. All of my testing is on Intel machines. It's not exclusively an M1 phenomenon.
I would still recommend 16GB to anyone, and if the extra $200 was a factor, I would recommend that they buy last year's Intel with 16GB of RAM.
Nah, sorry, but you're wrong. I had to upgrade my laptop because I wanted to run Firefox, IntelliJ IDEA and an Android emulator on the same machine. Nothing else. That was not possible on 8 GB of RAM.
So it's not like multiple VMs are needed, and the above scenario is pretty average for a common mobile developer (though still not an average user, I admit).
Second thing is, lots of games require 16 GB RAM. Maybe gamers are still not average users, I don't know.
For me with 16GB in an MBP, there is currently 20.5GB used + swap, and I haven't even started Firefox today, that would add another ~6GB or so.
Usually if I'm running Safari, Firefox and my 4GB Linux VM, that's 16-18GB used up in those. At the moment I have a few other things open, PDF viewer, Word, iTerms, Emacs etc, but nothing huge.
Most of the time this level of usage is ok, but I've had times where I've had to wait 30+ seconds for the UI to respond at all (even the Dock or switching workspaces) and wondered if the system had crashed.
For that reason I'm generally waiting for the next 32GB model before committing, that's assuming I stick with Apple instead of switching back to Linux (which I used for ~20 years before trying the MBP).
> Programs often reserve a lot more memory than they actually use (zero hit in performance) so memory stats are misleading, and the OS is really good at swapping memory not touched in a while to the SSD without you noticing.
The stats are absolutely reliable because no physical memory page is allocated until it is actually used to store something. So allocating a large chunk of unused memory wouldn't show in the (physical) memory usage stat.
I pretty much daily have to do a closing round to not run out of my 24 GiB. That's all web browsers (usually 100-200 tabs), VS Code with some extensions, and 2x4K displays.
But what do you even mean "run out"? This is what I don't get.
If you have multiple browsers with hundreds of tabs, the majority of those tabs are probably swapped out to your SSD already.
With swap files and SSDs, physical memory is less and less relevant, except when you're performing very specific computational tasks that actually require everything to be in memory simultaneously -- things like highly complex video effects rendering.
How do you measure "running out" of your 24 GiB? And what happens when you do "run out"?
As a human, when I have many tabs open, I observe that everything gets really slow. All applications get slow, but especially the browser.
So I put on my engineering hat and pull up Activity Monitor and further observe (a) high memory pressure, (b) high memory consumption attributed to Chrome or Firefox, (c) high levels of swap usage, (d) high levels of disk I/O attributed to kerneltask or nothing, depending on macOS version, which is the swapper task.
I close some tabs. I then observe that the problems go away.
Swap isn't a silver bullet, not even at 3 GB/s. It is slow. I haven't even touched on GPU memory pressure, which spills back into system RAM and puts further pressure on disk swap.
It's the equivalent of having 50 stacks of paper documents & magazines sitting unorganized on your desk and complaining about not having space to work on.
A bigger desk is not the solution to this problem.
If your tabs are swapped out to SSD, your computer feels incredibly _slow_. SSDs are fast, yeah, but multiple orders of magnitude slower than the slowest RAM module.
You can run 4GB if you're fine with having most of your applications swapped out, but the experience will be excruciating.
Physical memory is still as relevant as it was 30 years ago. No offense but if you can't see the problem, you probably have never used a computer with enough RAM to fit everything in memory + have enough spare for file caching.
I don't swap. You can do all your arguments about why I should if you want but yes, there are legit reasons not to and there is such a thing as running out of memory in 2020.
4GB MBA user here, don't have any problems either running Chrome or Firefox with 10-20 tabs and iTerm (Safari does feel much faster than the other two, and my dev environment is on a remote server, though).
iPhones and iPads also have relatively small amounts of RAM compared to Android devices in the same class, so I wonder if Apple is doing something smart with offloading memory to fast SSD storage in a way that isn't noticeable to the user.
This is most probably more linked to Java/Kotlin vs Objective-C/Swift. Want an array of 1000 objects in Java? You'll end up with 1001 allocations and 1000 pointers.
In Swift you can put value types into the heap-backed array directly; in ObjC you can use stack-allocated arrays (since you have all of C), and there are optimizations such as NSNumber using tagged pointers.
> Theoretically Java should be more memory efficient because it makes fewer guarantees and can move memory around.
Java makes a lot of memory guarantees that are hard to make efficient. Specifically, it becomes extremely hard to have a scoped allocation. Escape analysis helps, but Java's GC plus its lack of value types means it's basically never good at memory efficiency. Memory performance can be theoretically good, but efficiency not really. That's just part of the tradeoff it's making. And nearly everything is behind a reference, making everything far larger than it could be.
Compaction helps reduce fragmentation, but it comes at the cost of necessarily doubling the size of everything being compacted. Only temporarily, but those high-water spikes are what kick things to swap, too.
Big difference is that Objective-C is a superset of C. Any Objective-C developer worth his/her salt will drop down to C code when you need to optimize. The object-oriented parts of Objective-C are way slower than Java. But the reason Objective-C programs can still outcompete Java programs is that you have the opportunity to pick hotspots and optimize the hell out of them using C code.
Object-oriented programs in Objective-C are written in a very different fashion from Java programs. Java programs tend to have very fine granularity on their objects. Objective-C programs tend to have interfaces which are bulkier, and larger objects.
That is partly why you can have a high-performance 3D API like Metal written in a language such as Objective-C, which has very slow method dispatch. It works because the granularity of the objects has been designed with that in mind.
For those, Apple's favored approach to memory management (mostly reference counting) absolutely _is_ an advantage over Android's (mostly GC). That's not relevant when comparing an Intel and ARM Mac, tho.
I think the argument they were trying to get to, but totally failed to make, is possibly along these lines:
huge memory bandwidth relative to RAM size + OS-level memory compression => massive reduction in memory pressure for many, many workloads.
macOS has supported memory compression for a while now -- I would hypothesize that the M1 may have massively improved that subsystem in ways that actually do translate into needing less memory on average for a lot of common real-world workloads that amount to "human-timescale multitasking" between large working sets -- e.g. I click into this app with its huge working set, then into another app with a large working set, then back -- with those clicks (application context switches) occurring very, very rarely on a machine timescale.
If the memory compression subsystem can move working sets into and out of compressed memory space insanely quickly with low power usage, then the OS might have gotten very aggressive about using that feature to put not-recently-accessed memory into compressed memory space.
I believe it was being brought up as an example of "Apple has designed their hardware around their software" and then that translates to "Apple's software does well on machines with less memory".
Compared to something like Android, sure, I get that, but compared to Objective-C/Swift on x86 (which I think was being argued -- i.e. against the Intel Macs)?
I guess it makes reference counting in general more efficient; I'm just saying I don't see why that would mean Apple Silicon Macs running Objective-C/Swift code would use less memory than the same code compiled and run on x86.
I'm not necessarily convinced by the posted argument. That being said, I tend to think that people running a bunch of VMs and Electron apps and Docker cause them to use a bunch more RAM than I would consider to be "reasonable", and they've lost sight of how much you can do in a lesser amount of memory. (Typing this from a computer with 8 GB of RAM, which I have repeatedly been told is "below adequate" for development.)
The problem is, by now development practices in many companies effectively force using multiple large containers. I know an x stack could use 4x less memory if I spent considerable time ripping out unnecessary cruft, but few people in the company would agree that it's time well spent, and the home office allowance suffices for a machine with 32-64 GB of RAM (especially in 2020, when I don't really see that much value in laptops for dev work anymore).
I believe the idea was that reference counting is more memory efficient than other forms of garbage collection, such as the copying and mark-and-sweep collectors that commonly make up generational garbage collectors.
Languages like Java also do not yet support stack-allocated value types outside a few primitives like integers, and heap allocations are both slower and less space efficient due to the indirection and memory management.
It is a simple process: everything that you do in a language needs to be mapped onto lower-level instructions.
If the lower-level hardware instruction does not exist, you emulate it with multiple other instructions.
If you add a low-level instruction that maps a very common high-level operation into hardware, you don't need to call 5 to 10 software functions (extremely expensive), each executing lots of opcodes; you can execute a single opcode, which the hardware implements extremely fast.
It is not hard to be better than Microsoft here. From my personal experience and having disassembled lots of their code they always were lazy bastards. They cared 0 about efficiency. Why should they? They had monopolies like Office or Windows giving them over 95% margins. They could just use the money they printed to buy everything instead of competing.
Lisp machines did that (adding opcodes that map to high-level language operations) with the most common Lisp operators. Those machines were extremely expensive, in the hundreds of thousands of dollars, because so few were built. Apple sells at massive scale, hundreds of millions of CPUs per year, making this cheap for them.
> each executing lots of opcodes; you can execute a single opcode, which the hardware implements extremely fast
Typically these language-oriented instructions need to be implemented in microcode on the CPU. Often this does not create a fast system, but it helps keep the compiler simple. Examples are the typical Lisp Machines you've mentioned. With RISC CPUs, OTOH, the idea is to make the CPU instructions more primitive and put more effort into optimizing compilers instead. There were a few attempts to combine a (high-level) language-supporting architecture with the RISC principle, but I personally have never seen such a machine.
Reference counting releases memory as soon as the last reference to it goes away, while GC cleans up memory periodically, which means higher peak memory usage (more than what's actually live at any moment).
That explains an iOS vs Android difference (ARC vs garbage collection), but it doesn't explain the article's (and Gruber's) apparent argument that Apple Silicon machines running native Objective-C/Swift code use less memory than the same apps natively built from Objective-C/Swift code on Intel running the same OS (but different machine code, obviously).
Systems that can reap no-longer-needed objects without walking them can help here. The automatic approach is a copying collector, which is typically how the young generation of a generational garbage collector works. Since a copying collector typically works by following references, it also increases data locality on machines with a small amount of L1 cache.
Garbage Collectors and JITs typically work best with hardware support, as you need to check pointer reads and writes as objects are being moved around or code is being rewritten. A lot of these systems use MMU gymnastics, such as mapping the same memory page into multiple locations with different permissions.
You also have systems where you create the objects knowing that they will be tiny and short-lived with a fixed lifetime, which can be hugely efficient. This is how Apache Bucket brigades work, since they know that other than a few special cases all memory allocated while handling a request will be garbage once a response is returned.
Lots of tiny memory allocations are inefficient no matter what. Making slight deallocation refinements to poorly made software (the reference counting part is not 'hugely inefficient') is focusing on the wrong thing.
Lots of tiny memory allocations are pretty efficient in Java. The VM will already have allocated memory from the kernel, so there's no context switch, and once the tiny objects are no longer referenced, deallocation is a free (0 machine instructions) side effect of garbage collection. Garbage collection isn't free, but it can be cheaper than reference counting millions of objects with explicit, individual allocation and deallocation.
There are lots of problems that aren't being addressed here.
First, Java ends up doing a huge number of heap allocations for what are just stack allocations in other systems languages.
Second, Java might have some heap allocation optimizations, but it's still a huge performance sink to allocate in a tight loop.
Third, reference counting is not slow. Incrementing or decrementing an integer only when a variable isn't moved can be both cheap and rare. Even better, it is deterministic. Garbage collection gets its optimization from doing bulk operations, which is exactly what becomes a problem. Any speed up pales in comparison to the speed advantage of avoiding those allocations all together. Once allocations are not weighing down performance, the lack of pauses and deterministic behavior of reference counting is an even larger advantage.
You can say that memory 'has already been allocated from the kernel' but that is what heap allocators do in any language. Jemalloc maps virtual memory and puts it into pools for sizes and threads.
At the end of the day, taking out excessive allocations is usually a trivial optimization to make, and usually trivial to avoid in the first place. Languages fighting their garbage collector and promising that the next version will be faster and/or lower latency is a cycle that has been going on since before Java was first released. At a certain point I think people should accept that stack allocations and moves of heap allocations take care of the vast majority of scenarios, and actual reference counting in this context is not a problem. Variables with unknown lifetimes should only be needed when communicating with unknown components. Garbage collection, on the other hand, has been a constant problem as soon as there is any necessity for interactivity.
Yep. It’s actually a pointer to the class instance for the object, which is a full object that contains more information than a typical vtable might, but it serves as a “type ID” that the runtime can use to dispatch on.
x86-64 was designed to prevent (or at least discourage) efficient use of tagged pointers, with the higher half/lower half split in the virtual address space. All the excess high-order bits you don't need for actual addressing are required to have the same value, so you effectively only get at most one tag bit.
They’re required to have the same value upon dereference; there are no restrictions before that, as assembly doesn’t care what’s in a register. The bits are appropriately masked off when necessary prior to using the pointer.
Yikes. That's the same shenanigans that got them into trouble with the 68000. Everyone stuffed data into the top 8 bits of pointers because even though the 68000 had 32-bit address registers, it only had a 24-bit address bus and the top 8 bits were don't-cares. Then the 6802x came out with more address lines and...
...and that's basically why x86_64 was specified to require a particular bit pattern in high-order bits - it was to stop applications and OS programmers from writing a bunch of software with tagged pointers which would tie Intel's and AMD's hands when adding address lines. I guess Apple is ok with tying their own hands.
Tagged pointers are an officially accepted thing in ARM -- the relevant feature is called top-byte ignore (TBI). It only applies to the upper 8 bits of a pointer, leaving 56 bits for addressing.
Eh, the jump from 16-bit addressing to 32-bit was a factor of 65,536. The jump from 32-bit to 64-bit is 4,294,967,296x. Throwing away the top 8 bits drops it to an address space "only" 16,777,216 times bigger than 4GB. It seems like there's some headroom for growth in there.
Doesn't this become less and less of an issue the more bits you add to your pointers? With 32 bits you can't even have one memory address per person on Earth; at 64 bits you get about 1.8×10^19 addresses, more than the estimated number of grains of sand on Earth; and at 128 bits you're in the ballpark of one address per atom of every living human combined (I haven't crunched the numbers exactly, this is more to give a flavor of the orders of magnitude we're talking about).
So if you cut off the top 8 bits of a 32-bit pointer and leave yourself with 24 bits, you can't even give a pointer to everyone in Greater Tokyo, but if you cut off the top 8 bits of a 64-bit pointer, you can still hand out roughly nine million pointers to every person on Earth.