
To answer the author's question: No, SIMD would not help with this.

This is neat. I wonder if the author would be willing to write a Kaitai Struct definition for it.

Something else interesting: QOI is 1, 1, and 2 letters off from PNG (P to Q, N to O, G to I). I'm quite certain this is an accident, but it's interesting nonetheless.



I suspect SIMD would help with the encoding. The lookup table is small enough to fit into 8 AVX2 registers, so instead of hashing, you could use direct lookup, which would improve compression ratio further (a little bit).
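To make the trade-off concrete, here is a minimal scalar sketch of the two lookup strategies. It assumes a 64-entry table of 4-byte pixels (64 x 4 = 256 bytes, i.e. eight 32-byte AVX2 registers) and uses the hash from the published QOI specification; the revision of qoi.h discussed here may hash differently. The names `pixel_t`, `find_hashed`, and `find_direct` are made up for illustration, and the plain loop in `find_direct` only stands in for what a SIMD compare across the eight registers would do.

```c
#include <stdint.h>

#define TABLE_SIZE 64 /* 64 entries x 4 bytes = 256 bytes = 8 x 32-byte registers */

/* Hypothetical pixel type for this sketch. */
typedef struct { uint8_t r, g, b, a; } pixel_t;

/* Hash as given in the published QOI specification (assumed here). */
static unsigned pixel_hash(pixel_t p)
{
    return (p.r * 3u + p.g * 5u + p.b * 7u + p.a * 11u) % TABLE_SIZE;
}

static int pixel_eq(pixel_t x, pixel_t y)
{
    return x.r == y.r && x.g == y.g && x.b == y.b && x.a == y.a;
}

/* Hashed lookup: inspects exactly one slot. Returns the slot or -1. */
static int find_hashed(const pixel_t table[TABLE_SIZE], pixel_t p)
{
    unsigned h = pixel_hash(p);
    return pixel_eq(table[h], p) ? (int)h : -1;
}

/* Direct lookup: inspects every slot, so it also finds a pixel that is
   stored somewhere other than its hash position. Returns the slot or -1. */
static int find_direct(const pixel_t table[TABLE_SIZE], pixel_t p)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (pixel_eq(table[i], p))
            return i;
    return -1;
}
```

This covers only the search. A direct-lookup encoder would also need its own replacement policy for the table, and the decoder would have to mirror that policy exactly; neither is shown.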


> The lookup table is small enough to fit into 8 AVX2 registers

Indeed.

> so instead of hashing, you could use direct lookup

However, I don’t think that part is going to work. See the code that uses that table: https://github.com/phoboslab/qoi/blob/master/qoi.h#L324-L328

SIMD registers aren’t indexable at runtime (at least not on AMD64); the register number is encoded in the instruction, so it has to be known to the compiler.

Lanes within each register aren’t indexable either. The insert and extract instructions encode the lane index as an immediate in the instruction itself. There are workarounds for this one, like abusing vpshufb or vpermd, but with that amount of overhead I doubt SIMD will deliver any gain at all.
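A rough sketch of what the vpermd workaround could look like, assuming the table is 64 32-bit entries. The helper name `table_get` is hypothetical. Note that the sketch picks the 8-lane group through a memory address, which is exactly the step that has no equivalent once the table lives in eight named registers; only the lane selection is done with vpermd (`_mm256_permutevar8x32_epi32`). A scalar fallback keeps it compilable without AVX2.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: fetch 32-bit entry i (0..63) from a 64-entry table. */
static uint32_t table_get(const uint32_t table[64], unsigned i)
{
    unsigned reg  = (i >> 3) & 7u; /* which 8-lane group ("register") */
    unsigned lane = i & 7u;        /* which lane inside that group */
#if defined(__AVX2__)
    /* Group selection goes through memory: a runtime value cannot name
       a register, so the 8 lanes are loaded from an address instead. */
    __m256i v = _mm256_loadu_si256((const __m256i *)(table + reg * 8));
    /* vpermd with the same index in every lane broadcasts the chosen
       lane, so it can then be read from the lowest 32 bits. */
    __m256i picked =
        _mm256_permutevar8x32_epi32(v, _mm256_set1_epi32((int)lane));
    return (uint32_t)_mm_cvtsi128_si32(_mm256_castsi256_si128(picked));
#else
    /* Scalar fallback with the same result. */
    return table[reg * 8 + lane];
#endif
}
```

Even in this simplified form, one scalar load has turned into a load, a broadcast, a permute, and an extract, which is the overhead referred to above.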


Yes, encoding might benefit, of course; I was thinking more about decoding speed, I suppose.


There are some clever tricks that can be pulled with the latest instruction sets like AVX-512. The registers are huge and the available instructions are so varied that they can be used in "off label" ways to implement lookup tables and bit-level parsers.


AVX-512 is deprecated and will be removed from future chips, so that's probably not as useful as one might think.


I doubt that. What would be hilarious is if AMD's upcoming Zen 4 supported it consistently, but not all Intel CPU models did.



