Emoji-list with emojis, names, shortcodes, unicode and html entities (gist.github.com)
88 points by inex on Dec 17, 2017 | 23 comments


To be honest, I think this approach to working with emojis is very flawed. It seems to be most obvious on Slack, where multi-component emojis are regularly split into separate characters.

Case in point:

    {"emoji": "‍‍‍", "name": "family_mothers_two_girls", "shortname": "", "unicode": "", "html": "👩‍👩‍👧‍👧", "category": "p", "order": ""},
This should really not be `family_mothers_two_girls`, nor should it be "Family: Woman, Woman, Girl, Girl" as Emojipedia describes it (https://emojipedia.org/family-woman-woman-girl-girl/). It should be (and is, in the underlying unicode) `woman,woman,girl,girl` - the zero width joiners being represented as commas.

On Slack, when I post the "Man Singer, Medium-Light Skin Tone" (https://emojipedia.org/male-singer-type-3/) emoji, it splits it up into a man with medium-light skin tone, followed by a microphone. This is because it has replaced my emoji with:

    :man::skin-tone-3:‍:microphone:
What Slack should be doing is storing my emoji as:

    :man,skin-tone-3,microphone:
This differentiates between zero-width-joined emojis and multiple separate emojis. Currently there is no difference, so my intent is ambiguous: Slack just has to guess which emojis were meant to be joined.

Of course this also means that Slack doesn't just work with new emojis or newly combined emojis - they all have to be added and supported manually. That defeats the entire point of the emoji standard! Instead of storing labelled versions of the emojis, why not just consider storing the actual unicode code points...?
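
To make the comma idea concrete, here is a minimal Python sketch. It is not how Slack or the gist works; `joined_label` is a hypothetical helper, and it borrows the names from Python's bundled Unicode data rather than from any shortcode list. It splits a sequence on the zero width joiner (U+200D) and writes a comma wherever a joiner was:

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def joined_label(sequence: str) -> str:
    """Label an emoji sequence by its components; commas stand for the joiners."""
    labels = []
    for part in sequence.split(ZWJ):
        # A component can hold several code points (e.g. a base plus a skin tone
        # modifier), so name each code point and separate them with spaces.
        labels.append(
            " ".join(unicodedata.name(ch, f"U+{ord(ch):04X}").lower() for ch in part)
        )
    return ",".join(labels)


# woman + ZWJ + woman + ZWJ + girl + ZWJ + girl
family = "\U0001F469\u200d\U0001F469\u200d\U0001F467\u200d\U0001F467"
print(joined_label(family))  # woman,woman,girl,girl

# Two separate emojis (no joiner) never produce a comma, so the intent stays visible.
print("," in joined_label("\U0001F468\U0001F3A4"))  # False
```

Storing the raw code points keeps this information for free; the label is only needed for display.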


I typically use the json file from emojilib when parsing emojis.

It is regularly updated and can be added to JS projects as a dependency.

https://github.com/muan/emojilib/blob/master/emojis.json


There's also https://github.com/milesj/emojibase which is pretty complete.

It's the only one I found that has a simple-to-use list of emoji groups.


Yeah there are a lot of lists - but I really needed the HTML Codes for every Emoji. That was the main focus of the list.


Why don't you just compute the HTML Codes from the Unicode codepoints?
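
As a rough sketch of what that computation could look like (Python here, and `html_entities` is a made-up helper name, not something from the gist): each code point becomes one numeric character reference, in hex or decimal, and joiners are encoded like any other code point.

```python
import html


def html_entities(emoji: str, hexadecimal: bool = True) -> str:
    """Turn every code point of the string into a numeric character reference."""
    if hexadecimal:
        return "".join(f"&#x{ord(ch):X};" for ch in emoji)
    return "".join(f"&#{ord(ch)};" for ch in emoji)


poo = "\U0001F4A9"
print(html_entities(poo))                     # &#x1F4A9;
print(html_entities(poo, hexadecimal=False))  # &#128169;

# A ZWJ sequence keeps its joiners as &#x200D;
family = "\U0001F469\u200d\U0001F469\u200d\U0001F467\u200d\U0001F467"
print(html_entities(family))

# Round trip through the standard library to check nothing was lost.
print(html.unescape(html_entities(family)) == family)  # True
```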


Valid point, but that would require server- or client-side processing, right?

My intention with the JSON list was to deliver as solid a base as possible for anyone to process as needed.

However, any recommendations for optimizations or the like are very welcome!


Why?


This is one of those things that you don't really need, but may REALLY need at some point in the future.

Also who names emojis, are they author named!?

Also, also: GitHub could use some optimization. Forking this froze my browser, and rendering the page took about 20 seconds.


For what it's worth, Unicode's CLDR names Unicode characters in multiple languages [1], although this work is done in phases, and usually the English names are ready before others. Unicode also maintains an informative emoji chart [2], which includes comparison pictures among popular glyph-sets, as well as an English short name.

[1] http://cldr.unicode.org/#TOC-What-is-CLDR-

[2] http://unicode.org/emoji/charts-5.0/full-emoji-list.html


> Also who names emojis

Developers of screen readers, for one.

If you're on a Mac, use VoiceOver or the "say" command-line tool on an emoji.


> Also who names emojis, are they author named‽

Actually, the Unicode Consortium names them (hover over an emoji to see its name): http://www.unicode.org/emoji/charts/emoji-versions.html#v9.0...


I named only a very few, those I had to touch anyway while compiling the list. Most of the names come from a source list I merged into the final list.

Many names are also still missing. I need to figure out how to add them automatically from the Unicode website or another emoji resource, because doing it manually is not what I want to do :)
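
One possible way to fill the gaps automatically, sketched in Python under a few assumptions: the entries are dicts with the "emoji" and "name" keys seen in the list, the names come from Python's bundled Unicode Character Database (not the CLDR short names, and only as recent as the Python version in use), and `guess_name` and `fill_missing_names` are hypothetical helpers.

```python
import unicodedata

# Code points that carry no name worth showing: zero width joiner, variation selector-16.
SKIP = {"\u200d", "\ufe0f"}


def guess_name(emoji: str) -> str:
    """Build a snake_case name from the Unicode names of the code points."""
    names = [unicodedata.name(ch, "") for ch in emoji if ch not in SKIP]
    return "_".join(n.lower().replace(" ", "_") for n in names if n)


def fill_missing_names(entries):
    """Fill in a name only where the entry does not have one yet."""
    for entry in entries:
        if not entry.get("name"):
            entry["name"] = guess_name(entry["emoji"])
    return entries


entries = [
    {"emoji": "\U0001F4A9", "name": ""},
    {"emoji": "\U0001F3A4", "name": "mic"},  # existing names are left alone
]
for entry in fill_missing_names(entries):
    print(entry["name"])
# pile_of_poo
# mic
```

Names for multi-component emojis come out long this way, so they would probably still need a manual pass.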


Wow, GitHub could do with some "pagination" for large files. I opened the link on my not-i7 MacBook and felt the performance hit...


I would guess the issue was the emojis themselves. I had to disable them on Slack because they caused unbearable (read: fan spun up) load when someone overused them.


The obfuscated "internet chemotherapy" script caused the same issue on GitHub, but viewing raw seems fine.

Something to do with the size of the file I guess...


EmojiNet has a comprehensive list of emoji with their machine-readable meanings. Check out the papers for more information: http://emojinet.knoesis.org/dataset.php


Where’s the pile-of-shit one?


Whoa, how could I miss that one? Will add it!

Officially it seems to be called "hankey" or "pile of poo" (the first may be related to that South Park episode?)


Personally I prefer "the poop emoji" since basically everybody I know knows which one that is.


It was actually already in the list - but added meta data now: https://gist.github.com/oliveratgithub/0bf11a9aff0d6da7b46f1...

And it's poop, not ice cream; I think most of us would agree.


you mean chocolate ice-cream?


It's definitely canonically named Pile of Poo.

http://unicode.org/cldr/utility/character.jsp?a=1F4A9


Nice tnx



