Emoji-list with emojis, names, shortcodes, unicode and html entities

BillinghamJ · on Dec 18, 2017

To be honest, I think this approach to working with emojis is very flawed. It seems to be most obvious on Slack, where multi-component emojis are regularly split into separate characters.

Case in point:

    {"emoji": "👩‍👩‍👧‍👧", "name": "family_mothers_two_girls", "shortname": "", "unicode": "", "html": "&#128105;&zwj;&#128105;&zwj;&#128103;&zwj;&#128103;", "category": "p", "order": ""},

This should really not be `family_mothers_two_girls`, nor should it be "Family: Woman, Woman, Girl, Girl" as Emojipedia describes it (https://emojipedia.org/family-woman-woman-girl-girl/). It should be (and is, in the underlying unicode) `woman,woman,girl,girl` - the zero width joiners being represented as commas.

On Slack, when I post the "Man Singer, Medium-Light Skin Tone" (https://emojipedia.org/male-singer-type-3/) emoji, it splits it up into a man with medium-light skin tone, followed by a microphone. This is because it has replaced my emoji with:

    :man::skin-tone-3:‍:microphone:

What Slack should be doing is storing my emoji as:

    :man,skin-tone-3,microphone:

This differentiates between zero-width-joined emojis and multiple separate emojis. Currently there is no difference, so my intent is ambiguous and Slack just has to guess which emojis were meant to be joined.
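To make the distinction concrete, here is a minimal Python sketch of the comma-joined notation described above. It is not how Slack works: the `NAMES` table and the `group_sequences` / `to_shortcodes` helpers are hypothetical, and the grouping rule (a zero width joiner or a skin-tone modifier attaches a code point to the previous emoji) is a simplification of the real Unicode segmentation rules.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation), ignored here
# The five Fitzpatrick skin-tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

# Hypothetical lookup table; a real one would cover every emoji code point.
NAMES = {
    "\U0001F468": "man",
    "\U0001F3FC": "skin-tone-3",
    "\U0001F3A4": "microphone",
}

def group_sequences(text):
    """Group code points that are meant to render as a single emoji."""
    groups = []
    join_next = False
    for ch in text:
        if ch == ZWJ:
            join_next = True  # the next code point belongs to the current group
            continue
        if ch == VS16:
            continue
        if groups and (join_next or ch in SKIN_TONES):
            groups[-1].append(ch)
        else:
            groups.append([ch])
        join_next = False
    return groups

def to_shortcodes(text):
    """Render each group as one :a,b,c: shortcode, commas marking the joins."""
    return "".join(
        ":" + ",".join(NAMES[ch] for ch in group) + ":"
        for group in group_sequences(text)
    )

joined = "\U0001F468\U0001F3FC\u200d\U0001F3A4"  # one emoji: man singer
separate = "\U0001F468\U0001F3FC\U0001F3A4"      # two emojis: man, then microphone
print(to_shortcodes(joined))
print(to_shortcodes(separate))
```

With this notation the joined sequence and the two separate emojis produce different strings, so nothing has to be guessed when converting back.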

Of course this also means that Slack doesn't just work with new emojis or newly combined emojis - they all have to be added and supported manually. That defeats the entire point of the emoji standard! Instead of storing labelled versions of the emojis, why not just consider storing the actual unicode code points...?

matchai · on Dec 18, 2017

I typically use the json file from emojilib when parsing emojis.

It is regularly updated and can be added to JS projects as a dependency.

https://github.com/muan/emojilib/blob/master/emojis.json

makepanic · on Dec 18, 2017

There's also https://github.com/milesj/emojibase which is pretty complete.

It's the only one I found that has a simple-to-use list of emoji groups.

inex · on Dec 18, 2017

Yeah there are a lot of lists - but I really needed the HTML Codes for every Emoji. That was the main focus of the list.

yorwba · on Dec 18, 2017

Why don't you just compute the HTML Codes from the Unicode codepoints?
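As a rough illustration of that suggestion, a short Python sketch follows. The function name `html_entities` is made up for this example; it emits a decimal numeric character reference for each code point and, matching the format used in the list, the named entity `&zwj;` for the zero width joiner.

```python
import html

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def html_entities(emoji: str) -> str:
    """Build decimal HTML character references from an emoji's code points."""
    return "".join("&zwj;" if ch == ZWJ else f"&#{ord(ch)};" for ch in emoji)

# Family: woman, woman, girl, girl (U+1F469 and U+1F467 joined by ZWJ).
family = "\U0001F469\u200d\U0001F469\u200d\U0001F467\u200d\U0001F467"
encoded = html_entities(family)
print(encoded)

# The standard library can decode the references back to the original string.
print(html.unescape(encoded) == family)
```

Since the conversion is a one-liner in most languages, the `html` field could be derived on demand rather than stored, though a precomputed list avoids that processing step.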

inex · on Dec 18, 2017

Valid point, but that would require server- or client-side processing, right?

My intent with the JSON list was to provide as solid a base as possible for anyone to process as needed.

However, any recommendations for optimizations or the like are very welcome!

jwilk · on Dec 18, 2017

stevemk14ebr · on Dec 18, 2017

This is one of those things that you don't really need, but may REALLY need at some point in the future.

Also who names emojis, are they author named!?

Also, Also. Github could use some optimization, forking this froze my browser. Rendering the page took like 20 seconds.

niftich · on Dec 18, 2017

For what it's worth, Unicode's CLDR names Unicode characters in multiple languages [1], although this work is done in phases, and usually the English names are ready before others. Unicode also maintains an informative emoji chart [2], which includes comparison pictures among popular glyph-sets, as well as an English short name.

[1] http://cldr.unicode.org/#TOC-What-is-CLDR- [2] http://unicode.org/emoji/charts-5.0/full-emoji-list.html

valleyer · on Dec 18, 2017

> Also who names emojis

Developers of screen readers, for one.

If you're on a Mac, use VoiceOver or the "say" command-line tool on an emoji.

hutattedonmyarm · on Dec 18, 2017

> Also who names emojis, are they author named‽

Actually, the Unicode Consortium names them (the name shows when you hover over the emoji): http://www.unicode.org/emoji/charts/emoji-versions.html#v9.0...

inex · on Dec 18, 2017

I named only a very few, those I had to touch anyway while compiling the list. Most of the names come from a source list I merged into the final list.

There are also still many names missing. I need to figure out how to add them automatically from the Unicode website or some other emoji resource, because doing it manually is not what I want to do :)
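One possible starting point, sketched in Python under some assumptions: the standard `unicodedata` module knows the formal Unicode character name of each individual code point, so a fallback name can be built from the components of a sequence. These are character names, not the CLDR short names or the names shown on the Unicode emoji charts, and the helper `fallback_name` is hypothetical.

```python
import unicodedata

# Code points that glue a sequence together but carry no name of their own here.
SKIPPED = {"\u200d", "\ufe0f"}  # zero width joiner, variation selector-16

def fallback_name(emoji: str) -> str:
    """Join the lowercase Unicode character names of an emoji's components."""
    parts = [
        unicodedata.name(ch, "").lower()  # "" if this Python build lacks a name
        for ch in emoji
        if ch not in SKIPPED
    ]
    return ",".join(part for part in parts if part)

print(fallback_name("\U0001F4A9"))
print(fallback_name("\U0001F469\u200d\U0001F469\u200d\U0001F467\u200d\U0001F467"))
```

Which names are available depends on the Unicode version bundled with the Python interpreter, so very recent emojis may come back empty.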

orliesaurus · on Dec 18, 2017

Wow Github could do with some "pagination" for large files - opened the link on my not-i7-macbook and felt the performance hit...

rplnt · on Dec 18, 2017

I would guess the issue was the emojis themselves. I had to disable them on slack as it caused unbearable (read: fan spun up) load when someone overused them.

chewmieser · on Dec 18, 2017

The obfuscated "internet chemotherapy" script caused the same issue on GitHub, but viewing raw seems fine.

Something to do with the size of the file I guess...

sanjrockz · on Dec 18, 2017

EmojiNet has a comprehensive list of emoji with their machine-readable meanings. Check out the papers for more information - http://emojinet.knoesis.org/dataset.php

aj7 · on Dec 18, 2017

Where’s the pile-of-shit one?

inex · on Dec 18, 2017

woah how could I miss that one - will add it!

Officially it seems to be called "hankey" or "pile of poo" (the first may be related to that South Park episode?)

zaarn · on Dec 18, 2017

Personally I prefer "the poop emoji" since basically everybody I know knows which one that is.

inex · on Dec 18, 2017

It was actually already in the list - but added meta data now: https://gist.github.com/oliveratgithub/0bf11a9aff0d6da7b46f1...

And it's poop, not ice cream; I think most of us would agree.

ioulian · on Dec 18, 2017

you mean chocolate ice-cream?

reificator · on Dec 18, 2017

It's definitely canonically named Pile of Poo.

http://unicode.org/cldr/utility/character.jsp?a=1F4A9

tomkinson · on Dec 18, 2017

Nice tnx