To be honest, I think this approach to working with emojis is deeply flawed. The problem is most obvious on Slack, where multi-component emojis are regularly split into separate characters.
This should really not be `family_mothers_two_girls`, nor should it be "Family: Woman, Woman, Girl, Girl" as Emojipedia describes it (https://emojipedia.org/family-woman-woman-girl-girl/). It should be (and is, in the underlying Unicode) `woman,woman,girl,girl`, with the zero-width joiners represented as commas.
On Slack, when I post the "Man Singer, Medium-Light Skin Tone" (https://emojipedia.org/male-singer-type-3/) emoji, it splits it up into a man with medium-light skin tone, followed by a microphone. This is because it has replaced my emoji with:
:man::skin-tone-3::microphone:
What Slack should be doing is storing my emoji as:
:man,skin-tone-3,microphone:
This distinguishes a zero-width-joined emoji from multiple separate emojis. Currently the stored form is the same for both, so my intent is ambiguous and Slack has to guess which components should be joined.
Of course this also means that new emojis and newly combined emojis don't just work on Slack: they all have to be added and supported manually. That defeats the entire point of the emoji standard! Instead of storing labelled versions of the emojis, why not just store the actual Unicode code points?
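To make the joined-versus-separate distinction concrete, here is a minimal Python sketch. It is only an illustration, not how Slack stores anything; the helper name `code_points` is made up for this example. Note that in the actual sequence the skin-tone modifier follows the man directly, and the zero-width joiner (U+200D) sits between the modifier and the microphone.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# "Man Singer, Medium-Light Skin Tone" as a single joined sequence:
# MAN + medium-light skin tone modifier + ZWJ + MICROPHONE
man_singer = "\U0001F468\U0001F3FC\u200D\U0001F3A4"

# The same pictographs posted as separate emojis (no joiner).
separate = "\U0001F468\U0001F3FC\U0001F3A4"


def code_points(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(man_singer))  # ['U+1F468', 'U+1F3FC', 'U+200D', 'U+1F3A4']
print(code_points(separate))    # ['U+1F468', 'U+1F3FC', 'U+1F3A4']

# The joiner is the only difference, so dropping it loses the user's intent.
print(man_singer.replace(ZWJ, "") == separate)  # True
```

Storing the string itself keeps the joiner, so a new or newly combined sequence round-trips even if the application has no label for it.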
For what it's worth, Unicode's CLDR names Unicode characters in multiple languages [1], although this work is done in phases, and usually the English names are ready before others. Unicode also maintains an informative emoji chart [2], which includes comparison pictures among popular glyph-sets, as well as an English short name.
I named only a few myself, the ones I had to touch anyway while compiling the list. Most of the names come from a source list I merged into the final list.
Many names are also still missing. I need to figure out how to add them automatically from the Unicode website or another emoji resource, because doing it manually is not what I want to do :)
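One possible starting point for generating names automatically, sketched in Python: the standard library's `unicodedata.name` returns the Unicode character name of each code point. This is an assumption about what might be good enough; these are formal character names, not CLDR short names, and the helper `component_names` is hypothetical. Coverage also depends on the Unicode version bundled with the Python build, so unknown characters fall back to their code point here.

```python
import unicodedata

# Code points that only glue or style a sequence and carry no name of their own.
INVISIBLE = {"\u200d", "\ufe0f"}  # ZERO WIDTH JOINER, VARIATION SELECTOR-16


def component_names(sequence):
    """Name each visible component of an emoji sequence (hypothetical helper)."""
    names = []
    for ch in sequence:
        if ch in INVISIBLE:
            continue
        # Fall back to U+XXXX when this Python build does not know the character.
        names.append(unicodedata.name(ch, f"U+{ord(ch):04X}").lower())
    return names


# WOMAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, GIRL
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467\u200D\U0001F467"
print(",".join(component_names(family)))  # woman,woman,girl,girl
```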
I would guess the issue was the emojis themselves. I had to disable them on Slack, as they caused unbearable (read: fan spun up) load when someone overused them.
EmojiNet has a comprehensive list of emoji with their machine-readable meanings. Check out the papers for more information: http://emojinet.knoesis.org/dataset.php