Ph.D. Dissertation Defense “A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation of Emoji using EmojiNet” By Sanjaya Wijeratne

Monday, November 19, 2018, 10 am to Noon
Campus: Dayton
366 Joshi
Audience: Current Students, Faculty

Ph.D. Committee: Drs. Amit Sheth (Advisor), Derek Doran, Krishnaprasad Thirunarayan, and Wenbo Wang (GoDaddy Inc.)

Thesis Statement: Machine-readable emoji sense repositories can be created and used to enable a substantially better understanding of emoji meaning in textual contexts. This is useful for improving the performance of downstream applications such as emoji sense disambiguation and emoji similarity calculation.

Abstract

The ability to automatically process and interpret text fused with emoji will be essential as society embraces emoji as a standard form of online communication. Since their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites, including Facebook, Instagram, and Twitter. Processing emoji using traditional Natural Language Processing (NLP) techniques is challenging because emoji are pictorial and because the same emoji may be used in different contexts and cultures to express different meanings. Their polysemous nature complicates tasks such as emoji similarity calculation and emoji sense disambiguation. Machine-readable sense repositories that are specifically designed to capture emoji meaning can play a vital role in representing, contextually disambiguating, and converting pictorial forms of emoji into text, enabling NLP techniques to process this new medium of communication.

This dissertation presents EmojiNet, the largest machine-readable emoji sense inventory, which links Unicode emoji representations to English meanings extracted from reliable web sources. EmojiNet consists of: (i) 12,904 sense labels over 2,389 emoji, linked to machine-readable sense definitions in BabelNet; (ii) context words associated with emoji senses, derived from word embedding models; and (iii) for some emoji, discrepancies in their presentation on different platforms. The dissertation further presents methods for emoji similarity evaluation and sense disambiguation uniquely enabled by EmojiNet. Emoji similarity methods are built using word embedding models and are evaluated over a number of corpora. The same embedding models are then used to carry out emoji sense disambiguation, and the accuracy of that disambiguation is evaluated. The EmojiNet framework, its RESTful web service, and benchmark datasets created as part of this dissertation are publicly released at http://emojinet.knoesis.org/.
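
As a rough illustration of the general idea of embedding-based emoji similarity and sense disambiguation, the following Python sketch may help. It is not the dissertation's implementation and does not use EmojiNet's data or web service: the word vectors, the sense inventory, and all function names (TOY_VECTORS, TOY_SENSES, emoji_similarity, disambiguate) are hypothetical and invented here. It assumes an emoji can be represented by averaging the vectors of its sense context words, and that a sense can be chosen by comparing the words of a message against each sense's context words.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real setup would load a word embedding model trained on a large corpus.
TOY_VECTORS = {
    "laugh": [0.90, 0.10, 0.00],
    "joy":   [0.80, 0.20, 0.10],
    "funny": [0.85, 0.15, 0.05],
    "cry":   [0.10, 0.90, 0.00],
    "sad":   [0.00, 0.95, 0.10],
    "tear":  [0.20, 0.80, 0.10],
}

# Hypothetical sense inventory: emoji -> sense label -> context words.
TOY_SENSES = {
    "😂": {"laugh": ["laugh", "joy", "funny"], "cry": ["cry", "tear"]},
    "😭": {"cry": ["cry", "sad", "tear"]},
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def emoji_vector(emoji):
    """Represent an emoji as the mean vector of all its sense context words."""
    words = [w for ctx in TOY_SENSES[emoji].values() for w in ctx]
    return mean_vector(words)


def emoji_similarity(a, b):
    """Similarity of two emoji as the cosine of their averaged vectors."""
    return cosine(emoji_vector(a), emoji_vector(b))


def disambiguate(emoji, message_words):
    """Pick the sense whose context words are closest to the message words."""
    msg = mean_vector(message_words)
    if msg is None:
        return None  # no known words in the message, so no decision
    best_label, best_score = None, -1.0
    for label, ctx in TOY_SENSES[emoji].items():
        score = cosine(msg, mean_vector(ctx))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With these toy values, calling disambiguate("😂", ["funny", "joy"]) selects the "laugh" sense, while disambiguate("😂", ["sad", "tear"]) selects "cry". Averaging vectors is only one simple way to combine context words; it is used here because it keeps the sketch short.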

Relevant publications: http://knoesis.org/Library?f%5Bsearch%5D=Sanjaya

For information, contact