Master's Thesis Defense: “Identifying Tweets with Implicit Entity Mentions” by Adarsh Alex

Tuesday, August 16, 2016, 2 pm to 5 pm
Campus: Dayton, 366 Joshi
Audience: Current Students, Faculty

Committee: Drs. Amit Sheth (Advisor), TK Prasad, and Tanvi Banerjee

ABSTRACT:

Social networking sites like Twitter and Facebook have become a significant source of user-generated content in the past decade. Mining this content has proved beneficial for a broad range of applications such as event monitoring, trend detection, and sentiment analysis. Identifying entities is one of the major tasks these applications require. It is typically performed in two steps: Named Entity Recognition (NER) and Entity Linking. State-of-the-art NER solutions focus on recognizing entities that are mentioned explicitly in text. However, entities are frequently mentioned implicitly in tweets. For example, the tweet “Didn't know that it's the same actress in Fault in our stars and Divergent.” contains explicit mentions of the movies Fault in our stars and Divergent, while it has an implicit mention of the actress Shailene Woodley. Recognizing that this tweet has an implicit mention of an entity of type actress is the initial step towards identifying the implicit mention of Shailene Woodley. In this thesis, we propose a two-step, semantics-driven approach to address the challenge of identifying tweets with implicit entity mentions of a given entity type. Specifically, we answer two research questions:

1. How to find tweets that have implicit entity mentions of a given type?
2. What features of a tweet help to distinguish tweets with implicit entities from others?

We answer the first question by developing a technique to find semantic cues that might indicate the presence of implicit entities in tweets. We answer the second question by exploiting the syntactic features of the tweets, along with semantic features extracted from crowdsourced knowledge bases like Wikipedia and DBpedia, to determine whether a tweet has an implicit mention or not. We evaluate our approach by creating a gold standard dataset for three domains, including movies and books.
