Ph.D. Dissertation Proposal Defense - "Semantics Based Summarization, Utilization, and Alignment of Semi-Structured Data on the Web" by Kalpa Gunaratna

Thursday, March 31, 2016, 10 am to 1 pm
Campus: Dayton, 366 Joshi
Audience: Current Students, Faculty

Ph.D. Committee: Drs. Amit Sheth (advisor), TK Prasad (advisor), Keke Chen, Gong Cheng (Nanjing University, China), Edward Curry (NUIG, Ireland), and Hamid R. Motahari-Nezhad (IBM Research, USA)

ABSTRACT:

Processing structured and semi-structured content published on the Web has gained increasing attention with the rapid progress of the Linking Open Data initiative. Advances in hardware efficiency and technology have resulted in large amounts of data being published on the Web in the form of datasets and knowledge graphs. We use the term datasets for collections of raw data with little processing or semantic enhancement, as opposed to knowledge graphs, in which the data carry richer semantics and knowledge. Today, there exist large knowledge graphs (e.g., encyclopedic datasets such as DBpedia) that encapsulate vast amounts of knowledge for human and machine consumption. While having this much knowledge available on the Web is valuable, it leads to information overload when humans (and even machines) explore it; hence, proper summarization and presentation techniques are needed. Further, knowledge extracted in the form of knowledge graphs can be used to make data retrieval more semantic than purely syntactic techniques allow. In addition, data published on the Web can be aligned so that it can be queried and analyzed as a whole. Datasets on the Web are typically linked at the instance level, and problems arise when consuming them at the schema level, especially for properties (relationships). Relationships capture much of the meaning of knowledge represented in triple format, and aligning them alongside instance-level links can provide a proper global view of the data.

In this dissertation, we look more deeply into the sub-problems mentioned above: summarization, utilization, and alignment of the data and knowledge available on the Web. First, we investigate a concise and comprehensive (i.e., diverse, and hence with improved coverage) way of presenting entity-centric information in knowledge graphs to human users for quick understanding. Then, we discuss matching similar documents on the Web (in the biomedical domain) using knowledge extracted from them (in the form of RDF triples and ontologies), which can capture context better than bag-of-words or keyword-based representations. Finally, we explore relationship alignment between datasets on the Web using extension analysis, which could support data integration and comprehensive querying.
