Ph.D. Dissertation Proposal Defense: "An Automated Assessment of Linked Data Vocabularies" by Stella Sam

Thursday, July 11, 2019, 9:30 am to 11:30 am
Campus: Dayton
405 Russ Engineering Tait Conference Room
Audience: 
Current Students
Faculty
Staff

Ph.D. Committee:  Drs. Pascal Hitzler (advisor), John Gallagher, TK Prasad, and Karl Hammar (Jonkoping University, Sweden)

ABSTRACT:

The volume of published Linked Data (LD) continues to grow rapidly. As a result, the quality of LD has become of great interest to LD owners, publishers, engineers, and users, who look for assurances about the trustworthiness not only of the data but also of its source and reuse. However, the meaning of LD quality is far from straightforward and can suggest different things to different people in different contexts. Many recent studies have attempted to address this issue through dimensions that are internal to the dataset itself. A 2014 editorial presents the principles of the 5 Stars of LD Vocabulary Use. It introduces the idea that the quality of a linked dataset can be measured indirectly by how its base (underlying) vocabulary reuses, and is reused by, the base vocabularies of other linked datasets. The idea is that a good vocabulary should give us a measure of how powerful, effective, and easy it is to use LD, and should restrict the potential interpretations of the classes and roles used in LD querying by defining their intended purpose. This work studies these principles, which help rank a linked dataset from zero to five stars. It includes a manual review process that establishes baseline results, an automation of the ranking process, and a comparison of this process with other LD quality measurement systems. This thesis seeks to establish that the base vocabulary of a linked dataset does indeed have an impact on the quality of the dataset.

For information, contact