Master's Thesis Defense: “Distance Learning and Attribute Importance Analysis by Linear Regression on Idealized Distance Functions” by Rupesh Kumar Singh

Tuesday, May 16, 2017, 10 a.m. to noon
Campus: Dayton
Location: 304 Russ Engineering
Audience: Current Students, Faculty

Committee: Drs. Guozhu Dong (advisor), Keke Chen, and Michelle Cheatham

ABSTRACT:

A good distance metric is instrumental to the performance of many tasks, including classification and data retrieval. However, designing an optimal distance function is very challenging, especially when the data is high-dimensional. Recently, a number of algorithms have been proposed to learn an optimal distance function in a supervised manner, using data with class labels. In this thesis, we propose methods to learn an optimal distance function that also indicates the importance of attributes.

Specifically, we present several ways to define idealized distance functions: two of them correct distance errors observed in k-nearest-neighbor (KNN) classification, and another uses a distance function defined by two constants. We then use multiple linear regression to fit regression formulas that approximate these idealized distance functions. Experiments indicate that the distances produced by our approaches yield classification accuracy fairly comparable to that of existing methods. Importantly, our methods have the added benefit that the learned attribute weights indicate the importance of each attribute in the constructed distance functions.
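
The regression step can be illustrated with a minimal Python sketch. It assumes a hypothetical two-constant idealized distance (0 for same-class pairs, 1 for different-class pairs), uses per-attribute absolute differences as the regression inputs (a Manhattan-style weighted distance), and fits ordinary least squares with NumPy. The function names, the constants, and the choice of absolute differences are illustrative assumptions, not the thesis's actual definitions. The fitted coefficients play the role of attribute weights.

```python
import numpy as np

def pairwise_abs_diffs(X, y):
    """One row per unordered pair (i, j): per-attribute |x_i - x_j|, plus a same-class flag."""
    rows, same = [], []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            rows.append(np.abs(X[i] - X[j]))
            same.append(y[i] == y[j])
    return np.array(rows), np.array(same)

def fit_two_constant_distance(X, y, c_same=0.0, c_diff=1.0):
    """Regress a hypothetical two-constant idealized distance on attribute differences.

    Returns (weights, intercept); larger weights suggest more important attributes.
    """
    D, same = pairwise_abs_diffs(X, y)
    # Idealized target: c_same for same-class pairs, c_diff otherwise (assumed constants).
    target = np.where(same, c_same, c_diff)
    # Append a column of ones so least squares also fits an intercept.
    A = np.hstack([D, np.ones((len(D), 1))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef[:-1], coef[-1]

def learned_distance(a, b, weights, intercept):
    """Weighted Manhattan-style distance given by the fitted regression formula."""
    return float(np.abs(a - b) @ weights + intercept)

# Toy usage: attribute 0 separates the classes, attribute 1 is noise.
X = np.array([[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [1.1, 0.8]])
y = np.array([0, 0, 1, 1])
weights, intercept = fit_two_constant_distance(X, y)
print("attribute weights:", weights)
```

Because the weights come from least squares, their magnitudes are only comparable as importance scores when the attributes are on similar scales, so standardizing the data first is a reasonable precaution. With an intercept term, the fitted value for two identical points need not be exactly zero.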

Finally, the thesis reports the importance of attributes for a number of datasets from the UCI Machine Learning Repository.

Keywords: distance learning; move bad neighbors out; global class gap; two-constant distance; weighted distance function; Manhattan; Euclidean
