Master's Thesis Defense "Conditional Correlation Analysis" by Sanjeev Bhatta

Wednesday, May 3, 2017, 10 am to Noon
Campus: 
Dayton
304 Russ
Audience: 
Current Students
Faculty

Committee: Drs. Guozhu Dong (Advisor), Keke Chen, and Derek Doran

ABSTRACT:
Correlation analysis is a frequently used statistical technique for examining the relationships among variables in practical applications. However, traditional correlation analysis is overly simplistic: it measures how two variables are related by examining their relationship only over the entire underlying data space. As a result, it may miss a strong correlation between those variables, especially when that relationship exists only in a small subpopulation of the larger data space. In the era of Big Data, where data within a single application are often highly diverse, this is no longer acceptable, because a fair share of the information may be lost.

To remedy this situation, we introduce in this thesis a new approach called Conditional Correlation Analysis (CCR). Instead of computing the correlation among variables over the entire data space, this approach first divides the data space into multiple subpopulations using patterns. It then computes the correlation within each subpopulation and identifies the subpopulations whose correlation strength differs markedly from that of the global population.
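
The idea of comparing per-subpopulation correlations with the global correlation can be illustrated with a minimal Python sketch. It is not the thesis's mining algorithm. It makes several assumptions of its own: correlation is taken to be the Pearson coefficient, a "pattern" is modeled as a simple predicate on a record, and unusualness is taken to be the absolute difference between the subpopulation and global coefficients. The function names, the `min_size` parameter, and the toy records are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None when undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy)


def conditional_correlations(rows, x, y, patterns, min_size=3):
    """Rank subpopulations by how far their correlation is from the global one.

    `patterns` maps a label to a predicate on a row (an assumed, simplified
    stand-in for the patterns mentioned in the abstract).
    """
    global_r = pearson([r[x] for r in rows], [r[y] for r in rows])
    if global_r is None:
        raise ValueError("global correlation is undefined for this data")
    ranked = []
    for label, matches in patterns.items():
        subset = [r for r in rows if matches(r)]
        if len(subset) < min_size:
            continue  # too few records for a meaningful coefficient
        sub_r = pearson([r[x] for r in subset], [r[y] for r in subset])
        if sub_r is None:
            continue
        # Assumed unusualness score: absolute gap to the global coefficient.
        ranked.append((label, len(subset), sub_r, abs(sub_r - global_r)))
    ranked.sort(key=lambda item: item[3], reverse=True)
    return global_r, ranked


# Hypothetical toy records: two groups with different x-y relationships.
rows = (
    [{"group": "treated", "x": x, "y": y} for x, y in [(1, 1), (2, 2), (3, 3), (4, 4)]]
    + [{"group": "control", "x": x, "y": y} for x, y in [(1, 2), (2, 1), (3, 2), (4, 1)]]
)
patterns = {
    "treated": lambda r: r["group"] == "treated",
    "control": lambda r: r["group"] == "control",
}

global_r, ranked = conditional_correlations(rows, "x", "y", patterns)
print(f"global r = {global_r:.3f}")
for label, size, sub_r, gap in ranked:
    print(f"{label}: n={size}, r={sub_r:.3f}, gap={gap:.3f}")
```

In this toy data the global coefficient is moderate, while the two subpopulations show a perfect positive and a negative correlation respectively, which is the kind of contrast that a single global coefficient hides.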

Moreover, we introduce the concepts of CCRs and ways to mine them, provide measures for evaluating the unusualness of CCRs, and present experiments that evaluate and illustrate the CCR approach in financial and medical applications.

For information, contact