I graduated from the Department of Computer Science and Engineering, Shanghai University of Electric Power, in 2005. I came to the U.S. in 2006 for graduate study in computer science and received my master's degree in August 2008 from the Department of Computer Science, Wright State University. In 2010, I started a master's program in Applied Statistics in the Department of Mathematics and Statistics, Wright State University. I am interested in work related to statistical data analysis, data mining, SAS programming, and data analysis software development. I am currently looking for full-time or intern positions in these areas. My resume can be found [here].
On this webpage, you can also find other information by clicking the links below:
Research Papers
Statistical & Data Mining Projects
Programming Projects
Courses taken @ WSU
TA work
Title: Analyzing and Tracking Weblog Communities Using Discriminative Collection Representatives, appeared in SBP10. [Paper] [Slides]
Abstract:
Analyzing/tracking
weblogs by given communities (ATWC) is increasingly important for
sociologists and government agencies, etc. This paper introduces an approach
to address the needs of ATWC by using concise discriminative weblog
collection representatives (DCRs), which are constructed from large
collections of blogs by communities of interest. DCRs are aimed at helping
users to quickly identify the major themes/trends in such collections, and to
quickly identify important shifts/differences in major themes and trends of
blogs by given communities over time and space. We propose to use the quality
of DCR-based classifiers to measure DCRs' quality. We present algorithms for
constructing DCRs, report experimental results to evaluate the efficiency of
the algorithms and the quality of the DCRs they construct, and provide
real-data examples to demonstrate the usefulness of DCRs for ATWC.
Title: Object Similarity through Correlated Third-Party Objects, OhioLINK ETD, 2008. [Paper] [Slides] [Video Demo]
Abstract:
Given a
pair of objects, it is of interest to know how they are related to each other
and the strength of their similarity. Many previous studies focused on two
types of similarity measures: The first type is based on closeness of
attribute values of two given objects, and the second type is based on how
often the two objects co-occur in transactions/tuples.
In this thesis we study a new "behavior-based" similarity measure, which
evaluates similarity between two objects by considering how similar their
correlated "third-party" object sets are. Behavior-based similarity can help
us find pairs of objects that have similar external functions but do not have
very similar attribute values or do not co-occur quite often. After introducing and
formalizing behavior-based similarity, we give an algorithm to mine pairs of
similar objects under this measure. We demonstrate the usefulness of our
algorithm and this measure using experiments on several news and medical
datasets.
(1) Loading the dataset; a progress bar appears to indicate the loading progress:
(2) A progress monitor showing the process of mining similar object pairs in real time:
(3) The results are displayed in a Java table; all columns can be sorted and extended:
(4) The final results are also automatically generated as an HTML file:
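The behavior-based similarity idea from the thesis abstract above can be illustrated with a small sketch. This is not the algorithm from the thesis: it assumes that "correlated" simply means "co-occurs in at least one transaction" and it uses Jaccard overlap as a stand-in similarity score. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def correlated_sets(transactions, targets):
    """Map each target object to the "third-party" objects it co-occurs with.

    Assumption: co-occurrence in a transaction stands in for correlation.
    """
    related = defaultdict(set)
    for transaction in transactions:
        items = set(transaction)
        for obj in items & targets:
            # Third-party objects are everything that is not itself a target.
            related[obj] |= items - targets
    return related


def behavior_similarity(related, a, b):
    """Jaccard overlap of the third-party sets of a and b (assumed measure)."""
    union = related[a] | related[b]
    if not union:
        return 0.0
    return len(related[a] & related[b]) / len(union)


def similar_pairs(transactions, targets, threshold=0.5):
    """Return (a, b, score) for target pairs scoring at least `threshold`."""
    related = correlated_sets(transactions, targets)
    pairs = []
    for a, b in combinations(sorted(targets), 2):
        score = behavior_similarity(related, a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])


# Toy data: the two painkillers never co-occur with each other,
# yet they share the same third-party objects.
transactions = [
    ["aspirin", "headache", "fever"],
    ["ibuprofen", "headache", "fever"],
    ["aspirin", "pain"],
    ["ibuprofen", "pain"],
    ["insulin", "diabetes"],
]
targets = {"aspirin", "ibuprofen", "insulin"}
result = similar_pairs(transactions, targets)
```

In the toy data, "aspirin" and "ibuprofen" never appear in the same transaction, so a co-occurrence measure would not relate them, while the third-party sets overlap completely.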
Statistical & Data Mining Projects [back to top]
Projects @ Statistical Consulting Center,
1. Help the institutional research department of WSU identify important factors that affect students' retention and graduation.
2. Help the internal audit department perform statistical analyses of school credit card usage.
Projects @ Qbase
1. Help non-profit organizations identify potential donors, new donors, and influential donors.
2. Help the
Projects @ Data Mining Research Lab,
Title: OLAP-style Entity Correlation Analysis on Events Data, Lexis-Nexis, 2006-2007. [Video Demo]
In this project I designed and
developed tools to perform OLAP-style entity correlation analysis on events
data contained in news reports. The aim of the tools is to extract interesting
correlations among entities.
The source data is metadata extracted from news reports. The metadata contains
a number of attributes such as "company", "organization", "ticker", "person",
"city", "country", etc.
Each specific event contains a number of attributes, and it contains a number
of values for each of those attributes. From each event, each pair of attribute
values, for two (possibly identical) attributes, is considered as a correlation
instance. A user can provide any specific set of events as input to this
program.
The frequent correlations are computed from the given set of events. They are
displayed through a user-friendly user interface. Users can navigate the
display to do drill-down and roll-up of correlations.
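The counting of correlation instances described above can be sketched roughly as follows. This is an illustration only, not the project's code: it assumes each event is a dictionary from attribute name to a list of values, and the names `correlation_instances`, `frequent_correlations`, and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def correlation_instances(event):
    """Return every unordered pair of (attribute, value) entries in one event.

    Pairs may come from two different attributes or from the same attribute.
    """
    entries = sorted(
        {(attr, value) for attr, values in event.items() for value in values}
    )
    return combinations(entries, 2)


def frequent_correlations(events, min_count=2):
    """Count correlation instances over all events and keep the frequent ones."""
    counts = Counter()
    for event in events:
        counts.update(correlation_instances(event))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Toy events standing in for metadata extracted from news reports.
events = [
    {"company": ["Acme", "Globex"], "city": ["Dayton"]},
    {"company": ["Acme"], "city": ["Dayton"]},
    {"company": ["Globex"], "city": ["Columbus"]},
]
frequent = frequent_correlations(events)
```

With the toy events, only the pair ("city", "Dayton") / ("company", "Acme") occurs in two events, so it is the only correlation that survives `min_count=2`.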
At each level of the display, the user interface first provides a list of
attributes in order to give the users a schema description of the data. When a
user clicks any of the attributes, she/he will see the top-K most frequent
entities for the clicked attribute. The default value for K is 100. When the
user clicks any of the displayed entities, the list of attributes appears
again, allowing the user to drill down another time. This process can repeat
many times, allowing the user to drill down through the correlations to a
number of levels. At any time, the path from the root to the current attribute
value is highlighted so that the user can see the history/context of the
correlations associated with the current path.
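A drill-down step like the one described above could look roughly like this. It is a sketch under assumptions, not the tool's implementation: events are dictionaries from attribute name to a list of values, the drill-down path is a list of (attribute, value) steps, and `top_entities` is a hypothetical name. Excluding entities already on the path and breaking ties alphabetically are choices made for this example.

```python
from collections import Counter


def top_entities(events, path, attribute, k=100):
    """Top-k values of `attribute` among events matching every step in `path`.

    `path` is the drill-down history as a list of (attribute, value) pairs;
    an empty path corresponds to the root level.
    """
    counts = Counter()
    for event in events:
        if all(value in event.get(attr, ()) for attr, value in path):
            # Count each entity at most once per event.
            counts.update(set(event.get(attribute, ())))
    # Assumption: entities already on the path are not listed again.
    for attr, value in path:
        if attr == attribute:
            counts.pop(value, None)
    # Most frequent first; ties broken alphabetically for a stable display.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


events = [
    {"company": ["Acme", "Globex"], "city": ["Dayton"]},
    {"company": ["Acme"], "city": ["Dayton"]},
    {"company": ["Globex"], "city": ["Columbus"]},
]
root_level = top_entities(events, [], "company")
cities_for_acme = top_entities(events, [("company", "Acme")], "city")
```

Each click in the interface would append one (attribute, value) step to `path`, and rolling up would remove the last step.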
(1) Root-level display for correlation analysis:
(2) Expanded display for each attribute:
(3) A detailed level of the display:
(4) A more detailed low-level display:

Programming Projects [back to top]
Projects @ TechEdge, Wright Brothers Institute:
1. Open Layer Sensing Test bed project:
Display
(1) Traffic web cameras in Dayton:
(2) Traffic web cameras in Cincinnati, Dayton and Columbus:
(3) Connect to the traffic web camera located on I-75/3rd Street:
(4) Start playing the real-time traffic video:

2. PocketLST project:
Worked as the team leader of the Android phone development group and developed Android phone applications that could send/receive text and image messages to/from the Google map in real time. [Technical Report] [Slides] [Video Demo]
The screenshots below show my part of the Android implementation for the PocketLST project:




Courses Taken @ WSU [back to top]
Computer Science Courses:
--CS516: Survey of Computer Science Numerical Methods
--CS605: Introduction to Data Management System
--CS634: Concurrent Software Design
--CS666: Introduction to Formal Language
--CEG66: Matrix Computation
--CS680: Comparative Languages
--CS701: Database System & Design
--CS702: Advanced Computer Networks
--CEG720: Computer Architecture
--CS740: Natural Language Processing Techniques
--CEG770: Computer Engineering Mathematics
--CS790: Advanced Data Mining
Applied Statistical Courses:
--STT611: Applied Time Series
--STT646: Statistical Methods for Engineers
--STT661: Statistical Theory I
--STT662: Statistical Theory II
--STT666: Statistical Methods I
--STT667: Statistical Methods II
--STT669: Introduction to Experimental Design
--STT740: Categorical Data Analysis
--STT761: Theory of Linear Model
--STT767: Applied Regression Analysis
An unofficial WSU transcript can be found [here].
TA courses and labs:
CS240 (lab): Programming Language I (Java)
CS241 (lab): Programming Language II (Advanced Java)
CS242 (lab): Programming Language III (C++)
STT264 (lab): Elementary Statistics
STT265 (lab): Elementary Statistics II
MTH126: Intermediate Algebra
@ 2008