Dr. Michael L. Raymer
Department of Computer Science and Engineering, 338 Russ Engineering Center
Research
I am interested in the development of novel computational techniques to investigate problems in biochemistry. Bioinformatics, proteomics, genomics, and computational biology are a few of the monikers that have recently been associated with this research. Some of the specific biological problems that I investigate include protein-water interactions and ligand binding, protein conformation and the activity of molecular chaperones, and protein binding site identification. To explore these problems, I use methods from evolutionary computation (genetic algorithms, evolutionary programming, genetic programming, and others) and from pattern recognition (various statistical classification techniques including nearest neighbor and Bayesian classification). My primary research aim is to devise and implement novel algorithms for analyzing and understanding large biological data sets. More detailed information is available on my publications page.
Computational pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. By employing evolutionary computation (specifically, evolutionary programming and genetic algorithms) for feature selection and extraction, I am attempting to develop classification techniques that can reduce the number of features considered while maintaining classification accuracy.
A primary goal in the design of these classifiers is the interpretability of the feature selection results. Many linear feature extraction techniques use features that are linear combinations of many or all of the original features. For these methods, it can be difficult to determine which features from the original feature set play a significant role in distinguishing among pattern classes. For biological problems, gaining an understanding of the features that take part in classification can often provide more interesting scientific results than the actual classifications.
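As an illustration only, and not the specific algorithm used in this research, the general idea of pairing a genetic algorithm with a nearest neighbor classifier for feature selection can be sketched as follows. Each candidate solution is a binary mask over the original features, so the result stays directly interpretable: a 1 means the feature is kept, a 0 means it is dropped. The function names, the toy data, the per-feature penalty, and the GA settings (truncation selection, one-point crossover, bit-flip mutation) are all assumptions chosen for brevity.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier that
    uses only the features j where mask[j] == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # no features selected: treat as useless
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # hold out the sample being classified
            d = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    # Reward accuracy and lightly penalize every feature that is kept
    # (the penalty weight is an arbitrary illustrative value).
    return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)


def select_features(X, y, pop_size=20, generations=20,
                    mutation_rate=0.1, seed=0):
    """Evolve a binary feature mask; assumes at least two features."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [parents[0][:]]             # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]         # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(X, y, m))


def make_toy_data(n_per_class=20, n_noise=4, seed=1):
    """Synthetic two-class data: feature 0 separates the classes,
    the remaining features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.gauss(4.0 * label, 0.5)
            noise = [rng.gauss(0.0, 2.0) for _ in range(n_noise)]
            X.append([informative] + noise)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask = select_features(X, y)
    print("selected mask:", mask)
    print("leave-one-out accuracy:", loo_1nn_accuracy(X, y, mask))
```

Because the evolved solution is a mask rather than a set of linear combinations, the features that drive the classification can be read off directly. A weighted variant, in which each feature receives a real-valued weight instead of a 0/1 bit, would trade some of that interpretability for flexibility.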
Another area of current research interest is the use of hybrid EC/classifier techniques, as well as graph theory and exploratory statistical methods, to analyze and understand large databases of three-dimensional protein structure information, such as the RCSB Protein Data Bank and the smaller PDB-select database. Possible areas of exploration include domain identification, secondary structure prediction, and analysis of structure-function relationships.