Dr. Michael L. Raymer

Department of Computer Science and Engineering,
Wright State University

338 Russ Engineering Center
(937) 775-5110
mraymer@cs.wright.edu 

Research

I am interested in the development of novel computational techniques to investigate problems in biochemistry.  Bioinformatics, proteomics, genomics, and computational biology are a few of the monikers that have recently been associated with this research.  Some of the specific biological problems that I investigate include protein-water interactions and ligand binding, protein conformation and the activity of molecular chaperones, and protein binding site identification.  To explore these problems, I use methods from evolutionary computation (genetic algorithms, evolutionary programming, genetic programming, and others) and from pattern recognition (various statistical classification techniques including nearest neighbor and Bayesian classification).

My primary research aim is to devise and implement novel algorithms for analyzing and understanding large biological data sets.  More detailed information is available on my publications page.

Pattern Recognition and Evolutionary Computation

Computational pattern recognition generally requires that objects be described in terms of a set of measurable features.  The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification.  Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy.  By employing evolutionary computation (specifically, evolutionary programming and genetic algorithms) for feature selection and extraction, I am attempting to develop classification techniques that can reduce the number of features considered while maintaining classification accuracy.
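
The idea of evolutionary feature selection can be illustrated with a minimal sketch.  Here a genetic algorithm evolves a bit mask over the features, and each mask is scored by the leave-one-out accuracy of a 1-nearest-neighbor classifier minus a small penalty per selected feature.  This is an illustration only, not the implementation behind the work described here: the function names, the toy data, the penalty value, and the GA settings (population size, mutation rate, elitism, seeding with the full feature set) are all assumptions.

```python
import random


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier using only the masked features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # no features selected: nothing to classify with
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out of the training set
            d = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a (hypothetical) cost for every feature that must be measured."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)


def select_features(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve a 0/1 feature mask with truncation selection, crossover, and mutation."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # assumption: include the full feature set as a baseline
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = scored[: pop_size // 2]
        children = [scored[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(X, y, m))


def toy_data():
    """Hypothetical data: feature 0 separates the classes, features 1-4 do not."""
    X, y = [], []
    for pair in range(10):
        # Both members of a pair share identical values for features 1-4
        # but carry opposite labels, so those features are uninformative.
        noise = [((pair * m) % 11) / 11 for m in (1, 2, 3, 4)]
        for label in (0, 1):
            X.append([10 * label + 0.05 * pair] + noise)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    best = select_features(X, y)
    print("selected mask:", best, "accuracy:", loo_accuracy(X, y, best))
```

Because the mask is an explicit list of retained original features, the result can be read directly: a 1 in position j means that feature j contributed to classification.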

A primary goal in the design of these classifiers is the interpretability of the feature selection results.  Many linear feature extraction techniques use features that are linear combinations of many or all of the original features.  For these methods, it can be difficult to determine which features from the original feature set play a significant role in distinguishing among pattern classes.  For biological problems, gaining an understanding of the features that take part in classification can often provide more interesting scientific results than the actual classifications.


Understanding Protein Binding Interactions

Using the methods described above, I have performed several investigations into the determinants of protein-water binding.  The Consolv algorithm is a k-nearest-neighbor classifier hybridized with a genetic algorithm for feature selection and extraction.  This algorithm can predict conserved water binding between independently solved crystallographic protein structures with ~67% accuracy.  Analysis of the features employed in this classification has provided some insight into the physical and chemical determinants of protein solvent binding and solvation site conservation between ligand-bound and unbound structures.  Hybrid classification techniques have proven useful for identification of other types of protein binding sites as well, including metal binding sites in metalloenzymes and protein active sites.
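
To make the hybrid idea concrete, the sketch below shows a k-nearest-neighbor vote in which each feature is scaled by a weight; in a hybrid classifier, a genetic algorithm would search for the weight vector, and a weight of zero removes a feature entirely.  This is not the Consolv code: the function name, the two unnamed features, the class labels, and the data values are hypothetical and chosen only to show how the weights change the prediction.

```python
from collections import Counter


def weighted_knn_predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k training points closest to `query`
    under a feature-weighted squared Euclidean distance."""
    dists = []
    for x, label in zip(train_X, train_y):
        d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 tracks the class, feature 1 is misleading.
train_X = [(0.0, 5.0), (0.1, 0.0), (0.2, 5.1), (1.0, 0.1), (1.1, 5.2), (0.9, 0.2)]
train_y = ["conserved", "conserved", "conserved",
           "displaced", "displaced", "displaced"]
query = (0.05, 0.15)

# With feature 1 switched off, the three nearest neighbors are all "conserved".
print(weighted_knn_predict(train_X, train_y, query, (1.0, 0.0)))  # conserved
# With equal weights, feature 1 pulls in two "displaced" neighbors.
print(weighted_knn_predict(train_X, train_y, query, (1.0, 1.0)))  # displaced
```

Inspecting the weights after optimization indicates which features the classifier relied on, which is the kind of analysis that can point to the determinants of binding.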


Data Mining of Three-Dimensional Protein Structure Information

Another area of current research interest is the use of hybrid evolutionary computation/classifier techniques, as well as graph theory and exploratory statistical methods, to analyze and understand large databases of three-dimensional protein structure information such as the RCSB Protein Data Bank and the smaller PDB-select database.  Possible areas of exploration include domain identification, secondary structure prediction, and analysis of structure-function relationships.