I am generally interested in the development and application of statistical methods to scientific problems, and in the development of general statistical methodologies driven by these applications. I currently have three major research areas, distinct but related. The first is data mining and information theory, the second is molecular biology and evolutionary genetics, and the third is the statistical analysis of metagenomic data.
1. Data mining related topics:
This line of research has several main themes:
1. Exploratory methods for multivariate data, data reduction and model interpretation, in particular prototype methods.
2. Large p, small n problems in both supervised and unsupervised learning. Related to this, I am interested in sufficient dimension reduction and inverse regression.
3. Classification problems with a very large number of classes.
4. Rare target identification in drug discovery.
2. Statistical methods in molecular evolution:
My interests in this direction include:
1. Statistical methods for predicting the structure or function of genes.
2. Statistical methods for detecting adaptive molecular evolution.
3. Inference and diagnostics in phylogeny.
4. Improving stochastic models of protein evolution: developing a general and flexible codon model framework that incorporates structural information about genes, and extending such models to genome analysis.
3. Metagenomic analysis:
I am currently working on the following problems:
1. Modeling the association between host genome and metagenome and their interactions.
2. Modeling the joint influence of host genome, metagenome and environmental variables on disease states.
3. Developing data reduction and supervised learning methods based on NMF (non-negative matrix factorization) to interpret metagenomic data.
4. Developing a set of approaches to calculate microbial beta diversity based on the joint distributions of species.
5. Continuing to develop the supervised version of the hierarchical Bayesian model BioMiCo (a Bayesian model for inference of metabolic divergence among microbial communities).