I am broadly interested in developing and applying statistical methods to scientific problems, and in developing general statistical methodology motivated by these applications. I currently work in three distinct but related research areas: data mining and information theory, molecular biology and evolutionary genetics, and the statistical analysis of metagenomic data.
1. Data mining related topics:
This line of research has several main themes:
1. Exploratory methods for multivariate data, data reduction, and model interpretation.
2. Large-p, small-n problems in both supervised and unsupervised learning.
3. Classification problems with a very large number of classes and a large proportion of missing values.
2. Statistical methods in molecular evolution:
My interests in this direction include:
1. Statistical methods for predicting the structure or function of genes.
2. Statistical methods for detecting adaptive molecular evolution.
3. Inference and diagnostics in phylogeny.
4. Improving stochastic models of protein evolution: developing a general and flexible codon-model framework that incorporates structural information about genes, and extending such models to genome analysis.
3. Metagenomic analysis:
I am currently working on the following problems:
1. Modeling the association between the host genome and the metagenome, and their interactions.
2. Modeling the joint influence of the host genome, the metagenome, and environmental variables on disease states.
3. Developing exploratory data analysis methods for metagenomic data.
4. Bayesian inference of metabolic divergence among microbial communities.