Our recent work has shown that combining current RNA-Seq statistical algorithms through empirical performance weighting (the PANDORA algorithm) improves the overall detection of differentially expressed genes by reducing false hits (both false positives and false negatives) while maintaining true positives. Evaluated on simulated and real datasets, PANDORA achieved significant improvements in both precision and sensitivity.
Therefore, there is strong evidence that proper algorithm combination yields more accurate results for RNA-Seq data. However, to date, no published study has applied a principle similar to PANDORA's to i) the analysis of alternative splicing using RNA-Seq data, ii) the analysis of protein-DNA interactions using ChIP-Seq data, or iii) the detection of DNA variants using whole-exome (WES) and whole-genome (WGS) sequencing data. In each case, a wealth of methodologies currently exists that can be combined towards more accurate results. In addition, such combined approaches can be used to construct unified public data resources that integrate several layers of genomic knowledge with public data analyzed in a homogenized manner. Thus, our research focuses on:
1. Systematic combination and evaluation of current algorithms for the detection of alternative splicing from RNA-Seq data
2. Systematic combination and evaluation of current algorithms for the detection of genome-wide protein-DNA interactions from ChIP-Seq data
3. Systematic combination and evaluation of current algorithms for the detection of DNA variations from WES and WGS data
4. Deployment of our methods to construct integrated data resources of focused genomic content (e.g. a genomic knowledge base dedicated to human colon cancer)
5. Deployment of our methods to study transcription factor networks in cancer development, taking into account factors that are poorly covered in the literature, such as the large heterogeneity among cancerous biological samples
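
The performance-weighted combination principle described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each algorithm reports one p-value per gene, that each algorithm has a single empirical performance score (for example, an area under the ROC curve measured on simulated data), and that the combined p-value is a weighted sum with weights normalized to 1. The function names, method names, and numbers are hypothetical, and this is not the PANDORA implementation itself.

```python
from typing import Dict, List


def performance_weights(score_by_method: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-method performance scores so the weights sum to 1."""
    total = sum(score_by_method.values())
    if total <= 0:
        raise ValueError("performance scores must sum to a positive value")
    return {method: score / total for method, score in score_by_method.items()}


def combine_p_values(
    p_by_method: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return one combined p-value per gene as a weighted sum across methods."""
    methods = list(p_by_method)
    n_genes = len(p_by_method[methods[0]])
    if any(len(p_by_method[m]) != n_genes for m in methods):
        raise ValueError("all methods must report the same number of genes")
    return [
        sum(weights[m] * p_by_method[m][gene] for m in methods)
        for gene in range(n_genes)
    ]


if __name__ == "__main__":
    # Hypothetical performance scores for three unnamed algorithms.
    scores = {"method_a": 0.9, "method_b": 0.6, "method_c": 0.5}
    # Hypothetical p-values for two genes, in the same gene order per method.
    p_values = {
        "method_a": [0.01, 0.50],
        "method_b": [0.02, 0.40],
        "method_c": [0.04, 0.80],
    }
    weights = performance_weights(scores)
    for gene, p in enumerate(combine_p_values(p_values, weights)):
        print(f"gene {gene}: combined p = {p:.4f}")
```

With weights of this kind, an algorithm that performs better on the benchmark contributes more to the combined score. The same structure could in principle be reused for splicing, ChIP-Seq peak, or variant callers, provided their outputs are first mapped to a comparable per-feature score.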