Center for Integrative Genomics Advanced Research Seminar (CIGars)

The Center for Integrative Genomics (CIG) welcomes our invited speaker, Jingyi Jessica Li, Ph.D., Professor in the Department of Statistics, Department of Human Genetics, and Department of Biomathematics at the University of California, Los Angeles, for a special Advanced Research Seminar:

ClusterDE: a Post-clustering Differential Expression (DE) Method Robust to False-positive Inflation Caused by Double Dipping

Jingyi Jessica Li, Ph.D.
Professor, Department of Statistics 
Department of Human Genetics
Department of Biomathematics 
University of California, Los Angeles

In typical single-cell RNA-seq (scRNA-seq) data analysis, a clustering algorithm is applied to find putative cell types as clusters, and then a statistical differential expression (DE) test is employed to identify the DE genes between the cell clusters. However, this common procedure uses the same data twice, an issue known as “double dipping”: the data first define the cell clusters as potential cell types and then identify the DE genes as potential cell-type marker genes, which leads to false-positive marker genes even when the cell clusters are spurious. To overcome this challenge, we propose ClusterDE, a post-clustering DE method that controls the false discovery rate (FDR) of identified DE genes regardless of clustering quality and can work as an add-on to popular pipelines such as Seurat. The core idea of ClusterDE is to generate, from the real data, synthetic null data containing only one cluster; this null serves as a contrast to the real data when evaluating the whole procedure of clustering followed by a DE test. Using comprehensive simulations and real-data analysis, we show that ClusterDE not only provides solid FDR control but also identifies cell-type marker genes as top DE genes and distinguishes them from housekeeping genes. ClusterDE is fast, transparent, and adaptable to various clustering algorithms and DE tests. Beyond scRNA-seq data, ClusterDE applies generally to post-clustering DE analysis, including the analysis of single-cell multi-omics data.
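The double-dipping problem described in the abstract can be illustrated with a small toy simulation. The sketch below is not ClusterDE and does not reproduce the speaker's method; it only shows, under simplifying assumptions, why testing for DE genes between clusters found on the same data inflates false positives. All names (`simulate_null`, `two_means`, `count_de_genes`) are hypothetical, the "expression" values are plain Gaussian noise from a single population, the clustering is a bare-bones 2-means, and the test is a per-gene Welch statistic with a normal approximation.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def simulate_null(n_cells, n_genes, rng):
    """Draw every cell from ONE population, so no gene is truly DE."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]


def two_means(cells, rng, n_iter=25):
    """Plain Lloyd's algorithm with k = 2; returns a 0/1 label per cell."""
    centers = rng.sample(cells, 2)
    labels = [0] * len(cells)
    for _ in range(n_iter):
        # Assign each cell to its nearest center.
        labels = [
            0 if math.dist(c, centers[0]) <= math.dist(c, centers[1]) else 1
            for c in cells
        ]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = [fmean(col) for col in zip(*members)]
    return labels


def count_de_genes(cells, labels, alpha=0.05):
    """Count genes with p < alpha in a per-gene Welch test (normal approximation)."""
    hits = 0
    for gene in zip(*cells):
        a = [x for x, lab in zip(gene, labels) if lab == 0]
        b = [x for x, lab in zip(gene, labels) if lab == 1]
        if len(a) < 2 or len(b) < 2:
            continue  # cannot estimate a variance from fewer than two cells
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        z = (fmean(a) - fmean(b)) / se
        p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        if p < alpha:
            hits += 1
    return hits


rng = random.Random(0)
n_genes = 20
cells = simulate_null(n_cells=600, n_genes=n_genes, rng=rng)

cluster_labels = two_means(cells, rng)             # labels learned from the data
random_labels = [rng.randrange(2) for _ in cells]  # labels independent of the data

dipped_hits = count_de_genes(cells, cluster_labels)
random_hits = count_de_genes(cells, random_labels)
print(f"DE calls after clustering the same data: {dipped_hits} / {n_genes}")
print(f"DE calls with a data-independent split:  {random_hits} / {n_genes}")
```

Because every gene comes from a single population, any gene called DE here is a false positive. The data-independent split should flag roughly 5% of genes at alpha = 0.05, while the clustering-based split typically flags far more, since the clusters were chosen to separate the cells along exactly the directions in which the genes differ by chance.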

Jingyi Jessica Li, Professor of Statistics and Data Science (also affiliated with Biostatistics, Computational Medicine, and Human Genetics), leads a research group named the Junction of Statistics and Biology at UCLA. She holds a Ph.D. from UC Berkeley and a B.S. from Tsinghua University, and she focuses on developing interpretable statistical methods for biomedical data. Her research covers quantifying the central dogma, extracting hidden information from transcriptomics data, and ensuring statistical rigor in data analysis by employing synthetic negative controls. She has received multiple awards, including the NSF CAREER Award, a Sloan Research Fellowship, the ISCB Overton Prize, and the COPSS Emerging Leaders Award, in recognition of her contributions to computational biology and statistics.


Enjoy free lunch and hear about a wide range of work happening in the local bioscience community!