Publications on PubMed and Google Scholar
Current and former members of the JSB Group are in bold
author* indicates equal contribution
author indicates corresponding author(s)
Statistical rigor in omics data analysis
98. Liu, P. and Li, J.J. (2025). mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis. Nature Communications 16:1802. (Featured in Nature Communications Editors’ Highlights) [ RECOMB 2025 ] [ SOFTWARE ]
75. Xia, L.*, Lee, C.*, and Li, J.J. (2024). Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nature Communications 15:1753. (Featured in Nature Communications Editors’ Highlights) [ Nature Methods: Seeing data as t-SNE and UMAP do ] [ SOFTWARE ]
64. Zhou, H.J., Li, L., Li, Y., Li, W., and Li, J.J. (2022). PCA outperforms popular hidden variable inference methods for QTL mapping. Genome Biology 23:210. [ Highlight talk at RECOMB 2023 ] [ SOFTWARE ] | [ PDF ]
59. Li, Y.*, Ge, X.*, Peng, F., Li, W., and Li, J.J. (2022). Exaggerated false positives by popular differential expression methods when analyzing human population samples. Genome Biology 23:79. [ UCLA NEWS ] [ CODE ] | [ PDF ]
38. Li, J.J. and Tong, X. (2020). Statistical hypothesis testing versus machine-learning binary classification: distinctions and guidelines. Patterns 1(7):100115. [ UCLA NEWS ] [ PODCAST ]
Spatial transcriptomics
85. Yan, G., Hua, S.H., and Li, J.J. (2025). Categorization of 34 computational methods to detect spatially variable genes from spatially resolved transcriptomics data. Nature Communications 16:1141.
Association measures
77. Li, J.J., Zhou, H.J., Tong, X., and Bickel, P.J. (2024). Dissecting gene expression heterogeneity: generalized Pearson correlation squares and the K-lines clustering algorithm. Journal of the American Statistical Association 119(548):2450-2463. [ SOFTWARE ] | [ PDF ]
Single-cell RNA-seq
73. Song, D., Wang, Q., Yan, G., Liu, T., and Li, J.J. (2024). scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nature Biotechnology 42:247-252. [ SOFTWARE ] [ PDF ]
71. Yan, G., Song, D., and Li, J.J. (2023). scReadSim: a single-cell RNA-seq and ATAC-seq read simulator. Nature Communications 14:7482. [ SOFTWARE ] [ PDF ]
58. Jiang, R., Sun, T., Song, D., and Li, J.J. (2022). Statistics or biology: the zero-inflation controversy about scRNA-seq data. Genome Biology 23:31. [ CODE ] [ PDF ]
53. Song, D.*, Li, K.*, Hemminger, Z., Wollman, R., and Li, J.J. (2021). scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics 37(Supplement_1):i358-i366. [ ISMB/ECCB 2021 ] [ SOFTWARE ]
50. Sun, T., Song, D., Li, W.V., and Li, J.J. (2021). scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. Genome Biology 22:163. [ RECOMB 2021 ] [ UCLA NEWS ] [ SOFTWARE ] [ CODE ] [ PDF ]
46. Song, D. and Li, J.J. (2021). PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data. Genome Biology 22:124. [ UCLA NEWS ] [ SOFTWARE ] [ CODE ]
45. Xi, N.M. and Li, J.J. (2021). Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Systems 12(2):176-194. [ CODE ] [ DATA ] [ SSRN’s Top Downloaded Paper of Apr 9 – Jun 7, 2021 in Computational Biology eJournal ]
34. Li, W.V. and Li, J.J. (2019). A statistical simulator scDesign for rational scRNA-seq experimental design. Bioinformatics 35(14):i41-i50. [ ISMB/ECCB 2019 ] [ SOFTWARE ]
26. Li, W.V. and Li, J.J. (2018). An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nature Communications 9:997. [ UCLA NEWS ] [ SOFTWARE ]
Bulk RNA-seq isoform discovery and quantification
29. Li, W.V. and Li, J.J. (2018). Modeling and analysis of RNA-seq data: a review from a statistical perspective. Quantitative Biology 6(3):195-209.
27. Li, W.V.*, Zhao, A., Zhang, S., and Li, J.J.* (2018). MSIQ: joint modeling of multiple RNA-seq samples for accurate isoform quantification. Annals of Applied Statistics 12(1):510-539. [ SOFTWARE ] [ COLOR PDF ]
15. Ye, Y. and Li, J.J. (2016). NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data. BMC Genomics 17(Supp 1):11. [ SOFTWARE ]
2. Li, J.J., Jiang, C.-R., Brown, J.B., Huang, H., and Bickel, P.J. (2011). Sparse linear modeling of RNA-seq data for isoform discovery and abundance estimation. Proc. Natl. Acad. Sci. USA 108(50):19867-19872. [ SOFTWARE ]
Central dogma and translational control
35. Li, J.J., Chew, G.-L., and Biggin, M.D. (2019). Quantitative principles of cis-translational control by general mRNA sequence features in eukaryotes. Genome Biology 20:162. [ CODE ]
22. Li, J.J., Chew, G.-L., and Biggin, M.D. (2017). Quantitating translational control: mRNA abundance-dependent and independent contributions and the mRNA sequences that specify them. Nucleic Acids Research 45(20):11821-11836. [ Highlight talk at RECOMB 2018 ]
11. Li, J.J. and Biggin, M.D. (2015). Statistics requantitates the central dogma. Science 347(6226):1066-1067. [ UCLA NEWS ] [ Interview at Significance 12(3):8 ]
7. Li, J.J., Bickel, P.J., and Biggin, M.D. (2014). System wide analyses have underestimated protein abundances and transcriptional importance in animals. PeerJ 2:e270. [ Press release ] [ Guest post on “Bits of DNA” blog ] [ “PeerJ Picks 2015” Collection ] [ “Top Bioinformatics Papers – June 2015” Collection ] [ Top 5 most cited PeerJ articles ]
Classification methodologies and applications
65. Zhang, C., Chen, Y.E., Zhang, S., and Li, J.J. (2022). Information-theoretic classification accuracy: a criterion that guides data-driven combination of ambiguous outcome labels in multi-class classification. Journal of Machine Learning Research 23(341):1-65. [ RECOMB 2023 ] [ SOFTWARE ] [ PDF ]
49. Li, J.J., Chen, Y.E., and Tong, X. (2021). A flexible model-free prediction-based framework for feature ranking. Journal of Machine Learning Research 22(124):1-54. [ SOFTWARE ]
40. Lyu, J.*, Li, J.J.*, Su, J., Peng, F., Chen, Y.E., Ge, X., and Li, W. (2020). DORGE: Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features. Science Advances 6(46):eaba6784. [ VIDEO ]
25. Tong, X.*, Feng, Y.*, and Li, J.J. (2018). Neyman-Pearson classification algorithms and NP receiver operating characteristics. Science Advances 4(2):eaao1659. [ SOFTWARE ] [ VIDEO ] [ Francis X. Diebold’s Blog on NP Classification ]
Microbiome sequencing data imputation
52. Jiang, R., Li, W.V., and Li, J.J. (2021). mbImpute: an accurate and robust imputation method for microbiome data. Genome Biology 22:192. [ UCLA NEWS ] [ SOFTWARE ] | [ PDF ]
Networks
48. Sun, Y.E., Zhou, H.J., and Li, J.J. (2021). Bipartite tight spectral clustering (BiTSC) algorithm for identifying conserved gene co-clusters in two species. Bioinformatics 37(9):1225-1233. [ SOFTWARE ]
42. Wang, Y.X.R., Li, L., Li, J.J., and Huang, H. (2021). Network modeling in biology: statistical methods for gene and brain networks. Statistical Science 36(1):89-108.
32. Razaee, Z.S., Amini, A.A., and Li, J.J. (2019). Matched bipartite block model with covariates. Journal of Machine Learning Research 20(34):1-44.
High-dimensional model inference
37. Liu, H., Xu, X., and Li, J.J. (2020). A bootstrap lasso + partial ridge method to construct confidence intervals for parameters in high-dimensional sparse linear models. Statistica Sinica 30:1333-1355. [ SOFTWARE ]
Comparative genomics
33. Ge, X.*, Zhang, H.*, Xie, L., Li, W.V., Kwon, S.B., and Li, J.J. (2019). EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences. Nucleic Acids Research 47(13):e77. [ SOFTWARE ] [ WEBSITE ]
30. Duong, D., Ahmad, W.U., Eskin, E., Chang, K.-W., and Li, J.J. (2019). Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions. Journal of Computational Biology 26(1):38-52. [ SOFTWARE ]
19. Li, W.V., Chen, Y., and Li, J.J. (2017). TROM: a testing-based method for finding transcriptomic similarity of biological samples. Statistics in Biosciences 9(1):105-136. [ SOFTWARE ]
18. Gao, R. and Li, J.J. (2017). Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons. BMC Genomics 18:234.
17. Yang, Y.*, Yang, Y.T.*, Yuan, J., Lu, Z.J., and Li, J.J. (2017). Large-scale mapping of mammalian transcriptomes identifies conserved genes associated with different cell states. Nucleic Acids Research 45(4):1657-1672. [ DATA ]
14. Li, W.V., Razaee, Z.S., and Li, J.J. (2016). Epigenome overlap measure (EPOM) for comparing tissue/cell types based on chromatin states. BMC Genomics 17(Supp 1):10. [ SOFTWARE ]
10. Gerstein, M.B.*, Rozowsky, J.*, Yan, K.K.*, Wang, D.*, Cheng, C.*, Brown, J.B.*, Davis, C.A.*, Hillier, L.*, Sisu, C.*, Li, J.J.*, Pei, B.*, Harmanci, A.O.*, Duff, M.O.*, Djebali, S.*, and 82 other authors from the modENCODE consortium (2014). Comparative analysis of the transcriptome across distant species. Nature 512(7515):445-448. [ NIH NEWS ]
8. Li, J.J., Huang, H., Bickel, P.J., and Brenner, S.E. (2014). Comparison of D. melanogaster and C. elegans developmental stages, tissues, and cells by modENCODE RNA-seq data. Genome Research 24(7):1086-1101. [ Press release ] [ Top 10 papers selected at the 2014 RECOMB/ISCB Conference on Regulatory & Systems Genomics ]
Gene regulation
1. MacArthur, S.*, Li, X.Y.*, Li, J.*, Brown, J.B., Chu, H.C., Zeng, L., Grondona, B.P., Hechmer, A., Simirenko, L., Keranen, S.V., Knowles, D.W., Stapleton, M., Bickel, P., Biggin, M.D., and Eisen, M.B. (2009). Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biology 10:R80. [ Faculty of 1000 recommendation ]