Haotian Cui, Ph.D.

I am a researcher specializing in machine learning, genomics, and drug discovery. My work focuses on developing large-scale self-supervised models to enable biological insights and therapeutic discoveries. I am passionate about building foundation models for single-cell omics and molecular biology, integrating generative AI with experimental pipelines to accelerate biomedical breakthroughs.

I completed my Ph.D. in Computer Science at the University of Toronto (2019-2024) under Prof. Bo Wang, where I pioneered scGPT, one of the first generative foundation models for single-cell multi-omics. I also led the development of LUMI-lab, an AI-driven autonomous lab for mRNA therapeutics.

Research Interests: Foundation Models, Machine Learning, Genomics, Drug Discovery

selected publications

  1. Nature Methods
    scGPT: toward building a foundation model for single-cell multi-omics using generative AI
    H. Cui*, C. Wang*, H. Maan, K. Pang, F. Luo, N. Duan, and B. Wang
    Nature Methods, 2024. Pioneering single-cell foundation model with generative pretraining.
  2. Towards Multimodal Foundation Models in Molecular Cell Biology
    H. Cui, A. Tejada-Lapuerta, M. Brbić, J. Saez-Rodriguez, S. Cristea, H. Goodarzi, M. Lotfollahi, F.J. Theis, and B. Wang
    Nature (forthcoming), 2025. Perspective on multimodal foundation models across the central dogma.
  3. bioRxiv
    LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling Discovery of New Ionizable Lipid Designs for mRNA Delivery
    H. Cui*, Y. Xu*, K. Pang, G. Li, F. Gong, B. Wang, and B. Li
    bioRxiv, 2025. Under review (Cell). Fully self-driving lab for active-learning experiments on lipid design.
  4. bioRxiv
    scGPT-spatial: Continual Pretraining of Single-Cell Foundation Model for Spatial Transcriptomics
    C.X. Wang*, H. Cui*, A.H. Zhang, R. Xie, H. Goodarzi, and B. Wang
    bioRxiv, 2025
  5. bioRxiv
    MethylGPT: a foundation model for the DNA methylome
    K. Ying*, J. Song*, H. Cui*, Y. Zhang, S. Li, X. Chen, H. Liu, A. Eames, D.L. McCartney, R.E. Marioni, and J.R. Poganik
    bioRxiv, 2024
  6. Nature Communications
    AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery
    Y. Xu*, S. Ma*, H. Cui*, J. Chen, S. Xu, F. Gong, A. Golubovic, M. Zhou, K.C. Wang, A. Varley, and R.X.Z. Lu
    Nature Communications, 2024
  7. Genome Biology
    DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics
    H. Cui*, H. Maan*, M.C. Vladoiu, J. Zhang, M.D. Taylor, and B. Wang
    Genome Biology, 2024
  8. EMNLP 2022
    CodeExp: Explanatory Code Document Generation
    H. Cui, C. Wang, J. Huang, J.P. Inala, T. Mytkowicz, B. Wang, J. Gao, and N. Duan
    In Findings of the Association for Computational Linguistics: EMNLP 2022, Dec 2022
  9. bioRxiv
    scFormer: a universal representation learning approach for single-cell data using transformers
    H. Cui*, C. Wang*, H. Maan, N. Duan, and B. Wang
    bioRxiv, Dec 2022
  10. ICML 2022
    A Deep Learning Framework for Estimating Cell-specific Kinetic Rates of RNA Velocity
    H. Cui*, H. Maan*, M.D. Taylor, and B. Wang
    In The 2022 ICML Workshop on Computational Biology, Dec 2022
  11. Nature Biomedical Engineering
    Stretchable ultrasonic arrays for the three-dimensional mapping of the modulus of deep tissue
    H. Hu, Y. Ma, X. Gao, D. Song, M. Li, H. Huang, X. Qian, R. Wu, K. Shi, H. Ding, M. Lin, X. Chen, W. Zhao, B. Qi, S. Zhou, R. Chen, Y. Gu, Y. Chen, Y. Lei, C. Wang, C. Wang, Y. Tong, H. Cui, A. Abdal, Y. Zhu, X. Tian, Z. Chen, C. Lu, X. Yang, J. Mu, Z. Lou, M. Eghtedari, Q. Zhou, A. Oberai, and S. Xu
    Nature Biomedical Engineering, Dec 2023
  12. DaSH 2022
    Execution-based Evaluation for Data Science Code Generation Models
    J. Huang, C. Wang, J. Zhang, C. Yan, H. Cui, J.P. Inala, C. Clement, N. Duan, and J. Gao
    In DaSH 2022, Dec 2022