Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript. Nature Communications , Article number: (2026) Cite this article We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply. Self-amplifying mRNA (saRNA) vectors hold promise for the sustained expression of mRNA vaccines in vivo. However, their inherently high immunogenicity and low-fidelity replication—stemming from the RNA viral genome's replication mechanisms—limit their efficacy as replacements or adjuncts to protein therapies. Here we report an engineered viral protein genome-linked (VPg) saRNA vector derived from a Norovirus replicon, designed for rapid loading of therapeutic protein mRNAs in vitro. The engineered VPg saRNA is adapted for a range of therapeutic scenarios, including treatment of tumor-associated cachexia under conditions of translational restriction in cap-dependent metabolism, precise encoding of oncolytic mRNAs in vivo to achieve complex functionality, and therapy for graft-versus-host disease in highly auto-immune environments. VPg saRNA addresses key limitations of linear mRNA and conventional saRNA therapies, broadening the potential applications of mRNA-based treatments. Source data for Figs. 1–7 and Supplementary Figs. 1–7 are provided as Source Data files. LC–MS/MS raw datasets and Sanger sequencing chromatograms generated in this study have been deposited in Figshare and are available at https://doi.org/10.6084/m9.figshare.3078216546 and https://doi.org/10.6084/m9.figshare.3069415146. Raw fluorescence microscopy, cryo-transmission electron microscopy (cryo-TEM), immunohistochemistry (IHC), and hematoxylin and eosin (H&E) staining images are not publicly available due to large file sizes, instrument-specific formats, and institutional data management restrictions; however, all processed and representative images supporting the findings of this study are included in the paper and its supplementary materials. All other data, including raw imaging data, are available from the corresponding author upon request. Source Data are provided with this paper. Skowronski, D. M. & De Serres, G. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. Qiu, X. et al. Development of mRNA vaccines against respiratory syncytial virus (RSV). Cytokine Growth Factor Rev. Liu, C. et al. mRNA-based cancer therapeutics. Kim, Y. K. RNA therapy: rich history, various applications and unlimited future prospects. Pardi, N., Hogan, M. J., Porter, F. W. & Weissman, D. mRNA vaccines-a new era in vaccinology. Anderson, B. R. et al. Nucleoside modifications in RNA limit activation of 2'-5'-oligoadenylate synthetase and increase resistance to cleavage by RNase L. Nucleic Acids Res. Blakney, A. K., Ip, S. & Geall, A. J. An update on self-amplifying mRNA vaccine development. Rohner, E., Yang, R., Foo, K. S., Goedel, A. & Chien, K. R. Unlocking the promise of mRNA therapeutics. Minnaert, A. K. et al. Strategies for controlling the innate immune activity of conventional and self-amplifying mRNA therapeutics: getting the message across. Chen, R. et al. Engineering circular RNA for enhanced protein production. Perkovic, M. et al. A trans-amplifying RNA simplified to essential elements is highly replicative and robustly immunogenic in mice. Bloom, K., van den Berg, F. & Arbuthnot, P. Self-amplifying RNA vaccines for infectious diseases. Schmidt, C. & Schnierle, B. S. Self-amplifying RNA vaccine candidates: alternative platforms for mRNA vaccine development. Ventoso, I. et al. Translational resistance of late alphavirus mRNA to eIF2alpha phosphorylation: a strategy to overcome the antiviral effect of protein kinase PKR. García, M. A. et al. Impact of protein kinase PKR in cell biology: from antiviral to antiproliferative action. Rudd, P. A. et al. Interferon response factors 3 and 7 protect against Chikungunya virus hemorrhagic fever and shock. Wauquier, N. et al. The acute phase of Chikungunya virus infection in humans is associated with strong innate immunity and T CD8 cell activation. McNab, F., Mayer-Barber, K., Sher, A., Wack, A. & O'Garra, A. Type I interferons in infectious disease. & Sadler, A. The role of protein kinase R in the interferon response. J. Interferon Cytokine Res. Castro, C., Arnold, J. J. & Cameron, C. E. Incorporation fidelity of the viral RNA-dependent RNA polymerase: a kinetic, thermodynamic and structural perspective. Virus Res. Patterson, E. I. et al. Measuring alphavirus fidelity using non-infectious virus particles. Thorne, L. G. & Goodfellow, I. G. Norovirus gene expression and replication. Doerflinger, S. Y. et al. Membrane alterations induced by nonstructural proteins of human norovirus. Wolff, G., Melia, C. E., Snijder, E. J. & Bárcena, M. Double-membrane vesicles as platforms for viral replication. Alexopoulou, L., Holt, A. C., Medzhitov, R. & Flavell, R. A. Recognition of double-stranded RNA and activation of NF-kappaB by toll-like receptor 3. Leen, E. N. et al. A conserved interaction between a C-terminal motif in norovirus VPg and the HEAT-1 domain of eIF4G is essential for translation initiation. Belliot, G., Sosnovtsev, S. V., Chang, K. O., McPhie, P. & Green, K. Y. Nucleotidylylation of the VPg protein of a human norovirus by its proteinase-polymerase precursor protein. The genome-linked protein VPg of vertebrate viruses-a multifaceted protein. Högbom, M., Jäger, K., Robel, I., Unge, T. & Rohayem, J. The active form of the norovirus RNA-dependent RNA polymerase is a homodimer with cooperative activity. Yang, X. et al. Motif D of viral RNA-dependent RNA polymerases determines efficiency and fidelity of nucleotide addition. Gong, P. & Peersen, O. Structural basis for active site closure by the poliovirus RNA-dependent RNA polymerase. Liu, W., Shi, X. & Gong, P. A unique intra-molecular fidelity-modulating mechanism identified in a viral RNA-dependent RNA polymerase. Nucleic Acids Res. Rohayem, J., Robel, I., Jäger, K., Scheffler, U. & Rudolph, W. Protein-primed and de novo initiation of RNA synthesis by norovirus 3Dpol. Subba-Reddy, C. V., Goodfellow, I. & Kao, C. C. VPg-primed RNA synthesis of norovirus RNA-dependent RNA polymerases by using a novel cell-based assay. Paul, A. V. & Wimmer, E. Initiation of protein-primed picornavirus RNA synthesis. Virus Res. te Velthuis, A. J. Common and unique features of viral RNA-dependent polymerases. Chung, L. et al. Norovirus translation requires an interaction between the C terminus of the genome-linked viral protein VPg and eukaryotic translation initiation factor 4G. Qin, X., Jiang, B. & Zhang, Y. 4E-BP1, a multifactor regulated multifunctional protein. Cell Cycle 15, 781–786 (2016). Geremia, A. et al. Activation of Akt-mTORC1 signalling reverts cancer-dependent muscle wasting. J. Cachexia Sarcopenia Muscle 13, 648–661 (2022). Baracos, V. E., Martin, L., Korc, M., Guttridge, D. C. & Fearon, K. C. H. Cancer-associated cachexia. Currow, D. C. & Abernethy, A. P. Anamorelin hydrochloride in the treatment of cancer anorexia-cachexia syndrome. Dalton, J. T., Taylor, R. P., Mohler, M. L. & Steiner, M. S. Selective androgen receptor modulators for the prevention and treatment of muscle wasting associated with cancer. Renz, B. W. et al. β2 adrenergic-neurotrophin feedforward loop promotes pancreatic cancer. Kojima, M. et al. Ghrelin is a growth-hormone-releasing acylated peptide from stomach. Feng, Z. et al. An in vitro-transcribed circular RNA targets the mitochondrial inner membrane cardiolipin to ablate EIF4G2+/PTBP1+ pan-adenocarcinoma. Feng, Z. Engineered VPg saRNA achieves cap-independent, low-immunogenic and precise encoding of therapeutic proteins in vivo. Yan, L. et al. Targeting glucose metabolism sensitizes pancreatic cancer to MEK inhibition. Griffin, J. H., Zlokovic, B. V. & Mosnier, L. O. Activated protein C: biased for translation. Ranjan, S. et al. Activated protein C protects from GvHD via PAR2/PAR3 signalling in regulatory T-cells. Sinha, R. K. et al. Activated protein C ameliorates chronic graft-versus-host disease by PAR1-dependent biased cell signaling on T cells. Sanjuán, R., Nebot, M. R., Chirico, N., Mansky, L. M. & Belshaw, R. Viral mutation rates. Poirier, E. Z. et al. Low-fidelity polymerases of alphaviruses recombine at higher rates to overproduce defective interfering particles. Langsjoen, R. M., Muruato, A. E., Kunkel, S. R., Jaworski, E. & Routh, A. Differential alphavirus defective RNA diversity between intracellular and extracellular compartments is driven by subgenomic recombination events. Anderson, B. R. et al. Incorporation of pseudouridine into mRNA enhances translation by diminishing PKR activation. Nucleic Acids Res. Kormann, M. S. et al. Expression of therapeutic proteins after delivery of chemically modified mRNA in mice. McGee, J. E. et al. Complete substitution with modified nucleotides in self-amplifying RNA suppresses the interferon response and increases potency. & Hilgenfeld, R. RNA-virus proteases counteracting host innate immunity. Koudelka, T. et al. N-terminomics for the identification of in vitro substrates and cleavage site specificity of the SARS-CoV-2 main protease. Chhabra, P. et al. Updated classification of norovirus genogroups and genotypes. Warzak, D. A., Pike, W. A. & Luttgeharm, K. D. Capillary electrophoresis methods for determining the IVT mRNA critical quality attributes of size and purity. Cooke, K. R. et al. An experimental model of idiopathic pneumonia syndrome after bone marrow transplantation: I. The roles of minor H antigens and endotoxin. This research was supported by the National Natural Science Foundation of China (32471002, 82401829, 82472617, and 82404553; F.Z.Y. ), the National University of Singapore (NUHSRO/2020/133/Startup/08, NUHSRO/2023/008/NUSMed/TCE/LOA, NUHSRO/2021/034/TRP/09/Nanomedicine, NUHSRO/2021/044/Kickstart/09/LOA, 23-0173-A0001; C.X.Y. ), the National Medical Research Council (MOH-001388-00, CG21APR1005, MOH-001500-00, MOH-001609-00; C.X.Y. ), the Singapore Ministry of Education (MOE-000387-00; C.X.Y. ), the National Research Foundation (NRF-000352-00; C.X.Y. ), the Leading Innovative and Entrepreneur Team Introduction Program of Zhejiang (2023R01002; Z.G.W. ), the Distinguished Young Scientists Fund of Zhejiang (LR25H250001; Z.G.W. ), and the National Science and Technology Major Project of China (No. State Key Laboratory of Macromolecular Drugs and Large-scale Preparation, School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China Zunyong Feng, Liuxi Chu, Jing Zhou, Ping Wu & Xiaokun Li The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China Liuxi Chu, Xiaokun Li & Zhouguang Wang Human Anatomy Experimental Training Center, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Wannan Medical College, Wuhu, Anhui, China Qiang Li, Zhiliang Xu, Liang Yan & Yanjiao Huang Respiratory Medicine and Acute Care Center, First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, China Qiang Li & Qun Chen Department of Diagnostic Radiology, Yong Loo Lin School of Medicine; Department of Chemical and Biomolecular Engineering, College of Design and Engineering; Department of Biomedical Engineering, College of Design and Engineering; Department of Pharmacy and Pharmaceutical Sciences, Faculty of Science; Clinical Imaging Research Centre, Centre for Translational Medicine, Yong Loo Lin School of Medicine; Nanomedicine Translational Research Program, Yong Loo Lin School of Medicine; Theranostics Center of Excellence (TCE), Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore Xuanbo Zhang, Yuanbo Pan, Jianhua Zou & Xiaoyuan Chen Department of Neurosurgery, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China Yuanbo Pan Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou, China Zhouguang Wang Shandong Provincial Key Laboratory of Precision Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250117, China Xiaoyuan Chen Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore Xiaoyuan Chen Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar conceived and designed the overall study, supervised key experiments, and led data analysis and interpretation. performed most molecular and cellular experiments. assisted with cell culture and animal studies. contributed to data processing, statistical analysis, and figure preparation. provided essential reagents, technical guidance, and methodological support. performed imaging and histopathological examinations. jointly supervised the project, contributed to conceptual refinement, and provided funding and resources. wrote the manuscript with input from all authors. All authors discussed the results, revised the manuscript, and approved the final version. Correspondence to Xiaokun Li, Zhouguang Wang or Xiaoyuan Chen. Xiaoyuan Chen is a co-founder of and holds shares in Yantai Lannacheng Biotechnology Co., Ltd. The remaining authors declare no competing interests. Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. Reprints and permissions Feng, Z., Chu, L., Li, Q. et al. Engineered VPg saRNA achieves cap-independent, low-immunogenic and precise encoding of therapeutic proteins in vivo. Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Provided by the Springer Nature SharedIt content-sharing initiative © 2026 Springer Nature Limited Sign up for the Nature Briefing: Translational Research newsletter — top stories in biotechnology, drug discovery and pharma.
You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). For more than a century, most evidence suggested that sponges, immobile filter-feeders that lack muscles, neurons and other specialized tissues, were the first animal lineages to emerge. Then, in 2008, a genomic study pointed to a head-scratching rival1: dazzling, translucent predators called comb jellies, or ctenophores, with nerves, muscles and other sophisticated features. Ancient sea jelly makes tree of life wobble That single study ignited a debate that has raged for nearly 20 years, sparking fierce arguments about how complexity evolved in animals. But after dozens of studies — some of which analysed and reanalysed the same data and reached different conclusions — the debate has become entrenched, some researchers say. “Where it might have been healthy for people to engage with curiosity and an interest in finding the truth together, it became a battle,” says Nicole King, an evolutionary biologist at the University of California, Berkeley, who co-authored a paper last November that landed cautiously on ‘team sponge'2. Scientists, including King, argue that a different approach is needed: one in which researchers from both sides work together to answer the question. Fresh ideas — and attitudes — would catalyse progress, they say. Multicellularity was so successful that it sparked an explosion in innovative body forms and new ways to sense and respond to environments. As well as the ancestors of modern-day sponges and comb jellies, there were placozoa (now represented by blob-like marine invertebrates); cnidarians (modern members of which include jellyfish and sea anemones); and bilaterians that show mirror-image body symmetry in early development that would give rise to invertebrates, including starfish, snails and spiders, and vertebrates, including humans (see ‘Tree of life — now with two options'). Fossil evidence of the earliest animals is sparse and hard to decipher — a porous cavity here or a branching tube there. Identifying the first animal lineage, along with knowledge of its modern-day descendants, is another way to gain insight into these early creatures. “Knowing this will tell us something, not everything, about what those first animals might have looked like,” says Max Telford, an evolutionary biologist at University College London. Evolutionary biologists sometimes call this first animal the ‘sister' to other animal groups, because it shares a common parent with all of them. For more than a century, most scientists placed the sponge lineage at the base of the animal tree, mainly because modern-day sponges lack many of the features that define other animals, including specialized tissues such as muscle, nerve and gut, which were thought to have evolved later. “If the sponge tree were right, everything would just rather fall into place,” says Telford. Casey Dunn, an evolutionary biologist now at Yale University in New Haven, Connecticut, never planned to kick off a decades-long debate. The study was the first to include comb-jelly genome data, says Dunn, but “going into this we had no idea we would get a result other than sponges are a sister to the rest of animals”. Their 2008 conclusion that comb jellies, not sponges, were the first animal group1 landed like a bombshell. The finding drew two kinds of response, says Dunn. “There was another response where a variety of folks were like, ‘Hey, sponges have always been the sister and they always will be. Dozens more papers followed, some using new data sets, different methods of analysis or both. Journals published essays, perspectives and other expert analyses, while institutional press releases and media coverage sometimes painted each advance as the final word. “It fell into this sort of back and forth of trying to disprove each other, and then with each subsequent pronouncement saying this is the answer,” says King. One is how complex tissues such as nerve, muscle and gut could be present in the first animals but absent in members of some later lineages. One possibility is that these tissues evolved not just once, but independently in multiple lineages. Another option is that those features were present in the first animals, but were lost in later lineages, including sponges. Some biologists think that a similar loss occurred in the placozoa lineage, the modern members of which also lack nervous systems and muscles. The two camps tend to segregate by discipline, observes Antonis Rokas, an evolutionary geneticist at Vanderbilt University in Nashville, Tennessee. Sponge-sister proponents mostly have backgrounds in zoology and evolutionary-developmental biology. For these scientists, the gradual accrual and elaboration of complex traits was compelling. “The signal is low, it has travelled a long distance and there are multiple things that can erode that signal.” These are all that distinguish that first lineage from successive branches before each one starts to follow its own evolutionary path. Modern sponges, such as Aplysina fistularis, are seemingly simple creatures.Credit: Alex Mustard/Nature Picture Library The window might have lasted fewer than five million years, say researchers, less than the time since humans and chimpanzees (Pan troglodytes) shared a common ancestor, meaning that there wasn't long for many changes to accumulate. And the signal's trace faded quickly: after the animal lineages started evolving independently, some 600 million years ago, these early gene variations would have quickly become lost or obscured among newer changes along specific lineages. In a 2021 analysis4, for example, Rokas and his colleagues found that whether a study comes to a sponge-sister or comb-jelly-sister conclusion can depend on which non-animals — called outgroups — are included in the analysis, as well as various assumptions about how different gene sequences evolve. “These are good people earnestly trying to get a good answer, but the puzzle is really hard,” says evolutionary biologist Sean Carroll at the University of Maryland, College Park. Two comb jellies fuse their bodies and then act as one An eye-popping discovery: early vertebrates had four eyes rather than two Street-dog policy in India is barking up the wrong tree Volcanic personality: the man who recognized volcanoes as a planet-shaping force of nature Canny cattle: at least one cow knows how to use tools Still working at 107: supercentenarian study probes genetics of extreme longevity Scalable and multiplexed recorders of gene regulation dynamics across weeks INRAE is recruiting 49 research scientists through open competitions, from January 27 to March 5, 2026. Seeking exceptional Senior/Junior PIs, Postdocs, and Core Specialists globally year-round Two comb jellies fuse their bodies and then act as one An essential round-up of science news, opinion and analysis, delivered to your inbox every weekday. Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.
You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript. The epigenome of human immune cells is shaped by both genetics and environmental factors, yet the relative contributions of these influences remain incompletely characterized. Here we use single-nucleus methylation sequencing and assay for transposase-accessible chromatin using sequencing (ATAC–seq) to systematically explore how pathogen and chemical exposures, along with genetic variation, are associated with changes in the immune cell epigenome. Distinct exposure-associated differentially methylated regions (eDMRs) and differentially accessible regions were identified, and a significant correlation between these two modalities was observed. Additionally, genotype-associated DMRs (gDMRs) were detected, indicating that eDMRs are enriched in regulatory regions, whereas gDMRs are preferentially located within gene body marks. Disease-associated single-nucleotide polymorphisms were frequently colocalized with methylation quantitative trait loci, providing cell-type-specific insights into the genetic basis of diseases. These findings highlight the complex interplay between genetic and environmental factors in shaping the immune cell epigenome and advance understanding of immune cell regulation in health and disease. The debate between nature and nurture is a long-standing discussion in both biology and society. It centers around the relative impact of genetic inheritance (nature) versus environmental factors (nurture) on human development. While inherited epigenetic marks are passed down through generations, acquired features arise from environmental influences and can alter gene expression without changing the underlying DNA sequence. Understanding the contributions of these two sources of epigenetic variation is crucial for comprehending how genes and environments together shape biological outcomes. The interplay between genetic predispositions and environmental factors shapes biological outcomes1. Previous twin studies based on bulk tissues have estimated that the average heritability of methylation levels at cytosine–guanine dinucleotides (CpGs) across the genome ranges from 5% to 19% in different tissues2,3,4,5. Methylation quantitative trait locus (meQTL) studies have shown associations between genetic variation and methylation at individual CpG sites6,7,8,9,10,11. However, these studies rely on bulk tissues and often use arrays to profile the methylome. The genome-wide, cell-type-specific relationship between genetic variation and methylation has yet to be fully elucidated. To investigate the cell-type-specific contributions of genetic and environmental factors on the human immune cell epigenome, we analyzed 171 peripheral blood mononuclear cell (PBMC) samples from 110 individuals with defined exposures to pathogens (human immunodeficiency virus 1 (HIV-1), influenza A virus (IAV), methicillin-resistant Staphylococcus aureus (MRSA), methicillin-susceptible Staphylococcus aureus (MSSA), anthrax vaccine and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) and chemicals (organophosphates (OPs)). Seven major immune cell types were isolated using fluorescence-activated cell sorting (FACS), followed by snmC-seq2 and single-nucleus ATAC–seq (snATAC–seq) profiling. We identified exposure-associated differentially methylated regions (eDMRs) and genotype-associated DMRs (gDMRs), which showed distinct genomic enrichments—eDMRs were enriched at enhancers, whereas gDMRs were predominantly found in gene bodies, highlighting divergent regulatory mechanisms. Exposure-specific DMRs exhibited unique epigenomic signatures, with several implicating master transcription factor (TF) binding sites. HIV-1-induced changes in DNA methylation and chromatin accessibility were significantly correlated. Moreover, we observed substantial colocalization between meQTLs and GWAS variants, providing cell-type-resolved insights into disease-associated loci. This comprehensive atlas of exposure- and genetics-associated epigenomic features in human immune cells provides a valuable resource for mechanistic studies of infectious and genetic disease. We collected 171 PBMC samples from 110 individuals who were exposed or not exposed to seven major exposures (Fig. HIV-1-exposed samples were collected from the iPrEx cohort, a phase 3 clinical trial12, which was designed to evaluate the efficacy of pre-exposure prophylaxis (PrEP) for HIV-1 prevention. We analyzed PBMC samples from nine donors at three distinct time points—approximately 200 days before HIV-1 positivity (pre), the day of HIV-1 diagnosis (acu) and approximately 200 days after initiating treatment (cro). We studied prechallenge and 28 days postchallenge (with live H3N2 influenza virus) PBMC samples from 18 donors who received the placebo vaccine. Additionally, we also analyzed PBMC samples from donors who were exposed to SARS-CoV-2 and had severe or nonsevere coronavirus disease 2019 (COVID-19) symptoms. For HIV-1 and IAV, we have internal control samples that are from the same set of donors before infection and collected samples from them after exposure. We also collected PBMC from 12 healthy donors as external controls. For exposures without internal controls (COVID, anthrax vaccine, MRSA/MSSA and OP), all healthy samples were used as controls. We also identified the gDMRs using this dataset. For bacterial exposure, PBMCs from 19 patients who tested positive for MRSA or MSSA were analyzed, yielding a total of 27 samples. We also obtained PBMC samples from 27 vaccinated participants who handled Bacillus anthracis in a controlled Biosafety Level 3 (BSL3) facility while wearing appropriate personal protective equipment. These individuals were trained scientists working in a BSL3 laboratory who had received either BioThrax, an inactivated acellular vaccine primarily composed of the nonpathogenic protective antigen (PA) protein or the anthrax vaccine adsorbed, in accordance with the facility's safety protocols. OPs are a class of pesticides known to have a severe impact on the dopaminergic and serotonergic systems. A common form of this pesticide, that is, chlorpyrifos, has been widely used in the United States. As part of this study, samples were collected from farm workers and nearby residents. By tracking the levels of TCPY over four months, samples from 27 donors were classified as exposed to high, moderate or low levels of OP. PBMCs from these samples were FAC-sorted into major immune cell types (Extended Data Fig. 1) and processed through the snmC-seq2 pipeline as previously described14. A total of 96 cells per sample and cell type were sorted to ensure adequate coverage. To control for potential batch effects, all cell types were sorted onto the same 384-well plate (Extended Data Fig. Single-nucleus methylation data were analyzed using in-house pipelines, as previously described14. A total of 104,000 cells were then clustered using the average CG methylation score in 5-kb bins across the autosomes. Pseudobulk profiles demonstrated high genome coverage and sufficient depth at single-CpG resolution (Extended Data Fig. snATAC–seq data from our companion study were integrated with the methylation data and cell-type labels were transferred. To assess whether exposures are associated with the methylome of each immune cell type, we performed within-cell-type clustering for the major sorted immune cell types. The t-distributed stochastic neighbor embedding (t-SNE), with or without harmony integration, showed similar embedding (Extended Data Fig. This analysis revealed heterogeneity in methylation profiles, resulting in more than ten distinct clusters for each cell type (Fig. Of note, HIV-1, SARS-CoV-2 and MRSA/MSSA exposures were associated with distinct monocyte, CD4 and CD8 naive T cell profiles (Fig. The proportion of immune cells from each exposure in these clusters varied substantially (Fig. For example, we observed a cluster of monocytes enriched in both severe and nonsevere COVID samples. Considering that the biased distribution among clusters of different exposures might also be caused by heterogeneity between individuals rather than the specific exposures, we focused on proportional changes in HIV-1 and IAV samples collected from the same donors before and after infections. While IAV samples have comparable proportions in each cluster across the cell types, HIV-1 infection markedly changed the cell proportions among clusters, indicating that HIV-1 remodeled the global methylome and functional states of these immune cells, particularly NK cells, CD8 memory and naive T cells (Fig. a, UMAP of cells in each cell type from FACS using snmC-seq2 data. The cells are colored according to exposure (HIV, exposure to HIV; HIV_pre, before HIV infection from the same donors; Flu_pos, after IAV infection; Flu_pre, before IAV infection from the same donors; COVID_S, severe COVID patient samples; COVID_nS, nonsevere COVID patient samples; MRSA, samples exposed to MRSA; MSSA, samples exposed to MSSA; BA, samples from the donors that have taken anthrax vaccine and work frequently or infrequently in a controlled BSL3 facility handling B. anthracis; OP, samples from the donors that are exposed to OP). b, Bar plots show the proportions of cells from each group and cell type in the Leiden clusters, colored by the FACS cell types. c, Scatter plot shows the cell proportional changes before and after infection with HIV and IAV in each Leiden cluster. d, UMAP of cells from HIV exposure donors in the cell types that have the most cell proportion changes. The three rows are UMAP of cells from NK cell, Tc-mem and Tc-naive, and the columns are the cells from ‘pre', ‘acu' and ‘cro' stages. Cells from the stage are shown in red and cells from other stages are shown in gray. f, Two clusters of monocytes were identified. Statistical tests were done using the two-sided chi-square test. g, Gene body methylation levels at classical and nonclassical monocyte markers in the two clusters of monocytes. h, GO enrichment using the hypergeometric test with Benjamini–Hochberg FDR correction (implemented in Metascape) for DMGs between the two clusters of monocytes. To further validate the changes in functional states of different immune cell types associated with HIV-1 infection, we performed within-cell-type clustering of HIV-1 samples. The clustering revealed that cells from the three stages of HIV-1 infection (pre, acu and cro) were unevenly distributed among the clusters, with biased clusters exhibiting different global methylation levels (Fig. To further characterize the identity of each subtype that is significantly differentially distributed among the three stages (false discovery rate (FDR) < 0.05, Fisher's exact test), we identified differentially methylated genes (DMGs) in each cluster compared to all other clusters and performed functional enrichment analysis of these genes. Immune cell activation and differentiation-related functions are enriched among these clusters (Fig. The CD8 memory T cell cluster, specifically enriched in ‘acu' stage cells, is enriched in positive regulation of immune response. The monocyte cluster uniquely enriched with COVID samples and depleted in controls and other exposures is likely associated with this specific exposure (Fig. Almost half of the monocytes from both severe and nonsevere COVID-19 samples are separated in these two clusters (Fig. 2f), significantly more than the control samples (P = 2.05 × 10−237, Fisher's exact test). Gene body methylation levels are highly similar between classical and nonclassical monocyte markers (Fig. We identified 321 DMGs between ‘monocyte1' and ‘monocyte2' (Extended Data Fig. Functional enrichment of these genes showed that genes hypomethylated in both monocyte clusters are enriched in pro-inflammation functions like ‘IL-18 signaling pathway' and ‘regulation of interleukin-1 (IL-1) production' (Fig. Both IL-18 and IL-1 have been reported to have protective roles during mouse coronavirus infection15, whereas IL-1 has a pivotal role in the induction of cytokine storm as a result of uncontrolled immune responses in SARS-CoV-2 infection16. Moreover, in addition to IL-1 and IL-18 production-related functions, DMGs in ‘monocyte2' are also enriched for ‘phagocytosis' and ‘endocytosis' (Fig. 2h), indicating the antigen presentation function of this cluster, which is specifically enriched in COVID-19 monocytes. We refined our cell types by integrating methylation information with FACS. B cells and NK cells can be separated into two clusters by global CG methylation level (Fig. The higher methylated clusters were annotated as a naive state (B-naive and NK-naive), while the cluster with lower methylation level of B cells is annotated as memory B cells (B-mem) and a lower methylated cluster of NK cells is defined as the active state (NK-active)17. We observed some inconsistency between FACS and methylation profile clustering, with some naive T cells clustered together with memory T cells, indicating a transition of cell states in some naive T cells (Fig. Some sorted NK cells (CD56+ or CD16+ and CD14−) exhibited methylation profiles similar to monocytes, suggesting that they may represent CD16+ monocytes. These cells were excluded from the downstream analysis. We also observed bias among different exposures in the Uniform Manifold Approximation and Projection (UMAP) representation (Extended Data Fig. Nevertheless, our clustering results remained highly consistent following integration-based correction for donor and batch effects (Extended Data Fig. Cells are colored by the global methylation level of each cell. Cells are colored by the FACS cell types. c, The bar plots show the hypomethylated (top plot) and hypermethylated (bottom plot) eDMR counts. ‘Hypo' indicates the eDMRs are hypomethylated in exposures compared to controls. d, Scatter plot shows the two top PCs of ethnicity-dependent MRSA/MSSA eDmRs. e, Venn diagram shows the overlap of ethnicity-dependent MRSA/MSSA eDMRs between individuals of African and European genetic ancestry. f, Scatter plot shows the two top PCs of ethnicity-dependent COVID-19-associated eDmRs. g, Venn diagram shows the overlap of ethnicity-dependent COVID-19-associated eDMRs between individuals of African and European genetic ancestry. h, Dot plot shows the enrichment of eDMRs from COVID, HIV and MSSA in histone modification peaks. Each column shows the hypo-eDMRs in that condition. i, Dot plot shows the motif enrichment from HOMER, which uses the one-sided cumulative hypergeometric test on eDMRs from each exposure and cell type. Given the difficulty of controlling other exposures each individual may have experienced and the donors' diverse genomes, we used internal controls for HIV-1 and IAV exposures, for which we have cells before and after exposure. We used a stringent pipeline to identify the eDMRs of other exposures (Methods), in which the external controls are used. These eDMRs were not significantly different between any two sets of controls, minimizing the contribution of individual genetic variation and baseline exposures to these eDMRs. We identified 756,575 eDMRs across all exposures and cell types, including 517,698 hypomethylated and 238,877 hypermethylated eDMRs (Fig. On average, each exposure and cell type exhibited approximately 10,000 eDMRs. To control for the false positive rate, we performed sample label shuffling and found that our identified eDMRs were substantially more significant than DMRs detected in shuffled samples (Extended Data Fig. We examined the effect-size distributions of exposure-associated eDMRs to assess the magnitude of methylation change (Extended Data Fig. HIV-1 induced broad, multimodal shifts, particularly during early infection. MRSA/MSSA and anthrax vaccine exposures had distinct peaks, while OP caused highly variable changes. COVID responses were skewed by severity and IAV showed a symmetric, trimodal pattern. These distributions illustrate the magnitude of methylation changes associated with each exposure. Although the externally controlled exposures involved sizable cohorts, donor variability in genetic background, sex and age could still confound the results. To address this, methylation levels were adjusted for these covariates and eDMRs significantly associated with them were excluded, resulting in 271,592 hypomethylated and 138,551 hypermethylated eDMRs that more likely reflect exposure-specific effects. Notably, in both MRSA/MSSA and SARS-CoV-2 exposures, individuals of African genetic ancestry showed stronger methylation effects (Fig. 3d,e), with most eDMRs unique to each group (Fig. We further examined the association between eDMRs and histone modification peaks from ENCODE21 in each cell type. We found that both hypomethylated and hypermethylated eDMRs across all exposures are slightly enriched in heterochromatin regions marked by H3K9me3 (Fig. Notably, COVID-19-associated hypermethylated eDMRs in monocytes, naive T cells and NK-naive cells showed significant enrichment in enhancer marks like H3K27ac and H3K4me1, as well as the promoter mark H3K4me3, across all immune cell types (Fig. Similar patterns were observed for HIV-1, with hypomethylated eDMRs in CD8 memory T cells and hypermethylated eDMRs in CD8 naive T cells showing enrichment in these active histone marks (Fig. Furthermore, MSSA monocyte hypermethylated eDMRs were enriched in H3K27ac. In contrast, NK-active cells from MSSA-infected individuals (but not MRSA-infected ones) showed enrichment in enhancer regions. Interestingly, naive T cells from severe COVID-19 patients had hypomethylated eDMRs with greater enrichment in active histone marks than those from nonsevere patients (Fig. These eDMRs are generally depleted at CG islands, promoters and short interspersed nuclear elements but slightly enriched in DNA, long interspersed nuclear elements and long terminal repeat transposable elements (Extended Data Fig. DNA methylation influences TF binding to DNA22,23, potentially altering gene expression. We observed that master TF motifs were enriched in COVID-19 monocytes and CD4 naive T cells. Similarly, CEBP family TF motifs are specifically enriched in MRSA/MSSA monocytes26 (Fig. These results suggest that methylomes associated with exposures might be able to inhibit the binding of master TFs. On the other hand, hypomethylated eDMRs across the exposures share many similarly enriched motifs (Extended Data Fig. Although further validation is needed, ChIP-seq profiles from previous studies27 showed significant enrichment of eDMRs within relevant TF binding sites (Extended Data Fig. To assess functional relevance, we examined their overlap with differentially expressed genes from prior HIV-1 and SARS-CoV-2 studies28,29 and found significant enrichment, especially in monocytes and T cells, compared to random genes or CpGs (Extended Data Fig. We integrated snATAC–seq data from paired and unpaired donors across all exposure groups, except for MRSA and MSSA. For the HIV-1 exposure, in which samples were collected at different time points from the same donors, we integrated snDNA methylation data with snATAC–seq data using 5-kb bins on autosomes. UMAP plots generated before integration showed clear batch effects between the two modalities, which were partially mitigated following integration (Extended Data Fig. We mapped cells from single-cell ATAC–seq to methylation clusters and transferred cell-type labels via canonical correlation analysis30 (Fig. We observed a strong genome-wide correlation between the two modalities across all cell types, with the highest correlation observed in monocytes (Extended Data Fig. Interestingly, we detected a loss of methylation and an increase in accessibility after HIV-1 infection in memory CD8 T cells at the intron of DGKH (Fig. 4c), a gene previously reported to exhibit differential methylation between elite controllers and individuals receiving antiretroviral therapy31. Although this region experienced a loss of chromatin accessibility, the methylation level remained unchanged between the ‘acute' and ‘chronic' stages (Fig. We found a considerable fraction of eDMRs are consistent with changes in chromatin accessibility (Fig. The highest overlap (25.6%) between these two modalities was observed in ‘pre' stage hypo-eDMRs in CD8 naive T cells. a, UMAP of cells from HIV exposure using single-nucleus methylation when integrating with snATAC–seq data. b, UMAP of cells from one HIV donor sample after integration with single-nucleus methylation data. c, A genome browser view of a region at the DGKH gene that has consistent changes after HIV exposure in DNA methylation and chromatin accessibility in CD8 memory T cells. d, The overlap between hypo-eDMRs and gained ATAC–seq peaks in each condition in the corresponding cell types. We extended our analysis to additional exposures and found that exposure-associated differentially accessible regions overlapped with corresponding exposure-associated eDMRs by approximately 1–6% (Extended Data Fig. These findings underscore the importance of longitudinal data for robust identification of exposure-associated epigenomic changes. We also used this single-cell epigenomic atlas to identify gDMRs. We identified the SNPs of each individual using the DNA methylation data with biscuit32, followed by imputation and filtering (Methods). To confirm the accuracy of the SNP calls obtained using this strategy, we compared the SNPs from whole-genome sequencing (WGS) and methylation data from a previous study33. The results showed a strong agreement with WGS SNPs in a 10-Mb region, with low rates of false positives and false negatives (Extended Data Fig. 8a) and high correlation in alternate allele frequency at the overlapped SNPs (Extended Data Fig. Running SNP calls on methylation on GM12878 also show high accuracy compared to the true set (Extended Data Fig. We first identified differentially methylated regions (DMRs) at CpG sites within each cell type and quantified methylation levels for each individual. DMRs and SNPs were filtered, followed by performing meQTL analysis as described (Methods). After stringent filtering of the meQTL–DMR pairs (Supplementary Tables 3–11), we identified 275,283 gDMRs across all nine cell types, of which 214,933 were cis-correlated with SNP and 60,350 were trans-correlated (Supplementary Table 12). The number of cis-gDMRs is comparable across different cell types except CD8 memory T cells, which have more gDMRs compared to other cell types (Fig. The trans-gDMRs are mostly identified in various T cell types (Fig. The cis-gDMRs and trans-gDMRs in T cells are mostly single-CpG sites (Extended Data Fig. The number of gDMRs is correlated with the number of cells sequenced in each cell type (Extended Data Fig. We did an enrichment analysis between these two sets of DMRs on different histone marks. In contrast to eDMRs at enhancer marks, gDMRs are predominantly enriched at gene body mark H3K36me3 peaks, especially in memory state lymphocytes (B cell, CD4 and CD8 T cells; Fig. 5b), while eDMRs are enriched at enhancer and promoter regions in naive lymphocytes. We also did enrichment analysis of them at the loop anchors in different immune cells from ENCODE, which showed that gDMRs are more enriched at loop anchors in memory state lymphocytes and eDMRs are more associated with naive lymphocytes (Fig. This indicates that eDMRs and gDMRs might regulate gene expression in different cell types through distinct mechanisms. Both eDMRs and gDMRs exhibit similar genomic feature enrichment regarding genes and transposable elements (Extended Data Fig. a, Bar plot shows the counts of gDMRs in each cell type. b, Dot plot shows the enrichment from Fisher's exact test of gDMRs in histone modification peaks from each cell type. Each column represents a cell type and each row contains one histone modification peak for that cell type. c, Dot plot shows the enrichment using Fisher's exact test of gDMRs in chromatin loops. d, Metascape using one-sided hypergeometric test, GO enrichment of the genes that overlap with eDMRs or gDMRs. Results for eDMRs and gDMRs in all cell types are sorted in the same order. We performed a functional enrichment analysis on genes with DMRs that overlap H3K36me3 peaks. Both eDMRs and gDMRs were significantly enriched in housekeeping and immune-related functions, with immune functions more significant in eDMRs (Fig. This distinction highlights the specific impact of environmental and genetic factors on immune-related gene regulation through epigenetic modifications. Besides chromatin states, the enriched motifs also differ between eDMRs and gDMRs. While both hypomethylated and hypermethylated eDMRs are enriched mainly in immune-related TF motifs, gDMRs do not exhibit enrichment of immune TF motifs (Extended Data Fig. Using gDMRs as the background in Homer analysis, we identified significant enrichment of RUNX and ETS family TF motifs—including PU.1, ETS1 and Fli1—within eDMRs. These results suggest that eDMRs are primarily enriched with key TF binding sites in immune cells, potentially having a role in regulating gene expression through TF binding. To investigate the association of gDMRs with human diseases and immune-related traits, we performed colocalization analysis between our meQTLs and GWAS SNPs (Methods) linked to various traits (Supplementary Table 13). We identified many colocalized GWAS SNPs and meQTLs within these immune cell types, suggesting potential cell-type-specific regulatory connections between methylation changes and genetic variants associated with immune functions. Enrichment analysis of GWAS SNPs within the meQTLs of each cell type revealed a predominant enrichment in CD8 naive T cells, as well as in CD4 memory and naive T cells (Extended Data Fig. This suggests that these specific immune cell types are particularly influenced by genetic variants associated with immune-related traits and diseases. For example, GWAS SNPs associated with Gallstone disease and Eczema are enriched in CD8 naive T cells and CD4 memory T cells meQTLs that colocalize with the GWAS SNPs (Extended Data Fig. We further linked the gDMRs to specific diseases and phenotypes (Extended Data Fig. 9b) through colocalization analysis between meQTLs and phenotype-associated GWAS SNPs. The analysis revealed that most gDMRs are associated with only a single phenotype (Extended Data Fig. Gallstone disease and Eczema showed the highest number of associated gDMRs, particularly in T cells, highlighting the potential role of T cell–specific epigenetic regulation in mediating genetic risk for these immune-related diseases. We further linked the gDMRs with gene expression (Methods) by performing summary-based Mendelian randomization (SMR)34 analysis and identified 252,598 substantial associations (Supplementary Table 14). This analysis enabled us to uncover cell-type-specific regulatory connections between gDMRs and various diseases and phenotypes. For instance, SNPs associated with eczema disease showed colocalization with multiple meQTLs across various immune cell types in a cell-type-specific manner. Most of the GWAS SNPs only colocalize with meQTL in one cell type (Fig. 6a), indicating a high degree of cell-type specificity in the regulatory effects of these GWAS SNPs on the epigenome. This suggests that the impact of genetic variants on DNA methylation, and consequently on gene regulation, can be highly specialized and confined to particular immune cell types. Notably, the top eczema-associated SNP, rs10791824, colocalizes with both a meQTL and an expression QTL (eQTL; Fig. a, Upset plot shows the number of meQTLs that are colocalized with GWAS SNPs in each cell type or multiple cell types. b, Genotype-disease association P values in the eczema GWASs (top) at EFEMP2 locus, eQTL (middle) associated with EFEMP2 expression and meQTL signal associated with a DMR in naive T cell (bottom), P values are from GWAS associations (Wald test), eQTL test (empirical P values via permutations) and meQTL test (empirical P values via permutations). Our study provides a comprehensive, exposure-driven atlas of human immune cells, revealing how genetic and environmental factors shape the epigenomes. We constructed an intricate epigenomic atlas using snmC-seq2 and ATAC–seq, revealing the epigenomic features associated with different exposures. This comprehensive approach enabled us to identify eDMRs and gDMRs, dissecting the roles of these two factors in shaping the epigenome of human immune cells. We also identified significant colocalization between meQTLs and GWAS-associated disease SNPs, uncovering potential cell-type-specific epigenetic mechanisms underlying these SNPs. Genetic factors and exposome have long been recognized to shape epigenomes1. While previous studies have identified DNA methylation changes associated with genetic1,6,7,8 or environmental exposures1,35,36, our study uniquely examines both eDMRs and gDMRs within the same group of donors. This approach allows for a more reliable comparison of changes driven by these two factors. The differential enrichment of eDMRs and gDMRs on the chromatin indicates that genetics and environments may regulate gene expression differently. Although we cannot fully disentangle the effects of genetics from different exposure histories in our donors, genetic factors exert a stronger regulatory influence on the gene body in memory lymphocytes. Our study highlights the intricate contributions of both genetic and environmental factors in shaping the immune cell epigenome. We identified distinct, exposure-specific eDMRs that reflect how immune cells respond to various pathogens and chemicals, while accounting for genetic background. These findings improve our understanding of immune regulation and provide a valuable resource for investigating the molecular mechanisms of environmental exposures. The eDMRs may also serve as biomarkers for specific exposures, with potential diagnostic and therapeutic applications. However, the mechanisms by which genetics and environment interact to influence the epigenome remain incompletely understood. Addressing this gap will be essential for fully elucidating how gene–environment interactions shape immune function and contribute to disease, fully addressing the ‘nature and nurture' question in human disease. Some exposure groups had relatively small sample sizes, which may limit statistical power and generalizability. In addition, the absence of information on other environmental exposures or longitudinal data for certain exposures restricts our ability to assess temporal dynamics and causality. Future studies with larger, more balanced cohorts and longitudinal designs will be essential to validate and extend our findings. Cells were sorted into 384-well plates using FACS based on their specific antibody labeling. The FACS antibody cocktail allowed for the identification of seven different immune cell types in blood (Extended Data Fig. The SONY Multi-Application Cell Sorter LE-MA900 Series was used to isolate single cells in 384-well PCR plates containing protein kinase. For library preparation, we followed the previously described methods for bisulfite conversion and library preparation in snmC-seq2 (refs. The snmC-seq2 libraries generated from isolated immune cells were sequenced on an Illumina Novaseq 6000 using S4 flow cells in 150-bp paired-end mode. Freedom EVOware (v2.7) was used for library preparation, while Illumina MiSeq control software (v3.1.0.13) and NovaSeq 6000 control software (v1.6.0) and Real-Time Analysis (RTA; v3.4.4) were used for sequencing. snATAC–seq was performed as previously described38, using either the Chromium Next GEM Single Cell ATAC Library & Gel Bead Kit v1.1 (10x Genomics, 1000175) with the Chromium Next GEM Chip H (10x Genomics, 1000161) or the Chromium Single Cell ATAC Library & Gel Bead Kit (10x Genomics, 1000110). Libraries were sequenced on an Illumina NovaSeq 6000 system (1.4 pM loading concentration) using a 50 × 8 × 16 × 49 bp read configuration, targeting an average of 25,000 reads per nucleus. For alignment and QC of the single-cell methylation data, we used the same mapping strategy used in our previous single-cell methylation projects in our lab39. Specifically, we used our in-house mapping pipeline, YAP (https://hq-1.gitbook.io/mc/), for all the mapping-related analysis. The pipeline includes the following main steps: (1) demultiplexing FASTQ files into single cells with Trim Galore (v4.4), (2) reads-level QC, (3) mapping with bismark (v0.20.0), (4) BAM file processing and QC with samtools (1.17) and Picard MarkDuplicates (v3.0.0), and (5) generation of the final molecular profile. Detailed descriptions of these steps for snmC-seq2 can be found in ref. All the reads were mapped to the human hg38 genome, and we calculated the methylcytosine counts and total cytosine counts for two sets of genomic regions in each cell after mapping. We filtered out low-quality cells based on three metrics generated during mapping—mapping rate >50%, final mC reads >500,000 and global mCG >0.5. Chromosomes X, Y and M were excluded from the analysis and the remaining genome was divided into 5-kb bins to create a cell-by-bin matrix. In this matrix, each bin was assigned a hypomethylation score (hyposcore) calculated from the P values of a binomial test, which indicates the probability of hypomethylation of that bin. The matrix was further binarized for downstream analysis using a hyposcore cutoff ≥0.95. Hyposcore measures the likelihood of observing greater than m methylated reads under the assumption that methylation follows the binomial distribution with parameters c and p. where m is the observed number of methylated count for region i, c is the coverage (total count) covering region i, n is the total number of 5-kb bin regions and p is the expected probability of methylation for this cell. The calculation of hyposcore was implemented in ALLCools (v1.1.1, https://lhqing.github.io/ALLCools/intro.html) using SciPy40. Bins covered by fewer than five cells and those with any absolute z score (the number of cells with nonzero values) >2 were filtered out. Additionally, we excluded bins that overlapped with the ENCODE blacklist using ‘bedtools intersect' (Dale, Pedersen and Quinlan 2011; Quinlan and Hall 2010). To perform unsupervised clustering, we used ALLCools39, which first conducted PCA on the 5-kb bin matrix. For each exposure, we selected the top 32 principal components (PCs) for clustering using the modules in scanpy. In the HIV-1 and influenza cohorts, we observed a donor effect in the clustering results with these PCs. Therefore, we applied harmony41 to correct the donor effect on these PCs. To annotate the cells, we used both the single-cell methylation clustering results and cell-surface markers. In almost every cohort, we observed two clusters of B cells and NK cells, which were distinguished by their global mCG levels. Therefore, we assigned these clusters as naive and memory B cells, naive and active NK cells. We also merged clusters with cell-surface markers indicating memory CD4 and CD8 T cells, even if they exhibited multiple clusters in the t-SNE embedding. To identify DMRs associated with each immune cell type, we analyzed PBMCs from healthy donors. Based on single-cell methylation and FACS, we identified nine cell types through clustering. These cell types were grouped based on their global mCG levels, and DMRs were called separately within high-mCG and low-mCG cell types. We used methylpy (v1.4.6, https://github.com/yupenghe/methylpy) for DMR calling and the resulting DMRs were further annotated with genes and promoters. To identify DMRs associated with each exposure, we merged the control samples and samples from each exposure group. We used methylpy (https://github.com/yupenghe/methylpy) to identify DMRs between the control and exposure groups and between different exposure groups. Once we obtained the primary set of DMRs, we calculated the methylation levels of all samples at these DMRs using ‘methylpy add-methylation-level'. Additional filtering of the DMRs was performed by comparing methylation levels across sample groups using Student's t test. Only DMRs with a minimum P value <0.05 between any two groups were retained. For DMRs associated with MRSA/MSSA, BA, OP and SARS-CoV-2, where external controls were used for DMR calling, we compared the methylation levels of exposure samples and control samples, as well as different cohorts of controls (HIV, Flu and commercial controls). DMRs that showed significant differences (P < 0.05) between the exposure group and all three control cohorts, but no significant differences (P > 0.05) between any two control cohorts, were retained. To visualize complex heatmaps, we used PyComplexHeatmap (https://github.com/DingWB/PyComplexHeatmap)38,42. Hypomethylated DMRs in the corresponding sample groups and cell types were labeled for better visualization. The heatmap rows were split according to sample groups and the columns were split based on DMR groups and cell types. Within each subgroup, rows and columns were clustered using Ward linkage and the Jaccard metric. To validate that the identified DMRs for each exposure were not confounded by batch effects or other factors, we shuffled the group labels of the samples within each exposure and identified DMRs among the randomly assigned groups. To quantify the magnitude of methylation differences across exposure groups, we calculated Cohen's d effect sizes for each DMR using pairwise comparisons. For each DMR, methylation values were extracted across the three defined groups. Cohen's d was computed using the pooled standard deviation formula: The pooled variance was calculated with Bessel's correction to account for sample size differences. These effect sizes provide an interpretable measure of methylation divergence independent of sample size or statistical significance. We merged the BAM files from the same donors and SNP calling was performed using Biscuit's32 variant calling function. This process identifies SNPs in both CpG and nonCpG contexts by analyzing the bisulfite-treated reads. Biscuit distinguishes between methylated cytosines and actual C/T polymorphisms, reducing the risk of false positives. To increase variant density and coverage, we imputed SNPs using Minimac4, referencing the 1,000 Genomes Phase 3 panel. Postimputation, only high-confidence SNPs present in dbSNP were retained to ensure reliability and compatibility with downstream analyses. Standard variant filtering was applied to remove low-confidence SNPs. We excluded SNPs with a minor allele frequency (MAF) below 0.05. Additionally, SNPs overlapping with regions in the blacklist were filtered out. We performed PCA of genotypes following standard best practices to control for population structure as follows: Linkage disequilibrium (LD) pruning—we used PLINK 2 to filter variants to high-quality, common biallelic SNPs (--snps–only just-a–t, --max-alleles 2, --geno 0.02 and --maf 0.05) and excluded regions of extended LD (high-LD-regions-hg38.bed). We then applied LD pruning with a 200-SNP window, step size of 50 SNPs and an r2 threshold of 0.1 (--indep-pairwise 200 50 0.1). Preparation of input genotypes—the pruned SNP set was extracted to generate a reduced variant call format (VCF) for PCA, ensuring that only informative and independent variants were retained. The primary analysis was conducted using QTLtool with the options --center, --scale, --maf 0.05 and --distance 0, which ensures mean centering, scaling and consistent handling of pruned SNPs. For cross-validation, we also performed PCA with PLINK 2 (--pca 20 approx var-wts), which produced highly concordant results. This approach ensured robust inference of genetic PCs, minimizing the impact of LD and technical artifacts and provided reliable covariates for downstream QTL and association analyses. To identify meQTLs associated with DMRs, we used QTLtools (v2.0-7-g61a04d2c5e)43. DMRs for each cell type were identified across the 110 donors using methylpy. We used QTLtools in nominal mode to calculate the association between genotype (SNP) and methylation levels within DMRs. This method tests all SNP–DMR pairs within a specified genomic window (1 Mb) around the DMRs, reporting nominal P values for each pair. Associations were considered significant at an FDR threshold <0.01. To assess the bias in the proportion of COVID-19 samples across different monocyte clusters, a chi-square test was applied. This test helped determine whether the distribution of samples from severe and nonsevere COVID-19 patients differed significantly from that of control samples, with results indicating a strong statistical difference (P = 2.05 × 10−237, Fisher's exact test). We used TensorQTL in trans mode to efficiently test large-scale genotype–methylation associations. By default, TensorQTL computes parametric P values using linear regression and applies Benjamini–Hochberg FDR correction. In both analyses, we included covariates such as age, sex, the first five PCs of genotypes and exposures. Covariate adjustment was performed using the QTLtools built-in method for linear model regression. HOMER (v5.1) was used to identify enriched motifs within these different sets of DMRs for each exposure. The results from HOMER's ‘knownResults.txt' output files were used for downstream analysis. Only motif enrichments with a P value <0.01 were retained. Motif enrichment results were visualized using scatter plots generated with Seaborn. Pairwise DMG analysis for each exposure was performed using ALLCools, following the tutorial (https://lhqing.github.io/ALLCools/cell_level/dmg/04-PairwiseDMG.html). Significantly DMGs were selected based on an FDR < 0.01 and a delta mCG > 0.05. We also used linear regression (mCG ~ annotation + age + sex + ethnicity) to identify the genes associated with the two clusters of monocytes, regressing out age, sex and ethnicity of the donors. This integration was performed using canonical correlation analysis based on 5-kb bins, where we transferred our methylation cell annotations to the cells from the other modality. To generate the peaks and BigWig files for each cell type, we used SnapATAC2 (refs. To assess the correlation between single-cell methylation and single-cell ATAC, we calculated the correlation between the hyposcore of each 5-kb bin and Tn5 insertions in each bin. Summary statistics of GWAS were downloaded from the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/)47, including 29,401 studies and 25,111 traits. We performed colocalization analysis with coloc48 (v5.2.3) using default priors to calculate the probability that both the meQTL and GWAS traits share a common causal variant. The posterior probability (PP4) of a single causal variant associated with both DMR and GWAS traits was used to identify significant colocalizations (PP4 > 0.50). A high PP4 value indicates strong evidence for shared causality. R packages locuscomparer (v1.0.0)49 and locuszoomr (v0.3.1)50 were used to visualize the colocalization results. To assess whether meQTL and GWAS SNPs were significantly overlapping for each DMR–trait pair, we performed chi-squared tests using the ‘stats.chi2_contingency' function from the Python package SciPy40. Resulting P values were adjusted for multiple testing using the Benjamini–Hochberg method. To investigate the relationship of eDMRs and gDMRs on gene expression, we performed SMR (v1.0) analysis34 by integrating our unfiltered meQTLs with the eQTLs derived from whole-blood gene-expression levels generated from the eQTLGen consortium34,51. By setting the exposure as DMR and outcome as gene expression in the SMR analysis, we tested whether genetic variants associated with DMRs are also associated with expression levels of nearby genes, providing evidence for associations between the two. We did not conduct HEIDI analysis due to the differences between the two sample populations. This does not rule out the possibility that some of the DMR–gene associations identified in our SMR analysis are due to linkage rather than pleiotropy or causality. SMR associations with a P value below the Bonferroni-adjusted alpha level (α = 0.05/n) were considered significant. Default parameters were used to run SMR. We further filtered out associations in the HLA region (chr6: 28,477,797–33,448,354). Enrichment tests were conducted using Fisher's exact test to evaluate the distribution of DMRs across exposures and cell types. This statistical approach was selected due to the small sample sizes in some groups and the need for exact calculations without relying on large-sample approximations. To assess the bias in the proportion of COVID-19 samples across different monocyte clusters, a chi-square test was applied. This test helped determine whether the distribution of samples from severe and nonsevere COVID-19 patients differed significantly from that of control samples, with results indicating a strong statistical difference (P = 2.05 × 10−237, Fisher's exact test). For each DMR, effect sizes were calculated using Cohen's d to quantify the magnitude of methylation differences among exposure groups. Cohen's d was computed for pairwise comparisons across groups (for example, exposure versus control), using the pooled standard deviation formula. We performed pairwise differential methylation analysis between exposure groups using the methylpy package. Statistical significance was determined by calculating P values for each comparison, with a threshold of 0.05. Further, we applied FDR correction to adjust for multiple comparisons. The study was conducted with approval from the Salk Institutional Review Board (IRB) under protocol 18-0015 titled 'Single Cell Analysis for Forensic Epigenetics (SAFE)'. Research activities were covered under Salk's Federalwide Assurance (FWA) for the Protection of Human Subjects (FWA00005316). Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. De-identified molecular data and associated sample metadata generated in this study are available through controlled access via the Database of Genotypes and Phenotypes (dbGaP) under accession phs003204.v1.p1 (https://dbgap.ncbi.nlm.nih.gov/beta/study/phs003204.v1.p1/). Access to these data is subject to approval by the dbGaP Data Access Committee in accordance with NIH policies on the sharing of human genomic data. The single-cell ATAC–seq data were generated in our companion study52 and the processed data used in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession GSE306525 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE306525). All other data supporting the findings of this study are available within the article and its Supplementary Information files. Source data are provided with this paper. Ramos, R. G. & Olden, K. Gene–environment interactions in the development of complex disease phenotypes. Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Bell, J. T. et al. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. Gordon, L. et al. Neonatal DNA methylation profile in human twins is specified by a complex interplay between intrauterine environmental and genetic factors, subject to tissue-specific influence. Shang, L. et al. meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans. Hawe, J. S. et al. Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Villicaña, S. & Bell, J. T. Genetic impacts on DNA methylation: research findings and future perspectives. Zhang, T. et al. Cell-type-specific meQTLs extend melanoma GWAS annotation beyond eQTLs and inform melanocyte gene-regulatory mechanisms. A. et al. Genetic control of DNA methylation is largely shared across European and East Asian populations. Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits. Grant, R. M. et al. Preexposure chemoprophylaxis for HIV prevention in men who have sex with men. Thistlethwaite, W. et al. Innate immune molecular landscape following controlled human influenza virus infection. B., Elliott, R. & Weiss, S. R. Role of the inflammasome-related cytokines IL-1 and IL-18 during infection with murine coronavirus. Mardi, A., Meidaninikjeh, S., Nikfarjam, S., Majidi Zolbanin, N. & Jafari, R. Interleukin-1 in COVID-19 infection: immunopathogenesis and possible therapeutic perspective. The DNA methylation profile of activated human natural killer cells. Racial disparities in COVID-19 outcomes among black and white patients with cancer. Racial disparities in invasive methicillin-resistant Staphylococcus aureus infections, 2005–2014. ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Rimoldi, M. et al. DNA methylation patterns of transcription factor binding regions characterize their functional and evolutionary contexts. Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nadeau, S. & Martins, G. A. Conserved and unique functions of Blimp1 in immune cells. Peripheral CD4+ T cell subsets and antibody response in COVID-19 convalescent individuals. Zou, Z., Ohta, T. & Oki, S. ChIP-Atlas 3.0: a data-mining suite to explore chromosome architecture together with large-scale regulome data. Integrated single-cell analysis of multicellular immune dynamics during hyperacute HIV-1 infection. Ren, X. et al. COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas. Stuart, T. et al. Comprehensive integration of single-cell data. B. et al. HIV-specific CD8 T cells from elite controllers have an epigenetic imprint that preserves effector functions. Zhou, W. et al. BISCUIT: an efficient, standards-compliant tool suite for simultaneous genetic and epigenetic inference in bulk and single-cell studies. Tian, W. et al. Single-cell DNA methylation and 3D genome architecture in the human brain. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Castillo-Fernandez, J. E., Spector, T. D. & Bell, J. T. Epigenetics of discordant monozygotic twins: implications for disease. & Luca, F. Genotype x environment interactions in gene regulation and complex traits. Luo, C. et al. Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Hickey, J. W. et al. Organization of the human intestine at single-cell resolution. Liu, H. et al. DNA methylation atlas of the mouse brain at single-cell resolution. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Ding, W., Goldberg, D. & Zhou, W. PyComplexHeatmap: a Python package to visualize multimodal genomics data. Delaneau, O. et al. A complete tool set for molecular QTL discovery and analysis. Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Fang, R. et al. Comprehensive analysis of single cell ATAC–seq data with SnapATAC. A fast, scalable and versatile tool for analysis of single-cell omics data. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. B. et al. Dissecting the epigenome dynamics in human immune cells upon viral and chemical exposure by multimodal single-cell profiling. This work was supported by the Defense Advanced Research Projects Agency (DARPA) through the DARPA Epigenetic Characterization and Observation (ECHO) program for the project Single-cell Analysis for Forensic Epigenetics (SAFE), administered by the US Army Research Office under cooperative agreement W911NF-19-2-0185. is an investigator of the Howard Hughes Medical Institute. We thank former DARPA Biological Technologies Office Program Manager E.V. Gieson, as well as the current leadership team, including J.-P. Chretien and T. Thomou, for their guidance and insightful comments. acknowledges funding support from the National Institutes of Health (NIH; grants P50-HG007735, UM1-HG009442 and UM1-HG009436). was supported in part by the National Institute of General Medical Sciences Predoctoral Basic Biomedical Sciences Research Training Program T32 GM145427. We thank all anonymous donors who contributed biological samples to this project through our collaborators. We thank S. Kaech for consultations on the PBMC selection strategy, and C. O'Connor and L. Boggeman at the Salk FACS Core for their assistance in standardizing the gating strategy. This work used the Stampede2 supercomputing resources at Texas Advanced Computing Center (TACC) through the Extreme Science and Engineering Discovery Environment (XSEDE) and the Anvil supercomputing resources at Purdue University's Rosen Center for Advanced Computing (RCAC), made available through the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program. These resources were funded by the National Science Foundation (NSF) under grants 1540931 and 2005632, respectively, and were assessed via a research allocation to the Ecker Lab (grant MCB130189). We thank M. Gujral, L. Huang and R. Scott, and other engineers at TACC and RCAC, for their assistance in porting and optimizing computational tools on XSEDE resources, provided through the XSEDE Extended Collaborative Support Service (ECSS) program. We thank K. Claypool, R. Jaimes, A. Michaleas, D. Ricke, P. Fremont-Smith and T. White from MIT Lincoln Laboratory for providing data warehousing support. These authors contributed equally: Wenliang Wang, Manoj Hariharan, Wubin Ding. Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA Wenliang Wang, Manoj Hariharan, Wubin Ding, Anna Bartlett, Cesar Barragan, Rosa Castanon, Ruoxuan Wang, Vince Rothenberg, Haili Song, Joseph R. Nery, Jordan Altshul, Mia Kenworthy, Hanqing Liu, Wei Tian, Jingtian Zhou, Qiurui Zeng, Huaming Chen & Joseph R. Ecker Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA Duke University School of Medicine, Bryan Research Building, Durham, NC, USA Integrative Cellular Biology & Bioinformatics Lab, Saarland University, Saarbrücken, Germany Center for Infectious Disease Diagnostics and Innovation, Division of Infectious Diseases, Duke University Medical Center, Durham, NC, USA Micah T. McClain, Thomas W. Burke, Elizabeth A. Petzold, Christopher W. Woods, Vance G. Fowler Jr. & Felicia Ruffin Department of Civil and Environmental Engineering, Pratt School of Engineering, Duke University, Durham, NC, USA Terasaki Institute for Biomedical Innovation, Los Angeles, CA, USA Duke Clinical Research Institute, Durham, NC, USA Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA Department of Neurology, Icahn School of Medicine at Mount Sinai, New York City, NY, USA Sindhu Vangeti, Irene Ramos, German Nudelman & Stuart C. Sealfon Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar also involved in standardizing the PBMC gating strategy and FACS. All authors read the manuscript and provided comments. An application for a patent based on the results of this work has been filed with the United States Patent and Trademark Office (USPTO) under application US 63/489,546. is a scientific advisor for Zymo Research Inc. and Ionis Pharmaceuticals. is named as an inventor on patents describing ATAC–seq methods. 10x Genomics has licensed intellectual property on which W.J.G. is a scientific cofounder of Protillion Biosciences. reports personal fees from Novartis, Debiopharm, Genentech, Achaogen, Affinium, Medicines, MedImmune, Bayer, Basilea, Affinergy, Janssen, Contrafect, Regeneron, Destiny, Amphliphi Biosciences, Integrated Biotherapeutics, C3J, Armata, Valanbio, Akagera and Aridis, Roche; grants from NIH, MedImmune, Allergan, Pfizer, Advanced Liquid Logics, Theravance, Novartis, Merck, Medical Biosurfaces, Locus, Affinergy, Contrafect, Karius, Genentech, Regeneron, Deep Blue, Basilea and Janssen; royalties from UpToDate; stock options from Valanbio and ArcBio; honoraria from Infectious Diseases of America for service as Associate Editor of Clinical Infectious Diseases; and a patent for sepsis diagnostics pending. The other authors declare no competing interests. Nature Genetics thanks Musa Mhlanga and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. a, An example gating process for one sample. c, We sorted different cell types in the same plate for each sample. a, The genome coverage distribution of all single cells sequenced in this study. b, The CpG sites coverage distribution of each merged sample. c, Heatmap shows the distribution of sequencing depth at all CpG sites of all merged samples, each row is a sample, x-axis shows the covered depth at CpG site, color shows the number of CpG sites at that depth. d, The CpG sites coverage distribution of all cells merged in each exposure condition. e, Heatmap shows the distribution of sequencing depth at all CpG sites of all merged exposure conditions, each row is a condition, x-axis shows the covered depth at CpG site, color shows the number of CpG sites at that depth. a, t-SNE after harmony integration of each sorted cell type, colored by exposures. b, The cells from the corresponding exposure are colored in red, while other cells and control cells are colored in dark gray and gray. a, The UMAP of cells from HIV exposure in each FACS cell type. a, UMAP without harmony integration, colored by exposures. b, UMAP after harmony integration by donors, colors show the cell types by FACS sorting. c, UMAP after harmony integration by donors, colors show the exposures. d, Histplot shows the distribution of p values when calling exposure conditions eDMRs and DMRs from label shuffled samples. e, Heatmap shows the ratio of single CpG eDMRs in each exposure and cell type. f–k, Plots show the effect size distributions of eDMRs from each exposure. a, Dot plot shows the enrichment of eDMRs from BA, influenza virus and OP in histone modification peaks. Each column shows the hypo-eDMRs in that condition. b, Genomic features enrichment of eDMRs in each cell type and exposure. c, Motif enrichment of hypo-eDMRs in each cell type and exposure. d, Dot plots show the enrichment of eDMRs from each cell type in COVID-19 and MRSA/MSSA with the transcription factor ChIP-seq peaks in the corresponding cell types. e,f, Dot plots show the HIV-1-associated eDMRs near DEGs, compared with random genes (e) or with random CpG sites (f). g,h, Dot plots show the COVID-19-associated eDMRs near DEGs, compared with random genes (g) or with random CpG sites (h). a, UMAP shows the joint embedding of snATAC-seq and snmC-seq2 data from one sample before harmony integration. b, UMAP shows the joint embedding after harmony integration. c, Global correlation of DNA methylation and chromatin accessibility using 5 kb bins across the genome. d, Heatmap shows the corresponding changes in both DNA methylation and chromatin accessibility in anthrax vaccine exposure. e, Heatmap shows the corresponding changes in both DNA methylation and chromatin accessibility in OP exposure. f, Heatmap shows the corresponding changes in both DNA methylation and chromatin accessibility in SARS-CoV-2 exposure. g, Heatmap shows the corresponding changes in both DNA methylation and chromatin accessibility in IAV exposure. a, Venn diagram shows the overlap of SNPs called from methylation reads and WGS in a 10 Mb region. b, kdeplot shows the correlation of alternative allele frequency of the SNPs from WGS and biscuit c, Venn diagram shows the overlap of SNPs called from methylation reads and ground truth SNPs for NA12878. e, Heatmap shows the ratio of single CpG gDMRs in each cell type in trans and cis. f, Scatter plot shows the correlation between gDMR (cis and trans) counts with the number of cells sequenced in each cell type. g, Genomic features enrichment of eDMRs and gDMRs. h, Motif enrichment of gDMRs and eDMRs using each other as background. a, Enrichment of colocalized GWAS SNPs from each phenotype with the meQTLs from each cell type. b, Heatmap shows the distribution of colocalized meQTLs with different phenotypes in each cell type. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Wang, W., Hariharan, M., Ding, W. et al. Genetics and environment distinctively shape the human immune cell epigenome. Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Provided by the Springer Nature SharedIt content-sharing initiative Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.
Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript. As pressure mounts globally on drug pricing and development cost continues to rise, clinicians and translational scientists in biotech, academia and biopharma companies are re-evaluating when, where and how to launch early clinical programs. These initial patient data become critical to de-risk development programs and allow developers to deploy their limited time and resources on the most promising drugs. We evaluate four fundamental shifts in drug development that appear to be unfolding and may well become critical to future global biopharma success: use of large-scale high quality cohort studies, sponsor-driven investigator-initiated trials, the integration of affordable artificial intelligence with extensive high quality data registries, and China's focus on precision medicine. This is a preview of subscription content, access via your institution Get Nature+, our best-value online-access subscription Subscribe to this journal Receive 12 print issues and online access Prices may be subject to local taxes which are calculated during checkout Advancing Innovation and Global Reach: The Next Chapter in China's Clinical Trial Development https://www.lek.com/sites/default/files/insights/pdf-attachments/adv-innovation-global-china-ct-dev-report.pdf (2025). A decade of innovation, a decade to come. Large pharma drug licensing from China reaches record high at 28% in 2024, reveals GlobalData. Beaney, A. OCT West Coast 2025: China surpasses US for annual number of clinical trials. Clinical Trials Arena https://www.clinicaltrialsarena.com/news/china-surpasses-us-for-annual-trials/?cf-view (12 February 2025). China proposes shorter clinical trial reviews in efforts to accelerate drug development. Fierce Biotech https://www.fiercebiotech.com/biotech/accelerate-drug-development-china-proposes-shorten-clinical-trial-review-time (16 June 2025). The real China model. Wang, Y. et al. Stroke Vasc. Preparation and application of fluorinated cilostazol derivatives. Chen, L. et al. Eur. Pun, F. W. et al. Front. Patient enrollment complete for clinical trial of 4B Technologies' AI-discovered ALS drug supported by Insilico Medicine. World Intellectual Property Organization. Patent Landscape Report: Generative Artificial Intelligence https://doi.org/10.34667/tind.49740 (WIPO, 2024). Yang, Y., Husain, L. & Huang, Y. Tenchov, R. et al. ACS Pharmacol. Xu, G. et al. Sig. Ltd. China exosome research products market is poised to reach valuation of over US$ 148.93 million by 2032. https://www.globenewswire.com/news-release/2024/11/12/2979489/0/en/China-Exosome-Research-Products-Market-is-Poised-to-Reach-Valuation-of-Over-US-148-93-Million-By-2032-Astute-Analytica.html (2024). Eaton, E. S. At FDA roundtable, biotech leaders call for ‘modernising' regulation to compete with China. FirstWord Pharma https://firstwordpharma.com/story/5970090 (6 June 2025). Gottlieb, S. How to stop the shift of drug discovery from the U.S. to China. S. China's edge over U.S. biotech was never scientific. Caidya, Raleigh, NC, USA Candid Therapeutics, San Diego, CA, USA Fudan University, Shanghai, China Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar developed the basic outline of the article and organized most of the secondary research. provided practical examples from their respective roles as an academic (B.L. and founder of a global clinical CRO (L.T. Correspondence to Lingshi Tan. is the founder and chairman of Caidya. is chairman, president and CEO of Candid Therapeutics. is the founder and chairman of 4B Technologies. Tan, L., Song, K. & Lu, B. China's innovation in translational medicine: rethinking early-stage clinical development. Version of record: 27 January 2026 Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Provided by the Springer Nature SharedIt content-sharing initiative Sign up for the Nature Briefing: Translational Research newsletter — top stories in biotechnology, drug discovery and pharma.
You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript. Tumor-infiltrating T cells have been the primary focus of cancer immunotherapy; however, accumulating evidence points to a critical role for B cells and plasma cells in shaping responses to immune checkpoint blockade. In this study, we investigated the humoral immune response in 38 patients with hepatocellular carcinoma treated with neoadjuvant anti-programmed cell death protein 1 (PD-1) therapy. In responders, defined by more than 50% tumor necrosis, we observed on-treatment enrichment of clonally expanded IgG1+ plasma cells within the tumor. Clonal tracking revealed that anti-PD-1 treatment expanded preexisting B cell clones associated with favorable clinical outcomes. Moreover, serum from responders contained IgG1 antibodies specific to cancer/testis antigens, including NY-ESO-1, and these humoral responses were linked to tumor-reactive T cell activity. We independently validated these findings across seven additional cohorts, encompassing single-cell and bulk sequencing data from 500 patients, spatial transcriptomics from seven patients and survival analyses from 1,582 patients. Our findings apply to recently approved treatments, such as PD-1 and vascular endothelial growth factor A (VEGF-A) blockade, but not to chemotherapy alone, suggesting broad relevance to individuals treated with immunotherapy. Collectively, our results demonstrate that PD-1 blockade induces tumor-specific IgG1+ plasma cell responses that complement cellular immunity and contribute to clinical benefit, underscoring a coordinated humoral−cellular axis in effective antitumor immunity. Immune checkpoint blockade (ICB) has become the backbone in the treatment of numerous types of cancer, although there remain major gaps in mechanistic understanding of what leads to clinical response1,2. ICB primarily targets inhibitory pathways, such as PD-1/programmed death ligand 1 (PD-L1) and cytotoxic T lymphocyte antigen 4 (CTLA-4), that regulate T cell function, thereby enhancing T cell activation, infiltration, cytotoxicity and antitumor immune responses3,4. A robust tumor immune microenvironment response requires the coordination of several immune cell types to activate cytotoxic tumor-infiltrating lymphocytes (TILs)5. Previously, we described intratumoral niches containing mature dendritic cells and CXCL13+ helper T cells, leading to a coordinated clonal expansion of granzyme K+ and PD-1+ effector-like CD8+ T cells in ICB responders with early-stage hepatocellular carcinoma (HCC)6. In several cancers, infiltrating plasma cells (PCs) and B cells carry strong prognostic significance and have emerged as potential predictors of response to immune checkpoint inhibitors7,8,9. Moreover, they can perform a variety of functions, including antigen presentation and antibody production, which enable them to support both T cell responses and innate mechanisms such as complement activation and opsonization of cancer antigens7. Tumor-associated B cells have been shown to play a crucial role in melanoma inflammation and have been associated with response to ICB therapy10. Intratumoral B cells and PCs identified by single-cell RNA sequencing (scRNA-seq) in non-small-cell lung cancer (NSCLC) showed predictive association with overall survival to PD-L1 blockade, independently of intratumoral CD8+ T cells and PD-L1 expression11. Additionally, these cells can also be present in tertiary lymphoid structures (TLSs), which may contribute to their differentiation into immunoglobulin-secreting plasma cells11,12. These TLSs, particularly those with mature organization with T cells and PCs surrounding germinal center class-switching B cells, have been associated with clinical benefit in multiple studies13. Although a predictive association between PCs and overall survival in patients treated with anti-PD-1 or anti-PD-L1 (PD-(L)1) therapy blockade has been established, the underlying intricate mechanisms (clonal composition and dynamics, isotype and subclass usage and specificity) driving this association are poorly understood, and no study to date has demonstrated the ability of PD-(L)1 blockade to induce or potentiate humoral antitumor immunity7,14. Furthermore, B cells represent a highly heterogenous and diverse population, and the specific subset of B cells or PCs that might be most important for effective antitumor immunity has not been clearly elucidated. In the present study, we investigated B cells and PCs during ICB responses with neoadjuvant anti-PD-1 therapy, with or without radiation, in HCC. Responders showed tumor-enriched IgG1 class switching, PC differentiation and clonal expansion after ICB, whereas non-responders accumulated dysfunctional memory B cells. The findings were validated in independent cohorts treated with PD-1, PD-L1, CTLA-4 or VEGF-A blockade. Baseline and tumor-enriched IgG1 PCs correlated with clinical response, and elevated circulating IgGs against cancer/testis antigens (CTAs) suggested humoral antitumor immunity in responders after ICB15,16,17. To test the hypothesis that B cells or PCs have a key role during immunotherapy and treatment response, we examined specimens from eight sets of independent clinical trial cohorts: two newly generated datasets from our own investigator-initiated trials in patients with HCC treated with neoadjuvant PD-1 blockade, with or without radiation, for discovery and validation, respectively. We also analyzed published datasets of a combination of PD-1, PD-L1, CTLA-4 and VEGF-A blockade, including large cohorts such as IMbrave150 and The Cancer Genome Atlas (TCGA) (Fig. No sex-associated differences were observed in any of the analyzed outcomes, consistent with the balanced representation of male and female patients in the cohort. a, Overview of cohorts analyzed in this study. b, Uniform manifold approximation and projection (UMAP) plots showing the integrated analysis of 1.2 million cells from 38 patients and the reclustering of 50,000 B cells and PCs. Annotation was based on canonical B cell and PC markers together with differentially expressed genes. c, Top, tumor enrichment scores for 27 patients, comparing tumor versus adjacent normal tissue using Wilcoxon rank tests with FDR correction. Box plots show medians, interquartile ranges (IQRs) (Q1–Q3), whiskers (≤1.5× IQR) and outliers. Bottom, dot plot of canonical and top differentially expressed genes per cluster, identified by Wilcoxon tests. The adjacent panel shows enrichment of each cluster in normal (light blue) versus tumor (orange); circle size represents statistical significance. d,e, Box plots (as defined above) showing tissue-specific enrichment of clusters between responders (R; dark blue, n = 8) and non-responders (NR; dark red, n = 19). Proportions were estimated using Dirichlet regression; log-transformed fold change (log2FC) significance was assessed using log-likelihood tests and Benjamini−Hochberg-adjusted P values. Color indicates log2FC; circle size reflects adjusted P value. The y axis indicates percent of cells. f,g, Principal component analysis of bulk RNA-seq showing variance explained by each principal component and sample separation in PC1 and PC2, colored by response category and timepoint (pre/post). i, Volcano plot showing differential expression between responders and non-responders on-treatment (x axis: log2FC; y axis: –log10P value). j, GSVA scoring of four single-cell-derived signatures projected into bulk showing increased PC signature after treatment in responders (P < 0.05, two-sided Wilcoxon). First, we profiled approximately 30,000 B cells and PCs derived from 1.2 million single-cell transcriptomes of the tumor microenvironment, uninvolved adjacent liver and the draining lymph nodes resected from 38 patients with early-stage HCC of the discovery cohort18. Of these, 27 patients received neoadjuvant ICB in the form of an anti-PD-1 blocking antibody, and 11 were untreated controls (Extended Data Fig. We resolved six distinct B cell states (naive, memory, activated, class-switched, class-unswitched and atypical) alongside plasmablasts and three PC populations with discrete transcriptional programs (Fig. Naive B cells were characterized by the expression of IGHD and CD19 and were decreased in tumor compared to adjacent tissue (false discovery rate (FDR) < 0.05); memory B cells showed MYC, CD69 and NR4A1 associated with an activation phenotype and with tumor enrichment, whereas PCs were characterized by MZB1 and JCHAIN (Fig. Naive B cells dominated the B cell fraction of the immune cells across the tissue compartments, followed by memory B cells, relative to total B cells and PCs (Extended Data Fig. As expected, canonical and cluster-specific genes were expressed in more than 70% of cells (Extended Data Fig. Pathological responses to neoadjuvant PD-1 were defined as more than 50% necrosis of tumor at the time of surgery, followed by differential cell abundance showing an enrichment of all PC phenotypes in the tumor among ICB responders (FDR < 0.05), which was not as pronounced in adjacent uninvolved liver (Fig. We hypothesized that skewing of IgG1-producing PCs is linked to clinical response, given their overrepresentation in single-cell data from responders (Extended Data Fig. To test this, we used complementary bulk RNA-seq of paired pretreatment biopsies and posttreatment resected tumors from the discovery cohort. Principal component analysis showed that gene expression profiles were markedly different between responders and non-responders (Fig. In posttreatment tumors, IGHG1 emerged as one of the top upregulated genes. Its expression was already elevated in pretreatment samples from responders and increased significantly after therapy, whereas it remained unchanged or decreased in non-responders (FDR < 0.01; Fig. Projection of the single-cell signatures onto bulk data emphasized the increase of PCs in responders (FDR < 0.01) (Fig. Thus, these results suggest that skewing toward an IgG1 signature may exist at baseline (pretreatment) in anti-PD-1 responders, which was significantly amplified after ICB treatment and was highly associated with response. Using a combination of BCR-seq with scRNA-seq, we investigated clonal expansion and immunoglobulin isotype (Fig. We found that plasmablasts and PCs were largely of the IgG1 and IgG2 subclass (FDR < 0.01), whereas other memory and naive B cells were an admixture of IgM, IgD, IgA and, to a much lesser degree, the other IgG subclasses (Fig. When stratifying by ICB response, IgG1 and IgG2 PCs were almost exclusive to responders, whereas IgM, IgA and IgD dominated in non-responders (Fig. Irrespective of ICB response, 26% of total B cells and PCs were expanded (more than one cell per clone) (Extended Data Fig. Responders exhibited marked intratumoral IgG1+ PC and plasmablast expansion, unlike non-responders, who showed memory B cell expansion (Fig. IGHG1, MZB1, JCHAIN and XBP1 were among the top genes enriched in clonally expanded cells from responders (Fig. In responders, clonally expanded cells were enriched for IgG1+ PCs, whereas non-expanded cells expressed MS4A1 (encoding CD20) (Fig. Together, these data suggest a tumor-specific clonal expansion of IgG1 antibody-producing cells in ICB responders. Bottom panels show correlations between isotype assignments from gene expression and scBCR-seq for the 30,000 cells with paired data. Multiple samples per patient were sequenced to ensure reproducibility. Single-cell BCR-seq data include six patients with HCC treated with ICB (two responders and four non-responders). Using scRNA-seq-based isotype inference, the analysis was expanded to 8 responders and 14 non-responders. The log-transformed fold changes and P values were obtained from log-likelihood tests; adjusted P values were computed using Benjamini−Hochberg correction. PCs and IgG1/IgG2 responses are significantly enriched in responders. d, Box plot (as defined in Fig. 1c) showing increased IgG1+ PC representation in tumor tissue of responders with available BCR-seq. Ratios were estimated using Dirichlet regression with log-likelihood testing. e, Stacked bar plot of expanded clones per cluster, defining expansion as two or more cells per clonotype. Expansion occurs specifically in PCs of responders. f, Differential expression (two-sided Wilcoxon test, Benjamini−Hochberg adjusted) comparing expanded clones in responders versus non-responders. IGHG1, MZB1, JCHAIN and XBP1 are associated with clinical response. g, Differential expression comparing expanded versus non-expanded cells in responders shows expansion-associated upregulation of IGHG1, MZB1, JCHAIN and XBP1, whereas non-expanded cells upregulate LTB, MS4A1, CD52 and IRF8 (Benjamini−Hochberg adjusted). h, Box plots showing clonal sizes after filtering for shared CDR3 sequences within clonotypes and clusters across 27 patients. Clonal sizes increase for IgG1, IgG2, IgA and IgM in both B and plasma compartments (two-sided Wilcoxon test). i, Clonal sizes of expanded (≥2 cell) clones from bulk BCR-seq in two responders and non-responders show increased IgG1 expansion in responders. Next, we investigated whether B cell differentiation and clonal expansion occur at the primary tumor site or in draining lymph nodes. In lymph nodes, IgM was the dominant isotype (Extended Data Fig. Although unique CDR3 sequence overlap among lymph node, tumor and adjacent liver samples was limited within individual patients (Extended Data Fig. 2i), clonal tracking revealed significantly larger IgG1+ and IgG2+ tumor PC clones in responders compared to non-responders (FDR < 0.01; Fig. The shared CDR3 clonotypes across cellular compartments (lymph node, naive, B memory and plasmablast/PC) revealed that clonally expanded cells with the same CDR3 sequence could be found at the lymph node as well as the tumor site (Fig. 2i and Supplementary Table 1), thus suggesting trafficking of these expanded clones between the tumor and the draining lymph node. Further inspection of the bulk sequencing data validated the expansion of IGHG1 phenotype, including evidence of the same CDR3 barcodes before and after treatment in both responders and non-responders (Extended Data Fig. Using multiplex immunohistochemistry (mIHC) and computational tools19, we examined the spatial distribution of the B cells (CD20+), PCs (MZB1+) and other immune cells in the tumor and adjacent liver in 17 mIHC biopsies (6 responders and 11 non-responders) from our discovery cohort. Next, we used a radial binning approach to define cell communities or immune aggregates in an unsupervised fashion (Fig. Here, responders showed enrichment of CD3+CD8+ T cells, CD68+ macrophages and MZB1+ PCs. Conversely, CD20+ B cells were found within the lymphoid aggregates or within the stromal compartment of the tumor rather than admixed with tumor parenchyma, reinforcing PC expansion as a hallmark of effective ICB response (Fig. a, Representative examples showing increased PC infiltration in responders compared to non-responders, quantified as the percentage of PCs within unsupervised neighborhood regions derived from mIHC images. b, Box plots (as defined in Fig. 1c) comparing PC infiltration scores between responders (n = 6) and non-responders (n = 10), shown both per patient and as averaged scores across regions (P < 0.05, two-sided Wilcoxon rank test). The left plot displays individual regions per patient; the right plot summarizes patient-level averages across 16 total patients. c, Unsupervised identification of immune cell aggregates from mIHC using a spatial enrichment analysis. A radial gradient approach quantifies local immune communities by evaluating up to three markers within a 10-µm distance (approximately one cell diameter) from a reference cell. d, Spatial enrichment of immune populations within aggregates in responders versus non-responders. f, Spot clustering results identifying 13 spatial clusters across approximately 17,000 spots from seven patients (four responders and three non-responders). g, Enrichment of responder-associated clusters and top markers per most abundant cluster for each patient, demonstrating sample-level concordance between spatial clusters and biological phenotypes. h, Top pathways associated with each spatial cluster, highlighting cellular, molecular and functional programs tied to distinct microenvironmental niches. 1c) showing cluster-level enrichment patterns stratified by clinical response. Responders exhibit higher representation of plasma, T cell and myeloid-associated clusters, whereas non-responders are enriched for regulatory and dysfunctional phenotypes. Statistical significance was assessed with a two-sided Wilcoxon rank test across seven patients (four responders and three non-responders). j, Canonical cell markers enriched per cluster, confirming the identity of dominant immune and stromal populations defining each spatial niche. Next, we applied an single-cell variational inference-based autoencoder to integrate the spatial transcriptome cohort3 and then used CellCharter to identify spatial clusters constituted by multiple cell types (Fig. In ICB responders, we saw a marked enrichment of IgG1+ PCs confined to an immune-rich T cell/B cell/vascular fibroblast hybrid niche, with significant upregulation of IL-4 and IL-13 signaling (Fig. Pathway analysis of these clusters revealed a pro-BCR/TCR signaling hub, marked by MZB1, IL2RG, FCER2, PRDM1 and CD79A, consistent with a highly immunogenic, PC-driven microenvironment (Fig. By contrast, non-responders harbored focal accumulations of CD27+ memory and dysfunctional B cells within fibroinflammatory stromal regions. These niches were accompanied by regulatory T, T helper 17 (TH17) and monocyte signatures, collectively defining an immunosuppressive ‘stromal memory/exhausted B cell reservoir' (Fig. To validate these findings in an independent cohort18, we analyzed B cells and PCs from patients with HCC treated with neoadjuvant stereotactic radiation followed by anti-PD-1 therapy (biopsy, lymph node, tumor and adjacent normal) (Fig. In bulk sequencing, differential gene expression between responders and non-responders was most pronounced in pretreatment biopsies and posttreatment tumors but minimal in lymph nodes and adjacent normal tissues (Extended Data Fig. After treatment, total immunoglobulin increased, with responders showing selective IgG1 enrichment (Extended Data Fig. The bulk data also showed an increase in IGHG1, IGHG2 and IGHG3 expression in tumor tissue of responders (Extended Data Fig. As expected, we observed higher plasmablast and PC levels in responder biopsies (FDR < 0.2) (Extended Data Fig. Clonal size analysis showed that the largest clones were identified in lymph nodes, followed by pretreatment and posttreatment tumor tissue (Extended Data Fig. Shared clonotypes were observed between all three posttreatment tissue types and pretreatment biopsies (Extended Data Fig. Consistent with our findings, validation cohort scRNA-seq revealed significantly enriched plasmablasts in tumor tissue (FDR < 0.05) (Extended Data Fig. Activated B memory, class-switched and unswitched cells were also enriched in responders' adjacent normal tissues (FDR < 0.05) (Extended Data Fig. However, differential expression analysis recapitulated IGHG1 as the top marker of clinical response in both tumor and adjacent liver (Fig. Contrary to our discovery cohort, this independent validation cohort showed similar B cell and PC numbers between adjacent normal and tumor tissue (Extended Data Fig. Notably, most of the sequenced cells were already clonally expanded (Extended Data Fig. However, once again, the IgG1 isotype dominated across all PC and non-naive B cell phenotypes in tumor tissues of responders, whereas non-responder B cells and PCs were more prevalent for IgM and IgA (Fig. Regardless of tissue, differential abundance analysis showed that IGHG1 isotype, activated memory B cells, PCs and plasmablasts were associated with response, whereas IgM and IgG2 in memory B cells were enriched in non-responders (Fig. Although clonal expansion sizes were similar between B cells and PCs, tracking shared CDR3 clones revealed greater memory-to-plasma compartment transitions in responders, with exceptionally large IGHG1 clones (Fig. Together, these results show that IgG1 PCs account for more than 50% of total isotypes identified in responders and for less than 30% in non-responders (Fig. V2 cohort (radiation + anti-PD-1): a,b, Differential expression analysis of adjacent normal and tumor tissues revealed IGHG1 as the most significant gene associated with response, using a two-sided moderated t-test. c, Tumor heavy-chain isotype composition by cell cluster showed IGHG1 enrichment in responders. d, Differential abundance modeling using Dirichlet regression and log-likelihood testing identified IGHG1+ PC and B cell phenotypes as significantly enriched in responders. 1c) illustrate the magnitude and heterogeneity of cluster enrichment in responders versus non-responders across 10 patients (four responders and six non-responders), with significant findings at FDR < 0.05 (Benjamini−Hochberg correction). f, Proportions of clonally expanded cells (BCR-seq derived) showed that most cells were expanded in both groups, with no significant difference in overall expansion prevalence. g, Dot plots of shared heavy-chain clonal sizes per patient demonstrated larger IGHG1 clonal expansions in responders. h, Responders showed dominant and expanded IgG1 PC isotypes in both tumor and adjacent normal tissue. V3 cohort (anti-PD-1 and CTLA-4 plus PD-1): i, Heavy-chain isotype analysis confirmed that PC isotypes were predominantly IgG1. j, Box plots comparing pretreatment and posttreatment samples showed that responders increase PC abundance over time, whereas non-responders decrease; memory B cell frequencies remained unchanged (Wilcoxon test). k, Both PD-1 alone and CTLA-4+ PD-1 therapies induced a responder-specific plasma IgG1 signature. V4 cohort (IMbrave150): l, Bulk RNA-seq showed nominal posttreatment increases in CD20 (MS4A1) and MZB1 expression in responders although not statistically significant due to expression heterogeneity (moderated t-test). V5 cohort (anti-PD-1 and anti-VEGF-A): m, Responders receiving VEGF-A blockade in combination with PD-1 therapy also exhibited increased plasma IgG1 abundance relative to non-responders. V6 cohort (multiple treatments in HCC and ICCA): n, Box plots show that HCC samples contain higher PC abundance than ICCA samples and display slower progression times (Wilcoxon test). o, Across samples, disease progression positively correlated with tumor diversity scores and inversely correlated with PC abundance.NA, not available. To further extend our observations, we analyzed public datasets and found that patients with melanoma who responded to PD-1 with or without CTLA-4 blockade3 also showed expansion of PCs after treatment and a decrease in non-responders, despite a larger non-responder PC compartment before treatment, potentially due to low number of cells in this study (Fig. Next, we validated our findings in advanced HCC treated with anti-PD-1 and anti-VEGF-A therapies in two independent cohorts of unresectable HCC: IMbrave150 (ref. In both datasets, responders exhibited a skewing toward the IgG1 isotype, with IgG1 emerging as the dominant subclass. Another independent cohort of patients with HCC and intrahepatic cholangiocarcinoma (ICCA)22 showed that HCC tumors had a low tumor diversity score, linked to favorable outcomes, whereas ICCA tumors exhibited a high tumor diversity score, linked to a more aggressive phenotype and worse outcomes (6-month survival for ICCA versus 26-month survival for HCC). Notably, HCC samples had greater PC abundance, consistent with better progression-free survival in HCC than in ICCA (Fig. Together, these findings suggest that IgG1 PCs are strongly linked to ICB response. Given the enrichment of IgG1 PCs in ICB responders, we explored whether these patients potentially generated antitumor antibodies detectable in the patients' serum. Thus, we investigated serum samples collected before and during treatment from our discovery cohort, testing against a panel of 20 tumor-associated antigens. A higher proportion of responders (63%) exhibited antitumor IgG antibodies in their serum compared to non-responders (17%), with the IgG antibodies primarily belonging to the IgG1 subclass and targeting CTAs such as MAGE-A, GAGE7, PRAME and NY-ESO-1 (Fig. These data suggest that responders with circulating IgG antibodies against CTAs generally had higher titers than non-responders. Conversely, IgG titers in non-responders were less dynamic, showing minimal changes after anti-PD-1 treatment. IgG1 antibodies targeting CTAs can enhance antigen presentation by antigen-presenting cells and potentially prime CD8+ T cells through immune complexes and cross-presentation23,24. a,b, Pie chart and bar plots showing that antibodies against cancer-associated antigens are predominantly enriched in responders compared to non-responders, respectively. c,d, ELISpot analysis of IFNγ-secreting cells in response to varying effector-to-target (E:T) ratios. The left panel shows wells with decreasing numbers of spots from top to bottom, corresponding to different ratios (1:1 and 5:1) for two conditions, indicating the frequency of cytokine-producing cells. The right panels represent different experimental conditions or treatments, with each row representing different replicates or conditions. Darker and more numerous spots indicate higher frequencies of cells secreting IFNγ. e, Similarly, a broader characterization using seromics indicates increased abundance of autoantibodies against CTAs in responders compared to non-responders. f, Number of antigens enriched between responders and non-responders. g, Analysis of 16 patients (8 responders and 8 non-responders) showed that an enrichment of antibodies against CTA, tumor-associated antigen (Tu/AutoAg) and other antigens was also higher in responders. Specifically, comparisons of CTA-specific IgG and IgA levels between responders and non-responders showed statistically significant differences (P = 0.04 and P = 0.01, respectively), as determined by the Wilcoxon rank-sum test and visualized by box plots (same definition as in Fig. h, Box plots (same definition as in Fig. 1c) showing the CTA gene expression signature divided by timepoint (pre or post) and clinical response (responders and non-responders); paired t-test and Wilcoxon rank test were used to estimate the significance between both groups; only paired t-test results are shown in the figure. i,j, The increase in autoantibodies against CTAs in responders, specifically in the IgG1 and IgA isotypes, was assessed using a heatmap to visualize relative abundance patterns across samples, violin plots to display the distribution and variability of antibody levels and quantile−quantile (Q−Q) plots to evaluate deviations from normality and highlight differences in distribution between responders and non-responders. KS, Kolmogorov–Smirnov test; NS, not significant; P/I, phorbol myristate acetate (PMA) and ionomycin positive control. Next, to investigate T cell responses, we performed an ELISpot assay with CD8+ T cells isolated from pretreatment and on-treatment peripheral blood mononuclear cells (PBMCs), resected tumors and draining lymph nodes with detectable NY-ESO-1 antibodies. The responder's CD8+ T cells showed significant IFNγ, whereas the CD8+ T cells from the non-responder, who had predominantly IgA, showed only minimal reactivity (Fig. Notably, the responder had circulating NY-ESO-1 antibodies before ICB treatment, but IFNγ production by CD8+ T cells was observed only in on-treatment samples, suggesting that antitumor B cell response may precede T cell response, as in previous reports23,24. To assess if serum autoantibodies target cancer antigens more than other autoimmune targets, we performed seromic profiling (IgG and IgA against approximately 20,000 antigens, including 186 CTAs; Supplementary Table 1) on 32 paired pretreatment and posttreatment samples from a discovery HCC neoadjuvant PD-1 cohort (8 responders and 8 non-responders). We observed that IgG autoantibodies and, to a lesser extent, IgA were enriched for CTAs in responders compared to non-responders (Fig. Surprisingly, reactivity to CTA was enriched for response, as antibodies detected to another approximately 400 other known non-CTA tumor antigens (including p53) had similar prevalence in responders and non-responders for IgG and IgA (Fig. Notably, almost all antigen-specific antibodies were unique to individual patients and were found before treatment and after treatment, although some increases in reactivity were noted after treatment (Extended Data Fig. In parallel, we did not observe correlation between gene expression of CTAs and autoantibodies, suggesting that immunogenicity is more important than expression alone (Fig. Finally, the increase in autoantibodies against CTAs in responders (P < 0.05) in IgG was identified as specific for CTAs compared to IgA, autoantigens and other antigens (Fig. Together, these results support the notion that antibody production and reactivity against CTAs are indicators of clinical response to ICB, in parallel to an increase of IgG1 PCs. To explore the relevance of IgG1 PC expression in patient survival, we used independent immunotherapy clinical trials (approximately 1,500 patients)25,26,27. Notably, high IGHG1 expression was associated with improved overall survival in multiple datasets, including skin cutaneous melanoma (SKCM) (TCGA)25 and POPLAR26 and OAK26 trials (patients with NSCLC treated with anti-PD-L1) (Fig. Notably, non-immunotherapy trials showed no effect of IGHG1 on survival (lung squamous cell carcinoma (LUSC) and liver hepatocellular carcinoma (LIHC)). These results indicate that chemotherapy-treated cancers have no clear link between clinical response and IGHG1. a, Kaplan−Meier survival analysis from TCGA and clinical trial cohorts (POPLAR and OAK) stratified by high versus low IgG1 expression levels across multiple cancer types (SKCM, LUSC and LIHC). High IgG1 expression is associated with improved survival in several contexts (log-rank P values shown). b, Cell−cell interaction contribution scores for each cell type, highlighting plasma IgG1 cells as major contributors in responders. c,d, Top cell−cell interactions enriched in responders (blue) and non-responders (red), based on ligand−receptor analysis. Arrows indicate the directionality and magnitude of cell type interactions, with plasma IgG1 cells and myeloid compartments prominently engaged in responders. f, Line plots showing the pseudobulk normalized median expression of RRBP1, CXCR4, ERN1 and IGHG1 along the trajectory path traced using the Monocle 3 algorithm and Moran's I statistical tests (FDR < 0.05 and Moran's I > 0.5 were estimated in e using two-sided Wilcoxon rank test and Monocle 3 pseudotime package). g, RNA trajectories build using either responders (blue) or non-responders (red) showing that responders differentiate mostly in activated and switched B cells leading to plasmablasts and PCs, whereas non-responders have higher pseudotime scores in atypical and memory B cells. Finally, we evaluated the immune cell contributions in cell−cell communication networks. IgG1 PCs, alongside specific macrophage and T cell subsets, were key drivers of interaction strength in responders (Fig. These findings suggest IgG1-skewed PCs foster a favorable immunogenic microenvironment through enhanced immunostimulatory signaling, potentially improving immunotherapy outcomes. We also identified key pathways potentially driving plasma IgG1 differentiation, including IL-6, TNF, MK, CD70, BTLA, MIF, BAG and CypA, which were enriched as both incoming and outgoing signals (FDR < 0.05) (Extended Data Fig. By integrating these signaling results with differential gene expression and pseudotime analysis, we identified genes strongly associated with clinical outcomes. Responders showed higher expression of genes related to PC differentiation, such as ERN1 and RRBP1, whereas non-responders showed increased CXCR4 (Fig. These patterns suggest that ICB induces B cell activation and class switching, leading to plasmablast expansion and differentiation into IgG1-secreting PCs (Fig. In summary, we identified critical gene programs that support sustained B cell activation and plasma IgG1 differentiation, both of which are associated with favorable clinical response. Although ICB is a cornerstone of cancer therapy, its effect on humoral immunity is not well understood. The presence of intratumoral B cells is strongly associated with positive ICB responses across various cancers14, but the specificity of these cells to tumor antigens and the mechanisms driving this benefit remain unclear28,29. We found that clinical responders were enriched in IgG1 PCs, which have high antitumor potential, similar to findings in colorectal cancer30. We observed a dynamic, tumor-specific differentiation of these IgG1 PCs, which were present at baseline and expanded upon ICB treatment. Conversely, non-responders had an abundance of naive and memory B cells, phenotypes that can contribute to cancer progression31,32,33. These data suggest that different B cell phenotypes are recruited to the tumor or lymph nodes12,34, and improper differentiation can worsen outcomes35,36,37,38. This is supported by observations that antigen-specific B cells differentiating into PCs are associated with improved ICB outcomes39,40,41. The presence and expansion of circulating IgG1 antibodies against CTAs in responders further suggests a dynamic, antigen-driven adaptive immune response. We hypothesize that these antibodies enhance T cell induction through cross-priming23,24, which aligns with previous findings where CTA seropositivity correlates with ICB benefit15,16,17,42,43 despite being linked to worse prognosis44,45,46. Although both responders and non-responders have preexisting tumor-specific clonotypes, only responders show significant clonal expansion with immunotherapy. Correspondingly, PCs in responders heavily infiltrate the tumor parenchyma, mimicking CD8+ T cells, whereas infiltration in non-responders is sparse. These findings suggest that PCs in responders have active effector functions and that CTA-targeted therapies, such as vaccines combined with ICB, could be a promising strategy to stimulate these beneficial IgG1 responses47. Combined, these findings suggest that PCs of responders have active effector function compared to other subsets of B cells. The B cells and PCs present in the tumor or adjacent tissues in non-responders had a diverse immunoglobulin repertoire, including IgA, IgG1−IgG4 and IgM without specific isotype enrichment, whereas, on the contrary, responders had an enrichment of IgG1 PCs and plasmablasts. Interestingly, studies have associated IgA plasmablasts derived from reprogramming by cancer-associated fibroblasts to be linked with poor clinical outcomes48. However, we noted that, in responders, not only was there a skewing toward IgG1 at baseline, but there was also an amplification of preexisting matching CDR3 IgG1 cells after immunotherapy compared to their abundance in biopsies taken prior to treatment. The expansion of plasma IgG1 effector cells is potentially associated with immunotherapy induced type I interferon responses49. Together, these findings suggest that an IgG1 PC signature may serve as an important pretreatment biomarker. Moreover, the observed increase in IgG1 abundance after ICB indicates that this dynamic change is strongly associated with a favorable response to therapy. Incorporating these insights into current treatment strategies could involve promoting B cell differentiation toward IgG1-producing PCs by targeting pathways identified in this study, such as IL6, TNF, RRBP1, ERN1 and CD70 signaling. Therapeutic agents such as IL-6 agonists or TNF pathway activators warrant exploration as potential means to enhance IgG1 skewing, thereby potentially improving the efficacy of ICB. A potential mechanism that favors the survival and expansion of plasma IgG1 involves the CyPA and Midkine signaling pathways, which help inhibit IL-6 degradation and promote survival through CD74, respectively50,51. In combination with proinflammatory conditions due to stimulation from myeloid cells through MIF, GAS, CXCL and Complement, IgG1-secreting PCs may be sustained during antitumoral response52,53. We observed significant clonal expansion after combination of radiation and ICB, suggesting that Ig-secreting PCs survived the radiation treatment, which was also reported to be associated with activation of ERN1 and XBP1 genes56. By contrast, cancers where chemotherapy is frontline are not associated with the IgG1 PC phenotype (TCGA-LUSC and TCGA-LIHC). The association between IgG1 skewing and improved survival across different tumor types underscores the broader implications of our results in guiding immunotherapy strategies34,57,58,59. The dominance of IgG1 subclass antibodies and their correlation with T cell activation highlights the interplay between humoral and cellular immune responses in mediating antitumor immunity. Overall, our study provides insights into the key role of B cell and PC responses in antitumor immunity and in response to ICB therapy, offering potential biomarkers for treatment stratification, and supports the hypothesis that tumor-specific humoral immunity is involved in ICB response. Further exploration of these mechanisms may facilitate the development of more effective immunotherapeutic strategies that further harness the role of the humoral immune system in antitumor immunity. A shortcoming of our study was small pretreatment biopsy sizes, restricting comprehensive immune tumor microenvironment analysis and allowing only bulk sequencing before ICB. Low cell and patient numbers from single-cell sequencing, limited clonal tracking to unique CDR3 regions and unverified antigen specificity of clonally expanded PCs were additional constraints. However, we aimed to overcome these limitations by integrating multiple approaches and cohorts that consistently linked increased IgG1 PC phenotype to posttreatment clinical response. Early-stage HCC lesions and matched non-involved liver specimens were surgically resected after two doses of cemiplimab (ClinicalTrials.gov registration: NCT03916627; cohort B1) or 2−4 doses of nivolumab. Patients across all HCC etiologies responded to ICB, defined as ≥50% tumor necrosis by pathological examination18. Early-stage HCC lesions and matched non-involved liver specimens were treated with stereotactic body radiotherapy ((SBRT) 8 Gy × three fractions) followed by two doses of cemiplimab prior to surgery. These patients were subsequently surgically resected after two doses of cemiplimab. Patients across all HCC etiologies responded to ICB, defined as ≥50% tumor necrosis by pathological examination18 (NCT03916627; cohort B2). Biopsies and tumor tissues from D1 and V2 cohorts were obtained from these patients undergoing surgical resection at Mount Sinai Hospital, after obtaining informed consent in accordance with a protocol reviewed and approved by the institutional review board (IRB) at the Icahn School of Medicine at Mount Sinai (IRB 18-00407). Patients with metastatic melanoma provided written informed consent for the collection of tissue and blood samples for research and genomic profiling, as approved by the Dana-Farber/Harvard Cancer Center IRB (DF/HCC protocol 11-181) and The University of Texas MD Anderson Cancer Center (IRB LAB00-063 and 2012-0846). Tumor samples (n = 48) were obtained from 32 patients at baseline and/or after checkpoint therapy. Checkpoint blockade therapy used antibodies targeting CTLA-4, PD-1 or PDL-1 (database of Genotypes and Phenotypes (dbGaP) study accession numbers phs001680.v1.p1 and PRJNA489548). 20): a phase 3, open-label, randomized study of atezolizumab in combination with bevacizumab compared to sorafenib in patients and untreated locally advanced or metastatic HCC. This study evaluated the efficacy and safety of atezolizumab in combination with bevacizumab compared to sorafenib in participants with locally advanced or metastatic HCC who have received no prior systemic treatment. Single-cell transcriptomics was used to characterize the intratumoral and peripheral immune context of patients with advanced HCC treated with atezolizumab + bevacizumab. Both blood and tumor tissue were evaluated (EGAS00001007547). This cohort consists of individuals aged 18 years or older diagnosed with gastrointestinal cancers, including throat, stomach, gallbladder, liver, pancreatic or colon cancer, who are scheduled for treatment at the National Institutes of Health (NIH) Clinical Center. Participants will undergo a screening process involving a physical examination and medical history, provide a baseline blood sample and contribute additional blood samples at 2 months and 4 months after baseline as well as at the completion of their treatment, across 1−4 NIH visits. They will also provide tumor tissue samples if they undergo cancer-related surgery, with no treatment provided as part of this study, which focuses on analyzing their immune system's response to the cancer through these samples. This cohort consists of data from the tumor microenvironment in HCC resection specimens from a prospective clinical trial of neoadjuvant cabozantinib, a multi-tyrosine kinase inhibitor that primarily blocks VEGFR2, and nivolumab, a PD-1 inhibitor in which five out of 15 patients were found to have a pathologic response at the time of resection. However, only four responders and three non-responders had data available. ELISA was used to detect and quantify circulating IgG antibodies to known tumor antigens, as previously described. In brief, plasma samples were analyzed by low-volume semiautomated ELISA for seroreactivity to a panel of recombinant protein antigens (NY-ESO-1, p53, SOX2, HORMAD1, ERG, DHFR, PRAME, WT1, MELAN-A, SURVIVIN, UBTD2, CT47, MAGE-A1, MAGE-A4, SSX4, CT10, SSX2, XAGE, GAGE7 and MAGE-A10). Low-volume 96-well plates were coated overnight at 4 °C with 0.5–1 μg ml−1 antigen and blocked for 2 h at room temperature with PBS containing 5% non-fat milk and 0.1% Tween 20. Plasma was titrated from 1:100 to 1:6,400 in fourfold dilutions and added to blocked and washed 96-well plates. For assay validation and titer calculation, each plate contained positive and negative controls (pool of healthy donor sera). Plasma antigen-specific IgG was detected after incubation with alkaline-phosphatase-conjugated goat anti-human IgG (SouthernBiotech, 2040-4, diluted 1:4,500), revelation using AttoPhos substrate and buffer and measurement using a fluorescence reader (BioTek Synergy). By linear regression, a reciprocal titer was calculated for each sample and for each antigen as the predicted or interpolated dilution value at which the titration curve meets a cutoff value7. A positive significant result was defined as reciprocal titers more than 100. After bead-guided selection, CD8+ T cells were independently cultured with peptide-pulsed, irradiated T-cell-depleted PBMCs (serving as antigen-presenting cells) in RPMI + 10% serum type AB (to avoid potential reactivity) supplemented with IL-2 (10 U ml−1) and IL-7 (20 ng ml−1) twice a week. Cells were assessed for specificity at day 10 of culture for CD8, using autologous antigen-presenting cells pulsed with NY-ESO-1 peptides or controls (influenza nucleoprotein peptide pool or dimethyl sulfoxide (DMSO)). The IFNγ ELISpot assay was performed on CD8+ T cells. In brief, 96-well nitrocellulose ELISpot plates (Millipore, MAHA S4510) were coated overnight at 4 °C with 2 μg ml−1 anti-human IFNγ monoclonal antibody (1-D1K) and blocked with 10% human AB serum containing RPMI 1640 for 2 h at 37 °C. Then, 2 × 104 sensitized CD8+ T cells and 2 × 104 peptide-pulsed T-APCs were placed in each well of the ELISpot plate at a final volume of 100 μl of RPMI 1640 medium without serum. After incubation for 22 h at 37 °C in a CO2 incubator, the plate was developed using 0.2 μg ml−1 biotinylated anti-human IFNγ monoclonal antibody (Mabtech, 7-B6-1), 1 μg ml−1 streptavidin-alkaline phosphatase conjugate (Roche Diagnostics) and 5-bromo-4-chloro-3-indolyl phosphate/NBT (Sigma-Aldrich). The number of spots was evaluated using a CTL ImmunoSpot analyzer and software (Cellular Technology Limited). Results are shown as the number of spot-forming cells without subtracting the number of background spots, because the number of spot-forming cells in negative control was fewer than three spots per well in all assays. A positive response with more than 50 spot counts per well as well as spot counts ≥twofold more than background spots obtained with non-pulsed target cells was considered to be significant. The significance was defined descriptively only, if the number of spots observed for NY-ESO-1 in pre, post or lymph node was greater than >2× the number of spots in control DMSO as well as more than 50 spots per 50,000 cells. Seromics profiling was performed using CDI Labs' HuProt Human Proteome Microarray version 4.0, which includes over 21,000 individually purified full-length human proteins and isoforms, providing comprehensive coverage of more than 80% of the human proteome. The proteins were printed in duplicate pairs on PATH nitrocellulose slides (CDI Labs). Patient sera were diluted 1:500 in Seromics Sample Buffer. Simultaneously, the barcoded HuProt Microarrays were blocked using CDIArrayBlock buffer to minimize non-specific binding. The diluted sera samples were applied to the blocked microarrays and incubated for 1 h at room temperature on a shaker. After incubation, the microarrays underwent a series of washes. Goat Anti-Human IgG Fc Cross-Adsorbed Secondary Antibody DyLight 550 and Goat Anti-Human IgA Chain Alpha Antibody DyLight 650 were applied to the microarrays. These were incubated for 1 h at room temperature on a shaker, followed by additional washes. The microarrays were gently dried and scanned immediately using a GenePix 4300A Microarray Scanner, using GenePix Pro software (Molecular Devices). The resulting images were analyzed with Mapix microarray image acquisition and analysis software (Innopsys), where signal intensities of background and positive and negative control spots were quantified. Sample preparation: single-cell suspensions from HCC tissues were obtained, as described above. Samples were broadly enriched for CD45+ cells by fluorescence-activated cell sorting, and these cells were suspended in PBS supplemented with 0.05% BSA. Viability of single cells was assessed using Acridine Orange/Propidium Iodide viability staining reagent (Nexcelom Bioscience), and debris-free suspensions of more than 80% viability were deemed suitable for the experiments. Single-cell RNA-seq was performed using the Chromium platform (10x Genomics) with the 5′ gene expression (5′ GEX) V2 kit, as per the manufacturer's instructions, for a target cell recovery of 10,000 cells per lane. Both gene expression and BCR V(D)J libraries were constructed, according to the manufacturer's instructions. All libraries were quantified via Agilent 2100 hsDNA Bioanalyzer or TapeStation 4200 and KAPA library quantification kit (Roche, 0797014001). Libraries were sequenced at a targeted depth of 25,000 reads per cell for gene expression and 5,000 reads per cell for BCR V(D)J, using the paired-end Illumina NovaSeq S4 300-cycle kit. Spatial clustering was performed with CellCharter (version 0.3.3), and differential gene expression analysis was conducted within the scVI framework. For each cluster, canonical immune and HCC-related genes were subjected to pathway enrichment analysis (using gseapy version 1.1.8) to guide annotation. Non-canonical genes were included if uniquely or highly expressed within a specific cluster. Heatmaps of enriched pathways and marker expression profiles were generated for visual comparison. Seventeen mIHC biopsies (six responders and 11 non-responders) from the discovery cohort (D1) were analyzed to identify TLS-like communities. CD3+, CD8+ and CD20+ cells were used as seeds for community detection. Cells within 10 µm of each other were iteratively connected if positive for any of the three markers, forming spatially contiguous TLS-like communities. Radial density profiles were computed for CD20+, CD3+, CD8+ and MZB1+ cells within each community by defining concentric rings from the centroid—determined via mean shift clustering—to encompass approximately 10% of the effective community area per ring. Marker-specific densities were calculated cumulatively across these rings. To assess global infiltration of MZB1+ PCs, spatial graphs were constructed using all cell centroids within each tissue section. A k-nearest neighbor (KNN) graph (k = 10) was constructed and partitioned using the Leiden algorithm. These scores were used to compare PC dispersion between responders and non-responders. Gene expression reads were aligned to the hg38 reference transcriptome, and count matrices were generated using the default Cell Ranger 2.1 workflow, using the ‘raw' matrix output. After alignment, barcodes matching cells contained more than 200 unique genes and at maximum 1,000 counts. From these cells, those with transcripts more than 25% mitochondrial genes were filtered from downstream analyses. Matrix scaling, logarithmic normalization and batch correction via data alignment through canonical correlation analysis and unsupervised clustering using a KNN graph partitioning approach were performed as previously described. Immunoglobulin light, heavy and variable chains were excluded from clustering due to their overabundance in PCs and B cells. Differentially expressed genes were identified using the FindMarkers function (Seurat). Mean unique molecular identifiers (UMIs) were imputed to determine logarithmic fold changes in expression between cell states to further the analysis of markers of interest. Gene set enrichment analysis was performed using the Enrichr, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases. Differential abundance was done using Dirichlet regression modeling strategies. BCR analysis was done using immunarch and Wilcoxon rank test. Trajectory analyses were conducted using Monocle 3 and Moran's I index. Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. The following external bulk and single-cell RNA-seq datasets were used for analyses shown in this study: GSE206325, GSE238264, GSE120575, GSE151530 and EGAS00001007547. The data generated by this study are available via Zenodo (https://doi.org/10.5281/zenodo.17393774)61. For additional details, please contact edgar.gonzalez-kozlova@mssm.edu, and we will respond within 48 h. The analysis code is available at https://github.com/eegk/B_and_Plasma_Cell_Studies. For additional details, please contact edgar.gonzalez-kozlova@mssm.edu, and we will respond within 48 h. Wei, S. C., Duffy, C. R. & Allison, J. P. Fundamental mechanisms of immune checkpoint blockade therapy. Laumont, C. M., Banville, A. C., Gilardi, M., Hollern, D. P. & Nelson, B. H. Tumour-infiltrating B cells: immunological mechanisms, clinical impact and therapeutic opportunities. Defining T cell states associated with response to checkpoint immunotherapy in melanoma. Sui, H. et al. Immunotherapy of targeting MDSCs in tumor microenvironment. Magen, A. et al. Intratumoral dendritic cell−CD4+ T helper cell niches enable CD8+ T cell differentiation following PD-1 blockade in hepatocellular carcinoma. Laumont, C. M. & Nelson, B. H. B cells in the tumor microenvironment: multi-faceted organizers, regulators, and effectors of anti-tumor immunity. A. et al. B cells and tertiary lymphoid structures promote immunotherapy response. Petitprez, F. et al. B cells are associated with survival and immunotherapy response in sarcoma. Griss, J. et al. B cells sustain inflammation and predict response to immune checkpoint blockade in human melanoma. Patil, N. S. et al. Intratumoral plasma cells predict outcomes to PD-L1 blockade in non-small cell lung cancer. Tertiary lymphoid structures and B cells: an intratumoral immunity cycle. Petroni, G., Pillozzi, S. & Antonuzzo, L. Exploiting tertiary lymphoid structures to stimulate antitumor immunity and improve immunotherapy efficacy. Fridman, W. H. et al. B cells and tertiary lymphoid structures as determinants of tumour immune contexture and clinical outcome. Yuan, J. et al. CTLA-4 blockade enhances polyfunctional NY-ESO-1 specific T cell responses in metastatic melanoma patients with clinical benefit. Integrated NY-ESO-1 antibody and CD8+ T-cell responses correlate with clinical benefit in advanced melanoma patients treated with ipilimumab. Presence of B cells in tertiary lymphoid structures is associated with a protective immunity in patients with lung cancer. Marron, T. U. et al. Neoadjuvant cemiplimab for resectable hepatocellular carcinoma: a single-arm, open-label, phase 2 trial. Buckup, M. et al. Multiparametric cellular and spatial organization in cancer tissue lesions with a streamlined pipeline. Kudo, M. et al. IMbrave150: efficacy and safety of atezolizumab plus bevacizumab versus sorafenib in patients with Barcelona Clinic Liver Cancer stage B unresectable hepatocellular carcinoma: an exploratory analysis of the phase III study. Cappuyns, S. et al. PD-1−CD45RA+ effector-memory CD8 T cells and CXCL10+ macrophages are associated with response to atezolizumab plus bevacizumab in advanced hepatocellular carcinoma. Ma, L. et al. Tumor cell biodiversity drives microenvironmental reprogramming in liver cancer. Matsuo, M. et al. IFN-γ enables cross-presentation of exogenous protein antigen in human Langerhans cells by potentiating maturation. Gupta, A. et al. A novel human-derived antibody against NY-ESO-1 improves the efficacy of chemotherapy. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Fehrenbacher, L. et al. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, phase 2 randomised controlled trial. Rittmeyer, A. et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. Nutt, S. L., Hodgkin, P. D., Tarlinton, D. M. & Corcoran, L. M. The generation of antibody-secreting plasma cells. Yang, B. et al. An Asia-specific variant of human IgG1 represses colorectal tumorigenesis by shaping the tumor microenvironment.J. Gu, Y. et al. Tumor-educated B cells selectively promote breast cancer lymph node metastasis by HSPA4-targeting IgG. Leader, A. M. et al. Single-cell analysis of human non-small cell lung cancer lesions refines tumor classification and patient stratification. Characterization of a rare IL-10–competent B-cell subset in humans that parallels mouse regulatory B10 cells. Ng, K. W. et al. Antibodies against endogenous retroviruses promote lung cancer immunotherapy. Lavie, D., Ben-Shmuel, A., Erez, N. & Scherz-Shouval, R. Cancer-associated fibroblasts in the single-cell era. Aranda, C. J. et al. IgG memory B cells expressing IL4R and FCER2 are associated with atopic diseases. Mlynarczyk, C. et al. BTG1 mutation yields supercompetitive B cells primed for malignant transformation. & Tunyaplin, C. Regulatory mechanisms that determine the development and function of plasma cells. Kim, S. S. et al. B cells improve overall survival in HPV-associated squamous cell carcinomas and are activated by radiation and PD-1 blockade. Tertiary lymphoid structures in cancer: the double-edged sword role in antitumor immunity and potential therapeutic induction strategies. Schumacher, T. N. & Thommen, D. S. Tertiary lymphoid structures in cancer. Roudko, V. et al. Immunological biomarkers of response and resistance to treatment with cabozantinib and nivolumab in recurrent endometrial cancer. Ohue, Y. et al. Serum antibody against NY-ESO-1 and XAGE1 antigens potentially predicts clinical responses to anti-programmed cell death-1 therapy in NSCLC. Wang, H. et al. NY-ESO-1 expression in solid tumors predicts prognosis: a systematic review and meta-analysis. Gure, A. O. et al. Cancer-testis genes are coordinately expressed and are markers of poor outcome in non-small cell lung cancer. Expression of the cancer/testis antigen NY-ESO-1 in primary and metastatic malignant melanoma (MM)–correlation with prognostic factors. Dai, Y. et al. Humoral determinants of checkpoint immunotherapy. The single-cell immunogenomic landscape of B and plasma cells in early-stage lung adenocarcinoma. Immune checkpoint therapy-elicited sialylation of IgG antibodies impairs antitumorigenic type I interferon responses in hepatocellular carcinoma. Luan, X. et al. Cyclophilin A is a key positive and negative feedback regulator within interleukin-6 trans-signaling pathway. & Ge, J. STAT3−CyPA signaling pathway in endothelial cell apoptosis. Calandra, T. & Roger, T. Macrophage migration inhibitory factor: a regulator of innate immunity. Tang, C.-H. A. et al. Phosphorylation of IRE1 at S729 regulates RIDD in B cells and antibody production after immunization. Bujisic, B. et al. Impairment of both IRE1 expression and XBP1 activation is a hallmark of GCB DLBCL and contributes to tumor growth. Zhu, H., Jiang, C., Kaufman, R. J., Li, H. & Singh, N. In vitro stimulation of IRE1α/XBP1-deficient B cells with LPS. Altorki, N. K. et al. Neoadjuvant durvalumab plus radiation versus durvalumab alone in stages I−III non-small cell lung cancer: survival outcomes and molecular correlates of a randomized phase II trial. Ban, Y. et al. Radiation-activated secretory proteins of Scgb1a1+ club cells increase the efficacy of immune checkpoint blockade in lung cancer. Zhang, S. et al. Spatial transcriptomics analysis of neoadjuvant cabozantinib and nivolumab in advanced hepatocellular carcinoma identifies independent mechanisms of resistance and recurrence. Humoral IgG1 responses to tumor antigens underpin clinical outcomes in immune checkpoint blockade. We thank the Biorepository and Pathology CoRE Laboratory of the Icahn School of Medicine at Mount Sinai for support. We also thank members of the Merad and Gnjatic laboratories for their help and support. This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards grant UL1TR004419 from the National Center for Advancing Translational Sciences. The clinical trial (ClinicalTrials.gov NCT03916627, cohort B) and part of this project were funded by Regeneron Pharmaceuticals, Inc. S.G. was partially supported by National Institutes of Health (NIH) grants CA224319, DK124165 and CA196521. was partially supported by NIH grants CA257195, CA254104 and CA154947. These authors contributed equally: Edgar Gonzalez-Kozlova, Robert Sweeney, Igor Figueiredo, Kevin Tuballes. Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY, USA Edgar Gonzalez-Kozlova, Robert Sweeney, Igor Figueiredo, Kevin Tuballes, Sinem Ozbey, Pauline Hamon, Matthew D. Park, Giorgio Ioannou, Yohei Nose, Ruiwei Guo, Paula Restrepo, Mark Buckup, Vladimir Roudko, Clotilde Hennequin, Jessica Le Berichel, Nicholas Venturini, Laszlo Halasz, Leanna Troncoso, Alexandra Tabachnikova, Christie Chang, Amanda Reid, Haley Brown, Theodore Chin, Rafael Cabal, Raphaël Mattiuz, Shingo Eikawa, Diane Marie Del Valle, Tina Ruth Gonsalves, Nelson M. LaMarche, Hajra Jamal, Alona Lansky, Nancy Yi, Daniella Nelson, Jarod Morgenroth-Rebin, Raphael Merand, Bryan Villagomez, Darwin D'Souza, Emir Radkevich, Kai Nie, Zhihong Chen, Stephen C. Ward, Maria Isabel Fiel, Rachel Brody, Parissa Tabrizian, Ganesh Gunasekaran, Alice O. Kamphorst, Noah Cohen, Maria Curotto de Lafaille, Olivia Hapanowicz, Natalie Lucas, Kathy Wu, Myron Schwartz, Seunghee Kim-Schulze, Miriam Merad, Thomas U. Marron & Sacha Gnjatic Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA Edgar Gonzalez-Kozlova, Robert Sweeney, Igor Figueiredo, Kevin Tuballes, Sinem Ozbey, Giorgio Ioannou, Yohei Nose, Ruiwei Guo, Paula Restrepo, Mark Buckup, Vladimir Roudko, Clotilde Hennequin, Jessica Le Berichel, Nicholas Venturini, Laszlo Halasz, Leanna Troncoso, Alexandra Tabachnikova, Christie Chang, Amanda Reid, Haley Brown, Theodore Chin, Rafael Cabal, Raphaël Mattiuz, Shingo Eikawa, Diane Marie Del Valle, Tina Ruth Gonsalves, Nelson M. LaMarche, Hajra Jamal, Alona Lansky, Nancy Yi, Daniella Nelson, Jarod Morgenroth-Rebin, Raphael Merand, Bryan Villagomez, Darwin D'Souza, Emir Radkevich, Kai Nie, Zhihong Chen, Stephen C. Ward, Maria Isabel Fiel, Rachel Brody, Parissa Tabrizian, Ganesh Gunasekaran, Alice O. Kamphorst, Noah Cohen, Maria Curotto de Lafaille, Olivia Hapanowicz, Natalie Lucas, Kathy Wu, Myron Schwartz, Seunghee Kim-Schulze, Miriam Merad, Thomas U. Marron & Sacha Gnjatic Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA Edgar Gonzalez-Kozlova, Robert Sweeney, Igor Figueiredo, Kevin Tuballes, Sinem Ozbey, Giorgio Ioannou, Yohei Nose, Ruiwei Guo, Mark Buckup, Vladimir Roudko, Clotilde Hennequin, Jessica Le Berichel, Nicholas Venturini, Laszlo Halasz, Leanna Troncoso, Alexandra Tabachnikova, Christie Chang, Amanda Reid, Haley Brown, Theodore Chin, Rafael Cabal, Raphaël Mattiuz, Shingo Eikawa, Diane Marie Del Valle, Tina Ruth Gonsalves, Nelson M. LaMarche, Hajra Jamal, Alona Lansky, Nancy Yi, Daniella Nelson, Jarod Morgenroth-Rebin, Raphael Merand, Bryan Villagomez, Darwin D'Souza, Emir Radkevich, Kai Nie, Zhihong Chen, Stephen C. Ward, Maria Isabel Fiel, Rachel Brody, Parissa Tabrizian, Ganesh Gunasekaran, Alice O. Kamphorst, Noah Cohen, Maria Curotto de Lafaille, Olivia Hapanowicz, Natalie Lucas, Kathy Wu, Myron Schwartz, Seunghee Kim-Schulze, Miriam Merad, Thomas U. Marron & Sacha Gnjatic Human Immune Monitoring Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA Hajra Jamal, Alona Lansky, Nancy Yi, Daniella Nelson, Jarod Morgenroth-Rebin, Raphael Merand, Bryan Villagomez, Darwin D'Souza, Emir Radkevich, Kai Nie, Zhihong Chen, Seunghee Kim-Schulze, Miriam Merad & Sacha Gnjatic Nicola James, John C. Lin, Gavin Thurston & Nathalie Fiaschi Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar contributed to writing, data analysis, figure generation and assay development. supported assay execution, data analysis, figure generation and interpretation of results. assisted with data analysis and figure preparation. Assay development and execution were supported by M.B., V.R., G.I., R.C., S.E., H.J., A.L., N.Y., D.N., J.M., R.M., B.V., D.D., E.R., K.N., Y.T. contributed to both assay development and writing. All authors reviewed and provided feedback on the final manuscript. S.G. reports other research funding from Boehringer Ingelheim, Bristol Myers Squibb, Celgene, Genentech and Takeda not related to this study. S.G. has served on a scientific advisory board for Taiho Pharmaceuticals. S.G. is a named co-inventor on an issued patent (US20190120845A1) for Multiplexed Immunohistochemical Consecutive Staining on Single Slide (MICSSS), an mIHC technology to characterize tumors and treatment responses that is filed through the Icahn School of Medicine at Mount Sinai and currently remains unlicensed. serves on the scientific advisory board of and holds stock from Compugen Inc., Dynavax Inc., Morphic Therapeutic Inc., Asher Bio Inc., Dren Bio Inc., Nirogy Inc., Oncoresponse Inc. and Owkin Inc. M.M. also serves on the scientific advisory board of Innate Pharma Inc., DBV Inc. and Genenta Inc. M.M. receives funding for contracted research from Regeneron and Boerhinger Ingelheim. has served on advisory and/or data safety monitoring boards for Rockefeller University, Regeneron, AbbVie, Bristol Myers Squibb, Boehringer Ingelheim, Atara, AstraZeneca, Genentech, Celldex, Chimeric, Glenmark, Simcere, Surface, G1 Therapeutics, NGMbio, DBV Technologies, Arcus and Astellas and has research grants from Regeneron, Bristol Myers Squibb, Merck and Boehringer Ingelheim. Finally, this study was partially funded by Regeneron. The remaining authors declare no competing interests. Nature Medicine thanks Zlatko Trajanoski, Changhoon Yoo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Anna Ranzoni, in collaboration with the Nature Medicine team. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Table showing the number of patients without treatment, with and without treatment stratified in adjacent normal and tumor tissue. c. Stacked bar plot showing the cluster composition per sample. d. Bar plot showing the landscape of cells per cluster. e, f. UMAP and pie chart showing that 54% of cells corresponded to tumor tissue and 29% of cells belong to patients who responded to therapy, respectively. g. Dot plot showing the percent expressed and average expression levels of canonical genes associated with Plasma and B Cells. In parallel, we show the absence of gene expression associated to non-B/Plasma cells such as CD3, CD4, CD8 and similar. i. Volcano plot showing the significant genes for differential expression between non-responders (NR) in dark red and responders (R) in dark blue. Differential expression was calculated using a two-sided moderate t test through mixed effect models. The top 4 genes based on Log2FC, and percent expressed cells (transparency) are labeled. Bulk RNAseq HCC cohort, violin plot showing the percent variance explained associated with patient, infiltration, response and timepoint, ordered from most to least variance explained. k. Principal component analysis of HCC bulk RNAseq data showing the variance explained for each component. a. Pie chart showing on top the number of patients and cells with available scBCRseq. Bottom shows the number of responder and non-responders. b. Heatmap showing the average log10 expression per isotype for every cluster of cells identified. c. Percent of plasma cell isotypes identified in responder and non-responders in normal tissue; differences did not reach statistical significance. d. Distribution of clonally expanded B and plasma cells. e. Pie chart showing the total number of cells classified as expanded or unique. f. Stacked bar plot panel showing expansion levels for all the data set, normal tissue and tumor issue, from left to right, respectively. g. Stacked bar plot showing the composition of expanded and non-expanded cells in normal tissue for responders and non-responders. Overlap of CDR3 barcodes between LN, tumor and adjacent normal tissue samples for 3 patients, samples are matched per patient. j. Y-axis average shared clonotypes per individual identified in cells across different clusters (LN (lymph node), Naïve, Memory and Plasma compartments) for IgG1, IgG2, IgM and IgA. This figure shows that plasma IgG1 and IgG2 have larger clones in responders, while Naïve or Memory cells with IgM or IgA isotype are most commonly present in non-responders. Table showing the number of samples per tissue type and response. c. Violin plot from the variance analysis showing the effect of covariates modeled in this analysis. Tissue type was the largest source of variance for this bulk RNAseq. d. Barplot showing the number of differentially expressed associated with either response or non-response per comparison using moderated t test statistics. Volcano plots showing the direction and position of IG heavy chains in the differential expression gene (DEG) results. In tissue comparisons, there was no difference between responders and non-responders for IGH isotypes. However, changes over time for either responders or non-responders showed an increase of IGH expression (i-j). The p-values were calculated using a two-sided moderate t test. k. Boxplots (same definition as 1c) showing the average clonal sizes for responders and non-responders. Median increase in IGHG1 and IGHG2 reached nominal statistical significance (pval<0.05), but not FDR correction when using a two-sided Wilcoxon rank test. l. Boxplots (same definition as 1c) showing the results from a projection of previously identified single cell signatures for Plasma and Plasmablast clusters into bulk RNAseq showed a significant increase when using a two-sided Wilcoxon rank test at the time of biopsy and borderline significant changes post treatment for plasma cells. m, n. Principal component analysis showing PC1 and 2 reflecting approximately 50% of the variance distinguishing clinical response and tissue type. Box plots (same definition as 1c) showing the normalized expression levels of IGHX heavy chain genes corresponding to either responder and non-responder. The p-values were estimated using a two-sided Wilcoxon rank test p. Dotplots showing the average clonal sizes per patient for the different tissue types and timepoints available. q. Dotplots showing the heavy chain clonal sizes per isotype that are shared between tissue origin and clinical response status. b, c. Boxplot (same definition as 1c) showing the Dirichlet regression and log-likelihood test comparing responders and non-responders in either normal or tumor tissue for 8 responders and 12 non-responders. d. Sample distribution in validation cohort scRNAseq data. e, f. Principal components of the RNAseq profiles labeled for response and type of treatment, respectively. g. Dotplot showing the average clonal sizes between tissues for every patient regardless of response and IG heavy chain isotype. h. Dotplots showing the heavy chain clonal sizes that are shared between tissues for every IG isotypes stratified per response. Top bar shows response in red (NR) and blue (R), while lateral bar shows CTs in color black. a. Heatmap showing incoming and outgoing signaling pathways between B or plasma cell clusters in tumor tissue calculated using the algorithm cellchat. b. Heatmap showing the average expression per category of cells and specific genes corresponding to signaling pathways with FDR < 0.01. c. Heatmap showing incoming and outgoing signaling pathways between B or plasma cell clusters in responders and non-responders calculated using the algorithm cellchat. d. Spearman Correlation plots showing the directionality and strength of the association between differentially expressed genes that are also high in Moran I's score for responders and non-responders (FDR < 0.05 & Moran I > 0.5). e. Barplot showing the isotype composition of immunoglobulin heavy chain stratified by responders, treatment and pre/post tissues in Feldman et al. re-analysis. Supplementary File 2 containing pages for clone tracking and gene signatures per cluster. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Gonzalez-Kozlova, E., Sweeney, R., Figueiredo, I. et al. Humoral IgG1 responses to tumor antigens underpin clinical outcomes in immune checkpoint blockade. Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Provided by the Springer Nature SharedIt content-sharing initiative Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.
Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript. , Article number: (2026) Cite this article We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply. Alternative splicing (AS) is a ubiquitous post-transcriptional regulatory mechanism, that has greatly expanded the transcriptomic and proteomic diversity in vertebrates. While gene regulation of hematopoiesis has been extensively researched in vertebrates, the functions of species- and cell lineage-specific splice variants in vertebrates are largely unknown. Here, we curate transcriptomic data on fetal hematopoietic organ development in six vertebrates and hematopoietic cell differentiation in humans and mice. To identify functional exon-skipping events among thousands of cassette exons in protein-coding genes for a specific differentiation lineage and species, we develop a machine-learning model interrogating 19 features including dynamic expression, protein structure, and evolutionary conservation, and integrate them into a single prediction score, named Functional AS Score (FAScore). Using FAScore, we identify four previously-uncharacterized functional AS events in which deletion of the AS exon leads to defects in erythropoiesis and myelopoiesis. Furthermore, we demonstrate that deletion of exon 15 of TBC1D23 reduces erythropoiesis in mice and zebrafish through elevated binding capacity to RANBP2/RANGAP1 leading to increased SUMOylation level of HDAC1. Collectively, our study presents a valuable tool to identify functional exon skipping (ES) events during hematopoietic lineage commitment, and establishes a research paradigm that can be broadly applied to other biological processes. For lineage differentiation datasets, raw sequencing data of mouse are deposited in the Gene Expression Omnibus (GEO) database under the accession number GSE142216, and raw sequencing data of human are available from the Blueprint Consortium website (http://www.blueprint-epigenome.eu) and the European Genome-phenome Archive with accession number EGAD00001000745. For FHO development datasets, the raw sequencing data are available through the NCBI: human (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26969), rhesus (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26956), rabbit (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26840), mouse (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26869), rat (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26889), Opossum (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB27035) and chicken (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB26695). For zebrafish, the accession number for the RNA-seq data is the GEO database: GSE120581. The data of single-cell RNAseq, including sequencing reads and single-cell expression matrices, are available from the GEO: GSE276911. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository106,107 with the dataset identifier PXD072614 and PXD072611. Source Data are provided with this paper. The intermediate data related to model training and prediction have been deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.29069336.v4). All newly generated plasmids and other relevant materials are available upon request from the corresponding author. Source data are provided with this paper. FAScore code, documentation and tutorials are available at GitHub: (https://github.com/LuChenLab/FAScore.git108). Jiang, W. & Chen, L. Alternative splicing: human disease and quantitative analysis from high-throughput sequencing. Google Scholar Leung, S. K. et al. Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing. Google Scholar Weyn-Vanhentenryck, S. M. et al. Precise temporal regulation of alternative splicing during neural development. Zhang, M. et al. Axonogenesis is coordinated by neuron-specific alternative splicing programming and splicing regulator PTBP2. Gómez-Redondo, I., Planells, B., Navarrete, P. & Gutiérrez-Adán, A. Role of alternative splicing in sex determination in vertebrates. Han, C. et al. A male-specific doublesex isoform reveals an evolutionary pathway of sexual development via distinct alternative splicing mechanisms. Li, Y. et al. A splicing factor switch controls hematopoietic lineage specification of pluripotent stem cells. Fisher, E. & Feng, J. RNA splicing regulators play critical roles in neurogenesis. Wiley Interdiscip. Wang, F. et al. Single-cell architecture and functional requirement of alternative splicing during hematopoietic stem cell formation. Polfus, L. M. et al. Whole-exome sequencing identifies loci associated with blood cell traits and reveals a role for alternative GFI1B splice variants in human hematopoiesis. Parra, M. et al. An important class of intron retention events in human erythroblasts is regulated by cryptic exons proposed to function as splicing decoys. Li, Z. et al. Pro-apoptotic effects of splice-switching oligonucleotides targeting Bcl-x pre-mRNA in human glioma cell lines. Komeno, Y. et al. Runx1 exon 6-related alternative splicing isoforms differentially regulate hematopoiesis in mice. Bradley, R. K. & Anczukow, O. RNA splicing dysregulation and the hallmarks of cancer. Peng, Q. et al. Impacts and mechanisms of alternative mRNA splicing in cancer metabolism, immune response, and therapeutics. Marasco, L. E. & Kornblihtt, A. R. The physiology of alternative splicing. Cell Biol. Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Reimer, K. A. & Neugebauer, K. M. Blood relatives: splicing mechanisms underlying erythropoiesis in health and disease. Edwards, C. R. et al. A dynamic intron retention program in the mammalian megakaryocyte and erythrocyte lineages. Noh, J. Y. et al. Megakaryopoiesis and platelet biology: roles of transcription factors and emerging clinical implications. Keightley, M. C. & Lieschke, G. J. Splicing dysfunction and disease: the case of granulopoiesis. Semin Cell Dev. Eksi, R. et al. Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data. PLoS Comput Biol. Li, W. et al. High-resolution functional annotation of human transcriptome: predicting isoform functions by a novel multiple instance-based label propagation method. Nucleic Acids Res 42, e39 (2014). Panwar, B. et al. Genome-wide functional annotation of human protein-coding splice variants using multiple instance learning. J. Proteome Res 15, 1747–1753 (2016). Chen, H., Shaw, D., Zeng, J., Bu, D. & Jiang, T. DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning. Shaw, D., Chen, H. & Jiang, T. DeepIsoFun: a deep domain adaptation approach to predict isoform functions. Wang, K., Wang, J., Domeniconi, C., Zhang, X. & Yu, G. Differentiating isoform functions with collaborative matrix factorization. Yu, G., Wang, K., Domeniconi, C., Guo, M. & Wang, J. Isoform function prediction based on bi-random walks on a heterogeneous network. Hao, Y. et al. Semi-supervised learning predicts approximately one third of the alternative splicing isoforms as functional proteins. Cendrowski, J. et al. Splicing variation of BMP2K balances abundance of COPII assemblies and autophagic degradation in erythroid cells. Elife https://doi.org/10.7554/eLife.58504 (2020). Ergun, A. et al. Differential splicing across immune system lineages. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Sarropoulos, I., Marin, R., Cardoso-Moreira, M. & Kaessmann, H. Developmental dynamics of lncRNAs across mammalian organs and species. Mazin, P. V., Khaitovich, P., Cardoso-Moreira, M. & Kaessmann, H. Alternative splicing during mammalian organ development. Wang, F. et al. A comprehensive RNA editome reveals that edited Azin1 partners with DDX1 to enable hematopoietic stem cell differentiation. Holt, P. G. & Jones, C. A. The development of the immune system during pregnancy and early life. Xue, Y. et al. A 3D atlas of hematopoietic stem and progenitor cell expansion by multi-dimensional RNA-Seq analysis. Li, Z. et al. MeDAS: a metazoan developmental alternative splicing database. Nucleic Acids Res 49, D144–D150 (2021). Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. Carramolino, L. et al. Platelets play an essential role in separating the blood and lymphatic vasculatures during embryonic angiogenesis. Hisa, T. et al. Hematopoietic, angiogenic and eye defects in Meis1 mutant animals. EMBO J. Azcoitia, V., Aracil, M., Martínez, A. C. & Torres, M. The homeodomain protein Meis1 is essential for definitive hematopoiesis and vascular patterning in the mouse embryo. Watkins, N. A. et al. A HaemAtlas: characterizing gene expression in differentiated human blood cells. Zeddies, S. et al. MEIS1 regulates early erythroid and megakaryocytic cell fate. Breiman, L. Random forests. Zibetti, C. et al. Alternative splicing of the histone demethylase LSD1/KDM1 contributes to the modulation of neurite morphogenesis in the mammalian nervous system. Fiszbein, A. et al. Alternative splicing of G9a regulates neuronal differentiation. UniProt, C. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49, D480–D489 (2021). Hornbeck, P. V. et al. 15 years of PhosphoSitePlus(R): integrating post-translationally modified sites, disease variants and isoforms. Nucleic Acids Res 47, D433–D441 (2019). Cowley, M. J. et al. PINA v2.0: mining interactome modules. Nucleic Acids Res 40, D862–D865 (2012). Kumar, L. & E Futschik, M. Mfuzz: a software package for soft clustering of microarray data. Kumar, P. et al. PI3K/AKT Activation Mediates B-Cell Transformation By the "T" Splice-Variant of Fyn Kinase. Shin, J. J. H., Gillingham, A. K., Begum, F., Chadwick, J. & Munro, S. TBC1D23 is a bridging factor for endosomal vesicle capture by golgins at the trans-Golgi. Cell Biol. Deng, H. et al. The WDR11 complex is a receptor for acidic-cluster-containing cargo proteins. Wang, J. et al. Trans-Golgi network tethering factors regulate TBK1 trafficking and promote the STING-IFN-I pathway. Cell Discov. Zhao, L. et al. FAM91A1-TBC1D23 complex structure reveals human genetic variations susceptible for PCH. Skinnider, M. A. et al. Cell type prioritization in single-cell data. Gulati, G. S. et al. Single-cell transcriptional diversity is a hallmark of developmental potential. Werner, A., Flotho, A. & Melchior, F. The RanBP2/RanGAP1*SUMO1/Ubc9 complex is a multisubunit SUMO E3 ligase. Morabito, S. et al. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Rep Methods 3, 100498 (2023) Joung, H. et al. Sumoylation of histone deacetylase 1 regulates MyoD signaling during myogenesis. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Kim, M. Y., Yan, B., Huang, S. & Qiu, Y. Regulating the regulators: the role of histone deacetylase 1 (HDAC1) in Erythropoiesis. Jayapal, S. R. et al. Down-regulation of Myc is essential for terminal erythroid maturation. Caria, C. A., Faa, V. & Ristaldi, M. S. Kruppel-Like Factor 1: a pivotal gene regulator in erythropoiesis. Chasis, J. A. et al. Differentiation-associated switches in protein 4.1 expression. Synthesis of multiple structural isoforms during normal human erythropoiesis. Birzele, F., Csaba, G. & Zimmer, R. Alternative splicing and protein structure evolution. Nucleic Acids Res 36, 550–558 (2008). Hiller, M., Huse, K., Platzer, M. & Backofen, R. Creation and disruption of protein features by alternative splicing - a novel mechanism to modulate function. Genome Biol. Kriventseva, E. V. et al. Increase of functional diversity by alternative splicing. Trends Genet 19, 124–128 (2003). Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. Spearman, C. The proof and measurement of association between two things. Int J. Epidemiol. Hunsicker, M. E. et al. Characterizing driver-response relationships in marine pelagic ecosystems for improved ocean management. Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Rodriguez, J. M. et al. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res 46, D213–D217 (2018). Burley, S. K. et al. RCSB protein data bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res 49, D437–D451 (2021). Lopez, G., Maietta, P., Rodriguez, J. M., Valencia, A. & Tress, M. L. Firestar-advances in the prediction of functionally important residues. Nucleic Acids Res 39, W235–W241 (2011). Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res 49, D412–D419 (2021). Jones, D. T. Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Käll, L., Krogh, A. & Sonnhammer, E. L. A combined transmembrane topology and signal peptide prediction method. Viklund, H. & Elofsson, A. Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci. Emanuelsson, O., Brunak, S., von Heijne, G. & Nielsen, H. Locating proteins in the cell using TargetP, SignalP and related tools. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 49, D10–D17 (2021). Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402 (1997). Claesen, M., Smet, F. D., Suykens, J. A. K. & Moor, B. D. A robust ensemble approach to learn from positive and unlabeled data using SVM base models. Hanley, J. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Davis, J. & Goadrich, M. H. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd international conference on Machine learning (2006). Fawcett, T. Introduction to ROC analysis. Zhou, T., Ren, J., Medo, M. & Zhang, Y. C. Bipartite network projection and personal recommendation. Nonlin Soft Matter Phys. & Yen, G. G. An evolutionary algorithm based on Minkowski distance for many-objective optimization. IEEE Trans. Cabarle, F. G. C., Adorna, H. N., Jiang, M. & Zeng, X. Spiking neural P systems with scheduled synapses. IEEE Trans. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15, 1034–1050 (2005). Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Singh, I. et al. Widespread intronic polyadenylation diversifies immune cell transcriptomes. Kuhn, R. M., Haussler, D. & Kent, W. J. The UCSC genome browser and associated tools. Necsulea, A. et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Tu, Y. et al. TBC1D23 mediates Golgi-specific LKB1 signaling. Liu, Z. et al. SCGN deficiency is a risk factor for autism spectrum disorder. Signal Transduct. Target Ther. Detrich, H. W. et al. Intraembryonic hematopoietic cell migration during vertebrate development. Leet, J. K. et al. High-content screening in zebrafish embryos identifies butafenacil as a potent inducer of anemia. PLoS One 9, e104190 (2014). Bastian, M. et al. Gephi: An Open Source Software for Exploring and Manipulating Networks. Proceedings of the international AAAI conference on web and social media 3, 361–362 (2009). Das, A. T., Tenenbaum, L. & Berkhout, B. Tet-on systems for doxycycline-inducible gene expression. Gene Ther. Wang, T. et al. SENP1-Sirt3 signaling controls mitochondrial protein acetylation and metabolism. Lumpkin, R. J. et al. Site-specific identification and quantitation of endogenous SUMO modifications under native conditions. Chen, T. et al. iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res 50, D1522–D1527 (2022). Ma, J. et al. iProX: an integrated proteome resource. Nucleic Acids Res 47, D1211–D1217 (2019). Xiao, H. et al. The functional landscape of alternative splicing in hematopoietic lineage commitment. FAScore, https://doi.org/10.5281/zenodo.18146085 (2025). Download references This work was supported by Noncommunicable Chronic Diseases-National Science and Technology Major Project (Grant No. 2023ZD0500500 to L. C.), the National Natural Science Foundation of China (Grant No. ), National Key Research and Development Program of China (Grant No. 2022YFA1105200 to D. ), and National Science Fund for Distinguished Young Scholars (Grant No. 32125012 to D. We also appreciate the help of Prof. Lunzhi Dai at Sichuan University for the analysis of Proteomics and mass spectrometry. We would like to thank Huifang Li at Core Facilities of West China Hospital for her assistance in cell sorting. These authors contributed equally: Xiao Hu, Jinrui Wang, Li Chen, Qin Yang, Manuel Tardaguila. These authors jointly supervised this work: Nicole Soranzo, Jing-wen Lin, Da Jia, Lu Chen. Department of Laboratory Medicine, West China Second University Hospital. Key Laboratory of Birth Defects and Related Diseases of Women and Children, Ministry of Education, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, China Xiao Hu, Jinrui Wang, Qin Yang, Bin Mao, Shenghui Niu, Zijie Xu, GuiHua Wang, Dan Zhang, Yating Zhang, Zhen Zhou, Jing Luo, Zhifeng He, Defu Liu, Da Jia & Lu Chen Biosafety Laboratory of West China Hospital, Center for Biological and Translational Research, West China Hospital, Sichuan University, Chengdu, China Li Chen, Chao Tang & Jing-wen Lin Human Technopole, Fondazione Human Technopole, Milan, Italy Manuel Tardaguila & Nicole Soranzo Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK Nicole Soranzo Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Puddicombe Way, Cambridge, UK Nicole Soranzo British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK Nicole Soranzo National Institute for Health and Care Research Blood and Transplant Research Unit in Donor Health and Behaviour, University of Cambridge, Cambridge, UK Nicole Soranzo Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar Search author on:PubMed Google Scholar conceived and supervised the project. performed all the mouse studies and analyzed the data. assisted in the mouse studies. performed all the cellular studies. assisted in the plasmid construction. performed all zebrafish studies. performed the machine learning and bioinformatics analysis. assisted in bioinformatics analysis. wrote the manuscript and generated the figures. edited the manuscript. All authors read and approved the final version of the manuscript. Correspondence to Nicole Soranzo, Jing-wen Lin, Da Jia or Lu Chen. The authors declare no competing interests. Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Reprints and permissions Hu, X., Wang, J., Chen, L. et al. The functional landscape of alternative splicing in hematopoietic lineage commitment. Nat Commun (2026). Download citation Received: 24 April 2025 Accepted: 18 January 2026 Published: 27 January 2026 Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Provided by the Springer Nature SharedIt content-sharing initiative (Nat Commun) ISSN 2041-1723 (online) © 2026 Springer Nature Limited Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.