Van Andel Institute scientists and collaborators have developed a new method for identifying and classifying pancreatic cancer cell subtypes based on sugars found on the outside of cancer cells. These sugars, called glycans, help cells recognize and communicate with each other. The new method, multiplexed glycan immunofluorescence, combines specialized software and imaging techniques to pinpoint the exact mix of pancreatic cancer cells that comprise tumors. Pancreatic cancer often does not cause noticeable symptoms in its early stages. As a result, only 15% of pancreatic cancers are found in time to allow for surgical removal. To make things more complicated, pancreatic tumors can include many different subtypes of malignant cells, each of which responds differently to treatment. "Our new method allows us to go one step beyond cancer diagnosis by revealing which subtypes of pancreatic cancer cells make up a tumor." The glycan signatures were identified by analyzing tumor tissue. This is important, Haab said, because blood tests are easier on patients -- they are far quicker, cheaper and less invasive than surgery. They're also exploring whether it can be used to better detect and characterize other gastrointestinal cancers.
You might say it's just a pigment of your imagination.
Well, despite what you may have come to believe, violet is not purple. In fact, violet (along with the rest of the colors in a naturally occurring rainbow) has something purple doesn't: its own wavelength of light. Anyone who has ever ended up with a sunburn knows violet wavelengths are real, as the Sun's ultraviolet (UV) radiation is the reason you need to wear sunscreen, even though you can't see those wavelengths (more on that later). Red, orange, yellow, green, blue, and indigo are all just as real. Purple, on the other hand, is just your brain's way of resolving confusion. Red and blue (or violet) wavelengths are two opposite extremes on the spectrum. When you see both of these wavelengths in the same place, your eyes and brain don't know what to do with them, so they compensate, and the clashing wavelengths register as the color we call purple. The visible light spectrum detectable by human eyes makes up only a small fraction of wavelengths (0.0035%, to be exact). Those colors are made available to us by millions of densely packed photoreceptor cells known as cones, which respond to light hitting our retina. That's why we cannot make out UV or infrared light: UV wavelengths are too short for our cones to detect, and infrared wavelengths are too long. Approximately 60% of cones are L cones that best absorb reddish wavelengths (as a result of the reddish pigments they contain), 30% are M cones that best absorb greenish wavelengths (and have greenish pigment), and 10% are S cones that best absorb bluish wavelengths (and have bluish pigment). All three types of cones can absorb numerous wavelengths close to their peak (though that absorption gets weaker the farther you stray from the peak absorption wavelength), and they overlap in their ability to detect colors like yellows and teals.
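The overlapping, peak-and-falloff sensitivities described above can be made concrete with a toy model. This is an illustrative sketch only: it assumes each cone type's sensitivity is a Gaussian curve with approximate peaks near 420 nm (S), 534 nm (M) and 564 nm (L) and a shared, hypothetical width of 50 nm, which is a simplification of real cone sensitivity curves. Within that model, it checks whether any single wavelength produces the "S strongest, L above M" response that a violet-plus-red mix produces.

```python
import math

# Toy model (assumption): Gaussian cone sensitivities with approximate peaks
# and a shared, hypothetical width. Not measured cone fundamentals.
CONE_PEAKS_NM = {"S": 420.0, "M": 534.0, "L": 564.0}
SIGMA_NM = 50.0


def cone_response(wavelength_nm: float) -> dict[str, float]:
    """Relative response (0..1) of each cone type to one wavelength."""
    return {
        cone: math.exp(-0.5 * ((wavelength_nm - peak) / SIGMA_NM) ** 2)
        for cone, peak in CONE_PEAKS_NM.items()
    }


def mixture_response(*wavelengths_nm: float) -> dict[str, float]:
    """Responses add linearly for a mix of equal-intensity wavelengths."""
    total = {cone: 0.0 for cone in CONE_PEAKS_NM}
    for wl in wavelengths_nm:
        for cone, value in cone_response(wl).items():
            total[cone] += value
    return total


def is_purple_pattern(resp: dict[str, float]) -> bool:
    """S strongest and L stronger than M: the violet-plus-red signature."""
    return resp["S"] > resp["L"] > resp["M"]


# Scan every whole-nanometre wavelength in the visible range.
spectral_matches = [wl for wl in range(380, 781) if is_purple_pattern(cone_response(wl))]
violet_plus_red = mixture_response(420.0, 650.0)

print("single wavelengths with the pattern:", spectral_matches)
print("violet + red mixture:", {k: round(v, 3) for k, v in violet_plus_red.items()})
print("mixture shows the pattern:", is_purple_pattern(violet_plus_red))
```

Under these assumptions the scan finds no single wavelength with that response pattern, while the 420 nm + 650 nm mix shows it, which mirrors the point that purple has no wavelength of its own.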
The brain then determines what color you are looking at by comparing the differences in signal strength, allowing us to see up to a million colors. When you look at in-between colors (teal, for example), your brain averages out how many cones of which types responded to that in-between wavelength. If there is more blue than green, you see what you perceive as a shade of blue, and vice versa. The problem with purple is that it isn't supposed to be possible to create a color from wavelengths on opposite ends of the spectrum. The shortest wavelengths detected by your S cones (violet light) have no overlap with the longest wavelengths detected by your L cones (red light). Purple is a nonspectral color: an illusion of physics and neuroscience that no single wavelength of light can produce. Despite the fact that it is technically a figment (more like a pigment) of our imaginations, purple has earned a rich reputation as the color of royalty, nobility, power, luxury, devotion, mystery, and magic. Maybe the most appropriate association is that last one. Her work has appeared in Popular Mechanics, Ars Technica, SYFY WIRE, Space.com, Live Science, Den of Geek, Forbidden Futures and Collective Tales. She lurks right outside New York City with her parrot, Lestat. When not writing, she can be found drawing, playing the piano or shapeshifting.
The high-resolution 3D map contains more than 200,000 brain cells, around 82,000 of which are neurons. It also includes more than 500 million of the neuronal connection points called synapses and more than 4 kilometres of neuronal wiring, all found in a tiny block of tissue in a brain region involved in vision. This brain-activity map, combined with the wiring diagram, marks a milestone in connectomics, a field that aims to show how brains process and organize information. Behind the massive efforts are more than 150 researchers in the Machine Intelligence from Cortical Networks (MICrONS) project, who described their work in a package of eight papers published today in Nature and Nature Methods. The MICrONS project has made its resources available for the neuroscience community online, and other teams are already exploring them in different studies. “They managed to do something that we haven't done as a neuroscience community in basically all of our history, which is to be able to map the activity of neurons onto the wiring on a very large population of neurons,” says Mariela Petkova, a neuroscientist at Harvard University in Cambridge, Massachusetts, who is not involved with the project. Moritz Helmstaedter, a neuroscientist at the Max Planck Institute for Brain Research in Frankfurt, Germany, says “the combination of function and structure at that scale” is unprecedented. Read the related News & Views, ‘A vast brain map links neural activity and wiring’.
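To put the headline figures in proportion, a quick back-of-envelope calculation helps. The sketch below uses only the rounded numbers quoted above. Because those numbers are lower bounds ("more than"), and because many synapses and wires in a small block of tissue belong to cells whose bodies lie outside it, the per-neuron values are naive averages for illustration, not measured properties of the MICrONS dataset.

```python
# Back-of-envelope ratios from the rounded figures quoted for the MICrONS map.
# Naive averages: synapses and wiring in the block are divided only among the
# neurons counted inside it.
TOTAL_CELLS = 200_000
NEURONS = 82_000
SYNAPSES = 500_000_000
WIRING_M = 4_000  # 4 kilometres of neuronal wiring, in metres

neuron_fraction = NEURONS / TOTAL_CELLS
synapses_per_neuron = SYNAPSES / NEURONS
wiring_per_neuron_cm = WIRING_M / NEURONS * 100

print(f"neurons as share of cells: {neuron_fraction:.0%}")
print(f"naive synapses per neuron: {synapses_per_neuron:,.0f}")
print(f"naive wiring per neuron:   {wiring_per_neuron_cm:.1f} cm")
```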
Humans were not thought to have reached and inhabited such small and isolated islands until the regional shift to Neolithic lifeways, around 7.5 thousand years ago (ka)1. In the standard view, the limited resources and ecological vulnerabilities of small islands, coupled with the technological challenges of long-distance seafaring, meant that hunter-gatherers were either unable or unwilling to make these journeys2,3,4. Here we describe chronological, archaeological, faunal and botanical data that support the presence of Holocene hunter-gatherers on the Maltese islands. At this time, Malta's geographical configuration and sea levels approximated those of the present day, necessitating seafaring distances of around 100 km from Sicily, the closest landmass. Occupations began at around 8.5 ka and are likely to have lasted until around 7.5 ka. These hunter-gatherers exploited land animals, but were also able to take advantage of marine resources and avifauna, which helped to sustain these groups on a small island. The emergence of long-distance seafaring varies considerably around the globe, with an early appearance in Southeast Asia and Sahul seemingly not replicated until later in other regions, such as the islands off the African coast5,6,7,8,9. With a sea crossing of around 100 km from Sicily, and around three times as far to the Maghreb, the Maltese Archipelago is among the most remote groups of islands in the Mediterranean, the world's largest inland sea (Fig. Around 13 ka, sea-level rise rapidly submerged the hypothesized low-lying land bridge from Malta to Sicily, which now lies around 95 m below sea level.
Over the next few thousand years, both Sicily and the Maltese islands reached their current configurations, with Malta now having a combined landmass of just 316 km² (ref. Like other small Mediterranean islands, and particularly given its semi-arid climate, Malta was inferred to have been too small and remote to support human populations before the adoption of farming and more advanced seafaring technology (see Supplementary Information 1 for discussion). The general consensus has been that hunter-gatherers only journeyed to Mediterranean islands that were large and/or easy to reach, for example through chains of connecting islands, proximity to the mainland or favourable currents1,2,11 (Supplementary Information 1).
Figure caption (partial): Bottom left, digital elevation model of Latnija, showing the current dripline as dashed lines. Bottom right, the site, showing the sea channel and Gozo in the background, with past sea levels based on a previous study49. LGM, Last Glacial Maximum; MASL, metres above sea level.
Previous research has supported this view, with the evidence suggesting that the first people to reach Malta were Neolithic farmers, associated with impressed ware pottery, stemming from the Sicilian ‘Stentinello’ phase of the Neolithic12,13,14. These farmers were assumed to have introduced crops and domesticated and commensal fauna into a pristine island ecosystem14. The directly dated and secure evidence for the start of the Neolithic in Malta indicates an age of around 7.4 ka (ref. This age is also consistent with our own chronological model (Methods, Supplementary Information 1 and Extended Data Fig. 1), which is based on an extensive database of radiocarbon dates with good contextual information and indicates that the earliest Neolithic in southern Italy and Sicily dates to around 7.9–7.5 ka, and that it began later in Malta, at around 7.4–7.1 ka.
Although occasional claims for an earlier Neolithic in both Sicily and Malta have been made, they are problematic because they rely on radiocarbon dates and age models with high levels of uncertainty, and they are inconsistent with the regional chronology outlined above (Supplementary Information 1 and Extended Data Figs. Claims of a far earlier Pleistocene human presence on Malta have also been made17,18, but they have so far failed to stand up to scrutiny on morphological and chronological grounds (Supplementary Information 1). Here we provide decisive evidence for a pre-Neolithic human presence on the Maltese islands, in the form of a previously unknown Mesolithic phase characterized by the presence of Holocene hunter-gatherers. This discovery casts new light on the age and extent of Mesolithic sea crossings in the Mediterranean, and on hunter-gatherer interactions with endemic island fauna. Joint investigations led by the Max Planck Institute of Geoanthropology and the University of Malta have unravelled a deep archaeological sequence at the site of Latnija (Lat-nee-yuh). The site is located in a large doline in the Mellieħa area of northern Malta (Fig. 1), in the vicinity of several freshwater sources and close to a coastline that has both sandy beaches and rocky shorelines19. Crucially, the findings from Latnija also reveal the longest sea crossing yet documented in the Mediterranean by hunter-gatherers, highlighting the considerable seafaring abilities of late European hunter-gatherers. Even in the subsequent Neolithic, there are only occasional indications of such long sea crossings in the Mediterranean20. Our findings upend the established notion that small and remote islands were beyond reach in the Mesolithic world. We divided the trench plan into an alphanumeric grid of 1-m² squares (J–N, 2–6; Fig. 1) and recorded the position of all artefacts and bones larger than 20 mm in three dimensions using a total station.
We describe the excavated sequence in six phases (labelled Phase I–VI from top to bottom), defined by distinct differences in depositional processes (Supplementary Information 2) and material culture. The base of our excavated sequence (Phase VI; Fig. 2, Beds 15–13) comprises a naturally formed, fine-grained cave sediment, pale orange to pink in colour (dominated by fine sands and silts), on top of sloping boulders. The character of the deposits in Phase VI is in stark contrast to that of the overlying deposits in Phases V–III, in which ash, fauna and shell-rich sediments provide conspicuous evidence for anthropic activity; we refer to these deposits as the Mesolithic Horizon (Fig.
Figure caption (partial): Top, illustration of the key stratigraphic sequence (numbered Beds are described in Supplementary Information 2), highlighting a thick bed of ash (A; bottom left) and a hearth deposit or combustion structure (B; bottom right), with combustion residue (ash on top), thermal impact zone and natural substrate (Supplementary Information 3), at the base of the Mesolithic Horizon. Note also the Phorcus turbinatus tip line, starting in the mid-right of box A.
The earliest Mesolithic deposit in Phase V (Fig. 2, Beds 12–10) is marked by discrete hearth features, overlain by a bed of grey ash-rich sediment of varying thickness, a rich faunal assemblage and stone tools. The lowest hearth, from N2 (Phase V; Fig. This includes a heterogeneous light-grey ashy combustion residue of varying thickness (typically 6–12 cm) overlying a homogeneous brown-to-black thermally impacted sediment, which are distinct from the underlying Phase VI substrate (Fig. Fourier transform infrared spectroscopy (FTIR) analyses confirmed that combustion residues are composed mainly of pyrogenic calcite (ash), with some thermally altered clay, which might have been introduced between burning episodes; more limited pyrogenic alteration is evident in the thermal impact zone (Supplementary Information 3).
Analyses showed that higher concentrations of phytoliths were present in the same samples in which ash was documented, compared with other parts of the combustion structure (thermal impact zone) and control samples (Supplementary Information 3). This indicates that the phytoliths (described below) reached the site as a result of an anthropic contribution, either as fuel or in connection with use of the combustion structure. Micromorphological analysis of the sediments directly below the hearth feature in N2 shows evidence for enhanced reddening and enrichment in iron oxides relative to the natural cave floor sediments (Methods and Supplementary Information 2). In L2 and M2, micromorphology and detailed sediment analysis indicate a more complex relationship between deposits that are rich in combustion products and the sediments that underlie them; some show evidence for in situ burning, whereas others indicate erosion or cutting into the underlying sediments and the localized remobilization and redistribution of ash-rich materials (Supplementary Information 2). The onset of episodes of cave-wall collapse is observed at the top of this ash-rich deposit, marked by a clast-dominant layer close to the cave wall that grades to finer sediments beyond the dripline and also contains fauna and artefacts (Phase IV; Fig. A subcircular pit (Phase III; Fig. 2, Beds 9–5) was dug through this layer, truncating the top of the Phase V deposits; this pit contains discrete dumps of marine shells and ashes (Fig. The Mesolithic Horizon is sealed by more conspicuous episodes of cave-wall collapse, including both clast- and matrix-dominated cave sedimentation that contains artefacts attesting to later prehistoric and historic (Phase II; Fig. 2, Beds 4–1) and modern (Phase I) occupations.
We selected samples for chronometric dating to constrain the age range of key sedimentary deposits and the boundaries of major sediment phases, and we also dated shells of the edible marine gastropod Phorcus turbinatus (n = 49) that had been accumulated by humans. A total of 32 dates on charcoal, obtained using accelerator mass spectrometry, were used to constrain the different phases at the site. One additional date was obtained on bone; all other attempts to date bone failed because of insufficient collagen (Supplementary Information 4). These dates were then calibrated and used in a Bayesian phase model to estimate the boundaries between depositional phases (Fig. The P. turbinatus shell dates were corrected for the marine reservoir effect (MRE) and calibrated ages were calculated (Methods and Supplementary Information 4). The P. turbinatus shells range from around 8.6 ka to 7.5 ka, supporting the charcoal age model. Crucially, the limited variability of these shell ages supports the intact stratigraphic character of the Mesolithic Horizon, a feature particularly visible in the conspicuous tip lines in Phase III (Fig.
Figure caption (partial): Laboratory codes are included in the left box.
A total of 64 lithics (knapped stone tools) were recovered from the Mesolithic Horizon (Phase V–III) deposits (Supplementary Information 5). Except for one chert artefact, all stone tools were made of limestone, much of which was clearly procured in the form of beach cobbles or pebbles, with the remainder sourced from terrestrial outcrops. This contrasts with younger, Neolithic assemblages from Malta, which are made from chert (both local and imported) and small amounts of imported obsidian14,23,24. Cores, blades, bladelets and retouched tools are rare in the Latnija Mesolithic assemblage, which is instead focused on simple flakes produced by hard-hammer percussion. The main reduction products were squat and often cortical flakes, with generally unidirectional dorsal scar patterns.
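Before calibration, a radiocarbon measurement such as an AMS date is expressed as a conventional radiocarbon age. The sketch below shows only that first step, using the standard Libby mean-life of 8,033 years (equivalent to the 5,568-year Libby half-life) and a hypothetical fraction-modern value. It deliberately stops short of calibration, the marine reservoir correction for shells and Bayesian phase modelling, which in practice rely on calibration curves such as IntCal20 or Marine20 and dedicated software such as OxCal.

```python
import math

# Libby mean-life in years. Conventional radiocarbon ages are reported with this
# value (equivalent to a 5,568-year half-life), by convention.
LIBBY_MEAN_LIFE_YR = 8033.0


def conventional_radiocarbon_age(f14c: float) -> float:
    """Uncalibrated radiocarbon age (14C yr BP) from fraction modern carbon.

    Assumes f14c is already normalized for isotopic fractionation (delta-13C).
    The result is not a calendar age: calibration is a separate step.
    """
    if f14c <= 0:
        raise ValueError("F14C must be positive")
    return -LIBBY_MEAN_LIFE_YR * math.log(f14c)


def f14c_from_age(age_bp: float) -> float:
    """Inverse of conventional_radiocarbon_age, useful for round-trip checks."""
    return math.exp(-age_bp / LIBBY_MEAN_LIFE_YR)


# Hypothetical measurement, not a value from the Latnija dataset.
sample_f14c = 0.40
age = conventional_radiocarbon_age(sample_f14c)
print(f"F14C = {sample_f14c:.2f} -> {age:.0f} 14C yr BP (uncalibrated)")
```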
In contrast to penecontemporaneous assemblages from Sicily and other adjacent areas, which generally exhibit complex technologies and geometric forms (for example, trapezes), the lithic material from Latnija most resembles the relatively expedient Mesolithic lithic technology of Sardinia25 (Supplementary Information 1 and 5). The simple character of the Latnija lithic assemblage might reflect the poor quality of the limestone used and expediency, but could also reflect other factors, including demographic ones such as small population size and isolation. A total of 955 piece-plotted specimens (larger than 20 mm) from the Mesolithic Horizon were recorded during the 2021 and 2022 seasons, in addition to many smaller fragments recovered during sieving and flotation (Supplementary Information 6). The fauna is all wild, and overall is dominated by red deer (Cervus elaphus), birds and marine gastropods (P. turbinatus in particular, but also limpets), with the latter so far comprising some 10,000 shells (Fig. Small numbers of reptiles (for example, turtles and tortoises), fish (for example, groupers), crustaceans (crabs), echinoderms (sea urchins) and marine mammals (seals) were also found (Fig. In line with the extensive evidence for anthropic combustion, around 25% of taphonomically studied faunal remains, including those of red deer, birds and tortoises, as well as the marine gastropods, had evidence of burning or charring (Supplementary Information 6). Although a detailed taphonomic analysis is ongoing, other traces of anthropogenic activity can also be observed, including probable percussion notches and green fracturing.
Figure caption (partial): a–q, Selected fauna and lithics from the Mesolithic Horizon: limestone flakes (a–d,f,g), red deer left mandible (e) and metatarsal (q), Phorcus turbinatus (h), Patella sp. (i), crab claw (k), turtle or tortoise carapace (l,p), fish vertebra (m), seal proximal phalanx (n), bird humerus (j) and coracoid (o). Scale bar, 50 mm (applies to all panels). This includes terrestrial animals and marine mammals. s, Percentage of the number of identified specimens (NISP) of fish and marine invertebrates from squares L2 and N2 recovered during wet sieving and flotation.
The use of marine resources, including not only small gastropods and crustaceans but also large marine mammals, matches well with subsistence behaviours observed at other Mesolithic sites in the Mediterranean26,27,28. Notably, studies of Neolithic and younger sites in Malta have uncovered little evidence for marine resource exploitation, and archaeological and isotopic studies suggest that people had diets focused mostly on terrestrial resources, including livestock and wild and domesticated plants15,29. The Mesolithic deposits at Latnija therefore represent a unique level of marine resource engagement in Malta and a substantially different diet from that of later farming communities. Archaeobotanical analyses were further used to understand the environmental context of the Mesolithic Horizon. Grasses were abundant and are represented by many different phytolith morphologies (Supplementary Information 3). Most of the grasses correspond to C3 types, although phytolith morphologies ascribed to C4 plants are also present. Pollen analysis of two samples from Phase V provides evidence of an open shrub vegetation consisting of Erica multiflora and Euphorbia melitensis, with patches of Pistacia lentiscus shrub communities occupying areas in which higher moisture levels were present and soil development occurred. Macrobotanical samples were recovered from the systematic flotation of sediments (Methods). Seeds of a few small, wild herbaceous plants were identified, including a small-seeded grass (Poaceae), small-seeded legumes and seeds of a member of the Chenopodioideae, as well as Mercurialis cf.
All of these plants grow wild on Malta today and might have been introduced to the site either by the burning of brush or through natural processes, such as the activity of rodents or birds, and then inadvertently burned with cave sediments. Complementing the phytolith and macrobotanical data, the charcoal analyses reflect a shrubby vegetation adapted to the island environment, characterized by an open scrubland dominated by Pistacia cf. lentiscus, Juniperus and Tetraclinis, among other shrubs, similar to that of the present day. Together, these data indicate vegetation communities typical of the Early to Middle Holocene in the Central Mediterranean region, which have been linked to the onset of more humid climate conditions30,31,32. These observations were further complemented by isotopic analyses of ungulate and rodent teeth from the site (Methods and Supplementary Information 3), which indicate a fairly stable mixture of dry C3 grassland, scrubland and woodland. This probably corresponds to the indigenous Chamaerops humilis (Mediterranean fan palm)33. Chamaerops humilis and other palms have a wide range of uses, from textiles to construction materials and food. However, the greater presence of these morphotypes in the samples related to the combustion residue seems to indicate that they were also used as fuel. Anthracological analyses revealed that the most common fuel was Pistacia cf. Wild seeds of grasses and a few other low-growing herbaceous plants were recovered in a carbonized state, representing either the burning of vegetation around the site or the construction of a hearth on top of seed-laden sediments. The evidence from Latnija confirms a Mesolithic occupation of the Maltese islands from around 8.5 ka to 7.5 ka, which differs markedly from younger, agro-pastoral societies in technology, raw materials, diet and subsistence practices.
The earliest Mesolithic arrivals, on what we presume were dugout canoes, date to a time when Malta had almost reached its current configuration, which today has a minimum straight-line distance of around 85 km to Sicily34,35. However, sea surface currents and prevailing winds, as well as the use of landmarks, stars and other wayfinding practices, mean that the distances traversed by hunter-gatherers to Malta could have been considerably longer, and a crossing of about 100 km has been proposed for the Neolithic36,37,38,39 (Supplementary Information 1). In particular, any crossing from Sicily to Malta would have had to contend with the ocean current dynamics in the Malta Channel40. Experimental voyages on a replica of an Early Neolithic dugout canoe from La Marmotta (Italy) suggest that crossings of 50 km could be accomplished at a speed of about 4 km h⁻¹ (just over 2 knots)41, implying an outward summer sea journey that would have required all daylight hours plus an additional 8 h of darkness. In summer, drift caused by a southeasterly current of up to 2 knots would have extended this outward journey even further42. In antiquity, as well as in more recent historical periods, these conditions seem to have led sailing vessels to favour ports along the Gulf of Gela, rather than the point closest to Malta, as departure points from Sicily43. These findings therefore provide evidence of long-distance, open-water sea journeys far longer than any previously documented in the Mediterranean before the Neolithic and Bronze Age, the periods in which developments such as the invention of the sail occurred1. Such inter-island crossings fall into the category of ‘difficult routes’; evidence from elsewhere suggests that canoers would avoid the dangers of voyaging at night altogether44. The motivation for these long sea crossings remains ambiguous.
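The travel-time reasoning above rests on simple distance-over-speed arithmetic. The sketch below applies the experimental paddling speed of about 4 km h⁻¹ to the distances mentioned in the text and converts that speed to knots. It assumes a constant speed in still water, ignoring currents, wind and rest stops, and it does not split the total into daylight and darkness, because that depends on an assumed day length that the text does not give.

```python
KM_PER_NAUTICAL_MILE = 1.852  # 1 knot = 1 nautical mile per hour = 1.852 km/h
PADDLING_SPEED_KMH = 4.0      # approximate speed from the La Marmotta replica trials


def crossing_hours(distance_km: float, speed_kmh: float = PADDLING_SPEED_KMH) -> float:
    """Hours needed to cover distance_km at a constant speed in still water."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    return distance_km / speed_kmh


def kmh_to_knots(speed_kmh: float) -> float:
    """Convert km/h to knots (nautical miles per hour)."""
    return speed_kmh / KM_PER_NAUTICAL_MILE


# Distances mentioned in the text.
routes = [
    ("experimental voyage", 50),
    ("minimum straight line", 85),
    ("proposed crossing", 100),
]
for label, km in routes:
    print(f"{label:>22}: {km:3d} km -> {crossing_hours(km):.2f} h")

print(f"{PADDLING_SPEED_KMH} km/h is about {kmh_to_knots(PADDLING_SPEED_KMH):.2f} knots")
```

A current running against or across the route lowers the speed over ground, so any real crossing time would exceed these still-water figures, consistent with the point about drift in the Malta Channel.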
It might be that movement to Malta was driven by the availability of (perhaps seasonal) subsistence resources, catalysed by the slightly improved climate of the Early Holocene. These are both important in their own right, and also set the cultural and ecological scene for the transition to the Neolithic. The ability of Mesolithic hunter-gatherers to reach small and remote Mediterranean islands forces a re-evaluation of the capabilities and strategies of the last hunter-gatherers of the region. It also shows that Neolithic arrivals did not enter a pristine insular landscape on Malta, but rather an ecosystem that had been shaped by humans for centuries. Finally, the presence of Mesolithic hunter-gatherers on Malta raises the possibility of other long-distance connections. For example, technological similarities between contemporary Mesolithic and Epipaleolithic communities on the African and European sides of the Mediterranean have been noted45,46,47. The combination of several islands and their proximity to indented mainland shorelines has also led to the suggestion that the south-central Mediterranean and eastern Maghreb could have been a hub for early maritime activity in the region48. The methods used are described below, with further contextual information in the Supplementary Information and Extended Data Figs. Here, we describe the excavation of a 5 × 5-m trench, designated Trench 4, at Latnija between 2021 and 2023, expanding on a 1 × 1-m test trench excavated in 2019. We set up an alphanumeric grid system in the doline to label each individual 1 × 1-m square, aligned with the 2019 test trench and the nearby cave wall, with letters running along a SW–NE axis and numbers increasing along a NW–SE axis. Excavation followed a single-context recording methodology to distinguish discrete sediment units, with a single deposit arbitrarily subdivided into 5–10-cm spits where necessary to aid control of find recovery and sediment sampling.
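The grid convention described above, together with the 20-mm threshold for piece-plotting, can be expressed as a small lookup from local coordinates to square labels. This is a hypothetical sketch: it assumes a local trench frame in metres with its origin at the outer corner of square J2, letters running from J at the SW end to N at the NE end, and numbers running from 2 at the NW end to 6 at the SE end. The actual survey datum and total-station coordinate system at Latnija are not described here, and the find identifiers and coordinates are invented for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical local trench frame (metres): origin at the outer corner of square J2,
# x increasing along the SW-NE (letter) axis, y increasing along the NW-SE (number) axis.
LETTERS = "JKLMN"          # letter columns, assumed to run from J (SW) to N (NE)
NUMBERS = range(2, 7)      # numbered rows 2..6, assumed to run from NW to SE
SQUARE_SIZE_M = 1.0
PIECE_PLOT_MIN_MM = 20.0   # finds larger than this are recorded with the total station


@dataclass
class Find:
    find_id: str       # hypothetical identifier
    x_m: float         # distance from the origin along the SW-NE axis
    y_m: float         # distance from the origin along the NW-SE axis
    max_dim_mm: float  # largest dimension of the object


def square_label(x_m: float, y_m: float) -> str:
    """Return the label (e.g. 'L4') of the 1 x 1 m square containing a point."""
    col = math.floor(x_m / SQUARE_SIZE_M)
    row = math.floor(y_m / SQUARE_SIZE_M)
    if not (0 <= col < len(LETTERS) and 0 <= row < len(NUMBERS)):
        raise ValueError(f"point ({x_m}, {y_m}) lies outside the 5 x 5 m trench")
    return f"{LETTERS[col]}{NUMBERS[row]}"


def is_piece_plotted(find: Find) -> bool:
    """Apply the 20-mm threshold for three-dimensional recording."""
    return find.max_dim_mm > PIECE_PLOT_MIN_MM


finds = [  # hypothetical finds, not excavation data
    Find("F1", 0.4, 0.2, 35.0),
    Find("F2", 2.7, 3.1, 12.0),
    Find("F3", 4.9, 4.99, 48.0),
]
for f in finds:
    status = "piece-plotted" if is_piece_plotted(f) else "below threshold (sieve/flotation)"
    print(f.find_id, square_label(f.x_m, f.y_m), status)
```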
Features of post-depositional disturbance, such as animal burrows, were readily differentiated from undisturbed sediments owing to their mixed character and friable texture and the presence of sediment voids, and were excavated in their entirety and excluded from our analyses. Finer-scale post-depositional disturbance occurs as limited fine rooting and is restricted to the uppermost deposits. The natural deposition of clasts from the shelter wall presents an alternative form of potential post-depositional disturbance that might have led to localized soft-sediment deformation. The three-dimensional position of all artefacts larger than 20 mm, bones larger than 20 mm and charcoal, and the geometry of excavation context boundaries, were recorded using a total station. Bulk sediment sampling ranged from a minimum of 60 l per context (predominantly in the uppermost deposits) up to 100% of the sediments, which were processed by bucket flotation using a 250-µm mesh for macrobotanical recovery, followed by wet sieving through 5-mm screens for artefact recovery; sediments that were not retained for flotation and wet sieving were dry sieved through 5-mm screens. Additional sediment samples were recovered from each context for ancillary analyses. So far, we have identified 309 discrete sedimentary contexts, reaching a maximum depth of 1.48 m below the surface. We have grouped contexts into six phases (Phases I to VI) on the basis of major changes in sediment colour, texture, composition and structure, alongside patterns evident in material culture. The stratigraphic matrix for the Mesolithic Horizon and immediately underlying deposits is presented in Extended Data Fig. The laboratory samples were air-dried for two weeks and placed in labelled plastic pots. The samples were immersed in a mixture of clear casting resin and acetone in a ratio of four parts to one. To accelerate curing, methyl ethyl ketone peroxide was added as a catalyst (3 ml catalyst to 2,000 ml resin).
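The impregnation recipe above reduces to two ratios: resin to acetone at 4:1, and catalyst at 3 ml per 2,000 ml of resin. The helper below is a hypothetical convenience for scaling a batch; it assumes both ratios are by volume and that the catalyst ratio refers to the resin alone, as the wording suggests.

```python
RESIN_PARTS, ACETONE_PARTS = 4, 1          # resin : acetone, assumed by volume
CATALYST_ML_PER_ML_RESIN = 3 / 2000        # 3 ml catalyst per 2,000 ml resin


def impregnation_batch(total_mix_ml: float) -> dict[str, float]:
    """Split a resin + acetone mix volume and compute the catalyst to add."""
    parts = RESIN_PARTS + ACETONE_PARTS
    resin = total_mix_ml * RESIN_PARTS / parts
    acetone = total_mix_ml * ACETONE_PARTS / parts
    catalyst = resin * CATALYST_ML_PER_ML_RESIN  # catalyst scales with resin only
    return {"resin_ml": resin, "acetone_ml": acetone, "catalyst_ml": catalyst}


batch = impregnation_batch(2500)
print(batch)
```

For a 2,500 ml mix this gives 2,000 ml of resin, 500 ml of acetone and 3 ml of catalyst, matching the stated proportions.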
The samples were impregnated under a stepped-vacuum regime to a maximum vacuum of −25 inHg for eight hours. The samples were left to cure for around six weeks until the resin had hardened, followed by a final cure at 65 °C for 15 h. The blocks were removed from the sample frame and split along their long axes, and one surface was polished on fixed diamond abrasives of successively finer grades (70 µm, 45 µm and 20 µm). The polished sample was bonded to a labelled slide using an epoxy resin that cures overnight. The slide and sample were cut down to around 1 mm, and excess sample was then removed using a Jones and Shipman surface grinder. Thin sections were analysed on a Leica M205C petrological stereo zoom microscope, and images were captured using Image Pro-Express software. Plant remains from the Mesolithic Horizon at Latnija were studied through pollen analysis, anthracology, phytolith analysis of the hearth and identification of macrobotanical remains recovered through flotation. These analyses were performed to reconstruct the vegetation of the site, determine whether any domesticated plants were present, investigate the use of different fuels at the site and characterize mineral composition to identify combustion structures. For pollen analysis, we collected sediment samples to reconstruct past vegetation at and near Latnija. Sampling was performed in Phase V contexts (034) and (048), both of which are characterized by thick ash and combustion residue deposits. This approach was adopted to correlate the palaeobotanical remains preserved in the sediment with human activities during Phase V, which contains the oldest Mesolithic deposits. Samples were treated following pollen concentration techniques52.
This included sediment deflocculation with sodium pyrophosphate, the addition of Lycopodium tablets of known spore content to calculate palynomorph concentrations53 and sieving through a 7-µm nylon mesh to discard clay-sized particles. Carbonates were removed with 10% HCl, and the residue was concentrated by centrifugation at 2,500 rpm for three minutes. Heavy liquid separation using sodium metatungstate with a specific gravity of 2.0 and centrifugation at 1,500 rpm for 20 min was used to separate the organic and mineral fractions. After recovery of the upper supernatant fraction, this step was repeated to increase the concentration. The remaining fraction was treated overnight with cold 40% HF to eliminate the remaining silicates. The residue was stored in glycerol, mounted on slides and identified at 400× magnification under a transmitted-light microscope with reference to established literature54,55. Pollen was counted up to a sum of 250 identifiable grains. A pollen diagram (Extended Data Fig. 4a) showing the values for each taxon as percentages of the total pollen sum was plotted with C2 software56. For anthracological analyses, bucket flotation was used to recover charcoal and other carbonized archaeobotanical remains from the sediments, all of which were collected. Charcoal was also handpicked to provide a larger pool of specimens from which to select material for dating and anthracology. Images were taken with an environmental scanning electron microscope (FEI Quanta 600) after the charcoal was coated with gold. Each charcoal piece was manually fractured to expose the three anatomical sections of the wood (transverse, tangential and radial), which allowed us to identify diagnostic taxonomic characters. Wood anatomy atlases and a comparative collection at the Catalan Institute of Human Paleoecology and Social Evolution were used to support the identifications57,58. The assemblage includes a number of indeterminable fragments, owing to alterations of the wood anatomy (cracks and vitrification) and/or the small size of the fragments.
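The Lycopodium tablets mentioned above support the marker-spore method of absolute pollen analysis53: because a known number of exotic spores is added, the ratio of fossil pollen to marker spores counted scales up to a concentration. The sketch below assumes concentrations expressed per cm3 of sediment; the function names, spore total, counts and taxon labels are hypothetical illustration values, not data from Latnija. It also converts counts to percentages of the total pollen sum, the quantity plotted in the pollen diagram.

```python
"""Sketch of marker-spore pollen concentration and pollen-sum percentages.

All counts, volumes, spore totals and taxon labels used below are hypothetical.
"""


def pollen_concentration(pollen_counted: int, markers_counted: int,
                         markers_added: int, sample_volume_cm3: float) -> float:
    """Grains per cm3 = pollen counted x (markers added / markers counted) / volume."""
    if markers_counted <= 0 or sample_volume_cm3 <= 0:
        raise ValueError("marker count and sample volume must be positive")
    return pollen_counted * markers_added / (markers_counted * sample_volume_cm3)


def pollen_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each taxon as a percentage of the total pollen sum."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty pollen sum")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical: 250 pollen grains and 100 Lycopodium spores counted,
    # 20,000 spores added to a 5 cm3 subsample.
    print(pollen_concentration(250, 100, 20_000, 5.0))  # grains per cm3
    print(pollen_percentages({"taxon_A": 100, "taxon_B": 50, "taxon_C": 100}))
```

In practice the number of spores added comes from the batch information supplied with the tablets, multiplied by the number of tablets used.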
We analysed 24 samples that were collected during the 2022 fieldwork from a large combustion structure identified in square N2 at the base of Phase V (Fig. Phytoliths were extracted following the rapid extraction method59. Phytoliths were quantified and identified using a Zeiss Axioscope transmitted-light microscope at ×200 and ×400 magnification. Morphological identification followed the standard literature and modern plant reference collections33,60,61,62, and phytoliths were described using the terminology of the International Code for Phytolith Nomenclature (ICPN 2.0)63. Infrared spectra were collected in the 4,000–400 cm−1 wavenumber range at a resolution of 4 cm−1 using the conventional KBr pellet method. Thermally altered clay was identified on the basis of specific absorption peaks in the clay spectrum65, and the presence of anthropogenic or geogenic calcite was determined following previous studies66,67. The archaeobotanical samples from Latnija's Mesolithic Horizon were recovered during the 2021 and 2022 excavation seasons. Although we collected 100% of the sediment, not all samples from these phases contained plant macrofossils after flotation. Each sample was processed in the field using a basic bucket flotation method, as described previously68,69. The samples were then sent to the Max Planck Institute of Geoanthropology in Jena, Germany, for analysis. In the laboratory, samples were passed through nested U.S. Geological sieves to ease sorting. Material smaller than 0.50 mm was not sorted. Carbonized wood fragments larger than 2 mm were counted, although wood identification was a separate analysis and is reported above. Seeds and seed fragments were separated from all sieved contexts, and charred seeds were systematically collected. The identified taxa are presented in Supplementary Table 1.
Except for the bone samples, radiocarbon dating was performed at the Curt-Engelhorn-Centre Archaeometry (CEZA) in Mannheim, Germany. Samples included charcoal, seeds and marine shells. Bone samples were analysed at the University of Georgia Centre for Applied Isotope Studies (CAIS). We used a multistep chronological approach to robustly constrain the age of the Mesolithic Horizon at Latnija. First, we constructed a chronological framework for the site based on 31 charcoal samples and one bone sample (Supplementary Tables 2 and 3). Charcoal samples were selected from contexts directly underlying the Mesolithic Horizon to help constrain the onset of Mesolithic occupation, excluding samples from burrows that appear at the interface of major divides in sediment depositional processes (Phases VI–V) (see Extended Data Fig. In addition, charcoal samples were selected from contexts throughout the Mesolithic Horizon (Phases V–III), including direct sampling from hearths that appear at the base of Phase V (Fig. 2, Supplementary Information 2 and Extended Data Fig. To obtain independent verification of the integrity of the age model, we also targeted marine gastropods (P. turbinatus in particular), because they formed clear in situ tip lines in Phase III. Forty-nine samples of P. turbinatus were dated for this purpose. This number of samples was chosen because: (i) marine calibration is more complex than terrestrial calibration, so a larger sample size was required to account for the natural spread in the data; and (ii) these shells are a direct measure of human presence, because they were brought to the site by people. Charcoal samples were prepared using a standard acid–base–acid (ABA) pretreatment. This comprises an acid step with dilute hydrochloric acid to remove carbonates (calcite and lime) adhering to the sample, followed by a base step with dilute sodium hydroxide to remove soluble humic acids.
Because the base absorbs atmospheric CO2, a final acid step completes the pretreatment and removes this modern contamination. The samples are then combusted in an elemental analyser (MicroCube, Elementar), and the CO2 is collected and graphitized to elemental carbon. The shell samples are treated only with dilute acid to remove surface carbon contamination from limestone or calcite. For shell samples, the CO2 is extracted using phosphoric acid in an autosampler before graphitization, and measurements are the same as for the charcoal samples, as described in a previous study71. The bone sample was cleaned with a wire brush and washed in an ultrasonic bath. After cleaning, the sample was reacted under vacuum with 1 M HCl to dissolve the bone mineral and release CO2 from bioapatite. The residue was filtered, rinsed with deionized water and heated under slightly acidic conditions (pH 3) at 80 °C for six hours to dissolve the collagen and leave humic substances in the precipitate. The collagen solution was then filtered to isolate pure collagen and dried. The dried collagen was combusted at 575 °C in evacuated, sealed Pyrex ampoules in the presence of CuO. The resulting CO2 was cryogenically purified from the other reaction products and catalytically converted to graphite. Graphite 14C/13C ratios were measured using the CAIS 0.5 MeV accelerator mass spectrometer. The error is quoted as one standard deviation and reflects both statistical and experimental errors. The date has been corrected for isotope fractionation. As with the other terrestrial radiocarbon dates in this study, calibration was performed in OxCal 4.4 using IntCal20, as part of a phase model for the site. The marine shell dates were modelled using a phase model with a wide prior range for ΔR. Samples flagged by OxCal as outliers are presented in the table but were excluded from the next modelling step if the model could not accommodate them, that is, if their inclusion reduced the model agreement index (A) below 60%.
These outliers might reflect processes such as bioturbation. The results of the marine reservoir effect (MRE) calculations are shown in Supplementary Table 4, and the corrected dates for each P. turbinatus sample are shown in Supplementary Table 5 (see also Supplementary Information 4). Chronological models based on the radiocarbon dates were used to address the key question of whether the Latnija excavation sequence contains evidence of occupation that securely relates to human activity predating the available evidence for Neolithic habitation elsewhere on Malta and in the surrounding Mediterranean archaeological record. This was done by: (1) establishing the age of the Mesolithic deposits at Latnija; (2) determining when the wider regional Mesolithic-to-Neolithic transition is most likely to have occurred; and (3) assessing whether a sediment core (Salina Deep) extracted from Salina Bay in northeast Malta contains evidence for an early Neolithic occupation of Malta, while accounting for the high-energy depositional environment and the chronological uncertainty of the radiocarbon dates used to produce the associated age–depth models. All analyses were conducted in R and are fully replicable, with scripts, data and outputs contained in a GitHub repository along with further replication instructions (https://github.com/wccarleton/mesoneomalta). First, we used a standard archaeological phase model to determine start and end boundaries for the major depositional phases identified at Latnija. For this model, the excavation team constructed a general Harris matrix relating individual contexts to major phases of sediment deposition and artefact accumulation. Thirty-three radiocarbon samples (charcoal from short-lived local shrubs and one bone) recovered from these units were dated, and the dates were placed into an OxCal phase model to estimate phase boundary distributions. All phase boundaries were of the ‘sigma’ type.
This boundary type allows the tails of the distributions of events (dates) in abutting phases to overlap. Following previously published guidance72, we included a general outlier model alongside the phases, allowing the model to identify potential outliers (events with extreme dates relative to both their phases and the structure of the model as a whole). Next, we used a cleaned regional database of radiocarbon dates associated with securely identified Mesolithic and Neolithic sites or site components from Italy, Sicily, Corsica, Sardinia and Malta. We divided the dates by region and cultural association. Then, we used a simple OxCal phase model, in which the two phases were allowed to overlap, to estimate when the Mesolithic phase ended and the Neolithic phase began in each region (details in Supplementary Information 4). This flexibility reflects the fact that both phases refer to cultural traditions or packages that are known to have overlapped in space and time throughout the Mediterranean and that have well-established spatio-temporal trends. The Salina Deep core was previously argued to contain evidence for an early Neolithic in Malta because it contains finds such as pollen of domesticated cereals, estimated to date to around 8 ka on the basis of an age–depth model. However, the age–depth model used (Bchron), like many sophisticated sedimentation models, assumes monotonicity in the age–depth relationship, which we argue does not apply in the Salina Deep case. Although monotonicity is typically a good working assumption in low-energy depositional environments without evidence of disturbance, Salina Bay, both today and in the past, is a high-energy littoral and fluvial environment that is subject to frequent storms. The core itself contains evidence of marine ingression, and many of the radiocarbon dates indicate substantial sediment redeposition, with very old dates near the surface and segments showing a wide temporal spread of radiocarbon ages.
Together, this evidence suggests that monotonicity is a poor assumption for Salina Deep and, consequently, that the published age–depth model is unduly precise, because it cannot account for the wide variance in radiocarbon sample dates for many of the core's segments. To account for this, and to produce a model that better represents the empirical temporal variance, we used a Bayesian linear regression to model the age–depth relationship. The model captures the general trend in the available age–depth observations towards older dates with increasing depth, but it does not assume strict monotonicity, focusing instead on the broad age–depth relationship. We used a custom distribution (based on standard radiocarbon-date calibration) to add a measurement uncertainty component to the model, representing radiocarbon dating and calibration uncertainties. We also used Bayesian imputation to estimate dates, with full posterior uncertainty, for a sequence of undated sediment depths (see Supplementary Information 1 for further details). During the 2021 and 2022 field seasons, faunal remains greater than 20 mm in length were piece plotted using a total station, given a unique identifier and bagged. Smaller bone fragments, shells and other faunal remains were recovered through various methods, including an exhaustive programme of wet sieving, flotation and microscopic inspection of the 8-mm, 4-mm, 2-mm, 1-mm and 0.5-mm sieved sinks. Here we present a preliminary taxonomic and taphonomic analysis of this faunal material; a full analysis of all remains recovered during excavation is currently underway. Bones were identified to skeletal element and, for the most part, to broad taxonomic categories (for example, fish and birds), aided by the relevant literature73,74,75,76, online resources and comparative material housed at the University of Malta.
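Returning to the Salina Deep chronology, the sketch below illustrates the core idea of a Bayesian linear age–depth regression in its simplest conjugate form. It is not the authors' model (their R scripts and custom calibration-based likelihood are in the GitHub repository cited above): here each calibrated age is approximated by a Gaussian mean and standard deviation, a fixed extra scatter term (`extra_sd`, hypothetical) stands in for redeposition, and the priors on intercept and slope are vague Gaussians. Individual dates may fall out of stratigraphic order without breaking the model, and predictions at undated depths carry posterior uncertainty, loosely analogous to the imputation step.

```python
import numpy as np


def fit_age_depth(depth_cm, age_bp, age_sd, extra_sd=0.0,
                  prior_mean=(0.0, 0.0), prior_sd=(1e4, 1e3)):
    """Conjugate posterior for age = b0 + b1 * depth with known Gaussian noise.

    Each observation's variance is its (Gaussian-approximated) calibrated-age
    variance plus extra_sd**2, a hypothetical term for redeposition scatter.
    Returns the posterior mean vector [b0, b1] and its 2x2 covariance.
    """
    depth = np.asarray(depth_cm, dtype=float)
    age = np.asarray(age_bp, dtype=float)
    noise_var = np.asarray(age_sd, dtype=float) ** 2 + extra_sd ** 2
    X = np.column_stack([np.ones_like(depth), depth])   # design matrix
    W = np.diag(1.0 / noise_var)                         # observation precisions
    prior_prec = np.diag(1.0 / np.asarray(prior_sd, dtype=float) ** 2)
    mu0 = np.asarray(prior_mean, dtype=float)
    post_cov = np.linalg.inv(X.T @ W @ X + prior_prec)
    post_mean = post_cov @ (X.T @ W @ age + prior_prec @ mu0)
    return post_mean, post_cov


def predict_age(depth_cm, post_mean, post_cov, extra_sd=0.0):
    """Posterior-predictive mean and SD of age at (possibly undated) depths."""
    d = np.atleast_1d(np.asarray(depth_cm, dtype=float))
    Xn = np.column_stack([np.ones_like(d), d])
    mean = Xn @ post_mean
    # Variance = x^T Sigma x (parameter uncertainty) + extra scatter
    var = np.einsum("ij,jk,ik->i", Xn, post_cov, Xn) + extra_sd ** 2
    return mean, np.sqrt(var)


if __name__ == "__main__":
    # Hypothetical dates: roughly 10 years per cm, plus one old,
    # redeposited date near the top of the core.
    depths = [10, 50, 100, 150, 200, 250]
    ages = [2600, 1000, 1500, 2000, 2500, 3000]
    sds = [40, 40, 40, 40, 40, 40]
    beta, cov = fit_age_depth(depths, ages, sds, extra_sd=300)
    mean, sd = predict_age([75, 175], beta, cov, extra_sd=300)
    print(beta, mean, sd)
```

Raising `extra_sd` widens the predictive intervals, which is the qualitative behaviour the text argues for: a disturbed core should yield broad rather than unduly precise age estimates.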
The taphonomic analysis focused on identifying bone fractures and surface modifications, such as burning, butchery marks (for example, cut marks) and carnivore damage (for example, gnawing), following standard protocols77,78,79,80. Remains are reported as the number of specimens (NRSP) and number of identified specimens (NISP), following a previous report81. NRSP includes all skeletal remains (bones and teeth) included in this study, whereas NISP comprises all skeletal elements (bones and teeth) identified at least to class. In addition to the piece-plotted bone, we report the complete counts of marine fauna for two excavation squares (L2 and N2), including both material that was recovered and bagged directly during excavation and material from wet sieving and flotation. Because very different sediment volumes were exposed for the different phases, we initially focus on these two squares, which offer a good sequence through the phases, to showcase the marine component at the site. Nineteen samples, comprising 12 wood mouse (Apodemus sylvaticus) and 7 red deer (Cervus elaphus), were selected for δ13C and δ18O isotope analysis of tooth enamel (Supplementary Table 7). For red deer, molar teeth were targeted for analysis, although the sample set includes one red deer premolar. Because some of these samples are non-overlapping teeth, some pseudo-sampling (sampling the same individual more than once) may have occurred. For wood mouse, whole molar and incisor teeth were used to ensure that the minimum sample size for stable isotope analysis was met. Before sampling, red deer teeth were cleaned by gentle abrasion with a diamond-tipped drill to remove any adhering material. For wood mouse, as much dentine as possible was removed using a drill before the remaining whole teeth were crushed using a mortar and pestle, which was cleaned with 70% ethanol between samples.
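The δ13C and δ18O values of these enamel samples were normalized with a three-point calibration against international carbonate standards, whose accepted values are listed in the next paragraph. One common way to implement such a normalization, assumed here rather than taken from the authors' workflow, is to fit a least-squares line of accepted against measured values for the standards and apply it to each sample. The sketch uses the δ13C values quoted for IAEA-603, IAEA-CO-8 and NBS 18; the "measured" standard readings and the sample value are hypothetical.

```python
"""Sketch of a three-point linear normalization of δ13C values (‰ VPDB).

Accepted values are those quoted in the text; the raw 'measured' values
are hypothetical instrument readings for illustration only.
"""

ACCEPTED_D13C = {"IAEA-603": 2.5, "IAEA-CO-8": -5.8, "NBS 18": -5.014}


def fit_normalization(measured: dict[str, float],
                      accepted: dict[str, float]) -> tuple[float, float]:
    """Least-squares line mapping measured to accepted values: (slope, intercept)."""
    names = sorted(accepted)
    x = [measured[name] for name in names]
    y = [accepted[name] for name in names]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standards must span a range of measured values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize(raw_value: float, slope: float, intercept: float) -> float:
    """Apply the calibration line to a raw sample value."""
    return slope * raw_value + intercept


if __name__ == "__main__":
    # Hypothetical raw readings, offset and compressed relative to accepted values.
    measured = {name: (value - 0.1) / 1.02 for name, value in ACCEPTED_D13C.items()}
    slope, intercept = fit_normalization(measured, ACCEPTED_D13C)
    print(round(slope, 4), round(intercept, 4))  # 1.02 0.1
    print(round(normalize(-12.0, slope, intercept), 3))
```

The same functions apply to δ18O by swapping in the accepted δ18O values of the standards.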
To remove organic or secondary carbonate contaminants, all samples were pretreated by soaking in 0.1 M acetic acid for 10 min, followed by three rinses in purified water82,83. After reaction with 100% phosphoric acid, the evolved gases were analysed using a Thermo GasBench II connected to a Thermo Delta V Advantage mass spectrometer housed at the Department of Archaeology at the Max Planck Institute of Geoanthropology. Carbon and oxygen isotope compositions are reported as δ values, that is, the deviation of the ratio of heavier to lighter isotopes (13C/12C or 18O/16O) from that of the international standard Vienna Peedee Belemnite (VPDB), in parts per thousand (‰). δ13C and δ18O values were normalized using a three-point calibration against the international standards IAEA-603 (δ13C = 2.5‰, δ18O = −2.4‰), IAEA-CO-8 (δ13C = −5.8‰, δ18O = −22.7‰) and IAEA NBS 18 (δ13C = −5.014‰, δ18O = −23.2‰), as well as the in-house standard USGS44 (δ13C = −42.2‰). Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Data required to reproduce the chronological models are available at https://github.com/wccarleton/mesoneomalta, and are archived with Zenodo at https://doi.org/10.5281/zenodo.14192393 (ref. Code required for reproducing the chronological models is available at https://github.com/wccarleton/mesoneomalta, and is archived with Zenodo at https://doi.org/10.5281/zenodo.14192393 (ref. Broodbank, C. The origins and early development of Mediterranean maritime activity. Cherry, J. F. & Leppard, T. P. Patterning and its causation in the Pre-Neolithic colonization of the Mediterranean islands (Late Pleistocene to Early Holocene). Clarkson, C. et al. Human occupation of northern Australia by 65,000 years ago. Gaffney, D. Pleistocene water crossings and adaptive flexibility within the Homo genus. Global patterns in island colonization during the Holocene. Mitchell, P. African Islands: A Comparative Archaeology (Routledge, 2022).
O'Connor, S., Kealy, S., Reepmeyer, C., Samper Carro, S. C. & Shipton, C. Terminal Pleistocene emergence of maritime interaction networks across Wallacea. Foglini, F. et al. in Geology and Archaeology: Submerged Landscapes of the Continental Shelf (eds Harff, J. et al.) 77–95 (Geological Society of London, 2015). Leppard, T. P. Process and dynamics of Mediterranean neolithization (7000–5500 BC). Evans, J. Prehistoric Antiquities of The Maltese Islands: A Survey (Athlone Press, 1971). Malone, C. et al. Temple Places: Excavating Cultural Sustainability in Prehistoric Malta (McDonald Institute for Archaeological Research, 2020). & Malone, C. in Temple Places: Excavating Cultural Sustainability in Prehistoric Malta (eds Malone, C. et al.) 27–38 (McDonald Institute for Archaeological Research, 2020). Binder, D. et al. Modelling the earliest north-western dispersal of Mediterranean impressed wares: new dates and Bayesian chronological model. Discovery of Neanderthal man in Malta. & Mifsud, S. Dossier Malta—Evidence for the Magdalenian (Proprint, 1997). Natura 2000 Management Planning for Terrestrial Sites in Malta and Gozo https://era.org.mt/topic/natura-2000-management-planning-for-terrestrial-sites-in-malta-gozo (2016). Freund, K. P. & Batist, Z. Sardinian obsidian circulation and early maritime navigation in the Neolithic as shown through social network analysis. The black layer of Middle Palaeolithic combustion structures. Aldeias, V. Experimental approaches to archaeological fire features and their behavioral relevance. Vella, C. Manipulated connectivity in island isolation: Maltese prehistoric stone tool technology and procurement strategies across the fourth and third millennia BC. Groucutt, H. S. Maltese chert: an archaeological perspective on raw material and lithic technology in the central Mediterranean. Lo Vetro, D. & Martini, F. Mesolithic in Central–Southern Italy: overview of lithic productions. Mannino, M. A. et al.
Climate-driven environmental changes around 8,200 years ago favoured increases in cetacean strandings and Mediterranean hunter-gatherers exploited them. Starkovich, B. M., Munro, N. D. & Stiner, M. C. Terminal Pleistocene subsistence strategies and aquatic resource use in southern Greece. Yu, H. et al. Genomic and dietary discontinuities during the Mesolithic and Neolithic in Sicily. McLaughlin, R. et al. in Temple People: Bioarchaeology, Resilience and Culture in Prehistoric Malta (eds Stoddart, S. et al.) 295–302 (McDonald Institute for Archaeological Research, 2022). Tinner, W. et al. Holocene environmental and climatic changes at Gorgo Basso, a coastal lake in southern Sicily, Italy. Calò, C. et al. Spatio-temporal patterns of Holocene environmental change in southern Sicily. Djamali, M. et al. Vegetation dynamics during the early to mid-Holocene transition in NW Malta, human impact versus climatic forcing. García-Granero, J. J. et al. A long-term assessment of the use of Phoenix theophrasti Greuter (Cretan Date Palm): the ethnobotany and archaeobotany of a neglected palm. Caruso Fermé, L., Mineo, M., Remolins, G., Mazzucco, N. & Gibaja, J. F. Navigation during the early Neolithic in the Mediterranean area: study of wooden artifacts associated with dugout canoes at La marmotta (Lago di Bracciano, Anguillara Sabazia, Lazio, Italy). & Gildor, H. Neolithic voyages to Cyprus: wind patterns, routes, and mechanisms. The orientations of prehistoric temples in Malta and Gozo. Bedford, S. & Spriggs, M. Debating Lapita: Distribution, Chronology, Society and Subsistence (ANU Press, 2019). Reyes-Suarez, N. C. et al. Sea surface circulation structures in the Malta–Sicily Channel from remote sensing data. Heikell, R. Mediterranean Cruising Handbook (Imray, Laurie, Norie & Wilson, 1998). Tanasi, D. & Vella, N. C. in The Cambridge Prehistory of the Bronze and Iron Age Mediterranean (eds Knapp, A. Do stormy seas lead to better boats? Tixier, J. 
Typologie de L'Épipaleolithique Du Maghreb (Arts et Métiers Graphiques, 1963). Laplace, G. Recherches Sur l' Origine et l'Évolution Des Complexes Leptolithiques (De Boccard, 1966). Broodbank, C. & Lucarini, G. The dynamics of Mediterranean Africa, ca. & Sambridge, M. Sea level and global ice volumes from the Last Glacial Maximum to the Holocene. The significance of intestinal parasite remains in pollen samples from medieval pits in the Piazza Garibaldi of Parma, Emilia Romagna, northern Italy. Stockmarr, J. Tablets with spores used in absolute pollen analysis. Leitfaden Der Pollenbestimmung Für Mitteleuropa Und Angrenzende Gebiete (Verlag Friedrich Pfeil, 2004). User guide, version 1.5 http://www.campus.ncl.ac.uk/staff/Stephen.Juggins/software/C2Home.htm (Newcastle University, 2007). Wheeler, E. A. InsideWood—a web resource for hardwood identification. Schweingruber, F. H. Anatomy of European Woods: An Atlas for the Identification of European Trees, Shrubs and Dwarf Shrubs (Verlag Kessel, 1990). Rapid phytolith extraction for analysis of phytolith concentrations and assemblages during an excavation: an application at Tell es-Safi/Gath, Israel. Piperno, D. R. Phytoliths: A Comprehensive Guide for Archaeologists and Paleoecologists (AltaMira Press, 2006). Twiss, P. C. in Phytolith Systematics: Emerging Issues (eds Rapp, G. & Mulholland, S. C.) 113–128 (Plenum Press, 1992). Neumann, K. et al. International code for phytolith nomenclature (ICPN) 2.0. Weiner, S. Microarchaeology Beyond the Visible Archaeological Record (Cambridge Univ. Berna, F. et al. Sediments exposed to high temperatures: reconstructing pyrotechnological processes in Late Bronze and Iron Age Strata at Tel Dor (Israel). Poduska, K. M. et al. Decoupling local disorder and optical effects in infrared spectra: differentiating between calcites with different origins. Regev, L., Poduska, K. M., Addadi, L., Weiner, S. & Boaretto, E. 
Distinguishing between calcites formed by different mechanisms using infrared spectrometry: archaeological applications. & Warinner, C. Method and Theory in Paleoethnobotany (Univ. Pearsall, D. M. Paleoethnobotany: A Handbook of Procedures (Routledge, 2016). & Wacker, L. MAMS—a new AMS facility at the Curt-Engelhorn-Centre for Archaeometry, Mannheim, Germany. The local marine reservoir effect at Kalba (UAE) between the Neolithic and Bronze Age: an indicator of sea level and climate changes. Bronk Ramsey, C. Dealing with outliers and offsets in radiocarbon dating. Schmid, E. Atlas of Animal Bones (Elsevier, 1972). Pales, L. & Lambert, C. Atlas Osteologique pour Servir a l'Identification des Mammiferes du Quaternaire (Centre National de la Recherche Scientifique, 1971). Lister, A. M. The morphological distinction between bones and teeth of fallow deer (Dama dama) and red deer (Cervus elaphus). & Lapham, H. A. Assessing the reliability of criteria used to identify postcranial bones in sheep, Ovis, and goats, Capra. Shipman, P., Foster, G. & Schoeninger, M. Burnt bones and teeth: an experimental study of color, morphology, crystal structure and shrinkage. Villa, P. & Mahieu, E. Breakage patterns of human long bones. & Andrews, P. Atlas of Taphonomic Identifications (Springer, 2016). Sponheimer, M. et al. Hominins, sedges, and termites: new carbon isotope data from the Sterkfontein valley and Kruger National Park. Lee-Thorp, J. et al. Isotopic evidence for an early shift to C4 resources by Pliocene hominins in Chad. This article is dedicated to the memory of Christopher Foyle, who did so much to encourage, support and drive investigations into Maltese prehistory. Without him, this project would never have happened. We are grateful to K. Farrugia and K. Borda for their support over the years. We thank Heritage Malta, in particular S. Sultana, for assistance and support in this work; the students who assisted in excavation and flotation; B.
Restall for advice regarding fresh water sources; H. Russ for assistance identifying grouper remains; and S. O'Reilly and H. Sell for assistance with the main-text figures. ), the European Research Council ‘IslandLab’ project grant no. ), the Max Planck Society ‘Lise Meitner’ Excellence Scheme (E.M.L.S. ), the University of Malta Research Excellence Award grant 202103 (N.C.V., R.G. ), the Spanish Ministry of Science and Innovation ‘María de Maeztu’ program for Units of Excellence in R&D grant CEX2019-000945-M (E.A.) Open access funding provided by Max Planck Society. Human Palaeosystems Group, Max Planck Institute of Geoanthropology, Jena, Germany Eleanor M. L. Scerri, James Blinkhorn, Andrés Currás, Margherita Colucci, Johanna Kutowsky, Mario Mata-González & Khady Niang Eleanor M. L. Scerri, Huw S. Groucutt, Mario Mata-González, Nicolette Mifsud & Nicholas C. Vella Eleanor M. L. Scerri, Huw S. Groucutt & Andreas Maier Australian Research Centre for Human Evolution, Griffith University, Brisbane, Queensland, Australia Institut Català de Paleoecologia Humana i Evolució Social (IPHES-CERCA), Tarragona, Spain Departament d'Història i Història de l'Art, Universitat Rovira i Virgili (URV), Tarragona, Spain W.
Christopher Carleton, Amy Hatton & Patrick Roberts Domestication and Anthropogenic Evolution Research Group, Max Planck Institute of Geoanthropology, Jena, Germany Superintendence of Cultural Heritage, Valletta, Malta isoTROPIC Research Group, Max Planck Institute of Geoanthropology, Jena, Germany Correspondence to Eleanor M. L. Scerri, James Blinkhorn or Nicholas C. Vella. The authors declare no competing interests.
Nature thanks Cyprian Broodbank, Dylan Gaffney, Carlos Duarte Simões, Sahra Talamo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. OxCal phase modelling of the estimated start and end dates for the Mesolithic and Neolithic phases in and around Malta125 using the IntCal20 terrestrial calibration curve55. Results indicate a general geographical cline in the spread of the Neolithic in mainland Italy from north to south, to Sicily, Sardinia and Corsica, and then finally to Malta. See the OxCal script in https://github.com/wccarleton/mesoneomalta for specifics. Results show many potentially intrusive samples used to date the sediments and very few sequences of dates in strict stratigraphic order. b, Our Bayesian regression model relating depth to age in the Salina Deep core, using the IntCal20 calibration curve55 to calibrate the dates. The results suggest that the first Neolithic evidence in the Salina Deep record has a date with a broad error range of around two thousand years. Harris matrix of the Latnija excavation organized by phase, illustrating the stratigraphic relationships between excavation contexts; numbers shown in red indicate deposits containing dated material. a, Pollen diagram from the Latnija archaeological site, with percentage values of pollen identified in contexts (034) and (048). b, The deep Mesolithic hearth from Phase V, square N2, with sample locations for phytolith and FTIR studies. The internal structure of the hearth can be observed, from top to bottom: combustion residue, thermal impact and natural substrate (control). c, ESEM images of charcoal remains showing wood anatomical characters: (i) Juniperus sp. charcoal fragment tangential section; (ii) Juniperus sp.
charcoal fragment tangential section showing rays and tracheids; (iii) Juniperus sp. charcoal fragment tangential section showing a detail of tracheid pits; (iv) Pistacia cf. lentiscus charcoal fragment transverse section showing ring-porous distribution and vessel clusters; (v) Pistacia cf. lentiscus charcoal fragment tangential section showing spiral thickenings and biseriate rays; (vi) Pistacia sp. charcoal fragment transverse section showing cracks and vitrification altering the wood cell structure. Oblique view illustrating the location of dated contexts (red) with respect to sediment phase boundaries spanning grid squares J–N, showing the distribution of sediments at the base of Phase II; the location of dated contexts from Phase III, with the upper boundary of Phase III deposits shown as a wireframe; the location of dated contexts from Phase IV, with the upper boundary of Phase IV deposits shown as a wireframe; the location of dated contexts from Phase V, with the upper boundary of Phase V deposits shown as a wireframe; and the location of dated contexts from Phase VI, with the upper boundary of Phase VI deposits shown as a wireframe. a, Section drawing of the Latnija exposure with the detailed section shown in Fig. The section was recorded in September 2024; note that some of the large clasts recorded in the September 2022 section (a) had been removed from the section by September 2024. The sediments are dominated by limestone-derived clasts and fine-grained material. b, Biological material within Unit 1 (Sh, shell; Bo, bone; BF, burrow fill). e, Burnt limestone fragment (BLF) next to an iron-oxide-enriched intraclast (IC) of reworked sediment. f, Charcoal fragment (ChF) and burnt limestone fragment (BLF) in Bed 10. g, Iron-enriched sediments of Unit 1 from directly below the contact with Bed 12. h, Pelleted microfabric of lens b below Bed 10. 6 and sediment section (Supplementary Information 2) for the location of thin-section samples.
This sample presents the key characteristics of patterns of sedimentation prior to the Mesolithic occupation. These can be summarized as: 1) rare limestone clasts (frequently showing evidence for in situ decay), 2) intraclasts of reworked sediment and 3) fragments of terrestrial mollusc shell. MM3 is rich in >1 mm charcoal fragments and limestone clasts, some of which show evidence for a strong degree of burning. The sediment of MM5 is, consequently, characterized by a mixture of both sediment types. Charcoal fragments are abundant (but only rarely >1 mm), but the matrix overall is more typical of MM2. Bone fragments are present, as are circular features characteristic of deformation, which could be either biological or physical in origin. d, MM6 is taken from below darkened sediment believed to reflect in situ burning. The sediments of MM6 are more reddened than any other sampled sediments and occur directly below sediments whose colour is more typical of the unaltered cave sediments of MM2. This unit is interpreted as having been thermally altered as a direct result of in situ burning directly above these deposits. a, Image of phytoliths identified in the Phase V deep hearth. b, Different dynamics of phytoliths identified in the layers that make up the hearth, in relation to the number of phytoliths identified as spheroid echinate in each sample. Key: C, Control; TI, Thermal Impact; CR, Combustion Residue. c, Infrared spectra of sediments from some representative samples. Key: C, Control; TI, Thermal Impact; CR, Combustion Residue; Ca, Calcite; Cl, Clay; Qz, quartz; b, thermally altered clay; nb, not thermally altered clay. a, Photographs of lithics from Phase III (left) and Phase V (right) showing terrestrial and coastal raw material forms. b–d, Illustrations of lithics from Phases V (b), IV (c) and III (d). All are flakes except b3 (core) and b4 (retouched flake). Red deer remains, taphonomic modifications, and stable isotope results. 
a, Remains of red deer including a proximal radius (1), distal radius (2), proximal metatarsal (3), proximal metacarpal (4), scapula (5), and a distal metatarsal (6). Examples of taphonomic modifications including a midshaft fragment with a green fracture and a double notch with corresponding negative flake scars (7), a midshaft fragment with a green fracture (8), examples of charred bone (9–11), and examples of bone covered in adhering matrix (12–14). b, Results of the stable carbon (δ13C) and oxygen (δ18O) isotope analysis by taxon. This document contains additional details and information including the context of research (1), a description of the deposits (2), details of the archaeobotany (3), a chronology (4), lithic analyses (5), faunal analyses (6), Supplementary Tables 1–16 (7), OxCal scripts (8) and Supplementary References (9). Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. et al. Hunter-gatherer sea voyages extended to remotest Mediterranean islands. 
Nature volume 640, pages 435–447 (2025). Here we introduce the MICrONS functional connectomics dataset with dense calcium imaging of around 75,000 neurons in primary visual cortex (VISp) and higher visual areas (VISrl, VISal and VISlm) in an awake mouse that is viewing natural and synthetic stimuli. These data are co-registered with an electron microscopy reconstruction containing more than 200,000 cells and 0.5 billion synapses. Proofreading of a subset of neurons yielded reconstructions that include complete dendritic trees as well as the local and inter-areal axonal projections that map up to thousands of cell-to-cell connections per neuron. Released as an open-access resource, this dataset includes the tools for data retrieval and analysis1,2. Accompanying studies describe its use for comprehensive characterization of cell types3,4,5,6, a synaptic-level connectivity diagram of a cortical column4, and uncovering cell-type-specific inhibitory connectivity that can be linked to gene expression data4,7. Functionally, we identify new computational principles of how information is integrated across visual space8, characterize novel types of neuronal invariances9 and bring structure and function together to uncover a general principle for connectivity between excitatory neurons within and across areas10,11. In 1979, Francis Crick wrote12 that “It is no use asking for the impossible, such as, say, the exact wiring diagram for a cubic millimetre of brain tissue and the way all its neurons are firing”. 
Crick's request was presumably motivated by the idea that the function of every neuron depends on its synaptic connections13, and such a dataset would allow rigorous testing and refinement of hypotheses about network anatomy. For decades, these relationships were studied through challenging single-cell experiments14,15,16,17 or electrophysiology recordings18,19. Later, by combining calcium imaging with in vitro electrophysiological20 and viral tracing methods21, it became possible to link functional recordings to the underlying connectivity. Much has been learned from these experiments, but they provide only fragmentary information. To realize Crick's vision, volumetric electron microscopy (EM) can be combined with calcium imaging22,23, as demonstrated at smaller scales in the visual cortex24,25,26, retina27,28,29 and other systems30,31. Here we present a dataset (Fig. 1) that bridges neuronal function and connectivity at the cubic millimetre scale in mouse visual cortex (in vivo dimensions 1.3 × 0.87 × 0.82 mm³). To measure visual responses, we performed calcium imaging of excitatory neurons across cortical layers in response to visual stimuli. To map connectivity, we imaged the same cubic millimetre with serial section transmission EM (TEM). Using scalable convolutional networks and custom computational systems, we reconstructed neurons and their synaptic connections in 3D, with extensive proofreading to ensure accuracy. Finally, we co-registered the calcium-imaging and TEM data to match neuronal responses to neurons and their connectivity. a, The nine data resources that are publicly available at https://www.microns-explorer.org/. b, Relationship between different data types. The primary in vivo data resource consists of 2P calcium images, 2P structural images, natural and parametric video stimuli used as visual input, and behavioural measurements. 
The secondary (derived) in vivo data resource includes the responses of an estimated 75,909 pyramidal cells from cortical layers 2 to 5 segmented from the calcium videos, along with the pupil position and diameter extracted from the video of eye movements, and locomotion measured on a single-axis treadmill. The primary anatomical data are composed of ex vivo serial section transmission EM images registered with the in vivo 2P structural stack. The volume includes a portion of VISp and three higher visual areas (VISlm, VISrl and VISal) for all cortical layers except the extremes of layer 1. The secondary anatomical data are derived from the serial section EM image stack and consist of semi-automated segmentation of cells, automated segmentation of nuclei, and automatically detected synapses. The tertiary anatomical data consist of assignments of the synapses to presynaptic and postsynaptic cells, triangle meshes for these segments, classification of nuclei as neuronal versus non-neuronal, and classification of neurons into excitatory and inhibitory cell classes. Secondary data for co-registration of in vivo and ex vivo images consist of manually chosen correspondence points between 2P structural images and EM images. Tertiary co-registration data are a transformation derived from these correspondence points. Using the interactive tools, one can visualize the input and output synapses of a single cell (https://go.nature.com/io). The database of functional recordings (https://www.microns-explorer.org/cortical-mm3#f-data) is also available for download to explore how cells responded to visual stimuli. Detailed morphological and synaptic data enabled novel approaches to characterize cell types3,4,5,6,7 and showed that connectivity can be used to identify cell types that are difficult to identify by morphology alone4, a recurring theme in connectomic cell typing. We also began to establish correspondences between connectivity and transcriptomically defined cell types7. 
The combination of structural connectivity and functional similarity across thousands of pairs of individual neurons enabled a new examination of ‘like-to-like’ connectivity25,32 and showed that this principle generalizes across cortical layers and visual areas10. This work relied on a novel approach using an artificial neural network that was trained to predict neural activities from visual stimuli10,11. Further linked articles use this model to point the way to experimental studies of the mechanisms supporting contextual interactions8,9,10 and invariances9 in visual cortical computations. To maximize its impact, we have made the data publicly available as a resource (https://www.microns-explorer.org/) with tools for interactive exploration and programmatic analysis. Finally, the accompanying studies highlight the tools that we developed to scale up connectomics to a cubic millimetre1,2,11,33. These technologies are enabling broader applications, such as reconstruction of the entire wiring diagram of a whole fly brain34,35,36, the first adult connectome to be completed since that of Caenorhabditis elegans. The data were collected from a single mouse and involved a pipeline spanning three primary sites. First, two-photon (2P) in vivo calcium imaging under various visual stimulation conditions was performed at Baylor College of Medicine. Then the mouse was shipped to the Allen Institute, where the imaged tissue volume was extracted, prepared for EM imaging and sectioned, and then imaged over six months of continuous imaging. The EM data were then montaged, roughly aligned and delivered to Princeton University, where fine alignment was performed and the volume was densely segmented. Finally, extensive proofreading was performed on a subset of neurons to correct errors of automated segmentation, and cell types and various other structural features were annotated (Fig. 
First, in vivo measurements of neuronal functional properties are acquired from a region of interest (ROI) in the mouse visual cortex. In addition, a spatially overlapping in vivo structural image stack is collected to facilitate later registration with postmortem data. These sections are then imaged by TEM, and the resulting images are assembled into a 3D volume. Automated methods subsequently reconstruct the cellular processes and synapses within this volume, and the automated reconstructions are proofread as needed to ensure accuracy for further analysis. Image panels are adapted from Yin et al.63, Springer Nature Limited, and mouse and autoTEM drawings are adapted from Mahalingam et al.64, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). The calcium-imaging data include the responses to visual stimuli of an estimated 75,909 excitatory neurons spanning cortical layers 2 to 5 across 4 visual areas in a transgenic mouse that expressed GCaMP6s in excitatory neurons via Slc17a7-Cre and Ai162. The dataset contains 14 individual scans, collected between postnatal day 75 (P75) and P81, spanning a volume of approximately 1,200 × 1,100 × 500 µm³ (anteroposterior × mediolateral × radial depth; Fig. The centre of the volume was placed at the junction of primary visual cortex (VISp) and three higher visual areas (lateromedial area (VISlm), rostrolateral area (VISrl) and anterolateral area (VISal)) in order to image retinotopically matched neurons that were potentially connected via inter-areal feedforward and feedback connections. a, Representation of the 2P functionally imaged volume with area boundaries (white) and vascular label from the structural stack (red). b, Wireframe representation of 104 planes registered in the structural 2P stack. c, Mean depth of posterior (post.) registered fields relative to the pial surface. d, 3D scatter plot of each functional mask in its registered location in the structural 2P stack. Black, VISp; red, VISlm; blue, VISal; green, VISrl. 
e, Example frames from each of the five stimulus types (cinematic, Sports-1M, rendered, Monet2 and Trippy) shown to the mouse. f, Raster of deconvolved calcium activity for three neurons to repeated stimulus trials (oracle trials; ten repeats of six sequential clips, with each repeat normalized independently). Rasters for high (top), medium (middle) and low (bottom) oracle scores, with the percentile shown on the right. g, Trial-averaged raster (central 500 ms of the trial-averaged raster for each direction, out of 937 ms) of deconvolved calcium activity for 80 neurons in 40 Monet2 trials (16 randomly ordered directions), grouped by preferred direction (5 neurons per direction; alternating blue shading) and sorted according to the stimulus directions. Each scan consisted of two adjacent overlapping 620-µm-wide fields at multiple imaging planes, imaged with the wide field of view (FOV) of the 2P random access mesoscope (2P-RAM). The scans ranged up to approximately 500 µm in depth, with a target spacing of 10–15 µm to maximize the coverage of imaged cells in the volume (Fig. For 11 of the 14 scans, 4 imaging planes were distributed widely in depth using the mesoscope remote focus, spanning roughly 300–400 µm with an average spacing of approximately 125 µm between planes, for near-simultaneous recording across multiple cortical layers. In the remaining 3 scans, fewer planes were imaged at 10–20 µm spacing to achieve a higher effective pixel density (Extended Data Table 1). These higher-resolution scans were designed to support future efforts to extract signals from the large apical dendrites of deeper layer 5 and layer 6 neurons. However, for this release, only somata were automatically segmented from the imaging data, using a constrained non-negative matrix factorization approach, and their fluorescence traces were extracted and deconvolved to yield activity traces. 
In total, 125,413 masks were generated across 14 scans, of which 115,372 were automatically classified as somatic masks by a trained classifier (Fig. In addition, we developed an imaging workflow with the goal of full coverage within the target volume. This required several optimizations; for example, to densely target scan planes across multiple days, we needed a common reference frame to assess the coverage of scans within the volume. Therefore, in addition to the functional scans, high-resolution (0.5–1.0 pixels per µm) structural volumes were acquired for registration with the subsequent EM data. At the end of each imaging day, individual imaging fields of the functional scans were independently registered into a structural stack (Fig. On the last day of imaging, a 2-channel (green, red) 1,412 × 1,322 × 670 µm³ (anteroposterior × mediolateral × radial depth) structural stack was collected at 0.5 pixels per µm after injection of fluorescent dye (Texas Red) to label vasculature, enhancing fiducial labelling for co-registration with the EM volume (Fig. Based on this analysis, we estimate the functional imaging volume contains 75,909 unique functionally imaged neurons consolidated from 115,372 segmented somatic masks, with many neurons imaged in 2 or more scans. Treadmill rotation (single axis) and video of the left eye were captured throughout the scan, yielding locomotion velocity, eye movements and pupil diameter data. The stimulus for each scan lasted approximately 84 min, and consisted of naturalistic (complex scenes with real-world statistics) and parametric (simpler, artificially generated) video stimuli. The majority of the stimulus (64 min) was made up of 10-s clips drawn from films, the Sports-1M dataset37 or rendered first-person point of view (POV) movement through a virtual environment (Fig. Our goal was to approximate natural statistical complexity to cover a sufficiently large feature space. 
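The text states that 115,372 somatic masks were consolidated into an estimated 75,909 unique neurons because many neurons appear in more than one scan, but does not spell out the criterion here. The sketch below shows one plausible way to do such a consolidation, assuming that masks from different scans whose registered 3D centroids in the structural stack lie within a hypothetical distance threshold belong to the same neuron. The function name, the threshold and the input layout are illustrative assumptions, not the authors' method.

```python
import math
from itertools import combinations


def consolidate_masks(centroids, scan_ids, max_dist_um=5.0):
    """Group somatic masks into putative unique neurons (hypothetical criterion).

    centroids: list of (x, y, z) mask centroids in the structural-stack frame, in µm.
    scan_ids: scan index for each mask; masks from the same scan are never linked directly.
    max_dist_um: assumed distance threshold, not taken from the paper.
    Returns a sorted list of groups, each a sorted list of mask indices.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find lookup with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair scan for clarity; a spatial index would be needed at dataset scale.
    for i, j in combinations(range(len(centroids)), 2):
        if scan_ids[i] != scan_ids[j] and math.dist(centroids[i], centroids[j]) <= max_dist_um:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Two neurons, each seen in two different scans.
print(consolidate_masks([(0, 0, 0), (1, 0, 0), (50, 50, 50), (50, 51, 50)], [0, 1, 0, 2]))
```

Because linking is transitive, a chain of nearby masks can still join two masks from the same scan; a real pipeline would need an explicit check for that case.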
These data can support multiple lines of investigation, including applying deep learning-based system identification methods to build highly accurate models that predict neural responses to arbitrary visual stimuli11,38. These models enable a systematic characterization of tuning functions with minimal assumptions relative to classical methods using parametric stimuli38. The stimulus composition included a mixture of stimuli unique to each scan, some that were repeated across every scan, and some that were repeated within each scan. In particular, six natural film clips totalling 1 min (oracle natural videos) were repeated in the same order 10 times per scan and were used to evaluate the reliability of the neural responses to repeated visual stimuli (Fig. Variations in this ‘oracle score’ from scan to scan serve as an important indicator of scan quality, since reliable responses are not observed when imaging conditions are poor or the mouse is not engaged with the stimulus. To relate our findings to previous work, we also included a battery of parametric stimuli (Monet2 and Trippy, 10 min each; Methods, ‘Stimulus composition’) that were generated to be spatially decorrelated, making them suitable for characterizing receptive fields, while also containing local or global directional and orientation components for extracting basic tuning properties such as orientation selectivity (Fig. After the in vivo neurophysiology data collection, we imaged the same volume of cortex ex vivo using TEM, which enabled us to map the connectivity of neurons for which we had measured functional properties. This required considerable scaling up from previous state-of-the-art datasets, with particular emphasis on automation and on reducing rare but potentially catastrophic events that could cause the loss of multiple serial sections. The tissue sample was trimmed and sectioned into 27,972 serial sections (nominal thickness 40 nm) onto grid tape to facilitate automated imaging. 
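The oracle clips are repeated 10 times per scan so that response reliability can be scored, but the formula for the ‘oracle score’ is not given in this passage. A common choice, assumed in this minimal sketch, is a leave-one-out correlation: each repeat's response trace for one neuron is correlated with the mean of the remaining repeats, and the correlations are averaged. The function names and the trials-by-time-bins input layout are hypothetical.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def oracle_score(trials):
    """Leave-one-out reliability of one neuron's responses to repeated clips.

    trials: list of repeats, each a list of responses over the same time bins.
    Each repeat is correlated with the bin-wise mean of all other repeats,
    and the resulting correlations are averaged (assumed definition).
    """
    scores = []
    for k, trial in enumerate(trials):
        others = [t for i, t in enumerate(trials) if i != k]
        loo_mean = [fmean(bin_values) for bin_values in zip(*others)]
        scores.append(pearson(trial, loo_mean))
    return fmean(scores)


# Identical repeats give a perfectly reliable neuron.
print(oracle_score([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]))
```

Under this definition, a well-engaged mouse imaged under good conditions should yield many neurons with high scores, which is why the scan-to-scan distribution is useful as a quality indicator.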
Although the cutting was automated, it was supervised by humans who worked in shifts around the clock for 12 days. They were ready to stop and restart the ultramicrotome immediately if there was a risk of multiple section loss. As described later, the EM dataset is subdivided into two subvolumes owing to sectioning and imaging events (details of the sectioning timeline and artefacts are presented in Methods). A total of 26,652 sections were imaged by 5 customized automated TEMs (autoTEMs), which took about 6 months and produced 2 PB of raw data at a resolution of approximately 4 nm (Fig. Area borders calculated from calcium imaging are shown as black lines. b,c, Top view of a small region showing the quality of the fine alignment and its robustness to large folds shown in c (the dataset is available at https://ngl.microns-explorer.org/#!gs://microns-static-links/mm3/data_fig/4b.json). d, Montage of a single section showing the coverage from pia to white matter and across three different cortical regions. f,g, Examples of excitatory synapses indicated with arrowheads (dataset available at https://ngl.microns-explorer.org/#!gs://microns-static-links/mm3/data_fig/4f.json (f) and https://ngl.microns-explorer.org/#!gs://microns-static-links/mm3/data_fig/4g.json (g)). h, Example of an inhibitory synapse (arrowhead) (dataset available at https://ngl.microns-explorer.org/#!gs://microns-static-links/mm3/data_fig/4h.json). 4a) was selected for further processing, as it had no consecutive section loss and an overall section loss of around 0.1%. This region contains approximately 95 million individual tiles that were stitched into 2D montages per section and then aligned in 3D. The two subvolumes were processed individually and later aligned to each other in the same global coordinate frame, enabling the tracing of axons and dendrites across their border (Fig. 
However, the two subvolumes were reconstructed separately and each has a distinct representation in the analysis infrastructure and database. b, Pyramidal cells from both subvolumes as they cross the subvolume boundary. To achieve this at petabyte scale, we split the process into distinct coarse and fine pipelines. For the coarse pipeline, sections were initially stitched using a per-image affine transformation, and a polynomial transformation model was applied to the subset of sections whose stitching had a local misalignment error of more than five pixels. Downsampled 2D stitched sections were then roughly aligned in 3D. The rough alignment process ensured global consistency within the dataset and accounted for images from multiple autoTEMs with varied image sizes and resolutions. It also corrected for locally varying misalignments, such as scale differences and deformations between sections, and aided the fine alignment process. To further refine image alignment, we developed a set of convolutional networks to estimate pixel-wise displacement fields between pairs of neighbouring sections33. This process was able to correct nonlinear misalignments around cracks and folds that occurred during sectioning. Although this fine alignment does not restore the missing data inside a fold, it was still effective in correcting the distortions caused by large folds (Fig. 4b,c), which caused large displacements between sections and were the main cause of reconstruction errors. Although imaging was performed at 4 nm resolution, the aligned image volume was generated at 8 nm resolution to decrease data size for subsequent processing. We densely segmented cellular processes across the volume using affinity-predicting convolutional neural networks and mean affinity agglomeration (Fig. The automatic segmentation produced highly accurate dendritic arbors before proofreading, enabling morphological identification of broad cell types. 
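Mean affinity agglomeration merges over-segmented fragments (supervoxels) using the boundary affinities predicted by the network. The minimal sketch below illustrates only the merge criterion on a toy region adjacency graph: it repeatedly merges the pair of adjacent regions whose boundary edges have the highest mean affinity, pooling boundary statistics after each merge, until the best mean falls below a threshold. Integer region IDs, the (u, v, affinity) edge list and the threshold value are assumptions for illustration; the production pipeline operates at a vastly larger scale with far more efficient data structures.

```python
def mean_affinity_agglomeration(edges, threshold):
    """Greedy mean-affinity agglomeration over a region adjacency graph.

    edges: list of (u, v, affinity) boundary edges between integer region IDs;
           a region pair may appear many times (one entry per boundary element).
    threshold: stop merging once the best mean boundary affinity is below this.
    Returns a dict mapping each original region to its final segment label.
    """
    label = {}
    stats = {}  # (low_id, high_id) -> (sum of affinities, number of edges)
    for u, v, a in edges:
        label.setdefault(u, u)
        label.setdefault(v, v)
        if u == v:
            continue
        key = (min(u, v), max(u, v))
        s, c = stats.get(key, (0.0, 0))
        stats[key] = (s + a, c + 1)

    while stats:
        # Adjacent pair with the highest mean affinity across its boundary.
        key, (s, c) = max(stats.items(), key=lambda kv: kv[1][0] / kv[1][1])
        if s / c < threshold:
            break
        keep, drop = key
        for region in label:
            if label[region] == drop:
                label[region] = keep
        # Pool boundary statistics so the merged segment's mean covers all its edges.
        pooled = {}
        for (x, y), (s2, c2) in stats.items():
            x = keep if x == drop else x
            y = keep if y == drop else y
            if x == y:
                continue  # this boundary is now internal to one segment
            k2 = (min(x, y), max(x, y))
            s0, c0 = pooled.get(k2, (0.0, 0))
            pooled[k2] = (s0 + s2, c0 + c2)
        stats = pooled
    return label


print(mean_affinity_agglomeration(
    [(1, 2, 0.9), (1, 2, 0.8), (2, 3, 0.2), (3, 4, 0.95)], threshold=0.5))
```

Averaging over the whole boundary, rather than taking the single strongest edge, makes the criterion less sensitive to a few spuriously high affinities where two processes touch.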
Many non-neuronal objects are also well segmented, including astrocytes, microglia and blood vessels. Nuclei were also automatically segmented (n = 144,120) within subvolume 65 using a distinct convolutional network33. To use nucleus shape to map cell classes across the dataset, we manually labelled a subset of the 2,751 nuclei in a 100-µm-square column of the dataset as non-neuronal, excitatory or inhibitory. We then developed machine learning models to automate distinguishing neurons from non-neuronal cells such as glia, as well as to classify cells at different levels of resolution2,6 within the subvolume with high accuracy (Methods). The results of this nucleus segmentation, manual cell classification and model building are provided as part of this data resource. Synaptic contacts were automatically segmented in the aligned EM imagery, and the presynaptic and postsynaptic partners from the cell segmentation were automatically assigned to identify each synapse (Fig. We manually identified synapses in 70 small subvolumes (n = 8,611 synapses) distributed across the dataset, giving the automated detection an estimated precision of 96% and recall of 89% (Extended Data Fig. We estimated partner assignment accuracy at 98% from a separate dataset of manually annotated synapses (n = 191) that were held out from training. Although the automated segmentation creates impressive reconstructions, proofreading is required to make those reconstructions more complete and accurate. The proofreading process involves merging additional segments of the neurons that were missing from the reconstruction, and splitting off segments that were incorrectly associated with a neuron. To perform real-time collaborative proofreading in a petascale dataset, we developed the ChunkedGraph proofreading system1 that can be used with Neuroglancer as a user interface or a REST (representational state transfer) application programming interface (API) for computationally driven edits. 
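Precision is the fraction of automated detections that correspond to a manually identified synapse, and recall is the fraction of manually identified synapses that the automated detector found. The sketch below shows how such numbers could be computed against manual annotations, assuming detections and annotations are 3D points matched one-to-one within a hypothetical tolerance radius, closest pairs first. This greedy matching is a simplification, and the function name and radius are illustrative assumptions rather than the evaluation procedure used for the dataset.

```python
import math


def detection_precision_recall(predicted, manual, tol_nm=100.0):
    """Score automated synapse detections against manual annotations.

    predicted, manual: lists of (x, y, z) synapse centres in nm.
    tol_nm: hypothetical matching radius, not taken from the paper.
    Pairs within the radius are matched one-to-one, closest pairs first (greedy).
    Returns (precision, recall).
    """
    candidates = []
    for i, p in enumerate(predicted):
        for j, m in enumerate(manual):
            d = math.dist(p, m)
            if d <= tol_nm:
                candidates.append((d, i, j))
    candidates.sort()

    used_pred, used_manual = set(), set()
    for _, i, j in candidates:
        if i not in used_pred and j not in used_manual:
            used_pred.add(i)
            used_manual.add(j)

    true_positives = len(used_pred)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(manual) if manual else 0.0
    return precision, recall


print(detection_precision_recall(
    [(0, 0, 0), (1000, 0, 0), (5000, 0, 0)],
    [(10, 0, 0), (1000, 50, 0), (9000, 0, 0), (20000, 0, 0)]))
```

Here two of three detections match an annotation (precision 2/3) and two of four annotations are found (recall 0.5).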
This flexibility enabled the proofreading methods to be tailored to different scientific needs, including manual, semi-automated and automated proofreading. Note that all proofreading was performed in subvolume 65. The released segmentation now contains all 1,046,656 proofreading edits made as of 16 September 2024 and is being updated quarterly. Proofreading was performed by individual scientists and focused teams of proofreaders, both to support targeted scientific discovery for companion studies3,4,5,6,7,10 and to correct the errors that most affected general connectivity. 5), as neurons have been proofread as part of multiple Machine Intelligence from Cortical Networks (MICrONS) data analysis projects. For example, in the functional connectomics study, we proofread the full extent of axonal and dendritic arbors of 85 excitatory neurons within subvolume 65 (Fig. 5c), whereas for a broad columnar sample only the dendrites of 1,188 excitatory neurons were proofread. The result is a wide variation in edits per neuron, with more edits generally corresponding to more extensive axons (100–1,000 edits per axon) (Extended Data Fig. The most time-consuming task is extending axons, and thus this is where the data vary most across cells and studies. In total, the released dataset includes 1,433 neurons that have proofread axons with varying levels of extension, in which all incorrect mergers have been removed and many false splits corrected. From the proofread dendrites, we determined that 99% of inputs were correct when assigned to a postsynaptic soma in the automated segmentation. As a result, for the neurons with proofread axons, all synapses, both input and output, are now correctly associated. 
The proofread excitatory neurons contain some of the most extensive axonal arbors reconstructed in the neocortex at EM resolution, with the longest excitatory axon measuring 18.9 mm with 2,483 synaptic outputs, and inhibitory axons ranging in length from 1.1 to 32.3 mm with a mean of 2,754 synaptic outputs (range 99–14,019) (Fig. In general, inhibitory axons were more complete in the automated reconstruction, probably because their axons are slightly thicker than most excitatory axons. In addition to proofreading axons and dendrites, we made widespread edits to enhance the general dataset quality. Following the automated segmentation, there were 7,050 segmented objects comprising a total of 17,753 neurons that were merged together (based on nucleus segmentation), preventing analysis of these cells. Using a combination of manual and automated error-detection workflows, we have split almost all of these into single-soma objects, bringing the total number of individually segmented neurons to 84,035 (Extended Data Fig. To work through such dataset-wide tasks more quickly, we developed and validated an automated error-detection and correction workflow using graph and morphological analysis to identify merge error locations and generate edits that could be executed using PyChunkedGraph (PCG)1. This automated approach (NEURD) was also used to remove false axon merges onto dendritic segments and to split axon branches with an abnormally high degree across the dataset2, totalling more than 164,000 edits. Proofreading is ongoing in the dataset with regular public updates, and there is now a project called the Virtual Observatory of the Cortex (https://www.microns-explorer.org/vortex), funded by the National Institutes of Health (NIH), to which individual researchers can submit scientific requests to steer proofreading and annotation of the dataset in directions that will move their research questions forward. 
Functional connectomics requires that cells are matched between the 2P calcium-imaging and EM coordinate frames. We achieved this using a three-phase approach combining expert annotations and automatic methods. In the first step, we generated a co-registration transform using a set of 2,934 expert-matched fiducials between the EM volume and the 2P structural dataset (1,994 somata and 942 blood vessels, mostly branch points, which are available as part of the resource; Methods). To evaluate the error of the transform, we measured the distance in micrometres between the location of each fiducial after co-registration and its annotated location; a perfect co-registration would have residuals of 0 µm. For the second step, we used the results of the transform to guide a group of experts to manually match 19,181 functional ROIs from 14 scans to 15,439 individual EM neurons (multiple functional ROIs can match to a single EM neuron if it was present in multiple scans). The results of manual matching provide both high-confidence matches for analysis and ‘ground truth’ for fully automated approaches. These results help to validate the first phase, as most matched ROIs have low residuals and high separation scores (Extended Data Fig. Furthermore, as expected for successful matches, ROIs with at least moderate visual responses that are independently matched to the same neuron across multiple scans have higher signal correlations than adjacent neurons (Extended Data Fig. In the third and final step, we used two automated approaches to match the entire set of functional ROIs. The first approach used the EM-to-2P co-registration transform to move the centroids of all EM neurons (predicted from nucleus detections) to the 2P coordinate space, and then used minimum weight matching for bipartite graphs to assign functional ROIs to EM neurons. This method (referred to as the fiducial-based automatch table) resulted in 84,198 functional ROIs matched to 37,364 EM neurons. 
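The residual check described above can be illustrated with a simple transform model. The passage does not say which transform family was fitted to the fiducials, so this sketch assumes a 3D affine transform estimated by least squares with NumPy; the function names are hypothetical. Residuals are the per-fiducial distances (µm) between transformed points and their matched targets.

```python
import numpy as np


def fit_affine_3d(src, dst):
    """Least-squares 3D affine transform mapping src points onto dst points.

    src, dst: (n, 3) arrays of matched fiducial coordinates in µm, n >= 4.
    Returns a (4, 3) matrix A such that [x, y, z, 1] @ A approximates dst.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(homog, dst, rcond=None)
    return A


def apply_affine(A, pts):
    """Apply a (4, 3) affine matrix to an (n, 3) array of points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A


def residuals(A, src, dst):
    """Distance (µm) between each transformed fiducial and its matched target."""
    return np.linalg.norm(apply_affine(A, src) - np.asarray(dst, dtype=float), axis=1)


rng = np.random.default_rng(0)
src = rng.uniform(0, 1000, size=(20, 3))
M = np.array([[1.01, 0.02, 0.0], [-0.01, 0.99, 0.03], [0.0, 0.01, 1.02]])
dst = src @ M.T + np.array([5.0, -3.0, 12.0])
A = fit_affine_3d(src, dst)
print(residuals(A, src, dst).max())  # close to 0 for an exactly affine mapping
```

Real EM-to-2P mappings include nonlinear tissue deformation, so a single affine fit would leave non-zero residuals; a more flexible transform would typically be needed in practice.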
Considering all matches, this method achieved 83% precision relative to manual matchers, but filtering out matches in the bottom 30% of separation scores yields 90% precision, while still including 59,934 functional ROIs and 31,042 EM neurons. The second automated approach used only the EM and 2P blood vessel segmentations to generate a novel co-registration between the two volumes, using a fine-scale deformable B-spline-based registration. Then, minimum weight matching for bipartite graphs was used to assign functional ROIs to EM neurons. This table (referred to as the vessel-based automatch table) contains 75,856 functional ROIs matched to 34,712 EM neurons. Remarkably, this fiducial-free method performed as well as the fiducial-based method, achieving 84% precision relative to manual matches. Filtering out matches in the bottom 30% of separation scores yielded 90% precision, while including 53,248 functional ROIs and 28,233 EM neurons (Extended Data Fig. Finally, we tested whether taking only the matches on which both automated methods agree would increase the performance relative to manual matches. Indeed, this hybrid automated table achieves 89% agreement with no additional filtering, yielding 60,091 functional ROIs and 29,620 EM neurons (Extended Data Fig. From the site, users can browse through the large-scale EM imagery and segmentation results using Neuroglancer (https://github.com/google/neuroglancer); several example visualizations are provided to get started. All data are served from publicly readable cloud buckets hosted through Amazon Web Services (AWS) and Google Cloud Storage. To enable systematic analysis without downloading hundreds of gigabytes of data, users can selectively access cloud-based data programmatically through a collection of open source Python clients (Extended Data Table 2). 
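Both automated approaches assign functional ROIs to EM neurons by minimum weight bipartite matching once the two point sets share a coordinate space. The sketch below assumes the weights are Euclidean distances between ROI and EM nucleus centroids, solves the assignment with SciPy's `linear_sum_assignment`, and drops pairs beyond a hypothetical distance cutoff. It treats one scan at a time (one-to-one within a scan), which is an assumption here. It also shows how an agreement (hybrid) table and precision relative to manual matches could be computed; all function names and the cutoff are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_rois(roi_xyz, em_xyz, max_dist_um=10.0):
    """Assign functional ROIs of one scan to EM neurons by minimum total distance.

    roi_xyz: (n_roi, 3) ROI centroids; em_xyz: (n_em, 3) EM nucleus centroids,
    both already in the same 2P coordinate space (µm).
    max_dist_um: hypothetical cutoff; assigned pairs farther apart are dropped.
    Returns {roi_index: em_index}.
    """
    cost = cdist(roi_xyz, em_xyz)  # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # minimises the summed distance
    return {int(r): int(c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist_um}


def agreement(table_a, table_b):
    """Keep only ROI-to-neuron matches on which both automated tables agree."""
    return {roi: em for roi, em in table_a.items() if table_b.get(roi) == em}


def precision_vs_manual(auto, manual):
    """Fraction of automated matches, among manually matched ROIs, that agree."""
    shared = [roi for roi in auto if roi in manual]
    if not shared:
        return float("nan")
    return sum(auto[roi] == manual[roi] for roi in shared) / len(shared)


rois = np.array([[0, 0, 0], [10, 0, 0], [100, 0, 0]], dtype=float)
ems = np.array([[1, 0, 0], [11, 0, 0], [500, 0, 0]], dtype=float)
print(match_rois(rois, ems))  # ROI 2 has no EM neuron within the cutoff
```

Intersecting two independently derived tables trades coverage for reliability, which mirrors the higher agreement reported for the hybrid table.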
The functional data, including calcium traces, stimuli, behavioural measures and more, are available in a DataJoint database that can be accessed using DataJoint's Python API (https://datajoint.com/docs/), or as Neurodata Without Borders (NWB) files on the Distributed Archives for Neurophysiology Data Integration (DANDI) Archive (https://dandiarchive.org/dandiset/000402). EM imagery and segmentation volumes can also be selectively accessed using cloud-volume (https://github.com/seung-lab/cloud-volume), a Python API that simplifies interacting with large-scale image data. Mesh files describing the shape of cells can be downloaded with cloud-volume, which also provides features for convenient mesh analysis, skeletonization and visualization. Annotations on the structural data, such as synapses and cell body locations, can be queried via CAVE client, a Python interface to the Connectome Annotation Versioning Engine (CAVE) APIs (Fig. CAVE encompasses a set of microservices for collaborative proofreading and analysis of large-scale volumetric data. a–e, Cell body locations and cell-type classifications, with all nucleus detections shown in light grey. a, Non-neuronal cells, manually typed (dark outlines) and classifier-based (no outline)6. b, Excitatory cells, labelled by unsupervised clustering of morphological features4 (dark outline) and a model based on those labels6. c, Inhibitory cells, classified by human experts4 and trained models6. d, Neurons registered to in vivo functional traces. e, Proofreading status of neurons in subvolume 65: black dots (fully proofread), red (cleaned of false merges but potentially incomplete) and blue (dendrites cleaned/extended). f, The number of output synapses per neuron shown in e versus the fraction mapped to a single postsynaptic soma, coloured by cell class.
g, A fully proofread pyramidal cell (nucleus ID: 294657, segment ID: 864691135701676411) with postsynaptic soma locations shown as coloured dots (by cell class). Cells with functionally co-registered regions are outlined in dark green. The cell and its synapses are viewable at https://neuroglancer-demo.appspot.com/#!gs://microns-static-links/mm3/data_fig/6f.json. i,j, EM image (i) and corresponding image from the 2P structural stack (j) centred on the cell shown in g (yellow circle). k, Functional responses of the presynaptic (presyn) neuron (g; yellow) and its functionally co-registered postsynaptic (postsyn) targets. Heat maps show average ΔF/F traces for the presynaptic neuron and postsynaptic targets, sorted by synaptic strength, in response to oracle clips from functional scans. The first collection of annotation tables available through CAVE client focuses on the larger subvolume of the dataset, which we refer to within the infrastructure as Minnie65, and which has been the focus of proofreading and ongoing analysis (Extended Data Table 3). The largest table describes connectivity, contains all 337.3 million synapses and is searchable by presynaptic ID, postsynaptic ID and spatial location. In addition, there are several tables that describe the soma location of key cells, predictions for which cells are different non-neuronal (Fig. There are also annotations that denote which cells have been functionally co-registered (Fig. 6d) and which cells have been proofread to different degrees of completion (Fig. In this release, the only table available for Minnie35 contains synapses, as its segmentation and alignment occurred later and little proofreading, annotation or analysis has been conducted within it. We expect that continued proofreading and analysis of the data will lead to updated and additional tables for both portions of the data in future data releases. Here, we provide an example to suggest how the data might be used together.
The power of the dataset lies in the fact that when an axon is proofread, it contains hundreds to more than ten thousand output synapses (Fig. Furthermore, between 60 and 95% of those outputs can be accurately mapped onto their postsynaptic targets with a known soma location, depending on the cell type and its spatial location in the volume (Fig. This is because the segmentation is highly accurate for dendritic inputs, with a 99% input precision based on comparing proofread with non-proofread dendrites. To seed an analysis with as complete a cell as possible, one might begin by using the proofreading table to identify a neuron with complete axons and dendrites and then query all the synaptic inputs and outputs for the cell, in this case an L2/3 cell in VISp (Fig. For this particular proofread neuron, 74.5% (1,053 out of 1,412 synapses) are onto objects with a single nucleus (as determined from automated detection), with 275 synapses onto cells classified as inhibitory, 662 synapses onto cells classified as excitatory, and 116 synapses onto cells whose soma did not pass classification quality control (Fig. By filtering the synaptic targets with functionally matched neurons (Fig. 6k), one can further identify which targets have been matched to the functional experiments (365 out of 1,412) and use DataJoint to query the functional data or read NWB files deposited in the DANDI data archive (Fig. Subsequent investigation could examine the morphology of such cells in detail, or consider functional responses of their targets. We have provided example notebooks that walk through the above examples and more to help users get started. Together, these data provide a platform for analysis of the relationship between the synaptic structure, neuronal morphology and functional tuning of mouse visual circuits.
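The single-cell query described above reduces to a few filters over the synapse table. The sketch below assumes an in-memory list of synapse records with hypothetical keys `pre` and `post` (in the released CAVE synapse table the corresponding columns are, as far as we know, `pre_pt_root_id` and `post_pt_root_id`; check the table schema before relying on this), plus hypothetical lookup dicts for the number of detected nuclei per segment, the predicted cell class and the set of functionally co-registered segments.

```python
from collections import Counter


def summarize_outputs(synapses, pre_id, nuclei_per_root, cell_class, functional_roots):
    """Tally where one proofread neuron's output synapses land.

    synapses: iterable of dicts with hypothetical keys "pre" and "post" (segment IDs).
    nuclei_per_root: segment ID -> number of detected nuclei.
    cell_class: segment ID -> "excitatory", "inhibitory" or None (failed classification QC).
    functional_roots: set of segment IDs co-registered to functional ROIs.
    """
    outputs = [s for s in synapses if s["pre"] == pre_id]
    # Targets with exactly one nucleus are treated as mappable to a single soma.
    on_soma = [s for s in outputs if nuclei_per_root.get(s["post"], 0) == 1]
    by_class = Counter(cell_class.get(s["post"]) or "unclassified" for s in on_soma)
    matched = [s for s in outputs if s["post"] in functional_roots]
    return {
        "n_outputs": len(outputs),
        "frac_single_nucleus": len(on_soma) / len(outputs) if outputs else 0.0,
        "by_class": dict(by_class),
        "n_functionally_matched": len(matched),
    }
```

The functionally matched count is taken over all outputs rather than only the single-nucleus ones, mirroring the text, which reports it as a fraction of the full 1,412 synapses.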
Connectivity and morphology are key properties of cell types, and the scale of this dataset enables an unprecedented exploration of the anatomical diversity of cortical neurons while also creating a need to relate known cell types to EM data. We have taken multiple approaches to addressing these challenges in the accompanying studies. Two projects3,4 applied data-driven methods to dendritic reconstructions to characterize excitatory neurons across cortical depth and visual areas, revealing intralaminar subtypes and inter-areal differences in populations. Another study linked transcriptomic types of inhibitory neurons to EM reconstructions, establishing a proof of concept for linking molecular cell types to anatomical cell types defined by morphology and synaptic connectivity7. Although these studies used proofread or post-processed neuronal reconstructions, not all segmented neurons in the dataset were amenable to such analysis due to truncation by dataset boundaries or segmentation quality. To push cell typing even in such difficult cases, a fourth study showed that key features of the soma and nucleus of a cell alone were sufficient to predict cell classes such as glia, excitatory neuron or inhibitory neuron, as well as subclasses such as basket cells versus bipolar cells or microglia versus oligodendrocytes, or to identify cells similar to a cell of interest6. Together, these approaches enable matching known cell types with EM neurons and using the EM data to discover new cell types. The integration of cell-type classifications with additional modalities enables a powerful set of tools for discovery. Examining the output of proofread neurons, which includes more than 900,000 synaptic connections between neurons, reveals key differences in the interlaminar communication between excitatory and inhibitory neurons (Fig. The size of the dataset also allows for a comprehensive analysis of cell-type connectivity, including tracing across one or more steps along the synaptic network.
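Depth-resolved connectivity matrices of the kind summarized in the connectivity figure amount to counting synapses between depth bins of the pre- and postsynaptic somata. The sketch below is a minimal version under stated assumptions: soma depths are given in μm below the pia, the bin edges are hypothetical stand-ins for layer boundaries rather than the boundaries used in the paper, and synapses whose partners lack a soma depth are skipped.

```python
from bisect import bisect_right


def depth_connectivity(synapses, soma_depth, edges):
    """Count synapses between depth bins of pre- and postsynaptic somata.

    synapses: iterable of (pre_id, post_id) pairs, one per synapse.
    soma_depth: segment ID -> soma depth below the pia (um).
    edges: ascending bin boundaries (um); a depth equal to an edge falls in the deeper bin.
    Returns a square matrix whose rows are presynaptic bins and columns postsynaptic bins.
    """
    k = len(edges) + 1
    mat = [[0] * k for _ in range(k)]
    for pre, post in synapses:
        # Skip partners without a soma (e.g. orphan fragments) since they have no depth.
        if pre in soma_depth and post in soma_depth:
            i = bisect_right(edges, soma_depth[pre])
            j = bisect_right(edges, soma_depth[post])
            mat[i][j] += 1
    return mat
```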
A major finding from multiple studies of the MICrONS dataset is the widespread specificity of connectivity exhibited by various inhibitory4,7 and excitatory5 cell types. As an example of such analysis, we can follow a collection of layer 3 pyramidal neurons and compare their first-order (direct) connectivity onto excitatory cell types and inhibitory neurons as well as the second-order (two-hop) connectivity of those inhibitory neurons that are targeted by the layer 3 cells (Fig. a–c, Connectivity matrix for proofread neurons connecting to all postsynaptic targets of the predicted class: excitatory→excitatory (a); excitatory→inhibitory (b); inhibitory→excitatory (c). Dots are transparent, with darker shades indicating more connections between laminar depths. Layer boundaries are shown as dashed grey lines. d, First-order and second-order synaptic output heat maps of seven layer 3 pyramidal cells similar to the one shown in Fig. Left, total number of synapses that each layer 3 pyramidal cell makes with each of its order 1 postsynaptic excitatory cell types. Greyscale heat map (top) showing number of synapses that each L3 pyramidal cell makes with its individual order 1 postsynaptic inhibitory partners, sorted by synaptic targeting types and soma depth from the pia to white matter (WM). Coloured heat map (bottom) showing total number of synapses that each order 1 inhibitory partner makes with each of its postsynaptic order 2 excitatory partners of layer 3 pyramidal cells, colour-coded by the synaptic targeting types of order 1 inhibitory partners. EM is widely recognized as the gold standard for identifying structural features of synapses, and most datasets, including the output of the MICrONS project, were primarily created to answer questions related to circuit-level connectivity. Regardless of the original intent, the scale and high resolution of the MICrONS dataset offer information that is far richer and of broader interest than connectivity alone.
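Two-hop tracing of the kind shown in panel d can be sketched as lookups in a synapse count table: collect the seed cells' direct targets, keep the inhibitory ones, then count those cells' synapses onto excitatory targets. This is an illustration under our assumptions (synapses as `(pre_id, post_id)` pairs, one per synapse, and a hypothetical `cell_class` dict with the labels `"excitatory"` and `"inhibitory"`), not the analysis code used for the figure.

```python
from collections import defaultdict


def first_and_second_order(synapses, seeds, cell_class):
    """Trace two synaptic hops out from a set of seed neurons.

    synapses: iterable of (pre_id, post_id) pairs, one per synapse.
    cell_class: segment ID -> "excitatory" or "inhibitory" (hypothetical labels).
    Returns (order1, order2):
      order1[seed][target] = synapses from each seed onto each direct target;
      order2[inh][target]  = synapses from each order-1 inhibitory partner onto excitatory cells.
    """
    out = defaultdict(lambda: defaultdict(int))
    for pre, post in synapses:
        out[pre][post] += 1
    order1 = {s: dict(out.get(s, {})) for s in seeds}
    inhibitory = {t for s in seeds for t in order1[s] if cell_class.get(t) == "inhibitory"}
    order2 = {
        i: {t: n for t, n in out.get(i, {}).items() if cell_class.get(t) == "excitatory"}
        for i in inhibitory
    }
    return order1, order2
```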
Furthermore, the segmentation includes non-neuronal cells such as microglia, astrocytes, oligodendrocyte precursor cells and oligodendrocytes, as well as the fine morphology of the cortical vasculature. The scale of large functional and EM datasets presents a wealth of opportunities for analysis and discovery. With advances in microscopy and computing power, it is now possible to work with datasets that are orders of magnitude larger than just a few years ago, with millions of synapses and tens of thousands of recorded neurons. Among the key opportunities presented by these data are the ability to identify patterns and trends that may be hidden in smaller datasets, the ability to identify and validate general principles at a larger scale, and the ability to perform more sophisticated analyses, since with more data it is possible to use more complex algorithms and models, including machine learning techniques. The accessibility of these datasets also enhances hypothesis-driven approaches by enabling scientists to investigate whether specific types of connectivity exist among different cell types of interest. Additionally, the scale of the data and the availability of exploration tools facilitate the discovery of anomalies or contradictions to current hypotheses and provide opportunities to address and resolve them effectively. Both of these approaches can help to identify patterns and trends that would be difficult to observe using smaller datasets. However, larger datasets have limitations and challenges associated with them. When analysing the connectivity graph, it is essential to keep in mind that although, as shown by our results, the automatic segmentation of dendritic inputs is highly accurate, the automatic segmentation of axons is not as accurate. Therefore, users should be aware of which processes have been proofread and to what extent.
Additionally, it is worth considering that although each neuron in the dataset receives thousands of inputs, a fraction of synapses in the dataset are on detached spines. Depending on the scientific question being asked, it is worth considering whether these detached spines may bias the conclusions drawn, for example when distinguishing between excitatory and inhibitory inputs5. In the functional data, it is important to recognize that photon scattering and out-of-plane fluorescence may cause signal degradation and contamination with increasing depth from the pia surface, especially given the dense GCaMP6s expression in excitatory somata and neurites39. Caution should be taken to disentangle true biological variation in neuronal tuning across layers from these optical artefacts, either by matching controls at the same depth or by validating the finding with a method that is less prone to these artefacts (such as electrophysiology or 2P microscopy with sparser or targeted labelling). Furthermore, although all functional imaging was done in the same volume, it was done across several distinct imaging sessions. Technical factors as well as changes in the physiological state of the mouse should be taken into account when analysing functional recordings that were taken at different times. The simultaneous recordings of treadmill activity and pupillometry can be used to help account for variability due to state. Developing and executing this pipeline took a large team effort, so it is worth reflecting on the practical limitations and bottlenecks in generating datasets of this scale. Proofreading and analysis remain the largest overall expense in terms of person hours, although this effort can be distributed across diverse scientific interests. Data-quality issues, such as folds, membrane clarity and errors in computational image alignment, are the most pressing technical issues that appear to limit the quality of the automated segmentation.
More than a million manual corrections to the automated segmentation have already been collected for the present dataset, and these are available for querying via CAVE1. We hope that these edits can be leveraged in the future to make more accurate automated segmentation, or a more extensively automated editing approach that can further increase the efficiency of proofreading. Analysis questions are often diverse in nature, so it is difficult to predict all the computational steps that are required, but a more general framework and scalable techniques for identifying specific features (such as cell types, spines and organelles) within the dataset would increase efficiency compared with the specialized pipelines used here. Beyond these, there are no fundamental technical limitations to producing more data at this scale for other individual animals, species or brain regions. The importance of high-resolution structural data was recognized early in invertebrate systems, particularly in the worm41,42. The whole fly brain fills about one-third of a 750 × 350 × 250 µm3 bounding box, and the nerve cord fills about one-quarter of a 950 × 320 × 200 µm3 bounding box45, well within the bounds of contemporary EM methods. The creation of these datasets has spurred investment in both manual skeletonized reconstruction and automated dense reconstructions33,44,46, with both centralized and community-minded efforts to proofread and mine them for biological insight26,44,47,48,49. In addition to the many targeted reconstructions in these datasets, large-scale proofread reconstructions from these datasets now include a manually traced full larval brain, a densely segmented and extensively proofread partial central brain and a densely segmented and proofread complete adult brain. These datasets collectively span nearly the entire fly nervous system and are driving a revolution in how fly systems neuroscience is being studied.
In the mammalian system, there is currently no EM dataset that contains a complete area, let alone a complete brain. There is, however, as mentioned above, an established culture of making data open and publicly available24,26,50,51,52. In the past 10 years, there have been only three other rodent EM datasets with publicly available reconstructions that are at least 5% the size of the MICrONS multi-area dataset presented in this Article. A dataset from mouse lateral geniculate nucleus that is 500 × 400 × 280 µm3 in size and contains around 3,000 neuronal cell bodies is publicly available54. This dataset is large enough that dendritic reconstructions from the centre of the volume are nearly complete, and it has a sparse manual segmentation, covering around 1% of the volume, which includes 304 thalamocortical cells and 162 axon fragments. The third dataset is a 424 × 453 × 360 µm3 volume covering layer 4 of mouse primary somatosensory cortex, with manual reconstruction of 52 interneuronal dendrites and many axons55. It is critically important to compare circuit architectures across regions and species. The neocortex is of particular interest as it is expanded in humans compared with mice. This research includes morphological and electrical properties of neurons, density of spines, synapses and neurons, as well as biophysical properties and morphology of synaptic connections56,57,58. This human dataset is publicly available, including a dense automated reconstruction of all objects, with around 16,000 neurons, 130 million synapses and an initial release of 104 proofread cells. These human connectomics data will doubtless yield critical insights. To some extent, the wide and thin dimensions of the human dataset trade off completeness of local neurons and circuits in order to sample all layers, whereas the nearly cubic volume described here is more suitable for studying local circuits and long-range connections across areas.
By contrast, the functional connectomics data we have released include both anatomy and activity of the same cells. In the mammalian nervous system, transcriptomics has been the most scalable approach for cell-type taxonomies. In smaller organisms such as the fly, for which we have extensive gene expression maps, whole-brain neuronal reconstructions and nearly complete connectomes, integration across modalities has been a powerful engine of discovery. Moreover, the availability of connectomes in the fly has enabled a much higher resolution of cell types, with novel taxonomies and new cell types being discovered44. The accompanying studies4,5,6,7 suggest that a similar path to cell-type discovery will be enabled by large-scale EM in the mammalian system, with novel cell types and novel patterns of connectivity. This wealth of structural data on cell types and circuits provides strong constraints on the nature of the computations that the brain performs, whereas genes provide constraints on how this structure is built and operates. Linking connectomics to transcriptomics is a first step for merging connectivity with molecular information and building cell-type-specific tools that are informed by how neurons connect. In one of the accompanying studies7, we offer a proof of concept on how to achieve this link for Martinotti cells, using morphology as a common feature to integrate PatchSeq and EM datasets, suggesting a broader pathway for multimodal integration. There is, however, an important difference to be drawn with Drosophila, in which a cell type often consists of just a few neurons that share similar functional properties that are reproducible across individuals. Owing to this stereotypy, a connectome mapped in one fly can usually be used by researchers studying neuronal function in other flies. Rules of connectivity based on cell types have proved sufficient for understanding and modelling many functions of increasingly complex neural circuits60,61.
This is why it is important to combine cortical connectomics with functional studies of the same neurons in the same brain. This is also why the mapping of cortical connectivity must go beyond rules that depend solely on cell types. Almost 50 years after Crick described his “impossible” experiment, we have provided a first draft, but its full promise will take some time to achieve. Most importantly, complete segmentation still requires an extensive amount of proofreading for the largest datasets, such as the millimetre-scale cortical reconstruction reported here. Similarly, simultaneously recording single action potentials from tens of thousands of neurons is limited by sensor dynamics and optical sampling constraints. Nonetheless, there has been steady progress. The first structure–function studies that combined 2P microscopy and EM examined how the wiring of mouse retina27,28,29,30,31 and mouse visual cortex24 related to functional properties. Lee et al.25 related visual tuning properties of 50 functionally characterized neurons in primary visual cortex to their connectivity measured via EM reconstruction of a 450 × 450 × 150 µm3 volume. One thousand synapses were mapped by hand, yielding a graph of connectivity between 29 orientation-tuned cells (a subset of the characterized cells, as in the current dataset). Subsequently, our consortium used dense segmentation plus proofreading of a 250 × 140 × 90 µm3 dataset26 from mouse layer 2/3 visual cortex, yielding many more overall connections, but still only twice the number of functionally characterized cells. Perhaps most impressively, in the olfactory bulb of the zebrafish, Wanner et al.62 manually reconstructed almost all neurons (n = 1,003) within a 72 × 108 × 119 µm3 volume, in which responses to odours were measured in vivo.
Their analysis of the 18,483 measured connections revealed how this structural network mediated de-correlation and variance normalization of the functional responses and demonstrated how larger measurements of network structure and function can provide mechanistic insights. By contrast, the data released here contain tens of thousands of neurons with functionally characterized responses to visual stimuli and, because they are densely segmented and contain complete dendritic and local axonal arbors of centrally located cells, the opportunities to study connected neurons are orders of magnitude greater. As an example, from just 94 proofread excitatory axons, one can query 69,962 output synapses, which map to 20,112 distinct neuronal somata in the volume. Moreover, inspired by recent advancements in artificial intelligence, we also created a functional digital twin of the MICrONS mouse that can enable a more comprehensive analysis of function10,11. Specifically, we developed a ‘foundation model'11 for the mouse visual cortex using deep learning that was trained using large-scale datasets from multiple visual cortical areas and mice, recorded while they viewed ecological videos. The model demonstrated its generalization abilities by accurately predicting neuronal responses, not only to natural videos, but also to various new stimulus domains, such as coherent moving dots and noise patterns, as confirmed through in vivo testing10,11. By applying the foundation model to the MICrONS mouse data, we created a functional digital twin of this mouse, paving the way for a systematic exploration of the relationship between circuit structure and function for tens of thousands of neurons connected with millions of synapses. In a large volume with complete and segmented dendrites and local axons, this can be achieved. Currently, the dendrites are nearly completely segmented (Fig.
A goal in future years will be to complete the segmentation through a combination of additional machine learning and improved proofreading. If, in addition, most cell bodies have physiology with single-spike resolution, then Crick's experimental challenge will be met. These remaining hurdles may take some time to clear, but the next steps are becoming apparent. All procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Baylor College of Medicine. All results described here are from a single male mouse, aged 65 days at onset of experiments, expressing GCaMP6s in excitatory neurons via Slc17a7-Cre65 and Ai16266 heterozygous transgenic lines (recommended and generously shared by H. Zeng at Allen Institute for Brain Science; JAX stock 023527 and 031562, respectively). In order to select this animal, 31 (12 female, 19 male) GCaMP6-expressing animals underwent surgery as described below. Of these, eight animals were chosen based on a variety of criteria including surgical success and animal recovery, the accessibility of lateral higher visual areas in the cranial window, the degree of vascular occlusion, and the success of cortical tissue block extraction and staining. Of these 8 animals, one was chosen for 40-nm slicing and EM imaging based on overall quality using these criteria. Anaesthetized mice were placed in a stereotaxic head holder (Kopf Instruments) and their body temperature was maintained at 37 °C throughout the surgery using a homeothermic blanket system (Harvard Instruments). After shaving the scalp, bupivacaine (0.05 ml, 0.5%, Marcaine) was applied subcutaneously, and after 10–20 min an approximately 1 cm2 area of skin was removed above the skull and the underlying fascia was scraped and removed. The wound margins were sealed with a thin layer of surgical glue (VetBond, 3M), and a 13-mm stainless steel washer clamped in the headbar was attached with dental cement (Dentsply Grip Cement).
Using a surgical drill and HP 1/2 burr, a 4-mm-diameter circular craniotomy was made centred on the border between primary visual cortex and lateromedial visual cortex (V1, lateral–medial; 3.5 mm lateral of the midline, ~1 mm anterior to the lambda suture), followed by a durotomy. The cortical window was then sealed with a 4-mm coverslip (Warner Instruments), using cyanoacrylate glue (VetBond). Prior to surgery and throughout the imaging period, mice were singly housed and maintained on a reverse 12-h light cycle (off at 11:00, on at 23:00). Mice were head-mounted above a cylindrical treadmill and calcium imaging was performed using a Chameleon Ti-Sapphire laser (Coherent) tuned to 920 nm and a large FOV mesoscope67 equipped with a custom objective (excitation NA 0.6, collection NA 1.0, 21 mm focal length). Laser power after the objective was increased exponentially as a function of depth from the surface according to:

$$P = P_0 \times e^{z/L_z}$$

Here P is the laser power used at target depth z, P0 is the power used at the surface (not exceeding 10 mW), and Lz is the depth constant (not less than 150 μm). Maximum laser output of 115 mW was used for scans approximately 450–500 μm from the surface and below. Visual stimuli were presented to the left eye with a 31.8 × 56.5 cm (height × width) monitor (ASUS PB258Q) with a resolution of 1,080 × 1,920 pixels positioned 15 cm away from the eye. As the craniotomy coverslip placement during surgery and the resulting mouse positioning relative to the objective are optimized for imaging quality and stability, uncontrolled variance in animal skull position relative to the washer used for head-mounting was compensated with tailored monitor positioning on a six-dimensional monitor arm. In order to optimize the translational monitor position for centred visual cortex stimulation with respect to the imaging FOV, we used a dot stimulus with a bright background (maximum pixel intensity) and a single dark square dot (minimum pixel intensity).
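The depth-compensation rule is easy to check numerically. The function below evaluates $P = P_0 e^{z/L_z}$ and, as our assumption about how the 115 mW ceiling was enforced, clips the result at that maximum; the function and parameter names are ours.

```python
import math


def laser_power(z_um, p0_mw, lz_um, p_max_mw=115.0):
    """Depth-compensated power P = P0 * exp(z / Lz), clipped at the laser's maximum output (mW)."""
    return min(p0_mw * math.exp(z_um / lz_um), p_max_mw)


def depth_at_max(p0_mw, lz_um, p_max_mw=115.0):
    """Depth (um) at which the exponential first reaches the maximum output."""
    return lz_um * math.log(p_max_mw / p0_mw)
```

With the most aggressive settings the text allows (P0 = 10 mW, Lz = 150 μm), the ceiling is reached at z = Lz ln(115/10), about 366 μm; a lower P0 or a longer Lz moves that depth further down.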
Dot locations were randomly ordered from a 5 × 8 grid to tile the screen, with 15 repetitions of 200 ms presentation at each location. An L-bracket on a six-dimensional arm was fitted to the corner of the monitor at this location and locked in position, so that the monitor could be returned to the chosen position between scans and across days. The craniotomy window was levelled with respect to the objective with six degrees of freedom, five of which were locked between days to allow us to return to the same imaging site using the z axis. Pixel-wise responses from a 3,000 × 3,000 μm ROI spanning the cortical window (150 μm from surface, five 600 × 3,000 μm fields, 0.2 pixels per μm) to drifting bar stimuli were used to generate a sign map for delineating visual areas68. Our target imaging site was a 1,200 × 1,100 × 500 μm volume (anteroposterior × mediolateral × radial depth) spanning layer 2 to layer 6 at the conjunction of VISp and three higher visual areas: VISlm, VISrl and VISal69. This resulted in an imaging volume that was roughly 50% VISp and 50% higher visual area (HVA). This target was chosen to maximize the number of visual areas within the reconstructed cortical volume, as well as to maximize the overlap in represented visual space. Once the imaging volume was chosen, a second retinotopic mapping scan with the same stimulus was collected at 12.6 Hz, matching the imaging volume FOV with four 600 × 1,100 μm fields per frame at 0.4 pixels per μm xy resolution to tile a 1,200 × 1,100 μm FOV at 2 depths (2 planes per depth, with no overlap between coplanar fields). Area boundaries on the sign map were manually annotated. Scan placement targeted 10–15 μm increments in depth to maximize coverage of the volume in depth.
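The dot-mapping stimulus amounts to 5 × 8 = 40 locations × 15 repetitions × 0.2 s = 120 s of presentations. The schedule generator below is a hypothetical sketch: it shuffles all 600 presentations together with a seeded `random.Random`, which is one way to obtain a random order; the text does not say whether the order was shuffled globally or per repetition.

```python
import random


def dot_schedule(rows=5, cols=8, repeats=15, duration_s=0.2, seed=0):
    """Randomly ordered (location, duration) presentations covering a rows x cols grid.

    Every grid location appears `repeats` times; all presentations are shuffled together
    (one possible randomization, assumed here).
    """
    rng = random.Random(seed)
    presentations = [(r, c) for r in range(rows) for c in range(cols)] * repeats
    rng.shuffle(presentations)
    return [(loc, duration_s) for loc in presentations]
```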
For 11 scans, imaging was performed at 6.3 Hz, collecting eight 620 × 1,100 μm fields per frame at 0.4 pixels per μm xy resolution to tile a 1,200 × 1,100 μm (width × height) FOV at four depths (two planes per depth, 40 μm overlap between coplanar fields). For 2 scans, imaging was performed at 8.6 Hz, collecting six 620 × 1,100 μm fields per frame at 0.4 pixels per μm xy resolution to tile a 1,200 × 1,100 μm (width × height) FOV at 3 depths (2 planes per depth, 40 μm overlap between coplanar fields). For 1 scan, imaging was performed at 9.6 Hz, collecting four 620 × 1,000 μm fields per frame at 0.6 pixels per μm xy resolution to tile a 1,200 × 1,000 μm (width × height) FOV at 2 depths (2 planes per depth, 40 μm overlap between coplanar fields). The higher-resolution scans were designed to enable future analysis efforts to extract signals from large apical dendrites, for example using EM-Assisted Source Extraction (EASE70). In addition to locking the craniotomy window mount between days, the target imaging site was manually matched each day to preceding scans within several micrometres using structural features including horizontal blood vessels (which have a distinctive z-profile) and patterns of somata (identifiable by GCaMP6s exclusion as dark spots). The full 2P imaging processing pipeline is available at https://github.com/cajal/pipeline. Raster correction for row misalignment caused by bidirectional scanning was performed by an iterative greedy search, at increasing resolution, for the raster phase resulting in the maximum cross-correlation between odd and even rows. Neurons were automatically segmented using constrained non-negative matrix factorization, then deconvolved to extract estimates of spiking activity, within the CaImAn pipeline71. Cells were further selected by a classifier trained to separate somata versus artefacts based on segmented cell masks, resulting in exclusion of 8.1% of masks.
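The raster-phase search can be illustrated with a small pure-Python sketch. It is not the implementation in the cajal/pipeline repository: it resamples odd rows by linear interpolation at a candidate horizontal offset, scores each offset by the Pearson correlation between even and odd rows, and greedily hill-climbs with step sizes that shrink from 1 pixel to 1/8 pixel. The step schedule and the `max_phase` bound are our assumptions.

```python
import math


def sample(row, x):
    """Linearly interpolate `row` at fractional position x, clamping at the edges."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = int(x)
    if i >= len(row) - 1:
        return row[-1]
    t = x - i
    return row[i] * (1.0 - t) + row[i + 1] * t


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def raster_score(image, phase):
    """Correlation between even rows and odd rows resampled with horizontal offset `phase`."""
    even, odd = [], []
    for r in range(0, len(image) - 1, 2):
        even.extend(image[r])
        odd_row = image[r + 1]
        odd.extend(sample(odd_row, x + phase) for x in range(len(odd_row)))
    return pearson(even, odd)


def find_raster_phase(image, steps=(1.0, 0.5, 0.25, 0.125), max_phase=8.0):
    """Greedy hill climb on the phase, refining the step size at each stage."""
    phase, best = 0.0, raster_score(image, 0.0)
    for step in steps:
        improved = True
        while improved:
            improved = False
            for cand in (phase - step, phase + step):
                if abs(cand) <= max_phase:
                    score = raster_score(image, cand)
                    if score > best:
                        phase, best, improved = cand, score, True
                        break
    return phase
```

For a synthetic frame whose odd rows are the even rows displaced by 2 pixels, the search returns a phase of 2.0.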
The functional data are available in a DataJoint72 database and can also be read as NWB files deposited in the DANDI data archive73. Approximately 55 min prior to collecting the stack, the mouse was injected subcutaneously with 60 μl of 8.3 mM Dextran Texas Red fluorescent dye (Invitrogen, D3329). The stack was composed of 30 repeats of three 620 × 1,300 μm (width × height) fields per depth in 2 channels (green and red, respectively), tiling a 1,400 × 1,300 μm FOV (460 μm total overlap in width) at 335 depths from 21 μm above the surface to 649 μm below the surface. The green channel average image across repetitions for each field was enhanced with local contrast normalization using a Gaussian filter to calculate the local pixel means and standard deviations. The resulting image was then Gaussian smoothed and sharpened using a Laplacian filter. Enhanced and sharpened fields were independently stitched at each depth. Finally, the resulting alignment was detrended in z using a Hann filter with a size of 60 μm to remove the influence of vessels passing through the fields. The resulting transform was applied to the original average images resulting in a structural 2P 1,412 × 1,322 × 670 μm (width × height × depth) volume at 0.5 × 0.5 × 0.5 pixels per μm resolution in both red and green channels. Functional scan fields were independently registered using an affine transformation matrix with 9 parameters estimated via gradient ascent on the correlation between the sharpened average scanning plane and the extracted plane from the sharpened stack. Owing to tissue deformation from day to day across such a wide FOV, some cells are recorded in more than one scan. To ensure that we count cells only once, we subsample our recorded cells based on proximity in 3D space.
Using the 3D centroids of all segmented cells, we iteratively group the closest two cells from different scans until all pairs of cells are at least 10 μm apart or a further join would produce an unrealistically tall mask (20 μm in z). Sequential registration of sections of each functional scan into the structural stack was performed to assess the level of drift in the z dimension. All scans had less than 10 μm of drift over the 1.5-h recording, and for most of them drift was limited to <5 μm. These area masks were extended vertically across all depths, and functional units inherit their area membership from their stack xy coordinates. A hot mirror (Thorlabs FM02) positioned between the animal's left eye and the stimulus monitor was used to reflect an IR image onto a camera (Genie Nano C1920M, Teledyne Dalsa) without obscuring the visual stimulus. The FOV was manually cropped for each session (ranging from 828 × 1,217 pixels to 1,080 × 1,920 pixels at ~20 Hz) such that it contained the superior, frontal and inferior portions of the facial silhouette as well as the entire left eye. Frame times were time-stamped in the behavioural clock for alignment to the stimulus and scan frame times. Video was compressed using LabVIEW's MJPEG codec with a quality constant of 600, and the frames were stored in AVI files. Light from the scanning laser diffusing through the pupil was used to capture pupil diameter and eye movements. Notably, scans that used a wide range of laser powers to image both superficial and deep planes produced variable pupil intensity between frames. A custom semi-automated user interface in Python was built for dynamic adaptation of fitting parameters throughout the scan to maximize pupil tracking accuracy and coverage. The video was further manually masked to exclude high-intensity regions in the surrounding eyelids and fur.
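The de-duplication rule can be read as greedy agglomerative grouping of cell centroids. In this NumPy sketch (function name hypothetical), cells recorded in the same scan are never grouped, the z-extent of the grouped centroids stands in for mask height, and a join that would exceed 20 μm is skipped rather than halting the whole procedure. These are interpretive assumptions, not the authors' code.

```python
import numpy as np


def group_duplicate_cells(centroids, scan_ids, min_dist=10.0, max_height=20.0):
    """Greedily merge the closest cross-scan pair of cell groups; returns lists of cell indices."""
    centroids = np.asarray(centroids, dtype=float)
    groups = [[i] for i in range(len(centroids))]
    while True:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                scans_a = {scan_ids[i] for i in groups[a]}
                scans_b = {scan_ids[i] for i in groups[b]}
                if scans_a & scans_b:
                    continue  # never merge two cells recorded in the same scan
                z = centroids[groups[a] + groups[b], 2]
                if z.max() - z.min() > max_height:
                    continue  # the merged object would be unrealistically tall
                d = np.linalg.norm(
                    centroids[groups[a]].mean(axis=0) - centroids[groups[b]].mean(axis=0)
                )
                if d < min_dist and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return groups  # no remaining cross-scan pair is closer than min_dist
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
```

Each returned group would then be counted as a single cell. The cubic loop is fine for illustration but would need a spatial index for a full volume.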
In cases where a whisker was present and occluded the pupil at some time points, a merge mask was drawn to bridge ROIs drawn on both sides of the whisker into a single ROI. The filtered image was an exponentially weighted temporal running average of the frames, which then underwent exponentiation, Gaussian blur, automatic Otsu thresholding into a binary image and, finally, pixel-wise erosion/dilation. In cases where only one ROI was present, the contour of the binary ROI was fit with an ellipse by minimizing the least-squares error, and, if the contour exceeded the minimum contour length, the xy centre and the major and minor radii were stored. In cases where more than one ROI was present, tracking was automatically halted until the user either resolved the ambiguity or left the frame untracked (in which case a NaN (Not a Number) was stored). Users could also return to previous points in the trace for re-tracking with modified processing parameters, and could manually exclude periods of the trace in which too little of the pupil boundary was reliably visible for tracking. The mouse was head-restrained during imaging but could walk on a treadmill. Rostro-caudal treadmill movement was measured using a rotary optical encoder (Accu-Coder 15T-01SF-2000NV1ROC-F03-S1) with a resolution of 8,000 pulses per revolution and recorded at ~57–100 Hz to extract locomotion velocity. The stimulus was designed to cover a sufficiently large feature space to support training highly accurate models that predict neural responses to arbitrary visual stimuli11,38,74,75. Each scan stimulus was approximately 84 min in duration and comprised: 10 s each, 10 repeats per scan, 10 min total. 10 s each, 1 repeat per scan, 24 min total. Local directional parametric stimulus (Trippy): 20 seeds, 15 s each, 2 repeats (one in each half of the scan), 10 min total. 10 seeds conserved across all scans, 10 unique to each scan.
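The automated part of the pupil pipeline (running average, exponentiation, Otsu thresholding, erosion then dilation, least-squares ellipse fit to the contour) can be sketched with NumPy. This is a simplified, hypothetical stand-in for the custom Python tool: the Gaussian blur and the multi-ROI/GUI logic are omitted for brevity, `alpha` and `gamma` are assumed values, contour pixels are taken as mask pixels with a 4-neighbour outside the mask, and the ellipse is fitted as a conic normalized to 1.

```python
import numpy as np


def running_average(frames, alpha=0.3):
    """Exponentially weighted running average over a (T, H, W) stack; returns the last value."""
    avg = frames[0].astype(float)
    for frame in frames[1:]:
        avg = alpha * frame + (1 - alpha) * avg
    return avg


def otsu_threshold(img, nbins=256):
    hist, edges = np.histogram(img.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)  # pixels in bins up to and including k
    w1 = w0[-1] - w0  # pixels above bin k
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
    return edges[int(np.argmax(between)) + 1]


def _cross(mask):
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return np.stack([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])


def erode(mask):
    return _cross(mask).all(axis=0)


def dilate(mask):
    return _cross(mask).any(axis=0)


def fit_ellipse(xs, ys):
    """Least-squares conic A x^2 + B xy + C y^2 + D x + E y = 1; returns centre and semi-axes."""
    design = np.column_stack([xs**2, xs * ys, ys**2, xs, ys]).astype(float)
    A, B, C, D, E = np.linalg.lstsq(design, np.ones(len(xs)), rcond=None)[0]
    quad = np.array([[A, B / 2], [B / 2, C]])
    cx, cy = np.linalg.solve(2 * quad, [-D, -E])  # where the conic's gradient vanishes
    f0 = A * cx**2 + B * cx * cy + C * cy**2 + D * cx + E * cy - 1
    semi_axes = np.sqrt(-f0 / np.linalg.eigvalsh(quad))
    return (cx, cy), tuple(sorted(semi_axes, reverse=True))


def track_pupil(frames, alpha=0.3, gamma=2.0):
    img = running_average(frames, alpha) ** gamma  # exponentiation step
    mask = img > otsu_threshold(img)  # Otsu binarization
    mask = dilate(erode(mask))  # erosion followed by dilation
    boundary = mask & ~erode(mask)  # contour pixels
    ys, xs = np.nonzero(boundary)
    if len(xs) < 6:
        return None  # too few contour points: leave the frame untracked
    return fit_ellipse(xs, ys)
```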
Global directional parametric stimulus (Monet2): 20 seeds, 15 s each, 2 repeats (one in each half of the scan), 10 min total. 10 seeds conserved across all scans, 10 unique to each scan. Each scan was also preceded by 0.15–5.5 min with the monitor on, and followed by 8.3–21.2 min with the monitor off, in order to collect spontaneous neural activity. The visual stimulus was composed of dynamic stimuli, primarily natural video but also generated parametric stimuli with a strong local or global directional component. Cinematic, from the following sources: Mad Max: Fury Road (2015), Star Wars: Episode VII – The Force Awakens (2015), The Matrix (1999), The Matrix Reloaded (2003), The Matrix Revolutions (2003), Koyaanisqatsi: Life Out of Balance (1982), Powaqqatsi: Life in Transformation (1988) and Naqoyqatsi: Life as War (2002). Sports-1M collection37, with the following keywords: cycling, mountain unicycling, bicycle, BMX, cyclo-cross, cross-country cycling, road bicycle racing, downhill mountain biking, freeride, dirt jumping, slopestyle, skiing, skijoring, Alpine skiing, freestyle skiing, Greco-Roman wrestling, luge, canyoning, adventure racing, streetluge, riverboarding, snowboarding, mountainboarding, aggressive inline skating, carting, freestyle motocross, F1 powerboat racing, basketball and base jumping. Rendered 3D video of first-person POV random exploration of a virtual environment with moving objects, produced in a customized version of Unreal Engine 4 with modifications that enable precise control and logging of frame timing and camera positions to ensure repeatability across multiple rendering runs. Environments and assets were purchased from the Unreal Engine Marketplace. Assets chosen for diversity of appearance were translated along a piecewise linear trajectory and rotated with a piecewise constant angular velocity. Intervals between change points were drawn from a uniform distribution from 1 to 5 s.
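The object motion model (piecewise linear translation and piecewise constant angular velocity, with change points separated by intervals drawn uniformly from 1 to 5 s) can be sketched in 2D with NumPy. The speed, the angular-velocity range, the frame rate and the restriction to a plane are hypothetical; the rendered scenes were 3D and produced inside Unreal Engine, not with this code.

```python
import numpy as np


def sample_change_points(duration, rng, low=1.0, high=5.0):
    """Change-point times starting at 0 with gaps drawn from U(low, high) seconds."""
    times = [0.0]
    while times[-1] < duration:
        times.append(times[-1] + rng.uniform(low, high))
    return np.array(times)


def piecewise_trajectory(duration, fps, speed, max_angular_velocity, rng):
    change_points = sample_change_points(duration, rng)
    n_segments = len(change_points) - 1
    # One random travel direction and one angular velocity per segment (hypothetical ranges).
    directions = rng.uniform(0.0, 2 * np.pi, n_segments)
    velocities = speed * np.column_stack([np.cos(directions), np.sin(directions)])
    angular_velocities = rng.uniform(-max_angular_velocity, max_angular_velocity, n_segments)

    n_frames = int(round(duration * fps))
    t = np.arange(n_frames) / fps
    segment = np.searchsorted(change_points, t, side="right") - 1
    dt = 1.0 / fps
    # Constant velocity within a segment gives a piecewise linear position.
    steps = velocities[segment] * dt
    position = np.vstack([np.zeros(2), np.cumsum(steps[:-1], axis=0)])
    # Constant angular velocity within a segment gives a piecewise linear rotation angle.
    rotation = np.concatenate([[0.0], np.cumsum(angular_velocities[segment][:-1] * dt)])
    return t, position, rotation, change_points
```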
If a moving object encountered an environmental object, it bounced off and continued along a linear trajectory reflected across the surface normal. Latent variable images were generated by re-generating the scenes and trajectories and rendering different properties, including absolute depth, object identification number and surface normals. All natural videos were temporally resampled to 30 frames per second and converted to greyscale at 256 × 144 pixel resolution with FFmpeg (libx264, YUV 4:2:0, 8 bit). Stimuli were automatically filtered to retain clips in the upper 50th percentile of Lucas–Kanade optical flow and temporal contrast in the central region of each clip. All natural videos included in these experiments were further manually screened for unsuitable characteristics (for example, fragments of rendered videos in which the first-person perspective would enter a corner and become ‘trapped’ or follow an unnatural camera trajectory, or fragments of cinematic or Sports-1M videos containing on-screen text or other post-processing edits). To probe neuronal tuning to orientation and direction of motion, a visual stimulus (Monet2) was designed in the form of smoothed Gaussian noise with coherent orientation and motion. In brief, an independently and identically distributed (i.i.d.) Gaussian noise video was passed through a temporal low-pass Hamming filter (4 Hz) and a 2D Gaussian spatial filter (σ = 3.0° at the nearest point on the monitor to the mouse). Here, c = 2.5 is an orientation selectivity coefficient. At this value, the resulting orientation kernel's size is 72° full width at half maximum in spatial coordinates. To probe the tuning of neurons to local spatial features, including orientation, direction, spatial and temporal frequency, the Trippy stimulus was synthesized by applying the cosine function to a smoothed noise video. In brief, a phase movie was generated as an i.i.d. uniform noise video with 4 Hz temporal bandwidth.
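The first stage of Monet2 (i.i.d. Gaussian noise, a 4 Hz temporal low-pass Hamming-window filter, then a 2D Gaussian spatial filter) can be sketched with NumPy. The orientation/motion kernel controlled by c is omitted because its equation is not reproduced in this text. The 30 frames per second rate, the 31-tap windowed-sinc filter design, the pixel value of σ (the conversion from 3.0° depends on monitor geometry) and the zero-padded edges are all assumptions.

```python
import numpy as np


def hamming_lowpass(cutoff_hz, fps, n_taps=31):
    """Windowed-sinc FIR low-pass with a Hamming window, normalized to unit gain at DC."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fps * n) * np.hamming(n_taps)
    return h / h.sum()


def gaussian_kernel(sigma):
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smoothed_noise_video(n_frames, height, width, fps=30.0, cutoff_hz=4.0, sigma_px=3.0, seed=0):
    """Frame height and width must exceed the Gaussian kernel length for 'same' convolution."""
    rng = np.random.default_rng(seed)
    video = rng.standard_normal((n_frames, height, width))  # i.i.d. Gaussian noise
    taps = hamming_lowpass(cutoff_hz, fps)
    video = np.apply_along_axis(np.convolve, 0, video, taps, mode="same")  # temporal low-pass
    k = gaussian_kernel(sigma_px)
    for axis in (1, 2):  # separable spatial Gaussian
        video = np.apply_along_axis(np.convolve, axis, video, k, mode="same")
    return video
```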
An increasing trend of 8π s⁻¹ was added to the video to produce drifting grating movements, whereas the noise component added local variations of the spatial features. The video was spatially up-sampled to the full screen with a 2D Gaussian kernel with a sigma of 5.97 cm or 22.5° at the nearest point. A photodiode (TAOS TSL253) was sealed to the top left corner of the monitor, where stimulus sequence information was encoded in a three-level signal according to the binary encoding of the flip number assigned in order. This signal was recorded at 10 MHz on the behaviour clock (MasterClock PCIe-OSC-HSO-2 card). The signal underwent a sine convolution, allowing for local peak detection to recover the binary signal. The encoded binary signal was reconstructed for 89% of trials. We used six natural video conditions that were present in all scans and repeated ten times per scan to calculate an oracle score representing the reliability of the trace response to repeated visual stimuli. After optical imaging at Baylor College of Medicine, candidate mice were shipped via overnight air freight to the Allen Institute. All procedures were carried out in accordance with the Institutional Animal Care and Use Committee at the Allen Institute for Brain Science. All mice were housed in individually ventilated cages, 20–26 °C, 30–70% relative humidity, with a 12-h light:dark cycle. Mice were transcardially perfused with a fixative mixture of 2.5% paraformaldehyde, 1.25% glutaraldehyde, and 2 mM calcium chloride, in 0.08 M sodium cacodylate buffer, pH 7.4. A thick (1,200-μm) slice was cut with a vibratome and post-fixed in perfusate solution for 12–48 h. Slices were extensively washed and prepared for reduced osmium treatment based on the protocol of Hua et al.76. All steps were performed at room temperature, unless indicated otherwise. The first osmication step was osmium tetroxide (2%, 78 mM) with 8% v/v formamide (1.77 M) in 0.1 M sodium cacodylate buffer, pH 7.4, for 180 min.
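The Trippy synthesis described at the start of this paragraph (a uniform-noise phase movie band-limited to 4 Hz, an increasing 8π s⁻¹ trend, Gaussian up-sampling to the screen, then a cosine) can be sketched with NumPy. The ideal FFT low-pass used for the 4 Hz bandwidth, the coarse grid size, the noise amplitude, the frame rate and the up-sampling width in pixels are hypothetical; only the 4 Hz bandwidth and the 8π s⁻¹ trend come from the text.

```python
import numpy as np


def lowpass_fft(x, cutoff_hz, fps, axis=0):
    """Zero every temporal frequency component above `cutoff_hz` (ideal low-pass)."""
    spectrum = np.fft.rfft(x, axis=axis)
    freqs = np.fft.rfftfreq(x.shape[axis], d=1.0 / fps)
    shape = [1] * x.ndim
    shape[axis] = -1
    spectrum = spectrum * (freqs <= cutoff_hz).reshape(shape)
    return np.fft.irfft(spectrum, n=x.shape[axis], axis=axis)


def gaussian_upsample_matrix(n_fine, n_coarse, sigma_fine_px):
    """Each row holds normalized Gaussian weights from the coarse nodes to one fine pixel."""
    coarse_pos = np.linspace(0, n_fine - 1, n_coarse)
    d = np.arange(n_fine)[:, None] - coarse_pos[None, :]
    w = np.exp(-0.5 * (d / sigma_fine_px) ** 2)
    return w / w.sum(axis=1, keepdims=True)


def trippy_like(n_frames, coarse_hw, fine_hw, fps=30.0, bandwidth_hz=4.0,
                drift_rad_per_s=8 * np.pi, noise_scale=np.pi, sigma_fine_px=20.0, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-noise_scale, noise_scale, (n_frames, *coarse_hw))
    phase = lowpass_fft(noise, bandwidth_hz, fps)  # 4 Hz temporal bandwidth
    t = np.arange(n_frames) / fps
    phase = phase + drift_rad_per_s * t[:, None, None]  # increasing trend -> drifting gratings
    wy = gaussian_upsample_matrix(fine_hw[0], coarse_hw[0], sigma_fine_px)
    wx = gaussian_upsample_matrix(fine_hw[1], coarse_hw[1], sigma_fine_px)
    fine_phase = np.einsum("yh,thw,xw->tyx", wy, phase, wx)  # Gaussian up-sampling
    return fine_phase, np.cos(fine_phase)
```

Spatial gradients of the interpolated phase set the local orientation and spatial frequency, while the trend sets the average drift.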
Potassium ferricyanide 2.5% (76 mM) in 0.1 M sodium cacodylate, 90 min, was then used to reduce the osmium. The second osmium step was at a concentration of 2% in 0.1 M sodium cacodylate, for 150 min. Samples were washed with water, then immersed in thiocarbohydrazide (TCH) for further intensification of the staining (1% TCH (94 mM) in water, 40 °C, for 50 min). After washing with water, samples were immersed in a third osmium immersion of 2% in water for 90 min. After extensive washing in water, lead aspartate (Walton's (20 mM lead nitrate in 30 mM aspartate buffer, pH 5.5), 50 °C, 120 min) was used to enhance contrast. After two rounds of water wash steps, samples proceeded through a graded ethanol dehydration series (50%, 70%, 90% w/v in water, 30 min each at 4 °C, then 3× 100%, 30 min each at room temperature). Two rounds of 100% acetonitrile (30 min each) served as a transitional solvent step before proceeding to epoxy resin (EMS Hard Plus). Epoxy was cured at 60 °C for 96 h before unmolding and mounting on microtome sample stubs for trimming. Though empty resin increases the number of folds in resulting sections, we left some resin so as to keep the upper layers (L1 and L2) intact to preserve inter-areal connectivity and the apical tufts of pyramidal neurons. Similarly, white matter was also maintained in the block to preserve inter-areal connections despite the risk of increased sectioning artefacts that then have to be corrected through proofreading. The sections were then collected at a nominal thickness of 40 nm using a modified ATUMtome63 (RMC/Boeckeler) onto 6 reels of grid tape45. The knife was cleaned every 100–500 sections, occasionally leading to the loss of a very thin partial section (≪40 nm). Thermal expansion of the block as sectioning resumed post-cleaning resulted in a short series of sections substantially thicker than the nominal cutting thickness. 
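The molar concentrations quoted for the staining reagents follow from their % w/v values and molar masses. This check uses standard molar masses (OsO4 254.23, K3[Fe(CN)6] 329.24, CH6N4S 106.15 g/mol, supplied here rather than taken from the text) and gives about 78.7, 75.9 and 94.2 mM, consistent with the stated 78, 76 and 94 mM within rounding.

```python
def percent_wv_to_mM(percent_wv: float, molar_mass_g_per_mol: float) -> float:
    """Convert a % w/v concentration to millimolar."""
    grams_per_litre = percent_wv * 10.0  # 1% w/v = 1 g per 100 ml = 10 g/l
    return grams_per_litre / molar_mass_g_per_mol * 1000.0


REAGENTS = {
    "osmium tetroxide": (2.0, 254.23),
    "potassium ferricyanide": (2.5, 329.24),
    "thiocarbohydrazide": (1.0, 106.15),
}

for name, (percent, mass) in REAGENTS.items():
    print(f"{name}: {percent}% w/v = {percent_wv_to_mM(percent, mass):.1f} mM")
```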
The sectioning took place in two sessions. The first session took 8 consecutive days on a 24-h schedule and yielded sections 1 to 14,773. The loss rate in this initial session was low, but before section 7,931 there were two events that led to consecutive section loss (because of these consecutive section losses, we decided not to reconstruct the region containing sections 1 to 7,931, even though the imagery was collected). The first event that led to consecutive section loss was due to sections being collected onto apertures with damaged films. To prevent this from happening again, we installed a camera that monitors the aperture before collection. Because this could lead to severe section artefacts, we paused to trim additional empty resin from the block and also replaced the knife. The second session lasted five consecutive days, and an additional 13,199 sections were cut. Owing to the interruption, block shape changes and knife replacement, there are approximately 45 partial sections at the start of this session; importantly, these do not represent tissue loss (see stitching and alignment section). As described later, the EM dataset is subdivided into two subvolumes because of sectioning and imaging events that resulted in the loss of a series of sections. The parallel imaging pipeline described here63 converts a fleet of TEMs into high-throughput automated imaging systems capable of 24/7 continuous operation. It is built upon a standard JEOL 1200EXII 120 kV TEM that has been modified with customized hardware and software. The key hardware modifications include an extended column and a custom electron-sensitive scintillator. A single large-format CMOS camera outfitted with a low-distortion lens is used to acquire image frames, taking 100 ms per frame on average.
The autoTEM is also equipped with a nano-positioning sample stage that offers fast, high-fidelity montaging of large tissue sections, and an advanced reel-to-reel tape translation system that accurately locates each section using index barcodes for random access on the GridTape. To allow the autoTEM system to control the state of the microscope without human intervention and to ensure consistent data quality, we also developed a customized software infrastructure, piTEAM, that provides a convenient GUI-based operating system for image acquisition, a TEM image database, real-time image processing and quality control, and closed-loop feedback for error detection and system protection. During imaging, the reel-to-reel GridStage moves the tape and locates the target aperture through its barcode. Images, along with metadata files, are transferred to the data storage server. We perform image quality control on all the data and reimage sections that fail the screening. Pixel sizes for all systems were calibrated within the range of 3.95 to 4.05 nm per pixel, and the montages had a typical size of 1.2 mm × 0.82 mm. The most commonly used configuration was a 20-megapixel camera that required 5,000 individual tiles to capture the 1 mm² montage of each section. During dataset acquisition, three autoTEMs were upgraded with 50-megapixel camera sensors, which increased the frame size and reduced the total number of tiles required per montage to ~2,600. A nonlinear transformation of higher order is computed for each section using a set of 10 × 10 highly overlapping images collected at regular intervals during imaging64. The lens distortion correction transformations should represent the dynamic distortion effects of the TEM lens system and hence require the acquisition of highly overlapping calibration montages at regular intervals. Overlapping image pairs are identified within each section, and point correspondences are extracted for every pair using a feature-based approach.
In our stitching and alignment pipeline, we use SIFT (scale-invariant feature transform) feature descriptors to identify and extract these point correspondences. Per-image transformation parameters are estimated by a regularized solver algorithm. Deforming the tiles within a section based on these transformations results in a seamless registration of the section. Down-sampled versions of these stitched sections are produced for estimating a per-section transformation that roughly aligns the sections in 3D. A process similar to 2D stitching is followed here, in which point correspondences are computed between pairs of sections that are within a desired distance in the z direction. MIPmaps are used throughout the stitching process for faster processing without compromising stitching quality. The roughly aligned volume is rendered to disk for further fine alignment. The software tools used to stitch and align the dataset are available in our GitHub repository (https://github.com/AllenInstitute/asap-modules). The volume assembly process is based entirely on image metadata and transformation manipulations and is supported by the Render service (https://github.com/saalfeldlab/render). Cracks larger than 30 μm in 34 sections were corrected by manually defining transforms. The smaller and more numerous cracks and folds in the dataset were automatically identified using convolutional networks trained on manually labelled samples of images at 64 × 64 × 40 nm³ resolution. The same approach was used to identify voxels that were considered tissue. The rough alignment was iteratively refined in a coarse-to-fine hierarchy77, using an approach based on a convolutional network to estimate displacements between a pair of images78. Displacement fields were estimated between pairs of neighbouring sections and then combined to produce a final displacement field for each image to further transform the image stack.
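To illustrate what a regularized solver does with pairwise correspondences, here is a deliberately simplified NumPy sketch that estimates only one 2D translation per tile. Each matched pair contributes the constraint t_j − t_i ≈ d_ij, and a small ridge term pulls tiles toward their nominal stage positions so that the otherwise free global offset is fixed. The actual asap-modules solver estimates richer per-image transforms; the function name, `lam` and the translation-only model are assumptions.

```python
import numpy as np


def solve_tile_translations(n_tiles, matches, nominal, lam=1e-3):
    """Least-squares tile offsets from pairwise measurements, regularized toward nominal positions.

    matches: iterable of (i, j, d_ij), meaning t_j - t_i should equal the 2-vector d_ij.
    nominal: (n_tiles, 2) stage positions used as the regularization target.
    """
    rows, rhs = [], []
    for i, j, d in matches:
        row = np.zeros(n_tiles)
        row[i], row[j] = -1.0, 1.0
        rows.append(row)
        rhs.append(np.asarray(d, dtype=float))
    weight = np.sqrt(lam)
    design = np.vstack([np.array(rows), weight * np.eye(n_tiles)])
    target = np.vstack([np.array(rhs), weight * np.asarray(nominal, dtype=float)])
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution
```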
Pixels in a partial section that were not included in the tissue mask were set to the value of the nearest pixel in a higher-indexed section that was considered tissue. This composite image was used for downstream processing but was not included with the released images. Remaining misalignments were detected by cross-correlating patches of image at the same location in two sections, after transforming into the frequency domain and applying a high-pass filter. Combined with the previously computed tissue map, this produced a mask that sets the output of later processing steps to zero in locations with poor alignment. This is called the segmentation output mask. Using the outlined method79, a convolutional network was trained to estimate inter-voxel affinities that represent the potential for neuronal boundaries between adjacent image voxels. A convolutional network was also trained to perform a semantic segmentation of the image for neurite classification, including: (1) soma plus nucleus; (2) axon; (3) dendrite; (4) glia; and (5) blood vessel. Following the described methods80, both networks were applied to the entire dataset at 8 × 8 × 40 nm³ in overlapping chunks to produce consistent predictions of the affinity and neurite classification maps. The affinity map was processed with a distributed watershed and clustering algorithm to produce an over-segmented image, in which the watershed domains are agglomerated using single-linkage clustering with size thresholds81,82. The over-segmentation was then processed by a distributed mean affinity clustering algorithm81,82 to create the final segmentation. We augmented the standard mean affinity criterion with constraints based on segment sizes and neurite classification maps during the agglomeration process to prevent neuron–glia mergers as well as axon–dendrite and axon–soma mergers. These synaptic cleft predictions were segmented using connected components, and components smaller than 40 voxels were removed.
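The misalignment check (cross-correlating co-located patches from two sections in the frequency domain after a high-pass filter) can be sketched with NumPy's FFT. The hard radial cutoff, the circular correlation, the normalization to a peak of 1 for identical content, and the decision rule in the hypothetical `is_misaligned` (nonzero peak lag or weak peak) are illustrative assumptions, not the published thresholds.

```python
import numpy as np


def highpass_xcorr(a, b, cutoff=0.05):
    """Normalized circular cross-correlation of two equal-size patches after an FFT high-pass.

    Returns the signed (dy, dx) lag of the peak and the peak value (1.0 means identical up to a
    circular shift). A positive lag means `a` equals `b` shifted forward by that many pixels.
    """
    fa = np.fft.fft2(a - a.mean())
    fb = np.fft.fft2(b - b.mean())
    fy = np.fft.fftfreq(a.shape[0])[:, None]
    fx = np.fft.fftfreq(a.shape[1])[None, :]
    keep = np.hypot(fy, fx) > cutoff  # high-pass: drop low spatial frequencies
    fa, fb = fa * keep, fb * keep
    xc = np.fft.ifft2(fa * np.conj(fb)).real
    xc /= np.sqrt((np.abs(fa) ** 2).sum() * (np.abs(fb) ** 2).sum()) / a.size
    dy, dx = np.unravel_index(np.argmax(xc), xc.shape)
    dy = dy - a.shape[0] if dy > a.shape[0] // 2 else dy  # wrap to signed lags
    dx = dx - a.shape[1] if dx > a.shape[1] // 2 else dx
    return (int(dy), int(dx)), float(xc.max())


def is_misaligned(a, b, max_lag=0, min_peak=0.3):
    (dy, dx), peak = highpass_xcorr(a, b)
    return max(abs(dy), abs(dx)) > max_lag or peak < min_peak
```

The high-pass step keeps slowly varying intensity differences between sections from dominating the correlation peak.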
A separate network was trained to perform synaptic partner assignment by predicting the voxels of the synaptic partners given the synaptic cleft as an attentional signal83. This assignment network was run for each detected cleft, and coordinates of both the presynaptic and postsynaptic partner predictions were logged along with each cleft prediction. To evaluate precision and recall, we manually identified synapses within 70 small subvolumes (n = 8,611 synapses) spread throughout the dataset84. Following the methods described previously80, a nucleus prediction map was produced for the entire dataset at 64 × 64 × 40 nm³. The nucleus prediction was thresholded at 0.5 and segmented using connected components. Extensive manual, semi-automated, and fully automated proofreading of the segmentation data was performed by multiple teams to improve the accuracy of the neural circuit reconstruction. Critical to enabling these coordinated proofreading activities is the central ChunkedGraph system1,85,86, which maintains a dynamic segmentation dataset and supports real-time collaborative proofreading on petascale datasets through scalable software interfaces that receive edit requests from various proofreading platforms and support querying and analysis of edit history. Multiple proofreading platforms and interfaces were developed and leveraged to support the large-scale proofreading activities performed by various teams at Princeton University, the Allen Institute for Brain Science, Baylor College of Medicine, the Johns Hopkins University Applied Physics Laboratory, and ariadne.ai (individual proofreaders are listed in Acknowledgements).
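Both the nucleus step (threshold at 0.5, then connected components) and the cleft step (connected components, removing those under 40 voxels) reduce to the same operation, sketched here in plain NumPy with a breadth-first search. The 6-connectivity, the strict `>` comparison at the threshold and the function names are assumptions; the production pipeline ran distributed over chunks.

```python
from collections import deque

import numpy as np

OFFSETS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_components(mask):
    """Label 6-connected components of a 3D boolean mask with a breadth-first search."""
    labels = np.zeros(mask.shape, dtype=np.int64)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS_6:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n


def segment_prediction(prob, threshold=0.5, min_size=40):
    """Threshold a probability map, label components and drop those under `min_size` voxels."""
    labels, n = connected_components(prob > threshold)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    keep = sizes >= min_size
    keep[0] = False  # label 0 is background
    new_ids = np.zeros(n + 1, dtype=np.int64)
    new_ids[keep] = np.arange(1, keep.sum() + 1)  # relabel survivors consecutively
    return new_ids[labels]
```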
Below we outline the methods for these major proofreading activities, which focused on improving the completeness of neurons within and proximal to the main cortical column, splitting merged multi-soma objects distributed throughout the image volume, and applying automated proofreading edits across the volume to split erroneously merged neuron segments. Following the methods described previously26,85,87, proofreaders from Princeton University, the Allen Institute for Brain Science, Baylor College of Medicine, and ariadne.ai used a modified version of Neuroglancer with annotation capabilities as a user interface to make manual split and merge edits to neurons with somata located throughout the dataset. Proofreading was aided by on-demand highlighting of branch points and tips on user-defined regions of a neuron, based on rapid skeletonization (https://github.com/AllenInstitute/Guidebook). This approach quickly directed proofreader attention to potential false merges and locations for extension, and provided a clear record of the regions of an arbor that had been evaluated. For dendrites, we checked all branch points for correctness and all tips to see whether they could be extended. False merges of simple axon fragments onto dendrites were often not corrected in the raw data, since they could be computationally filtered for analysis after skeletonization (see next section). Detached spine heads were not comprehensively proofread. Dendrites that were proofread are identified in CAVE table proofreading_status_and_strategy as status_dendrite = ‘true’. For axons, we began by ‘cleaning’ axons of false merges by looking at all branch points.
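Highlighting branch points and tips amounts to reading vertex degrees off the skeleton graph: degree 1 marks a tip and degree 3 or more marks a branch point. This plain-Python sketch is not the Guidebook implementation; the edge-list input, the coordinate dictionary and the spherical "user-defined region" (`center`, `radius`) are hypothetical stand-ins.

```python
from collections import defaultdict

import numpy as np


def branch_points_and_tips(edges, coords=None, center=None, radius=None):
    """Return (branch_points, tips) of a skeleton given as an edge list of vertex ids.

    If `center` is given, only vertices within `radius` of it (using `coords`) are reported.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def in_region(node):
        if center is None:
            return True
        offset = np.asarray(coords[node], dtype=float) - np.asarray(center, dtype=float)
        return np.linalg.norm(offset) <= radius

    branches = sorted(n for n, d in degree.items() if d >= 3 and in_region(n))
    tips = sorted(n for n, d in degree.items() if d == 1 and in_region(n))
    return branches, tips
```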
The different proofreading strategies were as follows. Comprehensive extension: each axon end and branch point was visited and checked to see whether the axon could be extended, until it either reached its biological completion or an incomplete end (incomplete ends were due either to the axon reaching the borders of the volume or to an artefact that curtailed its continuation). Substantial extension: each axon branch point was visited and checked, and many, but not all, ends were visited and completed. Inter_areal_extension: a subset of axons that projected either from an HVA to V1, or from V1 to an HVA, were preferentially extended to look specifically at inter-areal connections. Local cylinder cutting: a subset of pyramidal cells were proofread in a local cylinder with a 300-μm radius centred on the column featured in Schneider-Mizell et al.4. Any axon leaving the cylinder was cut and At least 100 synapses: axons were extended until at least 100 synapses were present on the axon, to sample their output connectivity profile. Axons that were proofread are identified in CAVE table proofreading_status_and_strategy as status_axon = ‘true’, and the proofreading strategy label associated with each axon is given in the column ‘strategy_axon’. Proofreading was also performed to split multi-soma objects containing more than one neuronal soma, which had been incorrectly merged during the agglomeration step of the reconstruction process. This proofreading was performed by the Johns Hopkins University Applied Physics Laboratory, Princeton University, the Allen Institute for Brain Science, and Baylor College of Medicine. These erroneously merged multi-soma objects were specifically targeted given their number, distribution throughout the volume, and subsequent impact on global neural connectivity88 (Extended Data Fig.
As an example, multi-soma objects comprised up to 20% of the synaptic targets for 78 excitatory cells with proofreading status ‘comprehensive extension’. Although most multi-soma objects contained 2 to 25 nuclei (Extended Data Fig. 3a), one large multi-soma object contained 172 neuronal nuclei owing to its proximity to a major blood vessel present in a substantial portion of the image volume. Different Neuroglancer web-based applications1,85,86,88 were used to perform this proofreading, but most edits were performed using NeuVue88. NeuVue enables scalable task management across dozens of concurrent users and provides efficient queuing, review, and execution of proofreading edits by integrating with primary data management APIs such as CAVE and PCG. Multi-soma objects used to generate proofreading tasks were originally identified using the nucleus detection table available through CAVE. Additionally, algorithms were employed in a semi-automated workflow to detect incorrect merges and propose potential corrective split locations in the segmentation for proofreaders to review and apply2. Following methods described elsewhere2, automated error-detection and error-correction methods from the Neural Decomposition (NEURD) framework were used to apply edits that split incorrectly merged axonal and dendritic segments distributed across the image volume. These automated methods leveraged graph filter and graph analysis algorithms to accurately identify errors in the reconstruction and generate corrective solutions. These methods were validated and refined through manual review of the proposed automated edits on the NeuVue platform88. We initially manually matched 2,934 fiducials between the EM volume and the 2P structural dataset (1,994 somata and 942 blood vessels, mostly branch points, which are available as part of the resource).
Although the fiducials cover the whole volume of the dataset, it is worth noting that below 400 µm from the surface the 2P structural dataset has a much lower signal-to-noise ratio, so identifying somata requires more effort; we therefore made greater use of vascular fiducials there. To evaluate the error of the transform, we measured the distance in micrometres between the location of a fiducial after co-registration and its original location; a perfect co-registration would have residuals of 0 μm. The full 3D transform is a list of eight transforms that fall into four groups with different purposes. The first group is a single second-order polynomial transform between the two datasets. It serves to scale and rotate the optical dataset into EM space, followed by a single global nonlinear term, leaving an average residual of ~10 µm. These trends are spaced in a way that is indicative of changing shape of the EM data on approximately the length scale between knife cleanings or tape changes. We addressed this with a transform that binned the data into z ranges and applied a further second-order polynomial to each bin. We did this in a two-step hierarchical fashion, first with 5 z bins and then with 21 z bins. The third group is a set of hierarchical thin plate spline transforms. The idea here is to account for deformations on larger length scales first, so that the highest-order transforms introduce smaller changes in position. The average residuals in these steps were 3.9, 3.5, 3.1 and 2.9 µm, accomplished with average control point motions of 12.5, 7.5, 3.8 and 1.6 µm. The final group is a single thin plate spline transform. The control points for this transform are no longer an evenly spaced grid. This transform minimizes the residuals almost perfectly (as it should for the control points, which are identical to the fiducials; 0.003 µm on average; Fig.
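A first-group-style transform (one second-order polynomial per output coordinate, fitted to fiducial pairs by least squares) and its residuals can be sketched with NumPy. The sketch also computes leave-one-out residuals by refitting without each point, the validation measure described in this section. The 10-term quadratic basis, the centring and scaling used for numerical conditioning, and the function names are assumptions; the z-binned and thin-plate-spline stages are not shown.

```python
import numpy as np


def quadratic_features(points, mu, scale):
    """[1, x, y, z, x^2, y^2, z^2, xy, xz, yz] after centring and scaling."""
    p = (points - mu) / scale
    x, y, z = p.T
    return np.column_stack([np.ones(len(p)), x, y, z, x * x, y * y, z * z, x * y, x * z, y * z])


def fit_quadratic_transform(src, dst):
    mu, scale = src.mean(axis=0), src.std(axis=0)
    coef, *_ = np.linalg.lstsq(quadratic_features(src, mu, scale), dst, rcond=None)
    return coef, mu, scale


def apply_transform(model, points):
    coef, mu, scale = model
    return quadratic_features(points, mu, scale) @ coef


def residuals(src, dst):
    """Distance between each transformed fiducial and its target, fitting on all points."""
    model = fit_quadratic_transform(src, dst)
    return np.linalg.norm(apply_transform(model, src) - dst, axis=1)


def leave_one_out_residuals(src, dst):
    """Refit without each fiducial in turn and measure the error on the held-out point."""
    out = np.empty(len(src))
    for i in range(len(src)):
        keep = np.arange(len(src)) != i
        model = fit_quadratic_transform(src[keep], dst[keep])
        out[i] = np.linalg.norm(apply_transform(model, src[i:i + 1]) - dst[i])
    return out
```

For a least-squares fit, each leave-one-out residual is at least as large as the corresponding in-sample residual, which is why it is the more honest measure of performance on a new point.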
3) and accomplishes this final step by moving each data point by a further 2.9 µm on average. This last transform is very sensitive to error in fiducial location but provides the co-registration with minimal residuals. It is also more likely to create errors in regions with strong distortions, for example at the edges of the dataset. We created 2,933 3D transforms, each time leaving out one fiducial, and then evaluated the residual of the left-out point. We call this measure ‘leave-one-out’ residuals; it evaluates how well the transform performs on a new point. A custom user interface was used to visualize images from both the functional data and the EM data side by side to manually associate functional ROIs with their matching EM cell counterparts and vice versa. To visualize the functional scans, summary images were generated by averaging the scan over time (average image) and by correlating each pixel with its neighbouring pixels over time (correlation image). The product of the average and correlation images was used to clearly visualize cell body locations. EM imagery and EM nucleus segmentation were resized to 1 μm³ resolution and transformed into the 2P structural stack coordinates using the co-registration transform, allowing an EM image corresponding to the registered field to be extracted. The overlay of the extracted vessel field and the extracted EM image was used to confirm local alignment of the vasculature visible in both domains. Using the tool, matchers generated a hypothesis for which EM cell nucleus matched a given functional unit or vice versa. A custom version of Neuroglancer (Seung laboratory; https://github.com/seung-lab/neuroglancer) was used to visualize the region of interest in the ground truth EM data for match confirmation. The breakdown in the number of unique neuron matches per 2P scan is shown in Extended Data Fig. The resulting matches are uploaded to CAVE table coregistration_manual_v4.
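The summary image used for matching (temporal average image multiplied by a correlation image) can be sketched with NumPy. Here the correlation image is taken as the mean Pearson correlation between each pixel's trace and those of its 8 neighbours; the neighbourhood size and the function names are assumptions, since the text does not specify them.

```python
import numpy as np


def correlation_image(movie):
    """Mean Pearson correlation of each pixel's trace with its 8 neighbours; movie is (T, H, W)."""
    _, h, w = movie.shape
    z = (movie - movie.mean(axis=0)) / (movie.std(axis=0) + 1e-12)  # z-score each trace
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ys, yn = slice(max(dy, 0), h + min(dy, 0)), slice(max(-dy, 0), h + min(-dy, 0))
            xs, xn = slice(max(dx, 0), w + min(dx, 0)), slice(max(-dx, 0), w + min(-dx, 0))
            # Mean product of z-scored traces equals their Pearson correlation.
            total[ys, xs] += (z[:, ys, xs] * z[:, yn, xn]).mean(axis=0)
            count[ys, xs] += 1
    return total / count


def summary_image(movie):
    """Product of the temporal average image and the correlation image."""
    return movie.mean(axis=0) * correlation_image(movie)
```

Somata, whose pixels share a common calcium signal, score high on both factors, whereas bright but uncorrelated background is suppressed.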
The latest recommended manual match table can be found at https://www.microns-explorer.org/cortical-mm3#f-coreg. In addition to the matches, the manual co-registration table includes two metrics that help assess confidence: the residual and the separation score. Smaller residuals and larger separation scores indicate higher-confidence matches, as is the case for most matches (Extended Data Fig. To help validate the manual matches, for every EM neuron that was independently matched to at least two scans, the in vivo signal correlation (correlation between trial-averaged responses to oracle stimuli) was computed between the matched unit in scan A and the matched unit in scan B. In addition, for each neuron, two control correlations were computed: the matched unit in scan A with the nearest unit not matched to the neuron in scan B, and vice versa (Extended Data Fig. As expected, the distributions of oracle scores for the matched neurons and control neurons are qualitatively similar, with a slight right-shift towards higher oracle scores for matches, as higher oracle scores were prioritized for matching (Extended Data Fig. The comparison of signal correlation between matched neurons and their control counterparts exhibits a strong trend, with clustering in the upper left quadrant and most data points positioned above the diagonal. This indicates that the matched neurons consistently have stronger signal correlations than their nearest counterparts, and high signal correlation overall, especially when the oracle score is larger (Extended Data Fig. Filtering by oracle score further refines the trend, highlighting that high oracle score neurons (score >0.2) show even more distinct separation, with matched neurons maintaining higher signal correlation values than the nearest-neighbour controls (Extended Data Fig. The latest recommended fiducial-based automatch table can be found at https://www.microns-explorer.org/cortical-mm3#f-coreg.
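The validation statistic (signal correlation between trial-averaged responses of two units) and the choice of a spatial control unit can be sketched with NumPy. The (trials, time) array layout, the Euclidean nearest-neighbour rule and the function names are assumptions made for illustration.

```python
import numpy as np


def signal_correlation(responses_a, responses_b):
    """Pearson correlation of trial-averaged responses; inputs are (trials, time) arrays."""
    mean_a = responses_a.mean(axis=0)
    mean_b = responses_b.mean(axis=0)
    return float(np.corrcoef(mean_a, mean_b)[0, 1])


def nearest_other_unit(positions, matched_index):
    """Index of the unit closest to the matched unit in the same scan, excluding that unit."""
    d = np.linalg.norm(positions - positions[matched_index], axis=1)
    d[matched_index] = np.inf
    return int(np.argmin(d))
```

A correct match should give a higher signal correlation than the control pairing, which is the above-diagonal pattern described in the text.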
To achieve co-registration starting from the 2P structural stacks and the EM segmentation, without the use of fiducials, we employed a multi-scale B-spline registration91 using only vasculature data. This non-rigid transformation method corrects the extreme nonlinear tissue distortions caused by shrinkage between 2P imaging and EM. Both the EM segmentation and the 2P structural stack volumes were subsampled to 1-μm voxel resolution, ensuring consistent scaling and indexing between the volumes. Pre-processing of the vessels was necessary to address inconsistent signal quality in the 2P data, especially for vessels located deeper in the cortex, which emit lower fluorescence. An additional filtering step mitigated discrepancies in z resolution and errors from false splits in the EM segmentation. To address z-direction smearing in the 2P data due to anisotropy, both the 2P and EM volumes were binarized, skeletonized and further processed by removing small isolated segments. The skeletons were then convolved with a Gaussian filter, forming tubes of constant radius for co-registration. Another round of skeletonization and Gaussian filtering was applied to correct for false splits in thicker vessels. Initially, centroid alignment was achieved via template matching within a small subvolume. Despite tissue shrinkage, the volumes were locally aligned well enough to yield good correlations. The B-spline transformation was performed across multiple scales, progressing from coarse grids with strong smoothing to finer grids with minimal smoothing. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer was run for 600 iterations, sampling 1% of the points to handle large matrices. The resulting flow field and its inverse defined how each voxel mapped between the two spaces. Minimum weight matching was performed (as described in ‘Generating the fiducial-based automatch') to establish match assignments, using excitatory neuron centroids from the CAVE table aibs_metamodel_mtypes_v661_v2.
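The minimum weight matching step can be sketched with SciPy's `linear_sum_assignment`, which solves the rectangular assignment problem (see the Crouse and SciPy entries in the reference list). This is an illustration under stated assumptions: centroids are taken to be already mapped into a common space (for example, by the B-spline flow field) as (N, 3) and (M, 3) arrays, the cost is plain Euclidean distance, and the function name `min_weight_match` is hypothetical. It does not reproduce the residual or separation scores, which are defined in ‘Generating the fiducial-based automatch'.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def min_weight_match(unit_xyz, nucleus_xyz):
    """Assign functional units to EM nuclei so that the summed Euclidean
    distance between paired centroids is minimal.

    unit_xyz: (N, 3) functional unit centroids in the common space.
    nucleus_xyz: (M, 3) EM nucleus centroids. If M < N, only M units
    receive a match.
    Returns (unit_index, nucleus_index, distance) tuples.
    """
    # Pairwise distance matrix of shape (N, M) via broadcasting.
    cost = np.linalg.norm(unit_xyz[:, None, :] - nucleus_xyz[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c), float(cost[r, c])) for r, c in zip(rows, cols)]


nuclei = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0],
                   [0.0, 10.0, 0.0], [50.0, 50.0, 50.0]])
units = np.array([[10.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 10.0, 0.5]])
for unit, nucleus, dist in min_weight_match(units, nuclei):
    print(unit, nucleus, round(dist, 3))
# prints:
# 0 1 0.5
# 1 0 0.5
# 2 2 0.5
```

A global assignment, rather than greedily taking each unit's nearest nucleus, prevents two functional units from claiming the same EM nucleus.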
The latest recommended vessel-based automatch table can be found at https://www.microns-explorer.org/cortical-mm3#f-coreg. To generate the fiducial-vessel agreement automatch table, the residual and separation scores in each of the tables described above (coregistration_auto_phase3_fwd, apl_functional_coreg_vess_fwd) were first transformed into percentiles. The two tables were then merged on the keys ‘session', ‘scan_idx', ‘field', ‘unit_id' and ‘target_id'. To evaluate the automatch tables, we computed precision and recall using manual matches as ground truth. To ensure a fair comparison, we first restricted both the automatch and manual match tables to rows in which the functional unit or EM neuron had been attempted by both approaches. For calculating precision and recall, true positives were rows common to both tables, false positives were rows only in the automatch table, and false negatives were rows only in the manual match table. The precision-recall curves can be used to select an automatch table and/or a metric with which to threshold matches (Extended Data Fig. In addition, heat maps are provided indicating precision levels (Extended Data Fig. 5b) and the number of automatches remaining (Extended Data Fig. 5c) for jointly applied residual and separation percentile thresholds. To apply a threshold, first convert the residual and separation (named ‘score' in the table) to percentiles. For the residual, apply the threshold as a maximum. Conversely, for separation, apply the threshold as a minimum. We analysed the nucleus segmentations for features such as volume, surface area, fraction of membrane within folds and depth in cortex. We trained a support vector machine (SVM) classifier to use these features to detect which nucleus detections were likely to be neurons within the volume, with 96.9% precision and 99.6% recall. This model predicted 82,247 neurons within the larger subvolume.
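The percentile thresholding and the precision/recall evaluation described above can be sketched in plain Python. This is a hedged illustration, not the MICrONS tooling: rows are assumed to be dicts carrying the key columns plus ‘residual' and ‘score', both tables are assumed to be already restricted to commonly attempted units, the percentile convention is an assumption, and all function names and toy rows are hypothetical.

```python
import bisect

# Columns that identify a match row (the merge keys named in the text).
KEYS = ("session", "scan_idx", "field", "unit_id", "target_id")


def to_percentiles(values):
    """Percentile rank of each value: 100 * (share of values <= it).

    This 'weak' percentile-of-score convention is an assumption; the
    text does not state which convention was used.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect.bisect_right(ordered, v) / n for v in values]


def apply_thresholds(rows, max_residual_pct, min_separation_pct):
    """Keep rows whose residual percentile is at most max_residual_pct and
    whose separation ('score') percentile is at least min_separation_pct."""
    res_pct = to_percentiles([r["residual"] for r in rows])
    sep_pct = to_percentiles([r["score"] for r in rows])
    return [r for r, rp, sp in zip(rows, res_pct, sep_pct)
            if rp <= max_residual_pct and sp >= min_separation_pct]


def precision_recall(auto_rows, manual_rows):
    """Rows in both tables are true positives; rows only in the automatch
    table are false positives; rows only in the manual table are false
    negatives. So precision = TP / |auto| and recall = TP / |manual|."""
    auto = {tuple(r[k] for k in KEYS) for r in auto_rows}
    manual = {tuple(r[k] for k in KEYS) for r in manual_rows}
    tp = len(auto & manual)
    precision = tp / len(auto) if auto else float("nan")
    recall = tp / len(manual) if manual else float("nan")
    return precision, recall


def row(unit_id, target_id, residual=0.0, score=0.0):
    # Hypothetical rows from a single session/scan/field for brevity.
    return {"session": 4, "scan_idx": 7, "field": 1, "unit_id": unit_id,
            "target_id": target_id, "residual": residual, "score": score}


auto = [row(1, 10, 1.0, 9.0), row(2, 20, 2.0, 8.0),
        row(3, 30, 3.0, 2.0), row(4, 41, 8.0, 1.0)]  # last row is wrong
manual = [row(1, 10), row(2, 20), row(3, 30), row(4, 40)]

print(precision_recall(auto, manual))                            # (0.75, 0.75)
print(precision_recall(apply_thresholds(auto, 75, 50), manual))  # (1.0, 0.75)
```

Lowering the maximum residual percentile or raising the minimum separation percentile typically trades recall for precision; the heat maps summarize that trade-off over a grid of jointly applied thresholds.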
Dimensionality reduction on this feature space revealed a clear separation between neurons with well-segmented somatic regions (n = 69,957) and those with fragmented segmentations or sizable merges with other objects (n = 12,290). Combining those features with the nucleus features, we trained a multi-layer perceptron classifier to distinguish excitatory from inhibitory neurons among the well-segmented subset, using 80% of the manually labelled data as a training set and 20% as a validation set to choose hyper-parameters. We estimate from this test that the classifier had an overall accuracy of 97%, with an estimated 96% precision and 94% recall for inhibitory calls. Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Dorkenwald, S. et al. CAVE: connectome annotation versioning engine. Celii, B. et al. NEURD offers automated proofreading and feature extraction for connectomics. Weis, M. A. et al. An unsupervised map of excitatory neurons' dendritic morphology in the mouse visual cortex. Schneider-Mizell, C. M. et al. Inhibitory specificity from a connectomic census of mouse visual cortex. The synaptic architecture of layer 5 thick tufted excitatory neurons in the visual cortex of mice. Elabbady, L. et al. Perisomatic ultrastructure efficiently classifies cells in mouse cortex. Gamlin, C. R. et al. Connectomics of predicted Sst transcriptomic types in mouse visual cortex. Fu, J. et al. Pattern completion and disruption characterize contextual modulation in mouse visual cortex. Ding, Z. et al. Bipartite invariance in mouse primary visual cortex. Ding, Z. et al. Functional connectomics reveals general wiring rule in mouse visual cortex. Wang, E. Y. et al. Foundation model of neural activity predicts response to new stimulus types. Seung, H. S. Connectome: How the Brain's Wiring Makes Us Who We Are (HMH, 2012). Gilbert, C. D. & Wiesel, T. N.
Morphology and intracortical projections of functionally characterised neurones in the cat visual cortex. & Whitteridge, D. Form, function and intracortical projections of spiny neurones in the striate visual cortex of the cat. & Nelson, J. C. Synaptic output of physiologically identified spiny stellate neurons in cat visual cortex. Tanaka, K. Cross-correlation analysis of geniculostriate neuronal relationships in cats. Reid, R. C. & Alonso, J. M. Specificity of monosynaptic connections from thalamus to visual cortex. Ko, H. et al. Functional specificity of local synaptic connections in neocortical networks. Wertz, A. et al. Single-cell-initiated monosynaptic tracing reveals layer-specific cortical network modules. Bock, D. D. et al. Network anatomy and in vivo physiology of visual cortical neurons. Lee, W.-C. A. et al. Anatomy and function of an excitatory network in the visual cortex. Turner, N. L. et al. Reconstruction of neocortex: Organelles, compartments, cells, circuits, and activity. A. et al. Digital museum of retinal ganglion cells with dense anatomy and physiology. Vishwanathan, A. et al. Electron microscopic reconstruction of functionally identified cells in a neural integrator. Cossell, L. et al. Functional organization of excitatory synaptic strength in primary visual cortex. Macrina, T. et al. Petascale neural circuit reconstruction: automated methods. Zheng, Z. et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Schlegel, P. et al. Whole-brain annotation and multi-connectome cell typing of Drosophila. Large-scale video classification with convolutional neural networks. Inception loops discover what excites neurons most using deep predictive models. Theer, P., Hasan, M. T. & Denk, W. Two-photon imaging to a depth of 1000 µm in living brains by use of a Ti:Al2O3 regenerative amplifier. Dorkenwald, S. et al.
Multi-layered maps of neuropil with segmentation-guided contrastive learning. Albertson, D. G., Thompson, J. N. & Brenner, S. The pharynx of Caenorhabditis elegans. Ohyama, T. et al. A multilevel multimodal circuit enhances action selection in Drosophila. Scheffer, L. K. et al. A connectome and analysis of the adult Drosophila central brain. Phelps, J. S. et al. Reconstruction of motor control circuits in adult Drosophila using automated transmission electron microscopy. The connectome of an insect brain. Fushiki, A. et al. A circuit mechanism for the propagation of waves of muscle contraction in Drosophila. Complete connectomic reconstruction of olfactory projection neurons in the fly brain. Marin, E. C. et al. Connectomics analysis reveals first-, second-, and third-order thermosensory and hygrosensory neurons in the adult Drosophila brain. Harris, K. M. et al. A resource from 3D electron microscopy of hippocampal neuropil for user training and tool development. Schmidt, H. et al. Axonal synapse sorting in medial entorhinal cortex. Morgan, J. L., Berger, D. R., Wetzel, A. W. & Lichtman, J. W. The fuzzy logic of network connectivity in mouse visual thalamus. Hua, Y. et al. Connectomic analysis of thalamus-driven disinhibition in cortical layer 4. Molnár, G. et al. Complex events initiated by individual spikes in the human cerebral cortex. Testa-Silva, G. et al. Human synapses show a wide temporal window for spike-timing-dependent plasticity. Shapson-Coe, A. et al. A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution. Lyu, C., Abbott, L. F. & Maimon, G. Building an allocentric travelling direction signal via vector computation. Transforming representations of movement from body- to world-centric space. A., Genoud, C., Masudi, T., Siksou, L. & Friedrich, R. W. Dense EM-based reconstruction of the interglomerular projectome in the zebrafish olfactory bulb. Yin, W. et al. 
A petascale automated imaging pipeline for mapping neuronal circuits with high-throughput transmission electron microscopy. Mahalingam, G. et al. A scalable and modular automated pipeline for stitching of large electron microscopy datasets. A. et al. Anatomical characterization of Cre driver mice for neural circuit mapping and manipulation. Daigle, T. L. et al. A suite of transgenic driver and reporter mouse lines with enhanced brain-cell-type targeting and functionality. & Svoboda, K. A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging. Garrett, M. E., Nauhaus, I., Marshel, J. H. & Callaway, E. M. Topography and areal organization of mouse visual cortex. Swanson, L. W. Brain Maps 4.0—Structure of the Rat Brain: an open access atlas with global nervous system nomenclature ontology and flatmaps. EASE: EM-assisted source extraction from calcium imaging data. Giovannucci, A. et al. CaImAn an open source tool for scalable calcium imaging data analysis. Yatsenko, D. et al. DataJoint: managing big scientific data using MATLAB or Python. A. et al. MICrONS two photon functional imaging (version 0.230307.2132) [data set]. Stimulus domain transfer in recurrent models for large scale cortical population prediction on video. 32nd International Conference on Neural Information Processing Systems 7199–7210 (ACM, 2018). Cotton, R. J., Sinz, F. H. & Tolias, A. S. K-shot prediction of neural responses. Hua, Y., Laserstein, P. & Helmstaedter, M. Large-volume en-bloc staining for electron microscopy-based connectomics. Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening. Popovych, S. et al. Petascale pipeline for precise alignment of images from serial section electron microscopy. Lee, K., Zung, J., Li, P., Jain, V. & Seung, H. S. Superhuman accuracy on the SNEMI3D connectomics challenge. Wu, J., Silversmith, W. M., Lee, K. & Seung, H. S.
Chunkflow: hybrid cloud processing of large 3D images by convolutional nets. & Seung, H. S. Large-scale image segmentation based on distributed clustering algorithms. & Seung, H. S. Image segmentation by size-dependent single linkage clustering of a watershed basin graph. Turner, N. L. et al. Synaptic partner assignment using attentional voxel association networks. CONFIRMS: a toolkit for scalable, black box connectome assessment and investigation. Dorkenwald, S. et al. Binary and analog variation of synapses between cortical pyramidal neurons. Dorkenwald, S. et al. FlyWire: online community for whole-brain connectomics. Structure and function of axo-axonic inhibition. Xenes, D. et al. NeuVue: a framework and workflows for high-throughput electron microscopy connectomics proofreading. Crouse, D. F. On implementing 2D rectangular assignment algorithms. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Unser, M. Splines: a perfect fit for signal and image processing. Meijering, E. et al. Design and validation of a tool for neurite tracing and analysis in fluorescence microscopy images. Lowekamp, B. C., Chen, D. T., Ibáñez, L. & Blezek, D. The design of SimpleITK. Silversmith, W. et al. seung-lab/cloud-volume: Zenodo Release v1 version 5.3.2. The authors thank D. Markowitz, the IARPA MICrONS Program Manager, who coordinated this work during all three phases of the MICrONS programme; IARPA programme managers J. Vogelstein and D. Markowitz for co-developing the MICrONS programme; J. Wang, IARPA SETA for her assistance; J. Philips, S. Coulter and the Program Management team at the AIBS for their guidance for project strategy and operations; H. Zeng, E. Lein, C. Koch and A. Jones for their support and leadership; the Manufacturing and Processing Engineering team at the AIBS for their help in implementing the EM imaging and sectioning pipeline; B. Youngstrom, S. 
Kendrick and the Allen Institute IT team for support with infrastructure, data management and data transfer; the facilities, finance and legal teams at the AIBS for their support on the MICrONS contract; S. Saalfeld, K. Khairy and E. Trautman for help with the parameters for 2D stitching and rough alignment of the dataset; Z. Hanson and J. Singh for their contribution to manual matching of functional ROIs to EM nuclei; D. Kim for his contribution to pupil tracking; R. Raju for his contribution to parametric stimuli development; A. Mok and D. Ouzounov for their contribution to three-photon imaging development; G. McGrath for computer system administration; M. Husseini, L. Jackel and J. Jackel for project administration at Princeton University; S. Hider, T. Gion, D. Pryor, D. Kleissas, L. Rodriguez, M. Wilt and the team from the Johns Hopkins University Applied Physics Laboratory (APL), as well as Marysol Encarnación and Martha Cervantes from the CIRCUIT Program at APL for supporting data assessments on the neural circuit reconstruction and data infrastructure through the Brain Observatory Storage Service and Database (BossDB; https://bossdb.org/; NIH/NIMH R24 MH114785); F. Chance, B. Aimone and everyone at Sandia National Laboratories for their support and assistance; the ‘Connectomics at Google' team for developing Neuroglancer and computational resource donations, in particular J. Maitin-Shepard for authoring Neuroglancer and help creating the reformatted sharded multi-resolution meshes and imagery files used to display the data; Amazon, the AWS Open Data Program, and the AWS Open Science platform for providing data and computational resources; and Intel for their assistance. The work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DoI/IBC) contract numbers D16PC00003, D16PC00004, D16PC0005 and 2017-17032700004.
The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. also acknowledges support from NIH/NINDS U19 NS104648, NIH/NEI R01 EY027036, NIH/NIMH U01 MH114824, NIH/NIMH U01 MH117072, NIH/NINDS R01 NS104926, NIH/NIMH RF1 MH117815, NIH/NIMH RF1 MH123400 and the Mathers Foundation, as well as assistance from Google, Amazon and Intel. acknowledges support from NSF CAREER grant IOS-1552868. and A.T. acknowledge support from NSF NeuroNex grant 1707400. A.T. acknowledges support from the National Institute of Mental Health and the National Institute of Neurological Disorders and Stroke under award number U19MH114830. We thank the Allen Institute for Brain Science founder, Paul G. Allen, for his vision, encouragement and support. Disclaimer: the views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the US Government. J. Alexander Bae, Manuel A. Castro, Sven Dorkenwald, Jay Gager, Akhilesh Halageri, James Hebditch, Zhen Jia, Chris Jordan, Nico Kemnitz, Selden Koolman, Kai Kuehner, Kisuk Lee, Ran Lu, Thomas Macrina, Eric Mitchell, Shanka Subhra Mondal, Merlin Moore, Shang Mu, Barak Nehoran, Oluwaseun Ogedengbe, Sergiy Popovych, H. Sebastian Seung, Ben Silverman, William Silversmith, Amy Sterling, Nicholas L. Turner, Adrian Wanner, Sarah Williams, Kyle Willie, Ryan Willie, William Wong, Jingpeng Wu, Runzhe Yang & Szi-chieh Yu Mahaly Baptiste, Maya R. Baptiste, Victoria Brooks, Brendan Celii, Erick Cobos, Paul G. Fahey, Emmanouil Froudarakis, Sarah McReynolds, Elanine Miranda, Taliah Muhammad, Christos Papadopoulos, Stelios Papadopoulos, Saumil Patel, Guadalupe Jovita Yasmin Perez Vega, Xaq Pitkow, Anthony Ramos, Jacob Reimer, Zachary M. Sauter, Fabian H. Sinz, Cameron L. Smith, Zheng H. Tan, Andreas S. Tolias, Edgar Y.
Walker, Dimitri Yatsenko & Fei Ye Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA Mahaly Baptiste, Maya R. Baptiste, Victoria Brooks, Brendan Celii, Erick Cobos, Paul G. Fahey, Emmanouil Froudarakis, Sarah McReynolds, Elanine Miranda, Taliah Muhammad, Christos Papadopoulos, Stelios Papadopoulos, Saumil Patel, Guadalupe Jovita Yasmin Perez Vega, Xaq Pitkow, Anthony Ramos, Jacob Reimer, Zachary M. Sauter, Fabian H. Sinz, Cameron L. Smith, Zheng H. Tan, Andreas S. Tolias, Edgar Y. Walker, Dimitri Yatsenko & Fei Ye Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA Caitlyn A. Bishop, William Gray-Roncal, Justin Ellis-Joyce, Lindsey M. Kitchell, Jordan Matelsky, Patricia K. Rivlin, Victoria Rose, Brock A. Wester & Daniel Xenes Allen Institute for Brain Science, Seattle, WA, USA Agnes L. Bodor, Derrick Brittain, JoAnn Buchanan, Daniel J. Bumbarger, Forrest Collman, Nuno Maçarico da Costa, Bethanny Danskin, Leila Elabbady, Tim Fliss, Clare Gamlin, Emily Joyce, Daniel Kapner, Sam Kinn, Gayathri Mahalingam, Erika Neace, R. Clay Reid, Casey M. Schneider-Mizell, Rachael Swanstrom, Shelby Suckow, Marc Takeno, Russel Torres, Grace Williams, Wenjing Yin, Rob Young & Chi Zhang Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA Brendan Celii, Xaq Pitkow & Andreas S. Tolias Sven Dorkenwald, Zhen Jia, Kai Li, Thomas Macrina, Barak Nehoran, Sergiy Popovych, H. Sebastian Seung, Nicholas L. 
Turner & Runzhe Yang Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA Department of Computer Science, Rice University, Houston, TX, USA NSF AI Institute of Artificial and Natural Intelligence, New York, NY, USA Department of Ophthalmology, Byers Eye Institute, Stanford Bio-X, Wu Tsai Neurosciences Institute, Human-Centered Artificial Intelligence Institute, Department of Electrical Engineering, Stanford University, Stanford, CA, USA School of Applied and Engineering Physics, Cornell University, Ithaca, NY, USA Writing, review and editing: J.A.B., M.A.C., A.H., Z.J., C.J., N.K., K. Lee, K. Li, R.L., E. Mitchell, S. Mu, S.S.M., B.N., O.O., S.P., W.S., N.L.T., R.W., W.W., J.W., R.Y., A.B., D.B., J.B., M.T., R.T., G.M., D.B., W.Y., L.E., D.K., T.F., E.F., S.P., C.X., T.W., E.C., C.L.S., A.R., T.M., P.K.R., J.J., D.X., C.A.B. Correspondence to Forrest Collman, Nuno Maçarico da Costa, Xaq Pitkow, R. Clay Reid, Jacob Reimer, H. Sebastian Seung or Andreas S. Tolias. S. Seung and T. Macrina disclose a competing interest in ZettaAI; J. Reimer and A. S. Tolias disclose a competing interest in Vathes. The other authors declare no competing interests. Nature thanks Costas Anastassiou, Aravinthan Samuel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Location and distribution of test subvolumes (x = 5.5 µm, y = 5.5 µm, z = 5.5 µm) throughout the whole subvolume 65 that were used for validation of automated synaptic contact segmentation. 
Identification and annotation of synaptic contacts (n = 8,611 synapses) were performed manually within each subvolume and compared with automated results to calculate subvolume and combined precision (96%), recall (89%) and F1 scores (92%), with test subvolume F1 scores visualized by color within each plot. Distribution of neurons with complete dendritic proofreading highlighted in blue, and neurons with clean axons in orange. Cells with complete dendritic proofreading have often had some axon edits as well, so this is an upper bound on the number of edits required to fully extend dendrites. Most cells have had very little proofreading and have been processed mostly by automated methods. Note that the plot is on a log-log scale. For all the clean axons for which we have cell type annotations, the number of edits versus the number of outputs is plotted on a log-log scale. Data points are colored by broad cell class. Generally, more extensively reconstructed axons have more edits, but there are also strong cell-specific and cell-type-specific effects. Note that this shows the number of neurons per ID, which means that non-neuron somas are not counted. This figure was derived using the soma classification table nucleus_ref_neuron_svm. Note that a small number of multi-soma IDs were skipped during APL proofreading because they contained low-quality neurons merged to myelinated axons or were falsely classified as neuronal (e.g. blood vessels); (B) Spatial distribution of multi-neuron ID soma centers (soma locations of merged cells containing ≥ 2 neuronal nuclei) before and after APL proofreading. Both are lateral views of the volume that show the distribution across layers, from pia (top) to white matter (bottom). a) The number of matched neuronal EM nuclei by session/scan. b) Schematic of the residual and separation score metric. c) 2D histogram of separation score and residual. d) Schematic of in vivo signal correlation analysis (see Methods).
f) Scatter plot of signal correlations for all matched units (y-axis) vs the signal correlations for the nearest unit controls (x-axis), colored by oracle score. Note that each matched unit pair has two data points on the plot, one for each of the two control correlations. g) Same as in f), restricted to matched units with oracle score >0.2. a) Precision-recall curves showing performance relative to manual matches (used as the ground truth) across residual (left) and separation percentiles (right) for fiducial-based, vessel-based, and fiducial-vessel agreement automatch methods. b) Heatmaps of max residual percentile and min separation percentile colored by precision relative to manual matches, for the fiducial-based (left), vessel-based (middle) and fiducial-vessel agreement (right) automatches. Max residual percentile represents the threshold below which matches were included, while min separation percentile represents the threshold above which matches were included. c) Heatmaps of max residual percentile and min separation percentile colored by the number of neurons remaining after thresholds were applied, for the fiducial-based (left), vessel-based (middle) and fiducial-vessel agreement (right) automatches. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Functional connectomics spanning multiple areas of mouse visual cortex.
Retroelements have a critical role in shaping eukaryotic genomes. For instance, site-specific non-long terminal repeat retrotransposons have spread widely through preferential integration into repetitive genomic sequences, such as microsatellite regions and ribosomal DNA genes1,2,3,4,5,6. Despite the widespread occurrence of these systems, their targeting constraints remain unclear. Here we use a computational pipeline to discover multiple new site-specific retrotransposon families, profile members both biochemically and in mammalian cells, find previously undescribed insertion preferences and chart potential evolutionary paths for retrotransposon retargeting. We identify R2Tg, an R2 retrotransposon from the zebra finch, Taeniopygia guttata, as an orthologue that can be retargeted by payload engineering for target cleavage, reverse transcription and scarless insertion of heterologous payloads at new genomic sites. We enhance this activity by fusing R2Tg to CRISPR–Cas9 nickases for efficient insertion at new genomic sites. Through further screening of R2 orthologues, we select an orthologue, R2Tocc, with natural reprogrammability and minimal insertion at its natural 28S site, to engineer SpCas9H840A–R2Tocc, a system we name site-specific target-primed insertion through targeted CRISPR homing of retroelements (STITCHR). STITCHR enables the scarless, efficient installation of edits, ranging from a single base to 12.7 kilobases, gene replacement and use of in vitro transcribed or synthetic RNA templates.
Inspired by the prevalence of nLTR retrotransposons across eukaryotic genomes, we anticipate that STITCHR will serve as a platform for scarless programmable integration in dividing and non-dividing cells, with both research and therapeutic applications. High-throughput sequencing data have been deposited in the NCBI Sequencing Read Archive database under accession PRJNA1223444. Expression plasmids are available from Addgene under the UBMTA; support information and computational tools are available at https://www.abugootlab.org/. All other data are available from the corresponding authors upon reasonable request. Goodier, J. L. & Kazazian, H. H. Jr Retrotransposons revisited: the restraint and rehabilitation of parasites. & Fujiwara, H. The wide distribution and change of target specificity of R2 non-LTR retrotransposons in animals. Kojima, K. K. & Fujiwara, H. Long-term inheritance of the 28S rDNA-specific retrotransposon R2. Fujiwara, H. et al. Introns and their flanking sequences of Bombyx mori rDNA. Roiha, H., Miller, J. R., Woods, L. C. & Glover, D. M. Arrangements and rearrangements of sequences flanking the two types of rDNA insertion in D. melanogaster. Kojima, K. K. & Fujiwara, H. Evolution of target specificity in R1 clade non-LTR retrotransposons. Burke, W. D., Malik, H. S., Lathe, W. C. III & Eickbush, T. H. Are retrotransposons long-term hitchhikers? Malik, H. S., Burke, W. D. & Eickbush, T. H. The age and evolution of non-LTR retrotransposable elements. Eickbush, T. H. in Mobile DNA II (eds Craig, N. L. et al.) 813–835 (ASM, 2002). Fujiwara, H. in Mobile DNA III (eds Chandler, M. et al.) 1147–1163 (ASM, 2015). Christensen, S. M. & Eickbush, T. H.
We thank P. Reginato, D. Weston and E. Boyden for support with MiSeq instrumentation; K. Holden for Synthego sgRNAs; S. Levine and the MIT BioMicro Center for Pacific Biosciences sequencing library preparation and sequencing; PhoenixBio for providing primary human hepatocytes (PXB cells); X. D. Chen for retrotransposon analysis; N. Willis and S. Khoramian Tusi for Southern blot advice and protocols; R. Desimone and J. Crittenden for support and discussions; and members of the Abudayyeh–Gootenberg lab for support and advice. is supported by a grant from the Simons Foundation International to the Simons Center for the Social Brain at MIT. is supported by a Swiss National Science Foundation postdoc mobility fellowship. H.N. is supported by JSPS KAKENHI grant 21H05281, the Takeda Medical Research Foundation and the Inamori Research Institute for Science. M.H. is supported by JSPS KAKENHI grant 23K14133, the Takeda Medical Research Foundation and JST, ACT-X grant JPMJAX232F. are supported by NIH grants 1R21-AI149694, R01-EB031957, R01-AG074932 and R56-HG011857; the McGovern Institute Neurotechnology (MINT) program; the K. Lisa Yang and Hock E. Tan Center for Molecular Therapeutics in Neuroscience; the G. Harold & Leila Y. Mathers Charitable Foundation; the NHGRI Technology Development Coordinating Center Opportunity Fund; the MIT John W. Jarve (1978) Seed Fund for Science Innovation; Impetus Grants; a Cystic Fibrosis Foundation pioneer grant; Google Ventures; FastGrants; the Harvey Family Foundation; Winston Fu; and the McGovern Institute. These authors contributed equally: Christopher W.
Fell, Lukas Villiger, Justin Lim, Masahiro Hiraizumi. These authors jointly supervised this work: Omar O. Abudayyeh, Jonathan S. Gootenberg.
Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA: Christopher W. Fell, Dario Tagliaferri, Kaiyi Jiang, Alisan Kayabolen, Cian Schmitt-Ulms, Harsh Ramani, Omar O. Abudayyeh & Jonathan S. Gootenberg
Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA: Christopher W. Fell, Dario Tagliaferri, Kaiyi Jiang, Alisan Kayabolen, Cian Schmitt-Ulms, Harsh Ramani, Omar O. Abudayyeh & Jonathan S. Gootenberg
Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA: Christopher W. Fell, Dario Tagliaferri, Kaiyi Jiang, Alisan Kayabolen, Cian Schmitt-Ulms, Harsh Ramani, Omar O. Abudayyeh & Jonathan S. Gootenberg
McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA, USA: Christopher W. Fell, Lukas Villiger, Justin Lim, Dario Tagliaferri, Matthew T. N. Yarnall, Anderson Lee, Kaiyi Jiang, Alisan Kayabolen, Rohan N. Krajeski, Cian Schmitt-Ulms, Sarah M. Yousef, Omar O. Abudayyeh & Jonathan S. Gootenberg
Integrated DNA Technologies, Coralville, Iowa, USA
Structural Biology Division Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
Inamori Research Institute for Science, Kyoto, Japan
conceived the study and participated in the design, execution and analysis of experiments. developed computational pipelines for retrotransposon discovery. performed computational analysis of sequencing experiments. participated in the analysis of biochemical experiments. wrote the manuscript with help from all authors. was decided by a coin toss. Correspondence to Omar O. Abudayyeh or Jonathan S. Gootenberg. are co-founders of Terrain Biosciences, Doppler Bio and Transit Therapeutics. All other authors declare no competing interests. Nature thanks Todd Macfarlan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. a) Schematic of the computational pipeline used to discover and classify site-specific nLTR retrotransposon systems. b) Size distribution of the ORFs, measured from the first methionine, for each of the 5 families of RLE-containing nLTR retrotransposons. c) Distribution of distances from candidate retrotransposons to detected Rfam annotations or tandem repeat targets for each of the 5 families of RLE-containing nLTR retrotransposons. d) Distribution of the predicted 5′ and 3′ UTR sizes for all RLE-containing nLTR retrotransposons. UTR sizes are predicted from the distance between the ORF and the nearest predicted target site. Box plots show the median, 25th percentile and 75th percentile, with whiskers extending up to 1.5× the interquartile range. All outliers are shown as individual points. n(5′ UTR) = 10,033; n(3′ UTR) = 7,642. e) Distribution of the lengths of observed non-coding conservation regions flanking the 5′ and 3′ ends of the retrotransposon ORF. Box plots show the median, 25th percentile and 75th percentile, with whiskers extending up to 1.5× the interquartile range. All outliers are shown as individual points. n(5′ UTR) = 3,307; n(3′ UTR) = 6,472. f) Schematic of typical nLTR retrotransposon insertion sites, with target sites consistent on both sides of the retrotransposon. a) DNA sequence alignments of nLTR families with divergent target preferences in the non-coding areas surrounding the nLTR ORFs. Identified Rfam annotations in the surrounding locus are highlighted. b) Multiple sequence alignment of different nLTR retrotransposons using MUSCLE, with a schematic of the Pfam domains, as determined by HHpred, shown above. c) Sequence identity analysis of chosen nLTR retrotransposon family members using the MUSCLE protein alignment from Extended Data Fig. b) Schematic of payload homology and target sites used to evaluate nLTR1Mbr insertion.
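The box-plot convention stated for panels d and e (median, 25th and 75th percentiles, whiskers bounded by 1.5× the interquartile range, outliers drawn as points) can be sketched in Python as below. This is a minimal illustration, not the authors' plotting code: the linear quartile interpolation (`method="inclusive"`) and the Tukey-style rule that whiskers stop at the most extreme data point inside the 1.5× IQR fences are assumptions, and the example UTR lengths are invented.

```python
import statistics


def box_plot_stats(values):
    """Median, quartiles, 1.5x IQR whiskers and outliers for a Tukey-style box plot.

    Assumption: quartiles use linear interpolation (method="inclusive");
    the plotting software behind the figure may interpolate differently.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Whiskers end at the most extreme observations that lie within the fences.
    whiskers = (min(inside), max(inside))
    # Everything beyond the fences is drawn as an individual point.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": whiskers, "outliers": outliers}


# Invented UTR lengths (nt) for one retrotransposon family.
print(box_plot_stats([120, 150, 160, 180, 200, 210, 230, 900]))
```

In this toy example the 900 nt value falls outside the upper fence, so it is reported as an outlier rather than stretching the whisker.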
c) Gluc payload insertion by nLTR1Mbr into a panel of luciferase reporters, as quantified by luciferase production, with R2Tg targeting the R2 28S sequence as a control. Reporters with similarity either to the R2 28S region or to the 28S homology region in the nLTR1Mbr locus are used to evaluate alternative insertion sites. d) Phylogenetic tree of nLTR retrotransposons zoomed in on the R2Tocc system and surrounding orthologs. Tree branches corresponding to avian genomes are highlighted in blue, and orthologs used in this study are labeled. e) Heatmap of the 28S luciferase reporter assay testing integration by R2Bm, R2Tg, R2Mes and R2TgRTmut (x-axis) using RNA payloads containing UTRs from different retrotransposon ortholog systems (y-axis). Synthetic eblocks containing edited and unedited DNA sequences were mixed at defined ratios (x-axis) and measured by NGS (y-axis). Agreement between the known editing percentage (x-axis) and the measured editing percentage was calculated by linear regression and is shown as an inset. The schematic above shows the positions of the three NGS primers relative to the inserted sequence: one forward primer in the upstream genomic sequence, one reverse primer in the insert and one reverse primer in the downstream genomic sequence. g) Time course of biochemical TPRT by R2Tg into 28S DNA with or without RNA payloads at different incubation times, as indicated. h) NGS insertion quantification of TPRT shown in Extended Data Fig. i) Electrophoretic mobility shift assay (EMSA) gel showing the shift of the 28S DNA target due to binding of the R2Tg protein alone or the R2Tg–RNA complex. The bottom strand is 5′ labeled. j) Biochemical insertion of the Gluc sequence into the 28S target with a payload containing only homology arms to the 28S locus and no UTRs, with +/– payload RNA, +/– 28S DNA, +/– R2 protein, +/– Mg²⁺ and +/– dNTPs, as indicated. Above, NGS quantification of insertion efficiency and a schematic of the RNA payloads used.
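The eblock validation described above compares known and measured editing percentages by linear regression. A minimal ordinary-least-squares sketch follows; the function name and the habit of reading slope and intercept alongside R² are illustrative choices, not a description of the authors' analysis, and the example values are invented.

```python
def fit_standard_curve(known, measured):
    """Least-squares fit of measured vs. known editing percentages.

    Returns (slope, intercept, r_squared). For a well-calibrated assay one
    expects a slope near 1, an intercept near 0 and r_squared near 1.
    """
    if len(known) != len(measured) or len(known) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(known)
    mean_x = sum(known) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in known)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(known, measured))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented standards: 0, 10, 20 and 30 % edited DNA and their NGS readouts.
print(fit_standard_curve([0, 10, 20, 30], [1, 9, 21, 29]))
```

R² alone only reports how tightly the points follow a line; checking that the slope is close to 1 additionally shows that the assay neither over- nor under-reports editing.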
Insertion frequency is quantified by NGS. l) Biochemical TPRT by R2Tg into 28S DNA using RNA payloads with or without 5′ cap and/or 3′ poly-A tail modifications, as well as a no-RNA-payload control. m) NGS insertion quantification of TPRT shown in Extended Data Fig. n) Primer extension assay with WT R2Tg, RLE-inactivated R2TgD1275A or no protein, in which the 28S RNA payload and a complementary primer were hybridized and extended by the reverse transcription activity of the R2Tg protein. Error bars represent mean ± s.e.m. (c, e). a) Gluc payload insertion by R2Tg reverse transcriptase domain deletions, RLE inactivation mutants (D1275A) and reverse transcriptase mutations (R2TgF876A/A877L/D878A/D879A/L880A/V881A/L882A, RTmut) at the 28S locus luciferase reporter target, as quantified by luciferase activity. Luciferase activity was assayed in HEK293FT cells. b) Gluc payload insertion by R2Tg RT domain mutations, including R2TgF876A/A877L/D878A/D879A/L880A/V881A/L882A (RTmut), R2TgD878R/D879R and R2TgD878H/D879H, and the RLE inactivation mutant (D1275A) at the 28S locus luciferase reporter, as quantified by luciferase. Luciferase activity was assayed in HEK293FT cells. c) Uncropped versions of the gels shown in Fig. 2g. Above, RNA payload insertion into a 28S plasmid reporter by wild-type R2Tg, RLE-inactivated, RT-inactivated, and complemented RT- and RLE-inactivated proteins, +/– RNA payload, as indicated. NT, non-targeting RNA templates that have homology to the NOLC1 target instead of the 28S locus. Below, R2Tg insertion into the human endogenous 28S locus with payloads containing 100, 50, 30 or 0 nt of homology to the 28S target site. d) Luciferase assay of Gluc insertion of an IVT RNA payload with variable 3′ tail length into a 28S reporter target by WT R2Tg and RLE-inactivated R2TgD1275A. Luciferase activity was assayed in Huh-7 cells. e) Sanger sequencing of 5′ and 3′ insertion junctions at the 28S target for additional selected payload designs after R2Tg integration.
Payload numbers correspond to those in Fig. f) Example indels at the 5′ junction for R2Tg insertion at the 28S target for selected payloads. Non-templated Cs from reverse transcription in the bottom strand (G in the top strand) are highlighted with red boxes. g) Example indels at the WT 28S locus target for selected payloads. Non-templated Cs from reverse transcription in the bottom strand (G in the top strand) are highlighted with red boxes. Error bars represent mean ± s.e.m. (a, b, d). a) Gaussia luciferase exon 2 (Gluc) payload insertion by wild-type and domain-inactivated mutants of R2Tg into a 28S plasmid reporter, with editing outcomes profiled by NGS at the upstream (left) junction. Proteins tested are WT R2Tg and R2TgD1275A (RLE mutant), and outcomes are classified as perfect insertions, insertions with indels or WT locus indels. b) Schematic of an additional payload variant with internal homology arms against the 28S target. c) Gaussia luciferase exon 2 (Gluc) payload insertion by wild-type R2Tg into a 28S plasmid reporter with the payload variants shown in (b), with editing outcomes profiled by NGS at the upstream (left) junction. Outcomes are classified as perfect insertions, insertions with indels or WT locus indels. d) Size analysis by gel electrophoresis of 5′ and 3′ insertion junctions at the 28S target reporter for payload designs from (b) and (c) after R2Tg integration. Payload numbers correspond to those in (b). e) Gluc exon 2 payload insertion by WT R2Tg, R2TgD1275A or the RT domain deletion R2TgΔ(875-885) into a 28S plasmid reporter with payloads containing 28S- or AAVS1-targeting homology arms, profiled by NGS. Statistics were calculated using an unpaired t-test. f) Biochemical retrotransposition of different RNA payloads into the AAVS1 DNA target with the R2Tg protein, dNTPs and MgCl2, as indicated.
Either no payload was used or one of the following two payloads was used: 1) a payload with a 5′ UTR targeting AAVS1 and containing a Gluc insert, or 2) a payload with 5′ and 3′ UTRs targeting NOLC1 and containing an EGFP insert. Labels on the gel indicate the specific TPRT product, DNA target band and R2Tg-produced nicked fragments. g) Validation of the AAVS1 NGS method. Synthetic eblocks containing edited and unedited DNA sequences were mixed at defined ratios (x-axis) and measured by NGS (y-axis). Agreement between the known editing percentage (x-axis) and the measured editing percentage was calculated by linear regression and is shown as an inset. h) Validation of the NOLC1 3-primer NGS assay using mixes of genomic DNA from unedited cells and from cells with a heterozygous insertion at NOLC1 (as measured by ddPCR). Shown is the known pre-mixed ratio of edited and unedited gDNA (x-axis) versus the editing rate measured by NGS (y-axis). Inset, coefficient of determination between the values on the x- and y-axes. i) Schematic of payload engineering for R2Tg reprogramming to the NOLC1 locus. j) EGFP payload insertion at the human endogenous NOLC1 locus by natural reprogrammed wild-type R2Tg as well as R2TgD1275A and R2TgRTmut. k) Payload insertion by SpCas9H840A-R2TgΔ1-183 or SpCas9H840A-R2TgΔ1-183,D1275A into the endogenous NOLC1 locus, mediated by dual guides or non-targeting guides and quantified by ddPCR. Inset shows the payload design and locus schematic, with homology arms colored, the top guide in red and the bottom guide in blue. l) Secondary structure analysis of the 5′ UTR of R2Tg, including the full-length UTR, the 15 nt truncated variant, and the 15 nt truncated variant with the 50 nt 28S homology sequence upstream. m) Validation of the 3-primer NGS assay for analysis of AAVS1 integration via the left insertion junction. Standards consist of edited and WT amplicons mixed in the listed ratios (x-axis), and the measured editing is determined by the 3-primer NGS assay (y-axis).
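Panel h builds standards by diluting gDNA from cells carrying a heterozygous insertion into unedited gDNA. The sketch below shows the arithmetic that links the fraction of clone gDNA to the expected percentage of edited alleles. It assumes equal genome copies per ng in both samples, one edited allele in the clone and, by default, a diploid locus; HEK293-derived lines are not diploid at every locus, so `ploidy` is left as a parameter. The legends do not say whether the x-axis is expressed per allele or per genome, so treat this as illustrative only; both function names are hypothetical.

```python
def expected_edited_allele_percent(clone_fraction, edited_alleles=1, ploidy=2):
    """Expected % of edited alleles after mixing clone gDNA into WT gDNA.

    Assumes both gDNA samples contribute the same number of genome copies per ng.
    """
    if not 0.0 <= clone_fraction <= 1.0:
        raise ValueError("clone_fraction must be between 0 and 1")
    return 100.0 * clone_fraction * edited_alleles / ploidy


def clone_fraction_for(target_percent, edited_alleles=1, ploidy=2):
    """Fraction of clone gDNA needed to reach a target % of edited alleles."""
    fraction = target_percent / 100.0 * ploidy / edited_alleles
    if fraction > 1.0:
        raise ValueError("target is not reachable with this clone")
    return fraction


# A 1% edited-allele standard from a heterozygous, diploid clone
# requires 2% of the mixed gDNA to come from the clone.
print(clone_fraction_for(1.0))
```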
n) Gluc integration at the endogenous AAVS1 locus via the SpCas9H840A-R2TgΔ1-183 fusion using payloads with the full-length or 15 nt truncated 5′ UTR, an upstream 28S 50 nt sequence and internal AAVS1 homology arms. Integration is quantified by next-generation sequencing (left) and ddPCR (right). a) Biochemical retrotransposition of an RNA payload into the NOLC1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs, SpCas9/guides or MgCl2, as indicated. Above, NGS quantification of insertion efficiency and a schematic of the RNA payload used. The gel is stained with SYBR Gold for visualization of nucleic acid. b) Biochemical retrotransposition of an RNA payload into the NOLC1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs, SpCas9/guides or MgCl2, as indicated. The DNA top strand is Cy5 labeled (red) and the bottom strand is FAM labeled (green), allowing for visualization by fluorescence. Labels on the gel indicate the specific TPRT product, DNA target band and R2Tg-produced nicked fragments. c) Reprogrammed biochemical retrotransposition by R2Tg into the NOLC1 DNA target, using a homologous IVT NOLC1 payload (N) with +/– 5′ cap and 3′ tail modifications, compared with EMX1-homologous (E) or 28S-homologous (28S) payloads (i.e. non-homologous to NOLC1). Integration is quantified by NGS. d) Reprogrammed biochemical retrotransposition of an IVT RNA payload containing the optimized 5′ and 3′ UTR and homology regions into the AAVS1 DNA target by R2Tg, +/– DNA target, +/– RNA, +/– Cas9-assisted nicking and +/– R2Tg, as indicated. The blue arrow denotes the cleaved DNA band generated by R2Tg protein alone, reprogrammed by its payload RNA. f) Integration efficiencies, quantified by NGS, of reprogrammed biochemical TPRT of an RNA payload by R2Tg into varying amounts of NOLC1 DNA target, compared with no-RNA controls.
g) Integration efficiencies, quantified by NGS, of reprogrammed biochemical TPRT by R2Tg using NOLC1 RNA payloads incorporating either different single-base mismatches or insertions relative to the NOLC1 DNA, as indicated. Either in vitro transcribed mRNA or synthetic RNA templates were used as the payloads. h) Biochemical retrotransposition of an RNA payload into the AAVS1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs or MgCl2, as indicated. Above, NGS quantification of insertion efficiency and a schematic of the RNA payload used. Labels on the gel indicate the specific TPRT product, DNA target band and R2Tg-produced nicked fragments. i) Schematic of DNA cleavage end detection using ligation and NGS. Ligation adaptor primers (black) are used in combination with anchored primers on either the left (red) or right (blue) end to read out the variable R2Tg cleavage sites. j) Cleavage end detection by next-generation sequencing of the R2Tg-generated nicks on the AAVS1 target from Extended Data Fig. 6h in the condition without dNTPs. k) Cleavage end detection by next-generation sequencing of the R2Tg-generated nicks on the AAVS1 target from Extended Data Fig. l) Biochemical retrotransposition of an RNA payload into the NOLC1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs or MgCl2, as indicated. Labels on the gel indicate the specific TPRT product, DNA target band and R2Tg-produced nicked fragments. m) Cleavage end detection by next-generation sequencing of the R2Tg-generated nicks on the NOLC1 target from Extended Data Fig. 6l in the condition without dNTPs but with RNA template. Below each plot is a schematic of the NOLC1 target (black) and the homology arms of the payload template (beige and gray). n) Cleavage end detection by next-generation sequencing of the R2Tg-generated nicks on the NOLC1 target from Extended Data Fig. 6l in the condition without RNA template.
Below each plot is a schematic of the NOLC1 target (black) and the homology arms of the payload template (beige and gray). a) Schematic of SpCas9H840A fused to N- and C-terminal truncations of R2Tg at different amino acid positions. Not all tested constructs are shown. b) Gluc payload insertion by different SpCas9H840A-R2TgΔ1-183 fusions, according to the schematic in (a), into the endogenous AAVS1 locus, quantified by NGS. N-term and C-term denote N-terminal or C-terminal fusions of the full-length R2Tg protein. Denoted residue positions indicate the starting amino acid position of N-terminal R2Tg truncations that are fused to the C terminus of SpCas9H840A. c) Gluc integration at the endogenous AAVS1 target by SpCas9H840A-R2TgΔ1-183, SpCas9H840A-R2TgΔ1-183,F876A/A877L/D878A/D879A/L880A/V881A/L882A (RTmut), SpCas9H840A-R2TgΔ1-183,Δ(875-885) and SpCas9H840A alone. Editing rates were quantified by NGS (left) and ddPCR (right). d) TPRT activity in HEK293FT cells with SpCas9H840A alone or fused to R2Tg, R2TgΔ1-183,F876A/A877L/D878A/D879A/L880A/V881A/L882A (RTmut) or R2TgΔ1-183,Δ875-885, into the NOLC1 genomic target with dual guides. The EGFP payload contains the full 5′ and 3′ UTRs of R2Tg. e) Gluc payload insertion into a 28S plasmid reporter in HEK293FT cells by selected nLTR retrotransposons fused to SpCas9H840A, with either targeting or non-targeting guides, quantified by Gluc production normalized to a control Cluc. f) Gluc payload insertion into the endogenous AAVS1 locus in HEK293FT cells by selected nLTR retrotransposons fused to SpCas9H840A, with either targeting or non-targeting guides, profiled by next-generation sequencing. Editing outcomes are quantified as perfect insertions, insertions with indels and indels at the unmodified WT target site.
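Several reporter panels express Gluc signal normalized to a co-transfected Cluc control. A minimal sketch of that normalization follows, assuming paired Gluc and Cluc readings per well and summarizing the result as fold change over control wells (for example, non-targeting guides). The fold-change summary, the function names and the example numbers are assumptions for illustration, not the authors' exact analysis.

```python
from statistics import mean


def normalized_gluc(gluc, cluc):
    """Per-well Gluc/Cluc ratios; Cluc corrects for transfection efficiency and cell number."""
    if len(gluc) != len(cluc):
        raise ValueError("gluc and cluc readings must be paired per well")
    return [g / c for g, c in zip(gluc, cluc)]


def fold_over_control(test_gluc, test_cluc, ctrl_gluc, ctrl_cluc):
    """Mean normalized Gluc of test wells divided by that of control wells."""
    return mean(normalized_gluc(test_gluc, test_cluc)) / mean(
        normalized_gluc(ctrl_gluc, ctrl_cluc)
    )


# Invented luminescence readings (arbitrary units), three wells per condition.
targeting = fold_over_control([5200, 6100, 5800], [1000, 1100, 1050],
                              [410, 390, 420], [980, 1020, 1000])
print(f"{targeting:.1f}-fold over non-targeting")
```

Normalizing each well before averaging, rather than dividing the mean Gluc by the mean Cluc, keeps a single poorly transfected well from skewing the ratio.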
g) Gluc payload insertion into the endogenous AAVS1 locus in HEK293FT cells by selected nLTR retrotransposons fused to SpCas9H840A with an AAVS1-targeting or non-targeting sgRNA control, quantified by ddPCR. h) Validation of the AAVS1 3-primer NGS assay using mixes of genomic DNA from unedited cells and from cells with a heterozygous insertion at AAVS1 (as measured by ddPCR). Shown is the known pre-mixed ratio of edited and unedited gDNA (x-axis) versus the editing rate measured by NGS (y-axis). Inset, coefficient of determination between the values on the x- and y-axes. i) R2Tocc retrotransposition of a synthetic RNA payload into top- and bottom-strand labeled 28S DNA. The top strand is FAM labeled (red); the bottom strand is Cy5 labeled (green). Above, a reference sequence consisting of correct insertion of the Gluc payload into AAVS1 DNA. Below, a schematic of the different insertion outcomes found by sequencing, with the raw number of reads and the percentage of total reads to which these correspond. Error bars represent mean +/− (b,g) s.d. or (d, e, f, h) s.d. a) Expression of wild-type and mutant R2Tg orthologs (x-axis), quantified by luciferase signal. b) Schematic of STITCHR insertion using intron-containing templates in the following subpanels. An EGFP STITCHR payload containing an interrupting intron is expressed from a CAG promoter. After RNA splicing and TPRT, it is inserted into the genome as an uninterrupted EGFP ORF. Shown are the GFP cargo (green bar, approximately 500 bp), interrupting intron (USF1, 245 bp; Tetrahymena self-splicing intron, 399 bp), homology sequences (yellow bar, 50 bp each), poly-A tail, genomic sequence (grey bar), external F and R NGS primers (black) and internal reverse primer (blue). c) NGS evaluation of insertion at the AAVS1 (left) and EMX1 (right) loci after delivering a plasmid template containing GFP with an interrupting self-splicing Tetrahymena intron. Shown is the % insertion of GFP lacking the interrupting intron (i.e.
spliced insertion) by SpCas9H840A-R2ToccΔ1-169 or SpCas9H840A. d) ddPCR evaluation of AAVS1 insertion after delivering a plasmid template containing a USF1 intron that interrupts the payload at two locations. The ddPCR assay used detects spliced insertion only. e) Gluc reconstitution by correction of a 20 bp deletion after delivery of plasmid or synthetic RNA payloads, quantified by Gluc expression normalized to control Cluc. f-g) Gluc reconstitution by R2Tg mutants with synthetic RNA payloads extended off the guide RNA, as quantified by NGS (f) and by Gluc expression normalized to control Cluc (g). h) STITCHR 20 bp payload insertion on a luciferase reporter plasmid from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold (fused) or from a synthetic RNA delivered in trans containing the 5′ UTR and the correction sequence (trans). Editing is performed with or without a Cas9 nicking guide that allows for initiation of TPRT for the trans template. Integration is quantified by NGS and is represented as perfect insertions or insertions with indels. WT indels, defined as indels at the unintegrated Gluc locus, are also shown. i) STITCHR 22 bp payload insertion on an EGFP reporter plasmid from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold (fused) or from a synthetic RNA delivered in trans containing the 5′ UTR and the correction sequence (trans). Editing is performed with or without a Cas9 nicking guide that allows for initiation of TPRT for the trans template. Integration is quantified by NGS and is represented as perfect insertions or insertions with indels. WT indels, defined as indels at the unintegrated plasmid reporter, are also shown.
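The legends bin junction amplicons into perfect insertions, insertions with indels and WT indels. A simplified Python sketch of that binning follows. It uses exact string comparison against the expected amplicons and a short insert-specific probe, in place of the read alignment that a real pipeline (for example, a tool such as CRISPResso2) would perform. The function names, the probe approach and the toy sequences are all hypothetical.

```python
from collections import Counter


def classify_amplicon(read, edited_ref, wt_ref, insert_probe):
    """Assign one junction read to an outcome class (exact matching, no alignment)."""
    if read == edited_ref:
        return "perfect insertion"
    if insert_probe in read:
        # Insert-derived sequence is present, but the read differs from the expected product.
        return "insertion with indels"
    if read == wt_ref:
        return "unedited"
    return "WT indels"


def outcome_percentages(reads, edited_ref, wt_ref, insert_probe):
    """Percentage of reads falling into each outcome class."""
    counts = Counter(classify_amplicon(r, edited_ref, wt_ref, insert_probe) for r in reads)
    total = sum(counts.values())
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}


# Toy sequences: a short genomic target with a 14 nt insert between its halves.
WT = "AAAACCCCGGGG"
EDITED = "AAAACCCC" + "GATTACAGATTACA" + "GGGG"
reads = [EDITED, EDITED, "AAAACCCGATTACAGATTACAGGGG", WT, "AAAACCGGGG"]
print(outcome_percentages(reads, EDITED, WT, "GATTACA"))
```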
j) STITCHR 20 bp payload insertion on a luciferase reporter plasmid from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold (fused), a synthetic RNA delivered in trans containing the 5′ UTR and the correction sequence (trans with UTR), or a synthetic RNA delivered in trans containing the correction sequence without a UTR (trans without UTR). SpCas9H840A-R2TgΔ1-183 and SpCas9H840A-R2TgΔ1-183,RTmut are compared, and editing is performed +/− a Cas9 nicking guide that allows for initiation of TPRT for the trans template. Integration is quantified by NGS. k) STITCHR 38 bp payload insertion at the endogenous LMNB1 locus from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold. l) STITCHR 700 bp EGFP payload insertion in Huh-7 cells at the endogenous NOLC1 locus from an in vitro transcribed mRNA; insertion is quantified by ddPCR. m) Indels or substitutions found in the sequencing reads of the NOLC1 insertion experiment shown in Extended Data Fig. Above, a reference sequence consisting of correct insertion of the Gluc payload into AAVS1 DNA. Below, a schematic of the different insertion outcomes found by sequencing, with the raw number of reads and the percentage of total reads to which these correspond. n) Insertion of a GFP payload delivered as an IVT mRNA without UTRs into the human endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 in Huh-7 cells. Insertion is quantified by NGS and is represented as perfect insertions or insertions with indels. WT indels, defined as indels at the unintegrated NOLC1 locus, are also shown. o) Insertion of a GFP payload delivered as an IVT mRNA with UTRs and other variable modifications, as indicated, into the human endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 in HEK293FT cells. Insertion is quantified by NGS.
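Many panels quantify insertion by ddPCR. The standard ddPCR readout corrects the fraction of positive droplets with Poisson statistics, λ = −ln(1 − positives/total), because some droplets receive more than one template copy. The sketch below applies that correction and reports insert copies as a percentage of reference-locus copies. It assumes a duplexed well in which both assays share the same droplets, so that droplet volume cancels; this pairing of an insertion assay with a reference assay is an assumption, since the legends do not describe the assay design.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean target copies per droplet: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1 - positive / total)


def ddpcr_editing_percent(insert_positive, reference_positive, total):
    """Insert copies as a percentage of reference-locus copies in one duplexed well."""
    ref = copies_per_droplet(reference_positive, total)
    if ref == 0:
        raise ValueError("no reference-positive droplets")
    return 100.0 * copies_per_droplet(insert_positive, total) / ref


# Invented droplet counts for a single well.
print(round(ddpcr_editing_percent(120, 6000, 18000), 2))
```

The Poisson correction matters most for the reference assay, where a large fraction of droplets is positive; for rare insertion events λ is almost equal to the raw positive fraction.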
p) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, with combinations of single and dual guides, compared with a non-targeting guide control and quantified by NGS. q) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous LMNB1 locus, with combinations of single and dual guides, compared with a non-targeting guide control and SpCas9H840A alone. Editing was quantified by droplet digital PCR (ddPCR). r) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, with combinations of single and dual guides, compared with a non-targeting guide control and profiled by ddPCR. s) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous EMX1 locus, with combinations of single and dual guides, compared with a non-targeting guide control and SpCas9H840A alone. Editing was quantified by droplet digital PCR (ddPCR). t) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus, with combinations of single and dual guides, compared with SpCas9H840A alone, SpCas9H840A-R2ToccΔ1-169,RTmut and SpCas9. u) Comparison of ddPCR and NGS quantification of EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus, with combinations of single and dual guides. Error bars represent mean ± s.e.m. (a, f, g, h, i, l, n, o, p, q, s, t, u) or s.d. (c, d, e, j, k). Editing is quantified by ddPCR. c) GFP insertion by SpCas9H840A-R2ToccΔ1-169 (WT), SpCas9H840A-R2ToccΔ1-169,RLEmut and SpCas9H840A at the endogenous NOLC1 target site. d) EGFP payload insertion by SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, using payloads with 50 nt homology arms targeting NOLC1 or AAVS1, or without homology. Payloads are evaluated with single, dual or non-targeting guides and are compared with SpCas9H840A.
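These legends switch between s.d. and s.e.m. error bars. The s.e.m. is the sample s.d. divided by √n, so with three replicates (an assumed n; the legends here do not state it) s.e.m. bars are about 0.58× the width of s.d. bars. A minimal sketch using Python's `statistics` module; the function name is hypothetical.

```python
import statistics


def error_bar(replicates, kind="sem"):
    """Return (mean, half-width) for an error bar.

    kind="sd": sample standard deviation (n - 1 denominator).
    kind="sem": standard error of the mean, sd / sqrt(n).
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    m = statistics.mean(replicates)
    sd = statistics.stdev(replicates)
    if kind == "sd":
        return m, sd
    if kind == "sem":
        return m, sd / len(replicates) ** 0.5
    raise ValueError("kind must be 'sd' or 'sem'")


# Invented editing percentages from three replicate wells.
print(error_bar([12.1, 14.3, 13.0], "sd"))
print(error_bar([12.1, 14.3, 13.0], "sem"))
```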
f) GFP insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus in HepG2 cells, compared with SpCas9H840A. g) STITCHR EGFP insertion at endogenous EMX1, NOLC1 and two AAVS1 loci in Huh-7 cells by SpCas9H840A-R2ToccΔ1-169 compared with SpCas9H840A-R2ToccΔ1-169,RTmut. h) STITCHR EGFP insertion at endogenous EMX1 and NOLC1 loci in HepG2 cells by SpCas9H840A-R2ToccΔ1-169 compared with SpCas9H840A-R2ToccΔ1-169,RTmut. i) EGFP insertion at endogenous NOLC1 by STITCHR delivered to HEK293FT cells with different amounts of adenovirus. Shown is the insertion efficiency when the STITCHR machinery is delivered with one vector and the guides and template with the other, compared with delivery of guides and template only as a control. j) EGFP insertion by SpCas9H840A-R2ToccΔ1-169 at NOLC1 in quiescent primary human hepatocytes compared with the SpCas9H840A control. 1.4 × 10¹¹ viral copies were used in the dual-vector condition and half that amount in the single-vector, payload-only condition. k) EGFP payload insertion by STITCHR at the NOLC1 endogenous locus in HEK293FT cells, comparing editing efficiencies with and without PAM elimination. l) STITCHR EGFP insertion at the endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 and SpCas9D10A-R2ToccΔ1-169. m) PacBio sequencing of a 700 bp EGFP insertion at the endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169. Reads are aligned to the expected reference sequence of scarless NOLC1 insertion. Gray bases indicate a match to the reference sequence; red or black bases indicate mismatches. n) PacBio sequencing of a 280 bp Gluc payload insertion at the endogenous AAVS1 locus by SpCas9H840A-R2ToccΔ1-169. Reads are aligned to the corresponding expected reference sequence of scarless AAVS1 insertion. Gray bases indicate a match to the reference sequence; red or black bases indicate mismatches. o) Additional analysis of the PacBio long-read sequencing for complete, incomplete and concatemeric insertions at the respective sites.
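Panel o sorts PacBio reads into complete, incomplete and concatemeric insertions. One simple way to express such a classification is sketched below. It uses exact substring matching and short probes from each end of the insert rather than alignment, and the category rules, function name and toy sequences are hypothetical, not the authors' pipeline.

```python
def classify_long_read(read, insert, left_flank, right_flank, probe_len=10):
    """Classify a long read spanning the target site (exact matching, hypothetical rules)."""
    copies = read.count(insert)  # non-overlapping full-length insert copies
    if copies >= 2:
        return "concatemeric"
    if copies == 1 and left_flank + insert + right_flank in read:
        return "complete"
    if insert[:probe_len] in read or insert[-probe_len:] in read:
        # Part of the insert is present, but not as a single scarless full-length copy.
        return "incomplete"
    return "no insertion"


LEFT, RIGHT = "ACGTACGTAA", "TTGCATGCAT"
INSERT = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACC"  # toy 30 nt insert
for read in (LEFT + INSERT + RIGHT, LEFT + INSERT * 2 + RIGHT,
             LEFT + INSERT[:15] + RIGHT, LEFT + RIGHT):
    print(classify_long_read(read, INSERT, LEFT, RIGHT))
```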
p) Schematic of the cross-junction ddPCR assay. Primers amplify across the central junction of a hypothetical concatemeric GFP insertion. Above, a single GFP insert, in which the primers face in opposite directions and will not amplify. Below, a hypothetical concatemeric insertion, in which the primers face each other across the concatemer junction and produce amplification. q) Cross-junction ddPCR readout of concatemers generated by STITCHR with SpCas9H840A-R2ToccΔ1-169 and a 700 bp GFP payload at the endogenous NOLC1 locus, benchmarked against synthetic standards. r) Cross-junction ddPCR of genomic DNA standards generated by mixing genomic DNA from a heterozygous clone containing a 2× concatemeric GFP insertion at NOLC1 with gDNA from WT cells. gDNA was mixed at ratios corresponding to editing efficiencies ranging from 0.01% to 1%. Above, 4 possible outcomes of GFP insertion at AAVS1 by STITCHR: a single insert, a tail-to-head concatemeric insertion and two tail-to-tail concatemeric insertions. Other outcomes are possible that are not depicted (e.g. head-to-head, partial concatemers, >2× concatemers) but would still be detected by the ddPCR design. Shown are the primers (black) and the probe (pink box) used in the assay, plus the site of the restriction enzyme XhoI, which separates any concatemers (below), detected as an increase in positive droplet concentration. t) CNV ddPCR assay depicted in (s) of 10 HEK293FT clones containing a monoallelic, scarless STITCHR insertion of GFP at AAVS1 (indicated with a dotted line), a HEK293FT clone (22n115) containing a tail-to-head 2× GFP insertion at NOLC1, and a negative control (22n22) containing no insertion. Each sample was assayed +/− XhoI digestion. *P < 0.05; statistics were calculated with an unpaired t-test. u) Design of two Southern blots detecting STITCHR inserts at AAVS1 and their expected outcomes.
Shown are two designs: an internal probe (left), which hybridizes to the GFP insertion, and an external probe (right), which hybridizes outside the insert. For both, 3 possible editing outcomes and their expected sizes are shown: a 2× monoallelic insertion, a 1× monoallelic insertion and no insertion. Other outcomes are possible that are not depicted but would alter the expected band sizes (e.g. 3× insertion, insertion with unexpected insertions/deletions). v) Southern blots of 10 HEK293FT clones containing a scarless GFP insert by STITCHR at AAVS1 and a negative clone (WT), using an internal (above) or external (below) probe. Expected band sizes are indicated with a red (inserted) or black (uninserted) asterisk. (a, c, f, g, j, k, r, t) or s.e.m. a-b) Circos plots depicting genome-wide insertion sites of payloads by SpCas9H840A-R2ToccΔ1-169 using sgRNAs and payload homologies to a) AAVS1 (chr19) and b) NOLC1 (chr10). Counts are defined as the number of mapped reads occurring within a 5 kb window. c) Schematic of STITCHR using SpCas9H840A-R2ToccΔ1-169 to insert EGFP as a scarless in-frame fusion at the N terminus of the human NOLC1 gene. The EGFP template, which carries 50 nt homology arms, is transcribed in the reverse-complement orientation to minimize background expression in the absence of insertion. d) STITCHR-mediated EGFP tagging of NOLC1, visualized by confocal microscopy and compared with immunofluorescence staining of NOLC1. White scale bar denotes 10 µm. e) Therapeutically relevant payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus using 100 nt homology arms; the sizes and identities of the payload panel members are shown. Integration is quantified by ddPCR and compared with SpCas9H840A. For the NGS, shown are the number of total left junction inserts, left junction inserts containing indels and the WT locus containing indels.
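For the Circos plots, counts are the number of mapped reads falling in each 5 kb window. A minimal binning sketch follows, assuming reads have already been reduced to (chromosome, position) pairs; the function name is hypothetical, and the coordinates in the example are invented, not the real AAVS1 or NOLC1 positions.

```python
from collections import Counter

WINDOW = 5_000  # 5 kb windows, as stated in the legend


def bin_read_positions(read_positions, window=WINDOW):
    """Count mapped reads per (chromosome, window start) bin."""
    counts = Counter()
    for chrom, pos in read_positions:
        counts[(chrom, (pos // window) * window)] += 1
    return counts


# Invented coordinates for illustration only.
reads = [("chr19", 1_000_100), ("chr19", 1_004_999), ("chr19", 1_005_000),
         ("chr10", 2_500_000)]
print(bin_read_positions(reads).most_common(1))
```

Fixed, non-overlapping windows make counts directly comparable across chromosomes; an on-target site then shows up as a single dominant bin.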
f) Evaluation of different sized edits using STITCHR at the NOLC1 locus using either SpCas9H840A-R2ToccΔ1-169 or SpCas9H840A. Inset shows payload design and locus schematic with homology arms colored and top guide in red and bottom guide in blue. g) PacBio HiFi long-read sequencing of a 12.7 kb insert at the endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 using primers that land externally on the 5' side and within the insert on the 3' side. Reads are aligned to the corresponding expected reference sequences of scarless insertion at the NOLC1 locus. h) Additional analysis of the PacBio long read sequencing for the 12.7 kb insert at NOLC1 from Extended Data Fig. 10g showing complete, incomplete, and concatemeric insertions at the respective sites. i) Short read sequencing of the right junction of the same sample containing a 12.7 kb insert at NOLC1 used in Extended Data Fig. 10g, showing complete insertion of the right junction. j) Installation of small edits and insertions using STITCHR at the NOLC1 locus, using a U6 promoter for payload expression. k) SpCas9-mediated HDR editing of the EMX1 gene in cells treated with varying concentrations of aphidicolin. Genome editing is quantified by NGS. l) EGFP payload insertion efficiencies at endogenous NOLC1 locus by homology-directed repair (HDR), using SpCas9, at different concentrations of the cell cycle inhibitor aphidicolin or DMSO control. m) EGFP payload insertion (50 nt homology arms) by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus in cells treated with cell cycling inhibitor Mirin or double thymidine. Integration is quantified by NGS and compared to SpCas9H840A. n) SpCas9-mediated HDR editing of the EMX1 gene in cells treated with cell cycling inhibitor Mirin or double thymidine. Genome editing is quantified by NGS. Top guide is shown in red and the bottom guide in blue. 
p) Evaluation of STITCHR-replace at the NOLC1 locus using a single guide and homology arms spaced 50–150 bp apart on the genome. R2ToccRTmut corresponds to the RT inactivation mutant: F811A/A812L/D813A/D814A/L815A/V816A/L817A. q) Example sequencing reads of the EGFP insertion site at NOLC1 for STITCHR replace, showing the desired 50–150 bp deletions. r) ddPCR quantification of multiplexed gene integration by STITCHR with SpCas9H840A-R2ToccΔ1-169 at NOLC1 and AAVS1 sites. EGFP payload insertion at NOLC1 is quantified by ddPCR, and Gluc insertion at AAVS1 is quantified by NGS. Targeting conditions are compared to non-targeting guide controls. Error bars represent mean +/− s.e.m (d, e, k, l, m, n) or s.d. A full list of the mined R2 ORFs from Fig. 1, showing NCBI accession numbers, the species, the ORF protein sequence, Rfam annotations and distances to preferred insertion sites. Fell, C.W., Villiger, L., Lim, J. et al. Reprogramming site-specific retrotransposon activity to new DNA sites.
While out on a stroll, the woman happened upon a roughly 900-year-old stash of more than 2,150 medieval silver coins known as denarii. “It was probably placed in its place during the first quarter of the 12th century, at a time of internal political instability,” he said. And the container certainly did its job, even if the owner was never able to return for them—the coins weren't recovered for another 900 years. “But it was a huge amount, unimaginable for an ordinary person and at the same time unaffordable. It can be compared to winning a million in the jackpot.” The experts claim that a large collection of coins found in such a place could mean that they were originally meant to pay wages for soldiers, or were some sort of “war booty.” Early analysis of the haul shows both that the coins were minted in several places throughout the Kutnohorsk Region, and that they were likely created under the rule of three different Přemysl leaders (likely between 1085 and 1107): King Vratislav II and princes Břetislav II and Bořivoj II. The coins are made from a silver alloy that included copper, lead, and trace amounts of other metals. While we may never know the true intentions—or provenance—of the coin collection, experts still plan to puzzle out as much as possible. Mazačová said that museum staff will now register all the pieces of the collection, clean and restore the coins, and subject them to X-ray imaging and spectral analysis to determine their specific material composition. Tim Newcomb is a journalist based in the Pacific Northwest. He covers stadiums, sneakers, gear, infrastructure, and more for a variety of publications, including Popular Mechanics.
Scientists May Have Finally Found the Mysterious Animal Hosts of Mpox
Now, an international team of scientists suggests that it has an answer: the fire-footed rope squirrel (Funisciurus pyrropus), a forest-dwelling rodent found in West and Central Africa. By identifying the sources, scientists could work with local communities to design strategies to shield people from infection—for instance, safe handling of wild-animal meat. He and others who spoke to Nature, however, aren't sure that the study definitively establishes F. pyrropus as a monkeypox reservoir, but they applaud the long-term wildlife-surveillance work. Although mpox has affected Africa for decades, it captured headlines worldwide in 2022 when the virus sparked a global outbreak, fueled by human-to-human transmission. Last August, the World Health Organization declared another global emergency after a worrisome strain of the virus spread to previously unaffected African countries. As these outbreaks have become more common, one question on researchers' minds has been their animal sources. In late January 2023, Carme Riutord-Fe, a disease ecologist at the Swiss Centre of Scientific Research in Abidjan, noticed an infant mangabey with red skin lesions on its forehead, chest and legs. The fluid-filled lesions, characteristic of mpox, quickly spread across its body, and it died two days later. For most outbreak investigations, scientists begin collecting animal samples weeks or months after the first reported cases. This makes it difficult to pinpoint disease origins, he says. His team has been monitoring several populations of free-living, non-human primates in the Taï forest on a daily basis since 2001 to better understand pathogens relevant to humans.
When mpox struck in 2023, archived samples of the mangabeys' urine and faeces, as well as tissues and swabs from dead animals found in the forest, proved invaluable. Monkeypox virus showed up in faecal samples collected as early as 6 December 2022 from a mangabey called Bako—the mother of the infant that first drew researchers' attention. The first line of evidence was that they observed mangabeys hunting and eating F. pyrropus. And finally, they identified F. pyrropus DNA in the earliest positive faecal sample from Bako. “It's unbelievable how well things fit together,” Leendertz says. Although scientists had occasionally found monkeypox virus in squirrels, this was the first evidence for cross-species transmission. However, Délia Doreen Djuicy, a disease ecologist at Centre Pasteur of Cameroon in Yaoundé, says that the jury is still out on whether fire-footed rope squirrels are a reservoir host, or whether they are merely a susceptible species that occasionally contracts monkeypox and transmits it. To prove that a species is a reservoir host, Djuicy says, there must be evidence that most of the animals can maintain and shed the virus without getting sick. But there is not yet proof of this for F. pyrropus, she adds. Other rodent species, such as pouched rats (Cricetomys spp.), have been implicated in monkeypox transmission, too, Mbala says. Leendertz says his team will next investigate both ongoing and past monkeypox infections in small mammals, including squirrels, in the national forest. They will study how these animals use the forest habitat and interact with humans. Consuming wild animals is popular in many parts of Africa for complex reasons, including tradition, subsistence, civil unrest and commercial demand, Leendertz says. This article is reproduced with permission and was first published on April 8, 2025. Jane Qiu is an award-winning independent science writer in Beijing.
She has won a Knight Science Journalism Fellowship, as well as awards from prestigious groups such as the American Association for the Advancement of Science and the Association of British Science Writers.
If you've ever bought a bottle of vitamins, you've probably seen supplements touting the benefits of royal jelly—a substance worker bees secrete from their glands—on the shelf nearby. And it turns out, there is a reason for the hype. While it is uncertain whether taking royal jelly capsules or slathering it on your face will slow down the aging process, we do know that queen bees can live up to 20 times longer than workers. Despite having identical DNA to worker bees, queen bees live longer, and humans want in on it, which is the reason all those products exist—and why a new research project is buzzing. The UK's Advanced Research and Invention Agency (ARIA) is funding deeper investigations into how queen bees are able to outlive generations of workers. Gorging on royal jelly isn't going to make you immortal, but the ways in which it affects the biology of queen bees may someday be applied to us. “If we're able to disentangle, and to reverse engineer, how nature has solved these challenges for them, that can be transformative for pausing aging, human fertility, transport of organs and provide new means of fighting disease,” Yannick Wurm, a newly appointed program director who will join seven others in this endeavor, said in a press release. This isn't the first time queen bees will be in the spotlight (like most royals), but it will build on previous studies that determined some potential reasons why queens live longer than anyone else in the hive. A 2024 study by researchers from the College of Animal Science and Technology at Shandong Agricultural University in Shandong, China, found that microbes in the gut of a queen bee allow her to live long past her workers because they inhibit insulin signaling.
“One of the mechanisms by which queen bees live longer than worker bees would be reducing the degree of oxidative damage by upregulating antioxidant genes' expressions via inhibiting [insulin signaling],” the research team said in that study, published in Applied and Environmental Microbiology. The insulin signaling pathway is a metabolic pathway: a series of linked chemical reactions through which insulin increases the uptake of glucose into fat and muscle cells. Since queens survive on royal jelly, they're not eating nearly as much sugar. Insulin signaling and antioxidant pathways were also found to be related. Royal jelly contains antioxidants, which reduce oxidative stress, or cell damage from free radicals—highly unstable and reactive oxygen molecules that can break down parts of DNA, potentially causing cancer and other diseases. In another 2024 study, published in Scientific Reports, a different team of researchers observed honeybee queens and saw that older queens had larger gut microbes, which suggested that there was a relationship between their gut microbiome and immune health. Her work has appeared in Popular Mechanics, Ars Technica, SYFY WIRE, Space.com, Live Science, Den of Geek, Forbidden Futures and Collective Tales. She lurks right outside New York City with her parrot, Lestat. When not writing, she can be found drawing, playing the piano or shapeshifting.
This More Than 380-Year-Old Trick Can Crack Some Modern Encryption
Hardly anyone is interested in my tax return—there's not much to it. And that's a good thing, given that an attacker might have fairly easily intercepted the encrypted communication between my laptop and printer when I printed the return in recent years. In early 2022 information technology security researcher Hanno Böck discovered that some of these encryptions could be cracked in a process that he went on to describe in a 2023 preprint paper posted to the International Association for Cryptologic Research's Cryptology ePrint Archive. His method can be traced back to one developed by the French scholar Pierre de Fermat in the 17th century. Fermat, most famous for his mysterious “last theorem,” which vexed experts for centuries, contributed all kinds of useful things to the world of science in his lifetime. For example, he laid the foundations for the theory of probability and also worked extensively on prime numbers, those values that are only divisible by 1 and themselves. Mathematicians suspected they could use Fermat's work to break encryption, and Böck demonstrated such a case. Modern encryption systems are based on difficult math problems, and one of the most important is breaking a large number down into its prime factors. Prime numbers are often referred to as the atoms of number theory: indivisible building blocks from which the natural numbers are constructed. Every other whole number greater than 1 can be written as a product of primes in exactly one way (apart from the order of the factors), for example 15 = 3 × 5 or 20 = 2 × 2 × 5. For small values, it is easy to determine the prime divisors. So far, however, no computer program can quickly calculate the prime divisors of arbitrarily large numbers. This limitation is precisely what RSA cryptography exploits.
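For small values, finding the prime divisors is easy because you can simply try every candidate divisor in turn. The minimal Python sketch below does exactly that, using trial division, a textbook method assumed here for illustration (it is not a method named in the article). It only has to test divisors up to the square root of the remaining number, but for numbers with hundreds of digits even that bound is far out of reach. Faster algorithms exist, but, as noted above, none can quickly factor arbitrarily large numbers.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (n >= 2) in ascending order, with repetition."""
    factors = []
    d = 2
    # Only divisors up to sqrt(n) need testing: a composite n has a factor <= sqrt(n).
    while d * d <= n:
        while n % d == 0:      # divide out d as often as it fits
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # whatever remains has no divisor <= its square root, so it is prime
        factors.append(n)
    return factors


print(prime_factors(15))  # [3, 5]
print(prime_factors(20))  # [2, 2, 5]
```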
To understand how that kind of protocol works, consider a simplified example in which data are encrypted with the help of a large number. Suppose a person wants to send the word SCIENCE, which consists of seven letters, to a recipient in encrypted form. To do this, they use a large seven-digit number such as 6,743,214 and shift each letter of SCIENCE by the respective digit: S shifts six letters over to become Y, C shifts seven letters to become J, and so on, so that SCIENCE becomes YJMHPDI. A sender can now dispatch this to another person without a listener being able to decode the message. To decode it, however, the recipient also needs the number 6,743,214, and this key has to be shared without an eavesdropper learning it. That is where RSA comes in: security is guaranteed by the fact that the sender and recipient each secretly use large prime numbers, which they multiply together, and only send each other the results of this calculation. Because an eavesdropper can only intercept these products and cannot factorize them, they are helpless. Nearly four centuries ago, Fermat was working on related problems. He did this purely out of mathematical curiosity: at the time, no cryptographic methods for secure key exchange were known. And indeed, Fermat found a way to factorize even large numbers that are the product of two prime numbers. His method is not complicated; you can do it with a calculator (though Fermat, incidentally, did not have one). Fermat factorization works as follows: Take the number n, in this case 2,027,651,281, and calculate its square root: √2,027,651,281 ≈ 45,029.45. Round this up to the next whole number, 45,030, square it and subtract n: 45,030² – 2,027,651,281 = 49,619. Now you have to check whether the result is a square number. It is not, so you add 1 to 45,030 and repeat the calculation: 45,031² – 2,027,651,281 = 139,680, again not a square number. This time you add 2 to 45,030 and square the result, from which you subtract the original value n: 45,032² – 2,027,651,281 = 229,743. Again, this is not a square number. Fermat must have had a lot of patience. In his example, you have to carry out the procedure a total of 12 times until you find a square number: 45,041² – 2,027,651,281 = 1,040,400 = 1,020². Writing y for the number that was squared (here 45,041) and x for the square root of the result (here 1,020), the equation y² – n = x² can be rearranged as y² – x² = n. By the third binomial formula (the difference of two squares), the left-hand side can be written as a product, giving (y – x)·(y + x) = n.
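The search described above can be written as a short loop. The Python sketch below is a minimal illustration, assuming n is an odd number greater than 1; the function name `fermat_factor` and its step counter are choices made for this example, not part of any library. It starts at the square root of n rounded up, increases y by 1 each round, and stops as soon as y² – n is a perfect square x², returning y – x, y + x and the number of values of y it tried.

```python
from math import isqrt


def fermat_factor(n: int) -> tuple[int, int, int]:
    """Fermat factorization of an odd n > 1.

    Returns (y - x, y + x, steps) with (y - x) * (y + x) == n,
    where steps counts how many values of y were tried.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 1")
    y = isqrt(n)
    if y * y < n:              # round the square root up to the next whole number
        y += 1
    steps = 0
    while True:
        steps += 1
        d = y * y - n          # candidate value for x**2
        x = isqrt(d)
        if x * x == d:         # d is a perfect square, so n = (y - x)(y + x)
            return y - x, y + x, steps
        y += 1


print(fermat_factor(2_027_651_281))  # (44021, 46061, 12)
```

The step counter makes the weakness visible: for n = 101 × 103 = 10,403 the loop succeeds on its first try, whereas the number of rounds grows quickly as the two prime factors move farther apart.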
This automatically factorizes the number n into two numbers, y – x and y + x; in Fermat's example, 2,027,651,281 = 44,021 × 46,061. In fact, this factorization method always works for odd n. But computers can only perform it fast enough if the two prime factors of n are not too far apart. And this was precisely the problem that Böck discovered in a program library used by various companies at the time. The prime numbers generated for encryption were not random enough, and the program often selected two prime numbers that were close to each other. This means that Fermat's factorization method can be used to circumvent the encryption. Böck realized that the printers of certain companies used such inadequate encryption. They used RSA cryptography, for example, to protect confidential documents that were sent to the printer via a network. After his finding in 2022, these companies issued alerts and fixes to address the problem. We can only hope that other companies have closed such security gaps. Fermat would never have dreamed that more than 380 years after his discovery, computers that rely on complicated principles of quantum mechanics for their calculations might make use of it. This article originally appeared in Spektrum der Wissenschaft and was reproduced with permission. Manon Bischoff is a theoretical physicist and an editor at Spektrum der Wissenschaft, the German-language sister publication of Scientific American.
“Everything points to the catastrophic end of a military operation.” Yikes. When construction workers started churning up skeletal remains, a project to renovate a soccer field outside Vienna, Austria, morphed into an archaeological dig, and it wasn't long before experts realized this field held a major discovery in Roman warfare history. “The limbs were intertwined with those of other individuals. This indicates a hasty covering of the dead with earth, i.e. not an orderly burial.” Every individual examined was male, most were between 20 and 30 years old and in good health, and they died from injuries inflicted by blunt and sharp weapons, including spears, daggers, swords, and iron bolts; in the experts' words, “the variety of injuries indicates a battle and not an execution site.” Because cremation was the common burial practice in the European part of the Roman Empire, Roman skeletons from this period are rare, which makes this one of the most significant Roman war discoveries in Central Europe. Instead, everything points to the catastrophic end of a military operation.” While most of the dead had been robbed of their weapons and equipment, the team still discovered a Roman iron dagger with inlays of silver wire, several scales of armor that show distinct differences from known varieties, the metal cheek piece of a Roman helmet, two iron spearheads (one stuck in a hip bone, mind you), and hobnails from shoes that were made with leather and studded with nails, a type of footwear used by Roman soldiers. Hasenleitengasse may therefore mark the beginning of Vienna's urban history.”
JWST Spots Giant Spiral Galaxy Shockingly Early in Cosmic History
A giant spiral galaxy, nicknamed “Big Wheel,” as seen by the James Webb Space Telescope from some two billion years after the big bang. A newfound object uncovered in the early universe by NASA's James Webb Space Telescope (JWST) is challenging long-held ideas about how galaxies form. Dubbed the “Big Wheel,” it's a galaxy much like our own Milky Way—a humongous, spiraling disk of stars, gas and cosmic dust. But Big Wheel is even bigger than our home galaxy; it's some five times more massive and covers twice as much area. JWST has seen it from when the universe was only about two billion years old, which is remarkably young for a galaxy of such grandeur. Compared with its much smaller, more nascent contemporaries in that bygone era, “you can clearly see Big Wheel is a true outlier,” he says. The discovery is part of a broader trend in astronomy, as ever-larger and more capable telescopes look deeper into the universe, gathering light from cosmic vistas further and further back in time. Using JWST and other powerful facilities, observers have been able to glimpse some early galaxies just a few hundred million years after the big bang, says Vadim Semenov, a postdoctoral researcher at the Center for Astrophysics | Harvard & Smithsonian. But the neighborhood where Big Wheel lives is packed with an exceptional overabundance of matter even for that already-enriched cosmic epoch. That's probably where it got to indulge in a “heavy breakfast,” says Chuck Steidel, study co-author and an astronomer at the California Institute of Technology. It looks like an adult galaxy at a time when there were only supposed to be children around.” The giant spiral didn't look like it belonged there, so Wang assumed it was an interloper from much later in cosmic time that just happened to be in JWST's field of view.
Upon further analysis, however, Wang and his colleagues were able to gauge Big Wheel's true cosmic distance, and they suddenly realized that the object they were seeing was in fact a faraway galaxy “that has grown really, really fast since the beginning of the universe,” he says. “At the moment, I have to say it's a mystery—a complete mystery,” Cantalupo admits. Perhaps, he says, Big Wheel's crowded environment may have allowed for “some previously unknown physical mechanisms that [help] galaxies to grow.” “To determine whether existing models can explain such galaxies, we need detailed theoretical and numerical studies that capture both the extreme galaxies and the extreme environments they inhabit.” So, given that we've seen Big Wheel as it was some 12 billion years ago, what can we say about its status today, in our current cosmic era? Not much with any certainty, Steidel says, but its heavyweight status and population-dense environs hint that the outsized object may have eventually morphed into another, more familiar cosmic form. When multiple large galactic mergers occur, this type of galaxy usually forms as a result. “One of the really fun things about astronomy is that you often find things that you were not looking for, and they turn out to be sometimes even more interesting than what you were trying to do,” Steidel says. A philosopher turned journalist, originally from South Korea, Lee's interests lie in finding unexpected connections between life and science, particularly in theoretical physics and mathematics.
Bacterial vaginosis is an irritating overgrowth of pathogenic bacteria. A new study has found that some cases of the condition should be treated like a sexually transmitted infection. It involves an imbalance in the microbes that grow in the vagina, with pathogenic strains beating out healthier bacteria. It's not usually a serious condition, but it can put people at higher risk for contracting HIV and other sexually transmitted infections. Now some researchers are arguing that BV itself should be treated like an STI. Lenka Vodstrcil is a senior research fellow at Monash University's Melbourne Sexual Health Center. Catriona Bradshaw is a professor of sexual health medicine at Monash University and Alfred Hospital. Before we dive into our conversation it's important to note that while we'll be discussing treating BV as an STI, people can be diagnosed with BV even if they've never had sex. It's an imbalance of vaginal bacteria—and one we don't really understand very well at that—so there are probably multiple ways it can come about. The point of this new research wasn't to figure out how people acquire BV but rather to understand whether transmission between partners can make treatment more difficult. Thank you both so much for coming on to chat today. So in women an optimal vaginal microbiome actually is characterized by these bacteria called lactobacilli that secrete lactic acid, and we have a low bacterial diversity and a really acidic pH, so it's actually the opposite of the gut. And these bacteria secrete chemicals called amines that produce this smell, and they actually form a biofilm, so they create a little sort of scaffold that they all live in to protect themselves from host responses and antibiotics.
And the problem with BV is that we still haven't found a single infectious cause ... Bradshaw: That is only present in women with BV and [is] absent in women without, so it's really what we call a polymicrobial infection. But it's common—it affects, really, one in four women globally. Feltman: Hmm, and so your new study suggests that BV could be considered a sexually transmitted infection. Bradshaw: So the acquisition of BV is associated with exposure to new sexual partners in a lot of studies, it's associated with lack of condom use, and in fact, it has the incubation period that's quite typical of a bacterial STI, so looking like it's about three to four days. If you follow women for three, six months, 12 months, you see more than 50 percent get their BV back again. So it really speaks to a woman being reinfected. So this profile was really evident to me for many years as a clinician but then evident to us when we did our studies, and every single treatment strategy we tried that is directed solely at women—which is globally what is recommended: just treat women—really failed to improve cure. And then when Lenka did all the detailed analysis of our trials, this one factor just kept popping out each time: regular partner was very much driving treatment failure. Feltman: So, Lenka, could you tell me a little bit about the study design and how you were able to show in the first place that BV had this profile that resembled an STI? So we've conducted many studies over multiple years to acquire this body of evidence that Cat has just told you about. So we thought it was time to revisit partner treatment, and there were partner-treatment trials that had been conducted before, predominantly in the '80s and '90s, and all of these partner-treatment trials, very few of them improved cure for women. 
But in around 2012 there was a great review written by Supriya Mehta, and that really highlighted that the failure of these trials was likely due to the limitations of the trials and shouldn't be taken as evidence that sexual transmission isn't occurring. Vodstrcil: And another big thing in these trials, and another one since then that was well-designed, was that they all used an oral antibiotic in the male partners of women with BV. But since these trials have been conducted, there was a big body of molecular evidence showing—and this is where they use genetic sequencing, DNA sequencing—and what researchers, including us, have found is that the bacteria that are associated with BV are located in two sites on the male penis as well: so inside the urethra, which is the tube men pee through, and also on ... So we decided that we needed to try two different antibiotics to target the two different sites of carriage of these BV organisms. So in our trial—and we ran a couple of pilot studies before we did the main trial—we used this dual-therapy approach for men, and we recruited women with BV that were in monogamous relationships with a male partner, and we used this concurrent dual-therapy approach for couples and treated them for a week at the same time. And in our pilot trials we found that this had an effect on the BV bacteria in men and women. Then we had to conduct a randomized control trial to further strengthen the evidence that we were seeing. And this is where we randomized couples to either getting partner treatment or to the current standard practice, which is female-only treatment. But in fact, after we had 150 couples recruited, the data were viewed by an independent data safety monitoring board, and what they told us is that we could stop the trial because one of the two groups was what we call inferior, or superior, to the other group. 
And in fact we then analyzed the data and showed that the partner-treatment group was—significantly improved cure for women. Feltman: Yeah, so I know that sometimes labeling something as an STI can be kind of controversial—there's so much stigma around them. I mean, back when we had our big mpox outbreak here in the U.S., it was mostly spreading via sex between men, and there were a lot of think pieces about whether it would be harmful or helpful to start talking about it and labeling it as an STI. Bradshaw: What I would say, as a clinician and from, also, we have—our group has conducted quite a lot of qualitative studies, is that bacterial vaginosis is not an insignificant condition. So far women are told that this is just an imbalance of their bacteria, and they are given cycle after cycle of antibiotics. It's resulted, for many women, in a lot of distress and frustration with the health care profession. It is doing their partners an enormous disservice to withhold that information. And so it is important to actually call out transmission of BV, to be brave enough to do it. And I use the word “brave” because there is pushback about this. We are not saying that, for women in a situation with highly recurrent BV, that sexual transmission is solely responsible for their ongoing bacterial vaginosis. We know that for some women, once they've acquired it through a transmission event that they actually fail to clear it. It has also confirmed the results from studies and meta-analysis that condoms are protective against BV, which is a very helpful, empowering message for women and their partners in terms of prevention. But it is important to deliver that information in a sensitive way. So we talk about exchanging good bugs and exchanging bad bugs, and we often start with analogies like: sharing a glass of water or a drink bottle, shaking hands, kissing and having sex all results in the exchange of good and bad bugs between humans.
This is a dynamic process that happens all the time, and BV bacteria are some of the less optimal bacteria that can get exchanged during sex. Men can carry these bacteria in the absence of obvious symptoms, and there is no test for men, so how would a man know that they had those bacteria? So we try and pull out all the blame and talk about this being a shared responsibility, to bring everyone on that journey so that we really try to remove the stigma of: “This is an STI. I didn't have this until you came along.” What else are we still looking to understand about BV? Vodstrcil: Yeah, so Cat just alluded to this: we still don't know what the actual cause of BV is—so whether there is one kind of founder, or first, organism that has to be present before other organisms can come in and that becomes the polymicrobial, or multi-organism, infection that we see with BV. And getting a better treatment for that persistent biofilm or dense infection is something that we also need to develop. When we find that out, we can improve the diagnostic for BV and also make the treatment more specific to the bug that we can then attribute to BV rather than using sort of what we call broad-spectrum, or broad, antibiotics. And one other thing is: we've just focused this study on women who have sex with men, but we know from, again, the body of literature and also from our past studies that women and other gender-diverse individuals with a vagina can share these same BV-causing bacteria. Vodstrcil: So we recognize that partner treatment in this group, it is sort of integrated into clinical guidelines, where if someone has a female partner, they're encouraged to go and get tested and treated. But we're also conducting studies to try and inform guidelines in this space as well. Bradshaw: I think, just in terms of our messaging, this is a very big change to clinical practice. 
We just want to make this as simple and accessible as possible for people so that they can access it wherever they are. Feltman: Yeah, we'll definitely link to those resources in our show notes. Thank you both so much for coming on to talk us through this. It's been really interesting and hopefully helpful for some of our listeners. Vodstrcil: Yes, thank you so much for having us. We've really enjoyed that opportunity to talk with you. We'll be back on Friday with a fascinating story about how certain prenatal tests can inadvertently detect cancer in pregnant people. Science Quickly is produced by me, Rachel Feltman, along with Fonda Mwangi, Kelso Harper, Naeem Amarsy and Jeff DelViscio. Shayna Posses and Aaron Shattuck fact-check our show. Our theme music was composed by Dominic Smith. Rachel Feltman is former executive editor of Popular Science and forever host of the podcast The Weirdest Thing I Learned This Week. Fonda Mwangi is a multimedia editor at Scientific American. She previously worked as an audio producer at Axios, The Recount and WTOP News. He has worked on projects for Bloomberg, Axios, Crooked Media and Spotify, among others.