Scaling the foundations of scverse
Year of award: 2024
Grantholders
Prof dr Fabian Theis
Helmholtz Zentrum München, Germany
Project summary
scverse is an established ecosystem of machine learning methods and visualization tools for large-scale biomedical data analysis, but its data structures are not yet cloud-ready. With broadly accessible cell atlases across organs and species on the horizon, along with foundation models of single-cell variation, there is an urgent need for single-cell processing at scale. We propose extending the scverse core to scale out to the ever-increasing size of single-cell and spatial-omics data while maintaining core functionality. This work has two aspects: the core data structures and the user-facing analysis and visualization frameworks.
We will adapt AnnData, MuData and SpatialData objects to facilitate both remote data access patterns and out-of-core processing (i.e. working with data too large to fit in memory). Remote-friendly “lazy loading” and local out-of-core processing rely on Dask and Zarr to fetch only the required parts of a stored object for visualization and analysis.
We will solicit user feedback through existing scverse community channels, including meetings, forums, chats and GitHub. Once the foundation is established, we will hold a hackathon on building out features in scverse ecosystem packages.
This work integrates well with other CZI grants and builds on previously funded projects, such as scanpy (cycle 1) and scvi-tools (cycle 4).