Introduction to Biomedical Imaging
TileDB offers TileDB-BioImaging, a product designed for managing and analyzing digital histopathology images.
History
Biomedical imaging has been a transformative tool in the life sciences, allowing researchers and clinicians to visualize biological processes and structures in unprecedented detail. Its history traces back to the late nineteenth century with the discovery of X-rays by Wilhelm Röntgen in 1895. This marked the advent of radiography, the first biomedical imaging modality. Later innovations expanded the field: ultrasound in the 1940s, magnetic resonance imaging (MRI) in the 1970s, and techniques such as computed tomography (CT) and positron emission tomography (PET).
Usage
Biomedical imaging is pivotal in both research and clinical settings. It enables non-invasive visualization of anatomical structures and physiological functions, supporting applications from disease diagnosis and treatment planning to basic biological research. Imaging modalities such as MRI and CT scans are routine in clinical diagnosis, while advanced techniques like electron microscopy and optical imaging are essential in cell biology and molecular research.
Public datasets
Many public datasets have been curated to support research in biomedical imaging. Prominent examples include the following:
- The Cancer Imaging Archive (TCIA): A repository of medical imaging data, primarily focused on cancer research.
- Camelyon: Histopathological slides for cancer metastasis detection and segmentation.
- UK Biobank Imaging Dataset: Contains a large collection of multimodal imaging data tied to extensive health records.
- Ischemic Stroke Lesion Segmentation (ISLES) dataset: Widely used for stroke segmentation research.
- Brain Tumor Segmentation (BraTS) dataset: Widely used for brain tumor segmentation research.
Importance in life sciences
Biomedical imaging is integral to the life sciences, providing insights that drive understanding and innovation. It aids in visualizing complex biological systems, monitoring disease progression, and developing new therapies. The integration of imaging data with computational analysis and artificial intelligence has further amplified its potential, enabling breakthroughs in personalized medicine, drug development, and predictive modeling of diseases.
This convergence of imaging and data science is shaping the future of the life sciences, offering tools that are crucial for unraveling the complexities of living systems.
TileDB-BioImaging
TileDB provides support for importing, visualizing, analyzing, and exporting multi-resolution, whole-slide microscopy images. TileDB’s bioimaging features include the following:
- Integrated ingestion, viewer, data management, access control, and computation for bioimaging datasets within the TileDB user interface.
- Support for fast batch ingestion of large image sets from Amazon S3 or any supported storage system using TileDB Task Graphs.
- Python APIs for ingesting images to TileDB-BioImaging arrays, slicing them with NumPy array semantics, or reading them via an OpenSlide Python-compatible API. Unlike the canonical OpenSlide Python API implementation, you can use the TileDB drop-in API with image assets stored in TileDB on any supported object store.
Specification
The OME-NGFF (Open Microscopy Environment - Next-Generation File Format) specification represents a major step forward in managing and analyzing large, multidimensional microscopy datasets. It builds on the legacy of the Open Microscopy Environment (OME), which has been developing open standards and software for biological imaging since the early 2000s. Its development was driven by the limitations of existing file formats, which became more pressing as advances in microscopy technologies and computational workflows created new challenges:
- Explosion of data volumes.
- Multi-scale and multi-dimensional imaging.
- Limitations of legacy formats.
- Interoperability and open science.
- Need for metadata-rich standards.
- Emergence of cloud-based and high-performance computing.
- Growing complexity of analysis pipelines.
TileDB-BioImaging is built around the OME-NGFF specification and inherently addresses these challenges through the following:
- Support for chunked, compressed data storage, making it scalable and efficient for massive datasets.
- Fast access to both fine and coarse scales of multi-resolution pyramid structures, and support for workflows that require specific dimensions.
- A cloud-native format, optimized for distributed computing, supporting parallel data processing and remote data access.
- Compatibility with a wide range of analysis tools and legacy formats, which promotes data sharing.
- Functionality to embed rich metadata schemas based on OME standards, ensuring consistent data descriptions across imaging modalities.
- Governance, access control, and logging.
The detailed TileDB-BioImaging format is covered in the Storage Format Spec section.
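To make the ideas of chunked storage and multi-resolution pyramids more concrete, the following is a minimal, library-free sketch. It is not the TileDB-BioImaging API or its storage layout. The function names (`pyramid_levels`, `tiles_for_region`), the downsampling factor of 2, and the 1024-pixel tile size are all assumptions chosen for illustration. The sketch shows how a whole-slide image can be described as a series of progressively smaller levels, and how a region request at one level maps to the small set of tiles (chunks) that need to be read.

```python
from math import ceil


def pyramid_levels(width, height, min_side=256, factor=2):
    """Return (width, height) per pyramid level, finest level first.

    Hypothetical helper: each level shrinks by `factor` until the
    shorter side is no larger than `min_side`.
    """
    levels = [(width, height)]
    while min(levels[-1]) > min_side:
        w, h = levels[-1]
        levels.append((ceil(w / factor), ceil(h / factor)))
    return levels


def tiles_for_region(x, y, w, h, tile=1024):
    """Return (col, row) indices of the tiles overlapping a pixel region.

    Hypothetical helper: the region is given in the pixel coordinates of
    a single level, and tiles are square with side `tile`.
    """
    first_col, first_row = x // tile, y // tile
    last_col, last_row = (x + w - 1) // tile, (y + h - 1) // tile
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Example: a 100,000 x 80,000 pixel slide (illustrative dimensions).
levels = pyramid_levels(100_000, 80_000)
for index, (w, h) in enumerate(levels):
    print(f"level {index}: {w} x {h}")

# A 100 x 100 pixel region that straddles a tile boundary touches
# only the tiles it overlaps, not the whole level.
print(tiles_for_region(1000, 1000, 100, 100))
```

Because a read only touches the tiles that overlap the requested region, a viewer can fetch a coarse level for an overview and a few fine-level tiles when zooming in, without loading the full image.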
Section organization
The rest of the Biomedical Imaging section is organized as follows:
Quickstart: This is the best way to get started with TileDB-BioImaging. You will learn how to install TileDB-BioImaging and run basic examples.
Foundation: This contains all the background information and internal mechanics of TileDB-BioImaging.
Tutorials: This is a series of tutorials covering all aspects of TileDB-BioImaging, from basic ingestion to massively scalable computations. You can start them without any prior knowledge of TileDB-BioImaging and work your way up to advanced usage.
You can run each of the tutorials in this section in one of two ways, as specified at the beginning of each tutorial:
- Locally on your machine.
- Through a TileDB workspace.