Abstract
The National Cancer Institute (NCI) Image Data Commons (IDC) offers publicly available cancer radiology collections for cloud computing, crucial for developing advanced imaging tools and algorithms. Despite their potential, these collections are minimally annotated; only 4% of DICOM studies in the collections considered in this project had existing segmentation annotations. This project increases the quantity of segmentations in various IDC collections. We produced a dataset of high-quality, AI-generated imaging annotations of tissues, organs, and/or cancers for 11 distinct IDC image collections. These collections contain images from a variety of modalities, including computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). The collections cover various body parts, such as the chest, breast, kidneys, prostate, and liver. A portion of the AI annotations were reviewed and corrected by a radiologist to assess the performance of the AI models. Both the AI's and the radiologist's annotations were encoded in conformance with the Digital Imaging and Communications in Medicine (DICOM) standard, allowing for seamless integration into the IDC collections as third-party analysis collections. All the models, images, and annotations are publicly accessible.

Details
Published: Oct 23, 2024
Vol/Issue: 11(1)
Cite This Article
Gowtham Krishnan Murugesan, Diana McCrumb, Mariam Aboian, et al. (2024). AI-Generated Annotations Dataset for Diverse Cancer Radiology Collections in NCI Image Data Commons. Scientific Data, 11(1). https://doi.org/10.1038/s41597-024-03977-8