ARCH Dataset Resources
BIG DATA PORTALS FOR BIOMEDICINE
NF Data Portal
The NF Data Portal serves as a comprehensive resource hub for Neurofibromatosis (NF) research, aiming to facilitate open exploration and sharing of datasets, analysis tools, resources, and publications related to neurofibromatosis and schwannomatosis. It supports the NF Open Science Initiative (NF-OSI), encouraging contributions from the community.
Model AD Explorer
The Model AD Explorer offers a comprehensive platform to delve into gene expression and pathology data from cutting-edge mouse models of Alzheimer's disease developed by the MODEL-AD consortium. This consortium involves two research centers working together to create new mouse models that closely mimic Alzheimer's disease as seen in humans, utilizing standardized measures across neuropathology, 'omics, and behavior for phenotyping.
ELITE
The Exceptional Longevity Translational Resources (ELITE) Portal is an innovative platform created to facilitate the exploration and utilization of multi-omic data, analytical tools, and resources produced by studies supported by the National Institute on Aging (NIA).
Digital Health Knowledge Portal
The dHealth Digital Health Knowledge Portal is a comprehensive platform designed to facilitate the exploration and utilization of digital and mobile health data, tools, benchmarked outcomes, and digital biomarkers. Supported by Sage Bionetworks, the portal provides access to collections, data, tools, and publications derived from studies that use digital health technologies.
CRI iAtlas
The iAtlas portal is a tool dedicated to immuno-oncology data exploration and analysis. It underscores the importance of considering the immune system in developing new cancer therapies. Because the composition of the tumor microenvironment, specifically the types and amounts of immune cells, is a key indicator of clinical outcomes and treatment responses, iAtlas provides resources for researchers who want to understand patient responses and improve immunotherapy methods.
Cancer Complexity Knowledge Portal
The Cancer Complexity Knowledge Portal, powered by the NCI Division of Cancer Biology, is a resource for scientists working in basic and translational cancer research.
BSMN
The Brain Somatic Mosaicism Network (BSMN) Portal is a comprehensive resource sponsored by the National Institute of Mental Health (NIMH), dedicated to understanding the role of brain somatic mosaicism in neuropsychiatric diseases. This multi-site effort brings together research on brain somatic mutations across various mental health disorders such as Autism Spectrum Disorder, Bipolar Disorder, Schizophrenia, and Tourette Syndrome, among others.
ARK
The ARK Portal is a comprehensive platform that hosts a wealth of data related to Arthritis, Autoimmune, and Related Diseases. Established by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), the portal is part of the Accelerating Medicines Partnership® (AMP®).
Agora
Agora is a unique platform dedicated to enhancing the understanding of genes associated with Alzheimer's Disease (AD). It presents a curated collection of evidence supporting the association of various genes with AD, alongside a compilation of over 600 nascent drug targets nominated by AD researchers.
BIG DATA SOURCES FOR BIOMEDICINE
Registry of Open Data
The Registry of Open Data on AWS, now integrated with AWS Data Exchange, is a hub for discovering and sharing datasets hosted on AWS, featuring over 500 datasets from various fields and organizations such as the Allen Institute for Artificial Intelligence, Digital Earth Africa, and NASA.
Pacific Data Hub
The Pacific Data Hub aggregates a wide array of data and knowledge to support sustainable development in the Pacific region. It features over 825 datasets in categories including structured, semi-structured, and spatial data, along with links to online databases and web services. In addition, it houses nearly 11,900 publications ranging from scientific papers and policy documents to manuals and handbooks.
Include
The INCLUDE Data Coordinating Center (DCC) supports collaboration between scientists and the Down syndrome community. Built on shareable resources, its Data Hub is a cornerstone for research aimed at improving healthcare and the quality of life for people with Down syndrome.
Allen Brain Atlas
The Allen Institute for Brain Science offers a variety of anatomical reference atlases, serving both as stand-alone resources and as frameworks for various datasets like in situ hybridization, cell projection maps, and in vitro cell characterization.
Human Cell Atlas
The Human Cell Atlas (HCA) is a groundbreaking global consortium initiative aimed at creating detailed reference maps of all human cells, the fundamental units of life, to enhance the understanding of human health and improve the diagnosis, monitoring, and treatment of diseases.
The Human Protein Atlas
The Human Protein Atlas offers a comprehensive open-access resource for exploring human proteins. This digital platform allows users to search for specific genes or proteins across 12 detailed sections, including tissue, brain, single cell types, pathology, disease, immune cell, blood protein levels, subcellular localization, cell line, structural data, and interactions.
Human Tumor Atlas Network
The Human Tumor Atlas Network (HTAN), a Cancer Moonshot℠ initiative funded by the National Cancer Institute (NCI), is dedicated to creating comprehensive three-dimensional atlases detailing the cellular, morphological, and molecular changes in human cancers as they progress from precancerous conditions to advanced stages of the disease. As of the latest data release, HTAN has compiled atlases covering 71 organs across 1,897 cases, resulting in more than 7,400 biospecimens.
Gray BRCA Pre-Cancer Atlas
The Gray BRCA Pre-Cancer Atlas, sponsored by the Gray Foundation, is pioneering the way we understand, detect, and treat cancers associated with BRCA1/2 mutations, focusing on breast and ovarian cancers. These mutations not only significantly elevate the risk of hereditary cancer but also influence the therapeutic approach for sporadic cancer cases. By exploring the earliest stages of cancer development, the Atlas aims to facilitate interventions before cancer metastasizes, employing novel research techniques and profiling methods.
TCGA
The Cancer Genome Atlas Program (TCGA) stands as a monumental effort in the field of cancer genomics, having molecularly characterized over 20,000 primary cancer and matched normal samples across 33 cancer types.
GTEx
The Genotype-Tissue Expression (GTEx) Portal serves as an extensive public resource for exploring tissue- and cell-specific gene expression and regulation across individuals, developmental stages, and species. It brings together data from three National Institutes of Health (NIH) projects: Adult GTEx datasets are available now, with dGTEx and NHP-dGTEx slated for future availability.
GEO
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is a public repository that archives and freely distributes comprehensive functional genomics data sets.
recount
recount3 is an online resource providing RNA-seq gene, exon, and exon-exon junction counts, along with coverage bigWig files, for a substantial number of human and mouse studies.
IGSR
The International Genome Sample Resource (IGSR) serves as a comprehensive platform for maintaining and sharing human genetic variation resources developed by the 1000 Genomes Project.
Data.gov
Data.gov serves as the primary platform for the U.S. government's open data, offering a repository of over 291,000 datasets for public use.
European Data
data.europa.eu serves as the official portal for European data, offering an expansive digital platform where users can explore, analyze, and share high-quality data. With over 1.7 million datasets from 183 catalogues across 35 countries, the portal stands as a comprehensive resource for a wide array of users, including researchers, policymakers, and the general public.
Awesome Public Datasets
The Awesome Public Datasets repository is a meticulously curated collection of high-quality, topic-specific public data sources.
Datasets from Microsoft Research
The Microsoft Research website offers a comprehensive index dedicated to researcher tools, encompassing a wide array of datasets, SDKs, APIs, and other open-source code developed by Microsoft researchers.
Datasets from kaggle.com
Kaggle's datasets webpage is a comprehensive platform for exploring, analyzing, and sharing high-quality data across various fields and applications.
Datasets from paperswithcode.com
The PapersWithCode.com collection includes a wide variety of datasets used to develop and evaluate artificial intelligence (AI) technologies.
Datasets from huggingface.co
Hugging Face offers a diverse collection of more than 128,000 datasets, supporting a wide range of tasks in machine learning and artificial intelligence, including natural language processing, computer vision, audio processing, and more.
UC Irvine Machine Learning Repository
The UCI Machine Learning Repository offers a rich collection of databases, domain theories, and data generators. It serves as a fundamental empirical analysis tool for machine learning algorithms, utilized globally by students, educators, and researchers.
cBioPortal
The cBioPortal provides a comprehensive resource for visualization and analysis of cancer genomics data sets, with a strong emphasis on facilitating research in oncology. It features a vast collection of data across various cancer studies, including PanCancer, Pediatric, Immunogenomic studies, and data from cell lines, among others. Users can explore specific studies related to different cancer types such as Adrenal Gland, Ampulla of Vater, Biliary Tract, Bladder/Urinary Tract, Bone, Bowel, Breast, CNS/Brain, Cervix, and many others.