Data Resources
Data Consortia
Data consortia and related resources for ML training.
ML Training Data Sources
| ID | Last edited | Category | Consortium / Resource | Acronym / Affiliation | URL | Description | Access / Pricing | Status (verified Jun 2026) | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 17/06/2026 07:01 PM | AI / Drug Discovery Partnership | Accelerating Therapeutics for Opportunities in Medicine (ATOM) | ATOM | https://atomscience.org | Public-private partnership using AI and high-performance computing to accelerate drug discovery from target ID to clinical candidate. | Large-institution collaborators; no public pricing; access via industry partnership. | Active | Government-funded AI drug discovery platform (not an open data source). |
| 2 | 17/06/2026 07:01 PM | AI / Healthcare Initiative | AI4Health Consortium | AI4Health | https://www.ai4health.eu | European initiative promoting AI in healthcare and drug discovery; integrates data, modeling, and AI solutions. | Publicly/privately funded; project-based participation, not open membership. | Verify directly | Several EU 'AI4Health' branded efforts exist; confirm the specific entity before relying on it. |
| 3 | 17/06/2026 07:01 PM | Population / Health Dataset | All of Us Research Program | All of Us | https://allofus.nih.gov | U.S. NIH program building one of the largest, most diverse health databases (health, genomic, environmental data) for precision medicine. | Free; registration and approval required for Researcher Workbench data access. | Active | Strong real-world + genomic training data. Researcher Workbench is cloud-based. |
| 4 | 17/06/2026 07:01 PM | Biomarkers | Biomarker Enterprise to Advance Personalized Medicine (BEAM) | BEAM | https://www.personalizedmedicinecoalition.org | Collaboration fostering biomarker discovery, development, validation, and standardization across the life sciences. | Operates under the Personalized Medicine Coalition; membership varies by org type. | Verify directly | Confirm program is still active under PMC; biomarker programs change names frequently. |
| 5 | 17/06/2026 07:01 PM | Regulatory Science / Data Standards | Critical Path Institute | C-Path | https://c-path.org | Non-profit accelerating drug development via data standards and public databases; biomarkers, clinical trial modeling, regulatory science. | Grant- and sponsor-funded consortia; join individual consortia rather than a general membership. | Active | Operates many disease-specific data consortia worth exploring individually. |
| 6 | 17/06/2026 07:01 PM | Bioinformatics Infrastructure | ELIXIR | ELIXIR | https://elixir-europe.org | European intergovernmental org uniting life-science tools, data, and standards for open access; integrates national bioinformatics infrastructure. | Membership for academia and industry; variable fees by service. | Active | Umbrella over many core EU resources (e.g., EBI databases). |
| 7 | 17/06/2026 07:01 PM | Genomics Dataset | ENCODE (Encyclopedia of DNA Elements) | ENCODE | https://www.encodeproject.org | Collaborative project identifying all functional elements in the human genome; large-scale datasets and analysis tools for gene regulation. | Free public access to all data. | Active | Well-curated, ML-friendly functional genomics data. |
| 8 | 17/06/2026 07:01 PM | Biomarkers / Neuro | ERP Biomarkers | ERP Biomarkers | https://erpbiomarkers.org | Identification/validation of electrophysiological (ERP) biomarkers, especially for neuropsychiatric conditions. | Membership for research institutions and pharma; fees by org type. | Verify directly | Niche; confirm site is live and program active before citing. |
| 9 | 17/06/2026 07:01 PM | Biomarkers | FNIH Biomarkers Consortium | Biomarkers Consortium (FNIH) | https://fnih.org/our-programs/biomarkers-consortium/ | Public-private partnership identifying and qualifying biomarkers for drug development and precision medicine across diseases. | Partner-funded; no general membership/pricing. | Active | URL updated to current FNIH program path. |
| 10 | 17/06/2026 07:01 PM | Rare Disease / Patient Registries | Genetic Alliance | Genetic Alliance | https://geneticalliance.org | Consortium linking patient registries to enable research and drug discovery from rare disease patient communities. | Membership for patient orgs and research institutions; variable fees. | Active | Registry data; consent/governance constraints apply for ML use. |
| 11 | 17/06/2026 07:01 PM | Population / Genomics Dataset | Genomics England (100,000 Genomes Project) | Genomics England | https://www.genomicsengland.co.uk | UK initiative sequencing genomes from NHS patients (rare disease, cancer); datasets available for approved research. | Access via approved research applications (Research Environment); no public price list. | Active | The 100,000 Genomes Project is complete; the data now sit within broader NHS genomics work, alongside larger newborn and diverse-data programs. |
| 12 | 17/06/2026 07:01 PM | Standards / Data Sharing | Global Alliance for Genomics and Health (GA4GH) | GA4GH | https://www.ga4gh.org | International effort developing frameworks and standards for sharing genomic and health data globally. | Free to participate; some collaborations may involve fees. | Active | Standards body (not a dataset) — important for interoperable ML pipelines. |
| 13 | 17/06/2026 07:01 PM | Cloud Data Platform | Google Cloud Public Datasets (BigQuery) | Google Cloud | https://cloud.google.com/datasets | Public datasets hosted on Google Cloud, queryable in BigQuery and integrable into tools. | Free to query within BigQuery free tier; compute billed beyond that. | Active | URL updated to current public-datasets hub (old /bigquery/public-data path redirects). |
| 14 | 17/06/2026 07:01 PM | Cloud Data Platform | Google Cloud Life Sciences Public Datasets | Google Cloud | https://cloud.google.com/batch | Formerly hosted curated life-science public datasets and the Cloud Life Sciences API for pipelines. | N/A; service retired. | DEPRECATED | Cloud Life Sciences API was deprecated and removed after July 8, 2025; Google directs users to Cloud Batch. Recommend removing or replacing this entry. |
| 15 | 17/06/2026 07:01 PM | Single-Cell / Atlas | Human Cell Atlas (HCA) | HCA | https://www.humancellatlas.org | Global consortium building a reference map of all human cells; cutting-edge single-cell datasets. | Free data access; membership for collaboration. | Active | Data Portal at data.humancellatlas.org. Large single-cell training corpus. |
| 16 | 17/06/2026 07:01 PM | Data Platform / Competitions | Kaggle | Kaggle | https://www.kaggle.com | Platform to explore, analyze, and share datasets; hosts competitions and some synthetic datasets for training. | Free; now part of Google. | Active | Hosted several major life-science ML competitions (e.g., BELKA from Leash Bio, 2024). |
| 17 | 17/06/2026 07:01 PM | AI / Federated Learning | MELLODDY Consortium | MELLODDY | https://www.melloddy.eu | EU IMI project that used federated machine learning across 10 pharma companies' private datasets to improve predictive drug-discovery models. | Funded by EU Horizon 2020 and industry partners; no public membership. | Concluded (legacy) | Project ended ~2022; the site and tooling (MELLODDY-Tuner, MELLODDY-TUDA) now serve mainly as a methods reference. A federated-learning blueprint, not a live data source. |
| 18 | 17/06/2026 07:01 PM | Database Index / Reference | NAR Database Issue & Online Molecular Biology Database Collection | Nucleic Acids Research | https://www.oxfordjournals.org/nar/database/c/ | NAR's annual curated catalog of molecular biology databases by category (1,900+ databases). | Freely available on the NAR website. | Active | Best single index for discovering domain databases; refreshed annually in the NAR Database Issue. |
| 19 | 17/06/2026 07:01 PM | Chemistry / Reactions Dataset | Open Reaction Database | ORD | https://docs.open-reaction-database.org/en/latest/ | Open-access chemical reaction database supporting ML for reaction prediction, synthesis planning, and experiment design. | Open source / open access. | Active | Connor Coley (MIT) is among the project leads. |
| 20 | 17/06/2026 07:01 PM | Industry Alliance / Standards | Pistoia Alliance | Pistoia Alliance | https://www.pistoiaalliance.org | Global non-profit lowering barriers to life-science R&D innovation: digital transformation, AI, real-world data, interoperability. | Org membership; fees by level/size. Individual membership historically ~$250. | Active | Offers many free virtual meetings on data topics. Good networking/standards angle. |
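
When shortlisting sources from a table like this one for an ML pipeline, it can help to filter on the Status column so that entries marked "Verify directly", "Concluded (legacy)", or "DEPRECATED" are set aside until they are checked by hand. The following is a minimal sketch under stated assumptions: the four sample rows are abridged copies of rows from the table, and the field names and the helpers `parse_rows` and `shortlist` are hypothetical, introduced only for illustration.

```python
# Sample rows abridged from the table above: name | URL | access | status.
ROWS = """\
| ENCODE | https://www.encodeproject.org | Free public access to all data. | Active |
| MELLODDY | https://www.melloddy.eu | No public membership. | Concluded (legacy) |
| ERP Biomarkers | https://erpbiomarkers.org | Fees by org type. | Verify directly |
| Open Reaction Database | https://docs.open-reaction-database.org/en/latest/ | Open source / open access. | Active |
"""

# Hypothetical field names for the four columns kept in this sketch.
FIELDS = ("name", "url", "access", "status")


def parse_rows(text):
    """Turn pipe-delimited table rows into a list of dicts."""
    records = []
    for line in text.splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # skip rows whose column count does not match
        records.append(dict(zip(FIELDS, cells)))
    return records


def shortlist(records, status="Active"):
    """Keep only the records whose status matches exactly."""
    return [record for record in records if record["status"] == status]


records = parse_rows(ROWS)
active = shortlist(records)
for record in active:
    print(f'{record["name"]}: {record["url"]}')
```

Rows with an unexpected number of cells are skipped rather than guessed at, so a misaligned row drops out of the shortlist instead of silently shifting a URL into the wrong field.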