OpenBind releases first open AI-ready dataset to accelerate drug discovery
Lizbé Koekemoer, Team leader at CMD, University of Oxford, and Jasmin Aschenbrenner, researcher at Diamond Light Source, reviewing a molecular structure in the Diamond laboratory. Credit: Stuart March-DNDi
Frank von Delft, Principal Beamline Scientist, at the crystallography beamline at Diamond Light Source. Credit: Stuart March-DNDi


15 May, 2026


The UK-led OpenBind initiative has released its first publicly accessible dataset and predictive artificial intelligence model, marking a significant step towards industrial-scale generation of standardised experimental data for therapeutic discovery and next-generation computational drug design.


The UK-led OpenBind initiative has reached a major milestone with the release of its first publicly available dataset and predictive artificial intelligence (AI) model, a development expected to accelerate medicines discovery through wider access to high-quality experimental binding data and machine learning tools.

The release included both a standardised experimental dataset and a predictive model known as OpenBind v1, which are now freely available to researchers worldwide for immediate use in therapeutic discovery and computational model development. The initiative aims to address one of the principal barriers that has limited the broader impact of AI in pharmaceutical research – the shortage of reliable, large-scale experimental data that captures, at atomic resolution, how drug molecules interact with disease-associated proteins.

While AI has transformed protein structure prediction in recent years, particularly following breakthroughs in computational structural biology, its influence on practical drug discovery has remained comparatively limited. Researchers have repeatedly identified the absence of sufficiently large and consistent experimental binding datasets as a major constraint on model training and predictive performance.

Coordinated by Diamond Light Source and supported during its foundation phase by the UK Department for Science, Innovation and Technology, OpenBind has brought together structural biologists, chemists, biophysicists and AI specialists to establish what organisers described as the first openly accessible industrial-scale pipeline designed specifically to generate AI-ready protein-drug interaction data.

Diamond Light Source is the UK’s national synchrotron science facility where a large particle accelerator produces extremely intense beams of light, ranging from infrared through to X-rays, which researchers can use to examine matter at atomic and molecular scale. Located on the Harwell Science and Innovation Campus in Oxfordshire, the facility operates as a major multidisciplinary research centre that supports work across structural biology, chemistry, materials science, engineering, environmental science, archaeology and pharmaceutical development.

The first release demonstrated that the platform’s integrated workflow is now operational. According to the consortium, OpenBind has generated 800 high-quality experimental measurements within seven months. Historically, datasets of comparable scale frequently required several years to produce and validate before public release.

The operational pipeline integrated automated chemistry platforms, quantitative protein-ligand binding measurements and high-throughput crystallography conducted through the XChem Fragment Screening facility at Diamond Light Source. The project also incorporated engineered data-release infrastructure and AI model training through the UK’s Isambard-AI compute cluster – the UK’s most powerful AI supercomputer and one of the fastest supercomputers in the world. It was developed and is operated by the University of Bristol.

Researchers involved in the collaboration stated that the initiative could establish the foundation for substantial advances in therapeutic discovery, particularly in disease areas where rapid treatment development remains essential. Future releases are expected to focus on targets linked to coronavirus disease 2019, malaria, dengue, Zika virus infection and cancer.

“AlphaFold2 revolutionised protein structure prediction by leveraging decades of experimental data on protein structures in the Protein Data Bank,” said Dr. Mohammed AlQuraishi of Columbia University.

“The equivalent of such a dataset for protein-drug complexes does not yet exist, but OpenBind aims to create it, and in the process create the next generation of computational tools for modelling interactions between drugs and proteins,” he said.

The initial release also provided insight into lessons learned during OpenBind’s early operational cycles. Consortium members stated that standardised workflows, detailed metadata practices and extensive automation proved essential to ensure the reproducibility and consistency required for effective AI training. The early phases also identified opportunities to streamline future data handling and increase release frequency.

“High-quality experimental data is essential for developing novel and improved AI models and this first data release shows that OpenBind now has this foundation in place,” said Dr. Fergus Imrie of the University of Oxford.

“We’re enabling AI to improve model performance and guide future experiments, helping to accelerate discovery. The lessons from these early cycles are already helping us improve the speed, consistency, and reproducibility of the pipeline, which will be critical as OpenBind grows,” he added.

The collaboration built upon experience gained through previous open-science initiatives, including the so-called ‘COVID Moonshot’ during the SARS-CoV-2 pandemic and the international research collaboration Aligning Science Across Parkinson’s – also known as ASAP. The consortium stated that it continues to work closely with global health organisations to prioritise therapeutic targets associated with diseases such as malaria and tuberculosis.

“We couldn’t have made such rapid progress without the contributions of our consortium members and operational team,” said Dr. Frank von Delft, principal beamline scientist at Diamond Light Source.

“Their expertise and commitment have enabled us to reach this ambitious milestone. We will now implement the lessons from this foundation phase to ramp up a long-term operation that links high-volume production of AI data with active discovery projects,” he said.

The consortium stated that the next stages of OpenBind will include expansion towards larger chemical series, additional biological targets and deeper experimental datasets. Planned community blind-challenge exercises will also assess how effectively AI models can predict outcomes for freshly generated experimental data.

Ultimately, organisers stated that the programme aims to establish a global open-data infrastructure capable of supporting faster, more accurate and more equitable therapeutic development.

OpenBind’s consortium includes specialists in high-throughput X-ray crystallography, automated microscale chemistry, protein-ligand binding assays and machine learning model development, all working under an infrastructure built on the FAIR principles of data curation.

Researchers stated that this multidisciplinary integration remains central to the initiative’s ability to generate reproducible, high-quality datasets suitable for large-scale AI applications in biomedical research.


