AI model could generate 10⁶⁰ chemically valid novel molecules for drug discovery
[From left] Roger Guimerà, Manuel Ruiz-Botella and Marta Sales-Pardo, from the department of chemical engineering at the Universitat Rovira i Virgili, Tarragona, Spain, who led the research. Credit: URV


18 May, 2026


Researchers at Universitat Rovira i Virgili have developed CoCoGraph, an AI tool that can generate millions of plausible molecular structures and could help to accelerate future drug discovery.


Researchers in Spain have developed an artificial intelligence (AI) tool capable of generating millions of plausible molecules that are unknown to science but comply with the laws of chemistry.

The system, called CoCoGraph, has been developed by a team at the Universitat Rovira i Virgili in Tarragona, Spain. The researchers said the model could represent an important step towards AI systems that can propose bespoke molecules for drug discovery, pharmacology and materials science.

Identifying and developing useful molecules remains one of the central challenges of modern chemistry. Novel medicines, greener industrial processes and more sustainable materials all depend on the discovery of atomic structures with valuable properties. However, the range of possible molecular structures is vast. Researchers estimate that there could be as many as 10⁶⁰ (a novemdecillion) possible molecules, more than the number of water molecules in all of Earth's oceans combined. By contrast, the molecules already known to science account for only a minute fraction of that total: estimates based on PubChem put the figure at only 322 million substances.
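To put the two figures side by side, the short Python calculation below works out the share of the estimated chemical space that is already catalogued. It uses only the two numbers quoted above; the variable and function names are illustrative, and the 10⁶⁰ figure is itself an estimate, so the result is an order-of-magnitude indication rather than a measurement.

```python
from fractions import Fraction

# Both figures are the ones quoted in the article.
ESTIMATED_CHEMICAL_SPACE = 10**60   # estimated number of possible molecules
KNOWN_SUBSTANCES = 322_000_000      # PubChem-based count of known substances


def known_fraction(known: int, total: int) -> Fraction:
    """Return the exact share of `total` covered by `known`."""
    return Fraction(known, total)


fraction = known_fraction(KNOWN_SUBSTANCES, ESTIMATED_CHEMICAL_SPACE)
print(f"Known share of chemical space: {float(fraction):.2e}")  # 3.22e-52
```

On these figures, known chemistry covers roughly three parts in 10⁵² of the estimated space.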

CoCoGraph has been designed to explore this largely uncharted chemical space by producing molecules that are not merely artificial outputs but chemically credible structures. The model works in a way that resembles generative AI systems used to create text or images – such as ChatGPT or DALL·E – but applies the principle to molecular structures.

“These models create novel content that looks very much like the real thing. Our algorithm does the same [except] with molecules,” said Dr. Roger Guimerà, an ICREA research professor in the department of chemical engineering at URV.

The model does not yet respond to detailed scientific instructions, such as requests for a molecule with a specified biological activity or material property. Instead, it performs the more fundamental task of generating molecular structures that obey chemical rules. That basic capability is significant because a model that produces chemically impossible structures has limited practical value.

Even this initial task is computationally demanding. If the system is given a single molecular formula, such as that of paracetamol, it can construct a vast range of possible atomic arrangements. Only a small proportion of those arrangements will be chemically viable. The challenge therefore lies not simply in producing large numbers of molecules but in identifying structures that could exist in reality.
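A rough way to see why a formula alone leaves so much open is the textbook degrees-of-unsaturation calculation, sketched below in Python. It gives the number of rings plus multiple bonds that a formula implies, but says nothing about where they sit. The paracetamol formula C8H9NO2 is general chemical knowledge supplied here, not a detail from the article, and the function name is hypothetical.

```python
def degrees_of_unsaturation(carbon: int, hydrogen: int,
                            nitrogen: int = 0, halogen: int = 0) -> float:
    """Rings plus pi bonds implied by a neutral C/H/N/O/halogen formula.

    Oxygen is divalent and does not change the count, so it is not a parameter.
    """
    return (2 * carbon + 2 + nitrogen - hydrogen - halogen) / 2


# Paracetamol, C8H9NO2 (formula assumed from general knowledge).
paracetamol = degrees_of_unsaturation(carbon=8, hydrogen=9, nitrogen=1)
print(paracetamol)  # 5.0
```

Five rings or multiple bonds can be distributed across eight carbons, one nitrogen and two oxygens in a great many ways, and only some of the resulting arrangements are chemically viable.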

CoCoGraph uses a diffusion model, a form of AI often used to generate images. In this case, the system was trained through a process in which a real molecule was progressively disordered, with chemical bonds broken and replaced at random. The model then learnt how to reverse that process to reconstruct coherent molecular structures.

“We start with a real molecule, break the bonds and create novel ones at random. The model learns to reverse this process and reconstruct coherent structures,” said Professor Marta Sales-Pardo, of the department of chemical engineering at URV, who also took part in the study.
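The disordering step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CoCoGraph's code: it assumes that breaking bonds and creating new ones at random is done by swapping the end points of two bonds, which leaves every atom's bond count unchanged. The function names and the seven-atom toy molecule are hypothetical, the sketch does not cap bond order or keep the molecule connected, and the learned reverse step is not shown.

```python
import random
from collections import Counter

Bond = tuple[int, int]  # a pair of atom indices; a double bond is listed twice


def bond_counts(bonds: list[Bond]) -> Counter:
    """Count how many bonds (including bond order) each atom takes part in."""
    counts: Counter = Counter()
    for a, b in bonds:
        counts[a] += 1
        counts[b] += 1
    return counts


def swap_two_bonds(bonds: list[Bond], rng: random.Random) -> list[Bond]:
    """Break bonds a-b and c-d and re-form them as a-d and c-b."""
    first, second = rng.sample(bonds, 2)
    a, b = first
    c, d = second
    if rng.random() < 0.5:       # choose at random which end points are exchanged
        c, d = d, c
    if len({a, b, c, d}) < 4:    # bonds share an atom: skip to avoid self-bonds
        return list(bonds)
    rewired = list(bonds)
    rewired.remove(first)
    rewired.remove(second)
    rewired.extend([(a, d), (c, b)])
    return rewired


def add_noise(bonds: list[Bond], steps: int, seed: int = 0) -> list[Bond]:
    """Apply `steps` random bond swaps to progressively disorder a molecule."""
    rng = random.Random(seed)
    noisy = list(bonds)
    for _ in range(steps):
        noisy = swap_two_bonds(noisy, rng)
    return noisy


# Hypothetical toy molecule: a six-membered ring with one double bond (0-1)
# and one substituent (atom 6) attached to atom 0.
TOY_BONDS = [(0, 1), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

noisy = add_noise(TOY_BONDS, steps=50, seed=1)
print(bond_counts(noisy) == bond_counts(TOY_BONDS))  # True
```

Because each swap removes one bond from each of four atoms and gives one back to each, the per-atom bond counts are the same after any number of steps, however scrambled the connectivity becomes.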

Molecules present a particular mathematical challenge because they are discrete structures, rather than continuous image fields. A picture can be altered pixel by pixel and still remain visually interpretable. A molecule, by contrast, must obey strict rules of valence, connectivity and chemical feasibility. A single invalid bond can render the structure impossible.

The researchers said one of CoCoGraph’s main innovations was its direct incorporation of basic chemical constraints. Each atom maintains the correct number of bonds, which means that all molecules generated by the model are chemically valid. This is an important distinction from some other models, which can produce structures that appear molecule-like but could not exist under standard chemical rules.
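The constraint itself is simple to state in code. The Python sketch below is an illustration rather than the researchers' implementation: it assumes textbook valences for four neutral elements, ignores charges and hypervalent atoms, and treats any unused valence as filled by implicit hydrogens. All names are hypothetical.

```python
# Assumed typical valences for neutral atoms; real chemistry has exceptions.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def is_valence_valid(atoms: list[str], bonds: list[tuple[int, int, int]]) -> bool:
    """Check that no atom uses more bonds than its assumed valence allows.

    `atoms` lists element symbols; each bond is (atom index, atom index, order).
    """
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return all(used[i] <= MAX_VALENCE[element] for i, element in enumerate(atoms))


# A carbon double-bonded to an oxygen (hydrogens on carbon left implicit).
carbonyl = (["C", "O"], [(0, 1, 2)])
# An oxygen carrying a double bond and a single bond: three bonds in total.
overbonded = (["C", "O", "C"], [(0, 1, 2), (1, 2, 1)])

print(is_valence_valid(*carbonyl))    # True
print(is_valence_valid(*overbonded))  # False
```

A generator that checks validity only after the fact has to discard structures like the second one, whereas a model that builds the constraint into generation never produces them.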

The team also reported that CoCoGraph was more efficient than comparable systems. It used fewer parameters, required less computing power and generated molecules more rapidly. Those features may become important if such systems are to be used at scale in pharmaceutical or materials research.

To assess the quality of the generated molecules, the researchers compared CoCoGraph with other state-of-the-art models and analysed 36 physicochemical properties, including solubility and structural complexity. For approximately two-thirds of those properties, the molecules generated by CoCoGraph were judged to be more chemically realistic than those produced by rival systems.

The team also tested whether trained chemists could distinguish between real molecules and those generated by the AI model. In an experiment involving 121 chemistry experts from the university, each participant was shown 20 pairs of molecules. Each pair contained one real molecule and one generated by CoCoGraph, and the experts were asked to identify the real one.

The results showed that the experts were wrong in approximately four out of 10 cases. The researchers interpreted this as evidence that many of the generated molecules were sufficiently plausible to resemble known chemistry.

“This means that many of the molecules we generate are very convincing,” said Sales-Pardo.

Although CoCoGraph cannot yet design a molecule for a defined function, the researchers have already reported promising preliminary tests. They identified molecules with properties similar to paracetamol from among the millions generated by the model. They also explored methods to modify an existing molecule partially, a type of chemical refinement that could produce variants with similar characteristics.

Such approaches could eventually support the optimisation of drug candidates, the search for safer or more effective compounds, or the development of materials with tailored physical and chemical properties. The researchers emphasised, however, that the present system is still an early step rather than a complete molecular design platform.

“For the moment, we are only generating molecules. The next step will be to apply specific objectives to this process,” said Manuel Ruiz-Botella, a doctoral candidate who also participated in the research.

The longer-term goal is to allow researchers to ask an AI system for a molecule with specified attributes, such as solubility, low toxicity and suitability for a particular application. If that capability can be achieved, such systems could reduce the time and cost required to explore chemical space and provide researchers with credible starting points for laboratory synthesis and testing.

The study highlights both the promise and the limits of AI in chemistry. CoCoGraph does not remove the need for experimental validation, nor does it yet replace expert chemical judgement. It does, however, offer a route to explore a molecular universe so large that traditional trial-and-error approaches can address only a small part of it.


For further reading please visit: 10.1038/s42256-026-01229-5

