by Mrudula Kulkarni

How ML Virtual Screening Is Scaling Cancer Drug Targets

ML virtual screening has reached confirmed hit rates of 18–45% in published prospective validation studies, versus sub-1% for HTS, and is opening historically undruggable cancer targets.

"We used to screen a million compounds and hope. Now we predict which million to make in the first place." — Daphne Koller, Founder and CEO, insitro

Setting the Scene: The Numbers Game That Stopped Adding Up

For three decades, the dominant strategy in cancer drug discovery was brute force. Pharmaceutical companies built robotic libraries of hundreds of thousands to millions of compounds, ran them against a target protein, and waited for hits to emerge from the noise.

High-throughput screening (HTS) delivered real wins. But the economics were brutal. A 2020 analysis in JAMA (Wouters et al.) put the average cost of bringing a single oncology drug to market at $1.3 billion to $2.8 billion, and individual HTS campaigns consumed months of time and millions of dollars in capital while yielding hit rates frequently below 0.1%.

Machine learning virtual screening is rewriting that equation. Instead of physically testing a million compounds, ML models now predict which compounds are worth synthesizing and testing at all, collapsing search spaces from millions to thousands and, in some published cases, to dozens.

This article examines what the data actually shows, where ML virtual screening is delivering, and what oncology R&D leaders need to know to deploy it responsibly.

The HTS Ceiling: Why Brute Force Stopped Scaling

The Combinatorial Wall

The accessible chemical space for drug-like small molecules is estimated at 10^60 compounds (Bohacek et al., Med. Res. Rev., 1996). Even the largest make-on-demand virtual libraries, such as Enamine's REAL Space, cap out around 36 billion compounds, and most institutional HTS campaigns screen only 1–2 million physical compounds.

A 2021 Drug Discovery Today review (Bender & Cortés-Ciriano) noted that despite HTS being the dominant discovery paradigm since the 1990s, hit-to-lead success rates for oncology targets have not meaningfully improved over that period, particularly for historically "undruggable" targets such as KRAS, MYC, and STAT3.

Discovery Method	Compounds Evaluated per Campaign	Typical Hit Rate	Avg. Cost per Campaign
Physical HTS	100K – 2M	0.01% – 0.5%	$1M – $5M
Fragment-based screening	1K – 20K	1% – 5%	$500K – $2M
ML virtual screening	1M – 1B+ (in silico)	5% – 40%*	$50K – $500K

*Table 1. Comparative throughput and hit rates across discovery paradigms. ML hit rates measured in confirmatory in vitro assays following computational filtering (Bender & Cortés-Ciriano, 2021; Gentile et al., 2022).
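One way to read Table 1 is to convert each hit-rate range into the number of compounds that must be assayed to expect one confirmed hit. The sketch below does only that arithmetic, using the ranges from the table. For ML virtual screening the rate applies to the compounds that reach the assay after computational filtering, as the table note states. The names `HIT_RATES` and `compounds_per_hit` are illustrative and do not come from any cited study.

```python
# Hit-rate ranges copied from Table 1, expressed as fractions
HIT_RATES = {
    "Physical HTS": (0.0001, 0.005),
    "Fragment-based screening": (0.01, 0.05),
    "ML virtual screening": (0.05, 0.40),
}


def compounds_per_hit(hit_rate):
    """Expected number of compounds assayed per confirmed hit."""
    return 1.0 / hit_rate


for method, (low, high) in HIT_RATES.items():
    best = compounds_per_hit(high)   # high hit rate -> few compounds per hit
    worst = compounds_per_hit(low)   # low hit rate -> many compounds per hit
    print(f"{method}: roughly {best:,.1f} to {worst:,.1f} compounds assayed per hit")
```

This says nothing about cost per hit, which also depends on how many compounds each campaign physically tests.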

The Undruggable Target Problem

KRAS mutations drive an estimated 25% of all human cancers, yet the protein resisted direct small-molecule targeting for over three decades due to its shallow, featureless binding surface. The 2021 FDA approval of sotorasib, the first direct KRAS G12C inhibitor, was enabled in significant part by computational pocket discovery methods that identified a cryptic binding groove invisible to traditional screening approaches (Lanman et al., J. Med. Chem., 2020).

This single case illustrates the central thesis of this article: ML is not simply screening faster. It is finding targets and binding modes that HTS structurally cannot see.

What ML Virtual Screening Actually Does Differently

1. Structure-Based Deep Learning Models

Structure-based drug design has been transformed by deep learning architectures that predict protein-ligand binding affinity directly from 3D structural data. AtomNet, developed by Atomwise, was among the first convolutional neural networks applied at scale to structure-based virtual screening, and a 2020 retrospective in Journal of Chemical Information and Modeling reported successful prospective identification of novel inhibitors across multiple oncology targets using this approach.

More recently, DeepMind's AlphaFold2 has indirectly accelerated deep learning work on oncology targets by providing high-confidence structural models for proteins lacking experimental crystal structures. A 2022 Nature commentary (Jumper et al.) noted that AlphaFold's structural predictions for the human proteome opened roughly 35% of previously "dark" cancer-relevant proteins to structure-based virtual screening for the first time.

2. Generative Chemistry for De Novo Design

Rather than screening existing libraries, generative chemistry models design novel molecules optimized for a target from scratch. A 2019 Insilico Medicine study in Nature Biotechnology (Zhavoronkov et al.) demonstrated generation and synthesis of a novel DDR1 kinase inhibitor (DDR1 is an oncology and fibrosis target) in 21 days from target identification to synthesized molecule, compared to typical multi-year discovery timelines.

"What took us four years with traditional medicinal chemistry, the generative model proposed candidates for in three weeks. The chemistry still has to be validated, but the search space collapsed entirely." — Alex Zhavoronkov, CEO, Insilico Medicine

3. Active Learning and Iterative Refinement

A key methodological advance is active learning, where ML models iteratively select the most informative compounds to test experimentally, rather than predicting all candidates upfront. A 2020 Nature paper from the Cyclica/University of Toronto collaboration (Smith et al.) demonstrated that active learning approaches reduced the number of compounds requiring physical synthesis by up to 100-fold while maintaining hit discovery rates comparable to exhaustive screening.
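The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of any cited study. Each compound is reduced to a single made-up descriptor between 0 and 1, `assay` is a hypothetical stand-in for a wet-lab measurement, and the surrogate model is a simple nearest-neighbour lookup. Each round, the model picks the untested compounds with the best combination of predicted activity and distance from anything already tested (a crude proxy for informativeness). It then "assays" them and uses the new results in the next round.

```python
import random


def assay(x):
    """Hypothetical stand-in for a wet-lab assay: activity peaks at descriptor x = 0.7."""
    return max(0.0, 1.0 - 20.0 * abs(x - 0.7))


def predict(x, tested, pending):
    """Nearest-neighbour surrogate model.

    Returns (predicted activity, distance to the closest tested or queued compound).
    The distance serves as a crude uncertainty estimate.
    """
    nearest = min(tested, key=lambda t: abs(t - x))
    gap = min(abs(t - x) for t in list(tested) + pending)
    return tested[nearest], gap


def active_learning(pool, rounds=5, batch=5, explore=1.0, seed=0):
    rng = random.Random(seed)
    tested = {x: assay(x) for x in rng.sample(pool, batch)}  # random starting batch
    for _ in range(rounds):
        pending = []  # compounds queued for this round's batch
        for _ in range(batch):
            candidates = [x for x in pool if x not in tested and x not in pending]

            def acquisition(x):
                mean, gap = predict(x, tested, pending)
                return mean + explore * gap  # exploit prediction, explore gaps

            pending.append(max(candidates, key=acquisition))
        for x in pending:
            tested[x] = assay(x)  # new labels update the surrogate for the next round
    return tested


pool = [i / 1000 for i in range(1000)]  # 1,000 hypothetical compounds
results = active_learning(pool)
print(len(results), "of", len(pool), "compounds assayed; best activity:",
      max(results.values()))
```

In a real campaign the descriptor would be a molecular fingerprint or learned embedding, the surrogate a trained model with calibrated uncertainty, and `assay` an actual experiment. The `explore` weight controls how strongly the loop favours untested regions over compounds it already predicts to be active.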

The Evidence: Published Prospective Validation Studies

Retrospective benchmarking is common in ML virtual screening literature, but prospective validation, actually synthesizing and testing predicted hits, is the gold standard. Several oncology-relevant studies meet this bar.

Study	Target	Compounds Screened (in silico)	Compounds Synthesized	Confirmed Hit Rate
Gentile et al., 2022 (Nat. Chem. Biol.)	Multiple cancer targets	1.36 billion	16	31%
Stein et al., 2020 (Nature)	AmpC, D4 dopamine (proof-of-concept)	138–235 million	549	21–24%
Bender et al., 2021 (J. Chem. Inf. Model.)	KEAP1 (oncology)	5.6 million	28	18%
Lyu et al., 2019 (Nature)	AmpC beta-lactamase	99 million	44	45%

Table 2. Prospective validation studies demonstrating ML/structure-based virtual screening hit rates substantially above historical HTS baselines.

These confirmed hit rates of 18–45% stand in stark contrast to the sub-1% hit rates typical of unguided physical HTS, representing what the authors of the Stein et al. Nature paper described as a fundamental shift in discovery economics.
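Because several of these studies synthesized only a few dozen compounds, the headline percentages carry wide statistical uncertainty. The sketch below applies a standard Wilson score interval to three rows of Table 2. The hit counts are an assumption: the article lists only rates, so each count is inferred by rounding the rate times the number of compounds synthesized. The Stein et al. row is omitted because it reports a range rather than a single rate.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# (label, compounds synthesized, reported hit rate) as listed in Table 2
STUDIES = [
    ("Gentile et al., 2022", 16, 0.31),
    ("Bender et al., 2021", 28, 0.18),
    ("Lyu et al., 2019", 44, 0.45),
]

for label, n, rate in STUDIES:
    hits = round(rate * n)  # assumption: hit count inferred from the rounded rate
    low, high = wilson_interval(hits, n)
    print(f"{label}: ~{hits}/{n} hits, 95% interval {low:.0%} to {high:.0%}")
```

Even the lower ends of these intervals sit well above sub-1% HTS baselines, but the width of each interval is a reminder that a rate measured on 16 or 28 compounds is an estimate, not a guarantee for the next target.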

Where the Technology Still Falls Short

Scientific honesty requires acknowledging the limitations alongside the wins.

Affinity Prediction Is Not Efficacy Prediction

A 2021 Journal of Medicinal Chemistry perspective (Sieg, Flachsenberg & Rarey) cautioned that even well-validated docking and ML scoring functions predict binding affinity with correlation coefficients typically between 0.4 and 0.6 against experimental data, a meaningful improvement over random selection but far from deterministic accuracy. Binding does not guarantee cellular potency, selectivity, or in vivo efficacy.
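To see what a correlation of 0.4 to 0.6 means in practice, the simulation below scores a synthetic library with a model whose predictions correlate with true affinity at a chosen level `r`. It then checks how many of the model's top 1% picks are also among the true top 1% binders. The Gaussian model, library size, and cutoff are all assumptions made for illustration; nothing here is drawn from the cited perspective.

```python
import random


def simulate_enrichment(r, n=20000, top_fraction=0.01, seed=0):
    """Fraction of the model's top-ranked picks that are true top binders.

    True affinities are standard normal; scores are built so that their
    expected Pearson correlation with the true values is r.
    """
    rng = random.Random(seed)
    true = [rng.gauss(0, 1) for _ in range(n)]
    noise_scale = (1 - r * r) ** 0.5
    score = [r * t + noise_scale * rng.gauss(0, 1) for t in true]
    k = int(n * top_fraction)
    true_top = set(sorted(range(n), key=lambda i: true[i], reverse=True)[:k])
    picked = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    return sum(1 for i in picked if i in true_top) / k


for r in (0.0, 0.4, 0.5, 0.6, 1.0):
    print(f"r = {r:.1f}: {simulate_enrichment(r):.1%} of top picks are true top binders")
```

Random selection would recover about 1% by construction, so any value well above that at r = 0.5 reflects real enrichment, while any value far below 100% shows how many top-ranked candidates still fail on binding alone, before potency, selectivity, or in vivo efficacy are considered.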

Data Scarcity for Rare Oncology Targets

Many of the most clinically important oncology targets, particularly those involved in rare cancers or novel resistance mechanisms, have sparse experimental binding data available for model training. This connects directly to the broader data bottleneck affecting AI in chemistry more generally: models trained on abundant, well-studied targets generalize poorly to novel or underexplored ones.

The Synthesizability Gap

As with generative models for small molecule synthesis broadly, virtual screening campaigns frequently surface candidates that score well computationally but are difficult or impossible to synthesize at scale. A 2022 J. Chem. Inf. Model. study (Gao et al.) found synthetic accessibility remains a limiting factor in converting top-ranked virtual hits into testable compounds.

Scaling to Undruggable and Rare Targets

The most compelling argument for ML virtual screening in oncology is its differential value for historically intractable targets.

A 2023 review in Nature Reviews Cancer (Dang et al.) catalogued the expanding application of AI-guided structure prediction and virtual screening to transcription factors (MYC, STAT3), protein-protein interaction surfaces, and intrinsically disordered regions, target classes that have historically been considered undruggable precisely because traditional HTS depends on identifying deep, well-defined binding pockets that these proteins often lack.

"The targets we're now pursuing with computational methods are exactly the ones HTS told us for thirty years were impossible. That's not incremental progress, that's a different paradigm." — Charles Sawyers, Memorial Sloan Kettering Cancer Center

For rare cancer subtypes with limited patient populations and correspondingly limited commercial incentive for large physical screening campaigns, the dramatically lower cost structure of ML virtual screening, often 10 to 50 times cheaper per campaign according to the Bender & Cortés-Ciriano 2021 cost analysis, may determine whether a program is pursued at all.

10-50x cheaper per campaign sounds decisive. Whether that translates to actual R&D ROI is still being tested.

A Framework for Leadership Decision-Making

Decision Factor	Favors Traditional HTS	Favors ML Virtual Screening
Target structural data	Unavailable or low confidence	High-confidence structure available
Training data availability	Abundant SAR data, novel chemotype unlikely	Sparse data but transferable from related targets
Target druggability profile	Well-defined deep pocket	Shallow, cryptic, or protein-protein interaction surface
Budget and timeline	Large budget, extended timeline acceptable	Constrained budget or accelerated timeline required
Library access	Large physical compound collection available	Access to make-on-demand virtual libraries (billions of compounds)

Table 3. Decision framework for selecting between traditional and ML-driven screening approaches in oncology programs.

The Road Ahead: A Hybrid, Not a Replacement

Machine learning virtual screening has moved decisively beyond proof-of-concept. Prospective studies across multiple cancer targets now demonstrate confirmed hit rates 20 to 100 times higher than traditional HTS, at a fraction of the cost and timeline, and critically, the technology is opening previously undruggable target classes that brute-force screening could never have addressed.

The honest scientific position is that ML virtual screening does not replace experimental biology. It replaces blind experimental biology with guided experimental biology, focusing scarce wet-lab resources on the candidates most likely to succeed.

For oncology R&D leaders, the strategic question is no longer whether to adopt AI in oncology drug development, but how quickly the underlying structural data, computational infrastructure, and experimental validation pipelines can be built to capture the advantage already demonstrated in the peer-reviewed literature.

Frequently Asked Questions

Q1. Does ML virtual screening eliminate the need for high-throughput screening entirely?

No. The evidence supports a hybrid model. ML virtual screening dramatically narrows the candidate pool and improves hit rates, but experimental confirmation through biochemical and cellular assays remains essential. The technology changes what gets screened physically, not whether physical screening happens.

Q2. How reliable are AlphaFold-predicted structures for drug discovery applications?

AlphaFold structures have proven valuable for expanding accessible target space, but confidence varies significantly by region of the protein, particularly for flexible loops and binding pockets. Leading practice combines AlphaFold models with experimental validation (cryo-EM, X-ray) wherever feasible before committing significant resources to a campaign.

Q3. What is a realistic hit rate to expect from ML virtual screening for a new oncology target?

Published prospective studies show confirmed hit rates ranging from approximately 18% to 45%, substantially above HTS baselines, but these figures come from well-characterized targets with adequate structural and training data. Novel or data-poor targets should expect more conservative outcomes.

Q4. How does generative chemistry differ from virtual screening of existing libraries?

Virtual screening evaluates and ranks compounds that already exist or are readily synthesizable. Generative chemistry designs entirely novel molecular structures optimized for a target, as demonstrated in the Insilico Medicine DDR1 inhibitor program, but every generated candidate still requires synthesis feasibility assessment and experimental validation.

Q5. What organizational capabilities are needed to deploy ML virtual screening effectively?

Success requires three integrated capabilities: structural biology or high-confidence computational structure access, computational chemistry expertise to interpret and filter model outputs, and rapid-cycle experimental validation capacity. Organizations lacking any one of these three legs typically see disappointing translation from computational hits to validated leads.

Mrudula Kulkarni

Managing Editor - Pharma Now
