[Paper] Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

Published: January 14, 2026 at 01:45 PM EST

Source: arXiv - 2601.09693v1

Overview

The paper introduces ConGLUDe (Contrastive Geometric Learning for Unified Computational Drug Design), a single neural model that learns jointly from protein structures and ligand activity data. It pairs a geometric protein encoder with a fast ligand encoder and trains both with contrastive objectives. The authors show that this unified framework can predict binding pockets, perform virtual screening, and “fish” for targets, all without pre‑defined pocket annotations.

Key Contributions

Unified architecture that jointly processes whole‑protein 3D geometry and ligand chemistry, eliminating the need for separate structure‑based and ligand‑based pipelines.
Contrastive geometric learning that aligns ligand embeddings with (i) a global protein representation and (ii) multiple candidate binding‑site embeddings, enabling pocket‑agnostic training.
Ligand‑conditioned pocket prediction: the model can suggest likely binding sites given a ligand, a capability rarely available in existing tools.
Zero‑shot virtual screening: achieves state‑of‑the‑art performance on benchmark datasets where no pocket information is supplied at inference time.
Target‑fishing superiority: outperforms prior methods on a challenging dataset that requires matching a ligand to its correct protein among many candidates.
Scalable training on a mix of high‑resolution protein‑ligand complexes and massive bioactivity (e.g., ChEMBL‑like) datasets, paving the way toward foundation models for drug discovery.

Methodology

Protein Encoder – A geometric deep‑learning network (e.g., a graph transformer) ingests the full 3D coordinates of a protein, producing:
- a global protein embedding, and
- local embeddings for a set of candidate binding sites generated on‑the‑fly (no pre‑defined pockets required).
Ligand Encoder – A lightweight, message‑passing graph neural network converts SMILES or 3D ligand conformers into a fixed‑size vector.
Contrastive Objective – During training, the model receives (protein, ligand) pairs that are known binders. For each pair, it maximizes the similarity between the ligand vector and:
- the global protein vector, and
- the embedding of the correct binding site among the candidates.
  Simultaneously, it pushes apart embeddings from mismatched protein–ligand pairs, encouraging the network to learn discriminative, geometry‑aware representations.
Joint Data Regime – The loss is applied both on curated protein‑ligand complex structures (high‑resolution) and on large‑scale bioactivity tables where only the protein identifier and ligand activity are known. This hybrid training lets the model benefit from the richness of structural data while scaling to millions of activity measurements.
Inference Modes –
- Virtual screening: rank a library of ligands for a target protein using only the global protein embedding.
- Target fishing: rank proteins for a query ligand using the ligand embedding.
- Pocket prediction: given a ligand, select the most compatible site among the candidate pockets.
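
The contrastive objective described above can be illustrated with an InfoNCE-style loss. This summary does not state the exact loss used in the paper, so the following is only a minimal sketch under assumptions: cosine similarity, a temperature of 0.1, and other proteins in the batch acting as negatives. The names `info_nce` and `contrastive_loss` are hypothetical. The pocket term is skipped when no candidate sites are supplied, which is one plausible way to handle bioactivity records that come without a structure-level label.

```python
import math

def _normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE: cross-entropy of the positive among cosine-similarity logits."""
    a = _normalize(anchor)
    logits = [
        sum(x * y for x, y in zip(a, _normalize(c))) / temperature
        for c in candidates
    ]
    peak = max(logits)  # subtract the max before exponentiating for stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[positive_index]

def contrastive_loss(ligand, proteins, protein_index,
                     pockets=None, pocket_index=None, temperature=0.1):
    """Global term always; pocket term only when candidate sites are given."""
    loss = info_nce(ligand, proteins, protein_index, temperature)
    if pockets is not None:
        loss += info_nce(ligand, pockets, pocket_index, temperature)
    return loss

# Toy 2-D embeddings (made up for illustration, not from the paper).
ligand = [1.0, 0.0]
proteins = [[0.9, 0.1], [0.0, 1.0]]   # index 0 is the true target
pockets = [[0.2, 0.8], [1.0, 0.1]]    # index 1 is the true site

matched = contrastive_loss(ligand, proteins, 0, pockets, 1)
mismatched = contrastive_loss(ligand, proteins, 1, pockets, 0)
print(matched < mismatched)  # the matched pairing gets the lower loss
```

Minimizing this quantity pulls the ligand toward its true protein and true site while pushing it away from the other candidates, which is the behaviour the Contrastive Objective item describes.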

Results & Findings

Task	Benchmark	Metric and reported gain (higher = better)	ConGLUDe vs. Prior Art
Zero‑shot virtual screening (no pocket)	DUD‑E, LIT‑PCBA	ROC‑AUC ↑ 5–12 %	Sets new SOTA
Target fishing (ligand → protein)	GPCR‑Bioactivity set	Top‑1 accuracy ↑ 8 %	Beats DeepAffinity, GraphDTA
Ligand‑conditioned pocket selection	Binding‑MOE dataset	Recall@5 ↑ 7 %	Competitive with pocket‑specific models
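
For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute them. It is an illustration under assumptions, not the paper's evaluation code: the benchmarks may apply their own protocols, and treating Recall@5 as a top-5 hit rate assumes one true pocket per query. The function names and toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))

def hit_rate_at_k(rankings, truths, k):
    """Fraction of queries whose true item appears in the top k of its ranking."""
    hits = sum(truth in ranking[:k] for ranking, truth in zip(rankings, truths))
    return hits / len(truths)

# Virtual screening: one score per library compound, label 1 = active.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])

# Target fishing: one ranked protein list per query ligand.
rankings = [["P1", "P2", "P3"], ["P2", "P1", "P3"]]
truths = ["P1", "P1"]
top1 = hit_rate_at_k(rankings, truths, k=1)
top2 = hit_rate_at_k(rankings, truths, k=2)
print(auc, top1, top2)
```

Top-1 accuracy is the `k=1` case of `hit_rate_at_k`; larger `k` gives the Recall@k style of metric used for pocket selection.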

Key takeaways:

The model retains strong performance even when the pocket is unknown at test time, suggesting that the global protein embedding captures enough structural context.
Joint training on heterogeneous data yields a noticeable boost over models trained on either structural or activity data alone.
The contrastive alignment learns a shared latent space in which proteins and ligands that truly interact lie close together; this shared space is what lets a single model serve all three tasks.

Practical Implications

Accelerated hit discovery – Researchers can run a single virtual‑screening pass on a protein of interest without first defining binding pockets, which removes the manual pocket‑detection step.
Rapid repurposing – The target‑fishing capability lets drug‑repositioning teams rank thousands of proteins for a query ligand by encoding the ligand once and comparing it with the protein embeddings, facilitating quick hypothesis generation.
Integrated pipelines – Companies can replace separate structure‑based docking and ligand‑based QSAR modules with ConGLUDe, reducing engineering overhead and data duplication.
Foundation‑model potential – Because the architecture scales to massive bioactivity corpora, it could serve as a pre‑trained backbone for downstream tasks such as ADMET prediction, de‑novo ligand generation, or protein‑protein interaction modeling.
Resource efficiency – The ligand encoder is lightweight, and screening uses only the global protein embedding, so a target is encoded once and each library compound then costs one ligand encoding plus one similarity computation. This makes single‑GPU inference over libraries of millions of compounds feasible and fits existing high‑throughput screening workflows.
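
Both virtual screening and target fishing reduce to ranking precomputed embeddings by similarity to a query. The following is a minimal sketch of that step, assuming cosine similarity as the score and embeddings that have already been produced by the two encoders. `rank_by_similarity` and the toy vectors are hypothetical; a real pipeline would batch this on a GPU rather than loop in Python.

```python
import heapq
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query, candidates, top_k):
    """Return the top_k (name, score) pairs from a dict of name -> embedding."""
    scored = ((name, cosine(query, emb)) for name, emb in candidates.items())
    # nlargest keeps only top_k items in memory, which suits very large libraries.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

# Virtual screening: the query is the global protein embedding (toy values).
protein = [1.0, 0.0]
ligand_library = {
    "lig_a": [0.9, 0.1],
    "lig_b": [0.0, 1.0],
    "lig_c": [0.5, 0.5],
}
hits = rank_by_similarity(protein, ligand_library, top_k=2)
print([name for name, _ in hits])
```

Target fishing uses the same function with the roles swapped: the ligand embedding is the query and the dictionary holds one global embedding per protein.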

Limitations & Future Work

Dependence on high‑quality 3D structures – While the model can operate with predicted structures (e.g., AlphaFold), performance degrades when the input geometry is noisy.
Candidate pocket generation – The current heuristic for proposing sites may miss cryptic or highly flexible pockets; integrating dynamic pocket detection could improve coverage.
Interpretability – The contrastive latent space is powerful but opaque; future work could add attention‑based visualizations to explain why a ligand is matched to a particular site.
Scaling to ultra‑large libraries – Though inference is fast, training on billions of activity points may require distributed training strategies and memory‑efficient graph representations.

Overall, ConGLUDe marks a significant step toward a single, versatile model that bridges the long‑standing divide between structure‑based and ligand‑based drug design, opening new avenues for faster, more integrated discovery pipelines.

Authors

Lisa Schneckenreiter
Sohvi Luukkonen
Lukas Friedrich
Daniel Kuhn
Günter Klambauer

Paper Information

arXiv ID: 2601.09693v1
Categories: cs.LG, stat.ML
Published: January 14, 2026