Introduction

Examples of current issues in existing datasets that impede progress in applying AI to small-molecule drug discovery:

[Figure: Current Issues]

WelQrate Evaluation Framework

The WelQrate Evaluation Framework establishes a foundation for benchmarking by focusing on three key aspects:

[Figure: WelQrate Evaluation Framework]

WelQrate Dataset Collection Curation Pipeline

The WelQrate dataset collection was meticulously curated by reviewing experimental data hosted on PubChem and applying multiple filters together with hierarchical curation to ensure high data quality.
[Figure: (a) Curation pipeline overview; (b) an example of hierarchical curation with AID2258.]
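The description of hierarchical curation above stays high-level. The code below is a minimal sketch of how such a hierarchy could turn assay results into labels. It assumes three rules: a primary-screen hit counts as active only if every confirmatory assay confirms it, a hit is discarded if a counter screen flags it, and compounds that are inactive in the primary screen become inactives. The `Assay` class, the `hierarchical_labels` function, and the choice to drop unconfirmed primary hits rather than label them inactive are all hypothetical illustrations. This is not the WelQrate implementation, and the PubChem filters are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Assay:
    """One bioassay in a hypothetical screening hierarchy."""
    aid: str                          # assay identifier (illustrative)
    active: set[str]                  # compound IDs reported active
    tested: set[str]                  # all compound IDs tested
    is_counter_screen: bool = False   # True if activity here suggests an artifact


def hierarchical_labels(primary: Assay, confirmatory: list[Assay]) -> dict[str, int]:
    """Label compounds: 1 = confirmed active, 0 = inactive in the primary screen.

    Assumption of this sketch: primary hits that fail confirmation are
    excluded entirely instead of being labeled inactive.
    """
    # Start from primary-screen hits and narrow them level by level.
    confirmed = set(primary.active)
    for assay in confirmatory:
        if assay.is_counter_screen:
            # Hits that are also active in a counter screen are likely false positives.
            confirmed -= assay.active
        else:
            # Keep only hits that were retested here and stayed active.
            confirmed &= assay.active

    labels: dict[str, int] = {}
    for cid in sorted(primary.tested):  # sorted for deterministic output
        if cid in confirmed:
            labels[cid] = 1
        elif cid not in primary.active:
            labels[cid] = 0
        # Otherwise: a primary hit that failed confirmation, so it is excluded.
    return labels


# Hypothetical example data, not taken from any PubChem assay.
primary = Assay("primary", active={"c1", "c2", "c3"}, tested={"c1", "c2", "c3", "c4", "c5"})
confirm = Assay("confirm", active={"c1", "c2"}, tested={"c1", "c2", "c3"})
counter = Assay("counter", active={"c2"}, tested={"c1", "c2"}, is_counter_screen=True)
print(hierarchical_labels(primary, [confirm, counter]))  # {'c1': 1, 'c4': 0, 'c5': 0}
```

In this toy hierarchy, c2 passes confirmation but is removed by the counter screen, and c3 is never confirmed. Both are excluded rather than counted as inactives. This avoids treating ambiguous compounds as confident negatives.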

WelQrate Dataset Collection Statistics

[Figure: Dataset Statistics]

Statistics of the 9 datasets in the WelQrate dataset collection, which cover a variety of important drug targets and have challenging but realistic low percentages of active compounds. An asterisk (*) indicates that additional experimental measurements are available for that dataset.
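A short sketch can show why low active percentages are challenging. It computes the active percentage from hypothetical counts, not taken from any WelQrate dataset. It also shows that a trivial model that labels every compound inactive already reaches very high accuracy while finding no actives at all. The function names are illustrative.

```python
def active_percentage(n_active: int, n_total: int) -> float:
    """Percentage of compounds labeled active in a dataset."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_active / n_total


def all_inactive_accuracy(n_active: int, n_total: int) -> float:
    """Accuracy of a trivial model that predicts every compound as inactive."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_total - n_active) / n_total


# Hypothetical counts for illustration only.
n_active, n_total = 150, 100_000
print(f"{active_percentage(n_active, n_total):.2f}% active")                     # 0.15% active
print(f"all-inactive accuracy: {all_inactive_accuracy(n_active, n_total):.4f}")  # 0.9985
```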

WelQrate Evaluation Metrics

In practice, only the top-ranked predicted molecules are selected for experimental validation. The WelQrate evaluation framework therefore proposes four metrics that focus on the top of the ranked list, in line with this practice.

[Figure: Evaluation Metrics]
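The four metrics are not spelled out in the text here. As a minimal sketch of what top-focused (early-recognition) metrics look like, the code below implements two common ones:

- An enrichment factor over the top-k ranked compounds.
- A logAUC that integrates the ROC curve over log10(FPR) within [0.001, 0.1] and normalizes the result to [0, 1].

These definitions and function names are assumptions for illustration. They are not presented as WelQrate's exact formulas.

```python
import math
from itertools import groupby


def enrichment_factor(y_true, scores, k=100):
    """Hit rate among the top-k scored compounds divided by the overall hit rate.

    Ties at the k-th position are broken by input order (stable sort).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one active")
    k = min(k, len(y_true))
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return (hits / k) / (n_pos / len(y_true))


def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) points, sweeping the threshold from high to low.

    Tied scores are processed together, which yields a diagonal segment.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(pairs, key=lambda p: p[0]):
        for _, label in group:
            if label:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def log_auc(y_true, scores, fpr_range=(1e-3, 1e-1)):
    """Area under the ROC curve with log10(FPR) on the x-axis, normalized to [0, 1].

    TPR is linearly interpolated in FPR between ROC points, and each segment
    is integrated exactly over log10(FPR) inside fpr_range.
    """
    lo, hi = fpr_range
    if not 0 < lo < hi <= 1:
        raise ValueError("fpr_range must satisfy 0 < lo < hi <= 1")
    pts = roc_points(y_true, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        a, b = max(x0, lo), min(x1, hi)
        if x1 <= x0 or a >= b:
            continue  # vertical segment, or segment outside the FPR window
        m = (y1 - y0) / (x1 - x0)
        c = y0 - m * x0  # on this segment, TPR = c + m * FPR
        # Integral of (c + m x) d(log10 x) from a to b.
        area += (c * math.log(b / a) + m * (b - a)) / math.log(10)
    return area / math.log10(hi / lo)


# Toy example: 10 compounds, 2 actives, both ranked at the top.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
s = [0.9, 0.1, 0.2, 0.3, 0.8, 0.0, 0.05, 0.15, 0.25, 0.35]
print(enrichment_factor(y, s, k=2))  # 5.0
print(round(log_auc(y, s), 4))       # 1.0
```

A ranking whose ROC curve follows the diagonal (TPR = FPR) gives a logAUC of about 0.0215 over this window, so a random ranking scores near zero. Plain ROC AUC would give such a ranking 0.5. This is the sense in which log scaling emphasizes the top of the list.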

WelQrate Data Split

WelQrate provides two data split schemes, each with five different splits per dataset. Reported performance should be averaged across the five splits to ensure robustness.

[Figure: Data Split]
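The two split schemes are not named in the text here. The code below is a hedged sketch that assumes one random scheme and one scaffold-based scheme, a common pairing in molecular benchmarks. It shows how five seeded splits per scheme might be generated and how a metric could be averaged across them.

Several parts are hypothetical:

- The function names.
- The 60/20/20 split fractions.
- The scaffold keys, which are supplied by the caller. They would normally come from a cheminformatics toolkit such as RDKit, which is left out so the block runs with the standard library only.

```python
import random
import statistics


def random_split(ids, seed, train_frac=0.6, valid_frac=0.2):
    """Shuffle compound IDs with a seeded RNG and cut them into train/valid/test."""
    rng = random.Random(seed)
    ids = list(ids)
    rng.shuffle(ids)
    n_train = int(train_frac * len(ids))
    n_valid = int(valid_frac * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:]


def scaffold_split(scaffold_of, seed, train_frac=0.6, valid_frac=0.2):
    """Assign whole scaffold groups to a single partition so no scaffold is shared.

    scaffold_of maps compound ID -> precomputed scaffold key (hypothetical input).
    Groups are visited in a seeded random order and filled greedily; the test
    set receives whatever does not fit in train or valid.
    """
    groups = {}
    for cid, scaf in scaffold_of.items():
        groups.setdefault(scaf, []).append(cid)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n = len(scaffold_of)
    train, valid, test = [], [], []
    for key in keys:
        members = groups[key]
        if len(train) + len(members) <= train_frac * n:
            train += members
        elif len(valid) + len(members) <= valid_frac * n:
            valid += members
        else:
            test += members
    return train, valid, test


def summarize(metric_values):
    """Mean and sample standard deviation of a metric over the five splits."""
    return statistics.mean(metric_values), statistics.stdev(metric_values)


# Five seeded random splits of a toy set of compound IDs.
ids = [f"c{i}" for i in range(10)]
splits = [random_split(ids, seed) for seed in range(5)]
# Hypothetical test-set metric values, one per split, from some model.
mean, sd = summarize([0.41, 0.38, 0.45, 0.40, 0.36])
print(f"mean={mean:.3f}, sd={sd:.3f}")
```

Reporting the mean and spread over seeds 0 to 4 for each scheme keeps a single favorable split from driving the result. A scaffold-style split usually yields a harder, more realistic test than a random split because test compounds come from unseen chemotypes.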

Our Team

Universities

Vanderbilt University
Leipzig University

Labs

NDS
Meiler Lab