WelQrate

Introduction

Examples of current issues in existing datasets that impede the progress of applying AI in small molecule drug discovery:

WelQrate Evaluation Framework

The WelQrate Evaluation Framework establishes a foundation for benchmarking by focusing on three key aspects:

WelQrate Dataset Collection Curation Pipeline

The WelQrate dataset collection was meticulously curated by reviewing experimental data hosted on PubChem and applying multiple filters together with hierarchical curation to ensure high data quality.
(a) Curation pipeline overview
(b) An example of the hierarchical curation with AID2258.

WelQrate Dataset Collection Statistics

Statistics of the 9 datasets in the WelQrate dataset collection, which covers a range of important drug targets and has challenging but realistic low percentages of active molecules. * indicates that additional experimental measurements are available for the dataset.

WelQrate Evaluation Metrics

In practice, only the top-ranked predicted molecules are selected for experimental validation. The WelQrate evaluation framework proposes four metrics to align with this practical approach.
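
To make the idea of scoring only the top-ranked predictions concrete, here is a minimal sketch of one common metric of this kind, an enrichment factor over the top k molecules. This is an illustration, not necessarily one of the four WelQrate metrics. The function name `enrichment_factor_top_k` and the toy scores and labels are assumptions made for this example.

```python
from typing import Sequence


def enrichment_factor_top_k(
    scores: Sequence[float], labels: Sequence[int], k: int
) -> float:
    """Active rate among the k highest-scored molecules divided by the
    active rate in the whole set (hypothetical helper, for illustration)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of molecules")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active molecule is required")

    # Rank molecules from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:k])

    top_rate = top_actives / k
    overall_rate = total_actives / len(labels)
    return top_rate / overall_rate


# Toy data (made up): 8 molecules, 2 of them active (label 1).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.03, 0.01]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
ef_top_2 = enrichment_factor_top_k(toy_scores, toy_labels, k=2)
```

In this toy case one of the top two molecules is active (rate 0.5) against an overall active rate of 0.25, so the top of the ranking is enriched twofold. Molecules ranked below position k have no effect on the value, which matches the practice of validating only the top-ranked predictions.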

WelQrate Data Split

WelQrate provides two data split schemes, each with five different splits per dataset. Reported performance should be averaged across the five splits to ensure robustness.
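
Averaging over the five splits could be sketched as follows. The helper name `summarize_across_splits` and the per-split values are placeholders assumed for illustration, not WelQrate results. Reporting the standard deviation alongside the mean is an optional addition here, not something the framework is stated to require.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_across_splits(per_split_scores: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of one metric
    measured on each split (hypothetical helper, for illustration)."""
    if len(per_split_scores) < 2:
        raise ValueError("need scores from at least two splits")
    return mean(per_split_scores), stdev(per_split_scores)


# Placeholder metric values, one per split (made up, not real results).
placeholder_scores = [0.10, 0.20, 0.30, 0.40, 0.50]
avg_score, score_spread = summarize_across_splits(placeholder_scores)
print(f"{avg_score:.3f} +/- {score_spread:.3f}")
```

The same summary would be computed separately for each dataset, metric, and split scheme, so that a reported number never depends on a single favorable split.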