Curation Pipeline

(a) Curation pipeline overview
(b) An example of the hierarchical curation with AID2258.
Statistics of the WelQrate collection, which consists of 9 datasets. WelQrate covers a range of important drug targets, has a realistic but challenging low percentage of active compounds, and includes different task types. All datasets are carefully curated to remove experimental artifacts. The original experimental details for each dataset can be looked up in PubChem by its AID. BC stands for binary classification and R stands for regression.
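As a small illustration of looking up a dataset's original experiment by AID, the sketch below builds the PubChem BioAssay page URL and the PUG REST description URL for a given AID. The helper names (parse_aid, bioassay_page_url, assay_description_url) are hypothetical and not part of WelQrate; the code only constructs URLs and makes no network request.

```python
# Minimal sketch: map a PubChem assay identifier (AID) to lookup URLs.
# Helper names are illustrative assumptions, not a WelQrate API.

PUBCHEM_BIOASSAY_URL = "https://pubchem.ncbi.nlm.nih.gov/bioassay/{aid}"
PUBCHEM_PUG_REST_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/{aid}/description/JSON"
)


def parse_aid(label):
    """Accept 'AID2258', 'aid 2258', or 2258 and return the integer 2258."""
    text = str(label).strip().upper()
    if text.startswith("AID"):
        text = text[3:].strip()
    if not text.isdigit():
        raise ValueError(f"not a valid AID: {label!r}")
    return int(text)


def bioassay_page_url(label):
    """URL of the human-readable BioAssay record page."""
    return PUBCHEM_BIOASSAY_URL.format(aid=parse_aid(label))


def assay_description_url(label):
    """URL of the machine-readable assay description (PUG REST, JSON)."""
    return PUBCHEM_PUG_REST_URL.format(aid=parse_aid(label))


if __name__ == "__main__":
    # AID2258 and AID1798 are the two AIDs mentioned on this page.
    for aid in ("AID2258", "AID1798"):
        print(aid, bioassay_page_url(aid))
```

Opening the first URL in a browser shows the assay record; the second is intended for programmatic access.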


Performance of different models on the WelQrate dataset AID1798 and on a control dataset. The control dataset is built from the actives and inactives of the primary screen, without any of the data preprocessing described in the paper.


Our Team

Yunchao (Lance) Liu

Vanderbilt University

Ha Dong

Amherst College

Xin (Allen) Wang

Vanderbilt University

Rocco Moretti

Vanderbilt University

Yu Wang

University of Oregon

Zhaoqian (Joshua) Su

Vanderbilt University

Jiawei Gu

MD Anderson Cancer Center

Bobby Bodenheimer

Vanderbilt University

Charles David (Dave) Weaver

Vanderbilt University

Tyler Derr

Vanderbilt University

Jens Meiler

Vanderbilt University

Leipzig University
