Modeling process

Preparation of the chemicals and AD-related proteins

We used BindingDB database as our training datasets. BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules. For AD-related proteins, activity data were filtered to keep only activity end-point points that had half-maximum inhibitory concentration (IC50), half-maximum effective concentration (EC50) or Ki values. Herein, to ensure that enough number of molecules could be used in model building, we previously selected those targets with larger than 200 biological activity data. Following this procedure, 109061 compounds associated with 161 AD-related proteins remained with 115257 activity end-points, which were used for model building.

Preparation of the positive and negative set

For those compounds with more than one activity values, we took the mean value of their activity values as the final activity value. A compound was considered active when the mean activity value was below 10 uM. All compounds higher than 10 uM are considered inactive. Following this split, maybe some AD-related proteins have very little number of negative samples. To balance the number between positive samples and negative samples for each AD-related protein, we randomly selected certain number of compounds from other AD-related proteins to generate the negative samples for these AD-related proteins. The number of these selected negative samples together with inactive samples should be basically equal to the number of the active samples for these AD-related proteins. These prepared positive set and negative set were used as the subsequent model building.

Model training and validation

A series of high confidence QSAR models were built by Naïve Bayes and different fingerprint representations for 161 proteins. The Naïve Bayes method for predicting the associations between AD-related proteins and chemicals was chosen as it provided both good performance for noisy data sets and a high speed of calculation. Herein, to obtain the best model performance, we compared 6 types of molecular fingerprints when establishing the prediction models, including FP2, MACCS, Daylight, ECFP2-2048, ECFP4-2048, and ECFP6-2048. To obtain the better prediction ability, we also ensemble all fingerprint models to obtain the average output. For each model, we applied five-fold cross validation and external validation to evaluate the prediction performance of models. For 5-fold cross validation, the data set is split into 5 roughly equal-sized parts firstly, and then we fit the model to four parts of the data and calculate the error rate of the other part. The process is repeated 5 times so that every part can be predicted as a validation set. To observe the stability of models, we repeated the cross validation program 10 times to report standard deviations of each statistics. For the external validation, the data were split in two parts for the validation step: compounds were clustered and assigned a cluster number. Clusters with an odd number were assigned to the test set, and the clusters with an even number were assigned to the training set. Models were built with the training set, and the test set was scored. Finally, a model was built with all data and scored against itself – the training set and whole set should provide similar validation statistics. Statistics on the performance of the models were reported, including commonly used ones in classification schemes: accuracy, sensitivity, specificity, AUC, and F-score values. ROC provides an overall score and does not need to specify a cut-off for distinguishing active from inactive compounds. The area under the receiver operating characteristic (ROC) curve provides an indication of the ability of the model to prioritize active compounds over inactive compounds. The ROC curve is the plot of the true positive versus the false positive rate.

  Model Performance


ROC curves
  All targets
click the links below to view all corresponding targets

Models using different types of fingerprints

The 161 targets
Uniprot_ID Protein Details
O00519 Fatty-acid amide hydrolase 1 View
O14672 Disintegrin and metalloproteinase domain-containing protein 10 View
O95263 High affinity cAMP-specific and IBMX-insensitive 3',5'-cyclic phosphodiesterase 8B View
P00488 Coagulation factor XIII A chain View
P00491 Purine nucleoside phosphorylase View
P00519 Tyrosine-protein kinase ABL1 View
P00747 Plasminogen View
P00749 Urokinase-type plasminogen activator View
P00750 Tissue-type plasminogen activator View
P00797 Renin View
P01275 Glucagon View
P01375 Tumor necrosis factor View
P02766 Transthyretin View
P03372 Estrogen receptor View
P03956 Interstitial collagenase View
P04035 3-hydroxy-3-methylglutaryl-coenzyme A reductase View
P04049 RAF proto-oncogene serine/threonine-protein kinase View
P04062 Glucosylceramidase View
P04156 Major prion protein View
P04439 HLA class I histocompatibility antigen, A-3 alpha chain View
P04626 Receptor tyrosine-protein kinase erbB-2 View
P04629 High affinity nerve growth factor receptor View
P04798 Cytochrome P450 1A1 View
P05067 Amyloid beta A4 protein View
P05091 Aldehyde dehydrogenase, mitochondrial View
P05093 Steroid 17-alpha-hydroxylase/17,20 lyase View
P05121 Plasminogen activator inhibitor 1 View
P05164 Myeloperoxidase View
P05230 Fibroblast growth factor 1 View
P05362 Intercellular adhesion molecule 1 View
P06213 Insulin receptor View
P06239 Tyrosine-protein kinase Lck View
P06241 Tyrosine-protein kinase Fyn View
P06276 Cholinesterase View
P06493 Cyclin-dependent kinase 1 View
P07339 Cathepsin D View
P07384 Calpain-1 catalytic subunit View
P07550 Beta-2 adrenergic receptor View
P07858 Cathepsin B View
P08069 Insulin-like growth factor 1 receptor View
P08172 Muscarinic acetylcholine receptor M2 View
P08183 Multidrug resistance protein 1 View
P08253 72 kDa type IV collagenase View
P08254 Stromelysin-1 View
P08311 Cathepsin G View
P08473 Neprilysin View
P08908 5-hydroxytryptamine receptor 1A View
P08913 Alpha-2A adrenergic receptor View
P09874 Poly [ADP-ribose] polymerase 1 View
P09917 Arachidonate 5-lipoxygenase View
P10275 Androgen receptor View
P10635 Cytochrome P450 2D6 View
P11086 Phenylethanolamine N-methyltransferase View
P11229 Muscarinic acetylcholine receptor M1 View
P11274 Breakpoint cluster region protein View
P11473 Vitamin D3 receptor View
P11511 Aromatase View
P12821 Angiotensin-converting enzyme View
P13569 Cystic fibrosis transmembrane conductance regulator View
P14174 Macrophage migration inhibitory factor View
P14324 Farnesyl pyrophosphate synthase View
P14555 Phospholipase A2, membrane associated View
P14780 Matrix metalloproteinase-9 View
P15559 NAD(P)H dehydrogenase [quinone] 1 View
P16050 Arachidonate 15-lipoxygenase View
P16083 Ribosyldihydronicotinamide dehydrogenase [quinone] View
P16109 P-selectin View
P17612 cAMP-dependent protein kinase catalytic subunit alpha View
P17787 Neuronal acetylcholine receptor subunit beta-2 View
P18054 Arachidonate 12-lipoxygenase, 12S-type View
P19438 Tumor necrosis factor receptor superfamily member 1A View
P19525 Interferon-induced, double-stranded RNA-activated protein kinase View
P21397 Amine oxidase [flavin-containing] A View
P21728 D(1A) dopamine receptor View
P21730 C5a anaphylatoxin chemotactic receptor 1 View
P21917 D(4) dopamine receptor View
P22102 Trifunctional purine biosynthetic protein adenosine-3 View
P22303 Acetylcholinesterase View
P23219 Prostaglandin G/H synthase 1 View
P23443 Ribosomal protein S6 kinase beta-1 View
P24385 G1/S-specific cyclin-D1 View
P25025 C-X-C chemokine receptor type 2 View
P25098 Beta-adrenergic receptor kinase 1 View
P27338 Amine oxidase [flavin-containing] B View
P27695 DNA-(apurinic or apyrimidinic site) lyase View
P27986 Phosphatidylinositol 3-kinase regulatory subunit alpha View
P28223 5-hydroxytryptamine receptor 2A View
P28335 5-hydroxytryptamine receptor 2C View
P28482 Mitogen-activated protein kinase 1 View
P28845 Corticosteroid 11-beta-dehydrogenase isozyme 1 View
P29274 Adenosine receptor A2a View
P29466 Caspase-1 View
P29474 Nitric oxide synthase, endothelial View
P29475 Nitric oxide synthase, brain View
P30291 Wee1-like protein kinase View
P30556 Type-1 angiotensin II receptor View
P30926 Neuronal acetylcholine receptor subunit beta-4 View
P31645 Sodium-dependent serotonin transporter View
P31749 RAC-alpha serine/threonine-protein kinase View
P32297 Neuronal acetylcholine receptor subunit alpha-3 View
P34972 Cannabinoid receptor 2 View
P34998 Corticotropin-releasing factor receptor 1 View
P35228 Nitric oxide synthase, inducible View
P35354 Prostaglandin G/H synthase 2 View
P35462 D(3) dopamine receptor View
P35610 Sterol O-acyltransferase 1 View
P35869 Aryl hydrocarbon receptor View
P36544 Neuronal acetylcholine receptor subunit alpha-7 View
P37231 Peroxisome proliferator-activated receptor gamma View
P39900 Macrophage metalloelastase View
P41231 P2Y purinoceptor 2 View
P42261 Glutamate receptor 1 View
P42262 Glutamate receptor 2 View
P42345 Serine/threonine-protein kinase mTOR View
P42574 Caspase-3 View
P43681 Neuronal acetylcholine receptor subunit alpha-4 View
P48039 Melatonin receptor type 1A View
P48147 Prolyl endopeptidase View
P49682 C-X-C chemokine receptor type 3 View
P49768 Presenilin-1 View
P49810 Presenilin-2 View
P49840 Glycogen synthase kinase-3 alpha View
P49841 Glycogen synthase kinase-3 beta View
P49862 Kallikrein-7 View
P50052 Type-2 angiotensin II receptor View
P50406 5-hydroxytryptamine receptor 6 View
P51677 C-C chemokine receptor type 3 View
P51681 C-C chemokine receptor type 5 View
P52732 Kinesin-like protein KIF11 View
P55055 Oxysterols receptor LXR-beta View
P55210 Caspase-7 View
P55211 Caspase-9 View
P55212 Caspase-6 View
P56817 Beta-secretase 1 View
P78536 Disintegrin and metalloproteinase domain-containing protein 17 View
Q00535 Cyclin-dependent-like kinase 5 View
Q01959 Sodium-dependent dopamine transporter View
Q02750 Dual specificity mitogen-activated protein kinase kinase 1 View
Q03181 Peroxisome proliferator-activated receptor delta View
Q05586 Glutamate receptor ionotropic, NMDA 1 View
Q07343 cAMP-specific 3',5'-cyclic phosphodiesterase 4B View
Q07869 Peroxisome proliferator-activated receptor alpha View
Q13224 Glutamate receptor ionotropic, NMDA 2B View
Q13526 Peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 View
Q13627 Dual specificity tyrosine-phosphorylation-regulated kinase 1A View
Q13936 Voltage-dependent L-type calcium channel subunit alpha-1C View
Q14289 Protein-tyrosine kinase 2-beta View
Q14790 Caspase-8 View
Q15078 Cyclin-dependent kinase 5 activator 1 View
Q15303 Receptor tyrosine-protein kinase erbB-4 View
Q16572 Vesicular acetylcholine transporter View
Q16620 BDNF/NT-3 growth factors receptor View
Q16739 Ceramide glucosyltransferase View
Q5S007 Leucine-rich repeat serine/threonine-protein kinase 2 View
Q8WW43 Gamma-secretase subunit APH-1B View
Q92731 Estrogen receptor beta View
Q96EB6 NAD-dependent protein deacetylase sirtuin-1 View
Q99720 Sigma non-opioid intracellular receptor 1 View
Q9NR96 Toll-like receptor 9 View
Q9NZ42 Gamma-secretase subunit PEN-2 View
Q9Y5Z0 Beta-secretase 2 View

Copyright @ 2012-2014 Computational Biology & Drug Design Group,
School of Pharmaceutical Sciences, Central South University. All rights reserved.

The recommended browsers: Safari, Firefox, Chrome,IE(Ver.>8).
 E-mail: biomed@csu.edu.cn