# -*- coding: utf-8 -*-
"""
Created on Tue Aug 18 22:54:42 2015

@author: CBDD Group, CSU, China
"""

<1>. Data

1. The targets properties file:
   The file contains: UniProt ID, Protein name, Protein function, PDB name, PDB url,
   BioGRID ID, BioGRID url, DrugBank ID, DrugBank url, GuidetoPHARMACOLOGY ID,
   GuidetoPHARMACOLOGY url, PharmGKB ID, PharmGKB url, KEGG ID, KEGG url,
   BioCyc ID, BioCyc url, EntrezGene ID, EntrezGene url, DIP ID, DIP url,
   STRING ID, STRING url, MINT ID, MINT url, IntAct ID, IntAct url,
   DMDM ID, DMDM url, BRENDA ID, BRENDA url, Reactome ID, Reactome url,
   SignaLink ID, SignaLink url, BindingDB, BindingDB url.

2. The target molecules file:
   The file contains each molecule's ChEMBL ID, its structure in smi format,
   and its bioactivity value against the target.

3. The model data file:
   The file contains a positive part and a negative part. Each part contains
   161 files named by their UniProt ID.

<2>. The performance of models

1. The performance of the model calculated with FP2 fingerprint:
   The file contains the model evaluation for 161 targets, including Accuracy, AUC, F1 score, etc.

2. The performance of the model calculated with ECFP2 fingerprint:
   The file contains the model evaluation for 161 targets, including Accuracy, AUC, F1 score, etc.

3. The performance of the model calculated with ECFP4 fingerprint:
   The file contains the model evaluation for 161 targets, including Accuracy, AUC, F1 score, etc.

4. The performance of the model calculated with ECFP6 fingerprint:
   The file contains the model evaluation for 161 targets, including Accuracy, AUC, F1 score, etc.

5. The performance of the model calculated with MACCS fingerprint:
   The file contains the model evaluation for 161 targets, including Accuracy, AUC, F1 score, etc.

6. The performance of the model calculated with Daylight fingerprint:
   The file contains the model evaluation for 161 targets, including Accuracy, AUC, F1 score, etc.

<3>. The models

The file contains the models for 161 targets. Each model is stored as one 'pkl' file.

<4>. The scripts

1. These scripts require the following Python packages: pybel, numpy, pydpi, rdkit.

2. To get a prediction, provide your molecule in *.smi format together with the
   path of the 'pkl' file.

3. The prediction contains two parts. The first part is the predicted label:
   0 for negative or 1 for positive. The second part is a predicted value
   between 0 and 1 that represents the probability that the molecule is positive.
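
The workflow in items 2 and 3 of the scripts section could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scripts shipped with this package. The helper names read_smi, load_model and predict are hypothetical. The models are assumed to expose a scikit-learn-style predict / predict_proba interface, which the text above does not confirm. The fingerprint calculation is left out, so predict takes a fingerprint vector that has already been computed.

```python
import pickle
from pathlib import Path


def read_smi(path):
    """Return (smiles, name) pairs from a *.smi file, one molecule per line.

    Each line holds a SMILES string, optionally followed by whitespace
    and a molecule name. Blank lines are skipped.
    """
    molecules = []
    for line in Path(path).read_text().splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        molecules.append((smiles, name))
    return molecules


def load_model(pkl_path):
    """Load a pickled model. Only unpickle files from a source you trust."""
    with open(pkl_path, "rb") as handle:
        return pickle.load(handle)


def predict(model, fingerprint):
    """Return (label, probability) for one fingerprint vector.

    ASSUMPTION: the model behaves like a scikit-learn classifier, i.e.
    predict() returns the 0/1 label and predict_proba() returns
    [P(negative), P(positive)] for each row. The real models may differ.
    """
    label = int(model.predict([fingerprint])[0])
    probability = float(model.predict_proba([fingerprint])[0][1])
    return label, probability
```

With these helpers, a run would read the molecules with read_smi, compute one fingerprint per SMILES string with the same fingerprint type the model was trained on, and pass each fingerprint to predict together with the model returned by load_model.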