GitHub - Cschiebroek/Digital_Chemistry

Installation

mamba env create -f environment.yml
conda activate chemprop

Alternatively, install from source:

git clone https://github.com/chemprop/chemprop.git
cd chemprop
conda env create -f environment.yml
conda activate chemprop

You can download the data used for the models described here from the OPERA GitHub page: https://github.com/kmansouri/OPERA/

Cleaning up repo for chemprop workflow --> Carl
Hyperparameter optimization --> Carl
Comparison with OPERA models (use the same splits for STL) --> Enrico
Varying the number of tasks in multitask learning --> Enrico (main), Riccardo and Carl (supporting)
SAMPL challenge benchmark prediction comparison (check that OPERA and SAMPL do not overlap) --> Riccardo
Create a notebook that produces the main plot, to serve as a template for all the following tasks --> Enrico + Riccardo
Try PCA/clustering to select very few datapoints and decide which models will give the best performance --> Riccardo at first, then Enrico
Assess the effect of variance when taking a small subset --> Carl
Assess effect of training set size --> Domen
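
The variance and training-set-size tasks above both come down to drawing several random subsets of each size and comparing the scores across repeats. The following is a minimal sketch of that loop using only the Python standard library. The function names, the seed scheme and the placeholder score are assumptions made for illustration; they are not taken from the notebooks in this repo.

```python
import random
import statistics


def draw_subsets(items, subset_size, n_repeats, base_seed=0):
    """Draw n_repeats random subsets of subset_size items, one per seed."""
    if subset_size > len(items):
        raise ValueError("subset_size exceeds the number of available items")
    subsets = []
    for repeat in range(n_repeats):
        # A separate seeded generator per repeat keeps every draw reproducible.
        rng = random.Random(base_seed + repeat)
        subsets.append(rng.sample(items, subset_size))
    return subsets


def summarize(scores):
    """Return the mean and sample standard deviation of per-repeat scores."""
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Hypothetical stand-in for the identifiers of the training molecules.
    molecule_ids = list(range(1000))
    for size in (20, 50, 100):
        subsets = draw_subsets(molecule_ids, size, n_repeats=5)
        # Placeholder score: in practice this would be the test error of a
        # model trained on each subset.
        scores = [statistics.mean(subset) for subset in subsets]
        mean, std = summarize(scores)
        print(f"n={size}: mean={mean:.1f}, std={std:.1f}")
```

Fixing the seeds per repeat means that STL and MTL runs can be trained on exactly the same subsets, so differences in score are not caused by different draws.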

Title: "Deep learning with 20 datapoints"?
Abstract (including motivation)
Methods
Results
- External dataset plot
Conclusions and outlook
- All the ideas that we don't have time to try

Meeting:

23.05.24 1pm - 2pm First sharing of results
- Comparison with SAMPL: both MTL and STL give very good results; the molecules in SAMPL6 are very similar to the training data
  - (for the report) find the training-set molecules that are closest to the SAMPL6 molecules
  - (for the report) see how well OPERA represents a more "general" chemical space by comparing the PCAs
  - (for the report) find a dataset that IS different? (so maybe MTL will work better than STL)
28.05.24 1pm - 5pm Making the poster
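
For the report item on finding the training-set molecules closest to the SAMPL6 molecules, one common choice is to rank training molecules by Tanimoto similarity between fingerprints. The sketch below assumes each fingerprint is already available as a set of on-bit indices; the toy fingerprints and function names are hypothetical, and real fingerprints would come from a cheminformatics toolkit. The notes do not say which similarity measure was actually used.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: define the similarity as zero
    return len(fp_a & fp_b) / union


def nearest_training_molecules(query_fp, training_fps, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in training_fps.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy fingerprints (hypothetical), keyed by a made-up molecule name.
    training = {
        "train_A": {1, 2, 3, 4},
        "train_B": {3, 4, 5, 6},
        "train_C": {7, 8, 9},
    }
    query = {1, 2, 3, 5}  # stands in for one SAMPL6 molecule
    for name, similarity in nearest_training_molecules(query, training, top_k=2):
        print(name, round(similarity, 2))
```

Running this for every SAMPL6 molecule and collecting the highest similarity per molecule gives a simple measure of how close the benchmark is to the training data.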

Deadline: May 30th 2024 1:45pm (poster session)

Repository contents (58 commits):
modules
.gitignore
1_Data.ipynb
2_chem_space.ipynb
3_chemprop_result_analysis.ipynb
OPERA_models_comparison.ipynb
README.md
environment.yml
logP_experimental_values.csv
low_data_SAMPL6_template.ipynb
low_data_different_sizes.ipynb
low_data_different_sizes_variance.ipynb
run_stl_mtl_basic.sh
run_stl_mtl_hp_opt.sh
test_PCA_training.ipynb