This repository contains a Graph Neural Network (GNN) model to predict the LogP of small molecules. The model is built with PyTorch and PyTorch Lightning. The following figure shows a parity plot comparing predicted LogP values to true values on the test set.
- Clone the repository:
    git clone https://github.com/daandtu/logP-predictor.git
    cd logP-predictor
- Create a virtual environment and install dependencies:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e .
The command-line interface (CLI) provides several commands to download data from ChEMBL, train a model, make predictions, and evaluate performance.
First, download the LogP dataset from the ChEMBL database:
    logp-predictor download

This will create a data/logp.csv file.
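As a quick sanity check after the download, the file can be inspected with Python's standard csv module. The following is a minimal sketch, not part of the CLI: the column names `smiles` and `logp` are assumptions (the actual header of data/logp.csv is not documented here), `summarize_logp` is a hypothetical helper, and the two sample rows are illustrative stand-ins rather than rows from the dataset.

```python
import csv
import io
import statistics


def summarize_logp(rows, value_column="logp"):
    """Return (count, mean, min, max) of the LogP column in an iterable of dict rows."""
    # Skip rows where the value is missing or empty.
    values = [float(row[value_column]) for row in rows if row.get(value_column)]
    if not values:
        raise ValueError(f"no values found in column {value_column!r}")
    return len(values), statistics.mean(values), min(values), max(values)


# Illustrative stand-in for data/logp.csv; the header names are assumed.
sample = io.StringIO("smiles,logp\nCCO,-0.31\nc1ccccc1,2.13\n")
count, mean, lowest, highest = summarize_logp(csv.DictReader(sample))
print(f"{count} molecules, mean LogP {mean:.2f}, range [{lowest}, {highest}]")
```

To run it against the real file, open data/logp.csv with `open(..., newline="")`, pass `csv.DictReader(f)` to the function, and adjust `value_column` to whatever the header actually contains.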
To train a new model from scratch:
    logp-predictor train --epochs 100 --batch-size 64

Model checkpoints will be saved in the checkpoints/ directory, and logs will be stored in logs/.
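After several training runs it can be handy to locate the newest checkpoint programmatically. The sketch below assumes checkpoints are written as `*.ckpt` files directly inside checkpoints/, which the `checkpoints/last.ckpt` path used in this README suggests but does not guarantee; `latest_checkpoint` is a hypothetical helper, not something the package provides.

```python
from pathlib import Path


def latest_checkpoint(directory="checkpoints"):
    """Return the most recently modified .ckpt file in `directory`, or None."""
    root = Path(directory)
    if not root.is_dir():
        return None
    candidates = list(root.glob("*.ckpt"))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    checkpoint = latest_checkpoint()
    print(checkpoint if checkpoint else "no checkpoint found; run training first")
```

The returned path can then be passed to any command that takes `--checkpoint`.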
You can use a trained model to predict the LogP for a given SMILES string. Make sure to point to a valid checkpoint file.
    logp-predictor predict --checkpoint checkpoints/last.ckpt --smiles "CCO"

To evaluate a model on the test set and see the performance metrics:
    logp-predictor evaluate --checkpoint checkpoints/last.ckpt

To generate a parity plot comparing the model's predictions to the true values on the test set:
    logp-predictor visualize --checkpoint checkpoints/last.ckpt --output results/parity_plot.png

This will create an image file like the one shown at the top of this README.
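A parity plot places true values on one axis and predictions on the other, so points on the diagonal correspond to perfect predictions. The summary statistics that commonly accompany such a plot can be sketched in plain Python. Which metrics the evaluate command actually reports is not stated here; RMSE, MAE, and R² are an assumption, `regression_metrics` is a hypothetical helper, and the numbers below are made up purely to exercise the function.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up numbers purely to exercise the function; not model output.
true_values = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(true_values, predicted)
print(metrics)
```

Lower RMSE and MAE and an R² closer to 1 correspond to points lying nearer the diagonal of the parity plot.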
