MarkushGenerator

This is the repository for the synthetic data generation pipeline of MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures.

Installation

  1. Create a virtual environment.
     python3.10 -m venv markushgenerator-env
     source markushgenerator-env/bin/activate
  2. Install MarkushGenerator.
     PIP_USE_PEP517=0 pip install -e .
  3. Install Java 17.
     sudo apt-get install openjdk-17-jdk
     sudo update-alternatives --config 'java'
  4. Download the CDK library (cdk-2.9.jar) from the CDK GitHub releases page and place it in MarkushGenerator/lib/.
     wget https://github.com/cdk/cdk/releases/download/cdk-2.9/cdk-2.9.jar -P ./lib/
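
The installation steps can be sanity-checked with a short script. The following is a minimal sketch and not part of the repository: the names `parse_java_major` and `check_setup` are hypothetical, and the script assumes it is run from the repository root so that the jar sits at `lib/cdk-2.9.jar`, as in the `wget` command.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumption: the jar location matches the wget command (./lib/ under the repo root).
CDK_JAR = Path("lib") / "cdk-2.9.jar"


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report themselves as "1.8.0_...".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_setup(jar_path: Path = CDK_JAR) -> List[str]:
    """Return a list of detected problems; an empty list means the checks passed."""
    problems = []
    java = shutil.which("java")
    if java is None:
        problems.append("java not found on PATH")
    else:
        # `java -version` writes its report to stderr.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        major = parse_java_major(result.stderr + result.stdout)
        if major != 17:
            problems.append(f"expected Java 17, found {major}")
    if not jar_path.is_file():
        problems.append(f"missing {jar_path}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

If the script prints nothing, Java 17 is the active `java` on the PATH and the CDK jar is where step 4 puts it.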

Generation

The notebook MarkushGenerator/markushgenerator/draw.ipynb shows how to:

  1. Draw an image from a CXSMILES.
  2. Draw a textual definition associated with the CXSMILES.
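
For readers unfamiliar with the input format: a CXSMILES is a SMILES string optionally followed by an extension block delimited by `|` characters, which carries information such as atom labels for R-groups. The sketch below only illustrates that layout. The helper `split_cxsmiles` and the example string are hypothetical and are not taken from the notebook, which should be consulted for the actual drawing calls.

```python
from typing import Tuple


def split_cxsmiles(cxsmiles: str) -> Tuple[str, str]:
    """Split a CXSMILES into (smiles, extension) without validating either part."""
    text = cxsmiles.strip()
    start = text.find("|")
    if start == -1:
        # Plain SMILES: there is no extension block.
        return text, ""
    end = text.rfind("|")
    if end == start:
        raise ValueError("unterminated CXSMILES extension block")
    return text[:start].strip(), text[start + 1:end]


# Illustrative input: a benzene ring with one variable substituent labelled R1.
smiles, extension = split_cxsmiles("*c1ccccc1 |$R1;;;;;;$|")
print(smiles)     # *c1ccccc1
print(extension)  # $R1;;;;;;$
```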

Each generated sample contains:

  • CXSMILES.
  • Optimized CXSMILES.
  • Markush structure image.
  • OCR cells, which contain the position and content of the text written in the image. Some characters are currently omitted, such as explicit carbons and implicit hydrogens. Charged atoms are formatted as "atom, charge, number of charges". Superscripts and subscripts are ignored.
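
One way to picture a generated sample is as a small record with the four parts listed above. This is a sketch under stated assumptions, not the repository's actual schema: the class and field names are hypothetical, and representing a cell's position as a pixel bounding box is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OCRCell:
    """One piece of text drawn in the image (hypothetical layout)."""
    text: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox: Tuple[float, float, float, float]


@dataclass
class MarkushSample:
    """The four parts of a generated sample (field names are assumptions)."""
    cxsmiles: str
    optimized_cxsmiles: str
    image_path: str
    ocr_cells: List[OCRCell] = field(default_factory=list)

    def ocr_text(self) -> List[str]:
        """Return the text content of every OCR cell, in stored order."""
        return [cell.text for cell in self.ocr_cells]


# Illustrative values only; none of these come from the pipeline's output.
sample = MarkushSample(
    cxsmiles="*c1ccccc1 |$R1;;;;;;$|",
    optimized_cxsmiles="*c1ccccc1 |$R1;;;;;;$|",
    image_path="sample_0.png",
    ocr_cells=[OCRCell(text="R1", bbox=(10.0, 20.0, 30.0, 35.0))],
)
print(sample.ocr_text())  # ['R1']
```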

About

[CVPR 25] MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures