Skip to content

Several more atom mapping improvements#839

Open
kfir4444 wants to merge 7 commits intomainfrom
atom_mapping_n
Open

Several more atom mapping improvements#839
kfir4444 wants to merge 7 commits intomainfrom
atom_mapping_n

Conversation

@kfir4444
Copy link
Copy Markdown
Collaborator

This PR improves several aspects of atom mappings.
two key fixes:

  1. Stopping conformer generation in ARCSpecies._scissors. This is a major issue that improves both accuracy and performence:
    1. Performence - less RDKit calls
    2. Accuracy - Generating conformers distorts the XYZ of the scissored products.
      Moreover, this is critical for cyclic species, since debugging showed we did not include the original XYZ in the ARCSpecies, which caused errors.
  2. the identify_superimposable_candidates, which was being called $\mathcal{O}(N^2)$ times and reduces to $\mathcal{O}(N)$. More calls are not required, and does not increase accuracy. (There is some more math behind it, and it can be discussed offline @alongd).

Other fixes added, and more generalization of AM testing (manually verified)

…ns and multiplicity

When a bond in a cyclic molecule is broken (ring-opening), the resulting fragment must have new radical electrons assigned to the atoms that were part of the severed bond. This commit fixes _scissors() to:
1. Detect ring-opening cases (len(mol_splits) == 1).
2. Assign radical electrons to the cut atoms based on their missing valency.
3. Update the fragment's multiplicity.
4. Set keep_mol=True and provide final_xyz to the new ARCSpecies to ensure proper initialization.
This prevents ValueError and SpeciesError during molecule perception when mapping reactions involving ring-opening/closing.
Added a check for None results from cut_species_based_on_atom_indices in map_rxn. If scission fails for reactants or products, the function now attempts to fall back to the next dictionary template (if available)
or returns None with a logged error, preventing a TypeError when calling update_xyz on a None object.
Updated benzene scission tests to allow for multiple valid atom mappings due to the symmetry of the molecule. Replacing assertEqual with assertIn ensures the tests pass if any of the chemically equivalent valid maps are returned.
This was also verified with the NEB, achieving the same TS with both AMs.
This commit introduces several optimizations to the atom mapping algorithm:
1.  identify_superimposable_candidates: Reduced algorithmic complexity from O(N^2) to O(N) by starting graph traversal from only the first heavy atom. For connected molecular graphs, this is sufficient to find all valid mappings (including symmetries) and significantly reduces redundant DFS calls.
2.  prune_identical_dicts: Fixed a logic bug where dictionaries were incorrectly pruned if they shared a single key-value pair. Now correctly uses exact dictionary equality.
3.  pairing_reactants_and_products_for_mapping: Pre-calculates resonance structures for all reactant fragments once before the pairing loop, avoiding redundant O(R*P) computations.
4.  copy_species_list_for_mapping: Replaced the expensive spc.copy() (which uses dictionary serialization) with a lighter direct ARCSpecies instantiation to reduce overhead during the mapping process.
…erception

Modified ARCSpecies.__init__ to skip the expensive mol_from_xyz() call (which performs molecule perception from 3D coordinates) when a valid Molecule object is already provided and the keep_mol flag is set. This significantly reduces initialization overhead during scission and atom mapping operations where the molecular graph is already known.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant