Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating reactions for nitrobenzenes is very slow #2630

Open
jonwzheng opened this issue Mar 8, 2024 · 13 comments
Open

Generating reactions for nitrobenzenes is very slow #2630

jonwzheng opened this issue Mar 8, 2024 · 13 comments

Comments

@jonwzheng
Copy link
Contributor

Bug Description

There are two separate but related issues that I observed. First, an RMG website user mentioned that using RMG kinetics search for nitro-substituted benzenes causes the kinetics search to timeout when calling KineticsDatabase.generate_reactions. Here are two examples:

1. Bimolecular reaction

Finding the bimolecular reactions of a nitrobenzene with itself, including just monosubstituted nitrobenzene ([O-][N+](=O)c1ccccc1), will hang on execution.

Minimal example:

from rmgpy import settings
from rmgpy.data.rmg import RMGDatabase
from rmgpy import molecule
from rmgpy.data.kinetics import KineticsDatabase, TemplateReaction

database_directory = settings["database.directory"]
database = RMGDatabase()
database.load(database_directory, kinetics_families="all")

reactant = molecule.Molecule(smiles="[O-][N+](=O)c1ccccc1")
reactants = [reactant]
print("Generating unimolecular reaction")
database.kinetics.generate_reactions(
        reactants, None, only_families=None, resonance=True)

print("Generating bimolecular reaction")
reactants2 = [reactants[0], reactants[0]]
database.kinetics.generate_reactions(
        reactants2, None, only_families=None, resonance=True)

Following the traceback it looks like it gets stuck on Species.generate_resonance_structures() --> Molecule.is_identical().
Even if you set resonance=False, this still occurs because the traceback leads to family.calculate_degeneracy(rxn) --> ensure_independent_atom_ids(reactants, resonance=True) in which resonance is hardcoded to True.

Nitrobenzenes have several resonance structures, but is that enough to cause this issue?

2. Unimolecular reaction for trinitrobenzene

Minimal example:
Searching the unimolecular reactions of trinitrobenzene will hang:

reactant = molecule.Molecule(smiles="[O-][N+](=O)c1c(cccc1[N+](=O)[O-])[N+](=O)[O-]")
reactants = [reactant] 
database.kinetics.generate_reactions(reactants, None, 
                                     only_families=None, resonance=True)

hanging on the same step as above.

Other thoughts

I am still investigating this, and would appreciate any input if anyone has run into a similar issue working with the resonance code, or thoughts on where to troubleshoot next.

@hwpang
Copy link
Contributor

hwpang commented Mar 8, 2024

If you add the limit for maxRadicalElectrons=1, would it help?

@jonwzheng
Copy link
Contributor Author

Hi Hao-Wei, do you mean the maximum_radical_electrons for explorer jobs? I think that's different from this - this isn't network exploration, but rather, simply searching the kinetics libraries and families for reaction templates and outputting that info. It's hanging on the reaction families step.

@hwpang
Copy link
Contributor

hwpang commented Mar 8, 2024

Hi Jonathan, I meant something like this below. I haven't tested this code yet but this constraint is used in _generate_product_structures to limit the number of radicals to consider.

from rmgpy.rmg.main import RMG
from rmgpy import settings
from rmgpy.data.rmg import RMGDatabase
from rmgpy import molecule
from rmgpy.data.kinetics import KineticsDatabase, TemplateReaction

rmg = RMG()
rmg.species_constraints = {"maximumRadicalElectrons": 1, }
database_directory = settings["database.directory"]
rmg.database = RMGDatabase()
rmg.database.load(database_directory, kinetics_families="all")

reactant = molecule.Molecule(smiles="[O-][N+](=O)c1ccccc1")
reactants = [reactant]
print("Generating unimolecular reaction")
rmg.database.kinetics.generate_reactions(
        reactants, None, only_families=None, resonance=True)

print("Generating bimolecular reaction")
reactants2 = [reactants[0], reactants[0]]
rmg.database.kinetics.generate_reactions(
        reactants2, None, only_families=None, resonance=True)

@jonwzheng
Copy link
Contributor Author

Hi Hao-Wei, thank you for the suggestion. This was helpful to learn about the species constraints.
For transparency, Hao-Wei and I tried running the above code snippet including the species constraints, but were still unable to resolve the issue (the code still hangs on the generate resonance structures step).
@xiaoruiDong, wondering if you've seen this before or know who might be able to comment.

@xiaoruiDong
Copy link
Contributor

This sort of hanging is known for long time for molecules with many resonance structures (even most of them are unrepresentative). But, I don't have an immediate thought about why it hangs at Molecule.is_identical().

Do you know which exactly families or products causing it? I would suggest identifying it first (e.g., by iterating families and assign it to only_families and set a timeout for a minute for example). As it helps us to understand if this is due to issues in generating resonance structures for certain molecules. If so, we can then further investigate whether it is a bug (like stucking in or a loop) or performance bottleneck (e.g., due to combinatorial effect).

@jonwzheng
Copy link
Contributor Author

jonwzheng commented Mar 8, 2024

Thank you, Xiaorui, these are excellent ideas. I'll profile to figure out where, exactly, it is getting stuck, and for which families.

I have a follow-up concern. Even if you set resonance=False in generate_reactions, you still run into this issue. The expected behavior should be that resonance structures are not generated, but in calculate_degeneracy, resonance=True is hardcoded in. Do you know if it would break anything to allow the resonance arg to propagate through? As you say, it may be needed to sanitize the molecules or match certain templates. However, calculate_degeneracy seems to copy most of the functionality from generate_reactions_from_families in database.py, which has the flag resonance=resonance allowing resonance to propagate. Compare the code:

calculate_degeneracy:

  # Label reactant atoms for proper degeneracy calculation
  ensure_independent_atom_ids(reactants, resonance=True)
  molecule_combos = generate_molecule_combos(reactants)

generate_reactions_from_families:

  # Label reactant atoms for proper degeneracy calculation (cannot be in tuple)
  ensure_independent_atom_ids(reactants, resonance=resonance)
  combos = generate_molecule_combos(reactants)

(btw, maybe we should open another PR that cleans up these two functions. There is a lot of overlapping code in both of these two functions, that can be rewritten to several smaller functions. Also the docstring for calculate_degeneracy mentions ignoreSameReactants but this parameter does not exist...)

@WernerHerzog
Copy link

I too have been struggling in this area.

I have been trying liquid reactors to look at amination of a nitrated benzene compounds as well as looking at aspects of the Baeyer-Mills reaction (nitrosobenzene + analine -> azobenzene). Although I am new to RMG, I understand that both efforts may not work well or at all with RMG currently. But I do want to get a sense of if/how RMG can handle liquid reactions like these.

Anyway, when trying to run the default kinetics families, it gets really slow or possibly even stops. Even if it has just 200-300 species or reactions, it will make no apparent progress over night. By significantly restricting the kineticsFamilies option, it is possible to run normally, but obviously paring reaction families is a job better left to a computer. Plus, just looking at say H_abstraction does not provide much useful info. Also, it may not just be nitro-substituted benzene, because nitroso-substituted in this case seems to cause some trouble.

Some qualitative things that I notice:

  1. It seems like the stalls are often preceded by messages with " Ea raised from X to Y" where Y is large, like >1000. A couple of times I have tried to restart from a seed generated prior to the stall. In both cases, the restart from seed had a warning of ill conditioned matrix, but I could not find a similar warning during the original run.

  2. When generating a flux diagram, the depiction of the species in the pictures appears to be distorted. My thinking is that this is probably just the picture representation of the compound--that RMG generates a picture with a compact footprint to make a tidy diagram. I have yet to determine if all/most of the strange looking compounds are this way. Here is one example, an intermediate from the Baeyer-Mills reaction:
    1 O u0 p3 c-1 {3,S}
    2 N u0 p0 c+1 {3,S} {4,S} {16,S} {17,S}
    3 N u0 p1 c0 {1,S} {2,S} {5,S}
    4 C u0 p0 c0 {2,S} {6,B} {7,B}
    5 C u0 p0 c0 {3,S} {8,B} {9,B}
    6 C u0 p0 c0 {4,B} {13,B} {23,S}
    7 C u0 p0 c0 {4,B} {15,B} {27,S}
    8 C u0 p0 c0 {5,B} {10,B} {18,S}
    9 C u0 p0 c0 {5,B} {12,B} {22,S}
    10 C u0 p0 c0 {8,B} {11,B} {19,S}
    11 C u0 p0 c0 {10,B} {12,B} {20,S}
    12 C u0 p0 c0 {9,B} {11,B} {21,S}
    13 C u0 p0 c0 {6,B} {14,B} {24,S}
    14 C u0 p0 c0 {13,B} {15,B} {25,S}
    15 C u0 p0 c0 {7,B} {14,B} {26,S}
    16 H u0 p0 c0 {2,S}
    17 H u0 p0 c0 {2,S}
    18 H u0 p0 c0 {8,S}
    19 H u0 p0 c0 {10,S}
    20 H u0 p0 c0 {11,S}
    21 H u0 p0 c0 {12,S}
    22 H u0 p0 c0 {9,S}
    23 H u0 p0 c0 {6,S}
    24 H u0 p0 c0 {13,S}
    25 H u0 p0 c0 {14,S}
    26 H u0 p0 c0 {15,S}
    27 H u0 p0 c0 {7,S}
    Putting the above into the website's molecule search gives the right SMILES (confirmed on pubchem), but an odd picture.

Finally, I will add that I used RMG binary to make an oxidation mechanism for RDX. I've since upgraded to the source install. Im trying to rerun that input file to see if it can run with the source installed version.

I'm new to RMG, so I don't really know where/what to look for. If anything looks promising, I can try to better document it going forward.

@xiaoruiDong
Copy link
Contributor

The expected behavior should be that resonance structures are not generated, but in calculate_degeneracy, resonance=True is hardcoded in. Do you know if it would break anything to allow the resonance arg to propagate through?

I don't think it will "break" anything, but it will potentially make the degeneracy result less accurate, particularly for molecules with resonance structures and high symmetry. Also, if there is a unit test for such a case, it may fail if turning resonance to False. I don't think it will have a huge impact; maybe a factor of 2 - 4 impacts for the rate coefficient? I also find it challenging to compose an example that will be impacted by this, so I guess it may be kinda rare.

btw, maybe we should open another PR that cleans up these two functions. There is a lot of overlapping code in both of these two functions, that can be rewritten to several smaller functions. Also the docstring for calculate_degeneracy mentions ignoreSameReactants but this parameter does not exist...

Refactoring and making RMG more modularized is a good idea and is always welcomed if you have some spare time.

@jonwzheng
Copy link
Contributor Author

jonwzheng commented Mar 14, 2024

After fixing and allowing resonance=False to propagate through (see PR #2632), you can turn off resonance and it will allow reactions to be generated at a reasonable time. That might be a temporary work-around until the root cause is figured out.

Still checking out all the families that it gets stuck on (it's a slow process as the kernel needs to be restarted every time it crashes), but the following families encounter the problem:

  • 1,4_Linear_birad_scission
  • Singlet_Carbene_Intra_Disproportionation [sometimes? this sometimes crashes and sometimes doesn't?]

(possibly/probably more)

@jonwzheng
Copy link
Contributor Author

jonwzheng commented Mar 15, 2024

It is getting stuck due to the reverse reaction of 1,4_Linear_birad_scission. Specifically due to the double bonds:

image

This time-out issue occurs even with much simpler nitro-containing species. The simplest I've found that runs into this is C=C([N+](=O)[O-])C=C - this times out on the RMG website, and takes about 2 minutes on my local device to generate reactions for this family. I imagine there is a combinatorial explosion when aromatic species.

However, the simpler compound C=C([N+](=O)[O-]) does not have the issue.

With nitroso-containing compounds it is faster, but still kind of slow, i.e.
C=C(N=O)C=C will execute in a reasonable time, and even C=C(N=O)C=CC=C but you can see that this will generate about 30 reactions for that one family. C=C(N=O)C=CC=CC=C is slow but will eventually return a result.

@jonwzheng
Copy link
Contributor Author

Some more benchmarking:

SMILES # double bonds resonance on? time to generate 1,4_linear_birad_scission rxns (sec)
C=C([N+](=O)[O-]) 1 yes 1
C=C([N+](=O)[O-]) 1 no 0.01
C=C([N+](=O)[O-])C=C 2 yes 216
C=C([N+](=O)[O-])C=C 2 no 0.04
C=CC=C([N+](=O)[O-])C=C 3 yes 1451
C=CC=C([N+](=O)[O-])C=C 3 no 0.1
c1ccccc1 aromatic yes 0.6
c1ccccc1[N+](=O)[O-] aromatic yes 1320
c1ccccc1[N+](=O)[O-] aromatic no 2
c1ccc([N+](=O)[O-])cc1[N+](=O)[O-] aromatic yes did not finish after 2 hours
c1ccc([N+](=O)[O-])cc1[N+](=O)[O-] aromatic no 0.1

Just including the nitro group to benzene increases the generation time by a factor of 2000x. RMG says nitrobenzene has 6 resonance structures, but 6 choose 2 is only 15...

@WernerHerzog
Copy link

Could you please explain how to turn resonance=False? Thank you.

@jonwzheng
Copy link
Contributor Author

Right now the resonance flag is not exposed through the RMG interface, and also, right now it does not properly propagate if disabled. We have a pull request #2632 that will at least fix the latter issue. I would suggest waiting for that PR to be reviewed and introduced to the main branch, if your application isn't time-sensitive.

If time-sensitive, you could checkout the fix_generate_reactions branch used in the above PR, and then you could change everywhere where resonance=True to resonance=False. However I would advise waiting for the above PR to be reviewed and merged in if it can be helped. Also please be advised that disabling resonance will affect your kinetic estimates for all species with resonance forms.

We may want to implement a check to see if the # of resonance forms is very large, and if so, bypass/skip the resonance calcs (maybe set as a user flag to enable this feature if desired).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants