[Core] Python garbage collection order differes and segfaults depending on whether there is a Process or not #12207

sunethwarna · 2024-03-19T09:31:27Z

Description
We found a strange behaviour in gc in python. If a kratos simulation is created without any processes, then the order of destruction in python is:

The Kratos::Model
The Kratos::SolutionStrategy
This order is wrong, but it does not harm anyone since most of the destructors does not do anything with the Kratos::Model or 99.9% time, we have at least one process.

Kratos/applications/MeshMovingApplication/custom_strategies/strategies/structural_meshmoving_strategy.h

Lines 126 to 131 in 3052722

    
           virtual ~StructuralMeshMovingStrategy() 
        
           { 
        
             Model& owner_model = mpmesh_model_part->GetModel(); 
        
             std::string name = mpmesh_model_part->Name(); 
        
             owner_model.DeleteModelPart(name); 
        
           }

But the StructuralMeshMovingStrategy is deleting a model part which was created by it as shown above, hence if no process is used, then this causes a seg fault in the release, and sometimes a proper error in debug (Not using a process is a valid practical case, where I set the BCs for MESH_DISPLACEMENT for the model part from some where else rather than using a Process).

But if we use a single process (a dummy one even), then the python gc destruction order is correct as below.

The Kratos::Strategy
The Kratos::Model

@matekelemen and I was trying to figure out why not having a process cause this change of gc order, but was not succesful. So we are looking for some help on figuring out why this happens.

I am attaching an example with a dummy process (1_correct_gc_collection_order) which doesn't thorw an error. The second example (2_incorrect_gc_collection_order) will seg fault in release, or throw a warning in debug "Attempting to delete non-existent modelpart : test_MeshPart").

It is only with MeshMovingApplication, we tested the same with StructuralMechanicsApplication as well. This same problem is there as well. But it is not noticable, because, StructuralMechanicsApplication strategies does not do anything to the model or model part in the destructors.

Scope

Kratos applications or Core (not sure where the error originates from)

To Reproduce
Run the two test cases. 1_correct_gc_collection_order will not give any troubles, but 2_incorrect_gc_collection_order will segfault in release.
gc_example.zip

Expected behavior
2_incorrect_gc_collection_order should not segfault.

Environment

OS: Manjaro
Branch: master
Python 3.11

The text was updated successfully, but these errors were encountered:

philbucher · 2024-03-19T11:04:05Z

I think this should be refactored anyway (the meshmoving strategies)

How does the ConnnectivityPreserveModeler handle this?
Basically we could use it in the MeshMovingStrat, instead of the custom code for creating the ModelPart.

For the time being I think one can safely remove this delete

sunethwarna · 2024-03-19T11:08:05Z

@philbucher , The ConnectivityPreserveModeller does not delete any model parts it creates, so those are destrcuted when the Model is destructed. Hence, there won't be a problem.

I think this should be refactored anyway (the meshmoving strategies)

I thought the same, but I am curious why this is happening in the first place :)

roigcarlo · 2024-03-19T11:18:33Z

We had a similar problem in MPI (btw can confirm this is happening).

The problem is that we do not have any mechanism implemented in Kratos to enforce the destroy order of objects, so, in the example you send and in the strategy python decides that model has priority.

For this particular case the only clean solution I see, without having to change how Kratos works underneath is to enable the model to be used as a context. Example (taking you code, not 100% sure it works):

def main():
    with Kratos.Model() as model:
        model_part = model.CreateModelPart("test")
        model_part.ProcessInfo[Kratos.DOMAIN_SIZE] = 3
    
        params = Kratos.Parameters("""{
                "problem_data": {
                    "echo_level"   : 0,
                    "parallel_type": "OpenMP",
                    "start_time"   : 0.0,
                    "end_time"     : 1.0
                },
                "solver_settings" : {
                    "domain_size"            : 3,
                    "echo_level"             : 0,
                    "solver_type"            : "structural_similarity",
                    "model_part_name"        : "test",
                    "compute_reactions"      : false,
                    "calculate_mesh_velocity": false,
                    "model_import_settings"              : {
                        "input_type"     : "use_input_model_part"
                    },
                    "time_stepping" : {
                        "time_step"       : 1.1
                    }
                }
        }""")
        mesh_moving_analysis = MeshMovingAnalysis(model, params)
        ReadModelPart("../solid", model_part)
        mesh_moving_analysis.Initialize()
        mesh_moving_analysis.Finalize()
        mesh_moving_analysis.Clear()
    # Going out of context here ensrues that mesh_moving_analysis is deleted before model

    print("Finished everything.")

sunethwarna · 2024-03-19T12:06:19Z

Hmm.. thats a solution may be we can promote more throughout the Kratos? But this won't work on cases such as unit tests (or any other place), where we create the model somewhere else and read the model part once, and use startegies (That is where this problem orignated for me) :/ At the moment, i forcefully use del to force destruction order which is not nice :/

matekelemen · 2024-03-19T14:16:08Z

Just here to mention that this behaviour is not unique to MeshMovingStrategy, but all strategies (or at least the ones used by structural mechanics).

But this won't work on cases such as unit tests (or any other place), where we create the model somewhere else and read the model part once, and use startegies

Not particularly nice, but you could just explicitly call Model.__enter__() in the unit test's setUp, and have a matching Model.__exit__() in tearDown. That said, I'd just rather fix the ownership relations of our bindings. I'm not entirely sure how to go about it though.

sunethwarna added the Bug label Mar 19, 2024

sunethwarna assigned RiccardoRossi, pooyan-dadvand, roigcarlo, rubenzorrilla, philbucher, loumalouomega and matekelemen Mar 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core] Python garbage collection order differes and segfaults depending on whether there is a Process or not #12207

[Core] Python garbage collection order differes and segfaults depending on whether there is a Process or not #12207

sunethwarna commented Mar 19, 2024 •

edited

philbucher commented Mar 19, 2024

sunethwarna commented Mar 19, 2024 •

edited

roigcarlo commented Mar 19, 2024 •

edited

sunethwarna commented Mar 19, 2024 •

edited

matekelemen commented Mar 19, 2024

[Core] Python garbage collection order differes and segfaults depending on whether there is a Process or not #12207

[Core] Python garbage collection order differes and segfaults depending on whether there is a Process or not #12207

Comments

sunethwarna commented Mar 19, 2024 • edited

philbucher commented Mar 19, 2024

sunethwarna commented Mar 19, 2024 • edited

roigcarlo commented Mar 19, 2024 • edited

sunethwarna commented Mar 19, 2024 • edited

matekelemen commented Mar 19, 2024

sunethwarna commented Mar 19, 2024 •

edited

sunethwarna commented Mar 19, 2024 •

edited

roigcarlo commented Mar 19, 2024 •

edited

sunethwarna commented Mar 19, 2024 •

edited