SIGTERM signal in ALE calculation

Dear all,

I ran a calculation using the routine “cs_user_boundary_conditions_ale.f90”, which allows me to move my mesh.
An error that does not seem to be directly related to my setup stops the calculation after a while, and I can’t identify why. The error appears at what seem to be random moments, and I don’t notice anything unusual at those moments, either in the behaviour of the fluid or in the mesh. In addition, the calculation converges appropriately.

Below is the information I have on the error that stops the calculation:

solver script exited with status 137.

Error running the calculation.

Check code_saturne log (listing) and error* files for details.

Error in calculation stage.



Parallel code_saturne on 12 processes.

Preprocessing calculation

Starting calculation


mpiexec noticed that process rank 11 with PID 21217 on node node113 exited on signal 9 (Killed).

Post-calculation operations



SIGTERM signal (termination) received.
→ computation interrupted by environment.

Call stack:
1: 0x2b0157aa98e3 <+0x1578e3> (libopen-pal.so.20)
2: 0x2b015798fb39 <opal_progress+0xb9> (libopen-pal.so.20)
3: 0x2b015457254d <mca_pml_ucx_recv+0xdd> (libmpi.so.20)
4: 0x2b01544adeac <ompi_coll_base_allreduce_intra_recursivedoubling+0x4dc> (libmpi.so.20)
5: 0x2b01544779d3 <PMPI_Allreduce+0x173> (libmpi.so.20)
6: 0x2b01519a71ab <cs_gdot+0x4b> (libsaturne-7.0.so)
7: 0x2b015168feaa <cs_equation_iterative_solve_vector+0xb8a> (libsaturne-7.0.so)
8: 0x2b0151658dd2 <+0xffdd2> (libsaturne-7.0.so)
9: 0x2b0151774404 <navstv_+0x48a2> (libsaturne-7.0.so)
10: 0x2b01517a098a <tridim_+0x370b> (libsaturne-7.0.so)
11: 0x2b0151612d8b <caltri_+0x1c7b> (libsaturne-7.0.so)
12: 0x2b015082cddb <main+0x6eb> (libcs_solver-7.0.so)
13: 0x2b0155fc8555 <__libc_start_main+0xf5> (libc.so.6)
14: 0x401b49 <> (cs_solver)
End of stack

If you have any idea where the problem comes from and how to fix it, that would be very helpful.

Best regards,
Roxan

Hello,

What do the error* files say (see the forum recommendations for a list of other recommended info)?

Regards,

Yvan

Hello,

I only have one error file, containing the “SIGTERM signal” error, and I can’t find my problem in the user guide or in other forum topics. The error only appears in the run_solver.log of one processor.

In my output I can see that the calculation stops with status 137, but I don’t know what that means.

Regards,

Roxan

Hello,

SIGTERM means killed by the environment, such as when hitting CTRL+C. It could also happen in some cases if you run out of allocated time. Here it is surprising.

Are you running a production build or a build configured with --enable-debug?
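
Regarding the status 137 reported above: assuming the usual shell convention that a status above 128 encodes 128 plus a signal number, it can be decoded with a few lines of Python. This is only a minimal sketch; the helper name decode_exit_status is hypothetical and not part of code_saturne, and the signal names are those of the system where the script runs (Linux assumed here).

```python
import signal


def decode_exit_status(status):
    """Describe a shell-style exit status.

    Assumes the common shell convention that a status above 128
    means "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            # Map the number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # Status reported by the solver script in the first post.
    print(decode_exit_status(137))
```

Under that convention, 137 corresponds to signal 9 (SIGKILL), which is consistent with the mpiexec message that rank 11 exited on signal 9; the SIGTERM seen in the error file would then be the remaining ranks being stopped afterwards.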

Best regards,

Yvan