I am trying to get used to Code_Saturne and Syrthes and I wanted to do the tutorial called “3disks2d”.
The first part concerns a computation with Syrthes alone and this is where I have a problem.
Setting up the computation goes well, but when I launch the simulation it gets stuck at 14% at “Conduction Initialization” (see attached file).
Does anyone know what is wrong? Is this a common problem?
I used the files you uploaded to set up your case on my machine. However, it doesn’t run, and it is not clear why. So I set up a new case using the mesh you supplied. This runs fine and gives the temperature distribution in the three discs. The attached file contains the run on my machine, which uses Syrthes V4.1 on Ubuntu 13.04.
Could you please try this case on your machine and let me know what happens?
SYRTHES4 home directory: /opt/syrthes4.1.1-ubuntu/arch/Linux_x86_64
MPI home directory: /opt/syrthes4.1.1-ubuntu/extern-libraries/opt/openmpi-1.4.3/arch/Linux_x86_64
-----------------------------------
Prepare SYRTHES execution directory
-----------------------------------
Building the executable file syrthes..
ar xv /opt/syrthes4.1.1-ubuntu/arch/Linux_x86_64/lib/libsyrthes_seq.a mainsyrthes.o
x - mainsyrthes.o
gcc -o syrthes -O3 -D _FILE_OFFSET_BITS=64 -D_FILE_OFFSET_BITS=64 \
-I/opt/syrthes4.1.1-ubuntu/arch/Linux_x86_64/include -I/opt/syrthes4.1.1-ubuntu/arch/Linux_x86_64/bib_material_syrthes -D _FILE_OFFSET_BITS=64 *.o \
/opt/syrthes4.1.1-ubuntu/arch/Linux_x86_64/lib/libsyrthes_seq.a -lm
***** SYRTHES compilation and link completed *****
SyrthesCase summary:
Name = SYR
Data file = Test_renuda.syd
Update Data file = True
Do preprocessing = True
Debug = False
Case dir. = /home/stefan/test/3disks2D/solid
Execution dir. = /home/stefan/test/3disks2D/solid
Data dir. = /home/stefan/test/3disks2D/solid
Source dir. = /home/stefan/test/3disks2D/solid
Post dir. = /home/stefan/test/3disks2D/solid/POST
Conduction mesh dir. = /home/stefan/test/3disks2D/solid/
Conduction mesh name = 3rond2d.syr
Total num. of processes = 1
Logfile name = /home/stefan/test/3disks2D/solid/listing_syrthes
Echo = True
Parallel run = False
Do preprocessing = True
SyrthesParam summary
Param file name = Test_renuda.syd
Conduction mesh name = 3rond2d.syr
Radiation mesh name = None
Result prefix. = 3tond2d
Restart = False
Coupling = False
Interpreted functions = False
---------------------------
Start SYRTHES preprocessing
---------------------------
Updating the mesh file name..
-> OK
-------------------------
Start SYRTHES computation
-------------------------
Execution of SYRTHES..
-> number of processors for conduction = 1
Segmentation fault (core dumped)
Error while running syrthes
Stop Syrthes execution.
Is there a way to compile syrthes with debugging enabled, to see where the crash occurs?
Yes, in the setup.ini of Syrthes, it is possible to add debug options (I have not done it recently, but have done it in the past).
Otherwise, even without a debug build, running under Valgrind (if the run is small enough) or under a debugger should at least produce a stack trace: without line numbers, but at least with the function names. That is a start, and may give some insight into the crash.
To find the exact command line (which you will need to adapt for a debugger), look at the “run_solver” script in the execution directory.
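A minimal sketch of the approach described above: the commands below run the solver under gdb (batch mode, printing a backtrace once it crashes) and under Valgrind. They assume gdb and Valgrind are installed, that you run them from the execution directory, and that the solver arguments are the ones shown on the “Command:” line of the Valgrind log in this thread (./syrthes -d tmp.data --log ...); check run_solver for the exact command on your own setup. The function names syrthes_backtrace and syrthes_memcheck are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; solver arguments copied from the "Command:" line of the
# Valgrind log. Override LOG to point at your own case directory.
LOG=${LOG:-/home/stefan/test/3disks2D/solid/listing_syrthes}

# Run the solver until it crashes, then print a backtrace (gdb batch mode).
# Without -g, frames show function names but no source line numbers.
syrthes_backtrace() {
    gdb -batch -ex run -ex bt --args ./syrthes -d tmp.data --log "$LOG"
}

# Memcheck run; --track-origins=yes also reports where uninitialised values
# came from (slower, so best kept for small cases).
syrthes_memcheck() {
    valgrind --track-origins=yes --leak-check=full \
        ./syrthes -d tmp.data --log "$LOG"
}

# Only run when a solver executable is present in the current directory.
if [ -x ./syrthes ]; then
    syrthes_backtrace
else
    echo "No ./syrthes here: run this from the SYRTHES execution directory." >&2
fi
```

Calling syrthes_memcheck instead gives the Valgrind report. An interactive session (gdb --args followed by the same arguments, then run and bt at the gdb prompt) gives the same backtrace.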
==13130== Memcheck, a memory error detector
==13130== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==13130== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==13130== Command: ./syrthes -d tmp.data --log /home/stefan/test/3disks2D/solid/listing_syrthes
==13130==
==13130== Invalid read of size 1
==13130== at 0x442994: rep_listint (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x437A19: decode_prophy (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x43B563: lire_donnees (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x402157: syrthes (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x401611: main (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== Address 0x6d9000 is not stack'd, malloc'd or (recently) free'd
==13130==
==13130==
==13130== Process terminating with default action of signal 11 (SIGSEGV)
==13130== Access not within mapped region at address 0x6D9000
==13130== at 0x442994: rep_listint (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x437A19: decode_prophy (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x43B563: lire_donnees (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x402157: syrthes (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x401611: main (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== If you believe this happened as a result of a stack
==13130== overflow in your program's main thread (unlikely but
==13130== possible), you can try to increase the size of the
==13130== main thread stack using the --main-stacksize= flag.
==13130== The main thread stack size used in this run was 8388608.
==13130==
==13130== HEAP SUMMARY:
==13130== in use at exit: 198,726 bytes in 36 blocks
==13130== total heap usage: 4,574 allocs, 4,538 frees, 284,926 bytes allocated
==13130==
==13130== 16 bytes in 1 blocks are definitely lost in loss record 6 of 27
==13130== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13130== by 0x43340A: verif_maill (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x43388C: lire_maill (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x402020: syrthes (in /home/stefan/test/3disks2D/solid/syrthes)
==13130== by 0x401611: main (in /home/stefan/test/3disks2D/solid/syrthes)
==13130==
==13130== LEAK SUMMARY:
==13130== definitely lost: 16 bytes in 1 blocks
==13130== indirectly lost: 0 bytes in 0 blocks
==13130== possibly lost: 0 bytes in 0 blocks
==13130== still reachable: 198,710 bytes in 35 blocks
==13130== suppressed: 0 bytes in 0 blocks
==13130== Reachable blocks (those to which a pointer was found) are not shown.
==13130== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==13130==
==13130== For counts of detected and suppressed errors, rerun with: -v
==13130== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
Having read the output, and if I have understood it correctly, it appears that SYRTHES is trying to access something in rep_listint that is not in mapped memory, hence the SIGSEGV. Could you try Valgrind’s --main-stacksize=10000000 option (greater than the 8388608 used in that run), or a larger value still, and let me know what happens?
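One quick check on the log above: the faulting address 0x6D9000 is an exact multiple of the page size (4 KiB assumed, the usual value on x86_64 Linux). Together with “Invalid read of size 1”, this is consistent with a byte-by-byte read running past the end of a buffer into the first unmapped page. The address also lies just above the executable’s code (0x4xxxxx), far from where the main-thread stack normally lives on x86_64 Linux, and the crash happens inside lire_donnees (French for “read data”), i.e. while the data file is being read. This is an observation about the log, not a diagnosis of rep_listint itself; the helper name is_page_aligned is hypothetical.

```shell
#!/bin/sh
# Assumed page size: 4 KiB, the usual value on x86_64 Linux
# (`getconf PAGESIZE` prints the real one on your machine).
PAGE=4096

# Hypothetical helper: prints "yes" if the address (decimal or 0x-prefixed hex)
# is the first byte of a page, "no" otherwise.
is_page_aligned() {
    if [ $(( $1 % $PAGE )) -eq 0 ]; then
        echo yes
    else
        echo no
    fi
}

is_page_aligned 0x6D9000    # faulting address from the Valgrind log: prints yes
is_page_aligned 0x442994    # address of the faulting instruction: prints no
```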
==57075== Memcheck, a memory error detector
==57075== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==57075== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==57075== Command: ./syrthes -d tmp.data --log /home/stefan/test/3disks2D/solid/listing_syrthes
==57075==
==57075== Invalid read of size 1
==57075== at 0x442994: rep_listint (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x437A19: decode_prophy (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x43B563: lire_donnees (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x402157: syrthes (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x401611: main (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== Address 0x6d9000 is not stack'd, malloc'd or (recently) free'd
==57075==
==57075==
==57075== Process terminating with default action of signal 11 (SIGSEGV)
==57075== Access not within mapped region at address 0x6D9000
==57075== at 0x442994: rep_listint (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x437A19: decode_prophy (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x43B563: lire_donnees (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x402157: syrthes (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x401611: main (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== If you believe this happened as a result of a stack
==57075== overflow in your program's main thread (unlikely but
==57075== possible), you can try to increase the size of the
==57075== main thread stack using the --main-stacksize= flag.
==57075== The main thread stack size used in this run was 100003840.
==57075==
==57075== HEAP SUMMARY:
==57075== in use at exit: 198,725 bytes in 36 blocks
==57075== total heap usage: 4,574 allocs, 4,538 frees, 284,925 bytes allocated
==57075==
==57075== 16 bytes in 1 blocks are definitely lost in loss record 6 of 27
==57075== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==57075== by 0x43340A: verif_maill (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x43388C: lire_maill (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x402020: syrthes (in /home/stefan/test/3disks2D/solid/syrthes)
==57075== by 0x401611: main (in /home/stefan/test/3disks2D/solid/syrthes)
==57075==
==57075== LEAK SUMMARY:
==57075== definitely lost: 16 bytes in 1 blocks
==57075== indirectly lost: 0 bytes in 0 blocks
==57075== possibly lost: 0 bytes in 0 blocks
==57075== still reachable: 198,709 bytes in 35 blocks
==57075== suppressed: 0 bytes in 0 blocks
==57075== Reachable blocks (those to which a pointer was found) are not shown.
==57075== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==57075==
==57075== For counts of detected and suppressed errors, rerun with: -v
==57075== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 1 from 1)
Segmentation fault (core dumped)
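The stack sizes reported in the two runs can be cross-checked with a little arithmetic. The first run used 8388608 bytes, which is 8192 KiB, a common `ulimit -s` value. The second reports 100003840, which is exactly 100000000 rounded up to a whole number of 4 KiB pages; this suggests the run used --main-stacksize=100000000 (the rounding is inferred from the numbers, not taken from Valgrind’s documentation). With roughly twelve times more stack, the crash is identical (same function, same address 0x6D9000), so the stack size does not appear to be the limiting factor. The page size and the helper name page_roundup are assumptions.

```shell
#!/bin/sh
PAGE=4096   # assumed 4 KiB pages

# Hypothetical helper: round a byte count up to a whole number of pages.
page_roundup() {
    echo $(( ($1 + $PAGE - 1) / $PAGE * $PAGE ))
}

# First run: an 8192 KiB stack limit expressed in bytes.
echo $(( 8192 * 1024 ))          # prints 8388608

# Second run: assumed request of --main-stacksize=100000000.
page_roundup 100000000           # prints 100003840
```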
My best guess is that the memory is leaking pretty badly in one of the routines as it eats the entire stack.
I had the same “stuck at 14%” problem in my first steps with SYRTHES, until I noticed the domain partitioning setting. In syrthes.gui I had to choose “METIS”, because “SCOTCH” did not work for me.
It may be only a small hint, but I decided to mention it because of the 14%.
Thanks for the suggestion. I gave it a try, but I still get a segmentation fault regardless of which domain partitioning option I choose. I am running on 1 processor, so I am not sure any domain partitioning is used at all.
I ran into exactly the same problem: I recently installed SYRTHES 4.1.1 and was following the tutorials to get used to it. Did anybody find out how to proceed? In case it helps, the progress bar value reached before the error varies with the number of processors used (1 CPU = 14%, 2 = 35%, 3 = 71%, 4 = 92%).