On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański daniel@turtleware.eu wrote:
I don't think it's related to shared vs static - rather two GCs running concurrently. Try commenting out the GC_init call in ECL and see what happens.
I don't understand how two GCs can run concurrently on a memory region controlled by ECL which is statically linked to GC... In fact I am pretty sure no other instances of GC are running anywhere within our process tree.
By the way, I don't know whether it's obvious from the backtrace that cl_boot() has been completed, or not.
If it actually was completed, could it be a bug that invalidates the bit indicating that cl_boot() has been done?
We have seen similar troubles with clang recently, related to FPE. There, an FPE bit was flipped by assignment of a double to an integer type (sic!). It took us a lot of head banging on various hard surfaces to debug this: https://trac.sagemath.org/ticket/22799. It turned out we had hit a known bug: https://bugs.llvm.org//show_bug.cgi?id=17686
Do you need sigchld for anything? Run-program was rewritten and sigchld handling wasn't a viable option for it anymore.
We do set ECL_OPT_TRAP_SIGCHLD to 0, thus I presume we can now simply skip it altogether.
Thanks, Dima
I'm on my phone; I will be available after the weekend.
Regards, D.
On 1 September 2017 14:47:57 CEST, Dima Pasechnik dimpase+ecl@gmail.com wrote:
Hi Daniel, Thanks for the message. The scenario you talk about only happens if GC is a shared library, right?
I've rebuilt GC with shared libs disabled, and ECL statically linked to GC. And I still get very similar segfaults:
;;; ECL C Backtrace
;;; 0 ecl_internal_error (0x87d79b375)
;;; 1 init_unixint (0x87d7c17e0)
;;; 2 init_unixint (0x87d7c1582)
;;; 3 pthread_sigmask (0x80103779d)
;;; 4 pthread_getspecific (0x801036d6f)
;;; 5 unknown (0x7ffffffff193)
;;; 6 GC_push_current_stack (0x87d7ef7c3)
;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)
;;; 8 GC_push_roots (0x87d7ef9c2)
;;; 9 GC_mark_some (0x87d7ec97c)
;;; 10 GC_stopped_mark (0x87d7e6b7a)
;;; 11 GC_try_to_collect_inner (0x87d7e6a75)
;;; 12 GC_init (0x87d7f08ea)
;;; 13 init_alloc (0x87d7d5669)
;;; 14 cl_boot (0x87d69f66b)
...
And a very similar picture on the develop branch of ECL - although I had to change our code, as in particular ECL_OPT_TRAP_SIGCHLD is gone...
So, what can it be? Some signals issue?
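One way to narrow down the "signals issue" question from the Python side is to compare the thread's signal mask before and after the extension module is loaded. The following is only a minimal sketch, assuming Python 3 on a POSIX system; `blocked_signals`, `mask_change_around` and `load_module` are hypothetical names made up for this illustration, and the actual import is left as a placeholder.

```python
import signal


def blocked_signals():
    """Return the set of signals currently blocked in the calling thread."""
    # Blocking an empty set changes nothing and returns the current mask.
    return set(signal.pthread_sigmask(signal.SIG_BLOCK, []))


def mask_change_around(action):
    """Run action() and report which signals it left newly blocked or unblocked."""
    before = blocked_signals()
    action()
    after = blocked_signals()
    return after - before, before - after


if __name__ == "__main__":
    def load_module():
        # Hypothetical placeholder: import the ECL-backed extension module here.
        pass

    newly_blocked, newly_unblocked = mask_change_around(load_module)
    print("newly blocked:", sorted(newly_blocked))
    print("newly unblocked:", sorted(newly_unblocked))
```

This only observes the mask of the calling thread; it says nothing about which handlers a C library installed, and it cannot report anything if the process dies during the import itself.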
Thanks, Dima
On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański daniel@turtleware.eu wrote:
Hey Dima,
this looks like the issue with having GC initialized before ECL kicks in. See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a discussion about this problem. Basically some other component already called GC_init and ECL calls it once more. It's arguably not a bug.
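A rough way to check from the embedding Python process whether some other component has already brought a GC into the process is to look for its symbols before importing the ECL module. This is a sketch under stated assumptions: Python 3 on a POSIX system where `ctypes.CDLL(None)` wraps `dlopen(NULL)`, and `symbol_visible` is a hypothetical helper name. A negative answer is inconclusive, since libraries loaded with local symbol visibility (or linked statically with hidden symbols) will not show up, and a positive answer only shows the symbol is present, not that GC_init was actually called.

```python
import ctypes


def symbol_visible(name):
    """Return True if `name` resolves in the process's global symbol scope."""
    # CDLL(None) gives a handle to the running process itself (POSIX assumption).
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)  # triggers a symbol lookup by name
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # Run this before importing the ECL-backed module.
    for name in ("GC_init", "GC_malloc"):
        print(name, "already visible:", symbol_visible(name))
```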
Best regards,
Daniel
On 31.08.2017 15:29, Dima Pasechnik wrote:
Dear all,
I'm struggling to understand strange segfaults coming from ECL(+Maxima) on FreeBSD embedded into Python; they typically look as follows:
Got signal before environment was installed on our thread [2: No such file or directory]
;;; ECL C Backtrace
;;; 0 ecl_internal_error (0x87d790765)
;;; 1 init_unixint (0x87d7b6bd0)
;;; 2 init_unixint (0x87d7b6972)
;;; 3 pthread_sigmask (0x80103779d)
;;; 4 pthread_getspecific (0x801036d6f)
;;; 5 unknown (0x7ffffffff193)
;;; 6 GC_push_all_stacks (0x87db1ea2c)
;;; 7 GC_mark_some (0x87db12eec)
;;; 8 GC_stopped_mark (0x87db09baa)
;;; 9 GC_try_to_collect_inner (0x87db09a75)
;;; 10 GC_init (0x87db16f4f)
;;; 11 init_alloc (0x87d7caa59)
;;; 12 cl_boot (0x87d694a5b)
;;; 13 initecl (0x87d218340)
;;; 14 initecl (0x87d20a43f)
;;; 15 initecl (0x87d207e28)
;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)
;;; 17 PyImport_AppendInittab (0x800b3d71f)
;;; 18 PyImport_AppendInittab (0x800b3d1a8)
;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)
;;; 20 _PyBuiltin_Init (0x800b162d7)
;;; 21 PyObject_Call (0x800a7d3e3)
;;; 22 PyEval_EvalFrameEx (0x800b2121c)
;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)
;;; 24 PyEval_EvalCode (0x800b1ad96)
;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)
;;; 26 PyImport_AppendInittab (0x800b3ddb8)
;;; 27 PyImport_AppendInittab (0x800b3d71f)
;;; 28 PyImport_AppendInittab (0x800b3d1a8)
;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)
;;; 30 _PyBuiltin_Init (0x800b162d7)
;;; 31 PyEval_EvalFrameEx (0x800b22dd1)
Segmentation fault (core dumped)
It looks as if ECL (version 16.1.2) is being called before an initialisation is complete, but is it possible to say more without a debugger?
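On the "without a debugger" question, one low-effort option on the Python side is the standard `faulthandler` module, which dumps the Python-level traceback when the process receives a fatal signal. This is a sketch assuming Python 3 (the trace above comes from a Python 2 build, where `faulthandler` is not in the standard library); `enable_crash_tracebacks` is a hypothetical helper name. Since ECL installs signal handlers of its own during startup, it is not guaranteed that this handler still fires for a crash inside cl_boot().

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Ask CPython to dump Python tracebacks on fatal signals; return True if active."""
    # faulthandler covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    # all_threads=True includes every Python thread in the dump.
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_tracebacks()
    # Import the ECL-backed extension module after this point
    # (module name deliberately left out of this sketch).
```

The dump shows which Python import or call was running when the signal arrived, which complements the C backtrace printed by ECL.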
More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0 with libatomic_ops version 7.4.6. It is only reproducible on FreeBSD.
ECL is built with --disable-threads; GC is built with or without threads, and the result is the same (so it's unclear to me where the pthread_* calls in the trace come from).
Thanks, Dima
PS. the segfault is at the bottom of https://trac.sagemath.org/ticket/22679#comment:87
-- Sent with K-9 Mail.