Hello Daniel, thanks for taking the time to respond to this.
What is the object you call cl_class_of on? Are you sure it is an initialized cl_object? You may try attaching gdb to the process (see src/utils/gdbinit for a useful configuration).
I'm not sure. I'm also not calling cl_class_of directly. If I look at the stack traces from core files generated from these crashes, I see the following:
1. A CLOS method is called somewhere in my program, resulting in the generic dispatch mechanism being triggered (generic_function_dispatch -> _ecl_standard_dispatch)
2. _ecl_standard_dispatch calls fill_spec_vector, presumably as part of the whole generic function dispatch mechanism (this is just what I can infer from the ECL sources; I am not sure whether I got this right)
3. fill_spec_vector seems to inspect a stack frame and pull out the types of the arguments. It calls cl_class_of(...) as part of this. This [1] is the exact line where cl_class_of is called and crashes. This is an example stack trace:
#0 0x00007fde3d402428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fde3d40402a in __GI_abort () at abort.c:89
#2 0x00007fde45c26275 in ecl_internal_error (s=s@entry=0x7fde45cb5fb5 "not a lisp data object") at /build/ecl/src/c/error.d:61
#3 0x00007fde45c117cc in cl_class_of (x=<optimized out>) at /build/ecl/src/c/instance.d:396
#4 0x00007fde45c11d54 in fill_spec_vector (frame=0x7fdc73ffc010, frame=0x7fdc73ffc010, gf=0x2c0823f0, vector=0x2a674930) at /build/ecl/src/c/gfun.d:139
#5 _ecl_standard_dispatch (frame=0x7fdc73ffc010, frame@entry=0x7fdc73ffc0a0, gf=0x2c0823f0) at /build/ecl/src/c/gfun.d:235
#6 0x00007fde45c1217d in generic_function_dispatch_vararg (narg=<optimized out>) at /build/ecl/src/c/gfun.d:272
#7 0x000000000069032f in L11store_object (narg=<optimized out>, v1obj=0x2d0ffe00, v2stream=0x2d0e5750) at /root/.cache/common-lisp/ecl-16.1.3-unknown-linux-x64/root/radio/build/src/ai/lisp-deps/cl-store/plumbing.c:276
[...] there are multiple more calls to Lxxstore_object() methods below this
I am having problems debugging this because I highly doubt that the generic function dispatch mechanism is broken (otherwise *nothing ever* would work, right?). So I think something else is causing this confusion in fill_spec_vector.
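To make that suspicion concrete, here is a toy C model of the call chain in the stack trace. It is only a sketch: every name prefixed with toy_ is hypothetical, the tag values are invented, and ECL's real cl_object layout and dispatch code are different. It shows how a correct dispatcher can still fail when one of the arguments it is handed is not a valid object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; these are NOT ECL's real tags. */
typedef enum { TOY_T_FIXNUM = 1, TOY_T_STRING = 2, TOY_T_INSTANCE = 3 } toy_type;

typedef struct {
    int tag;            /* one of toy_type when the object is valid */
    const char *klass;  /* class name, used only for TOY_T_INSTANCE */
} toy_object;

/* Stand-in for cl_class_of: returns a class name, or NULL in the case
   where the real function aborts with "not a lisp data object". */
const char *toy_class_of(const toy_object *x)
{
    switch (x->tag) {
    case TOY_T_FIXNUM:   return "FIXNUM";
    case TOY_T_STRING:   return "STRING";
    case TOY_T_INSTANCE: return x->klass;
    default:             return NULL;   /* unknown tag: not a valid object */
    }
}

/* Stand-in for fill_spec_vector: records the class of every argument in
   the frame. Returns -1 on success, or the index of the first argument
   whose tag is not recognised. */
int toy_fill_spec_vector(const toy_object *const *frame, size_t narg,
                         const char **spec)
{
    assert(frame != NULL && spec != NULL);
    for (size_t i = 0; i < narg; i++) {
        spec[i] = toy_class_of(frame[i]);
        if (spec[i] == NULL)
            return (int)i;
    }
    return -1;
}
```

In this model the dispatcher itself never changes between a good and a bad call; only the contents of the frame do. That matches the reading above that the argument reaching cl_class_of is the thing to inspect, for example by printing the frame's elements in gdb from frame #4.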
I've recently used ECL with threads disabled and all seemed to work. I would try playing with the flags (i.e. first allow use of the autodetected boehm, then skip the with-dffi flag if it still doesn't work, then remove enable-shared and at last enable-debug). If ./configure --disable-threads without any additions still crashes, then it is indeed a problem with exactly this flag.
I've compiled it with only the --disable-threads flag now and I still get the same crash in the call to GC_init() in cl_boot(). However, starting the ECL interpreter works fine and embedding ECL into a single-threaded, small example program also works.
Could it be that I am missing something when trying to embed ECL in a large C++ codebase? Do I have to worry about the Boehm GC not functioning when most of the program is not designed to use GC_MALLOC? I am also statically linking my lisp code, would that make a difference here?
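As background for the GC question: a tracing collector such as Boehm only follows pointers from memory it scans (its roots and its own heap), and memory obtained from plain malloc is not scanned by default. The toy mark phase below sketches that general property in C. It is not Boehm's implementation, all toy_ names are hypothetical, and whether this mechanism has anything to do with the crashes described here is not established.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ROOTS 8

/* A heap cell with one outgoing reference and a mark bit. */
typedef struct toy_cell {
    struct toy_cell *next;
    bool marked;
} toy_cell;

/* The collector only knows about pointer slots registered as roots. */
typedef struct {
    toy_cell **roots[TOY_MAX_ROOTS];
    size_t nroots;
} toy_gc;

void toy_add_root(toy_gc *gc, toy_cell **slot)
{
    assert(gc->nroots < TOY_MAX_ROOTS);
    gc->roots[gc->nroots++] = slot;
}

/* Mark phase: follow references starting from registered root slots only. */
void toy_mark(toy_gc *gc)
{
    for (size_t i = 0; i < gc->nroots; i++) {
        for (toy_cell *c = *gc->roots[i]; c != NULL && !c->marked; c = c->next)
            c->marked = true;
    }
}

/* A cell left unmarked is what a sweep phase would reclaim. */
bool toy_would_be_freed(const toy_cell *c)
{
    return !c->marked;
}
```

A cell referenced only from a slot the collector never scans stays unmarked, even though the program still holds a pointer to it. In a real program the rough counterpart would be a Lisp object whose only reference lives in a C++ data structure allocated outside the collector; Boehm offers GC_add_roots and uncollectable allocation for such cases, though how ECL expects embedders to handle this is a question for the ECL documentation rather than something this sketch settles.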
[1] https://gitlab.com/embeddable-common-lisp/ecl/blob/4c3dcfdbd52e427910486b2c1...
Thanks, Dennis
Daniel Kochmański daniel@turtleware.eu writes:
On Tue, 04 Jun 2019 20:22:48 -0400 Dennis Ogbe do@ogbe.net wrote:
Hello Daniel,
thanks for your reply, that's about what I expected. It's not a secret at all---My team and I (a bunch of graduate students) are building an "intelligent" radio network using software-defined radios. The source is not opened--yet--since we are competing as part of a DARPA Grand Challenge [1].
While I have you here: I am currently fighting a strange bug that crashes my process. I am still in the phase where its occurrences seem random to me, so I can't tell you how to reproduce it, but the crashes seem localized to the if statement in fill_spec_vector in src/c/gfun.d--the call to cl_class_of() crashes with an unrecoverable error "not a lisp data object".
What is the object you call cl_class_of on? Are you sure it is an initialized cl_object? You may try attaching gdb to the process (see src/utils/gdbinit for a useful configuration).
Since I've seen merge requests like [2] I wanted to try to disable threading, since I won't be using it. But when I compile ecl with
./configure --enable-shared --enable-threads=no --enable-boehm=included --with-dffi --enable-debug=yes
I now crash in cl_boot in a GC function (GC_push_all_eager)! Is building without threads supposed to work or am I trying the wrong thing here? My original problem (the crash in fill_spec_vector) only happens about 1/500 times I call the offending function (it's the store function from cl-store), and I am still investigating what the culprit could be. If you have any thoughts--I'd appreciate it!
I've recently used ECL with threads disabled and all seemed to work. I would try playing with the flags (i.e. first allow use of the autodetected boehm, then skip the with-dffi flag if it still doesn't work, then remove enable-shared and at last enable-debug). If ./configure --disable-threads without any additions still crashes, then it is indeed a problem with exactly this flag.
Thanks, Dennis
Regards, Daniel
[1] https://www.spectrumcollaborationchallenge.com/
[2] https://gitlab.com/embeddable-common-lisp/ecl/merge_requests/100
Daniel Kochmański daniel@turtleware.eu writes:
Hello Dennis,
On Mon, 2019-06-03 at 20:02 -0400, Dennis Ogbe wrote:
Hello,
I am working on embedding ECL in a reasonably-sized C++ program and I have been using v16.1.3 until now, since it seems like this is the latest official release.
Yes, 16.1.3 is the latest official release.
However, it seems like there is a lot of activity and bug fixes in the develop branch and I already ran into a few bugs (for example [1]) that are fixed in develop, but are not in any release. The documentation also seems to overlap more with the develop branch than the latest release.
That is also true, we are working on the next release and we expect to make the new one soon™ (only a few tasks are left to implement).
In your opinion, what is the best and most stable ECL version to use as of June 2019? I have some reservations about simply picking a random commit from a dev branch, so I wanted to reach out and ask y'all directly.
There is no good answer for that. While the develop branch indeed has many improvements in the form of bug fixes and new (dare I say – exciting) features, it is only loosely tested. Before each release we work hard to test the release candidate against a wide variety of operating systems, architectures and libraries (cl-test-grid is an invaluable help with that) and try to fix regressions. If you feel adventurous, just pick the develop branch; we do not commit half-baked things there (only stuff which we are certain about or which was the subject of a peer review / testing around the thing being changed) - it is fairly stable. But there is no guarantee that you won't hit some ugly regression we are not aware of yet. Otherwise you may try to live with 16.1.3 until we release the new 16.2.0 version – hopefully within a few months from now.
Thanks for all the hard work, this project is great!
That's very kind of you to say. If it is not a secret, what are you working on?
[1] https://gitlab.com/embeddable-common-lisp/ecl/issues/418
Best regards, Daniel