I note that Edit Definition (meta-dot) in SLIME is inaccurate in positioning "point" when the buffer (file) has been modified by adding, removing, or rearranging parts of it.
I recognize that this is, in general, a difficult problem.
Is the "pragmatic solution" to do "Compile/Load File" when changes to the buffer make the source code location information useless?
Back in the "good old days" with (1 MHz) Symbolics Lisp Machines and the ZWEI editor, "buffer sectionization" did a pretty good job of keeping track of the source code locations. This allowed top-level whitespace changes and modifications of function definitions to be made without losing track of the source code location.
Are there any thoughts about making slime.el smarter about changes to the Lisp source code?
Lynn Quam quam@ai.sri.com writes:
Back in the "good old days" with (1 MHz) Symbolics Lisp Machines and the ZWEI editor, "buffer sectionization" did a pretty good job of keeping track of the source code locations. This allowed top-level whitespace changes and modifications of function definitions to be made without losing track of the source code location.
How does that work?
Are there any thoughts about making slime.el smarter about changes to the Lisp source code?
In the past we've talked about it a little over at http://lists.metacircles.com/pipermail/navel/2003-October/thread.html
The floor is open though. It's an important problem and so far we have no solution. We have the added complication that source location references are handled very differently between Lisp implementations.
-Luke
Ahoy,
I've hacked M-. in CMUCL to handle some cases of file/buffer modification better. Some of the mechanism is generic and might be useful for more backends in future.
I've tested all the cases I can think of and checked it in. I haven't given it as much real-usage testing as I'd like, but I'm in a hurry to free up my Sunday for beer drinking :-)
Background: The Lisp side of M-. in CMUCL works by finding out the filename and "source path" of the definition and then opening the file and doing some special READ-magic to find the exact character position of the definition. It then sends this position to Emacs, which does a `goto-char' to get there. This works extremely well if neither the file nor the Emacs buffer has been modified, but otherwise it gets into trouble.
First consider the case where the source file hasn't been changed but the Emacs buffer has. The new idea is: since Lisp pulls up the source and finds the exact right position, how about grabbing the opening snippet of the function (a hundred bytes, say) and sending it to Emacs as a "hint"? Then Emacs can jump to the hopefully-right position and then do a bi-directional isearch for the longest match of the actual function prelude.
This seems to work very well. It's based on the seemingly reasonable assumption that if the definition of BAR started with "(define-foo bar ..." before then it probably still does now. The same trick is used for interactively compiled definitions.
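To make that concrete, here's a simplified and untested sketch of the Emacs side of the idea. The real code searches for the longest matching prefix of the snippet rather than the whole thing, and the function name here is made up:

  ;; Untested sketch: jump to the reported POSITION, then move to the
  ;; closest occurrence of SNIPPET, searching in both directions.
  (defun my-goto-nearest-snippet (position snippet)
    (goto-char position)
    (let* ((regexp (regexp-quote snippet))
           (fwd (save-excursion
                  (and (re-search-forward regexp nil t) (match-beginning 0))))
           (bwd (save-excursion
                  (and (re-search-backward regexp nil t) (match-beginning 0)))))
      (cond ((and fwd bwd)
             (goto-char (if (<= (- fwd position) (- position bwd)) fwd bwd)))
            (fwd (goto-char fwd))
            (bwd (goto-char bwd)))))

  ;; e.g. (my-goto-nearest-snippet 4711 "(defun bar (")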
This solution seems near ideal, but it only works when Lisp can find the source file version that the code was compiled from. If it ever finds such a version then it caches the whole file in the Lisp image so that the next `C-x C-s' won't spoil the fun. Files will go into the cache the first time you M-. into them -- probably it would be better to suck them in as soon as they enter slime-mode, but I haven't tried that yet.
Now for the case where the file on disk has been changed. Source-paths can tolerate a bit of this, but it gets nasty easily; e.g. M-. on CMUCL symbols usually doesn't work for me because I run random binaries but always read the CVS sources, and I usually land one or two defuns away from the real definition.
For this case it now just detects that the file is modified and falls back to regexp-based search, the same way e.g. the OpenMCL backend works. This seems to work pretty well. I also tweaked the regexps a bit, hopefully for the better.
With luck this will make M-. going to the wrong place a rare occurrence for CMUCL users. If problems persist then please let us know!
Cheers, Luke
Luke Gorrie wrote:
First consider the case where the source file hasn't been changed but the Emacs buffer has. The new idea is: since Lisp pulls up the source and finds the exact right position, how about grabbing the opening snippet of the function (a hundred bytes, say) and sending it to Emacs as a "hint"? Then Emacs can jump to the hopefully-right position and then do a bi-directional isearch for the longest match of the actual function prelude.
That is certainly better than using only the name and "kind" (defun, defvar, ...) of the definition. It seems to me that using a good hash code over the entire s-expression of the definition would be even better. In that way, if there are multiple definitions (the original definition and one or more alternate definitions being tested, or reader-macro conditioned definitions), they can be distinguished.
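As a rough illustration of what I mean, hashing on the Emacs side could be as crude as this untested sketch, which collapses whitespace so that reindentation doesn't change the hash (the function name is made up):

  ;; Untested sketch: hash the text of the definition between START and
  ;; END, collapsing whitespace runs so that reindentation is harmless.
  (defun my-definition-hash (start end)
    (md5 (replace-regexp-in-string
          "[ \t\n]+" " "
          (buffer-substring-no-properties start end))))

  ;; With point at the start of a defun:
  ;;   (my-definition-hash (point) (save-excursion (forward-sexp) (point)))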
This solution seems near ideal, but it only works when Lisp can find the source file version that the code was compiled from. If it ever finds such a version then it caches the whole file in the Lisp image so that the next `C-x C-s' won't spoil the fun. Files will go into the cache the first time you M-. into them -- probably it would be better to suck them in as soon as they enter slime-mode, but I haven't tried that yet.
Now for the case where the file on disk has been changed. Source-paths can tolerate a bit of this, but it gets nasty easily; e.g. M-. on CMUCL symbols usually doesn't work for me because I run random binaries but always read the CVS sources, and I usually land one or two defuns away from the real definition.
It would be useful if the underlying Lisp implementation could be convinced to include more source location information in the compiled object file. In this way, source version info, such as an RCS ID and/or the file date and time, could be stored in the object file. In the absence of an RCS ID, the file date and time could be useful in determining the source file version associated with the object file. Similarly, the sexpr hash code could be stored for each definition in the object file.
For this case it now just detects that the file is modified and falls back to regexp-based search, the same way e.g. the OpenMCL backend works. This seems to work pretty well. I also tweaked the regexps a bit, hopefully for the better.
Unfortunately, using totally different strategies for finding the definition source depending on whether the file has changed is likely to lead to confusion for the user, as is often the situation with Ilisp. There are several things that might improve the situation:
a) Provide a "confidence" indication about the source code that is found. For an unchanged buffer, or exact match of the hash-code of the resulting source code, the confidence should be "very high". Also provide an indication when there are other possible matches for the definition.
b) Provide two forms of EDIT-DEFINITION:
- EDIT-CURRENT-DEFINITION attempts to find the definition that matches the currently loaded definition. (This is presently called EDIT-DEFINITION.)
- EDIT-ALL-DEFINITIONS finds all definitions of the symbol and "kind" (defvar, defun, defmethod, ...), including reader-conditionalized definitions whose conditions fail. It might be useful to flag each definition in the *XREF[definition: xxx]* buffer with a confidence score that it is the "current definition".
This is a post about M-. from Alan Ruttenberg which bounced for boring mailing list reasons. His idea sounds great to me. When I did something like this for MCL I used to search from the position of the source the last time the form was compiled, downward, then upward, on the theory that more often than not I was adding to the file. In the earlier version I used the (defxxx heuristic.
For OpenMCL I've been thinking that I should record the whole text (from the source file) of the definition with the function. What would be ideal would be to do a fuzzy search for the whole text, along the lines of a BLAST alignment in genomics, and choose the best matching position (or even highlight where your code went in the case that you refactored :)
Note that M-. and finding the source location for a given PC in the debugger are implemented in almost the same way in this scenario. You associate with each PC the bounds of the definition form relative to the stored text for the function. Then, within the fuzzy match/alignment of the containing function you fuzzy match/align the source for the form being evaluated at the PC.
What's holding me back from trying this is having access to a fuzzy match algorithm. Does anyone know of one? Or have an idea of how to use some diff variant to get the same effect? Inside emacs would be ideal but we could also run a shell script on the buffer to get the match.
Saving all text and doing lots of computation to navigate to source may seem extravagant to some but it seems to me that the big bucks are in saving our time.
-Alan
Luke Gorrie luke@bluetail.com writes:
This is a post about M-. from Alan Ruttenberg which bounced for boring mailing list reasons. His idea sounds great to me.
Eh.. bad line breaking. Everything after the above was written by Alan.
Luke Gorrie luke@bluetail.com writes:
This is a post about M-. from Alan Ruttenberg which bounced for boring mailing list reasons. His idea sounds great to me.
Note that M-. and finding the source location for a given PC in the debugger are implemented in almost the same way in this scenario. You associate with each PC the bounds of the definition form relative to the stored text for the function. Then, within the fuzzy match/alignment of the containing function you fuzzy match/align the source for the form being evaluated at the PC.
What's holding me back from trying this is having access to a fuzzy match algorithm. Does anyone know of one? Or have an idea of how to use some diff variant to get the same effect? Inside emacs would be
How fuzzy?
ideal but we could also run a shell script on the buffer to get the match.
Saving all text and doing lots of computation to navigate to source may seem extravagant to some but it seems to me that the big bucks are in saving our time.
It doesn't sound at all extravagant to me. Emacs understands Lisp source well enough to do paren matching, s-exp navigation, etc. It also does regexp matching. The worst-case scenario is that you can find all the defxxx forms for foo not inside a #| |# pair. The closest match would come from some combination of positional and text matching. Emacs can search its buffers pretty quickly, so I don't think this would be a problem in terms of the time it takes to position point in the source buffer. If the errant source file is not in an Emacs buffer, it shouldn't be too hard to deduce the file name and then load it. You can probably generate tags files on the fly as well.
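For that worst case, something like this untested sketch might be enough; the name is made up, and it leans on the buffer's syntax setup to recognize strings and #| |# comments, which may or may not hold:

  ;; Untested sketch: collect the positions of candidate (defxxx NAME ...)
  ;; forms, skipping any that sit inside a string or comment.  The "(?"
  ;; allows for (defmethod (setf NAME) ...) style definitions.
  (defun my-candidate-definitions (name)
    (let ((regexp (concat "^(def\\(?:\\w\\|\\s_\\)*[ \t\n]+(?"
                          (regexp-quote name) "\\_>"))
          (positions '()))
      (save-excursion
        (goto-char (point-min))
        (while (re-search-forward regexp nil t)
          (unless (nth 8 (syntax-ppss (match-beginning 0)))
            (push (match-beginning 0) positions))))
      (nreverse positions)))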
Also doing this all in Emacs leaves you with only two flavors to deal with, GNU and XEmacs. Maybe not even that many.
Elisp is terra incognita for me, so I can't get specific. I just hope I'm not being too off the wall here.
Luke Gorrie luke@bluetail.com writes:
What's holding me back from trying this is having access to a fuzzy match algorithm. Does anyone know of one? Or have an idea of how to use some diff variant to get the same effect? Inside emacs would be ideal but we could also run a shell script on the buffer to get the match.
There's an easy-to-read tutorial about sequence alignment and some Lisp code:
http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.html http://cvs.sourceforge.net/viewcvs.py/biolingua/BioLisp/Matching/
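In outline, the local alignment described there comes down to something like this untested Emacs Lisp transcription of the idea (not code from those pages; +2 for a match, -1 for a mismatch or a gap, names made up):

  ;; Untested sketch of a Smith-Waterman style local alignment over two
  ;; strings A and B.  Returns (SCORE . END), where END is the position
  ;; in B just past the best-scoring match.
  (defun my-local-align (a b)
    (let* ((la (length a)) (lb (length b))
           (prev (make-vector (1+ lb) 0))   ; previous row of the DP table
           (curr (make-vector (1+ lb) 0))   ; current row
           (best 0) (best-end 0))
      (dotimes (i la)
        (fillarray curr 0)
        (dotimes (j lb)
          (let ((score (max 0
                            (+ (aref prev j)                        ; diagonal
                               (if (eq (aref a i) (aref b j)) 2 -1))
                            (1- (aref prev (1+ j)))                 ; gap in B
                            (1- (aref curr j)))))                   ; gap in A
            (aset curr (1+ j) score)
            (when (> score best)
              (setq best score best-end (1+ j)))))
        (let ((tmp prev)) (setq prev curr curr tmp)))
      (cons best best-end)))

(Recovering the start of the match would need a traceback or a second pass, but this gives the flavor.)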
Helmut.
Thanks, I'll have a look. -Alan
On May 4, 2004, at 11:02 AM, Helmut Eller wrote:
Luke Gorrie luke@bluetail.com writes:
What's holding me back from trying this is having access to a fuzzy match algorithm. Does anyone know of one? Or have an idea of how to use some diff variant to get the same effect? Inside emacs would be ideal but we could also run a shell script on the buffer to get the match.
There's an easy-to-read tutorial about sequence alignment and some Lisp code:
http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.html http://cvs.sourceforge.net/viewcvs.py/biolingua/BioLisp/Matching/
Helmut.
In case it's useful, here is some code to parse the output of diff. The result is a list of "edit actions" (added, deleted, changed) with the corresponding line numbers in the old and new buffers.
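Roughly, the idea is just to walk the hunk headers of plain diff output (lines like 3,5c7,9, 10a12 or 2,4d1); an untested sketch with made-up names:

  ;; Rough sketch: scan a buffer of plain `diff OLD NEW' output and
  ;; return a list of (ACTION OLD-START OLD-END NEW-START NEW-END),
  ;; where ACTION is one of `added', `deleted' or `changed'.
  (defun my-parse-diff (diff-buffer)
    (let ((result '()))
      (with-current-buffer diff-buffer
        (goto-char (point-min))
        (while (re-search-forward
                "^\\([0-9]+\\)\\(?:,\\([0-9]+\\)\\)?\\([acd]\\)\\([0-9]+\\)\\(?:,\\([0-9]+\\)\\)?$"
                nil t)
          (let ((old-start (string-to-number (match-string 1)))
                (old-end   (if (match-string 2)
                               (string-to-number (match-string 2))
                             (string-to-number (match-string 1))))
                (action    (cond ((equal (match-string 3) "a") 'added)
                                 ((equal (match-string 3) "d") 'deleted)
                                 (t 'changed)))
                (new-start (string-to-number (match-string 4)))
                (new-end   (if (match-string 5)
                               (string-to-number (match-string 5))
                             (string-to-number (match-string 4)))))
            (push (list action old-start old-end new-start new-end) result))))
      (nreverse result)))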
Helmut.
Lynn Quam quam@ai.sri.com writes:
It seems to me that using a good hash code over the entire s-expression of the definition would be even better.
The trouble with a hash is that it's very sensitive to modifications, and it can be difficult to locate the matching text if it has moved.
But we can send the whole definition text and then match that if we like. Currently it sends 256 bytes, which I figure will be enough for most cases.
It would be interesting to have a string matching algorithm that handles modifications better than incremental-prefix-search, as Alan suggested.
It would be useful if the underlying Lisp implementation could be convinced to include more source location information in the compiled object file. In this way, source version info, such as an RCS ID and/or the file date and time, could be stored in the object file. In the absence of an RCS ID, the file date and time could be useful in determining the source file version associated with the object file. Similarly, the sexpr hash code could be stored for each definition in the object file.
CMUCL does store the FILE-WRITE-DATE of the source file, so we can tell if the source has been modified. Currently I use this to cache the source file if it is known to be unmodified, so that if it is later edited we still have a good copy. (Even in defun-at-a-time hacking one often saves the Emacs buffer, I think.)
I'm not 100% sure that the cache is worth the bother yet, though. I will experiment some more.
[About falling back on regexps]
Unfortunately, using totally different strategies for finding the definition source depending on whether the file has changed is likely to lead to confusion for the user, as is often the situation with Ilisp. There are several things that might improve the situation:
I've removed the fallback-on-regexp code now. It could possibly be useful as an optional feature but for now Helmut has convinced me that source-paths are good enough.
It did solve the problem I have of finding definitions in CMUCL's CVS sources while running 18e, but this is a weird thing to be doing in the first place :-).
Cheers, Luke
Luke Gorrie luke@bluetail.com writes:
[...]
This seems to work very well. It's based on the seemingly reasonable assumption that if the definition of BAR started with "(define-foo bar ..." before then it probably still does now. The same trick is used for interactively compiled definitions.
I like the idea with the hint string. This should work pretty well, especially for stuff compiled with C-c C-c.
For this case it now just detects that the file is modified and falls back to regexp-based search, the same way e.g. the OpenMCL backend works. This seems to work pretty well. I also tweaked the regexps a bit, hopefully for the better.
Hmm... I think the old approach wasn't too bad in this case. CMUCL's form numbers are relatively robust against modifications inside toplevel forms and inserting comments at the toplevel. The location is wrong only if a toplevel form is inserted or deleted before the form we need to find. This is something like 50% of the modifications at toplevel, and if it fails, it is usually only 1 or 2 forms away from the correct place.
The regexp variant can be completely wrong. E.g. it gets confused by block comments like #| (defun foo ..) |# or if there are two variants of the same function preceded by reader conditionals, like #+x86 and #+sparc. Regexps are currently almost useless for locating methods.
I liked the old variant better than the regexp stuff.
Perhaps a combined method would give the best results. First jump to the place as determined by the form numbers and then make a plausibility check with a regexp search before and after that place.
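Something like this untested sketch, say (the name and the 1000-character window are arbitrary):

  ;; Untested sketch: trust the form-number POSITION if NAME-REGEXP
  ;; matches within WINDOW characters of it; otherwise fall back to the
  ;; first match in the whole buffer, or to POSITION if there is none.
  (defun my-plausible-position (position name-regexp &optional window)
    (let ((window (or window 1000)))
      (save-excursion
        (goto-char (max (point-min) (- position window)))
        (cond ((re-search-forward name-regexp
                                  (min (point-max) (+ position window)) t)
               (match-beginning 0))
              ((progn (goto-char (point-min))
                      (re-search-forward name-regexp nil t))
               (match-beginning 0))
              (t position)))))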
If problems persist then please let us know!
I have problems with C-u M-. compile: point is placed before compile-component. This is probably just a too-permissive regexp.
It will be interesting to see how this works for non-toplevel forms, as needed by the debugger.
Helmut.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
I liked the old variant better than the regexp stuff.
I'll have another crack at M-. tonight to fix up the problems you guys have mentioned.