Hi all,

I am using Git to manage a Common Lisp (CL) project I am working on, and have noticed that Git has no predefined regular expression for pulling out "hunk headers" from CL source.  If you look at 'git diff' output, each hunk (a run of changed lines together with their surrounding context) has a header line, which is intended to show the first line of the top-level syntactic object (function definition, class declaration, etc.) that contains the hunk, so you can quickly see what function, class, etc. it is within.  To find the appropriate header line for each hunk, Git scans backwards from the top of the hunk with a regular expression, and takes the first line that matches.  The regex to use, of course, depends on the language of the source file.
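
To make the backward scan concrete, here is a rough Python sketch of the idea.  It is only a simplified model, not Git's actual code: the function name 'find_hunk_header' and the sample source are made up for illustration, and Python's 're' module stands in for POSIX ERE.

```python
import re

def find_hunk_header(lines, hunk_start, pattern):
    """Scan backwards from just above index hunk_start and return the
    first line that matches pattern, or None if no line matches."""
    regex = re.compile(pattern)
    for i in range(hunk_start - 1, -1, -1):
        if regex.search(lines[i]):
            return lines[i]
    return None

# Hypothetical CL source, one string per line.
source = [
    "(defun area (r)",
    "  (* pi r r))",
    "",
    "(defun perimeter (r)",
    "  (* 2 pi r))",
]

# Suppose the hunk begins at index 4; use "open paren in column 0" as the pattern.
header = find_hunk_header(source, 4, r"^\(")
print(header)
```

With this input the scan stops at the 'perimeter' definition, since that is the nearest preceding line with a paren in column 0.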

Git has a table of built-in regexes for a number of languages, including Scheme, but there is neither a generic Lisp entry nor a more specific CL entry.  I am inclined to submit one for inclusion, but I wanted to bounce it off you folks first.

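Here is the current Scheme regex, copied as it appears in Git's source, where it is a C string literal (so '\\(' stands for a literal open paren and '\t' for a tab).  The pattern itself is in POSIX Extended ("ERE") syntax: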

^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$

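This is unfortunately not general enough to work for CL: it doesn't pick up 'defun', because after 'def' it accepts only a fixed list of suffixes followed by one of '-', '*', '/', space, or tab.  It does match some other CL constructs, such as 'defstruct', 'defclass', and 'defmethod'.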
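
A quick way to check this is to run the pattern over a few sample lines.  The sketch below uses Python's 're' module as a stand-in for POSIX ERE (the two agree for this particular pattern, as far as I can tell), writes the paren escape with a single backslash because it is a Python raw string rather than a C string, and uses sample lines I made up.

```python
import re

# The Scheme pattern quoted above, with C string escaping removed.
SCHEME = re.compile(
    r"^[\t ]*(\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]"
    r"|(library|module|struct|class)[*+ \t]).*)$"
)

def matches(line):
    """Return True if the Scheme hunk-header pattern matches the line."""
    return SCHEME.search(line) is not None

# Hypothetical sample lines: one Scheme form, then several CL forms.
samples = [
    "(define (square x) (* x x))",
    "(defun square (x) (* x x))",
    "(defmacro with-foo (&body body)",
    "(defstruct point x y)",
    "(defclass shape () ())",
    "(defmethod area ((s shape)) 0)",
]

for line in samples:
    print("match   " if matches(line) else "no match", line)
```

The 'defun' and 'defmacro' lines are the ones that fall through, which is the gap a CL entry would need to close.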

So I started to think about what would be good for CL.  Some possibilities:
  1. Simply match any line that starts with an open paren in column 0.  The upside of this simple rule is that it allows for arbitrary top-level construct names.  But if you indent your defuns for some reason, it will overlook them.
  2. Match any line that starts with '(def', even if indented.  This could have false positives, e.g. an indented call to a function or macro whose name merely begins with 'def'.
  3. Match either (1) or (2).
I'm leaning toward (3), but would like to hear your thoughts.  Certainly, we could use a more specific regex that matches only predefined CL top-level constructs, but this seems wrong to me considering that CL encourages us to define macros to add such constructs when we see a need.
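
For comparison, here is one way the three options might be written and tried out.  These are sketches, not proposed final patterns: the names OPTION_1 to OPTION_3 and the sample lines are invented for illustration, and Python's 're' is again standing in for POSIX ERE.

```python
import re

# Candidate patterns (hypothetical names, for illustration only).
OPTION_1 = re.compile(r"^\(")                  # open paren in column 0
OPTION_2 = re.compile(r"^[\t ]*\(def")         # '(def', possibly indented
OPTION_3 = re.compile(r"^(\(|[\t ]+\(def)")    # either of the above

def classify(line):
    """Return a tuple of booleans: does the line match options 1, 2, 3?"""
    return tuple(bool(p.search(line)) for p in (OPTION_1, OPTION_2, OPTION_3))

# Hypothetical sample lines covering the cases discussed above.
samples = [
    "(defun foo ()",          # ordinary top-level defun
    "  (defun foo ()",        # indented defun
    "(my-deftest foo",        # user-defined top-level macro
    "  (default-value x)",    # indented call that merely starts with 'def'
    "  (let ((x 1))",         # ordinary body line
]

for line in samples:
    print(f"{line!r:24} {classify(line)}")
```

Option (3) picks up both the user-defined top-level form and the indented defun, at the cost of inheriting option (2)'s false positive on the 'default-value' line.  To try a candidate without patching Git, I believe it can be set as 'diff.<driver>.xfuncname' in the Git config, with the driver assigned to the relevant files in '.gitattributes'.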

-- Scott