Hi all,
I am using Git to manage a CL project I am working on, and have noticed that there is no predefined regular expression to pull out "hunk headers". If you look at 'git diff' output, each hunk (sequence of consecutive lines with differences noted) has a header line, which is intended to show the first line of the top-level syntactic object (function definition, class declaration, etc.) that contains the hunk, so you can quickly see what function, class, etc. it is within. To find the appropriate header line for each hunk, Git scans backwards from the top of the hunk with a regular expression, and takes the first line that matches. The regex to use, of course, depends on the language of the source file.
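To make the backward scan concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Git's actual implementation: the function name `find_hunk_header`, the sample source, and the use of Python's `re` module in place of POSIX ERE are all assumptions made for illustration.

```python
import re

def find_hunk_header(lines, hunk_start, pattern):
    """Return the nearest line above index hunk_start that matches pattern.

    Simplified model of the scan described above: walk backwards from
    the line just before the hunk and stop at the first match.
    """
    regex = re.compile(pattern)
    for i in range(hunk_start - 1, -1, -1):
        if regex.search(lines[i]):
            return lines[i]
    return None  # no enclosing construct found

# Hypothetical Common Lisp source, one string per line.
source = [
    "(defun area (w h)",
    "  (* w h))",
    "",
    "(defun perimeter (w h)",
    "  (let ((sum (+ w h)))",
    "    (* 2 sum)))",
]

# Suppose a hunk starts at the last line (index 5); an open paren in
# column 0 is used as the header pattern.
header = find_hunk_header(source, 5, r"^\(")
print(header)  # (defun perimeter (w h)
```

The indented `(let` line is skipped because it does not start in column 0, so the enclosing `defun` is what ends up in the hunk header.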
Git has a table of built-in regexes for a number of languages, including Scheme, but there is neither a generic Lisp entry nor a more specific CL entry. I am inclined to submit one for inclusion, but I wanted to bounce it off you folks first.
Here is the current Scheme regex. This is in POSIX Extended ("ERE") syntax:
^[\t ]*(\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$
This is unfortunately not general enough to work for CL, as it doesn't pick up 'defun', though it does allow some other CL constructs.
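A quick way to check this claim is to run the Scheme pattern against a few CL forms. The sketch below uses Python's `re` module, which is not POSIX ERE but should treat this particular pattern the same way; the sample lines are made up for illustration.

```python
import re

# The Scheme pattern quoted above, split across two raw strings for width.
SCHEME_PATTERN = (
    r"^[\t ]*(\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)"
    r"[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$"
)
scheme_re = re.compile(SCHEME_PATTERN)

# Hypothetical Common Lisp top-level forms.
samples = [
    "(defun area (w h)",
    "(defmacro with-foo (x)",
    "(defvar *count* 0)",
    "(defstruct point x y)",
    "(defclass shape () ())",
    "(defmethod draw ((s shape))",
]

# Map each sample line to whether the Scheme pattern accepts it.
results = {line: bool(scheme_re.search(line)) for line in samples}
for line, matched in results.items():
    print(f"{'match' if matched else 'miss ':5}  {line}")
```

With these samples, `defun`, `defmacro`, and `defvar` are missed, while `defstruct`, `defclass`, and `defmethod` match because they happen to be in the Scheme alternation.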
So I started to think about what would be good for CL. Some possibilities:
1. Simply match any line that starts with an open paren in column 0. The upside of this simple rule is that it allows for arbitrary top-level construct names. But if you indent your defuns for some reason, it will overlook them.
2. Match any line that starts with '(def', even if indented. This could have false positives.
3. Match either (1) or (2).
I'm leaning toward (3), but would like to hear your thoughts. Certainly, we could use a more specific regex that matches only predefined CL top-level constructs, but this seems wrong to me considering that CL encourages us to define macros to add such constructs when we see a need.
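For comparison, the three options could be written roughly as follows. These patterns are hypothetical spellings chosen for illustration, not a proposed submission, and they are tested here with Python's `re` rather than POSIX ERE.

```python
import re

# Hypothetical patterns for the three options discussed above.
OPTIONS = {
    1: re.compile(r"^\(.*$"),               # open paren in column 0
    2: re.compile(r"^[ \t]*\(def.*$"),      # "(def", possibly indented
    3: re.compile(r"^(\(|[ \t]+\(def).*$"), # either of the above
}

# Made-up sample lines covering the interesting cases.
samples = [
    "(defun area (w h)",       # ordinary top-level defun
    "  (defun helper (x)",     # indented defun
    "(my-define-thing foo",    # user-defined top-level macro
    "  (let ((x 1))",          # ordinary nested form
    "  (default-handler c)",   # nested call that starts with "def"
]

def matching_options(line):
    """Return the sorted option numbers whose pattern matches line."""
    return sorted(n for n, rx in OPTIONS.items() if rx.search(line))

for line in samples:
    print(f"{str(matching_options(line)):10} {line}")
```

The last sample shows the false-positive risk of options 2 and 3: an indented call to a function whose name begins with "def" is treated as a header.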
-- Scott
Option 3 seems the most reasonable to me.
On Thu, 28 Mar 2024 at 06:49, Scott L. Burson Scott@sympoiesis.com wrote:
So I started to think about what would be good for CL. Some possibilities:
Simply match any line that starts with an open paren in column 0. The upside of this simple rule is that it allows for arbitrary top-level construct names. But if you indent your defuns for some reason, it will overlook them.
FWIW, we've been using this simpler version at work for a couple of years. It removes the open parenthesis from the match and matches the first two words.
xfuncname = "^\\(([^ ]+ [^ \\)]+)"
There's a couple of indented defuns, but those are annoying for all sorts of reasons. I feel like (def would catch too many variables and functions starting with "default", although perhaps that could be explicitly excluded by the regex?
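To illustrate both points, here is a small Python sketch. `FIRST_TWO_WORDS` is the regex from the xfuncname line above; `INDENTED_DEF`, the helper names, and the sample lines are hypothetical. Since POSIX ERE has no lookahead, the "default" exclusion is modelled here as a separate check after the match, which is only one possible way to express it.

```python
import re

# The regex from the xfuncname setting above: column-0 paren, then
# capture the first two words without the paren itself.
FIRST_TWO_WORDS = re.compile(r"^\(([^ ]+ [^ \)]+)")

# Hypothetical "(def" pattern that also accepts indentation and
# captures the operator name.
INDENTED_DEF = re.compile(r"^[ \t]*\((def[^ \t()]*)")

def header_text(line):
    """Return the captured header text for line, or None if no match."""
    m = FIRST_TWO_WORDS.search(line)
    return m.group(1) if m else None

def is_def_header(line, excluded_prefixes=("default",)):
    """Accept '(def...' lines unless the operator has an excluded prefix."""
    m = INDENTED_DEF.search(line)
    if not m:
        return False
    return not m.group(1).startswith(excluded_prefixes)

print(header_text("(defun area (w h)"))        # defun area
print(header_text("  (defun helper (x)"))      # None
print(is_def_header("  (defun helper (x)"))    # True
print(is_def_header("  (default-handler c)"))  # False
```

Names that merely begin with "def" for other reasons (for example a hypothetical `deflate-buffer`) would still be accepted by this check, so the exclusion list would need to grow with the code base.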
Cheers, Luís