At ILC 2014, one discussed show-stopper for using CL as a scripting language was startup time. Indeed, right now, when used as a script rather than as a dumped image, CL takes a lot of time to start:
  time ( sbcl --noinform --eval '(require :asdf)' --eval '(progn (asdf:initialize-source-registry) (uiop:writeln (hash-table-count asdf::*source-registry*)) (uiop:quit))' )
  711
  ( sbcl --noinform --eval '(require :asdf)' --eval ; )  0.66s user 0.17s system 99% cpu 0.832 total

  time cl '(hash-table-count asdf::*source-registry*)'
  711
  cl '(hash-table-count asdf::*source-registry*)'  1.20s user 0.25s system 99% cpu 1.456 total
That's because ASDF recursively walks all the directories under the registered source-registry trees, and there can be a lot of them. Ben Hyde tells me it's much worse on his machine.
The two following slightly incompatible changes divide that startup time by about three, and promise to divide it further if people adhere to some discipline.
  (defun collect-sub*directories (directory collectp recursep collector)
    "Given a DIRECTORY, call-function the COLLECTOR function designator on the
  directory if COLLECTP returns true when CALL-FUNCTION'ed with the directory,
  and recurse each of its subdirectories on which the RECURSEP returns true
  when CALL-FUNCTION'ed with them."
    (when (call-function collectp directory)
      (call-function collector directory)
      (dolist (subdir (subdirectories directory))
        (when (call-function recursep subdir)
          (collect-sub*directories subdir collectp recursep collector)))))
This nests the dolist into the when, which is backward compatible as far as uiop and asdf internal usage is concerned, but not as far as other users might be concerned; however, the collectp function is a bit redundant and useless without this nesting.
  (defun collect-sub*directories-asd-files
      (directory &key (exclude *default-source-registry-exclusions*) collect (stop-at-asd t))
    (collect-sub*directories
     directory
     #'(lambda (dir)
         (let ((asds (directory-asd-files dir)))
           (map () collect asds)
           (not (and asds stop-at-asd))))
     #'(lambda (x)
         (not (member (car (last (pathname-directory x))) exclude :test #'equal)))
     (constantly nil)))
The trick here is in this new stop-at-asd flag, which here defaults to t and isn't configurable, but which should default to nil and be configurable, for backward compatibility. Its effect is that recursing into subdirectories stops as soon as a .asd file is found in the directory being visited. This saves a lot of recursing, and would save even more if a .asd file or a symlink to one existed at the top of each git hierarchy. But this is incompatible with a lot of existing code, and so the transition will be long and painful if this is adopted.
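To make the effect of the flag concrete, here is a minimal sketch on an in-memory model of a source tree rather than the real filesystem. The names `*demo-tree*` and `walk-tree` and the tree layout are made up for illustration; they are not part of ASDF or UIOP.

```lisp
;; Hypothetical in-memory model, not ASDF code:
;; each node is (NAME ASD-FILES . SUBDIRECTORIES).
(defparameter *demo-tree*
  '("src" ()
    ("foo" ("foo.asd")
     ("contrib" ("foo-contrib.asd")))
    ("bar" ()
     ("core" ("bar.asd")))))

(defun walk-tree (node &key (stop-at-asd t))
  "Return two values: the .asd names found, and the number of directories visited."
  (let ((found '())
        (visited 0))
    (labels ((walk (n)
               (destructuring-bind (name asds &rest subdirs) n
                 (declare (ignore name))
                 (incf visited)
                 ;; Always collect the .asd files of the current directory.
                 (dolist (a asds) (push a found))
                 ;; Descend unless this directory has a .asd and the flag is set.
                 (unless (and asds stop-at-asd)
                   (mapc #'walk subdirs)))))
      (walk node))
    (values (nreverse found) visited)))
```

With `:stop-at-asd t` the walk visits 4 directories and misses foo-contrib.asd; with `:stop-at-asd nil` it visits all 5 and finds all three files. That is the same trade-off as in the timings: fewer directories visited, but nested .asd files are no longer seen.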
With these changes, I get:
  time ( sbcl --noinform --eval '(require :asdf)' --eval '(progn (asdf:initialize-source-registry) (uiop:writeln (hash-table-count asdf::*source-registry*)) (uiop:quit))' )
  534
  ( sbcl --noinform --eval '(require :asdf)' --eval ; )  0.24s user 0.05s system 99% cpu 0.293 total

  time cl '(hash-table-count asdf::*source-registry*)'
  534
  cl '(hash-table-count asdf::*source-registry*)'  0.54s user 0.13s system 99% cpu 0.665 total
That's much better timewise (about 3x speedup), but it's obviously missing a lot of .asd files. To recover them, I had to:
for i in */ ; do ( cd $i ; setopt NULL_GLOB ; A=( */**/*.asd ) ; echo $i $#A ; if [ $#A -gt 0 ] ; then ln -s $A . ; fi ) ; done |&tee /tmp/a
Then I get:
  time ( sbcl --noinform --eval '(require :asdf)' --eval '(progn (asdf:initialize-source-registry) (uiop:writeln (hash-table-count asdf::*source-registry*)) (uiop:quit))' )
  711
  ( sbcl --noinform --eval '(require :asdf)' --eval ; )  0.24s user 0.09s system 99% cpu 0.335 total

  time cl '(hash-table-count asdf::*source-registry*)'
  711
  cl '(hash-table-count asdf::*source-registry*)'  0.50s user 0.07s system 100% cpu 0.567 total
And... oops, I realize I failed to do the scripting at the SLIME REPL. Mea culpa.
Of course, if you want instant startup without any search, you can eschew ASDF2 style autoconfiguration, and go the sysadmin way of ASDF1. I still think there is value in combining autoconfiguration with somewhat faster startup time than we have now.
Of course, implementing such a plan with a multi-year backward-compatible migration strategy is the prerogative of the current and future maintainers, if they wish to undertake it: it would take implementing (and testing) the code but disabling it by default, to be enabled with suitable (backward-incompatible) flags in configuration files; then, while waiting for all implementations to eventually adopt the new release, pushing users to reform the way they lay out their directories (and maybe adopt package-inferred-system while we're at it; it could help systems that have a lot of one-file subsystems).
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org As for poverty, no one need be ashamed to admit it: the real shame is in not taking practical measures to escape from it. — Perikles
On 21 Aug 2014, at 02:36, Faré fahree@gmail.com wrote:
> […]
> The trick here is in this new stop-at-asd flag, which here defaults to t and isn't configurable, but which should default to nil and be configurable, for backward compatibility. Its effect is that recursing into subdirectories stops if a .asd file is found in the toplevel directory. This saves a lot of recursing, and would save even more if a .asd file or symlink to one exists at the top of a git hierarchy. But this is incompatible with a lot of existing code, and so the transition will be long and painful if this is adopted.
If you are proposing that the stop-at-asd property would be somehow configurable in the DEFSYSTEM form, like:
(asdf:defsystem :foo :contains-interior-asdf-defintions :components …
then please ensure that this is present when/if you introduce this change to ASDF. But I get the feeling that in order to speed things up, you weren’t intending to parse the DEFSYSTEM form in your search.
If you are proposing that the user would have to explicitly do some sort of configuration “for this instance of a user using asdf with this Lisp implementation”, I won’t be so happy because:
1) This sort of configuration hasn’t been necessary before, so we will introduce complexity in ASDF deployment for efficiency in using CL as a scripting language which is something I don’t currently use (Admittedly because my platform, ABCL, based on the JVM, is just not going to ever have reasonable startup times. Although there are systems that keep a JVM “warmed up” for firing such one-off commands to, and for specialised JVM there are memory mapped solutions for faster startup).
2) I am using a system (lsw2) not in Quicklisp that has many such “interior” ASDF definitions. Usually when systems get in this state it is because they are big enough that nobody has time to package them correctly, so they tend to stay that way. If I can’t put a flag in the top-level system, I’m going to run into problems when users haven’t done the per-user-per-lisp configuration.
On Thu, Aug 21, 2014 at 2:38 AM, Mark Evenson evenson@panix.com wrote:
> If you are proposing that the stop-at-asd property would be somehow configurable in the DEFSYSTEM form, like:
> (asdf:defsystem :foo :contains-interior-asdf-defintions :components …
> then please ensure that this is present when/if you introduce this change to ASDF. But I get the feeling that in order to speed things up, you weren’t intending to parse the DEFSYSTEM form in your search.
Indeed, requiring to parse a .asd file is a bad idea — and is even worse when there are hundreds of .asd files in the directory.
But maybe we could detect a file called source-registry.conf or something similar, and parse that to look for subdirectories with .asd files in them. In the absence of such a file, the default behavior would for backward compatibility be to always recurse, or maybe for speed in a future version years from now be to recurse only if no .asd file was found.
> If you are proposing that the user would have to explicitly do some sort of configuration “for this instance of a user using asdf with this Lisp implementation”, I won’t be so happy because:
> - This sort of configuration hasn’t been necessary before, so we will introduce complexity in ASDF deployment for efficiency in using CL as a scripting language which is something I don’t currently use (Admittedly because my platform, ABCL, based on the JVM, is just not going to ever have reasonable startup times. Although there are systems that keep a JVM “warmed up” for firing such one-off commands to, and for specialised JVM there are memory mapped solutions for faster startup).
> - I am using a system (lsw2) not in Quicklisp that has many such “interior” ASDF definitions. Usually when systems get in this state it is because they are big enough that nobody has time to package them correctly, so they tend to stay that way. If I can’t put a flag in the top-level system, I’m going to run into problems when users haven’t done the per-user-per-lisp configuration.
What about supporting a source-registry.conf or .asdf-search.conf or similar file to control recursion of the search, and eventually requiring it when a directory both has a .asd file yet requires recursion?
ASDF should do the right thing and require no configuration from users, but minimal configuration of their own directory structures by programmers is acceptable — except that for backward compatibility, it should default to always recurse for now.
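The decision rule described above can be sketched as a small pure function. Everything here is an assumption for illustration: the function name, the `:absent` marker, and the idea that the control file boils down to a list of subdirectory names are hypothetical, not an existing ASDF interface.

```lisp
;; Hypothetical sketch of the proposed rule, not ASDF code.
;; SUBDIRS: names of the subdirectories of the directory being visited.
;; ASDS:    the .asd files found directly in that directory.
;; CONTROL: :ABSENT when there is no control file, otherwise the list of
;;          subdirectory names that the control file says to recurse into.
(defun subdirectories-to-search (subdirs asds control
                                 &key (recurse-by-default t))
  (cond
    ;; A control file is present (possibly listing nothing): obey it.
    ((listp control)
     (remove-if-not (lambda (d) (member d control :test #'string=)) subdirs))
    ;; No control file: recurse if nothing was found here, or if the
    ;; backward-compatible default (always recurse) is in force.
    ((or (null asds) recurse-by-default) subdirs)
    ;; No control file, a .asd was found, and the faster default is on.
    (t '())))
```

With the default `:recurse-by-default t`, a directory without a control file behaves exactly as today; only curators who add a control file, or a future release that flips the default, change what gets searched.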
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The problem with Unix ever becoming a widely popular desktop operating system is referred to as the 'guru in the box' problem. To get and keep Unix running smoothly you need a captive guru on site and there just aren't enough gurus to put in the shipping boxes. — Brian Kernighan
If I understand correctly, the proposal is to require configuration only for the special case of wanting faster start up, and absent that, configuration will be as before, since optimization for scripting is the exceptional case.
That seems like a benign modification. I'd accept such a patch (with bumping of version for easy detection). We should document it appropriately, of course.
Quick PS: what mechanism do you think should be used to tweak this setting? Should presumably be something easy to specify (i.e., not a config file), so that one can quickly start a lisp script, without messing up one's conventional lisp development environment.
Environment variable? Since there's no portable way to do this by command-line argument....
Cheers, r
On Thu, Aug 21, 2014 at 2:10 PM, Robert P. Goldman rpgoldman@sift.info wrote:
> Quick PS: what mechanism do you think should be used to tweak this setting? Should presumably be something easy to specify (i.e., not a config file), so that one can quickly start a lisp script, without messing up one's conventional lisp development environment.
> Environment variable? Since there's no portable way to do this by command-line argument....
I'd like to avoid unnecessary environment variables.
Once again, where under a hierarchy the .asd files are is ultimately the knowledge and responsibility of the curators of the respective source trees, not of the end-user. Therefore, the absence of recursion should be a matter of said source trees including a file cl-source-registry.conf or .cl-source-registry.conf (visible having priority over hidden), that specifies how (not) to recurse in that tree. A simple script could be provided for system writers to create or update said file. A year after all relevant packages have migrated, we can make it the default and people can drop said configuration file when they follow the default behavior of not having meaningful .asd files except directly under the top directory. Furthermore, with a :file directive, these configuration files can directly list the .asd files without any further filesystem access.
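As an illustration of the last point, here is a sketch of how such a file could be read so that the listed .asd files are known without any directory scan. The `(:file ...)` directive is only the proposal made above and its exact syntax is an assumption; `*example-conf*` and `conf-listed-files` are hypothetical names, and the file contents are given as a string so that the example does not touch the filesystem.

```lisp
;; Hypothetical contents of a cl-source-registry.conf using the proposed
;; (:file ...) directive; the syntax is an assumption for illustration.
(defparameter *example-conf*
  "(:source-registry
     (:file \"foo.asd\")
     (:file \"contrib/foo-contrib.asd\"))")

(defun conf-listed-files (conf-string base-directory)
  "Return the pathnames named by (:file ...) directives, relative to BASE-DIRECTORY."
  (let ((form (with-standard-io-syntax
                ;; Never evaluate #. forms found in a configuration file.
                (let ((*read-eval* nil))
                  (read-from-string conf-string)))))
    (assert (eq (first form) :source-registry))
    (loop for directive in (rest form)
          when (and (consp directive) (eq (first directive) :file))
            collect (merge-pathnames (second directive) base-directory))))
```

For example, `(conf-listed-files *example-conf* #p"/src/foo/")` returns two pathnames, one for each listed .asd file, merged against the base directory.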
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The highest of minds / Often have built for themselves / The tallest of jails.
Faré wrote:
> I'd like to avoid unnecessary environment variables.
> Once again, where under a hierarchy the .asd files are is ultimately the knowledge and responsibility of the curators of the respective source trees, not of the end-user. Therefore, the absence of recursion should be a matter of said source trees including a file cl-source-registry.conf or .cl-source-registry.conf (visible having priority over hidden), that specifies how (not) to recurse in that tree. A simple script could be provided for system writers to create or update said file. A year after all relevant packages have migrated, we can make it the default and people can drop said configuration file when they follow the default behavior of not having meaningful .asd files except directly under the top directory. Furthermore, with a :file directive, these configuration files can directly list the .asd files without any further filesystem access.
I don't think that this new behavior should ever be the default. Scripting is the edge-case for CL, not building of large, complex systems. Further, for some CL implementations (like ABCL, as Mark points out), scripting use is *never* a good option.
From a cost tradeoff PoV, there are few scripting configurations now, and fixing them all is easy and cheap, since they are for early-adopters. Fixing all systems that contain nested system definitions is neither easy nor cheap.
When one wants scripting, it should be easy to specify a "scripting lisp." For now, I suggest that people who want to script with CL should build themselves a pre-configured image for the purpose. That image could have a feature or ASDF configuration variable set so as to change the default behavior and cut off recursive ASDF search. Configuration files in source trees would complement this behavior.
That approach would serve the purpose of making this behavior easy to specify at the command-line or mouse-click. Later, if CL-based scripting catches on, lisp implementations could ship with versions that are intended for rapid start-up and scripting, avoiding the need for scripters to build their own images. Or separate scripting packages could be provided.
On Mon, Aug 25, 2014 at 12:28 PM, Robert P. Goldman rpgoldman@sift.info wrote:
> I don't think that this new behavior should ever be the default. Scripting is the edge-case for CL, not building of large, complex systems. Further, for some CL implementations (like ABCL, as Mark points out), scripting use is *never* a good option.
Well, we'll see. But faster startup times benefit everyone, not just people using Lisp for scripting. Shaving a second or so from startup time (somewhat less or much more depending on the implementation and the size of your source trees) is useful whether you're (re)starting SLIME, running tests in a batch job, building in a slave or cross-compiling as well as if you're scripting.
> From a cost tradeoff PoV, there are few scripting configurations now, and fixing them all is easy and cheap, since they are for early-adopters. Fixing all systems that contain nested system definitions is neither easy nor cheap.
That's why I proposed a two-year migration plan, of the same kind we did for Unicode support.
I believe it may be worth it.
On the other hand, if we support this search for cl-source-registry.conf while recursing, then whichever trees are managed (by e.g. quicklisp or debian or clbuild) can have this cl-source-registry.conf automatically optimized by the same manager, so only unmanaged source code downloaded by the user remains slow, and this should be small, except for power users who are big boys enough to do the system administration step of regenerating that file when they update their repositories.
> When one wants scripting, it should be easy to specify a "scripting lisp." For now, I suggest that people who want to script with CL should build themselves a pre-configured image for the purpose. That image could have a feature or ASDF configuration variable set so as to change the default behavior and cut off recursive ASDF search. Configuration files in source trees would complement this behavior.
Yes, you can already cut a lot of the time by dumping an image.
> That approach would serve the purpose of making this behavior easy to specify at the command-line or mouse-click. Later, if CL-based scripting catches on, lisp implementations could ship with versions that are intended for rapid start-up and scripting, avoiding the need for scripters to build their own images. Or separate scripting packages could be provided.
The problem is that it's not a simple matter of a scripting lisp vs a non-scripting lisp, since many libraries (and/or managed sets of libraries) need to be modified to keep working in this scheme for (faster) source-registry initialization.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org To stay young requires unceasing cultivation of the ability to unlearn old falsehoods. — Robert Heinlein, "Time Enough For Love"
On Thu, Aug 21, 2014 at 9:54 AM, Robert P. Goldman rpgoldman@sift.net wrote:
> If I understand correctly, the proposal is to require configuration only for the special case of wanting faster start up, and absent that, configuration will be as before, since optimization for scripting is the exceptional case.
> That seems like a benign modification. I'd accept such a patch (with bumping of version for easy detection). We should document it appropriately, of course.
Yes, it should all remain backward-compatible, at least unless and until some maintainer leads a two year campaign for migration to a different setting.
My plan is as follows:
1- have a special variable tell whether to recurse under a .asd by default, defaulting to t for now
2- add some keyword argument to :tree to override this variable
3- add support for source-registry.conf and/or .source-registry.conf as things to detect and heed when recursing into a directory.
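Steps 1 and 2 of this plan can be sketched as follows. This is not the attached patch: the variable name `*recurse-under-asd*`, the `:recurse-under-asd` keyword and the function name are assumptions chosen for illustration.

```lisp
;; Hypothetical names throughout; a sketch of steps 1 and 2, not the actual patch.
(defvar *recurse-under-asd* t
  "Global default: keep recursing into subdirectories of a directory that
already contains a .asd file? True for now, for backward compatibility.")

(defun tree-recurses-under-asd-p (directive)
  "Given a directive such as (:tree \"/src/\" :recurse-under-asd nil),
return the setting that applies to that tree: the keyword if supplied,
otherwise the current value of *RECURSE-UNDER-ASD*."
  (destructuring-bind (kind path
                       &key (recurse-under-asd *recurse-under-asd*))
      directive
    (declare (ignore path))
    (assert (eq kind :tree))
    recurse-under-asd))
```

A plain `(:tree "/src/")` entry follows the special variable, so a scripting image can rebind or reset that variable once, while a tree that really needs full recursion can say so explicitly with the keyword.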
I modified my previously posted code as a solution for 1, attached. 2 and 3, I'll add to the TODO for now.
(And yes, changing startup from 1.45s to .66s with cl-launch (resp. .83s to .33s without) is well worth it. It makes some scripts usable that are otherwise annoyingly slow; the difference is even more dramatic for me on CCL, where it drops from 2.57s to 0.37s with cl-launch (resp. 2.0s to 0.27s without).)
PS: while testing my changes, I found a trivial bug in test-program, that failed to rename load-fasl-op to load-bundle-op. Fixed.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org A flea and a fly in a flue were imprisoned, so what could they do? Said the fly: "let us flee!". Said the flea: "let us fly!". So they flew thru a flaw in the flue...