If I run several sbcl processes on different nodes in my compute cluster, it might happen that two different runs notice the same file needs to be recompiled (via asdf), and they might try to compile it at the same time. What is the best way to prevent this?
I see in the asdf documentation that there is an asdf:*user-cache* variable whose value is the pathname of the directory into which asdf compiles. Would it be advisable for me to arrange for asdf:*user-cache* to be a function of the pid and hostname, and perhaps the thread id (if such a thing exists), to avoid such collisions?
Or is there some better way to handle this which is built into asdf?
On 23 Jan 2018, at 12:00, Jim Newton jnewton@lrde.epita.fr wrote:
I had requested that ASDF include the hostname (or machine-instance) in the path it builds for the cache. Unfortunately, for some reason, the maintainers of ASDF thought it was a good idea to remove it. There you are!
This approach seems to work, though I’m not sure it is the best one. Here is what my code looks like. It creates a directory under /tmp/, and asdf:load-system compiles the .fasl files into it.
(require :asdf)
(require :sb-posix)

(let ((home (directory-namestring (user-homedir-pathname)))
      (uid (sb-posix:getuid))
      (pid (sb-posix:getpid)))
  (setf asdf::*user-cache*
        (ensure-directories-exist
         (format nil "/tmp~A~D/~D/" home uid pid))))
#-quicklisp
(let ((quicklisp-init "/lrde/home/jnewton/quicklisp/setup.lisp"))
  (if (probe-file quicklisp-init)
      (load quicklisp-init)
      (error "file not found ~S" quicklisp-init)))

(asdf:load-system :lisp-types-test)
On 23 Jan 2018, at 12:47, Pascal Bourguignon pjb@informatimago.com wrote:
I had requested that ASDF include the hostname (or machine-instance) in the path it builds for the cache. Unfortunately, for some reason, the maintainers of ASDF thought it was a good idea to remove it. There you are!
-- __Pascal J. Bourguignon__
(Sorry for the delayed response.)
: Jim Newton If I run several sbcl processes on different nodes in my compute cluster, it might happen that two different runs notice the same file needs to be recompiled (via asdf), and they might try to compile it at the same time. What is the best way to prevent this?
You mean that these machines share the same home directory? Interesting.
"Normally", ASDF compiles to a temporary file and renames the output at the end, which provides some resistance to races. But for backward-compatibility reasons, every extension has to follow this protocol manually for ASDF to remain robust.
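The idiom in question looks roughly like this (a sketch only, not ASDF's actual internal code; the function and file-type names are illustrative):

```lisp
;; Sketch of the compile-to-temporary-then-rename idiom.
(defun compile-atomically (input output)
  (let ((tmp (make-pathname :type "tmp-fasl" :defaults output)))
    (compile-file input :output-file tmp)
    ;; rename-file is atomic on POSIX filesystems when both names
    ;; live on the same volume, so a concurrent reader sees either
    ;; the old fasl or the new one, never a half-written file.
    (rename-file tmp output)))
```

Two processes racing still both do the compile work, but neither can corrupt the fasl the other loads.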
I see in the asdf documentation that there is an asdf:*user-cache* variable whose value is the pathname of the directory into which asdf compiles. Would it be advisable for me to arrange for asdf:*user-cache* to be a function of the pid and hostname, and perhaps the thread id (if such a thing exists), to avoid such collisions?
That's an option. It is expensive, though: it means no sharing of fasl files between hosts. If you have a cluster of 200 machines, that means 200x the disk space.
What about instead building your application as an executable and delivering that to the cluster?
My rule of thumb is that there is one home directory per human, and the human is only interactively building one thing at a time (and/or can set up several accounts and/or $HOME variants for as many "personalities"). Thus you only need one fasl cache for interactive compilation. If you want non-interactive deployment, use tools like Bazel, Nix, etc., to build your software deterministically.
Or is there some better way to handle this which is built into asdf?
You can have different ASDF_OUTPUT_TRANSLATIONS or asdf:*output-translations-parameter* on each machine, or you can indeed have the user cache depend on uiop:hostname and more.
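Programmatically, the uiop:hostname variant might look like this (a sketch; the cache directory layout is illustrative, not an ASDF default):

```lisp
;; Sketch: a per-host fasl cache via ASDF's output-translations DSL.
;; The /var/tmp path below is illustrative.
(asdf:initialize-output-translations
 `(:output-translations
   ;; send all compiled output to a host-specific directory,
   ;; further segregated per Lisp implementation and version
   (t (,(format nil "/var/tmp/fasl-cache/~A/" (uiop:hostname))
       :implementation))
   :ignore-inherited-configuration))
```

The same shape can be supplied via the ASDF_OUTPUT_TRANSLATIONS environment variable or a configuration file instead, if you prefer not to edit your scripts.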
The Right Thing™ is still to build and test then deploy, rather than deploy then build. Using Bazel, you might even be able to build in parallel on your cluster.
: pjb I had requested that ASDF include the hostname (or machine-instance) in the path it builds for the cache. Unfortunately, for some reason, the maintainers of ASDF thought it was a good idea to remove it. There you are!
I still think it's a bad idea. If your $HOME is shared by many machines, you probably want what's in $HOME to be shared, too. Go build in /var/tmp or use Bazel or whatever. Or use uiop:hostname in your ASDF configuration.
On Tue, Jan 23, 2018 at 7:51 AM, Jim Newton jnewton@lrde.epita.fr wrote:
This approach seems to work, though I’m not sure it is the best one. Here is what my code looks like. It creates a directory under /tmp/, and asdf:load-system compiles the .fasl files into it.
(require :asdf)
(require :sb-posix)

(let ((home (directory-namestring (user-homedir-pathname)))
      (uid (sb-posix:getuid))
      (pid (sb-posix:getpid)))
  (setf asdf::*user-cache*
        (ensure-directories-exist
         (format nil "/tmp~A~D/~D/" home uid pid))))
I still don't understand why your use case uses deploy-then-build rather than build-then-deploy.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org A child of five would understand this. Send someone to fetch a child of five. — Groucho Marx
Hi Faré,
Thanks for taking the time to understand my comments. I’ve tried to respond to some of your questions below. Sorry if my original post wasn’t explicit enough to give enough explanation for what I’m trying to do.
If I run several sbcl processes on different nodes in my compute cluster, it might happen that two different runs notice the same file needs to be recompiled (via asdf), and they might try to compile it at the same time. What is the best way to prevent this?
You mean that these machines share the same home directory? Interesting.
Yes, the cluster shares some disks, including the home directory. And I believe two cores on the same physical host share /tmp, but I’m not 100% sure about that.
That's an option. It is expensive, though: it means no sharing of fasl files between hosts. If you have a cluster of 200 machines, that means 200x the disk space.
With regard to the question of efficient reuse of fasl files: this is completely irrelevant in my case. My code takes hours (10 to 12 hours worst case) to run, but only 20 seconds (or less) to compile. I’m very happy to completely remove the fasl files and regenerate them before each 10-hour run. (Note to self: I need to double-check that I do in fact delete the fasl files every time.) Besides, my current flow lets me simply check a change into git and re-launch the code on the cluster in batch. I don’t really want to add an error-prone manual local-build-and-deploy step if that can be avoided, unless of course there is some great advantage to that approach.
What about instead building your application as an executable and delivering that to the cluster?
One difficulty with your build-then-deliver suggestion is that my local machine runs macOS and the cluster runs Linux. I don’t think I can build Linux executables on my Mac.
You can have different ASDF_OUTPUT_TRANSLATIONS or asdf:*output-translations-parameter* on each machine, or you can indeed have the user cache depend on uiop:hostname and more.
This is what I’ve ended up doing. And it seems to work. Here is the code I have inserted into all my scripts.
(let ((home (directory-namestring (user-homedir-pathname)))
      (uid (sb-posix:getuid))
      (pid (sb-posix:getpid)))
  (setf asdf::*user-cache*
        (ensure-directories-exist
         (format nil "/tmp~A~D/~D/" home uid pid))))
The Right Thing™ is still to build and test then deploy, rather than deploy then build.
In response to your suggestion about build-then-deploy: this seems very dangerous and error-prone to me. For example, what if different hosts want to run the same source code but with different optimization settings? This is a real possibility, as some of my processes are running with profiling (debug 3) and collecting profiling results, while others are running super-optimized (speed 3) code to try to find the fastest something-or-other.
I don’t even know whether it is possible to create the .asd files so that changing an optimization declaration triggers recompilation of everything depending on it. And if I think I’ve written my .asd files that way, how would I know whether they are really correct?
It is not the case currently, but it may very well be in the future that I want different jobs in the cluster running different git branches of my code. That would be a nightmare to manage if I tried to share fasl files.
Using Bazel, you might even be able to build in parallel on your cluster.
Bazel sounds interesting, but I don’t really see the advantage of building in parallel when it only takes a few seconds to build but half a day to execute.
I still don't understand why your use case uses deploy-then-build rather than build-then-deploy.
I hope it is now clear why I can’t: (1) my local machine is macOS while the cluster is Linux; (2) different jobs in the cluster use different optimization settings; (3) a future enhancement may have different cluster nodes running different branches of the code.
Kind regards Jim
: Jim Newton
One difficulty with your build-then-deliver suggestion is that my local machine runs macOS and the cluster runs Linux. I don’t think I can build Linux executables on my Mac.
Your build does not have to be "local": pick one random Linux machine, have it do the compilation, and when it's done, your entire cluster is ready to start from the compiled executable.
The advantage is that you don't have ugly build race conditions as above.
ASDF's program-op and/or cl-launch will help you build and deliver a single executable for all your needs. You can even use cl-launch's multicall capabilities so the same executable has multiple functions.
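As a sketch (the system name "my-app" and the entry point my-app:main are hypothetical), the program-op route looks like this in the .asd file:

```lisp
;; Hypothetical my-app.asd: deliver one executable via program-op.
(defsystem "my-app"
  :build-operation "program-op"  ; asdf:make then produces an executable
  :build-pathname "my-app"       ; name of the delivered binary
  :entry-point "my-app:main")    ; function called when the binary starts

;; Then, on the (Linux) build machine:
;;   (asdf:make "my-app")
;; and copy the resulting binary to the cluster nodes.
```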
For example, what if different hosts want to run the same source code but with different optimization settings? This is a real possibility, as some of my processes are running with profiling (debug 3) and collecting profiling results, while others are running super-optimized (speed 3) code to try to find the fastest something-or-other.
Then have one output-translations per optimization setting, and produce two binaries with different names.
I don’t even know whether it is possible to create the .asd files so that changing an optimization declaration triggers recompilation of everything depending on it. And if I think I’ve written my .asd files that way, how would I know whether they are really correct?
You need to configure optimization settings in your build script, after you load asdf and before you use it. See for instance the snapshot of quux on qitab for how we did it at ITA (in qres-build).
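A sketch of what that might look like at the top of a build script (the cache path and system name are hypothetical):

```lisp
;; Sketch: fix optimization settings before anything is compiled,
;; and give each setting its own fasl cache so builds never mix.
(proclaim '(optimize (speed 3) (safety 1) (debug 0)))
(setf asdf::*user-cache* #p"/var/tmp/fasl-speed3/")
(asdf:load-system "my-system")  ; hypothetical system name
```

A second script with (debug 3) and a cache like /var/tmp/fasl-debug3/ would then never share fasls with this one.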
It is not the case currently, but it may very well be in the future that I want different jobs in the cluster running different git branches of my code. That would be a nightmare to manage if I tried to share fasl files.
Indeed. Build multiple binaries each with its own output-translations, then distribute the binaries under different names.
Bazel sounds interesting, but I don’t really see the advantage of building in parallel when it only takes a few seconds to build but half a day to execute.
A split second is better than a few seconds, but yes, if you're the only user, the cost of setting it up is probably not worth it.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org I'm a polyatheist — there are many gods I don't believe in. — Dan Fouts
For example,what if different hosts want to run the same source code but with different optimization settings? This is a real possibility, as some of my processes are running with profiling (debug 3) and collecting profiling results, and others are running super optimized (speed 3) code to try to find the fastest something-or-other.
Then have one output-translations per optimization setting, and produce two binaries with different names.
I suspect this would be one binary for each permutation of the optimization settings used, times the number of top-level entry points. Right? That number is much larger than the number of /tmp directories I need just to compile automatically before running.
No, use cl-launch/dispatch to create a single executable for all entry-points.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org I'm only a stupid AI, but my creator is a real genius!
Sorry for the late response.
What you have seems like it will work, but couldn't you just as easily use the ASDF output translations configuration facility that is described here: https://common-lisp.net/project/asdf/asdf/Controlling-where-ASDF-saves-compi...
There's an example there that you could probably adapt by inserting `uid` and `pid`, as appropriate.
Best, r
On 23 Jan 2018, at 6:51, Jim Newton wrote:
Apparently, this approach seems to work. I’m not sure if it is the best approach. Here is what my code looks like. It creates a directory in /tmp/ and asdf:load-system seems to compile the .fasl files into there.
(require :asdf)
(require :sb-posix)

(let ((home (directory-namestring (user-homedir-pathname)))
      (uid (sb-posix:getuid))
      (pid (sb-posix:getpid)))
  (setf asdf::*user-cache*
        (ensure-directories-exist
         (format nil "/tmp~A~D/~D/" home uid pid))))
#-quicklisp
(let ((quicklisp-init "/lrde/home/jnewton/quicklisp/setup.lisp"))
  (if (probe-file quicklisp-init)
      (load quicklisp-init)
      (error "file not found ~S" quicklisp-init)))

(asdf:load-system :lisp-types-test)
On 23 Jan 2018, at 5:47, Pascal Bourguignon wrote:
I had requested that ASDF include the hostname (or machine-instance) in the path it builds for the cache. Unfortunately, for some reason, the maintainers of ASDF thought it was a good idea to remove it. There you are!
This would be a poor solution for the many users who connect via DHCP on different networks and have hostnames that change.
There are relatively few people who have an issue with contention over their home directories, and those users can simply reconfigure their ASDF output translations. But there are relatively many who might find their hostname changing.