Hello,
I was thinking about what a bear it will be to have to specify each of my compute node systems in my clusters individually.
I was thinking about syntax such as the following that would create an equipment object for each of 256 hosts named l001 through l256.
(machine-cluster "LosLobos" linux-host (user "download") (disks (disk "/dev/sda1" 80 95)) (machinerange "l001-l256"))
What do y'all think? Is there a better way to do this? Something like this would reduce the size of my shop's config file a couple orders of magnitude - in turn, I'd screw things up accordingly less often :)
Jim
James E. Prewett Jim@Prewett.org download@hpc.unm.edu Systems Team Leader LoGS: http://www.hpc.unm.edu/~download/LoGS/ Designated Security Officer OpenPGP key: pub 1024D/31816D93 HPC Systems Engineer III UNM HPC 505.277.8210
Jim writes:
> I was thinking about what a bear it will be to have to specify each of my compute node systems in my clusters individually.
> I was thinking about syntax such as the following that would create an equipment object for each of 256 hosts named l001 through l256.
> (machine-cluster "LosLobos" linux-host (user "download") (disks (disk "/dev/sda1" 80 95)) (machinerange "l001-l256"))
It is definitely a cool idea. I am not ENTIRELY sure what the best syntax for the name range would be, but it's definitely something that would be handy for many users.
> What do y'all think? Is there a better way to do this? Something like this would reduce the size of my shop's config file a couple orders of magnitude - in turn, I'd screw things up accordingly less often :)
At the moment, you could PROBABLY use LOOP and friends, but that'd be icky. The whole config file is a chain of macros that expand to code generating configuration objects. On further consideration, no, you can't (yes, I tested, no you're not supposed to, it seems it worked...).
I was actually thinking, the other day, that the current config-nesting only allows a nested macro to use a single context, and I was trying to think of a use-case where one would want stuff nested in more than one. Now I have one.
I'll get the changes necessary for defnested sorted before heading off to work. I suspect a workable method for naming the individual hosts would be a "member-name" config stanza, taking (say) a format-string (either C or CL, we should be able to compile the former to the latter), a start number and an end number. I feel that making sure you do the right thing for something like "l001-l256" is just plain hard in the general case.
Imagine trying to figure out how many hosts there are and what they SHOULD be called when faced with something like "rtr-f01-001-rtr-f03-999". Is that 999 routers, 333 named "...f01...", 3 routers, named "...f01-001", "f02-500" and "...f03-999" or 2997 routers? I could make a case for all of those. :) I also suspect we only want to support a single range, to make things MUCH easier.
//Ingvar
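As a rough illustration of the explicit alternative described above (a format string plus a start and an end number), here is a minimal Common Lisp sketch. MEMBER-NAMES is a hypothetical helper, not part of noctool, and it assumes a CL-style format string containing a single numeric directive.

```lisp
;; Hypothetical helper, not part of noctool: expand one explicit range
;; into host names. FMT is a CL format string with one numeric directive.
(defun member-names (fmt start end)
  (loop for n from start to end
        collect (format nil fmt n)))

;; The first few of the compute nodes, with no range string to parse:
(member-names "l~3,'0d" 1 4)
;; => ("l001" "l002" "l003" "l004")

;; The whole cluster from the original proposal:
(length (member-names "l~3,'0d" 1 256))
;; => 256
```

With the numbers given explicitly, the "how many routers is that?" question does not arise: the writer states which number varies and over what bounds, so 3 frames of 999 units each would have to be written as two ranges and would unambiguously mean 3 * 999 = 2997 hosts.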
> It is definitely a cool idea. I am not ENTIRELY sure what the best syntax for the name range would be, but it's definitely something that would be handy for many users.
I'll admit my proposed syntax, mostly in terms of the name range, leaves something to be desired. I'm very glad that you appreciate the idea :) I'm sure you've made 1000 identical 10-line entries in some config file before. Yuck! :P :)
>> What do y'all think? Is there a better way to do this? Something like this would reduce the size of my shop's config file a couple orders of magnitude - in turn, I'd screw things up accordingly less often :)
> At the moment, you could PROBABLY use LOOP and friends, but that'd be icky. The whole config file is a chain of macros that expand to code generating configuration objects. On further consideration, no, you can't (yes, I tested, no you're not supposed to, it seems it worked...).
If I'm understanding what you're saying here correctly (basically, that the configuration file is "just Lisp"), I think that is a good thing (TM). Certainly, if we think we can specify everything our users might ever want to put into a config file, we are just dead wrong. Lisp is better than Jim's config language *unless* Jim's config language is a superset of Lisp or some other reasonably powerful language, simply because Jim cannot guess what you want to do or what your environment looks like.
> I was actually thinking, the other day, that the current config-nesting only allows a nested macro to use a single context, and I was trying to think of a use-case where one would want stuff nested in more than one. Now I have one.
> I'll get the changes necessary for defnested sorted before heading off to work. I suspect a workable method for naming the individual hosts would be a "member-name" config stanza, taking (say) a format-string (either C or CL, we should be able to compile the former to the latter), a start number and an end number. I feel that making sure you do the right thing for something like "l001-l256" is just plain hard in the general case.
> Imagine trying to figure out how many hosts there are and what they SHOULD be called when faced with something like "rtr-f01-001-rtr-f03-999". Is that 999 routers, 333 named "...f01...", 3 routers, named "...f01-001", "f02-500" and "...f03-999" or 2997 routers? I could make a case for all of those. :) I also suspect we only want to support a single range, to make things MUCH easier.
agreed!
I'd actually decided this syntax was pretty problematic about 6 months ago, while working on the SCAT project. However, since SCAT is a rip-off of X-CAT (IBM's unsupported, not open source, not-a-product cluster management software (that sucks)), I decided to punt on that one and just use the syntax you rightly note is problematic (and swear to myself I'd never name a host rtr-f01-001). I've gotten so used to this syntax now that... well... bad habits are hard to break, I guess :)
When we come up with a good syntax for this, I can pretty much guarantee I'm going to steal it for SCAT :)
Jim
One more thing here... I think we should *guarantee* that the config file is Lisp. That way I could create my own monitor classes, methods, etc. (even replace internal noctool methods :) inside my config file without ever having to touch noctool sources. This is pretty much what I do with the LoGS config file - I think it is an especially powerful concept.
If the noctool developers weren't on top of having sets of machines, I, the user, sure would want to be able to use LOOP or some such thing. It's also awfully handy if there's some obnoxious-ass user that wants a frob object and the entire rest of the user community thinks it's a bad idea - he can just implement it himself, hopefully without having to maintain his own fork of the codebase. :)
I LIKE BEING ABLE TO POINT THE BAZOOKA AT MY FOOT! I'M WORKING ON A DOUBLE-BARRELLED BAZOOKA SO I CAN POINT IT AT *BOTH* FEET!! ;)
(i just work diligently and pray (a lot) i don't pull the trigger at the wrong time)
I get offended by tools that *tell me what I want to do*. Maybe you shouldn't be able to use dc as a word processor... Then again, why not? (If you're sick and twisted enough to put in the work, I could imagine a calculator you could turn into a word processor... look at Mathematica, for Pete's sake! I think that's actually precisely what happened with Mathematica - and it's a pretty cool tool because that sort of thing is possible.) No, I'm not a Mathematica salesman... or even a user, really :P :)
I'll get off my soap box now :)
Jim
p.s. I think I want to use noctool as a word processor - Can you help me with that Ingvar? ;P I've got this PDF file.... ;)
On Thu, 22 May 2008, Jim Prewett wrote:
>> It is definitely a cool idea. I am not ENTIRELY sure what the best syntax for the name range would be, but it's definitely something that would be handy for many users.
> I'll admit my proposed syntax, mostly in terms of the name range, leaves something to be desired. I'm very glad that you appreciate the idea :) I'm sure you've made 1000 identical 10-line entries in some config file before. Yuck! :P :)
>>> What do y'all think? Is there a better way to do this? Something like this would reduce the size of my shop's config file a couple orders of magnitude - in turn, I'd screw things up accordingly less often :)
>> At the moment, you could PROBABLY use LOOP and friends, but that'd be icky. The whole config file is a chain of macros that expand to code generating configuration objects. On further consideration, no, you can't (yes, I tested, no you're not supposed to, it seems it worked...).
> If I'm understanding what you're saying here correctly (basically, that the configuration file is "just Lisp"), I think that is a good thing (TM). Certainly, if we think we can specify everything our users might ever want to put into a config file, we are just dead wrong. Lisp is better than Jim's config language *unless* Jim's config language is a superset of Lisp or some other reasonably powerful language, simply because Jim cannot guess what you want to do or what your environment looks like.
>> I was actually thinking, the other day, that the current config-nesting only allows a nested macro to use a single context, and I was trying to think of a use-case where one would want stuff nested in more than one. Now I have one.
>> I'll get the changes necessary for defnested sorted before heading off to work. I suspect a workable method for naming the individual hosts would be a "member-name" config stanza, taking (say) a format-string (either C or CL, we should be able to compile the former to the latter), a start number and an end number. I feel that making sure you do the right thing for something like "l001-l256" is just plain hard in the general case.
>> Imagine trying to figure out how many hosts there are and what they SHOULD be called when faced with something like "rtr-f01-001-rtr-f03-999". Is that 999 routers, 333 named "...f01...", 3 routers, named "...f01-001", "f02-500" and "...f03-999" or 2997 routers? I could make a case for all of those. :) I also suspect we only want to support a single range, to make things MUCH easier.
> agreed!
> I'd actually decided this syntax was pretty problematic about 6 months ago, while working on the SCAT project. However, since SCAT is a rip-off of X-CAT (IBM's unsupported, not open source, not-a-product cluster management software (that sucks)), I decided to punt on that one and just use the syntax you rightly note is problematic (and swear to myself I'd never name a host rtr-f01-001). I've gotten so used to this syntax now that... well... bad habits are hard to break, I guess :)
> When we come up with a good syntax for this, I can pretty much guarantee I'm going to steal it for SCAT :)
> Jim
_______________________________________________
noctool-devel mailing list
noctool-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/noctool-devel
> I LIKE BEING ABLE TO POINT THE BAZOOKA AT MY FOOT! I'M WORKING ON A DOUBLE-BARRELLED BAZOOKA SO I CAN POINT IT AT *BOTH* FEET!! ;)
:)
I've spent years, on and off, worrying about system monitoring. Years ago I made available a giant Python system called Mom (v3) that no one but me could use. The publish-subscribe mechanism required learning a tiny matching language to use.
I had three successes with Mom.v3 which I really think deserve consideration in any future monitoring systems. My first minor success of sorts was that, due to my desire to produce statistical models of a bunch of system measures, I have years and years of collected data to test new algorithms on. RRD or similar graphs are a loss for anything except visualization. My second unalloyed success was producing an algorithm that told me when my users were filling up a disk partition *before* thresholds were crossed. [1]
The final huge win of Mom.v3 was that difficult publish-subscribe engine. The landscape is full of single-purpose monitoring tools that don't interact very well outside of their own little world. If, instead, you center a monitoring system around a communications protocol (a not too difficult one, hopefully) then you can plug in whatever you want. Full-contact, improvisational bazooka juggling can then be indulged in without necessarily endangering the stability of the main system. An example might be useful.
The Life Cycle of a Disk Sample in Mom.v3
Some dumb data collecting agent running locally runs 'df' and yanks out the relevant numbers. The data is packaged up into a network Message format (a collection of property-value pairs, including the agent type, the data points, a timestamp, etc.) and is passed off to a forwarder agent running on the same host. That agent is the only program running locally that knows enough to speak to the central publish-subscribe system, called the Kiosk. If the forwarder cannot talk to the Kiosk it caches messages. Otherwise it opens an authenticated and encrypted socket to the Kiosk.
Once the message is accepted by the Kiosk and a receipt goes back to the forwarder the new message is shoved into a processing queue. Now the entire set of subscription rules is run against the new message. A simple subscription rule might look like this:
agent == 'disk'
a trickier one:
DEFINED class AND class == 'security' AND DEFINED message
These property-checking subscriptions are paired with data sinks I called 'transports' in Mom.v3. In the case of this disk agent, we have two transports attached to the subscription. The first shunts the disk use sample into a log file for later grovelling over. The second sends the sample into the diskwatcher transport, which keeps enough disk samples around to run the impending disk doom algorithm. That transport then *adds another message to the queue* with the analysis. You might attach this subscription to a transport that sends out email or a page:
agent == 'diskwatcher' AND class == 'notification' AND degree in 3 4
So, if the disk appears to be filling up fast, you'll hear about it.
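To make the cascade above concrete, here is a small sketch in Common Lisp (the list's language, not Mom.v3's Python). It is not Mom.v3's implementation: messages are modelled as property lists, subscriptions as predicate functions, and every name below is hypothetical. The threshold test merely stands in for the impending-doom analysis.

```lisp
;; A sketch only: messages are property lists, and a subscription pairs a
;; predicate with one or more transport functions. All names are made up.
(defvar *queue* '())          ; pending messages, oldest first
(defvar *subscriptions* '())  ; list of (predicate . transports)
(defvar *sent* '())           ; stands in for email/pager delivery

(defun publish (message)
  (setf *queue* (append *queue* (list message))))

(defun subscribe (predicate &rest transports)
  (push (cons predicate transports) *subscriptions*))

(defun run-queue ()
  "Match each queued message against every subscription rule."
  (loop while *queue*
        do (let ((message (pop *queue*)))
             (loop for (predicate . transports) in *subscriptions*
                   when (funcall predicate message)
                     do (dolist (transport transports)
                          (funcall transport message))))))

;; agent == 'disk'
;; The transport is a placeholder analysis that re-injects a new message.
(subscribe (lambda (m) (eq (getf m :agent) :disk))
           (lambda (m)
             (when (> (getf m :percent-used) 90)
               (publish (list :agent :diskwatcher
                              :class :notification
                              :degree 3
                              :host (getf m :host))))))

;; agent == 'diskwatcher' AND class == 'notification' AND degree in 3 4
(subscribe (lambda (m)
             (and (eq (getf m :agent) :diskwatcher)
                  (eq (getf m :class) :notification)
                  (member (getf m :degree) '(3 4))))
           (lambda (m) (push m *sent*)))

(publish (list :agent :disk :host "l001" :percent-used 97))
(run-queue)
;; *sent* now holds one :diskwatcher notification for host "l001".
```

The point of the design is visible even at this size: the analysis transport knows nothing about paging, and the paging subscription knows nothing about disks; they only share the message format.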
Now, in Mom.v3 rather too much of this message cascade happened in the same process. If an analysis transport went bad it could muck up the entire Kiosk. Fortunately Python has enough introspective abilities that I could deactivate really badly behaving transports, but this isn't ideal. This publish-subscribe message routing really was just incredibly powerful - I had correlation engines, time series models, logs, database sinks, etc. A single data sample message could result in a half-dozen messages being reinjected into the system. But because so much ran in the same process there were certain things I couldn't try out live. So my current focus - when I have time to code on this - is to generalize a monitoring protocol that'll let me plug in some experimental analysis engine without endangering the other parts that are working correctly. It would also permit different ways of accessing the data so we're not forced to pretend system monitoring maps well to web pages.
I'm thinking aloud here. A few weeks before Ingvar announced his common-lisp.net project and this mailing list I was thinking of contacting him and Chun Tian (the author of cl-net-snmp) to see if they thought we should create a "Common Lisp and Monitoring" mailing list to discuss our separate projects, to share what works and what doesn't.
-- wm
[1] http://www.biostat.wisc.edu/~annis/granny/notes/impending-doom.html - lisp code available if anyone is curious
Jim writes:
> One more thing here... I think we should *guarantee* that the config file is Lisp. That way I could create my own monitor classes, methods, etc. (even replace internal noctool methods :) inside my config file without ever having to touch noctool sources. This is pretty much what I do with the LoGS config file - I think it is an especially powerful concept.
At the end of the day, they're loaded by CL:LOAD, it'd be a Right Hassle<tm> to change that. It's just that the package they're loaded in is not very well connected to the rest of the CL world.
> If the noctool developers weren't on top of having sets of machines, I, the user, sure would want to be able to use LOOP or some such thing. It's also awfully handy if there's some obnoxious-ass user that wants a frob object and the entire rest of the user community thinks it's a bad idea - he can just implement it himself, hopefully without having to maintain his own fork of the codebase. :)
One of my general thoughts is that there should be some sort of "internals" documentation, split into a "this is a stable internal interface, please use for extensions" document and a "this is the rest of the internals" document. Ideally, I want to be able to programmatically generate config files, though. Not necessarily in an identical manner, but at least in one that ends up with equipment and monitor objects that have the same effect at run-time.
At least at the moment, there is working code that serialises the equipment and monitors to disk, so that you can reload them in a new image. It is not called from the rest of the code, but it has been tested. No config serialisation as of yet.
That reminds me, there should be a check in the config reader to NOT instantiate equipment that shares a class and name with pre-existing equipment, but (ideally) intelligently merges the monitors.
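A minimal sketch of what such a check could look like, under loud assumptions: equipment is modelled here as a small struct with a list of monitor descriptions, and "merging" is a plain set union. noctool's real classes and config reader are different, and all names below are hypothetical.

```lisp
;; Sketch under assumptions: equipment is modelled as a small struct and
;; monitors are compared with EQUAL. noctool's real classes differ.
(defstruct equipment kind name monitors)

(defvar *known-equipment* '())

(defun register-equipment (kind name monitors)
  "Reuse equipment with the same KIND and NAME, merging in new MONITORS."
  (let ((existing (find-if (lambda (e)
                             (and (eq (equipment-kind e) kind)
                                  (string= (equipment-name e) name)))
                           *known-equipment*)))
    (cond (existing
           ;; Naive merge: keep every monitor description exactly once.
           (setf (equipment-monitors existing)
                 (union (equipment-monitors existing) monitors :test #'equal))
           existing)
          (t
           (let ((new (make-equipment :kind kind :name name
                                      :monitors (copy-list monitors))))
             (push new *known-equipment*)
             new)))))

;; Loading the "same" host twice yields one object with both monitors.
(register-equipment 'linux-host "l001" '((disk "/dev/sda1")))
(register-equipment 'linux-host "l001" '((disk "/dev/sda1") (load)))
```

A union is the least intelligent merge possible; deciding what to do when two stanzas configure the same monitor with different thresholds is the hard part and is left open here.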
> I LIKE BEING ABLE TO POINT THE BAZOOKA AT MY FOOT! I'M WORKING ON A DOUBLE-BARRELLED BAZOOKA SO I CAN POINT IT AT *BOTH* FEET!! ;)
> (i just work diligently and pray (a lot) i don't pull the trigger at the wrong time)
That's where the "supported" and "unsupported" internal API docs come in. Anything that's been classed as "stable" should persist at least through the next publicly released version. Anything marked "unstable" is fair game. If you extend and break things, you get to keep ALL the parts!
> I get offended by tools that *tell me what I want to do*. Maybe you shouldn't be able to use dc as a word processor... Then again, why not? (If you're sick and twisted enough to put in the work, I could imagine a calculator you could turn into a word processor... look at Mathematica, for Pete's sake! I think that's actually precisely what happened with Mathematica - and it's a pretty cool tool because that sort of thing is possible.) No, I'm not a Mathematica salesman... or even a user, really :P :)
> I'll get off my soap box now :)
> Jim
> p.s. I think I want to use noctool as a word processor - Can you help me with that Ingvar? ;P I've got this PDF file.... ;)
Sure, start with a document class (goes in classes.lisp), then get "bold", "paragraph", "normal-weight", "image" and "page" classes, parse the document into these and...
//Ingvar
Jim writes:
>> One more thing here... I think we should *guarantee* that the config file is Lisp. That way I could create my own monitor classes, methods, etc. (even replace internal noctool methods :) inside my config file without ever having to touch noctool sources. This is pretty much what I do with the LoGS config file - I think it is an especially powerful concept.
> At the end of the day, they're loaded by CL:LOAD, it'd be a Right Hassle<tm> to change that. It's just that the package they're loaded in is not very well connected to the rest of the CL world.
I think that's completely fair. If you must use LOOP, CL:LOOP works.
>> If the noctool developers weren't on top of having sets of machines, I, the user, sure would want to be able to use LOOP or some such thing. It's also awfully handy if there's some obnoxious-ass user that wants a frob object and the entire rest of the user community thinks it's a bad idea - he can just implement it himself, hopefully without having to maintain his own fork of the codebase. :)
> One of my general thoughts is that there should be some sort of "internals" documentation, split into a "this is a stable internal interface, please use for extensions" document and a "this is the rest of the internals" document. Ideally, I want to be able to programmatically generate config files, though. Not necessarily in an identical manner, but at least in one that ends up with equipment and monitor objects that have the same effect at run-time.
> At least at the moment, there is working code that serialises the equipment and monitors to disk, so that you can reload them in a new image. It is not called from the rest of the code, but it has been tested. No config serialisation as of yet.
> That reminds me, there should be a check in the config reader to NOT instantiate equipment that shares a class and name with pre-existing equipment, but (ideally) intelligently merges the monitors.
>> I LIKE BEING ABLE TO POINT THE BAZOOKA AT MY FOOT! I'M WORKING ON A DOUBLE-BARRELLED BAZOOKA SO I CAN POINT IT AT *BOTH* FEET!! ;)
>> (i just work diligently and pray (a lot) i don't pull the trigger at the wrong time)
> That's where the "supported" and "unsupported" internal API docs come in. Anything that's been classed as "stable" should persist at least through the next publicly released version. Anything marked "unstable" is fair game. If you extend and break things, you get to keep ALL the parts!
:) hear, hear!
>> I get offended by tools that *tell me what I want to do*. Maybe you shouldn't be able to use dc as a word processor... Then again, why not? (If you're sick and twisted enough to put in the work, I could imagine a calculator you could turn into a word processor... look at Mathematica, for Pete's sake! I think that's actually precisely what happened with Mathematica - and it's a pretty cool tool because that sort of thing is possible.) No, I'm not a Mathematica salesman... or even a user, really :P :)
>> I'll get off my soap box now :)
>> Jim
>> p.s. I think I want to use noctool as a word processor - Can you help me with that Ingvar? ;P I've got this PDF file.... ;)
> Sure, start with a document class (goes in classes.lisp), then get "bold", "paragraph", "normal-weight", "image" and "page" classes, parse the document into these and...
ROTFLMAO! I'm sure glad extensibility is a goal ;)
Jim
I wrotes:
Jim writes:
>> I was thinking about what a bear it will be to have to specify each of my compute node systems in my clusters individually.
>> I was thinking about syntax such as the following that would create an equipment object for each of 256 hosts named l001 through l256.
>> (machine-cluster "LosLobos" linux-host (user "download") (disks (disk "/dev/sda1" 80 95)) (machinerange "l001-l256"))
> It is definitely a cool idea. I am not ENTIRELY sure what the best syntax for the name range would be, but it's definitely something that would be handy for many users.
Say... If (and that may be a big "if") we can make sure that all 'top-level' config macros have name as the first parameter...
Or, wait, even better!
Does this look at least vaguely sane?

(cluster ("rtr-~3,'0d" 1 10)
  (machine name linux-host (user "testuser")))

NOCTOOL> (noctool-config:load "test-files/cluster.cfg")
NIL
NOCTOOL> *equipment*
(#<LINUX-HOST {AB41929}> #<LINUX-HOST {AC64A31}> #<LINUX-HOST {ACB7291}>
 #<LINUX-HOST {ACC26D9}> #<LINUX-HOST {AD35B39}> #<LINUX-HOST {AD40FA1}>
 #<LINUX-HOST {AD4C3E9}> #<LINUX-HOST {AD57879}> #<LINUX-HOST {AD62CD9}>
 #<LINUX-HOST {AD6E139}>)
NOCTOOL> (mapcar 'name *equipment*)
("rtr-001" "rtr-002" "rtr-003" "rtr-004" "rtr-005" "rtr-006" "rtr-007"
 "rtr-008" "rtr-009" "rtr-010")
NOCTOOL>
That has just been tested, though not checked in, mainly because I suspect having C-style format strings as default is user-friendlier than having to use CL format strings (yep, "~3,'0d" is more-or-less the same as "%03d"; if we OTOH allow other format wossnames, it gets trickier). If it looks like something to work from, I'll check it in later today.
(defmacro cluster ((fmt low high &optional (name nil) (c-fmt nil)) form)
  (let ((format-string fmt) ;;; At some point, allow C-style format string
        (name (or name (get-config-symbol "NAME"))))
    `(progn
       ,@(loop for n from low to high
               for realname = (format nil format-string n)
               collect (substitute realname name form)))))
This way, we allow clustering of any top-level config form, we aren't restricted in the actual parameter (it defaults to substitute "name" but if you prefer, you can replace "blahonga" or "supercallifragilistic" with each generated name).
//Ingvar
> Say... If (and that may be a big "if") we can make sure that all 'top-level' config macros have name as the first parameter...
> Or, wait, even better!
> Does this look at least vaguely sane? (cluster ("rtr-~3,'0d" 1 10) (machine name linux-host (user "testuser")))
I think that looks mostly sane. :)
One thing that bothers me a little is the use of the optional NAME parameter. It reminds me a little *too much* of Paul Graham's AIF macro (which I love, BTW, but unhygienic macros can be a bit of a pain). As opposed to AIF, I do like that I can call the symbol whatever I want. :)
I can certainly get used to it, so don't count this as an objection. :)
Are there any packages out there for C-style format strings with Lisp? Personally, I'm very happy with the Lisp-style format strings, but I (unfortunately) have to agree with Ingvar that "end user" types will likely know more about C than Lisp. :P
Should we consider an alternate configuration syntax for Lusers? I guess if it were me, I'd make some macros that expand to the noctool-config macros. Something that might be more familiar to someone who has configured Nagios or some such similar tool. IMO, this syntax should feel a lot like Perl (shudder!!!). This would probably add a boatload of maintenance, but might make the "entry fee" a little lower (my thought is basically that learning a little Lisp + learning Noctool is harder than using a familiar syntax while learning Noctool). It feels a tad preliminary to be thinking about this sort of thing - it is another thing that I've struggled with in LoGS (my potential users don't want to learn (*any*) Lisp!!!)
> This way, we allow clustering of any top-level config form, we aren't restricted in the actual parameter (it defaults to substitute "name" but if you prefer, you can replace "blahonga" or "supercallifragilistic" with each generated name).
That sounds great!
Jim
I'm still thinking about this one...
> Or, wait, even better!
> Does this look at least vaguely sane? (cluster ("rtr-~3,'0d" 1 10) (machine name linux-host (user "testuser")))
Using your example from earlier, how would we specify something like "rtr-f01-001" through "rtr-f03-999"? This is *different* from your objection to my initial syntax - what in the heck would a format string look like if this were actually trying to specify 3000 or so routers... (* OK, imagine we're either not FORMAT ninjas or we're using something that doesn't make julienne fries in addition to printing, like C's printf ;) I UNDERSTAND IT IS POSSIBLE TO CURE CANCER WITH A SINGLE INVOCATION OF FORMAT ;) *)
This seems like a better place to screw things up royally than I had initially anticipated :P :) It's going to be difficult to get this right, I'm thinking...
*To get something working*, I'm thinking we should just accept a format string something like Ingvar's suggestion quoted above. There are, hopefully, more interesting problems to solve. :)
Jim
Hi all,
I came up with a straw-man to shoot some holes in: a function, RANGE, that is super easy to use to produce a set of machine names. Like I said earlier, I'm trying to figure out how to make this syntax workable.
;; trust me, this makes *some* sense...
(defun range (start end fmts)
  (loop for fmt in fmts
        nconc (loop for i from start to end
                    collect (format NIL fmt i))))

;; now, we can call range for a simple cluster
(cl-user::range 1 3 '("small~2,'0d"))
("small01" "small02" "small03")

;; we can also call it recursively to make more complex numbering schemes
NOCTOOL> (cl-user::range 0 9 (cl-user::range 1 2 '("rtr-~2,'0d-~~3,'0d")))
("rtr-01-000" "rtr-01-001" "rtr-01-002" "rtr-01-003" "rtr-01-004"
 "rtr-01-005" "rtr-01-006" "rtr-01-007" "rtr-01-008" "rtr-01-009"
 "rtr-02-000" "rtr-02-001" "rtr-02-002" "rtr-02-003" "rtr-02-004"
 "rtr-02-005" "rtr-02-006" "rtr-02-007" "rtr-02-008" "rtr-02-009")
The inner range produces a list of format strings for the outer one, if that's not clear (which is why RANGE takes a list of format strings).
I realize this is pretty much ugly as sin... BUT, to me it seems a *mostly* reasonable way to specify an extremely large/complex network - which shouldn't have to be a 1 line config file written by a jr admin ;) It's OK to need a *little* fu if you're monitoring 3,000 routers ;) I *NEED* this to be doable in a halfway reasonable way - all I care about is large compute clusters and, to a lesser extent, the networks that surround them: I want to cater to *my* ("supercomputing") community. We're all here for selfish reasons, right? :)
Jim
On Wed, 28 May 2008, Jim Prewett wrote:
> I'm still thinking about this one...
>> Or, wait, even better!
>> Does this look at least vaguely sane? (cluster ("rtr-~3,'0d" 1 10) (machine name linux-host (user "testuser")))
> Using your example from earlier, how would we specify something like "rtr-f01-001" through "rtr-f03-999"? This is *different* from your objection to my initial syntax - what in the heck would a format string look like if this were actually trying to specify 3000 or so routers... (* OK, imagine we're either not FORMAT ninjas or we're using something that doesn't make julienne fries in addition to printing, like C's printf ;) I UNDERSTAND IT IS POSSIBLE TO CURE CANCER WITH A SINGLE INVOCATION OF FORMAT ;) *)
> This seems like a better place to screw things up royally than I had initially anticipated :P :) It's going to be difficult to get this right, I'm thinking...
> *To get something working*, I'm thinking we should just accept a format string something like Ingvar's suggestion quoted above. There are, hopefully, more interesting problems to solve. :)
> Jim
Jim writes:
> I'm still thinking about this one...
>> Or, wait, even better!
>> Does this look at least vaguely sane? (cluster ("rtr-~3,'0d" 1 10) (machine name linux-host (user "testuser")))
> Using your example from earlier, how would we specify something like "rtr-f01-001" through "rtr-f03-999"? This is *different* from your objection to my initial syntax - what in the heck would a format string look like if this were actually trying to specify 3000 or so routers... (* OK, imagine we're either not FORMAT ninjas or we're using something that doesn't make julienne fries in addition to printing, like C's printf ;) I UNDERSTAND IT IS POSSIBLE TO CURE CANCER WITH A SINGLE INVOCATION OF FORMAT ;) *)
At the moment? A bit tricky, methinks. But, just ABOUT doable, I think:

(cluster ("rtr-f~2,'0d-~~3,'0d" 1 3 fmtstr)
  (cluster (fmtstr 1 999)
    (machine name linux-host (user "testuser"))))
If/when I finish a simplistic converter from C-style format strings, it'd be slightly easier, as you can have one as "C-style" and one as "lisp-style" and not need the double-escaping.
> This seems like a better place to screw things up royally than I had initially anticipated :P :) It's going to be difficult to get this right, I'm thinking...
> *To get something working*, I'm thinking we should just accept a format string something like Ingvar's suggestion quoted above. There are, hopefully, more interesting problems to solve. :)
Mmmm. I actually left the (cluster ...) as is, just so having "cluster-in-cluster" would be an option (most of the time, I don't think it'll be needed but I suspect we'll have to write some config documentation, one of these days, and having the "cluster" bit in the cookbook might well be a Good Thing).
//Ingvar
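The double-escaped format string in the cluster-in-cluster example is easier to follow when the two expansion steps are run by hand. The sketch below is not the noctool macro: EXPAND-CLUSTER is a hypothetical plain function standing in for the expansion step so it can be tried at a bare REPL. It uses SUBST, which replaces throughout the whole tree; SUBSTITUTE only looks at the top-level elements of a list, so as far as I can tell a placeholder sitting inside the inner cluster's argument list would need tree substitution of this kind.

```lisp
;; Hypothetical stand-in for the cluster macro's expansion step, written as
;; a plain function so it can be tried at the REPL without noctool loaded.
(defun expand-cluster (fmt low high placeholder form)
  "Return one copy of FORM per number, with PLACEHOLDER replaced everywhere."
  (loop for n from low to high
        collect (subst (format nil fmt n) placeholder form)))

;; Outer level: "~~" survives the first FORMAT as a single "~", so each
;; generated string is itself a format string for the inner level.
(expand-cluster "rtr-f~2,'0d-~~3,'0d" 1 3 'fmtstr '(cluster (fmtstr 1 999)))
;; => ((CLUSTER ("rtr-f01-~3,'0d" 1 999))
;;     (CLUSTER ("rtr-f02-~3,'0d" 1 999))
;;     (CLUSTER ("rtr-f03-~3,'0d" 1 999)))

;; Inner level, shown for the first two hosts of the first frame only:
(expand-cluster "rtr-f01-~3,'0d" 1 2 'name '(machine name linux-host))
;; => ((MACHINE "rtr-f01-001" LINUX-HOST) (MACHINE "rtr-f01-002" LINUX-HOST))
```

Run over the full bounds, the outer level yields 3 inner clusters of 999 hosts each, i.e. the 2997-router reading of "rtr-f01-001" through "rtr-f03-999".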
Seriously, How do tools like Nagios and friends do this? I'll admit, I'm a bit out of the loop here. Most of what we do A) isn't done by me and B) is in-house stuff.
Is there any sort of precedent that anyone has seen here? I love to steal other people's hard work ;)
Jim
On Thu, 29 May 2008, Jim Prewett wrote:
Hi Ingvar,
If/when I finish a simplistic converter from C-style format strings, it'd be slightly easier, as you can have one as "C-style" and one as "lisp-style" and not need the double-escaping.
I'm gonna go ahead and smack you for even saying this ;P ;) :)
Jim
Seriously, How do tools like Nagios and friends do this? I'll admit, I'm a bit out of the loop here. Most of what we do A) isn't done by me and B) is in-house stuff.
As far as I can tell, "not at all". You want two mostly-identical objects monitored, you write two mostly-identical config stanzas. You want 4000, you write a piece of code to write the configuration for you.
With the "Big Kit", you configure one instance, see what you want different from the defaults, tweak the defaults, then click "auto-discover".
Is there any sort of precedent that anyone has seen here? I love to steal other people's hard work ;)
For now, I suspect the cluster config that noctool has is "good enough". I pushed the changes in this morning, with C-style format strings as the default; the converter understands the %%, %d and %0wd escapes (where w is a decimal number), and lisp-style format strings are available with a flag.
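For readers wondering what such a converter involves, here is a minimal sketch covering just the three escapes named above. It is not the code that was checked in; the function name c-format->lisp-format and the choice to signal an error on any other escape are assumptions made for illustration.

```lisp
;; Illustrative only: noctool's real converter may be written differently.
(defun c-format->lisp-format (c-string)
  "Translate the %%, %d and %0Nd escapes in C-STRING into FORMAT directives."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length c-string)))
      (loop while (< i n)
            do (let ((ch (char c-string i))
                     (next (when (< (1+ i) n) (char c-string (1+ i)))))
                 (cond
                   ;; A literal tilde has to be doubled for FORMAT.
                   ((char= ch #\~) (write-string "~~" out) (incf i))
                   ;; Ordinary characters pass through unchanged.
                   ((char/= ch #\%) (write-char ch out) (incf i))
                   ;; "%%" is a literal percent sign.
                   ((eql next #\%) (write-char #\% out) (incf i 2))
                   ;; "%d" is a plain decimal.
                   ((eql next #\d) (write-string "~d" out) (incf i 2))
                   ;; "%0Nd" is a decimal zero-padded to N columns.
                   ((eql next #\0)
                    (let ((end (position-if-not #'digit-char-p c-string
                                                :start (+ i 2))))
                      (unless (and end
                                   (> end (+ i 2))
                                   (char= (char c-string end) #\d))
                        (error "Unsupported escape at position ~D in ~S"
                               i c-string))
                      (format out "~~~D,'0d"
                              (parse-integer c-string :start (+ i 2) :end end))
                      (setf i (1+ end))))
                   (t (error "Unsupported escape at position ~D in ~S"
                             i c-string))))))))

(c-format->lisp-format "rtr-%03d")                ; => "rtr-~3,'0d"
(format nil (c-format->lisp-format "rtr-%03d") 7) ; => "rtr-007"
```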
//Ingvar
Hi Ingvar,
I was chatting with one of my co-workers - mostly trying to avoid anything too strenuous on a Friday... ;)
He brought up that we would really like some sort of "compact configuration syntax" for systems like we have in our 'viz lab'. The viz lab has boxes, named by the crazy Director*, with names like "taro", "poi", "vino", "macaroni", "dumpling", (and it just goes on ;) .
* I say that with all the love for my boss's boss. :) He can name machines whatever in the hell he wants. ;)
I just hate typing ;)
Jim
He brought up that we would really like some sort of "compact configuration syntax" for systems like we have in our 'viz lab'. The viz lab has boxes, named by the crazy Director*, with names like "taro", "poi", "vino", "macaroni", "dumpling", (and it just goes on ;) .
It seems like computers are actually a good match for certain OO concepts. Create a bunch of profile mixins (OS, machine's main purpose, etc.) and then slam a bunch of machines into appropriate mixes.
(define-machine-class solaris (df-command "/bin/df -kl") ...)
(define-machine-class condor-host (ensure-process "condor_master") (ensure-pingable) ...)
(define-monitor-group (solaris condor-host) "dedicated solaris condor hosts" (hosts "hegel" "kant" "schopenhauer" "wittgenstein"))
Certain kinds of machines of course will always have to be tweaked by hand, but where you're lucky enough to have groups of basically identical machines this sort of thing is a big win. One group on campus I know of has 1000-host clusters doing a single job on basically identical hardware. The only thing you should have to repeat is the name/address.
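One way the mixin idea could map onto plain CLOS is sketched below. The define-machine-class and define-monitor-group forms above are proposed syntax only; every class and function name in this sketch (machine, solaris-condor-host, default-monitors and so on) is hypothetical and not taken from noctool. The APPEND method combination lets each mixin contribute its own monitors without knowing about the others.

```lisp
;; Hypothetical names throughout; a sketch of the mixin idea, not noctool code.
(defclass machine ()
  ((name :initarg :name :reader machine-name)))

(defclass solaris (machine) ())      ; OS mixin
(defclass condor-host (machine) ())  ; purpose mixin
(defclass solaris-condor-host (solaris condor-host) ())

;; Every applicable method contributes a list; the lists are appended,
;; most specific class first.
(defgeneric default-monitors (machine)
  (:method-combination append))

(defmethod default-monitors append ((m machine))
  (list '(ensure-pingable)))

(defmethod default-monitors append ((m solaris))
  (list '(df-command "/bin/df -kl")))

(defmethod default-monitors append ((m condor-host))
  (list '(ensure-process "condor_master")))

;; Only the name is repeated per host.
(defparameter *philosophers*
  (mapcar (lambda (name) (make-instance 'solaris-condor-host :name name))
          '("hegel" "kant" "schopenhauer" "wittgenstein")))

(default-monitors (first *philosophers*))
;; => ((DF-COMMAND "/bin/df -kl") (ENSURE-PROCESS "condor_master") (ENSURE-PINGABLE))
```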
-- wm
William (Bill?) writes:
Jim writes:
He brought up that we would really like some sort of "compact configuration syntax" for systems like we have in our 'viz lab'. The viz lab has boxes, named by the crazy Director*, with names like "taro", "poi", "vino", "macaroni", "dumpling", (and it just goes on ;) .
It seems like computers are actually a good match for certain OO concepts. Create a bunch of profile mixins (OS, machine's main purpose, etc.) and then slam a bunch of machines into appropriate mixes.
Subclass equipment (or suitable existing subclass), then stick yer default monitors in...
Yes, it should probably be exposed in SOME way in the config interface, but I am loath to do this until I know what "the right way" is. Convince me with working check-ins...
//Ingvar (just was on a stag-do on the dog track next door...)
Does this look at least vaguely sane? (cluster ("rtr-~3,'0d" 1 10) (machine name linux-host (user "testuser")))
I've been thinking more about this. It seems to me that we need something a little more general than what you're proposing. One example is that I probably want to have an IP range as well.
I'm thinking syntax like this might work:
  (my-cluster 12                      ;; system count
    ((name (trucha-name-func))        ;; variable elements
     (ip (trucha-ip-func)))
    (machine                          ;; machine template
      name linux-host
      (user "download")
      (ip ip)
      (disks (disk "/dev/hda1" 80 95))))
where trucha-name-func and trucha-ip-func basically are generators, each time they are called, they produce a new result.
With this particular cluster, we have an interesting problem. The IP space is 129.24.244.21 - 129.24.244.30 and 129.24.244.43 - 129.24.244.44. There are a total of 12 IPs, but they are not in a contiguous block. (this is a legacy issue :P )
I realize that the above syntax still leaves something to be desired. It is, IMO, too general for what most users need. I think we could probably make a couple of convenience macros that would expand into the above.
I'm not sure yet if I like the idea of using "generators" here, or if each function should be given an argument of which machine it is.
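To make the two options concrete, here is a small sketch of both: closures acting as generators, and a plain function of the machine index. None of this is existing noctool code. The names make-name-generator, make-last-octet-generator and name-for-index are invented, and the "trucha01"-style host names are an assumed naming pattern. The IP generator simply walks a list of last-octet ranges, which is enough to cover the split 21-30 and 43-44 block.

```lisp
;; Option 1: generators. Each call to the returned closure yields the next value.
(defun make-name-generator (control-string &optional (start 1))
  "Return a closure yielding successive names built from CONTROL-STRING."
  (let ((n start))
    (lambda ()
      (prog1 (format nil control-string n)
        (incf n)))))

(defun make-last-octet-generator (prefix ranges)
  "Return a closure yielding PREFIX.N for each N in RANGES, then NIL.
RANGES is a list of (LOW . HIGH) pairs for the final octet."
  (let ((pending (loop for (lo . hi) in ranges
                       append (loop for octet from lo to hi
                                    collect octet))))
    (lambda ()
      (when pending
        (format nil "~A.~D" prefix (pop pending))))))

;; Option 2: a function that is told which machine it is naming.
(defun name-for-index (index)
  (format nil "trucha~2,'0d" index))

;; Twelve (name ip) pairs for the non-contiguous block.
(defparameter *trucha-hosts*
  (let ((next-name (make-name-generator "trucha~2,'0d"))
        (next-ip (make-last-octet-generator "129.24.244"
                                            '((21 . 30) (43 . 44)))))
    (loop repeat 12
          collect (list (funcall next-name) (funcall next-ip)))))
```

The generator version keeps hidden state, so the order of calls matters; the index version is stateless, which makes it easier to regenerate a single machine's entry.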
Jim
Jim writes:
Does this look at least vaguely sane? (cluster ("rtr-~3,'0d" 1 10) (machine name linux-host (user "testuser")))
I've been thinking more about this. It seems to me that we need something a little more general than what you're proposing. One example is that I probably want to have an IP range as well.
I'm thinking syntax like this might work:
  (my-cluster 12                      ;; system count
    ((name (trucha-name-func))        ;; variable elements
     (ip (trucha-ip-func)))
    (machine                          ;; machine template
      name linux-host
      (user "download")
      (ip ip)
      (disks (disk "/dev/hda1" 80 95))))
where trucha-name-func and trucha-ip-func basically are generators, each time they are called, they produce a new result.
With this particular cluster, we have an interesting problem. The IP space is 129.24.244.21 - 129.24.244.30 and 129.24.244.43 - 129.24.244.44. There are a total of 12 IPs, but they are not in a contiguous block. (this is a legacy issue :P )
I realize that the above syntax still leaves something to be desired. It is, IMO, too general for what most users need. I think we could probably make a couple of convenience macros that would expand into the above.
I'm not sure yet if I like the idea of using "generators" here, or if each function should be given an argument of which machine it is.
Something that MIGHT work is having a "template mechanism" of some sort. The cluster config should do what you need, as long as there's DNS (at least I think so, the general thought was "use hostname, unless IP is provided").
Another option would be to declare a cluster-host class, with suitable defaults for disks and the like, but that MIGHT get hairier.
From the feedback I've seen, most other systems deal with this by having the admin generate the config in full, one way or another (either by clever editor tricks or by writing some code to generate config stanzas).
//Ingvar
From: Jim Prewett download@hpc.unm.edu
I realize that the above syntax still leaves something to be desired. It is, IMO, too general for what most users need. I think we could probably make a couple of convenience macros that would expand into the above.
I'm not sure yet if I like the idea of using "generators" here, or if each function should be given an argument of which machine it is.
IP ranges are easy enough to generate and loop over:
  (require :split-sequence)

  ;; Convert a dotted-quad string to a 32-bit integer.
  (defun addr4->int (ipaddr)
    (destructuring-bind (a1 a2 a3 a4)
        (mapcar #'parse-integer
                (split-sequence:split-sequence #\. ipaddr))
      (dpb a1 (byte 8 24)
           (dpb a2 (byte 8 16)
                (dpb a3 (byte 8 8) a4)))))

  ;; Convert a 32-bit integer back to a dotted-quad string.
  (defun int->addr4 (intaddr)
    (format nil "~D.~D.~D.~D"
            (ldb (byte 8 24) intaddr)
            (ldb (byte 8 16) intaddr)
            (ldb (byte 8 8) intaddr)
            (ldb (byte 8 0) intaddr)))

  ;; Run BODY with VAR bound to each address from START-IP to END-IP inclusive.
  (defmacro do-ip-range ((var start-ip end-ip &optional return) &body body)
    (let ((start (gensym "start"))
          (end (gensym "end"))
          (a (gensym)))
      `(let ((,start (addr4->int ,start-ip))
             (,end (addr4->int ,end-ip)))
         (loop for ,a from ,start to ,end
               do (let ((,var (int->addr4 ,a)))
                    ,@body)
               finally (return ,return)))))

  (do-ip-range (addr "128.104.206.15" "128.104.206.33" 'woohoo)
    (print addr))
Seems like this might be useful in several places (network discovery, say).
-- wm, trying to decide if DO-CIDR-RANGE is in order...
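For what it's worth, a DO-CIDR-RANGE along the same lines might look like the sketch below. Nothing like this was posted to the list; the names parse-addr4, int->dotted, cidr-bounds and do-cidr-range are invented here, and the parsing is done by hand so the block stands alone without split-sequence. It walks every address in the block, including the network and broadcast addresses.

```lisp
;; Hypothetical sketch; not code from the thread or from noctool.
(defun parse-addr4 (string end)
  "Parse the dotted quad in STRING (up to index END) into an integer."
  (let ((value 0)
        (pos 0))
    (dotimes (i 4 value)
      (let ((dot (or (position #\. string :start pos :end end) end)))
        (setf value (+ (* value 256)
                       (parse-integer string :start pos :end dot))
              pos (min end (1+ dot)))))))

(defun int->dotted (n)
  (format nil "~D.~D.~D.~D"
          (ldb (byte 8 24) n) (ldb (byte 8 16) n)
          (ldb (byte 8 8) n) (ldb (byte 8 0) n)))

(defun cidr-bounds (cidr)
  "Return the lowest and highest addresses of CIDR (\"a.b.c.d/len\") as integers."
  (let* ((slash (or (position #\/ cidr)
                    (error "No prefix length in ~S" cidr)))
         (base (parse-addr4 cidr slash))
         (host-bits (- 32 (parse-integer cidr :start (1+ slash)))))
    (values (dpb 0 (byte host-bits 0) base)     ; host bits all zero
            (dpb -1 (byte host-bits 0) base)))) ; host bits all one

(defmacro do-cidr-range ((var cidr &optional result) &body body)
  (let ((low (gensym "LOW"))
        (high (gensym "HIGH"))
        (n (gensym "N")))
    `(multiple-value-bind (,low ,high) (cidr-bounds ,cidr)
       (loop for ,n from ,low to ,high
             do (let ((,var (int->dotted ,n)))
                  ,@body)
             finally (return ,result)))))

(do-cidr-range (addr "128.104.206.16/30" 'woohoo)
  (print addr))
;; prints 128.104.206.16 through 128.104.206.19
```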