On 7/27/05, Faré fahree@gmail.com wrote:
OK. So the question is: what happens if you're linked to a process on another node, and that node fails? You get an EXIT message, don't you?
If the communication with a monitored node fails, but that node is still running its processes, then you get at least a {nodedown, <node>} message. I don't know if you get EXIT messages for the linked processes on that node: on the one hand, the process has not exited yet; on the other hand, if the process exits later on, the EXIT message cannot be sent at that time -- so I could imagine either case.
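To make the node monitoring concrete, here is a minimal sketch of asking for the {nodedown, <node>} message and waiting for it. The module name, function names and returned atoms are made up for illustration. It assumes the local node is itself running distributed (started with -sname or -name).

```erlang
-module(node_watch).
-export([watch/2, await_down/2]).

%% Ask the runtime to send us {nodedown, Node} if the connection to
%% Node is lost, then wait for that message for at most Timeout ms.
watch(Node, Timeout) ->
    true = erlang:monitor_node(Node, true),
    await_down(Node, Timeout).

%% Kept separate so the receive can be tried out without a second node.
%% The returned atoms are hypothetical names chosen for this sketch.
await_down(Node, Timeout) ->
    receive
        {nodedown, Node} -> node_unreachable
    after Timeout ->
        no_nodedown_seen
    end.
```

A process that must not sit idle when the remote node goes away would call something like watch/2 (or fold the {nodedown, Node} clause into its main receive loop) instead of relying on a plain timeout.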
It's important to know that a process needs to explicitly state that it wants to receive EXIT signals from linked processes as messages, by calling "process_flag(trap_exit, true)" in the process that should receive them. If a process P has not set this flag, an exiting linked process P2 will also cause P to exit (unless the exit reason of P2 is "normal"). By setting the flag, process P overrides this default cascading effect: instead of exiting, P gets an EXIT message.
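A small sketch of both cases, with made-up module and function names: trapped/0 sets the flag and receives the EXIT message, while cascade/0 leaves the flag unset and is taken down along with the linked process. The second function uses a monitor only so that the caller can observe the death from the outside.

```erlang
-module(trap_demo).
-export([trapped/0, cascade/0]).

%% With trap_exit set, the exit of the linked child arrives as a message.
trapped() ->
    Parent = self(),
    %% Run in a throwaway process so the caller's own flags stay untouched.
    Watcher = spawn(fun() ->
        process_flag(trap_exit, true),
        Child = spawn_link(fun() -> exit(boom) end),
        receive
            {'EXIT', Child, Reason} -> Parent ! {self(), {trapped, Reason}}
        end
    end),
    receive
        {Watcher, Result} -> Result
    after 1000 -> timeout
    end.

%% Without trap_exit, process P exits together with its linked child.
cascade() ->
    {P, Ref} = spawn_monitor(fun() ->
        spawn_link(fun() -> exit(boom) end),
        receive after infinity -> ok end
    end),
    receive
        {'DOWN', Ref, process, P, Reason} -> {died, Reason}
    after 1000 -> timeout
    end.
```

If the child called exit(normal) instead, the non-trapping process in cascade/0 would keep running, which is the exception mentioned above.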
Or do you just sit idly until timeout, and need to have another process explicitly monitor the remote node and kill you? I guess that in Lisp, these details are handled with a proper meta-level protocol. Is there such a protocol in Erlang?
I think the behaviour of a single process in the face of failures does not vary much, so instead of a meta-protocol at the level of individual processes, there are protocols for how sets of processes behave within an application, organised as "supervision trees". These deal with automatically restarting failed processes or subsets of supervised processes, and with cascading the restart to higher application levels when an error persists.
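Going only by the documentation, a one-child supervision tree could be sketched roughly as follows. Everything named demo_* is hypothetical, and the worker is deliberately trivial. The restart strategy (one_for_one, at most 3 restarts in 10 seconds) is just an example value.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, start_worker/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start function for the child: must return {ok, Pid} with the new
%% process linked to the supervisor that calls it.
start_worker() ->
    Pid = spawn_link(fun loop/0),
    register(demo_worker, Pid),
    {ok, Pid}.

%% The worker dies on request so that the restart can be observed.
loop() ->
    receive
        crash -> exit(crashed);
        _Other -> loop()
    end.

%% Restart the child whenever it dies; if it dies more than 3 times
%% within 10 seconds the supervisor itself gives up and exits, which
%% hands the problem to the next level up.
init([]) ->
    {ok, {{one_for_one, 3, 10},
          [{demo_worker, {?MODULE, start_worker, []},
            permanent, 1000, worker, [?MODULE]}]}}.
```

After demo_sup:start_link(), sending crash to the registered demo_worker should result in a fresh process being registered under the same name shortly afterwards.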
Individual processes that implement a "standard" entity like a server or finite-state-machine can save much code by being implemented using a "behaviour". A "supervisor" is another such "standard" entity.
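As an illustration of how little code is left to write, here is a minimal gen_server sketch: a counter whose module name and API functions are invented for this example. The receive loop, the call/reply plumbing and the timeouts all come from the behaviour; only the callbacks are specific to the counter.

```erlang
-module(counter_server).
-behaviour(gen_server).

%% Client API (hypothetical names)
-export([start_link/0, increment/1, value/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() -> gen_server:start_link(?MODULE, 0, []).

%% Asynchronous request: no reply is expected.
increment(Pid) -> gen_server:cast(Pid, increment).

%% Synchronous request: blocks until the server replies.
value(Pid) -> gen_server:call(Pid, value).

%% The server state is just the current count.
init(Start) -> {ok, Start}.

handle_call(value, _From, N) -> {reply, N, N}.

handle_cast(increment, N) -> {noreply, N + 1}.

%% Ignore any message that did not come through call or cast.
handle_info(_Msg, N) -> {noreply, N}.

terminate(_Reason, _N) -> ok.

code_change(_OldVsn, N, _Extra) -> {ok, N}.
```

A typical session would be {ok, Pid} = counter_server:start_link(), followed by counter_server:increment(Pid) and counter_server:value(Pid).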
Other than the gen_server and gen_fsm behaviours, I have not used these pretty advanced features, so I don't know more about them than what is documented in the document below.
"OTP Design Principles" http://erlang.se/doc/doc-5.4.8/doc/design_principles/part_frame.html.
The information there can be quite overwhelming. I found the following tutorial very useful for understanding the value and usage of the "gen_server behaviour":
http://www.duomark.com/erlang/tutorials/proxy.html
- Willem