I think Erlang nodes exchange keep-alive messages among themselves to make sure they have not been split off. Quite often, though, you cannot detect whether a foreign process exists. If you send a message to a local process id and that process is dead, you will get a no_proc exception; not so with a foreign process.
It does seem to me that the easiest way to verify that a node is alive is to send a keep-alive message to a housekeeping process on that node, and to consider the node split off if no reply arrives before a timeout.
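To make the keep-alive idea concrete, here is a rough sketch in Python, with threads and queues standing in for nodes and message passing. It is only an illustration of the ping-and-timeout pattern, not how Erlang does it; the names `housekeeper` and `is_reachable`, the "pong" reply and the timeout values are all made up for the example.

```python
import queue
import threading


def housekeeper(inbox, stop):
    """Hypothetical housekeeping process: answers each ping until stopped."""
    while not stop.is_set():
        try:
            # Each ping carries the queue on which the sender expects the reply.
            reply_to = inbox.get(timeout=0.05)
        except queue.Empty:
            continue
        reply_to.put("pong")


def is_reachable(inbox, timeout=0.5):
    """Send one keep-alive and wait up to `timeout` seconds for the reply."""
    reply_box = queue.Queue()
    inbox.put(reply_box)
    try:
        return reply_box.get(timeout=timeout) == "pong"
    except queue.Empty:
        # No reply in time: the peer may be dead or merely split off.
        # From this side the two cases look exactly the same.
        return False
```

Start `housekeeper` in a thread with a shared inbox queue and a `threading.Event`, and `is_reachable(inbox)` returns True while it runs. Once the event is set and the thread has exited, the same call returns False after the timeout.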
I'm not sure it matters whether a node is dead or just split off.
On Jul 27, 2005, at 2:11 AM, Eric Lavigne wrote:
This leads me to wonder how they do a reliable detection of a remote node being dead, as opposed to the communication channel being down -- and how they cope with a mistake between the two. Surely the point is tackled somewhere in some Erlang documentation...
Best I can think of is for the node itself to be represented as a process whose only job is communication with the outside world. Since the user won't control this process directly, it should be possible to make it fairly durable so that we can assume it is alive. *crosses fingers*