On Fri, Apr 3, 2020 at 1:37 PM Martin Simmons <martin@lispworks.com> wrote:
>>>>> On Thu, 2 Apr 2020 14:16:55 -0400, Mirko Vukovic said:
>
> On Thu, Apr 2, 2020 at 9:09 AM Martin Simmons <martin@lispworks.com> wrote:
>
> > >>>>> On Thu, 2 Apr 2020 08:33:42 -0400, Mirko Vukovic said:
> > >
> > > On Thu, Apr 2, 2020 at 7:55 AM Martin Simmons <martin@lispworks.com>
> > wrote:
> > >
> > > > >>>>> On Wed, 1 Apr 2020 08:59:30 -0400, Mirko Vukovic said:
> > > > >
> > > > > On Tue, Mar 31, 2020 at 11:51 AM Martin Simmons <
> > martin@lispworks.com>
> > > > > wrote:
> > > > >
> > > > > > >>>>> On Mon, 30 Mar 2020 21:16:00 -0400, Mirko Vukovic said:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > My setup is Sly on Spacemacs with Windows 10 running remote lisp
> > on
> > > > Linux
> > > > > > > over a corporate network. I have not found a Sly mailing list,
> > and I
> > > > > > hope I
> > > > > > > can get an answer here.
> > > > > > >
> > > > > > > Emacs is running Sly on Spacemacs on Windows 10. Lisp is running
> > on a
> > > > > > Linux
> > > > > > > server. But Sly does not connect to the listening Lisp. Corporate
> > > > network
> > > > > > > security policies have changed. I can ask for IT to accommodate
> > me,
> > > > but
> > > > > > > first I need to know what to ask for.
> > > > > > >
> > > > > > > So far, I have opened a tunnel, and started a listening lisp
> > (details
> > > > > > > below).
> > > > > > >
> > > > > > > In Emacs I get:
> > > > > > >
> > > > > > > sly-connect RET RET RET
> > > > > > > [sly] Connecting to Slynk on port 4005..
> > > > > > > helm-M-x-execute-command: make client process failed: Connection
> > > > timed
> > > > > > out,
> > > > > > > :name, sly-9, :buffer, nil, :host, hal9000, :service, 4005,
> > :nowait,
> > > > nil,
> > > > > > > :tls-parameters, nil
> > > > > > >
> > > > > > > The session transcript:
> > > > > > > > ssh -L4005:localhost:4005 mirko@hal9000
> > > > > > >
> > > > > > > [mirko@hal9000 .roswell]$ ros -L ccl-bin run --load
> > > > > > start-slynk-server.lisp
> > > > > > >
> > > > > > >  Added SLYNK path to ASDF:*CENTRAL-REGISTRY*
> > > > > > > SLYNK's ASDF loader finished.
> > > > > > >  Loaded ASDF system
> > > > > > > ;; Slynk started at port: 4005.
> > > > > > >
> > > > > > >  Created SLYNK server on port 4005
> > > > > > >  Set *USE-DEDICATED-OUTPUT-STREAM* to NIL
> > > > > > > Clozure Common Lisp Version 1.11.5/v1.11.5  (LinuxX8664)
> > > > > > >
> > > > > > > For more information about CCL, please see
> > http://ccl.clozure.com.
> > > > > > >
> > > > > > > CCL is free software.  It is distributed under the terms of the
> > > > Apache
> > > > > > > Licence, Version 2.0.
> > > > > > > ?
> > > > > > >
> > > > > > > My question is as follows:
> > > > > > >
> > > > > > >    1. Do I need bi-directional traffic on 4005?
> > > > > >
> > > > > > Assuming you are using the ssh tunnel above, then you don't need
> > port
> > > > 4005
> > > > > > traffic on the LAN (it is all hidden in the tunnel).
> > > > > >
> > > > > > The most likely problem is that some firewall on the Windows
> > machine is
> > > > > > blocking port 4005.  You may need to configure that firewall to
> > allow
> > > > ssh
> > > > > > to
> > > > > > listen on localhost:4005 and/or to accept connections to it from
> > > > Spacemacs.
> > > > > > In theory you might have similar localhost firewall issues on
> > hal9000,
> > > > but
> > > > > > that is less likely.
> > > > > >
> > > > > >
> > > > > > >    2. Do I need bi-directional traffic on 22? (after recent
> > changes I
> > > > > > >    cannot ssh or scp into my Windows machine)
> > > > > >
> > > > > > I'm assuming that you ran the ssh command on the Windows 10 machine
> > > > and it
> > > > > > gave you a working login to hal9000.  If so, then it looks like you
> > > > already
> > > > > > have what you need for port 22.
> > > > > >
> > > > >
> > > > > Yes, I can log in to hal9000 with the -L switch:
> > > > >
> > > > > > ssh -L4005:localhost:4005 mirko@hal9000
> > > > > Last login: Thu Mar 19 14:33:17 2020 from 172.27.236.189
> > > > > [mirko@hal9000 ~]$
> > > > >
> > > > >
> > > > > >
> > > > > > Note that bi-directional traffic on a connected socket is different
> > > > from
> > > > > > whether you can make a connection in both directions.
> > > > > >
> > > > > >
> > > > > > >    3. What tools can I use to try to narrow down the cause of the
> > > > > > problem?
> > > > > > >    For instance, can I send a command to the lisp image, and see
> > its
> > > > > > effects
> > > > > > >    on the lisp side?
> > > > > >
> > > > > > Firstly, run "netstat -antp" on hal9000 to see if Lisp is
> > listening on
> > > > port
> > > > > > 4005.
> > > > > >
> > > > >
> > > > > It looks like ccl-bin is listening:
> > > > > $ sudo netstat -antp | grep :4005
> > > > > tcp        0      0 127.0.0.1:4005          0.0.0.0:*
> > > >  LISTEN
> > > > >      104461/lx86cl64
> > > > >
> > > > >
> > > > > >
> > > > > > Secondly, run "netstat -anop tcp" on the Windows 10 machine to see
> > if
> > > > ssh
> > > > > > is
> > > > > > listening on port 4005.
> > > > > >
> > > > > >
> > > > > I have Msys2's netstat. On the laptop:
> > > > > > which netstat
> > > > > /c/WINDOWS/system32/netstat
> > > > > /c/Users/mirko/Downloads
> > > > > > netstat -anop tcp | grep :4005
> > > > >   TCP    127.0.0.1:4005         0.0.0.0:0              LISTENING
> > > >  12052
> > > >
> > > > Yes, both netstat outputs look good at that point.
> > > >
> > > >
> > > > > > Thirdly, run "ssh -p 4005 localhost" on the Windows 10 machine.
> > This
> > > > use of
> > > > > > ssh is very bogus, but it should at least give an error message
> > with
> > > > some
> > > > > > diagnostics.  (Normally I would use telnet for this, but it is not
> > > > > > installed
> > > > > > on Windows 10 by default.)
> > > > > >
> > > > >
> > > > > Outputs of both ssh and telnet on the laptop:
> > > > > > which telnet
> > > > > /usr/bin/telnet
> > > > > /c/Users/mirko/Downloads
> > > > > > telnet localhost 4005
> > > > > Trying ::1...
> > > > > Connected to localhost.
> > > > > Escape character is '^]'.
> > > > > Connection closed by foreign host.
> > > >
> > > > OK, so it is connected to the Windows side at least.
> > > >
> > > > Check that the Slynk server was created with :dont-close t (or set
> > > > slynk:*dont-close* to t before creating it).  If dont-close is nil, it
> > will
> > > > only accept one connection, which makes debugging difficult.
> > > >
> > > > Then restart the log in to hal9000 with -v option to ssh to make it
> > print
> > > > debug
> > > > information:
> > > >
> > > > ssh -v -L4005:localhost:4005 mirko@hal9000
> > > >
> > > > and try the telnet again to see what is happening at the Linux end.
> > > >
> > > > __Martin
> > > >
> > > Here is the test log. Telnet and ssh debug are at the bottom.
> > > 1 Start slynk with :dont-close t
> > >
> > > Modified startup script:
> > >
> > > (let ((port 4005))
> > >     (slynk:create-server :port port :dont-close t)
> > >     (format t "~% Created SLYNK server on port ~a" port))
> > > (setf slynk:*use-dedicated-output-stream* nil)
> > >
> > > 2 Started tunnel with verbose option, -v switch
> > >
> > > $ ssh -v -L4005:hal9000:4005 mirko@hal9000
> > >
> > > 3 Telnet on laptop side to laptop port 4005
> > >
> > > @laptop> telnet localhost 4005
> > > Trying ::1...
> > > Connected to localhost.
> > > Escape character is '^]'.
> > > Connection closed by foreign host.
> > >
> > > 4 SSH debug output
> > >
> > > @hal9000> debug1: Connection to port 4005 forwarding to hal9000 port
> > > 4005 requested.
> > > debug1: channel 3: new [direct-tcpip]
> > > channel 3: open failed: connect failed: Connection refused
> > > debug1: channel 3: free: direct-tcpip: listening port 4005 for hal9000
> > > port 4005, connect from ::1 port 64100 to ::1 port 4005, nchannels 4
> >
> > It looks like hal9000 is resolving to the IPv6 localhost address ::1 on
> > hal9000, but the Lisp is probably only listening on IPv4.
> >
> > Try restarting the tunnel with -L4005:127.0.0.1:4005 to force IPv4.
> >
> > __Martin
> >
>
> I restarted tunnel with IPv4: Now when I type something in the telnet
> session, I get Lisp to respond (with debugger in this case). Sly is still
> not connecting - maybe I need to restart emacs.
>
> Log:
> 1 Start slynk with :dont-close t as before
> 2 Start tunnel with verbose option, -v switch and IPv4
>
> $ ssh -v -L4005:127.0.0.1:4005 977315@hal9000
>
> 3 Telnet on laptop side to laptop port 4005
>
> @laptop> telnet localhost 4005
> Trying ::1...
> Connected to localhost.
> Escape character is '^]'.
>
> 3.1 SSH debug output to telnet start
>
> debug1: Connection to port 4005 forwarding to 127.0.0.1 port 4005 requested.
> debug1: channel 3: new [direct-tcpip]
>
> 4 Send command via telnet session
>
> Using telnet session send command:
>
> *features*
> Connection closed by foreign host.
>
> 4.1 Results in an error in lisp due to syntax error
>
> ;; slynk:close-connection: Not an integer string: "*featu"
> ;; closing 0 channels
> ;; closing 0 listeners
> ;; Event history start:
> decode-message
> close-connection: Not an integer string: "\x00FF\x00F4\x00FF\x00FD\x00FF" ...
> close-connection Not an integer string:
> "\x00FF\x00F4\x00FF\x00FD\x00FF" ... done.

This \x00FF\x00F4\x00FF\x00FD\x00FF string is unexpected to me.  Did that
happen when you tried to connect from emacs?  Restarting emacs is probably a
good idea to rule out any issue with its state.


> decode-message
> close-connection: Not an integer string: "*featu" ...
> ;; Event history end.
> ;; Backtrace:
> 0: (NIL #<Unknown Arguments>)
> 1: (NIL #<Unknown Arguments>)
> 2: (SLYNK-BACKEND:CALL-WITH-DEBUGGING-ENVIRONMENT #<Compiled-function
> (:INTERNAL SLYNK::SAFE-BACKTRACE) (Non-Global)  #x302000A0BFFF>)
> 3: (SLYNK::SAFE-BACKTRACE)
> 4: (SLYNK::SIGNAL-SLYNK-ERROR #<CCL::PARSE-INTEGER-NOT-INTEGER-STRING
> #x302000C1F32D> NIL)
> 5: (SIGNAL #<CCL::PARSE-INTEGER-NOT-INTEGER-STRING #x302000C1F32D>)
> 6: (CCL::%ERROR #<CCL::PARSE-INTEGER-NOT-INTEGER-STRING
> #x302000C1F32D> (:STRING "*featu") 5975725205395)
> 7: (PARSE-INTEGER "*featu" :START 0 :END 6 :RADIX 16 :JUNK-ALLOWED NIL)
> 8: (SLYNK-RPC:READ-PACKET #<BASIC-TCP-STREAM ISO-8859-1 (SOCKET/4)
> #x302000BDFD9D>)
> 9: (SLYNK-RPC:READ-MESSAGE #<BASIC-TCP-STREAM ISO-8859-1 (SOCKET/4)
> #x302000BDFD9D> #<Package "SLYNK-IO-PACKAGE">)
> 10: (SLYNK::DECODE-MESSAGE #<BASIC-TCP-STREAM ISO-8859-1 (SOCKET/4)
> #x302000BDFD9D>)
> 11: (SLYNK::READ-LOOP #<MULTITHREADED-CONNECTION #x302000BDEFCD>)
> 12: (CCL::RUN-PROCESS-INITIAL-FORM #<PROCESS reader-thread(9) [Active]
> #x302000BFEFBD> (#<COMPILED-LEXICAL-CLOSURE (:INTERNAL
> CCL::%PROCESS-RUN-FUNCTION) #x302000BFED2F>))
> 13: ((:INTERNAL (CCL::%PROCESS-PRESET-INTERNAL (PROCESS))) #<PROCESS
> reader-thread(9) [Active] #x302000BFEFBD> (#<COMPILED-LEXICAL-CLOSURE
> (:INTERNAL CCL::%PROCESS-RUN-FUNCTION) #x302000BFED2F>))
> 14: ((:INTERNAL CCL::THREAD-MAKE-STARTUP-FUNCTION))
> ;; Connection to Emacs lost. [
> ;;  condition: Not an integer string: "*featu"
> ;;  type: CCL::PARSE-INTEGER-NOT-INTEGER-STRING
> ;;  style: :SPAWN]

This is working as expected -- the slynk/swank server is not a REPL so you
can't just send *features* to it.

__Martin

Apologies for the long silence on this topic; I had other pressing matters to deal with. I still cannot connect to the Slynk server from Emacs.

To summarize: based on my tests (detailed below), telnet can reach the Slynk server through the tunnel, but Emacs still fails to connect. Some debugging showed that the call to make-network-process is timing out.

I will start another thread to figure out why make-network-process is timing out.

Martin, many thanks for helping me out so far.

Here is my test procedure:

1 Establish tunnel

> ssh -v -L4005:127.0.0.1:4005 mirko@hal9000

2 Start Slynk server

[mirko@laptop] > ssh mirko@hal9000
[mirko@hal9000 .roswell]$ ros -L ccl-bin run --load start-slynk-server.lisp

 Added SLYNK path to ASDF:*CENTRAL-REGISTRY*
SLYNK's ASDF loader finished.
 Loaded ASDF system
;; Slynk started at port: 4005.

 Created SLYNK server on port 4005
 Set *USE-DEDICATED-OUTPUT-STREAM* to NIL
Clozure Common Lisp Version 1.11.5/v1.11.5  (LinuxX8664)

For more information about CCL, please see http://ccl.clozure.com.

CCL is free software.  It is distributed under the terms of the Apache
Licence, Version 2.0.
?

3 Test that the laptop is listening on port 4005 with telnet

> telnet localhost 4005
Trying ::1...
Connected to localhost.
Escape character is '^]'.

4 Test tunnel to lisp

Typing something into the telnet session

"foo"
Connection closed by foreign host.

makes CCL react. In this case it drops into the debugger because of invalid input (I do not know what valid input looks like):

?
"; slynk:close-connection: Not an integer string: "\"foo\"
;; closing 0 channels
;; closing 0 listeners
;; Event history start:
... 
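For what it is worth, the backtrace earlier in the thread shows SLYNK-RPC:READ-PACKET calling PARSE-INTEGER on the first 6 characters with :RADIX 16, so the server appears to expect each message to begin with a 6-digit hexadecimal length followed by that many characters of payload. A minimal sketch of that framing in shell, assuming an ASCII payload; the function name slynk_frame and the sample :emacs-rex form are illustrative assumptions, not something confirmed in this thread:

```shell
# Frame a payload the way the backtrace suggests Slynk reads it:
# 6 hex digits giving the payload length, then the payload itself.
# Assumes an ASCII payload, so character count equals byte count.
slynk_frame() {
  local payload=$1
  printf '%06x%s' "${#payload}" "$payload"
}

# Hypothetical request form; the exact RPC syntax is an assumption.
slynk_frame '(:emacs-rex (slynk:connection-info) nil t 1)'
echo
```

The framed output could be fed to a raw TCP client pointed at localhost:4005 rather than typed into telnet, since telnet may add its own line endings and control bytes to the stream.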

5 Try slime-connect
slime-connect times out
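Since telnet reaches the tunnel but make-network-process times out, one cheap check is whether every address Emacs might dial behaves the same way. The first sly-connect error in this thread lists :host hal9000 rather than localhost, and telnet reports connecting via ::1, so it may be worth probing each candidate separately. A rough sketch, assuming bash with /dev/tcp support and the coreutils timeout command are available (likely under MSYS2, but that is an assumption); probe and probe_all are made-up helper names:

```shell
# Try to open a TCP connection to host:port, giving up after 3 seconds.
# Relies on bash's /dev/tcp redirection; IPv6 literals such as ::1
# depend on how bash was built, so treat a failure there with care.
probe() {
  local host=$1 port=$2
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port closed-or-filtered"
  fi
}

# Probe one port on several candidate hosts, one result per line.
probe_all() {
  local port=$1 host
  shift
  for host in "$@"; do
    probe "$host" "$port"
  done
}
```

Running probe_all 4005 localhost 127.0.0.1 ::1 hal9000 on the laptop would show which of these endpoints accept a connection. If only the loopback addresses are open, the host that Emacs passes to make-network-process would be the next thing to look at.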

Created: 2020-04-18 Sat 19:41