At Fri, 08 Mar 2013 13:09:00 +0100, Nicolas Neuss wrote:
[I added some more people to the CC]
Akshay Srinivasan <akshaysrinivasan@gmail.com> writes:
At Thu, 07 Mar 2013 10:35:46 +0100, Nicolas Neuss wrote:
Akshay Srinivasan <akshaysrinivasan@gmail.com> writes:
I wanted to polish the whole "static object" thing I'm doing right now, by writing a thin layer over defclass itself. This seems an awful lot like the MOP.
What I'm doing right now is essentially that every tensor class has a set of inlined functions associated with it, which are then used inside macros everywhere to define type specific functions.
From what I understand of the MOP, I could define a metaclass which holds all these functions, and then define defmethod over instances of the metaclass (which sort of does the equivalent of macroexpansion?). Essentially, I don't want runtime dispatch; can I define, in some sense, a meta-version of defmethod/defgeneric on a class, so that it wraps the body of the method inside a symbol-macrolet which replaces things like "+" and "-" with something specialised for the class?
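Roughly the kind of expansion I have in mind, as a toy sketch (define-specialized and .+ are invented names, not Matlisp code): a defining macro that binds the arithmetic with macrolet, so the compiler sees fully declared calls instead of generic dispatch.

```lisp
;; Toy sketch: DEFINE-SPECIALIZED and .+ are invented names for
;; illustration.  The MACROLET rewrites the "generic" operator .+ into a
;; declared call to OP, so the compiler can open-code the arithmetic for
;; TYPE; no dispatch remains at runtime.
(defmacro define-specialized (name (a b) type op)
  `(defun ,name (,a ,b)
     (declare (type ,type ,a ,b)
              (optimize speed))
     (macrolet ((.+ (x y) `(the ,',type (,',op ,x ,y))))
       (.+ ,a ,b))))

;; Each expansion is a DEFUN whose body contains only declared,
;; type-specific arithmetic:
(define-specialized df+ (a b) double-float +)
(define-specialized sf+ (a b) single-float +)
```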
Although I already can imagine what you want quite well - could you maybe sketch an example (as simple as possible)?
I can't find any real resource on MOP, so forgive me if this doesn't make any sense whatsoever.
Chapters 5 and 6 of the AMOP book should be freely available; see
http://www.clisp.org/impnotes/mop-chap.html
or google for "mop_spec.pdf".
Nicolas: I know this is probably more up your alley. Do you think this sort of thing is possible with the MOP?
I doubt a little that the MOP is sufficient for achieving this, but I also would not find it very bad to use something like DEFMETHOD* instead of DEFMETHOD, and then you have all the liberty you want. As I see it, the MOP was created for achieving high flexibility, not high performance. Implementations like CMUCL and SBCL have a somewhat similar but more low-level mechanism (compiler transforms, IIRC) by which one can instruct the compiler to optimize operations using type information known at compile time.
Sigh. I was hoping to avoid doing all the superclass ordering and such, oh well. I essentially want to do things like the macro generate-typed-copy! in the file https://github.com/enupten/matlisp/blob/tensor/src/level-1/copy.lisp, without having to read things from a hash table every time. Maybe writing a new object system is overkill, and I should just use a macro like with-slots.
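Something like this is what I mean by a with-slots-like macro (all names here are invented for illustration; Matlisp stores this information differently): the per-class "inlined function" lives in a compile-time table, and a macrolet splices it in, so nothing is fetched from a hash table at runtime.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; The per-class "inlined function", stored where the macro below can
  ;; read it at macroexpansion time.  A plist on the class symbol is used
  ;; here purely for illustration.
  (setf (get 'real-tensor 'value-writer)
        '(lambda (value store index)
           (setf (aref (the (simple-array double-float (*)) store) index)
                 (the double-float value)))))

(defmacro with-tensor-ops ((class) &body body)
  "Bind VALUE-WRITER locally so BODY expands against CLASS's specialized
writer.  The table lookup happens at macroexpansion time, not at runtime."
  (let ((writer-form (get class 'value-writer)))
    `(macrolet ((value-writer (v s i)
                  (list ',writer-form v s i)))
       ,@body)))

;; Usage: this expands into direct AREF writes on a declared simple-array.
(defun fill-store (store n)
  (with-tensor-ops (real-tensor)
    (dotimes (i n store)
      (value-writer (random 1d0) store i))))
```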
What about doing a type-specialized compilation on demand, as I do in femlisp/src/matlisp/blas-basic.lisp?
Alternatively, this could also be triggered when NO-APPLICABLE-METHOD is called.
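A minimal sketch of that NO-APPLICABLE-METHOD trigger (the generated body here is just REPLACE on sequences as a placeholder; in Matlisp it would come from something like generate-typed-copy!):

```lisp
;; When COPY! is called on a class pair it has never seen, build and
;; install a method for exactly that pair, then retry the call.  All
;; names are illustrative; the kernel is a placeholder.
(defgeneric copy! (from to))

(defmethod no-applicable-method ((gf (eql #'copy!)) &rest args)
  (destructuring-bind (from to) args
    ;; EVAL a DEFMETHOD specialized on the concrete argument classes.
    (eval `(defmethod copy! ((from ,(class-name (class-of from)))
                             (to ,(class-name (class-of to))))
             (replace to from)          ; placeholder for the real kernel
             to))
    (apply gf args)))

;; The first call installs a (CONS CONS) method, then runs it:
;; (copy! (list 1 2 3) (list 0 0 0)) => (1 2 3)
```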
I glanced through the code. It looks like you're incrementally encoding information about the loop ordering and what to do inside the loop? Am I right? Is there a version of m^* which uses such a thing, though? I've only read the source for t* (which was the inspiration for mod-idxtimes :)
Yes, I probably should've done that; but it got painful when I had to enumerate all the loop orderings for GEMM. The macros which generate the basic BLAS functions can be replaced with more elegant code in time.
Akshay
Apropos: I am still trying to build and run your Matlisp, without success. First, I had difficulties because f77 did not know the "exit" command used in "iladlr.f", for example. Using gfortran at least compiled the Fortran code; however, after compilation I am left in a state with apparently nothing new available.
Is this the Intel compiler?
From the manpages on my system:
gfortran - GNU Fortran compiler
f77=fort77 - invoke f2c Fortran translator transparently, like a compiler
I think this has changed recently. Some time ago, f77 was the GNU Fortran compiler.
I'll try compiling it with f77 over the weekend and report back.
[...]
; /home/neuss/.cache/common-lisp/sbcl-1.1.5.5-203e2ac-linux-x64/home/neuss/matlisp/src/sugar/ASDF-TMP-seq.fasl written
; compilation finished in 0:00:00.022
;
; compilation unit finished
;   printed 8 notes
** MATLISP is loaded. Type (HELP MATLISP) to see a list of available symbols. To use matlisp:
(use-package "MATLISP") or (in-package "MATLISP-USER")
- (help matlisp)
; in: HELP MATLISP
;     (HELP MATLISP)
;
; caught STYLE-WARNING:
;   undefined function: HELP
;
; caught WARNING:
;   undefined variable: MATLISP
;
; compilation unit finished
;   Undefined function:
;     HELP
;   Undefined variable:
;     MATLISP
;   caught 1 WARNING condition
;   caught 1 STYLE-WARNING condition
debugger invoked on a UNBOUND-VARIABLE in thread #<THREAD "main thread" RUNNING {10029D9833}>: The variable MATLISP is unbound.
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.
((LAMBDA ())) 0] 0
- (apropos "matlisp")
COMMON-LISP-USER::MATLISP
:MATLISP (bound)
:MATLISP-TESTS (bound)
:MATLISP-USER (bound)
*MATLISP-VERSION* (bound)
MATLISP
MATLISP-HERALD (fbound)
MATLISP-NAME
MATLISP-VERSION (fbound)
SAVE-MATLISP (fbound)
MATLISP-FFI::MATLISP-SPECIALIZED-ARRAY
MATLISP-SYSTEM::MATLISP
MATLISP-SYSTEM::MATLISP-CONDITIONS
MATLISP-SYSTEM::MATLISP-CONFIG
MATLISP-SYSTEM::MATLISP-PACKAGES
MATLISP-SYSTEM::MATLISP-TESTS
MATLISP-SYSTEM::MATLISP-UTILITIES
Yes, the old help system isn't incorporated yet. I'm sorry that you have to tread through my undocumented code.
Assuming you want to test the GEMM, you'd want to do something like:
(in-package :matlisp)
(let ((A (make-real-tensor 1000 1000))
      (B (make-real-tensor 1000 1000)))
  ;; Slow and dynamic
  (time (mod-dotimes (idx (dimensions A))
          do (progn
               (setf (tensor-ref A idx) (random 1d0)
                     (tensor-ref B idx) (random 1d0)))))
  ;; Faster (although RANDOM slows it down quite a bit).
  #+nil
  (time (let-typed ((sto-a (store A) :type real-store-vector)
                    (sto-b (store B) :type real-store-vector))
          (mod-dotimes (idx (dimensions A))
            with (linear-sums (of-a (strides A) (head A))
                              (of-b (strides B) (head B)))
            do (progn
                 (real-typed.value-writer (random 1d0) sto-a of-a)
                 (real-typed.value-writer (random 1d0) sto-b of-b)))))
  ;; Use Lisp
  (let ((*real-l3-fcall-lb* 1000))
    (time (gemm 1d0 A B nil nil)))
  ;; Use Fortran
  (let ((*real-l3-fcall-lb* 0))
    (time (gemm 1d0 A B nil nil))))
I realised I haven't actually added all my "test" files into the repository. I'll add them to the repo today.
On my computer the timings are something like:
  Lisp: 3.2s
  C (tests/mm.c): 2.2s
  Goto: 0.2s
Akshay
OK, this works.
The timings were on SBCL, by the way. CCL sadly tends to be extremely slow; I don't know about other compiled Lisps.
Some further questions and remarks:
- Do you have also a reader macro like [...] in old Matlisp? And could you illustrate how slicing works?
No, there isn't. I'm trying to tweak Mark Kantrowitz's infix package to add slicing and the [..] declaration to it.
You can do the slicing in Lisp by doing things like:
(defvar X (make-real-tensor 10 10 10))
X
;; Get (:, 0, 0)
(sub-tensor~ X '((* * *) (0 * 1) (0 * 1)))
;; Get (:, 2:5, :)
(sub-tensor~ X '((* * *) (2 * 5)))
;; Get (:, :, 0:2:10) (Python's 0:10:2, i.e. [i : 0 <= i < 10, i mod 2 = 0])
(sub-tensor~ X '((* * *) (* * *) (0 2 10)))
The semantics of the slicing operator resemble Python's, except that the "step" of the slice always goes in the middle rather than at the end. I know this function is ugly, but it was written to serve as the backend for the infix reader (the parsing will then move into the reader).
- Looking at how complicated e.g. "gemm.lisp" is, I am not sure if doing this in CL is really worthwhile. Optimizing for small matrices might be the wrong idea from the beginning.
It's actually optimized for everything. The code in gemm.lisp is extraordinarily hairy because it contains the code for every loop order (of which there are 3), to take advantage of SSE when possible. It's not too bad, though. I know I could've written some code to automate this, but I only have so much time.
It calls BLAS when the size of the matrix exceeds *real-l3-fcall-lb*, and so works very well on matrices of all sizes. Calling Fortran for matrices of size less than 10 is quite expensive, though. Even this is not entirely clear: it appears that if you call the same Fortran function repeatedly in a loop, the overhead for subsequent foreign calls tends to be much less than for the first one. The power user can bind the variables in src/base/tweakable.lisp for fine-tuned optimization.
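In outline, the dispatch amounts to something like this (a sketch with stand-in kernel names, not the actual gemm.lisp code):

```lisp
;; The *REAL-L3-FCALL-LB* switch in outline: below the threshold stay in
;; Lisp, above it pay the foreign-call overhead and let Fortran do the
;; work.  LISP-GEMM! and BLAS-GEMM! are stand-ins for the real kernels.
(defvar *real-l3-fcall-lb* 100
  "Smallest matrix dimension for which the foreign BLAS call pays off.")

(defun lisp-gemm! (alpha a b n)          ; stand-in for the Lisp loops
  (declare (ignore alpha a b n)) :lisp)
(defun blas-gemm! (alpha a b n)          ; stand-in for the DGEMM call
  (declare (ignore alpha a b n)) :fortran)

(defun gemm-dispatch (alpha a b n)
  "Multiply the N x N matrices A and B by ALPHA, picking a backend by size."
  (if (< n *real-l3-fcall-lb*)
      (lisp-gemm! alpha a b n)
      (blas-gemm! alpha a b n)))

;; (gemm-dispatch 1d0 a b 10)   => :LISP
;; (gemm-dispatch 1d0 a b 1000) => :FORTRAN
```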
- I would be interested in the minimal amount of code necessary for adding some new LAPACK routine. If possible, the stub should be even smaller than in femlisp/src/matlisp/ggev.lisp and femlisp/src/matlisp/hgev.lisp (solutions of generalized eigenvalue problems).
I don't think writing the LAPACK routines themselves in Lisp is feasible; it doesn't look like Femlisp does that either, though I know the Lisplab project has its own code for LU ..
The code in src/lapack/getrs.lisp is not really all that bad. Sure, it's probably not as neat as that in lisp-matrix or Femlisp, but it's essentially doing the same thing.
The whole "polishing the code" thing I was referring to before is the phase where I steal code and ideas from each of your projects and build them around the current structure of Matlisp.
- I really am interested in single-float stuff too, because I will look more closely at generating high-performance code in the near future. In this domain, using single-float is often interesting, because it needs only half the memory, which can double efficiency in situations where memory bandwidth is the limiting factor.
Yes, single-float (and complex single-float) tensors should be quite useful. They should be very easy to add in the sense of basic functionality: all you have to do is define a new tensor type, as in src/classes/symbolic-tensor.lisp, and call each of the method-generation macros like generate-typed-gemm!.
I think it is trivial to generate methods when the arguments are all of the same type, but it will take some work when they are not (for instance, gemm with a real-matrix and a complex-matrix). Again, I have an idea of how to go about it, but I only have so much time on my hands. This part very much resembles how an object system would work. The combinatorial explosion also means that hand-coding the methods is going to be an utter pain. I want to get all the method-generation machinery working before I even bother with this.
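One way the method-generation machinery can tame that pairwise explosion, as a toy sketch (the names, the axpy kernel, and the naming scheme are all invented for illustration):

```lisp
;; Loop over the element types at macroexpansion time and emit one
;; specialized AXPY kernel per ordered (x-type, y-type) pair, named
;; AXPY/<TX>/<TY>!.  Purely illustrative, not Matlisp code.
(defmacro generate-axpy-kernels (&rest element-types)
  `(progn
     ,@(loop for tx in element-types
             append
             (loop for ty in element-types
                   collect
                   (let ((name (intern (format nil "AXPY/~a/~a!"
                                               (symbol-name tx)
                                               (symbol-name ty)))))
                     `(defun ,name (alpha x y)
                        ;; y <- alpha*x + y, coercing across element types.
                        (declare (type (simple-array ,tx (*)) x)
                                 (type (simple-array ,ty (*)) y))
                        (dotimes (i (length y) y)
                          (setf (aref y i)
                                (coerce (+ (* alpha (aref x i)) (aref y i))
                                        ',ty)))))))))

;; Two element types already give all four mixed kernels:
(generate-axpy-kernels single-float double-float)
;; e.g. AXPY/SINGLE-FLOAT/DOUBLE-FLOAT! reads single-floats and
;; accumulates into a double-float store.
```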
Akshay