I agree there are different classes of usage, and it's certainly my hope that whatever we adopt will be usable and convenient for both cases. I'm not sure there will be a dramatic difference in efficiency; I think this might be a case of premature optimization.
Anyway, I'd like to make a distinction between surface syntax and core implementation. I think any algorithm can be implemented with any surface syntax that we want. It seems that a lot of the syntax of xarray and grid is similar, and where they differ, sometimes it's because I ran out of time and didn't carry up through the layers some of what Tamas did in affi that I see in xarray. Other times it's just a missing feature, like reduction. As far as core implementation goes, it seems like there ought to be a choice between affi and xarray, presuming there's a difference in efficiency or some other useful quality.
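To make the syntax/implementation split concrete, here is a minimal sketch, written in Python rather than Lisp for brevity. All names in it (NestedStore, FlatStore, reduce_all) are hypothetical and are not the API of grid, affi, or xarray; it only assumes that each core implementation exposes its dimensions and an element accessor. The same surface-level reduction then runs unchanged on either one.

```python
from itertools import product


class NestedStore:
    """Core implementation A (hypothetical): elements kept in nested lists."""

    def __init__(self, rows):
        self.rows = rows
        self.dims = (len(rows), len(rows[0]))

    def ref(self, i, j):
        return self.rows[i][j]


class FlatStore:
    """Core implementation B (hypothetical): one flat list, row-major order."""

    def __init__(self, data, dims):
        self.data = data
        self.dims = dims

    def ref(self, i, j):
        # Row-major: skip i full rows, then move j columns in.
        return self.data[i * self.dims[1] + j]


def reduce_all(store, fn, initial):
    """Surface operation: fold fn over every element, whatever the backend."""
    acc = initial
    for idx in product(*(range(d) for d in store.dims)):
        acc = fn(acc, store.ref(*idx))
    return acc


nested = NestedStore([[1, 2, 3], [4, 5, 6]])
flat = FlatStore([1, 2, 3, 4, 5, 6], (2, 3))

total_nested = reduce_all(nested, lambda a, b: a + b, 0)
total_flat = reduce_all(flat, lambda a, b: a + b, 0)
```

The caller of reduce_all never sees which store is underneath, so the choice of core can be made on efficiency grounds alone.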
Liam
On Sun, Jan 24, 2010 at 7:24 PM, Mirko Vukovic <mirko.vukovic@gmail.com> wrote:
Some thoughts on the two interfaces (grid, xarray) discussed here ...
I am trying to figure out if we can classify different types of usage of vector and matrix data. The classification below is very rough with much gray area in-between.
At some basic level, collections of numbers are either
  1. vectors and arrays to be processed by numerical algorithms
  2. just collections of numbers that will be parsed or processed by some semi-numerical algorithms
Packages such as GSL and LAPACK will deal mostly with the first kind.
For other uses, like when dealing with results from multiple experiments, we are using vectors and arrays as indexed storage with fast access, but there may not be anything `algebraic' (in the sense of linear algebra) to those collections.
In this second case, we may choose to process all the numbers in the collection, or some random subset of them. (In either case, vectorized processing of those collections may be desired - Tamas has published a package that does that).
It seems to me that Tamas' (now abandoned) `affi' package, on top of which `grid' is built, is a natural fit for case 1 above, while xarray is a natural fit for case 2 above.
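For readers unfamiliar with the idea behind affine indexing, the following is a rough sketch, in Python rather than Lisp, under the assumption that the core idea is mapping a multi-dimensional index to a position in flat storage through an affine function (an offset plus one coefficient per dimension). The class and its names are hypothetical and do not reproduce affi's actual interface.

```python
class AffineView:
    """Hypothetical view: flat position = offset + sum(coeff_k * index_k)."""

    def __init__(self, data, offset, coeffs, dims):
        self.data = data      # shared flat storage, never copied
        self.offset = offset  # constant term of the affine map
        self.coeffs = coeffs  # one coefficient (stride) per dimension
        self.dims = dims      # extent of each dimension

    def flat_index(self, *idx):
        return self.offset + sum(c * i for c, i in zip(self.coeffs, idx))

    def ref(self, *idx):
        return self.data[self.flat_index(*idx)]


# A 3x4 matrix stored row-major in one flat list: element (i, j) is 4*i + j.
data = list(range(12))

whole = AffineView(data, offset=0, coeffs=(4, 1), dims=(3, 4))
# Transpose: swap the coefficients and the dimensions, same storage.
transposed = AffineView(data, offset=0, coeffs=(1, 4), dims=(4, 3))
# Column 2 as a vector: start at position 2, step one row (4) at a time.
column_2 = AffineView(data, offset=2, coeffs=(4,), dims=(3,))

column_values = [column_2.ref(i) for i in range(column_2.dims[0])]
```

Because slices and transposes only change the offset and coefficients, the data can be handed to a numerical routine without copying, which is one reason such a scheme could suit case 1.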
In addition, someone noted that affi is probably faster than xarray (to be verified), which is of paramount importance for the number-crunching libraries. (We first use non-numeric tools at the top level when parsing the data, which then may pass the data to the number-crunchers in gsll, lla, where speed is important.)
In that case, the two packages may each have a valid role. The optimum would be a unified notation, in which the notation of grid would be a subset of that of xarray.
Mirko