Greetings all,
I'm trying to convert the following Java code in the "Basics" section of the
Spark Programming Guide to ABCL:
JavaRDD<Integer> lineLengths = lines.map(s -> s.length());
I know that "s -> s.length()" is a JDK 8 style lambda with one parameter that returns the result of calling length() on 's'. What I'd like to be able to write is:
(let ((line-lengths (#"map" *lines* (lambda (s) (#"length" s)))))
but this isn't getting me anywhere: Java says there is no applicable method 'map' on *lines* (an instance of JavaRDD). There is such a method (if it matters, JavaRDD inherits it from the interface JavaRDDLike). Investigating that map method a bit further, it seems to want a function type from the org.apache.spark.api.java.function package. Here's a clip from the Spark description:
Spark’s API relies heavily on passing functions in the driver program to run on the cluster. In Java, functions are represented by classes implementing the interfaces in the org.apache.spark.api.java.function package. There are two ways to create such functions:
- Implement the Function interfaces in your own class, either as an anonymous inner class or a named one, and pass an instance of it to Spark.
- Use lambda expressions to concisely define an implementation.
I think what I'm getting from the ABCL lambda expression is a java.util.function:
SPARK> (describe (lambda (s) #"length" s))
#<FUNCTION #<FUNCTION (LAMBDA (S)) {70B82AD0}> {70B82AD0}> is an object of type FUNCTION.
The function's lambda list is:
(S)
I do wonder though how the Java lambda "s -> s.length()" manages to produce the correct result, so this theory may not be correct.
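One possible explanation, sketched in plain Java: a Java lambda has no type of its own, and the compiler converts it to whatever single-method interface the call site expects. The snippet below is only an illustration under that assumption. SparkStyleFunction is a hypothetical, simplified stand-in for the Spark interface (it is not the real class), used so the example runs without Spark on the classpath.

```java
import java.util.function.Function;

public class LambdaTargetTyping {
    /** Hypothetical stand-in for a Spark-style function interface with a 'call' method. */
    public interface SparkStyleFunction<T, R> {
        R call(T t);
    }

    /** The lambda is compiled against java.util.function.Function here. */
    static int viaJdkFunction(String s) {
        Function<String, Integer> f = x -> x.length();
        return f.apply(s);
    }

    /** The very same lambda text is compiled against the stand-in interface here. */
    static int viaSparkStyleFunction(String s) {
        SparkStyleFunction<String, Integer> f = x -> x.length();
        return f.call(s);
    }

    public static void main(String[] args) {
        System.out.println(viaJdkFunction("hello"));
        System.out.println(viaSparkStyleFunction("hello"));
    }
}
```

If that is what happens in the Spark example, then "s -> s.length()" never becomes a java.util.function object at all; the declared parameter type of map decides which interface the lambda implements. An ABCL lambda gets no such conversion, which would be consistent with the "no applicable method" error.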
The Spark guide goes on to say:
While much of this guide uses lambda syntax for conciseness, it is easy to use all the same APIs in long-form. For example, we could have written our code above as follows:
JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(new Function<String, Integer>() {
  public Integer call(String s) { return s.length(); }
});
int totalLength = lineLengths.reduce(new Function2<Integer, Integer, Integer>() {
  public Integer call(Integer a, Integer b) { return a + b; }
});
Or, if writing the functions inline is unwieldy:
class GetLength implements Function<String, Integer> {
  public Integer call(String s) { return s.length(); }
}
class Sum implements Function2<Integer, Integer, Integer> {
  public Integer call(Integer a, Integer b) { return a + b; }
}
JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(new GetLength());
int totalLength = lineLengths.reduce(new Sum());
To me, both of those examples look unwieldy. There was a Stack Overflow discussion from 9 years ago on creating Java classes with ABCL, and I've read the Java interface examples on abcl.org, but both of those techniques look like they will produce code just as unwieldy as the Java syntax above.
Is there any way to use a Lisp-style lambda syntax to produce a function that will satisfy Spark's requirement to implement one of the interfaces in the org.apache.spark.api.java.function package? If not, the obvious route would be some macrology to create a defsparkfun, defsparkfun2, etc. that wrap the ABCL function I want to use with something that implements 'call' for the corresponding interface. I'm hoping there's a better way.
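To make the wrapping idea concrete, here is a rough sketch on the Java side of what such a wrapper amounts to: a dynamic proxy that implements 'call' by delegating to some generic callable. Everything here is an assumption for illustration. SparkStyleFunction is again a hypothetical stand-in for the Spark interface, and java.util.function.Function plays the role of the Lisp function being wrapped.

```java
import java.lang.reflect.Proxy;
import java.util.function.Function;

public class CallAdapter {
    /** Hypothetical stand-in for a Spark-style function interface with a 'call' method. */
    public interface SparkStyleFunction<T, R> {
        R call(T t);
    }

    /** Wraps a generic callable in a proxy object that implements SparkStyleFunction. */
    @SuppressWarnings("unchecked")
    static <T, R> SparkStyleFunction<T, R> wrap(Function<? super T, ? extends R> body) {
        return (SparkStyleFunction<T, R>) Proxy.newProxyInstance(
            SparkStyleFunction.class.getClassLoader(),
            new Class<?>[] { SparkStyleFunction.class },
            (proxy, method, args) -> {
                // Forward toString/hashCode/equals to the wrapped callable.
                if (method.getDeclaringClass() == Object.class) {
                    return method.invoke(body, args);
                }
                // The only interface method is 'call'; delegate it to the body.
                return body.apply((T) args[0]);
            });
    }

    /** Usage example: the length function, supplied as a generic callable. */
    static int lengthViaAdapter(String s) {
        SparkStyleFunction<String, Integer> f =
            CallAdapter.<String, Integer>wrap(x -> x.length());
        return f.call(s);
    }

    public static void main(String[] args) {
        System.out.println(lengthViaAdapter("hello"));
    }
}
```

This sketch ignores serialization. Since Spark ships functions from the driver to the cluster, a real wrapper would presumably also need to be serializable, and I don't know whether a proxy built this way would survive that.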