I don't know what data you are reading, but is there any chance the difference is that when you read text in Lisp as ISO-8859-1, Lisp is actually decoding it into Unicode characters, whereas in Java you are just slamming raw bytes into memory?
Maybe this is relevant? https://stackoverflow.com/questions/979932/read-unicode-text-files-with-java
I don't use Java myself, so I can't say, and I don't have access to your data, but it does seem like the Java code is doing something simpler than the Lisp code.
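To make the comparison closer to apples-to-apples, the Java side could decode ISO-8859-1 into characters as well, instead of only copying bytes. The following is a minimal sketch under that assumption, not the original benchmark: the class name ReadComparison and the helpers countBytes and countChars are made up for illustration, and the file path is taken from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
final class ReadComparison {

    // Reads raw bytes with no decoding, like the original Java loop.
    // Returns the total number of bytes read.
    static long countBytes(Path file, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Decodes the file as ISO-8859-1 into a char[] buffer, which is closer
    // to what a Lisp character stream has to do.
    // Returns the total number of characters read.
    static long countChars(Path file, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        long total = 0;
        try (Reader in = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.ISO_8859_1)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ReadComparison <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        int bufferSize = 16 * 1024 * 1024; // 16M, as in the benchmark

        long t0 = System.nanoTime();
        long bytes = countBytes(file, bufferSize);
        long t1 = System.nanoTime();
        long chars = countChars(file, bufferSize);
        long t2 = System.nanoTime();

        System.out.println(String.format("bytes: %d in %.2f secs", bytes, (t1 - t0) / 1e9));
        System.out.println(String.format("chars: %d in %.2f secs", chars, (t2 - t1) / 1e9));
    }
}
```

Run against the same file, this prints one timing per mode. If the character version is markedly slower than the byte version, decoding would account for at least part of the gap. The second pass may benefit from the OS file cache, so it is worth running more than once.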
What happens if you change your Lisp code to read-sequence of type byte instead of character?
On 21 Oct 2022, at 13:43, Garrett Dangerfield wrote:
I don't want to cause a firestorm here but I was doing some simple
benchmarks on file i/o between Java, ABCL, and SBCL and I'm a bit shocked,
honestly.

Reading a 2.5M file in 16M chunks (using iso-8859-1):
- abcl takes a tad over 1 second
- sbcl takes 0.04 seconds

Reading a 5.8G file in 16M chunks (using iso-8859-1 for Lisp, for Java
it's just bytes):
- abcl takes...too long, I gave up
- sbcl takes between 20 and 21 seconds
- Java takes 1.5 seconds

These are all run on the same computer using the same files, etc.

What's up with this? Thoughts? I'd heard that SBCL should be as fast as C
under at least some circumstances. I'd wager that C is at least as fast as
Java (probably faster).

Thanks,
Garrett Dangerfield. (he/him/his)

P.S. Don't get me wrong, I *LOVE* Lisp, I'm trying to get away from Java as
fast as I can (the syntax is killing me slowly). I've used ABCL in
projects before (it was wonderful, Java doesn't handle XML well).

Lisp code:
(with-open-file (stream "/media/danger/OS/temp/jars.txt"
                        :external-format :iso-8859-1) ; great_expectations.iso
  (let ((size (file-length stream))
        (buffer-size (* 16 1024 1024))) ; 16M
    (time
     (loop with buffer = (make-array buffer-size :element-type 'character)
           for n-characters = (read-sequence buffer stream)
           while (< 0 n-characters)))))

Java code:
private static final int BUFFER_SIZE = 16 * 1024 * 1024;

try (InputStream in = new
        FileInputStream("/media/danger/OS/temp/great_expectations.iso"); ) {
    byte[] buff = new byte[BUFFER_SIZE];
    int chunkLen = -1;
    long start = System.currentTimeMillis();
    while ((chunkLen = in.read(buff)) != -1) {
        System.out.println("chunkLen = " + chunkLen);
    }
    double duration = System.currentTimeMillis() - start;
    duration /= 1000;
    System.out.println(String.format("it took %.2f secs", duration));
} catch (Exception e) {
    e.printStackTrace(System.out);
} finally {
    System.out.println("Done.");
}
Robert P. Goldman
Research Fellow
Smart Information Flow Technologies (d/b/a SIFT, LLC)
319 N. First Ave., Suite 400
Minneapolis, MN 55401
Voice: (612) 326-3934
Email: rpgoldman@SIFT.net