Hi there,
Prompted by a discussion on #lisp, I've made the following collect-duplicates function: Takes a sequence and a test, and returns as two values a sequence of duplicates and a sequence of the duplicate count (matching their sequence type to the input sequence's).
(defun collect-duplicates (sequence &key (test #'eql)) ;; This will work only for TESTs that are valid hash tests: (let ((dupes (make-hash-table :test test)) (known (make-hash-table :test test))) (map nil (lambda (elt) (if (gethash elt known) (incf (gethash elt dupes 1)) (setf (gethash elt known) t))) sequence) (let ((duplicates (make-sequence (class-of sequence) (hash-table-count dupes))) (counts (make-sequence (class-of sequence) (hash-table-count dupes))) (i 0)) (maphash (lambda (k v) (setf (elt duplicates i) k (elt counts i) v) (incf i)) dupes) (values duplicates counts))))
Issues that I see with it:
* Needs a :key argument. * :test should allow for non-hash-table tests, as well. * (class-of ...) is not really a good idea, as it doesn't preserve the element type for arrays.
If I fixed these issues, would there be interest in having this function in alexandria?
Cheers,
Andreas Fuchs <asf <at> boinkor.net> writes:
Prompted by a discussion on #lisp, I've made the following collect-duplicates function: Takes a sequence and a test, and returns as two values a sequence of duplicates and a sequence of the duplicate count (matching their sequence type to the input sequence's).
Hi Andreas,
well, if you ask me, I would very much like to see this in alexandria.
If it makes any sense, I can offer help writing test cases... But before that, just a question: I was running it on a string. Should the type of the result sequences be overidable?
Best, Kilian
Hi Andreas,
There are cases where the group of duplicate values is of use. How about a more general function GROUP-BY which could be used to implement COLLECT-DUPLICATES.
TMP-PACKAGE> (group-by '((a 1) (a 2) (a 3) (b 4) (b 1) (c 1) (c 4)) #'eq :key #'first) (((A 1) (A 2) (A 3)) ((B 4) (B 1)) ((C 1) (C 4)))
TMP-PACKAGE> (group-by '((a 1) (a 2) (a 3) (b 4) (b 1) (c 1) (c 4)) #'eq :key #'second) (((A 1) (B 1) (C 1)) ((A 2)) ((A 3)) ((B 4) (C 4)))
Mark
(defun separate-according-to (sequence predicate &key (key #'identity)) "Returns two lists as values where the first contains the items in SEQUENCE for which PREDICATE is not NIL and the second contains the rest." (let ((group-equal nil) (group-not-equal nil)) (flet ((assign-to-group (object) (if (funcall predicate (funcall key object)) (push object group-equal) (push object group-not-equal)))) (map nil #'assign-to-group sequence))
(values (nreverse group-equal) (nreverse group-not-equal))))
(defun group-by (sequence test-function &key (key #'identity)) "Groups items in SEQUENCE according to the TEST-FUNCTION. Each item in the returned LIST contains each group. i.e. TEST-FUNCTION returns non NIL for all values in a group." (labels ((group-by/recursive (sequence result) (case (length sequence) (0 (nreverse result)) (1 (nreverse (cons (list (elt sequence 0)) result))) (t (let* ((object (elt sequence 0)) (object-key (funcall key object)) (fn (lambda (other-object) (funcall test-function object-key (funcall key other-object))))) (unless (funcall fn object) (error "Invalid TEST-FUNCTION as one item fails test with itself.")) (multiple-value-bind (equal-to not-equal-to) (separate-according-to (subseq sequence 1) fn) (group-by/recursive not-equal-to (cons (cons object equal-to) result)))))))) (group-by/recursive sequence nil)))
(defun collect-duplicates (sequence &key (test #'eq) (key #'identity) (only-duplicates t)) "Takes a sequence and a test, and returns as two values a sequence of duplicates and a sequence of the duplicate count. The first sequence matches the input sequence's type." (flet ((one-or-less-p (v) (<= v 1))) (let ((groups (group-by sequence test :key key))) (let ((groups (if only-duplicates (remove-if #'one-or-less-p groups :key #'length) groups))) (values (map-into (make-sequence (class-of sequence) (length groups)) #'first groups) (map 'list #'length groups))))))
On 02/12/2010, at 10:03 AM, Andreas Fuchs wrote:
Hi there,
Prompted by a discussion on #lisp, I've made the following collect-duplicates function: Takes a sequence and a test, and returns as two values a sequence of duplicates and a sequence of the duplicate count (matching their sequence type to the input sequence's).
(defun collect-duplicates (sequence &key (test #'eql)) ;; This will work only for TESTs that are valid hash tests: (let ((dupes (make-hash-table :test test)) (known (make-hash-table :test test))) (map nil (lambda (elt) (if (gethash elt known) (incf (gethash elt dupes 1)) (setf (gethash elt known) t))) sequence) (let ((duplicates (make-sequence (class-of sequence) (hash-table-count dupes))) (counts (make-sequence (class-of sequence) (hash-table-count dupes))) (i 0)) (maphash (lambda (k v) (setf (elt duplicates i) k (elt counts i) v) (incf i)) dupes) (values duplicates counts))))
Issues that I see with it:
- Needs a :key argument.
- :test should allow for non-hash-table tests, as well.
- (class-of ...) is not really a good idea, as it doesn't preserve
the element type for arrays.
If I fixed these issues, would there be interest in having this function in alexandria?
Cheers,
Andreas Fuchs, (http://%7Cim:asf@%7Cmailto:asf@)boinkor.net, antifuchs
alexandria-devel mailing list alexandria-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/alexandria-devel
alexandria-devel@common-lisp.net