Gruppen du sender innlegg til, er en Usenet-gruppe. E-postadressen til forfatteren av meldinger som legges inn i denne gruppen, vil vises for alle på Internett.
On Jun 16, 11:08 am, t...@sevak.isi.edu (Thomas A. Russ) wrote:
> Not to mention REMOVE-DUPLICATES, which is typically O(n^2) > in most implementations.
really? I could see that worst case, but it would depend on the test. If the test is eq I would expect an O(n) solution.
I guess it would devolve to O(n^2) if you had something that really didn't hash / a lot of collisions (for a hash table implementation), or really unbalanced data (if you are using a tree instead of a hash to remember what has already been seen). Memory cost would be something to hash it, so maybe that isn't allowed? or tunable depending on if you optimize for speed or space. I don't know what implementations actually do though.
On Jun 16, 9:34 pm, Madhu <enom...@meer.net> wrote:
> * Scott Burson Wrote on Tue, 16 Jun 2009 10:42:48 -0700 (PDT):
> | For FSet I wanted LAST to be symmetrical with FIRST, so I shadow LAST > | and export its functionality under the name LASTCONS.
> Luckily, if you wanted to use have a mutable data structure built with > cons cells, you'd not use FSet in the first place, and CL:LAST would > still behave as the Spec defines it.
Even if you are using FSet, you have the choice of whether to shadowing-import FSET:LAST, or not.
On Jun 16, 9:01 pm, Xah Lee <xah...@gmail.com> wrote:
> Let me give a analogy that illustrate your folly.
> Suppose i'm creating a lisp-like lang, and in my lisp, i have a cons > like thing, except it has 3 cells. Suppose this )cons with 3 celles(, > is called )fons( in my lang. Fons has 3 accessors, called )car(, )cbr > (, )cdr(. These terms are based on lisp tradition. But now suppose, > i'm dissatisfied with these terms, so i looked up 3-body problems in > mathematics. JESUS, I FOUND NONE! Therefore, in actualality, car, cbr, > cdr are the best! When i see them, i don't think of IMB 666 registers. > I could use )first component(, )middle component(, )third component(, > but that constitute the ill of wordygidiss!
No need to imagine, you can crete one in Lisp easily --
Scheme code:
(define (fons x y z) (define (set-x! v) (set! x v)) (define (set-y! v) (set! y v)) (define (set-z! v) (set! z v)) (define (dispatch m) (cond ((eq? m 'car) x) ((eq? m 'cbr) y) ((eq? m 'cdr) z) ((eq? m 'set-car!) set-x!) ((eq? m 'set-cbr!) set-y!) ((eq? m 'set-cdr!) set-z!) (else (error "Undefined operation -- FONS" m)))) dispatch)
(define (cons x y) (define (set-x! v) (set! x v)) (define (set-y! v) (set! y v)) (define (dispatch m) (cond ((eq? m 'car) x) ((eq? m 'cdr) y) ((eq? m 'set-car!) set-x!) ((eq? m 'set-cdr!) set-y!) (else (error "Undefined operation -- CONS" m)))) dispatch)
(define (car z) (z 'car)) (define (cdr z) (z 'cdr))
(define (set-car! z new-value) ((z 'set-car!) new-value) z) (define (set-cbr! z new-value) ((z 'set-cbr!) new-value) z) (define (set-cdr! z new-value) ((z 'set-cdr!) new-value) z)
K Livingston <kevinlivingston.pub...@gmail.com> writes: > On Jun 16, 11:08 am, t...@sevak.isi.edu (Thomas A. Russ) wrote: > > Not to mention REMOVE-DUPLICATES, which is typically O(n^2) > > in most implementations.
> really? I could see that worst case, but it would depend on the > test. If the test is eq I would expect an O(n) solution.
> I guess it would devolve to O(n^2) if you had something that really > didn't hash / a lot of collisions (for a hash table implementation), > or really unbalanced data (if you are using a tree instead of a hash > to remember what has already been seen). Memory cost would be > something to hash it, so maybe that isn't allowed? or tunable > depending on if you optimize for speed or space. I don't know what > implementations actually do though.
Well, with the caveat that it's been a long while since we actually did any performance testing of this, but at one time it seems that remove-duplicates was done using the straightforward O(n^2) algorithm. There was no use of auxiliary data structures such as hash tables or other clever means of detecting duplicates.
So for our software we implemented our own FAST-REMOVE-DUPLICATES function that did use an O(n) algorithm.
In defense of the implementations, it seems like the fallback solution has to be available to use since, unlike hash tables, REMOVE-DUPLICATES has to accept an arbitrary test. So a hash table solution doesn't necessarily work, except for certain specific tests. So it really comes down to whether an implementation uses a special case for the (probably relatively common) cases of EQ, EQL or EQUAL tests.
In any case, while reading the HyperSpec, I also noticed the following interesting constraint. I found it interesting because it is exactly the opposite of what I would have expected:
"The elements of sequence are compared pairwise, and if any two match, then the one occurring earlier in sequence is discarded, unless from-end is true, in which case the one later in sequence is discarded."
That requirement would also seem to put some constraints on the algorithm that can be used to implement REMOVE-DUPLICATES, since the naive hash-table implementation [where you iterate through the sequence and only keep the elements that haven't been see before] would have the opposite behavior.
-- Thomas A. Russ, USC/Information Sciences Institute
On 18 jun, 15:33, t...@sevak.isi.edu (Thomas A. Russ) wrote:
> [...] > In any case, while reading the HyperSpec, I also noticed the following > interesting constraint. I found it interesting because it is exactly > the opposite of what I would have expected:
> "The elements of sequence are compared pairwise, and if any two match, > then the one occurring earlier in sequence is discarded, unless > from-end is true, in which case the one later in sequence is > discarded."
> That requirement would also seem to put some constraints on the > algorithm that can be used to implement REMOVE-DUPLICATES, since the > naive hash-table implementation [where you iterate through the sequence > and only keep the elements that haven't been see before] would have the > opposite behavior.
Since there is nothing saying that the list needs to be constructed from the beginning or from the end, this is a valid (sort of) straight- forward solution (with :test #'eql :from-end nil)
On Jun 18, 3:39 pm, gugamilare <gugamil...@gmail.com> wrote: [...]
> Since there is nothing saying that the list needs to be constructed > from the beginning or from the end, this is a valid (sort of) straight- > forward solution (with :test #'eql :from-end nil) > (defun rem-dups (list &optional (hash-table (make-hash-table :size > (length list)))) > (when list > (let ((tail (rem-dups (cdr list) hash-table))) > (if (gethash (car list) hash-table) > tail > (cons (setf (gethash (car list) hash-table) (car list)) > tail))))) > The only problem is that it is not tail recursive, and I don't see a > way to make it tail recursive without creating an explicit stack.
Just reverse the list to start with, so the first element you see is the last one in the list; that way you don't need to mess with tail pointers (though it's maybe worth recycling your conses).
(defun rem-dups (list) (let ((table (make-hash-table :size (length list)))) (labels ((helper (rev acc) (if (null rev) acc (let ((elt (first rev)) (rest (rest rev))) (cond ((gethash elt table) (helper rest acc)) (t (setf (gethash elt table) t ;; Smash the CDR, because this will ;; be a fresh cons. (cdr rev) acc) (helper rest rev))))))) (helper (reverse list) '()))))
On Jun 18, 1:33 pm, t...@sevak.isi.edu (Thomas A. Russ) wrote:
> In any case, while reading the HyperSpec, I also noticed the following > interesting constraint. I found it interesting because it is exactly > the opposite of what I would have expected:
> "The elements of sequence are compared pairwise, and if any two match, > then the one occurring earlier in sequence is discarded, unless > from-end is true, in which case the one later in sequence is > discarded."
> That requirement would also seem to put some constraints on the > algorithm that can be used to implement REMOVE-DUPLICATES, since the > naive hash-table implementation [where you iterate through the sequence > and only keep the elements that haven't been see before] would have the > opposite behavior.
Huh, that is the exact opposite of what I intuitively expected too. (This probably explains some ordering changes in the output of a few of my pieces of code, that I didn't expect. But it wasn't a problem/ wrong, and i never tracked it down.) Other things I easily think of default to the first one found. For example plists, and destructuring- bind on a keyword list that has multiple keys of the same value.
I looked it up in CLtL2, and after it says what you quoted, it says the output is allowed to share a tail with the original. So it's optimized for getting the biggest tail possible by removing from the front. Saves consing and space.
I wonder if I have assumptions in my code that think remove-duplicates provides a clean list. Got burned by that real bad with sublis once, so I'm aware of that type of problem, thought I wouldn't make it any more. I wonder if there is a list somewhere of functions allowed to share structure? Probably safe to assume all of them I guess.
K Livingston <kevinlivingston.pub...@gmail.com> writes: > On Jun 18, 1:33 pm, t...@sevak.isi.edu (Thomas A. Russ) wrote: > > "The elements of sequence are compared pairwise, and if any two match, > > then the one occurring earlier in sequence is discarded, unless > > from-end is true, in which case the one later in sequence is > > discarded." ... > I looked it up in CLtL2, and after it says what you quoted, it says > the output is allowed to share a tail with the original. So it's > optimized for getting the biggest tail possible by removing from the > front. Saves consing and space.
That sounds like the most likely explanation.
I was thinking that it might also have something to do with the idea that you are REMOVEing elements from the list, and you remove them in the order encountered. So that would also mean one might remove from the front. I suppose the other way would have to be called something like KEEP-UNIQUE. It could have the following trivial definition: