Channel: Active questions tagged set-difference - Stack Overflow

Why is the Groovy method minus() so slow with lists of numbers?


Let's say I have two lists, left and right, of size n, each containing unique integers, and I want to find their intersection and differences. This should be possible in O(n) or O(n log n). This is how I instantiate the lists:

Integer n = 10_000
Integer overlap = (Integer) (0.1 * n)
List left = (1..n).toList().shuffled()
List right = (n-overlap+1..2*n-overlap).toList().shuffled()
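For anyone who wants to reproduce this outside Groovy, here is a minimal Java sketch of the same input, assuming the same n and a 10 % overlap. The ranges are built so that exactly `overlap` values (n - overlap + 1 through n) appear in both lists. The names `Inputs`, `range`, `left` and `right` are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class Inputs {
    static final int N = 10_000;
    static final int OVERLAP = (int) (0.1 * N); // 1_000, as in the Groovy snippet

    // Inclusive range [from, to], shuffled like the Groovy snippet's shuffled()
    static List<Integer> range(int from, int to, Random rnd) {
        List<Integer> xs = new ArrayList<>();
        for (int i = from; i <= to; i++) xs.add(i);
        Collections.shuffle(xs, rnd);
        return xs;
    }

    static List<Integer> left(Random rnd) {
        return range(1, N, rnd);
    }

    static List<Integer> right(Random rnd) {
        return range(N - OVERLAP + 1, 2 * N - OVERLAP, rnd);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        List<Integer> left = left(rnd);
        List<Integer> right = right(rnd);
        Set<Integer> shared = new HashSet<>(left);
        shared.retainAll(new HashSet<>(right));
        // prints 10000 10000 1000
        System.out.println(left.size() + " " + right.size() + " " + shared.size());
    }
}
```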

One way to get the intersection and the differences would be, for example:

Set setLeft = left
Set onlyInRight = right.findAll { !(it in setLeft) }
Set inBoth = right.findAll { !(it in onlyInRight) }
Set onlyInLeft = left.findAll { !(it in inBoth) }
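The `findAll` version stays linear in expectation because each `in` test against a `Set` is a hash lookup. A minimal Java sketch of the same idea, slightly restructured so both lists are turned into hash sets up front; it assumes elements with consistent `hashCode`/`equals` (true for `Integer`), and `HashPartition` / `of` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HashPartition {
    final List<Integer> onlyInLeft = new ArrayList<>();
    final List<Integer> inBoth = new ArrayList<>();
    final List<Integer> onlyInRight = new ArrayList<>();

    // One pass over each list; each contains() is an expected O(1) hash lookup,
    // so the whole partition is expected O(n + m).
    static HashPartition of(List<Integer> left, List<Integer> right) {
        Set<Integer> leftSet = new HashSet<>(left);
        Set<Integer> rightSet = new HashSet<>(right);
        HashPartition p = new HashPartition();
        for (Integer x : left) {
            if (rightSet.contains(x)) p.inBoth.add(x);
            else p.onlyInLeft.add(x);
        }
        for (Integer x : right) {
            if (!leftSet.contains(x)) p.onlyInRight.add(x);
        }
        return p;
    }

    public static void main(String[] args) {
        HashPartition p = of(List.of(1, 2, 3, 4), List.of(3, 4, 5));
        // prints [1, 2] [3, 4] [5]
        System.out.println(p.onlyInLeft + " " + p.inBoth + " " + p.onlyInRight);
    }
}
```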

This works quite well, but I find the following more readable:

List inBoth = left.intersect(right)
List onlyInLeft = left - inBoth
List onlyInRight = right - inBoth
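For comparison, a plain-Java counterpart of these three lines can use `retainAll`/`removeAll`. This is a sketch, not a translation of Groovy's `intersect` or `minus`: it relies on the OpenJDK `ArrayList.removeAll`/`retainAll` calling `contains` on the argument once per element, so wrapping the argument in a `HashSet` keeps each call linear in expectation, whereas passing a `List` would turn every `contains` into a scan. `ListOps` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

final class ListOps {
    // Elements of a that also occur in b, kept in a's order.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a);
        out.retainAll(new HashSet<>(b)); // contains() on a HashSet: expected O(1)
        return out;
    }

    // Elements of a that do not occur in b, kept in a's order.
    static List<Integer> minus(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a);
        out.removeAll(new HashSet<>(b)); // same trick: hash lookups instead of list scans
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3, 4);
        List<Integer> right = List.of(3, 4, 5);
        List<Integer> inBoth = intersect(left, right);
        // prints [3, 4] [1, 2] [5]
        System.out.println(inBoth + " " + minus(left, inBoth) + " " + minus(right, inBoth));
    }
}
```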

Interestingly, when I played around a bit, the latter approach was pretty slow and felt more like O(n²). So I took a peek at the source of the minus method:

public static <T> Collection<T> minus(Iterable<T> self, Iterable<?> removeMe, Comparator<? super T> comparator) {
    Collection<T> self1 = asCollection(self);
    Collection<?> removeMe1 = asCollection(removeMe);
    Collection<T> ansCollection = createSimilarCollection(self1);
    if (self1.isEmpty())
        return ansCollection;
    T head = self1.iterator().next();
    boolean nlgnSort = sameType(new Collection[]{self1, removeMe1});

    // We can't use the same tactic as for intersection
    // since AbstractCollection only does a remove on the first
    // element it encounters.

    if (nlgnSort && (head instanceof Comparable)) {
        //n*LOG(n) version
        Set<T> answer;
        if (head instanceof Number) {
            answer = new TreeSet<>(comparator);
            answer.addAll(self1);
// ------------ Beginning of relevant part --
            for (T t : self1) {                     // A) -- loop over self1
                if (t instanceof Number) {
                    for (Object t2 : removeMe1) {   // B) -- inner loop over removeMe1
                        if (t2 instanceof Number) {
                            if (comparator.compare(t, (T) t2) == 0)
                                answer.remove(t);
                        }
                    }
                } else {
                    if (removeMe1.contains(t))
                        answer.remove(t);
                }
            }
// ------------ End of relevant part --
        } else {
            answer = new TreeSet<>(comparator);
            answer.addAll(self1);
            answer.removeAll(removeMe1);
        }
        for (T o : self1) {
            if (answer.contains(o))
                ansCollection.add(o);
        }
    } else {
        //n*n version
        // ...

The relevant part is between the two comments I added:

  • // -- Beginning of relevant part -- and
  • // -- End of relevant part --.
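To make the cost of the marked section concrete, here is a stripped-down Java reproduction of loops A and B only, with a counter on the comparisons. It is a sketch that mirrors the quoted code, not the Groovy source itself: it assumes every element is an `Integer` (so the `instanceof Number` checks are dropped), uses natural ordering in place of Groovy's comparator, and `NestedLoopCost` / `minusNested` are made-up names. For `self` of size n and `removeMe` of size m, the explicit comparison runs exactly n × m times.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NestedLoopCost {
    static long loopComparisons;

    // Mirrors loops A and B of the quoted Number branch (all elements assumed Integer).
    static Set<Integer> minusNested(Collection<Integer> self, Collection<Integer> removeMe) {
        Comparator<Integer> cmp = Comparator.naturalOrder();
        Set<Integer> answer = new TreeSet<>(cmp);
        answer.addAll(self);
        loopComparisons = 0;
        for (Integer t : self) {                 // A) loop over self: n iterations
            for (Integer t2 : removeMe) {        // B) inner loop over removeMe: m iterations
                loopComparisons++;               // one comparator call per (t, t2) pair
                if (cmp.compare(t, t2) == 0) {
                    answer.remove(t);
                }
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        List<Integer> self = new ArrayList<>();
        List<Integer> removeMe = new ArrayList<>();
        for (int i = 1; i <= 2_000; i++) self.add(i);
        for (int i = 1_801; i <= 2_000; i++) removeMe.add(i);
        Set<Integer> answer = minusNested(self, removeMe);
        // prints 1800 400000 (2_000 * 200 comparisons)
        System.out.println(answer.size() + " " + loopComparisons);
    }
}
```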

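Is it just me, or is this O(n²) when the elements are instances of Number? Loop A visits every element of self1, and for each one loop B walks through all of removeMe1, so the comparator is called |self1| × |removeMe1| times. Maybe there is a reason for this, but I would have expected something like: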

// ------------ Beginning of relevant part --
            for (Object t : removeMe1) {
                answer.remove(t);
            }
// ------------ End of relevant part --
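The single loop suggested here can be sketched as a standalone helper. This is a minimal sketch under the assumption that `answer` is a `TreeSet` built with the same comparator, as in the quoted code; `ProposedMinus` and `minus` are hypothetical names, not Groovy API. Each `remove` costs O(log n), so the marked section becomes O(m log n), and the final pass that rebuilds the result in `self`'s order stays O(n log n).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class ProposedMinus {
    // Removes every element of removeMe from self, keeping self's order and duplicates,
    // using one loop over removeMe instead of the nested loops A and B.
    static <T> List<T> minus(Collection<T> self, Collection<?> removeMe, Comparator<? super T> comparator) {
        TreeSet<T> answer = new TreeSet<>(comparator);
        answer.addAll(self);                    // O(n log n)
        for (Object t : removeMe) {             // proposed replacement: m removals
            answer.remove(t);                   // O(log n) each; may throw ClassCastException
        }                                       // if t cannot be compared with the elements
        List<T> result = new ArrayList<>();
        for (T o : self) {                      // same final pass as the quoted code
            if (answer.contains(o)) {
                result.add(o);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(5, 1, 4, 2, 3);
        List<Integer> inBoth = List.of(4, 5);
        // prints [1, 2, 3]
        System.out.println(minus(left, inBoth, Comparator.<Integer>naturalOrder()));
    }
}
```

One thing this sketch does not reproduce is the `t2 instanceof Number` filter in the original: `TreeSet.remove` is documented to throw `ClassCastException` when the argument cannot be compared with the set's elements, so a real replacement would still need to decide how to treat non-numeric entries in `removeMe`.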

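So, if I am not mistaken, minus() would be faster if I wrapped the numbers in a custom value class implementing Comparable, because such elements are not instances of Number and would take the TreeSet.removeAll branch instead. Does anybody have an idea what I am missing, or why it was implemented like this?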
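The wrapping idea can be sketched in Java with a hypothetical `Id` value class. Because it is `Comparable` but not a `Number`, it would fall into the quoted `else` branch (`TreeSet.addAll` followed by `removeAll`); the sketch reproduces only that branch, using natural ordering as a stand-in for the comparator Groovy passes. `WrappedMinus` and `Id` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class WrappedMinus {
    // Hypothetical value class: Comparable, but not a Number.
    static final class Id implements Comparable<Id> {
        final int value;

        Id(int value) { this.value = value; }

        @Override public int compareTo(Id other) { return Integer.compare(value, other.value); }

        // equals/hashCode matter too: AbstractSet.removeAll may call contains() on the argument.
        @Override public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value == value;
        }

        @Override public int hashCode() { return Integer.hashCode(value); }

        @Override public String toString() { return "Id(" + value + ")"; }
    }

    // Mirrors the non-Number branch: addAll, removeAll, then a pass in self's order.
    static List<Id> minus(Collection<Id> self, Collection<Id> removeMe) {
        Set<Id> answer = new TreeSet<>();       // natural ordering via compareTo
        answer.addAll(self);
        answer.removeAll(removeMe);
        List<Id> result = new ArrayList<>();
        for (Id o : self) {
            if (answer.contains(o)) result.add(o);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Id> left = List.of(new Id(3), new Id(1), new Id(2));
        List<Id> inBoth = List.of(new Id(2));
        // prints [Id(3), Id(1)]
        System.out.println(minus(left, inBoth));
    }
}
```

One caveat comes from the documented behaviour of `AbstractSet.removeAll`, which `TreeSet` inherits: if the set is not larger than the argument collection, it iterates the set and calls `contains` on the argument instead, which is a linear scan when the argument is a `List`. For `left - inBoth` the set is the larger collection, so each element of `inBoth` is removed with an O(log n) lookup.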
