Let's say I have two lists left
and right
of size n, each with unique integers, and I want to find the intersection and differences. This should be able to be done in O(n) or O(nlog(n)). This is how I instantiate the lists:
Integer n = 10_000Integer overlap = (Integer) (0.1 * n)List left = (1..n).toList().shuffled()List right = (n-overlap+1..2*n-overlap).toList().shuffled()
One way to get the intersection and differences would be for example:
Set setLeft = leftSet onlyInRight = right.findAll { !(it in setLeft) }Set inBoth = right.findAll { !(it in onlyInRight) }Set onlyInLeft = left.findAll { !(it in inBoth) }
This works quite well, but I find the following more readable:
List inBoth = left.intersect(right)List onlyInLeft = left - inBothList onlyInRight = right - inBoth
Interestingly, when I played around a bit the latter apporach was pretty slow and felt more like O(n2). So I took a peek at the code base of the minus method:
public static <T> Collection<T> minus(Iterable<T> self, Iterable<?> removeMe, Comparator<? super T> comparator) { Collection<T> self1 = asCollection(self); Collection<?> removeMe1 = asCollection(removeMe); Collection<T> ansCollection = createSimilarCollection(self1); if (self1.isEmpty()) return ansCollection; T head = self1.iterator().next(); boolean nlgnSort = sameType(new Collection[]{self1, removeMe1}); // We can't use the same tactic as for intersection // since AbstractCollection only does a remove on the first // element it encounters. if (nlgnSort && (head instanceof Comparable)) { //n*LOG(n) version Set<T> answer; if (head instanceof Number) { answer = new TreeSet<>(comparator); answer.addAll(self1);// ------------ Beginning of relevant part -- for (T t : self1) { // A) -- loop over self1 if (t instanceof Number) { for (Object t2 : removeMe1) { // B) -- inner loop over removeMe1 if (t2 instanceof Number) { if (comparator.compare(t, (T) t2) == 0) answer.remove(t); } } } else { if (removeMe1.contains(t)) answer.remove(t); } }// ------------ End of relevant part -- } else { answer = new TreeSet<>(comparator); answer.addAll(self1); answer.removeAll(removeMe1); } for (T o : self1) { if (answer.contains(o)) ansCollection.add(o); } } else { //n*n version // ...
The relevant part is between the comments (from me)
// -- Beginning of relevant part --
and// -- End of relevant part --
.
Is it just me or is this O(n2) for instance of Number
? Maybe there is a reason for this, but instead, I would have expected something like
// ------------ Beginning of relevant part -- for (Object t : removeMe1) { answer.remove(t); }// ------------ End of relevant part --
So if I am not mistaken minus()
works faster if I wrap the numbers in a custom value class implementing Comparable
. Has anybody an idea what I am missing or why it was implemented like this?