Channel: Active questions tagged set-difference - Stack Overflow

Find integers in range which are not in a set


I need to find all IDs between the min and max ID that are not in my table. Unfortunately, MySQL has no simple way of generating sequences, so I thought it would be easier to do this in the application.

The IDs in the table come from an external source, and roughly 80% of the min-max range is empty (around 40 million rows in a range of 200 million). Since the ID is the primary key, the rows are already fetched in sorted order.

The problem is simply C = A \ B, but the set A is somewhat special (it is a contiguous integer range), so I thought of a few ways to implement it:

Generate a std::set A of all integers between the min and max ID, and then remove from it the values in set/vector B. However, I'm not sure whether std::set will work well at this memory size and with this many removals...

Populate a std::vector with the std::set_difference of vector A and vector B. However, this will almost double the memory requirements, because B is several times smaller than A and the result is therefore nearly as large as A.

As above, but replace vector A with std::ranges::views::iota?

Some custom algorithm that iterates over the range A, checks whether each value is absent from B, and pushes it to C.

Any other ideas? What is the best choice (and data structure) for this problem? Ideally minimizing both memory and runtime, or at least with small tradeoffs. Should I pre-allocate (reserve) in any of these solutions?
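On the reserve question, one possible sketch of the custom algorithm is below. It assumes the IDs are unique and sorted (as a primary key fetched in order would be) and that the min and max are themselves in the table, so every missing ID lies in a gap between two consecutive present IDs. Under those assumptions the output size is known exactly up front: (max - min + 1) minus the number of rows. The name missing_ids is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: `present` holds unique IDs in ascending order.
// Returns the IDs strictly between consecutive elements of `present`.
std::vector<std::int64_t> missing_ids(const std::vector<std::int64_t>& present) {
    std::vector<std::int64_t> missing;
    if (present.size() < 2) return missing;  // no gaps possible

    const std::int64_t lo = present.front();
    const std::int64_t hi = present.back();
    assert(lo < hi);

    // Exact output size: width of the range minus the IDs that exist.
    const auto width = static_cast<std::size_t>(hi - lo + 1);
    missing.reserve(width - present.size());

    // Walk consecutive pairs and emit everything in the gap between them.
    for (std::size_t i = 1; i < present.size(); ++i) {
        for (std::int64_t id = present[i - 1] + 1; id < present[i]; ++id) {
            missing.push_back(id);
        }
    }
    return missing;
}
```

Since the loop only ever looks at the previous and current ID, the same idea could run while rows are being fetched, without storing B at all; the vector version above is just the simplest form to show.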

