I need to find all IDs that are not in my table, between the min and max ID. Unfortunately MySQL has no simple way of generating sequences, so I thought it would be easier to do this in the application.
The IDs in the table come from an external source, and roughly 80% of the min–max range consists of gaps (around 40 million rows spread over a range of about 200 million). Since the ID is the primary key, the rows are already fetched in sorted order.
The problem is simply `C = A \ B`, but the set `A` is somewhat special, so I thought of a few ways to implement it:
1. Generate a `std::set` `A` of all integers between the min and max ID, then erase from it every value in the set/vector `B`. However, I'm not sure `std::set` will behave well at this memory size and with this many removals...
2. Populate a `std::vector` with the `std::set_difference` of vector `A` and vector `B`. However, this will almost double the memory requirements (`B` has a cardinality several times smaller than `A`).
3. As above, but replace vector `A` with `std::ranges::views::iota`? (See the sketch after this list.)
4. Some custom algorithm that iterates over the range `A`, checks whether each value exists in `B`, and pushes the missing ones to `C` (a hand-rolled sketch of this is at the end of the post).
Any other ideas? What is the best choice (and data structure) for this problem? Ideally it should minimize both memory and runtime, or at least offer a small trade-off between them. Should I pre-allocate (`reserve`) in any of these solutions?
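Regarding `reserve`: since `|C| = (max - min + 1) - |B|` is known exactly before the scan, the result vector could be pre-allocated in any of the vector-based options. A hand-rolled sketch of option 4 under the same assumptions as above (again, names are mine):

```cpp
#include <cstdint>
#include <vector>

// Single pass over sorted B while counting through [minId, maxId];
// every ID not present in B is pushed to C.
std::vector<std::int64_t> missing_ids_loop(const std::vector<std::int64_t>& b,
                                           std::int64_t minId, std::int64_t maxId)
{
    std::vector<std::int64_t> c;
    c.reserve(static_cast<std::size_t>(maxId - minId + 1) - b.size()); // exact result size

    auto it = b.begin();
    for (std::int64_t id = minId; id <= maxId; ++id) {
        if (it != b.end() && *it == id)
            ++it;             // id exists in the table
        else
            c.push_back(id);  // gap: id is missing
    }
    return c;
}
```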