My boss was stomping around, agitated, because The Business had some heavy data requests. He had written something some time ago that got them something close to what HE thought THEY thought they needed.
I knew that this wasn’t what they really needed, but let him go ahead and send a sample of the data off anyway. Most times when people senior to you have their minds made up, there’s no point in arguing.
“It’s pretty fast,” he said, “if you need the code. It takes about an hour per partition.”
On our system, some of the really big tables are pseudo-partitions – separate tables with identical structures and similar names. Orders1, Orders2, Orders3, etc. He meant it took about an hour to process each one of these tables.
What I really needed was a subset of that data. About 12%, as it turns out. So I politely acknowledged his offer of Free Code! and wrote something that processed the first 16 partitions. It completed in about 2 hours 20 minutes.
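For a rough sense of the difference, here is the back-of-the-envelope arithmetic as a small Python sketch. It uses only the figures quoted above (16 partitions, about 2 hours 20 minutes, about an hour per partition) and assumes, purely for comparison, that the hour-per-partition estimate would apply to those same 16 tables. The two jobs pull different amounts of data, so treat this as an illustration rather than a benchmark.

```python
# Figures quoted in the post.
partitions = 16
my_total_minutes = 2 * 60 + 20  # about 2 hours 20 minutes

# Assumption: "about an hour per partition" applies to these same 16 tables.
boss_minutes_per_partition = 60

my_minutes_per_partition = my_total_minutes / partitions
boss_total_hours = partitions * boss_minutes_per_partition / 60
speedup = boss_minutes_per_partition / my_minutes_per_partition

print(f"My run: {my_minutes_per_partition:.2f} minutes per partition")
print(f"Hour-per-partition estimate for 16 partitions: {boss_total_hours:.0f} hours")
print(f"Approximate ratio: {speedup:.1f}x")
```

That works out to a little under nine minutes per partition, against roughly sixteen hours for the same 16 tables at an hour apiece.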
As a measure of how much data this is, this is the rowcount of the largest table after the first 16 partitions were done. Remember, this rowcount is about 12% of the data.
I have another set of partitions to process today, which will a bit more than double this total.
Yes, if you take the time to do the math, you will see that the base data is more than 1 billion rows.