As of this writing, I am working in an environment in which things are much degraded. The company grew very fast, grew too big, didn’t have processes in place but instead threw money at problems in the form of hiring more developers to write more undocumented code…
Sound familiar?
Then the shit hit the fan and growth flattened. Typically when that happens the inefficiencies catch up with you, and they did in this case. There was a massive series of layoffs and the company now employs only about 20% of the headcount it did 4 years ago.
Sound familiar?
If so, you know the sort of mess that resulted. For example, as part of a cleanup, I found a process that dutifully loaded data into a table every day. It had been doing so, without error, for 10 years. That data had never been read. Never been cleaned out. Nobody used it. For 10 years.
There were, and are, some brutally inefficient queries running. Really serious beat-the-shit-out-of-the-server stuff, including one that averaged 1.3 BILLION page reads per execution.
Over time, we have improved things, chopping the heads off the hydra one at a time.
I came to this one last week. After kicking off, it runs for an average of 28 minutes on a server with 384 Gb of RAM and 4 – 8 core processors. It doesn’t read all that many pages (considering) – only 60 million pages, which, believe it or not, isn’t all that bad compared to others that are running.
Digging down into the code, I found this:
WHERE Warehouse.dbo.UTCTimeConvertToPT(f.CreatedUTCDtm) >= @BeginDate
AND Warehouse.dbo.UTCTimeConvertToPT(f.CreatedUTCDtm) < @EndDate
So what a developer had done was to use a UDF to convert a UTC Datetime to local so that this time could be compared to the calculated start and end dates.
This is bad in two ways – first, when you wrap a column reference in a function, the engine cannot know what the result of that function will be until run time. This means that it cannot make use of an index, should one exist. In this case it didn’t, and the query was performing a table scan (because there is no clustered index on this table, which is a separate issue).
But that’s not the worst part of this. The query must run the function for each row it encounters, then compare that result to the two variables.
There are 45 million rows in this table. That means that in order to find all the rows that qualify, the engine must scan the entire table and apply the date conversion function 45 million times. This, somewhat understandably, takes time, and that is why the stored procedure that this is part of was taking 25 minutes to execute every day.
I hope that the solution is obvious, although none of the developers here seemed to have stumbled on it. Calculate @BeginDate and @EndDates in UTC and then use those dates to compare to the native column:
WHERE f.CreatedUTCDtm >= @BeginDate
AND f.CreatedUTCDtm < @EndDate
Et voila. The engine can use an index because it has stats on what sort of values the table contains, and it knows what values it will be looking for. And it doesn’t have to call a function 45 million times.
After this simple change, and the addition of an index on CreatedUTCDtm, execution time time on the procedure in question dropped from 25 minutes to a minute and a half – still too long (there are other inefficiencies to kill off) but a net saving of more than 23 minutes. Which I like.
If You Are Looking For The Other Blog
I tried to upgrade to Wordpress 3.4.1, and it is hosed.
I can’t even roll back. Yay open source software!