If You Are Looking For The Other Blog

I tried to upgrade to Wordpress 3.4.1, and it is hosed.

I can’t even roll back. Yay open source software!

Posted in Site Notes | Leave a comment

The Perils of User Defined Functions

As of this writing, I am working in an environment in which things are much degraded. The company grew very fast, grew too big, didn’t have processes in place but instead threw money at problems in the form of hiring more developers to write more undocumented code…

Sound familiar?

Then the shit hit the fan and growth flattened. Typically when that happens the inefficiencies catch up with you, and they did in this case. There was a massive series of layoffs and the company now employs only about 20% of the headcount it did 4 years ago.

Sound familiar?

If so, you know the sort of mess that resulted. For example, as part of a cleanup, I found a process that dutifully loaded data into a table every day. It had been doing so, without error, for 10 years. That data had never been read. Never been cleaned out. Nobody used it. For 10 years.

There were, and are, some brutally inefficient queries running. Really serious beat-the-shit-out-of-the-server stuff, including one that averaged 1.3 BILLION page reads per execution.

Over time, we have improved things, chopping the heads off the hydra one at a time.

I came to this one last week. After kicking off, it runs for an average of 28 minutes on a server with 384 Gb of RAM and 4 – 8 core processors. It doesn’t read all that many pages (considering) – only 60 million pages, which, believe it or not, isn’t all that bad compared to others that are running.

Digging down into the code, I found this:

FROM   dbo.factObscuredTbl f
WHERE  Warehouse.dbo.UTCTimeConvertToPT(f.CreatedUTCDtm) >= @BeginDate
AND    Warehouse.dbo.UTCTimeConvertToPT(f.CreatedUTCDtm) <  @EndDate
 

So what a developer had done was to use a UDF to convert a UTC Datetime to local so that this time could be compared to the calculated start and end dates.

This is bad in two ways – first, when you wrap a column reference in a function, the engine cannot know what the result of that function will be until run time. This means that it cannot make use of an index, should one exist. In this case it didn’t, and the query was performing a table scan (because there is no clustered index on this table, which is a separate issue).

But that’s not the worst part of this. The query must run the function for each row it encounters, then compare that result to the two variables.

There are 45 million rows in this table. That means that in order to find all the rows that qualify, the engine must scan the entire table and apply the date conversion function 45 million times. This, somewhat understandably, takes time, and that is why the stored procedure that this is part of was taking 25 minutes to execute every day.

I hope that the solution is obvious, although none of the developers here seemed to have stumbled on it. Calculate @BeginDate and @EndDates in UTC and then use those dates to compare to the native column:

FROM   dbo.factObscuredTbl f
WHERE  f.CreatedUTCDtm >= @BeginDate
AND      f.CreatedUTCDtm <  @EndDate
 

Et voila. The engine can use an index because it has stats on what sort of values the table contains, and it knows what values it will be looking for. And it doesn’t have to call a function 45 million times.

After this simple change, and the addition of an index on CreatedUTCDtm, execution time time on the procedure in question dropped from 25 minutes to a minute and a half – still too long (there are other inefficiencies to kill off) but a net saving of more than 23 minutes. Which I like.

Posted in SQL | Leave a comment

Overheard

Across the cubicle wall.

I’ve seen some databases that were so normalized… you couldn’t work with them. You know what was a good database? One the users could work with? Paradox. It had this great Query By Example. I wish SQL Server had that.

If you can’t work with a database because it is too normalized, perhaps you need to work in a different field.

Posted in IT Follies | 1 Comment

This Might Be Useful

In SQL Server 2005 and up, Microsoft provides a set of immensely useful views and functions that let you see how and what the engine is doing.

They are colloquially known as the DMVs, ‘Dynamic Management Views’. I cannot stress how useful they are.

Here is an example, and a very useful query. This uses the DMVs to show you the general state of the indexes in a database. It shows you the table name, the number of indexes on that table, how many of those indexes have been unused since either the last restart or the index was (re)created, and how many indexes SQL Server thought it would like to have on the table. In a well tuned database, the unused and missing counts would both be 0. If they are, you have exactly the right number of indexes for the workloads hitting your tables.

If they aren’t, though, this query will show you where you need to start.

First you’ll need to zero on on those unused indexes. They do nothing but slow down inserts, updates, and deletes, and take space. They are pure drag.

In a future post, I will post a query that will show you all unused indexes on a table and their costs.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT  CONVERT(VARCHAR(30),so.name)    AS TABLE_NAME,
        COALESCE(Unused.IdxCount, 0)    AS IDXCOUNT,
        COALESCE(Unused.UnusedCount, 0) AS UNUSEDCOUNT,
        COALESCE(Missing.MICOUNT, 0)    AS MISSINGCOUNT,
        COALESCE(CONVERT(DECIMAL(6,1), (CONVERT(DECIMAL(10,2),Unused.UnusedCount)/CONVERT(DECIMAL(10,2),Unused.IdxCount)) * 100), 0) AS UnusedPct

FROM    sys.objects so
LEFT JOIN

    (   SELECT      s.OBJECT_ID,
                    COUNT(*)        AS idxcount,
                    SUM(CASE WHEN   s.user_seeks    = 0
                                AND s.user_scans    = 0
                                AND s.user_lookups  = 0
                            THEN    1
                            ELSE 0 END) AS UnusedCount

            FROM    sys.dm_db_index_usage_stats s
            JOIN    sys.indexes     i
            ON      s.OBJECT_ID     = i.OBJECT_ID
            AND     s.index_id      = i.index_id
            WHERE   s.database_id   = DB_ID()
            AND     OBJECTPROPERTY(s.OBJECT_ID, ‘IsMsShipped’) = 0
            GROUP BY    s.OBJECT_ID
        ) Unused
ON      Unused.OBJECT_ID = so.OBJECT_ID
LEFT JOIN ( SELECT  d.OBJECT_ID,
                    COUNT(*) AS MICOUNT
            FROM    sys.dm_db_missing_index_groups  g
            JOIN    sys.dm_db_missing_index_group_stats s
            ON      s.group_handle                  = g.index_group_handle
            JOIN    sys.dm_db_missing_index_details d
            ON      d.index_handle                  = g.index_handle
            WHERE   d.database_id = DB_ID()
            GROUP BY d.OBJECT_ID
            )Missing
ON      Missing.OBJECT_ID = so.OBJECT_ID
WHERE   so.type_desc = ‘USER_TABLE’
AND     (Missing.MICOUNT > 0
        OR Unused.UnusedCount > 0)
ORDER BY UnusedPct DESC 

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

Posted in Indexing | 2 Comments

NOLOCK

If you’ve been around SQL Server and SQL Server development for a while, you will undoubtedly have encountered the NOLOCK hint.

SQL Server allows you to modify how the optimizer processes a query in a wide variety of ways. One of the things you can change is locking behavior – you can modify behavior in an individual query by applying a lock hint, like this:

SELECT *
FROM dbo.Table WITH (NOLOCK)

This is extremely useful in some cases, but as with all things, it is open to misuse, and the NOLOCK hint is widely, widely misused.

Some background is in order. In an RDBMS, we have to balance all things (hence the title of this blog), and one of the things we have to balance is the line between consistency and concurrency. We could queue all database requests, forcing them to proceed one after another, which would mean that our database would never wind up in an inconsistent state. But this would cause insurmountable concurrency issues, reducing our multi-thousand dollar database server to a single-user machine. At the other end of the scale, we could allow anybody to do anything at any time. This would obviously increase concurrency to its greatest possible value, but would result in chaos. You wouldn’t have a database: you’d have a mess.

To address this, the ANSI committee came up with 4 isolation levels. These are, in increasing order of consistency: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. These are implemented by use of locks. A process updating a row locks it while the update occurs, a process inserting new data inserts it with locks on it and only releases those locks once the entire insert is complete, etc. Otherwise, you never know what you’re getting – the new data or the old data, just some of the data or all of it.

But locks introduce another issue, which is known as blocking. A process that wants to read all last nights orders must wait until the process updating those orders has finished doing so. The reading process is said to be blocking on the updating one, the updater is blocking the reader.

This sounds bad. But blocking is a normal and expected part of any relational database that sees any degree of multiuser activity. As long as blocks are of short duration, the database will handle them transparently, and nobody will be any the wiser.

The problem in SQL Server 2000 (and in DB2) is that databases that are abused and that must serve both long-running read operations in addition to everything an OLTP system has to do (insert, updated, delete) will start to suffer extensive blocking. A long read will block updates and deletes, and depending on other circumstances, can block inserts as well. Updates block readers, which block other updates, inserts, deletes… this is all part of how a proper relational database is designed.

Locks and blocks prevent inconsistency, but most procedural programmers don’t understand database consistency anyway, and so to them a lock is a block is bad. Anything that gets around that is, to those lacking understanding, a good thing.

In both SQL Server and DB2 (and some other products with which I have not worked), a common problem is readers blocking writers. When you have a database that has been poorly designed used for both OLTP and reporting, you tend to get heavy reporting queries holding up OLTP activity: the INSERT, UPDATE, DELETE statements sit and wait until the long-running read finishes with the data. The uninformed programmer, seeing the number of processes blocked, hunts on the web and finds the magical NOLOCK query hint.

What NOLOCK does is move a particular query to the READ UNCOMMITTED isolation level. The query does not set any shared locks1, and it doesn’t honour any exclusive locks held by other processes. On a system suffering from poor design and resulting long-running reporting queries holding up normal business activity, this seems like a godsend: just throw NOLOCK on all queries, and the blocking problem magically goes away.

Or does it? What NOLOCK does is move the problem, from somewhere obvious to somewhere obscure. It doesn’t make much difference to data modification processes – those operations have their isolation levels escalated automatically anyway, because UPDATEs that don’t honour locks that other UPDATEs set makes the whole idea of concurrency a complete mockery. Where it makes a difference is in the read operation: because the read doesn’t honour exclusive locks, it will read another transaction’s uncommitted data.

That should scare you.

It will read data that could be rolled back. Allowing such reads violates both the C and the I of ACID (Atomic, Consistent, Isolated, Durable) – data is not consistent during such a read, and an uncommitted transaction is not isolated.

I have encountered a number of problems with NOLOCK (and with its DB2 counterpart, WITH UR) over the years. In SQL Server, it can result in an existing row either being missed or (more commonly) read twice if there is a data movement routine going on – say a page split, or a DBCC REINDEX – this will often manifest as a primary key violation in a routine that, when you look at it, couldn’t possibly be inserting values that already exist.

But if you don’t get immediate, obvious, in-your-face errors like primary key violations, the data rot that NOLOCK can induce is subtle and nearly impossible to track down. Say you have a summary routine that runs with NOLOCK. Say it reads a row of financial data that is later rolled back and adds the value there to the summary for the day, or hour, or whatever. That aggregate value is now incorrect, and anything derived from it is incorrect. And even a guy like me, who has a lot of experience and skill tracking down things like this, I will never be able to prove that this was caused by reading uncommitted data. The best I will be able to do is to say, well, on this day the summary was out.

NOLOCK is dangerous. The only people who should use it are people who understand the implications, and it should almost never be used in any sort of saved routine (a stored procedure, for example).


  1. This is generally true, but not strictly true: any read, for example, running at READ UNCOMMITTED will still set a schema stability lock – you won’t be able to ALTER the table while the read operation is running.
Posted in ANSI Standard, T-SQL | Leave a comment

Note To Self

If anyone asks why you insist on using surrogate keys, point them to this.

Posted in IT Follies | Leave a comment

I Wonder

I wonder if it is possible to solve Einstein’s Riddle using SQL.

The riddle is attributed to either Albert Einstein or Lewis Carroll, although there is no evidence that either of them had anything to do with the riddle, and it has various forms although the basic riddle remains the same.

The Riddle

In a town, there are five houses, each painted with a different color.
In every house leaves a person of different nationality.
Each homeowner drink a different beverage, smokes a different brand of cigar, and owns a different type of pet.

The Question

Who owns the fishes?

Hints

The Brit lives in a red house.
The Swede keeps dogs as pets.
The Dane drinks tea.
The Green house is next to, and on the left of the White house.
The owner of the Green house drinks coffee.
The person who smokes Pall Mall rears birds.
The owner of the Yellow house smokes Dunhill.
The man living in the center house drinks milk.
The Norwegian lives in the first house.
The man who smokes Blends lives next to the one who keeps cats.
The man who keeps horses lives next to the man who smokes Dunhill.
The man who smokes Blue Master drinks beer.
The German smokes Prince.
The Norwegian lives next to the blue house.
The man who smokes Blends has a neighbor who drinks water.

This is a set-relational problem, and it might be possible to solve it with SQL. Challenging, for sure, but solvable. Most programmatic solutions seem to use recursion or long iterative statements to solve this, and I think it can probably be done in a single SQL statement.

Posted in SQL | 2 Comments

Great Advice

Paul S. Randal knows what he is talking about. And he has written a list of the top 10 things that DBAs should do.

There is sound advice there.

Posted in Administration | Leave a comment

1 Billion

My boss was stomping around agitated because The Business has some heavy data requests. He had written something some time ago that got them something close to what HE thought THEY thought they needed.

I knew that this wasn’t what they really needed, but let him go ahead and send a sample of the data off anyway. Most times when people senior to you have their minds made up, there’s no point in arguing.

“It’s pretty fast,” he said, “if you need the code. It takes about an hour per partition.”

On our system, some of the really big tables are pseudo-partitions – separate tables with identical structures and similar names. Orders1, Orders2, Orders3, etc. He meant it took about an hour to process each one of these tables.

What I really needed was a subset of that data. About 12%, as it turns out. So I politely acknowledged his offer of Free Code! and wrote something that processed the first 16 partitions. It completed in about 2 hours 20 minutes.

As a measure of how much data this is, this is the rowcount of the largest table after the first 16 partitions were done. Remember, this rowcount is about 12% of the data.

———–
109734329

I have another set of partitions to process today, which will a bit more than double this total.

Yes, if take the time to do the math you will see that the base data is more than 1 billion rows.

Posted in Performance, T-SQL | Leave a comment

Procedure Query Plans in Cache

If you remember, SQL Server takes your stored procedure and rewrites it, then prepares a plan. This process is called ‘compilation’. If it can, it then executes the prepared plan rather than interpreting the query anew each time.

These plans are stored in an area called the ‘procedure cache’. Plans are retained in the cache based on a number of factors such as amount of server memory, plan cost, and current memory pressure. The engine will get rid of infrequently used plans before frequently used ones, and cheap plans before expensive ones.

You can, and often will, get multiple plans for some procedures in cache. Predicting how many plans there will be is not an exact science, but in general the more complex the arguments to your procedure, the more plans will be in cache, because the engine will create a different plan if you’re looking for all orders from Chicago than if you’re looking for all orders with total > $1000.

Here is a query that will let you see all plans in cache for a given object (a stored procedure is an object in SQL Server), as well as some useful and interesting information about those plans.

Apologies for the style – the mix of camelcase and under_score names is due to my schizophrenia on this point. I used to be exclusively one, but this place uses the other, and when I write queries that use DMVs I get all split-personality like.

Usecounts = number of times the plan has been used.
Plan_comp_time = the time that this plan was compiled.
Last_execution_time = the time that this plan was last executed.
HoursOld = length of time, in hours, that this plan has been in cache.
HoursUntouched = length of time, in hours, since this plan was last executed.
Execution_count = number of times this plan has been executed.
AvgReads = average number of pages read during an execution.
Max_logical_reads = the highest number of pages read during an execution.
Avg_time = average time, in milliseconds, that this plan took to complete.
Total_elapsed_time = total time – an indicator of total cost, since this is the sum of all times for this plan.
last_elapsed_time = the time for the last execution.
min_elapsed_time = the shortest time for an execution
max_elapsed_time = the longest time for an execution

SELECT  CONVERT(VARCHAR(36),o.name) AS ROUTINE,
        cp.usecounts,
        qs.creation_time AS plan_comp_time,
        qs.last_execution_time,
        CONVERT(VARCHAR(5), DATEDIFF(hh, qs.creation_time, GETDATE())) AS HoursOld,
        CONVERT(VARCHAR(5), DATEDIFF(hh, last_execution_time, GETDATE())) AS HoursUntouched,
        qs.execution_count,
        qs.total_logical_reads/qs.execution_count AS AvgReads,
        qs.max_logical_reads,
        qs.total_elapsed_time/qs.execution_count AS AvgTime,
        qs.total_elapsed_time,
        qs.last_elapsed_time,
        qs.min_elapsed_time,
        qs.max_elapsed_time
FROM    sys.dm_exec_cached_plans    cp
JOIN    sys.dm_exec_query_stats qs
ON      cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
JOIN    sys.objects o
ON      o.OBJECT_ID = st.objectid
WHERE   o.name LIKE ‘%SomeProcedureName%’
– ORDER BY hoursuntouched DESC
 
Posted in Performance, T-SQL | 1 Comment