Monday, March 17, 2008

New Isolation Levels Available in SQL Server 2005

New Isolation Levels Available in SQL Server 2005 By : Brad McGeheeFeb 07, 2007 Printer friendly One of the more mysterious features of SQL Server is isolation levels. Whenever a statement is executed, or a data modification is made, it runs under the influence of an isolation level. The isolation level directly impacts the performance of SQL Server, along with the integrity of the data. So what exactly is this mysterious feature all about? Essentially, an isolation level determines the degree to which a data modification is isolated from another data modification, or a database modification is isolated from a statement (such as a SELECT statement), or how a statement is isolated from another statement. Let's look at some examples of this. First, let's say that User A wants to update row number 25 in a particular table. Let's also say that User B also wants to update row number 25 in the same table at the same time. This presents a problem, as we don't want a situation where different users are trying to modify the same row at the same time. This is an example where we want to isolate one data modification from another data modification in some controlled manner in order to prevent a situation where there is bad data stored in the row. Storing bad data is always a bad thing. Second, let's say that User C wants to update row number 50 in a particular table. Let's also say that User D wants to run a SELECT statement against row number 50 in the same table at the same time. So do you allow the SELECT to run against the row while the modification is taking place, or not? This is an example of where we might or might not want to isolate a data modification from a SQL Server statement in order to prevent the possibility of returning bad data. For example, let's say that User C has started a transaction that has updated the values stored in the columns in row 50, but that the transaction has not yet completed. Now, User D wants to run a SELECT against row 50 in order to return the columns in row 50, and does so, returning the values that were just modified by the on-going transaction. But what if User C's transaction is now rolled back and the original data are reverted to in row 50? This now means that the SELECT statement run by User D has returned bad data, as the data returned is not the actual data currently stored in row 50. In some cases, returning bad data like this is a very bad thing. But in some cases, returning some occasional bad data like this is not much of problem (or is so unlikely that the odds of it happening are slim). In other words, isolation levels allow us to control exactly the way SQL Server statements and transactions can work so as to minimize or eliminate bad data. In some cases, a little bad data is OK, but in other cases, there can never be any bad data. More specifically, isolation levels are used to control the following: If any locks are taken when data is read, and what kind of locks are being requested. How long read locks are kept. And last, they determine if a read operation acting on a row that is experiencing a data modification, (1) blocks the data modification until the row is unlocked, (2) retrieves the committed version of the row before the start of the data modification, or (3) reads the uncommitted data modified by the data modification. Traditionally, SQL Server has supported four isolation levels: Read Uncommitted: This is the lowest isolation level. It only isolates transactions and activities to ensure that physically corrupt data is never read. It allows dirty reads, nonrepeatable reads, and phantom reads. Read Committed: This isolation level does not permit dirty reads, but does allow nonrepeatable reads and phantom reads. This is the default isolation level for SQL Server, and is used for each connection to SQL Server unless one of the other isolation levels has manually been set for a connection. Repeatable Read: This isolation level does not permit dirty reads or nonrepeatable reads, but does allow phantom reads. Serializable Read: This is the highest isolation level and ensures that all transactions and statements are completely isolated from each other. It does not allow dirty reads, nonrepeatable reads, or phantom reads. When we look at the above, there is a relationship between data integrity and concurrency. For example, Read Uncommitted provides the best concurrency, but the least data integrity. On the other extreme, Serializable Read provides the worst concurrency, but the highest level of data integrity. As you know, the higher the concurrency, the better SQL Server performs, and the lower the concurrency, the worse SQL Server performs. As a DBA, it is your job to balance the needs between data integrity and concurrency by selecting an appropriate isolation level. Of course, in the real world, most DBAs or developers don't bother setting a specific isolation level for a connection, so the default isolation level of Read Committed is most commonly used. In SQL Server 2005, two new isolation levels are introduced, both of which use row versioning. They include: READ_COMMITTED_SNAPSHOT (statement level) ALLOW_SNAPSHOT_ISOLATION (transaction level) The purpose of these new isolation levels is to give DBAs or developers more granularity in their choice of isolation levels, with the goal of improving read concurrency. When read concurrency is increased, there are fewer locks to be managed by SQL Server. This results in less locking resources required by SQL Server, along with less blocking. This in turn boosts SQL Server's performance. Before we talk more about these two new isolation levels, we first must introduce the concept of row versioning, which is an integral part of how these two isolation levels are implemented. Row versioning is not used by the four older isolation options. The concept of row versioning is not new to SQL Server, as SQL Server has been using it for years with triggers. For example, when a DELETE trigger is executed for a row, a copy of that row is stored in the "deleted table" just in case the trigger is rolled back and the deleted row needs to be "undeleted." In a sense, the row is versioned, and if need be, can be reused. Row versioning for isolation levels is very similar, though not identical to row versioning for triggers. When a row versioning-based isolation level (which includes the two new ones we are now discussing) is enabled at the database level, the database engine maintains versions of each row that is modified (for an entire database). Whenever a transaction modifies any row, an image of the row before the modification is copied into a page of what is called the version store. The version store is located in the tempdb database and is used for temporary storage of versioned rows for all of the databases on a single SQL Server instance. When a transaction or SQL Server statement specifies the use of a row versioning isolation level, it views the data as it existed at the start of the statement or transaction, instead of protecting all reads with locks. Using row versioning reduces the chance that a read operation will block other transactions, boosting performance. Row versioning also prevents users from reading uncommitted data and prevents multiple users from attempting to change the same data at the same time. Now that you know a little bit about row versioning, let's look at each of the new isolation levels, one at a time. READ_COMMITTED_SNAPSHOT is actually a variation of the default READ_COMMITTED isolation level. It uses row versioning, instead of locking, to provide read consistency at the SQL Server statement level. When a statement runs that specifies the READ_COMMITTED isolation level (the default isolation level), and the READ_COMMITTED_SNAPSHOT option is turned on at the database level, all statements see a snapshot of the data as it existed at the start of any current transaction. It uses the row-versioned snapshot of the row to return data, and no locking is needed by the statement, which is normally the case. The biggest benefit of this isolation level is that reads do not block writes and writes do not block reads. Writes can still block writes, but this is necessary to prevent data corruption. Let's look at an example of this. First, we assume that READ_COMMITTED_SNAPSHOT is turned on for the database. I'll show you later how to do this. Second, let's assume that User A has started a transaction that will modify row 25 of a table. When the transaction starts, the database engine takes a snapshot of row 25 and stores it in the data store. Third, let's assume that User A's transaction is still running and that User B runs a SELECT statement to return the columns of row 25. Assuming that User B's SELECT statement is running under the READ_COMMITTED isolation level (the default), then the SELECT statement will return the snapshot of row 25 that was taken before User A's transaction began. The SELECT statement does not need to place any locks on the row, nor does it have to wait until User A's transaction is complete. READ_COMMITTED_SNAPSHOT works well with all databases because when turned on at the database level, any connection that is made to SQL Server that does not specify a specific isolation level uses the default READ_COMITTED isolation level, and will automatically take advantage of row versioning.
ALLOW_SNAPSHOT_ISOLATION is similar to READ_COMMITTED_SNAPSHOT, but it is based at the transaction level, not the statement level. When the ALLOW_SNAPSHOT_ISOLATION is turned on at the database level and the TRANSACTION ISOLATION LEVEL SNAPSHOT isolation level is turned on for the transaction (using the SET command), all statements see a snapshot of the data as it existed at the start of the transaction. The biggest benefit of this isolation level is that it gives you a transactionally consistent view of your data. Any data read will be the most recent committed version of the data. To take advantage of ALLOW_SNAPSHOT_ISOLATION, you not only have to turn it on at the database level, but also must modify your current code so that the TRANSACTION ISOLATION LEVEL SNAPSHOT isolation level is set for it. Since this is a new feature of SQL Server 2005, you have to either modify your old code, or wait until you write new code to take advantage of this feature. Unlike the four older isolation levels, the two new isolation levels are implemented differently, which can be a little confusing at first. Keep the following in mind: To use either of these two new isolation levels, they have to be turned on at the database level. This means that all transactions in this database will implement row versioning. You can turn on either READ_COMMITTED_SNAPSHOT or ALLOW_SNAPSHOT_ISOLATION at the database level, or you can turn on both READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION at the database level. If READ_COMMITTED_SNAPSHOT is turned on, then any statement or transaction that is SET to (or defaults to) READ_COMMITTED will use row versioning. If you SET a different isolation level, then that setting overrides the default behavior. If you want to take advantage of ALLOW_SNAPSHOT_ISOLATION, then this setting must be turned on at the database level, AND you must use the SET command to specify TRANSACTION ISOLATION LEVEL SNAPSHOT for the connection. To turn these options on and off at the database level, you use the ALTER DATABASE command. Below are some examples: ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT ON ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT OFF ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION OFF While both of these new isolation levels have pros and cons, READ_COMMITTED_SNAPSHOT is generally recommended over ALLOW_SNAPSHOT_ISOLATION for the following reasons: It uses less tempdb space. It works with distributed transactions. It works with most pre-existing applications. It is not vulnerable to update conflicts. While the main advantage of these new isolation levels is better overall performance, there can be some downsides to using them. They include: Increased use of the tempdb. You may want to watch its size more closely, and also place it on an isolated disk or array for better performance. Update transactions will be slowed down because of the need to version data rows as updates are made. If your database is very data modification intensive, then these isolation levels may not be appropriate for you. They should not be used if you expect update conflicts in you data (more than one connection will update the same data at the same time). If you decide to use one or both of these two new isolation levels, you can quickly determine the status of the database-wide setting by running this statement: SELECT name, snapshot_isolation_state_desc, is_read_committed_snapshot_on FROM sys.databases This produces results like the following:
In addition, there are a series of DMVs (dynamic management views) that you can watch to monitor the effect of using these isolation levels. They include:
sys.dm_db_file_space_usage
sys.dm_db_session_space_usage
sys.dm_db_task_space_usage
sys.dm_tran_top_version_generators
sys.dm_tran_version_store
sys.dm_tran_active_snapshot_database_transactions
sys.dm_tran_transactions_snapshot
sys.dm_tran_current_transaction
sys.dm_tran_current_snapshot
In addition to the above DMVs, there are a number of performance counters you can also use to monitor performance related to these new isolation levels. They include:
Free Space in tempdb (kb)
Version Store Size (kb)
Version Generation Rate (kb/s)
Version Cleanup Rate (KB/s)
Version Store Unit Creation
Version Store Unit Truncation
Update Conflict Ratio
Longest Transaction Running time
Transactions
Snapshot Transactions
Update Snapshot Transactions
NonSnapshot Version Transactions
As you can see, there is a lot to learn about these new isolation levels. You can find even more information in Books Online. If you feel you might be able to take advantage of one or both of these options, be sure you thoroughly test them in a test database before rolling them out to production. Like most performance enhancement tips, this one may help performance, or it may hurt performance. It is your job as the DBA to sort this out for your particular database applications.






No comments: