Friday, May 16, 2014

How to search for a string in your SQL Server database code


*****************************************************************
EDIT May 27, 2014: Forget this!  Really, just forget it and use the Red Gate SQL Search tool.  I just downloaded it and have been using it all morning. It's free, functional, searches EVERYTHING at the same time (job steps, tables, columns, etc), and has a very usable interface. This will replace ALL of my queries against system objects.

One tip - if you need to search different servers, you may find it difficult to switch database servers within the Red Gate SQL Search tool. The drop down doesn't appear to work. I've found that the only way to do that is to close the tool, go to the server you want in Object Explorer, and then open the tool again.  You will now be searching the correct server.
*****************************************************************

As anyone who's been a developer for any length of time knows, what you're doing most frequently is NOT working with a fresh slate on a brand new project. No, it's less fun but much more common that you'll be working on enhancing, fixing, modifying, or otherwise updating code that's already out there. And this is another thing you figure out very soon in your career - documentation for the average database product that's internal to a business is almost always either non-existent, or inaccurate.

So, a very important skill for the developer is the ability to research, and figure out what's out there already.

I like to use the following query to search for code. It will return all the code (including views, stored procedures, functions, etc) that contain the string (table name, view name, stored procedure name, etc) you're searching for. There are add-ons that have similar functionality, but I'm not always able to install an add-on.

A huge bonus with this query is the XML field that contains the definition. When you click on it, you can see the actual text of the definition in a new window - no sp_helptext needed!

 SELECT  
   Type =  
     case  
       when OBJECTPROPERTY(object_id, 'IsProcedure') = 1 then 'Procedure'  
       when OBJECTPROPERTY(object_id, 'IsView') = 1 then 'View'  
       when OBJECTPROPERTY(object_id, 'IsTableFunction') = 1 then 'TableFunction'  
       when OBJECTPROPERTY(object_id, 'IsScalarFunction') = 1 then 'ScalarFunction'  
       else 'other'  
     end  
   ,Name = OBJECT_NAME(object_id)   
   ,Definition = CAST((SELECT definition AS [processing-instruction(x)] FOR XML PATH('')) AS XML)   
 FROM sys.sql_modules  
 WHERE  
   Object_definition(object_id) LIKE '%hier%'  
 order by 2, 1 
This is the output you'll get. Try clicking on the definition field to open it up in a new window.










Monday, April 14, 2014

Easily parse output of Set Statistics IO in SQL Server


****************************************************************
EDIT June 19, 2014 - I saw a new tool online that looks easier than using my code. Check it out - http://www.statisticsparser.com/
****************************************************************


Recently I was performance tuning a very involved, gigantic stored procedure. Figuring out exactly where the performance problems were was a challenge, so I turned on Statistics IO.  I'm not going to get into the details of Statistics IO here, you can easily find that online.  The bottom line is that I wanted to figure out how to easily get the most important metric, Logical Reads.  But scanning the enormous volume of text generated was tedious, and I looked online to see what options I had to make the parsing easier and faster.

Vicky Harp has an interesting article on using Excel to parse the output of Statistics IO.  I tried it out, and it works, but I preferred to stick with SQL Server tools if possible, instead of needing to copy and paste formulas into Excel.

So I wrote the stored procedure below to parse the output.  After doing some data manipulations, it uses XML to easily parse the desired values out of the Statistics IO output, and returns a result set with the table name and metrics for each line of output in Set Statistics IO.

Note!!  The output of Set Statistics IO includes single quotes around the table names.  For instance:

 Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table 'sysobjrdb'. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table 'sysschobjs'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table 'spt_values'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  

Since you can't use any text with single quotes in it as a parameter to a stored procedure, you'll need to replace the single quotes with an empty string. I'm skipping the details of how to do this - if you know enough about SQL Server to be worrying about Logical Reads, you don't need step-by-step instructions.

Once you've replaced the single quotes, use that text as a parameter for the stored procedure below, sp_StatisticsIO_OutputParse.  For example:

 sp_StatisticsIO_OutputParse  
 '  
 Table Worktable. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table sysobjrdb. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table sysschobjs. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table spt_values. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table Worktable. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 Table sysscalartypes. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
 '  

I hope this helps in your troubleshooting endeavors! And if you can figure out a way to avoid the replacing of the single quotes, please let me know.

 Create procedure dbo.sp_StatisticsIO_OutputParse @Output varchar(max)   
 as   
 /*  
 This stored procedure is designed to parse out the output you get when you turn on   
 Set Statistics IO (under Tools, Options, SQL Server Execution, Advanced), or just run Set Statistics IO in a query window).   
 ! IMPORTANT - I didn't figure out how to deal with the single quotes around the table name in the output of Statistics IO.   
 So, in order to run this properly, you need to replace all single quotes with an empty string.  
 Copy the output without the single quotes, then use the output as the parameter (surrounding the whole output with single quotes) for   
 this procedure, sp_StatisticsIO_OutputParse.  
 */  
 Select @Output = replace (@Output, ' reads', '_reads')  
 Select @Output = replace (@Output, 'Table ', '<Detail Table="')  
 Select @Output = replace (@Output, 'reads ', 'reads="')  
 Select @Output = replace (@Output, '. Scan count', '" Scancount="')  
 Select @Output = replace (@Output, '.', '" />')  
 Select @Output = replace (@Output, ',', '"')  
 Select @Output = replace (@Output, 'lob ', 'lob_')  
 Select @Output = replace (@Output, '-', '_')  
 Declare @OutputXML xml = '<Output> ' + @Output + ' </Output>'  
 SELECT   
   A.value('@Table[1]', 'VARCHAR(20)') as table_name  
   ,A.value('@Scancount[1]', 'int') as scan_count  
   ,A.value('@logical_reads[1]', 'int') as logical_reads  
   ,A.value('@physical_reads[1]', 'int') as physical_reads      
   ,A.value('@read_ahead_reads[1]', 'int') as read_ahead_reads  
   ,A.value('@lob_logical_reads[1]', 'int') as lob_logical_reads    
 into #tmpOutput    
 FROM @OutputXML.nodes('Output/Detail') AS A(A)  
 Select * from #tmpOutput  

Thursday, March 27, 2014

Domain data: three tips for more effective management


I ran into a situation recently that required lots of troubleshooting, inconvenience and painstaking research. I was merging and normalizing some data, and using a particular data set (let's call it the Employee table) that should have been standardized.  In other words, there should have been one set of data that everyone at the company used (ignoring for now issues like sensitive fields, such as Salary).

Instead, different groups in the company were using their own version of the Employee table - with different primary keys (when there were primary keys), categorized differently, some with the latest data, some with data up till the end of last year. Overall, the differences between the data sets weren't huge - probably 15% or less - but when you're creating a dataset that ultimately needs to be able be joined with a dataset from another group, you want to be able to match up without too much pain.  Otherwise the numbers won't match, nobody will trust your data, and reporting and cross-division communication can grind to a halt.

So, what can we do about this?   Here are some potential fixes to try:

  • Be very clear with upper management about the costs of multiple data sources for domain data that should only have one master.  It's tricky because the costs can be hidden - it just takes everyone much longer to do cross-departmental work, everyone needs to research discrepancies in datasets that should match, but don't.
  • Have a company wide "Data" Q & A forum or email alias (a forum is preferable, because it's more easily searchable) where people can post data related question. Transparency is always good, and the more people know about the various domain data sources, the better decisions they can make. 
  • Get the group responsible for domain data management to be as responsive as possible to user requests. Usually the reason why people create their own datasets is because they can't get the original manager of the data to respond to their needs.

The underlying problem, which I believe is related to the lack of choice and competition internally within companies, is more difficult one.  I'm hoping to write up another post on it soon







Thursday, June 20, 2013

Four Tips on Troubleshooting



I just had a few wearying hours of troubleshooting a Sybase linked server issue on SQL Server 2005.   I  used most of the troubleshooting techniques that I know, so I figure it's a pretty good illustration of troubleshooting in general.

Basically, I needed to import some data from Sybase into SQL Server.  Previously, there was a separate extract done from Sybase to text files, but there were issues with that, so I wanted to create a linked server to the Sybase server, and use that instead.

Installing the Sybase drivers was easy.  Creating the ODBC data source went well, I was able to test that the connection worked in the ODBC data source administrator tool.

Setting up the linked server in SQL Server - that's where the problems started.  I had numerous issues, and finally found an article with some basic information on Sybase linked server setup in SQL Server, which helped somewhat.   But I had all kinds of problems that weren't addressed in the article.  The biggest hurdle was the error "The OLE DB provider "MSDASQL" has not been registered".  It was tough finding the right way to register the dll, I went down many dead ends.  Finally I found the correct tool to download - the 64-Bit OLEDB Provider for ODBC (MSDASQL).

Once I'd gotten past that hurdle, I ran into issue with the Sybase error "Could not load code page for requested charset".  I found some information on this fairly quickly here, but the data wasn't complete and I had to do a fair bit of research on which charset was appropriate to use.

Then finally, I figured out the correct way to set up the ODBC DSN to use the appropriate charset and...success.   I was finally able to get data from Sybase into SQL Server via the linked server!

So, what pieces of advice can I glean from this experience?

  • Persistence is your friend.  Keep at the problem, attack it from different angles.  I didn't even write about all the dead ends I came across when working on this, but there were lots.
  • Research is key.  Search for the exact error string on Google, search with more generic terms.  I also found it useful to search company websites (i.e. the Sybase knowledge base), because many times the information on these sites is NOT indexed in Google.  The totally number of distinct Google searches I did related to this issue was about 30 - and that's just the searches, not all the reading I did.
  • Simplify.  Get the basics working before adding complexity. I made sure that before I tried connecting via a linked server, I could connect via the ODBC DSN.  And before connecting via the ODBC DSN, I tested the connection via the Sybase tools.
  • Take a rest from the issue!  My breakthrough moment came when I walked around the office to a window that had a view, and relaxed a bit.  My mind was working on the problem subconsciously and then WHACK - a moment of insight!  

Good luck!





Wednesday, December 5, 2012

How to get thousands of dollars worth of database consulting - free!

Most people who've done some in-depth research online to resolve a technical problem have run into websites like StackOverflow and SQL Server Central.  What many people don't know, though, is that often the people who post answers on sites like these are extremely well-qualified.  You could probably never hire them, but there they are, giving away their knowledge for free.

Why do they do it?  Lots of reasons - social interaction, to build an online presence, to contribute to the  community.

Mainly, though, it's all about the reputation, and getting points when someone marks your answer as correct, or "likes" your comment.

What does this have to do with you?  Simple.  If you have a technical problem that you've researched online and can't solve, go ahead and post a question on one of these sites.  StackOverflow is my personal favorite.

Posting a good question, though, is a huge component of getting a good answer.  Be sure to:

  • Include a script to reproduce the specific problem that you are encountering.  This includes all the data setup (tables, insert statements, etc)
  • Carefully exclude everything that is not critical to explain your problem.  I've seen lots of questions posted that include many extra columns, tables, etc, which didn't contribute to a reader's understanding of the problem.  Including nonessentials actively works again the reader's understanding of the problem.  They probably won't bother trying to figure it out.  This step is time-consuming, but critical.  And you may even figure out the problem yourself once you've cut it down to essentials.
  • Include what you've tried so far and what research you've done.

For an example of a question presents a very straightforward picture, take a look at this StackOverflow question that I posted:  Pivot query to return multiple repeating groups.   Notice that I've included code that can be run with no modification, the desired output, and a potential solution that I'd like to improve on.

I got two great answers from this that the posters put a substantial amount of time into.  Regardless of whether the answer works for me, I always put in a comment with some words of appreciation.

Here's a blog post I wrote recently: How to use a column name as an argument in a TOP clause.  I wasn't looking for answers here, but this is a good example of taking out everything extraneous until you have just the core of the example.  For instance, instead of using 2 columns, ID and a Name, I just use the name (for both the Customer table and the Market table), just to make the example easier to follow.

Monday, November 26, 2012

Quick script to grant read only and view definition on all databases



This is a handy little script to use for development databases, when you want all members of a certain domain group to be able to see all data in all databases, and view all definitions.  Feel free to modify as desired!


EXEC sp_MSForEachDb ' USE ? IF NOT EXISTS (SELECT * FROM DBO.SYSUSERS WHERE NAME = ''sea\DevUsers'' ) CREATE USER [sea\DevUsers] FROM LOGIN [sea\DevUsers]
EXEC sp_addrolemember db_datareader, [sea\DevUsers] GRANT VIEW DEFINITION TO [sea\DevUsers] '

Friday, November 9, 2012

Wildcard pattern matching with metadata table

We had a situation recently where test records, created by processes upstream of our data warehouse, were causing issues with our data. Since the test records were created by many different groups, their names didn't follow any pattern, so they were not easy to exclude. We ended up having where clauses like this in many different stored procedures:
Where
    StoreName not like '% test st%'
    and StoreName not like '% test 2%'
Keeping these updated across all the stored procedures was a hassle, so test records were continually getting into our data and causing problems. I decided to create one metadata table for all the test patterns, and then create a view to join to that table. The view would be used to exclude all the test records whenever necessary. First, let's create the sample data:
Create Table TestStoreNameWildcard(WildcardPattern varchar(50))
Insert into TestStoreNameWildcard 
values 
    ('% test st%')
    ,('% test 2%')
    ,('% test 3%')
    ,('%test region%')

Create table Store (StoreID int, StoreName varchar(50))
Insert into Store values 
    (1, 'Corner Store')
    ,(2, 'ABC Store')
    ,(3, 'Region A Test Store')
    ,(4, 'Test Region 1 Store')
    ,(5, 'Region 5 Test 2 Store')
    ,(6, 'Target')
    ,(7, 'Contest store')
    ,(8, 'Ann''s Book Store')
Now create a view that returns only those stores which fit our pattern, joining with "like".
Create view vTestStore as
Select
    StoreID
    ,StoreName
from Store
    join TestStoreNameWildcard    
        on Store.StoreName like TestStoreNameWildcard.WildcardPattern

Now, when you run this select:
Select * from vTestStore    
You'll see only those stores that we've identified as test stores. The view vTestStore can be used to exclude test records wherever needed. If performance is a problem, it could be created as a table or indexed view.
StoreID     StoreName
----------- --------------------------------------------------
3           Region A Test Store
5           Region 5 Test 2 Store
4           Test Region 1 Store