Wednesday, December 5, 2012

How to get thousands of dollars worth of database consulting - free!

Most people who've done some in-depth research online to resolve a technical problem have run into websites like StackOverflow and SQL Server Central.  What many people don't know, though, is that often the people who post answers on sites like these are extremely well-qualified.  You could probably never hire them, but there they are, giving away their knowledge for free.

Why do they do it?  Lots of reasons - social interaction, building an online presence, contributing to the community.

Mainly, though, it's all about the reputation, and getting points when someone marks your answer as correct, or "likes" your comment.

What does this have to do with you?  Simple.  If you have a technical problem that you've researched online and can't solve, go ahead and post a question on one of these sites.  StackOverflow is my personal favorite.

Posting a good question, though, is a huge component of getting a good answer.  Be sure to:

  • Include a script to reproduce the specific problem that you are encountering.  This includes all the data setup (tables, insert statements, etc.); a minimal sketch of the shape follows this list.
  • Carefully exclude everything that is not critical to explaining your problem.  I've seen lots of questions posted that include extra columns, tables, etc., which don't contribute to a reader's understanding of the problem.  Including nonessentials actively works against that understanding, and readers probably won't bother trying to figure it out.  This step is time-consuming, but critical.  You may even solve the problem yourself once you've cut it down to essentials.
  • Include what you've tried so far and what research you've done.
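As a hedged illustration, here's the shape a minimal repro script might take (the table and the problem are made up, just to show the structure):

Create table #Orders (OrderID int, CustomerName varchar(50), Amount money)
Insert into #Orders values
    (1, 'Alice', 10.00)
    ,(2, 'Bob', 20.00)
    ,(3, 'Alice', 5.00)

-- The query that demonstrates the problem:
Select CustomerName, sum(Amount) as Total
from #Orders
group by CustomerName

-- (Describe desired output vs. actual output here)
Drop table #Orders

Anyone reading the question can copy this, run it as-is, and see exactly what you see.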

For an example of a question that presents a very straightforward picture, take a look at this StackOverflow question that I posted:  Pivot query to return multiple repeating groups.  Notice that I've included code that can be run with no modification, the desired output, and a potential solution that I'd like to improve on.

I got two great answers from this that the posters put a substantial amount of time into.  Regardless of whether the answer works for me, I always put in a comment with some words of appreciation.

Here's a blog post I wrote recently: How to use a column name as an argument in a TOP clause.  I wasn't looking for answers here, but it's a good example of cutting out everything extraneous until you have just the core of the example.  For instance, instead of using two columns, an ID and a Name, I use only the name (for both the Customer table and the Market table) to make the example easier to follow.

Monday, November 26, 2012

Quick script to grant read only and view definition on all databases

This is a handy little script to use for development databases, when you want all members of a certain domain group to be able to see all data in all databases, and view all definitions.  Feel free to modify as desired!

EXEC sp_MSForEachDb '
    USE ?;
    IF NOT EXISTS (SELECT * FROM dbo.sysusers WHERE name = ''sea\DevUsers'')
        CREATE USER [sea\DevUsers] FROM LOGIN [sea\DevUsers];
    EXEC sp_addrolemember db_datareader, [sea\DevUsers];
    GRANT VIEW DEFINITION TO [sea\DevUsers];
'
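Note that sp_MSForEachDb is undocumented, so test it before relying on it.  The script also assumes the login sea\DevUsers already exists on the server.  If it doesn't, something like this hypothetical sketch (adjust the name to your own domain group) would create it first:

IF NOT EXISTS (SELECT * FROM sys.server_principals WHERE name = 'sea\DevUsers')
    CREATE LOGIN [sea\DevUsers] FROM WINDOWS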

Friday, November 9, 2012

Wildcard pattern matching with metadata table

We had a situation recently where test records, created by processes upstream of our data warehouse, were causing issues with our data. Since the test records were created by many different groups, their names didn't follow any pattern, so they were not easy to exclude. We ended up having where clauses like this in many different stored procedures:
Where
    StoreName not like '% test st%'
    and StoreName not like '% test 2%'
Keeping these updated across all the stored procedures was a hassle, so test records were continually getting into our data and causing problems. I decided to create one metadata table for all the test patterns, and then create a view to join to that table. The view would be used to exclude all the test records whenever necessary. First, let's create the sample data:
Create Table TestStoreNameWildcard(WildcardPattern varchar(50))
Insert into TestStoreNameWildcard 
values 
    ('% test st%')
    ,('% test 2%')
    ,('% test 3%')
    ,('%test region%')

Create table Store (StoreID int, StoreName varchar(50))
Insert into Store values 
    (1, 'Corner Store')
    ,(2, 'ABC Store')
    ,(3, 'Region A Test Store')
    ,(4, 'Test Region 1 Store')
    ,(5, 'Region 5 Test 2 Store')
    ,(6, 'Target')
    ,(7, 'Contest store')
    ,(8, 'Ann''s Book Store')
Now create a view that returns only those stores which fit our pattern, joining with "like".
Create view vTestStore as
Select
    StoreID
    ,StoreName
from Store
    join TestStoreNameWildcard    
        on Store.StoreName like TestStoreNameWildcard.WildcardPattern

Now, when you run this select:
Select * from vTestStore
You'll see only those stores that we've identified as test stores:

StoreID     StoreName
----------- --------------------------------------------------
3           Region A Test Store
5           Region 5 Test 2 Store
4           Test Region 1 Store

The view vTestStore can be used to exclude test records wherever needed.  If performance is a problem, it could be created as a table or an indexed view.
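To actually exclude the test records in other queries, one simple pattern is a not-exists check against the view.  Here's a minimal sketch, run against our sample Store table:

Select Store.StoreID, Store.StoreName
from Store
where not exists (
    select 1
    from vTestStore
    where vTestStore.StoreID = Store.StoreID)

This returns only the real stores, and when a new test pattern shows up, you only have to insert a row into TestStoreNameWildcard instead of editing every stored procedure.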

Saturday, September 29, 2012

Develop better database code in one-tenth the time

Okay, maybe one-tenth the time is an exaggeration.  But in some cases, it's possible to cut down development time tremendously.

When working on complex ETL processes or huge stored procedures, the sheer size of the data and processes often makes development much slower than it needs to be.  For instance, if making a fix requires running a stored procedure that takes 20 minutes, then every single minor mistake in your code will cost you at least 20 minutes to find and fix.

However, if you're able to cut down your code or data in such a way that it runs in one minute in the development environment, you've just improved the speed of development tremendously, because each iteration of fixes will take much less time.

So, what are some ways to do this?  The first, if possible, is to reduce the size of the data you're working with in your development environment.  Can you truncate most of the data, leaving only what's critical to test your fix?  This can be a little tricky in a database with a lot of foreign keys set up, but once you have a script to trim your data, you can reuse it.  Of course, before actually releasing code tested in an environment with limited data, you'd want to test it in an integration environment with a full data set.
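As a hedged sketch of what a trim script might look like (the Orders and OrderDetail tables here are hypothetical), note that child rows get deleted before parent rows so the foreign keys aren't violated:

-- Dev-only: keep just the 1000 most recent orders
Delete from dbo.OrderDetail
where OrderID not in (
    select top 1000 OrderID from dbo.Orders order by OrderDate desc)

Delete from dbo.Orders
where OrderID not in (
    select top 1000 OrderID from dbo.Orders order by OrderDate desc)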

The next best way is to ruthlessly trim the process until you have the chunk of code that matters, and only work with that.  For instance, recently I was working on a stored procedure that usually takes about five minutes to run.  Five minutes isn't that bad, but I was working on some tricky algorithm changes and I knew I'd need to be iterating a lot before I got it right.  So, instead of putting in my changes and running them (and leaving myself 5 minutes each run to get distracted), I just isolated the subset of the process that really mattered, commented out everything else I could, and wrote the data to a global temporary table that I could then examine for problems.  So instead of having every single problem show up only after 5 minutes, I could find it after just a few seconds.  I ended up getting it done very quickly, because there was no downtime.
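Here's a rough sketch of that pattern, with hypothetical table and column names.  The point is that only the step under test actually runs, and the intermediate results land in a global temp table you can inspect from any query window:

IF OBJECT_ID('tempdb..##ScoredOrders') IS NOT NULL
    DROP TABLE ##ScoredOrders

-- Only the tricky step runs; the rest of the procedure stays commented out
Select
    Orders.OrderID
    ,ScoreUnderTest = Orders.Amount * Customers.DiscountFactor
into ##ScoredOrders
from dbo.Orders Orders
    join dbo.Customers Customers
        on Customers.CustomerID = Orders.CustomerID

-- Inspect the intermediate output, from this or any other session
Select top 100 * from ##ScoredOrders order by ScoreUnderTest desc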

I find it's more fun to work this way, too.  If you constantly have these 5- or 10-minute stretches when you're just waiting for something to happen, it's easy to get bored and distracted, and to forget the details of what you're working on.

Monday, March 19, 2012

How to use a column name as an argument in a TOP clause

I needed to use a column name as an argument in a TOP clause recently in SQL Server.  I had to research and experiment quite a bit to find what I was looking for, so I put together some sample code that should help get you started, if you're trying to do the same thing.

Note that in the last result set, with the TOP clause, the number of records from each market corresponds to the associated value in MarketType.TopCustomersToGet.  I'm using a cross apply here, which worked for me.  Be sure to test performance; some sources indicate that you might get better performance with a CTE (a sketch of that alternative appears at the end of this post).

Declare @MarketType table (MarketTypeID tinyint, MarketImportance varchar(10), TopCustomersToGet int)
insert into @MarketType  values
(1  , 'High', 4)
,(2 , 'Low' , 2)

Declare @Market Table (MarketName varchar(20), MarketTypeID int) 
insert into @Market values
('Paris'    ,1)
,('London'  ,1)
,('Miami'   ,2)
,('Seattle' ,2)

Declare @Customer table (MarketName varchar(20), CustomerName varchar(20), CustomerRank int)
insert into @Customer  values
('Paris','Wayne',1)
,('Paris','Colleen',2)
,('Paris','Manuel',3)
,('Paris','Michelle',4)
,('Paris','Jesse',5)
,('London','Jenny',1)
,('London','Patrick',2)
,('London','Megan',3)
,('London','Alice',4)
,('London','Olga',5)
,('Miami','Brandon',1)
,('Miami','Alfonso',2)
,('Miami','Benjamin',3)
,('Miami','Harry',4)
,('Miami','Stephen',5)
,('Seattle','Willie',1)
,('Seattle','Allen',2)
,('Seattle','Megan',3)
,('Seattle','Danny',4)
,('Seattle','Manuel',5)

Select * from @MarketType
Select * from @Market
Select * from @Customer

Select 
    MarketType.MarketImportance
    ,Market.MarketName
    ,TopCustomers.CustomerName
From @Market Market
    join @MarketType MarketType
        on Market.MarketTypeID = MarketType.MarketTypeID 
Cross apply (
    Select top (MarketType.TopCustomersToGet)
        CustomerName
    from @Customer Customer
    where 
        Customer.MarketName = Market.MarketName
    order by
        Customer.CustomerRank desc
    ) as TopCustomers
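
For comparison, the CTE alternative mentioned above might look something like this, run in the same batch as the table variables declared earlier.  It uses row_number to rank customers within each market and should return the same rows:

;with RankedCustomers as (
    Select
        MarketName
        ,CustomerName
        ,rn = row_number() over (
            partition by MarketName
            order by CustomerRank desc)
    from @Customer
)
Select
    MarketType.MarketImportance
    ,Market.MarketName
    ,RankedCustomers.CustomerName
From @Market Market
    join @MarketType MarketType
        on Market.MarketTypeID = MarketType.MarketTypeID
    join RankedCustomers
        on RankedCustomers.MarketName = Market.MarketName
        and RankedCustomers.rn <= MarketType.TopCustomersToGet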