Tuesday, March 31, 2009

Enhancing the readability of your code: Table aliasing in sql

Aliasing tables can be a topic that database developers feel very strongly about. I'm actually one of the developers who do use aliases, but there's a right way to do it, and there's a wrong way.
First, the wrong way (using the pubs sample database):

select
c.Au_id
,c.au_lname
,c.au_fname
,a.title
,a.title_id
,b.royaltyper
from titles a
join titleauthor b
on a.title_id = b.title_id
join authors c
on c.au_id = b.au_id


Don't laugh. I've actually seen some sql done exactly like this. Reading and understanding it later on takes far longer than it should (now what did b stand for again?).

Most people would agree that the aliasing in the above example is terrible, and most sql isn't written that way. However, I'd say about 90% of sql is written like the example below:

select
au.Au_id
,au.au_lname
,au.au_fname
,ti.title
,ti.title_id
,ta.royaltyper
from titles ti
join titleauthor ta
on ti.title_id = ta.title_id
join authors au
on ta.au_id = au.au_id


This is an improvement over using aliases that have no relation whatsoever to the table name. But in this particular sql statement, there's no reason to alias your table names - it would just save a few keystrokes, at the cost of making it much less readable. Here's what I would do:

select
authors.Au_id
,au_lname
,au_fname
,title
,titles.title_id
,royaltyper
from titles
join titleauthor
on titles.title_id = titleauthor.title_id
join authors
on titleauthor.au_id = authors.au_id


In this case - just a simple select statement - it really doesn't make any sense to use aliases.

It's in more complex sql, with cross-database joins, subqueries, and derived tables, that you really want to use aliases in order to make your code more readable. For instance, the following from clause contains tables from 4 different databases:

from BookingStg.dbo.TSData TSData
join BookingStg.dbo.Booking Booking
on Booking.BookingID = TSData.BookingID
join Domain.dbo.BookingWin BookingWin
on TSData.DateOfStayTimeID = BookingWin.DateOfStayTimeID
left join WarehouseFactStore.dbo.Customer_Dim Customer_Dim
on Customer_Dim.TUID = Booking.TUID
and Customer_Dim.ProductID = TSData.ProductID
left join TServerImp.dbo.TLRPostalCode TLRPostalCode
on TLRPostalCode.TLR = TSData.TLR
and TLRPostalCode.ProductID = TSData.ProductID


When using tables from multiple databases, you alias the fully qualified names (i.e. DatabaseName.SchemaName.TableName) with just the table name, like this:

join BookingStg.dbo.Booking Booking


This cuts down on the length of table references without obscuring them.

Also, in a situation where you're creating a derived table, you'll need to pick a name for it. Make sure it reflects what the table is used for. For instance, here I'm creating a derived table in order to group transactions, so the alias name reflects that:

from #TransactionAdFact
left join
(
Select
TLR
,MinAdID= min(AdID)
from RetailOperations.dbo.OmTransaction OmTransaction
join #BaseData
on #BaseData.OmniProductMapping = OmTransaction.OmProductID
where
TransDateKey between @StartDateKey and @EndDateKey
and AdID is not null
group by
TLR
,OmTransaction.OmProductID
) GroupedTransactions
on GroupedTransactions.TLR = #TransactionAdFact.TLR


Please, don't call the derived table something like Temp1. I see this all the time. It may save you the minute it takes to think of a name that other people can understand, but for the sake of the people who will be reading this code later - take the minute, and make your code more readable.

Choosing the right table aliases in SQL Server can either make your queries far more readable...or you can save yourself some thought and a couple of keystrokes. Your choice.

Thursday, March 26, 2009

Quick SQL - getting the size of all tables in a database

I use this little chunk of SQL code almost every day. It gives you a relatively accurate report of the size of each table in the database, including indexes. Very handy!

select
TableName = convert(varchar(100) ,sysobjects.name)
,TotalRows = max(sysindexes.rows)
-- spt_values.low holds the page size in bytes, so pages * low / 1048576 = megabytes
,MbData = floor(sum(convert(real ,sysindexes.dpages)) * spt_values.low / 1048576)
,MbTotal = floor(sum(convert(real ,sysindexes.used)) * spt_values.low / 1048576 )
from sysobjects
join sysindexes
on sysobjects.id = sysindexes.id
join master.dbo.spt_values spt_values -- the row with number = 1, type = 'E' stores the page size
on spt_values.number = 1
and spt_values.type = 'E'
where
sysobjects.type = 'U' -- user tables only
and indid in (0,1,255) -- heap, clustered index, and text/image pages
group by
sysobjects.name
,spt_values.low
order by 4 desc -- biggest tables (MbTotal) first

Thursday, March 19, 2009

All database outputs need business owners

This is more applicable to the data warehouse world, because in an OLTP world, it's much more obvious what the purpose of each database object is.

However, in the data warehouse/reporting/business intelligence world, reports, fact tables, and data feeds frequently get created, people use them for a while, then stop finding them useful (or leave the company). After that they're never used again, or nobody even knows they exist.

Aside from the problem of having a bunch of unused reports, fact tables or data feeds cluttering up your system, you have the larger problem of people assuming that they ARE still used, and need to be accounted for. As in, "If we modify table ABC, we also need to fix reports DEF and GHI because they depend on them". Well, if it turns out that the reports aren't being actively used anymore, you've taken on a lot of extra wasted work.

One neat idea is to put in a usage tracking tool for all reports and data feeds. So, from whatever UI the reports are run from, have it also log a record to a table called something like ReportUsageLog. Then regularly monitor this table to see who's using what. If there are any reports that aren't being used, figure out why and see if you can potentially decommission them.
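
Here's a rough sketch of what that might look like. The ReportUsageLog name is from above, but the columns, the report name, and the user in the insert are all made up - in reality you'd log whatever your reporting UI can capture:

create table dbo.ReportUsageLog
(
ReportUsageLogID int identity(1,1) primary key
,ReportName varchar(100) not null
,RunBy varchar(100) not null -- login or user name passed in by the UI
,RunDate datetime not null default getdate()
)

-- the UI logs a row like this every time a report is run
insert dbo.ReportUsageLog (ReportName ,RunBy)
values ('MonthlyBookingSummary' ,'jsmith')

-- then, periodically, check what's actually being used
select
ReportName
,TotalRuns = count(*)
,LastRun = max(RunDate)
from dbo.ReportUsageLog
group by ReportName
order by LastRun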

Thursday, March 5, 2009

Having a great development environment

I'm a big fan of checklists, and one of the ones I like is the one by Joel Spolsky (Joel on Software) called The Joel Test: 12 Steps to Better Code.

He goes through a whole set of things that you need to be doing right. The one I want to focus on today is "Can you make a build in one step?", and how it translates to the database world.

What Joel Spolsky mainly focuses on is software products. A build is not a big deal in his world. However, in the world of complex database systems, with all kinds of inputs and outputs from external vendors, things are different. In this world, doing a build, to me, means not just having the database objects set up correctly, but also having data. I'm talking about a large set of data, probably extracted from the production environment - enough to develop and test everything that you might need to.

This is a lot harder than it may seem, and most companies don't do too well at it. The problem is that almost everybody starts out getting database backups from production, and using those in their development environment. This works fine, until the database starts getting too large. Then, obviously, it's not so easy to get backups from production because the files are just so darn huge.

What a lot of companies do is set up the development environment, with all the database objects, and leave the transferring of data up to the individual developer. That's a big mistake. It can take a very long time to specify exactly what data you need, transfer it from a production environment, and then import it into a development environment. Multiply this by every developer, for every project they're working on, and you're talking about a lot of wasted time.

The better approach is to write up some code to get a reasonable subset of data. There are lots of ways to do this, and I may write more about it at some point, but it's absolutely critical to get data into your dev/test database systems. The biggest difficulty is getting management to approve spending time on a project like this.
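
Just to give a flavor of what that code might look like, here's a very rough sketch. It reuses the Booking and TSData tables from the aliasing post above purely as an example; the linked server name, the BookingDate column, and the 30-day cutoff are all invented, and a real version would also have to deal with identity columns, constraints, and many more tables:

-- assumes a linked server (called ProdServer here) pointing at production,
-- and a dev database that already has all the table structures in place
declare @CutoffDate datetime
set @CutoffDate = dateadd(day ,-30 ,getdate())

-- pull only the last 30 days of bookings...
insert BookingStg.dbo.Booking
select Booking.*
from ProdServer.BookingStg.dbo.Booking Booking
where Booking.BookingDate >= @CutoffDate

-- ...and only the detail rows belonging to those bookings,
-- so the relationships still line up in dev
insert BookingStg.dbo.TSData
select TSData.*
from ProdServer.BookingStg.dbo.TSData TSData
join BookingStg.dbo.Booking Booking
on Booking.BookingID = TSData.BookingID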