The Database Bottleneck
Why is my database so flaming slow?
Have you heard this before? I have. I’ve also heard it as a good reason not to use relational databases. There are a number of reasons why a particular database could be slow, but lets take a step back and try to understand what a relational database is actually trying to do. Relational databases are designed to be a mathematically structured, transactionally consistent and optimized way to store and retrieve large amounts of “relational” data. If you are not storing relational data, then don’t use a relational database, it is not optimized for your usage scenario. Problem solved… well, perhaps not.
Let’s assume you are storing relational data, but the database still seems slow. Well, I could go in to detail about database tuning, but I’m not a DBA, and there are people out there who are far better at that than I. Also database tuning tends to get very vendor specific at its deepest levels. Besides, that’s not what I really want to focus on here. Maybe you could look at the performance of your Network, or the disks and other hardware your database server is running on, but again, I’m not network or hardware engineer, I’m a software engineer. So let’s fix it in software… or at least, to the best of our ability. I’m going to assume then that your database is well tuned and maintained by someone who is experienced in doing so, and knows all the different ways in your organization that this database is going to be used. Once we make these assumptions, then we can no longer say “the database is slow” I would go further and say that any modern database is actually very efficient for what it does. Still I have seen developers in this case trying to blame the database for the poor performance of the application.
Most often the database resides on another box entirely to the one you’re application is running on. This is where the problem begins. To actually get data from a database you have to create a database connection, that then opens a network port, which then negotiates a conversation over a protocol, asks the database server for some data. The database engine then does some processing to find your requested data or perform your operation, and finally returns the results as data packets over the network back to your application. Now that mightn’t seem like much, but compared to grabbing that same data from your local memory, or even a file on your local disk, we are talking many orders of magnitude difference. Even if you are using a database engine on your local machine, it will almost certainly be running in a separate process which means that although you don’t have a physical network to navigate, you still have inter-process communications protocols to deal with, and this can still be a few orders of magnitude difference to getting the data directly from memory.
The most important thing about developing against a database is to have respect for the overhead required when querying the database. Based on this respect we can establish a few basic rules about how to interact with a database.
1. Don’t keep asking the database for the same thing over and over again. If it’s something you need regularly cache it. Caching comes with its own problems, the biggest of them being when to invalidate it, but the one thing you can be certain of is that it won’t be slow.
2. Try to ask the database as few times as possible for information during any one operation. Ideally it would be great if you could simply have a single SQL statement that gets everything you need to perform the desired operation, and it is worth investing a few (thousand) CPU cycles trying to figure out exactly what data you are going to need for a particular operation before hitting the database. It is also usually better to return a little more data than you might need, than to find you have to hit the database again because you forgot something. Of course this can be taken too far, so be sensible here, you shouldn’t be returning your entire customer and products tables just to process a single order.
3. Learn SQL, or your vendor specific implementation of it at least. Understand how to form the queries that get you the data you want, and take advantage of the well tuned database (assuming it’s well tuned). If you are using a tool that generates SQL for you, like LINQ2SQL or Entity Framework, understand how it generates SQL. These tools tend to produce sub-optimal SQL because they are trying to solve very generic problems, however, they can be coaxed into doing things more efficiently if you understand how they work.
If you still think the database is slow, I would suggest doing some profiling of your application. SQL Server comes with a profiler that is more than adequate. Other database engines have their own profiling tools. The Redgate ANTS performance profiler I talked about in my previous post has a SQL profiler (warning it doesn’t work with SQL Express), that is integrated with the code performance analyser. Also just a hint if you’re using ANTS performance profiler, you need to look at the “Wall Clock” setting when you are analysing a piece of code that has a lot of SQL interaction as the CPU clock will give you a false impression.
There are many other possible performance issues that can occur with databases, but this is all I really want to talk about at this stage. I have seen many projects in my time that have suffered because the 3 rules above have not been followed.