Thursday, December 20, 2012

No DBA is an Island

Introduction

I gave an OakTalk at the UKOUG2012 conference. This article is based upon the notes for that talk. It is not a technical article, just an opinion piece.
I have always described myself as a DBA, although it has been many years since I had day-to-day responsibility for a production database.  My day job is about performance, helping people to get the best possible performance from their PeopleSoft systems.  That can involve anything from tuning a specific SQL statement, to changing how the database, middleware, or application has been configured, through to standing back and looking at the way the architecture has been put together.  Lots of people have to work together on any system, and I frequently work with people for whom the database is not their home
The concept of the division of labour is nothing new.  Plato suggested the ‘minimum state would consist of four or five men’.  Adam Smith (possibly inspired by Henri-Louis Duhamel du Monceau) famously took the manufacture of pins as his example.  Businesses must break tasks down into manageable units, and as they do each person’s job becomes progressively more specific and specialised. 
IT is no exception.  The modern IT department is broken down into teams of administrators.  We have desktop teams, network teams, application administrators, middleware teams, DBA, operating system admins, storage area network teams.  In some places, I have seen DBA teams broken down into a core database team and separate application DBA teams for each major application.  Every tier and technology has its team, each with specialist knowledge and skills. Such are the tribes of IT.
I have heard these groupings described as verticals, or stove-pipes, or silos.  The clue is in the name.  The problem that I perceive is that these groups are not always good at communicating with each other, and when something goes wrong, they are not always above blaming one another.

What is IT for? 

A modern company simply couldn’t function without IT.  The business systems that we, as IT professionals design, build and manage are fundamental to the companies in which we work.  There will be lots of different systems that do stuff.  Different companies will do slightly different stuff.  Absolute minimum, you need a PC that can connect to things.  There will always be e-mail. There will be a Finance system to track the money.  There will be some form of HR system to track employees.  There will be a payroll system to pay the employees.  If a company sells widgets, there will be some form of stock control, and some form of logistics to manage its movement.  Those systems will communicate with each other. Lots of systems, and it doesn’t really matter what the system is.  The point is that every system will need the services provided by several of those IT silos. Each system is a chain of technology that is laid horizontally across those IT silos. At one end you have users, and at the other end, you probably have physical spinning disks (for now).  Over time, those chains have come to exist of every more links.

The DBA’s view?

I described myself as a DBA, because that is the tribe I came from. However, I firmly believe, if I was still a day-to-day production DBA, that in order to do my job properly, I would occasionally need to look out from my silo and not just to look at the neighbouring silos along the chains of technology, but all the silos on the chains, and sometimes even outside IT department.
I believe that the DBA is well placed and appropriately skilled to asses most aspects of a system.  Let me illustrate this with 2 examples

Example 1: Two Temporary Tablespace

I have seen more than one HR system where the temporary tablespace has been allowed to grow hugely, sometimes hundreds of GB, in response to space errors. Each time, the root cause has been users running ad-hoc queries concurrently with critical batch processes. So, I have suggested that they use two temporary tablespaces.
•    One for regular processing – on-line transaction processing, regular and scheduled batch processes.
•    Another for ad-hoc queries whether submitted by ad-hoc database users or by the application’s ad-hoc query tool.
As a DBA, that seems like a straight forward technical decision.  It will protect the critical batches from out of space errors that occur when the temp space has been consumed by the ad-hoc queries.  It allows me to regulate bad queries by restricting their temporary tablespace consumption limiting their effect on the system while they run for hours on end while they copy data into the temporary tablespace (and before anybody asks, we are also thinking about Oracle resource manager).
But it isn’t as simple as that.  It requires a different relationship between the users and their application support team.  Now, when a query crashes with an out of space error we don’t just extend the temporary tablespace, we need to examine the query and consider how it can be improved. Meanwhile, the users have to understand that this approach is for the greater good of the system, and in time it will also improve their query performance.
It is no longer just a technical problem.  It has become political.  It needs to be sold to the users, and it needs a change in the business process.  So the DBA needs to talk to people outside their silo, and sometimes even outside IT.

Example 2: Archiving

I work on an Time & Labor and Payroll system that has run for 4 years without any archiving.  We have 4 years of schedule data.  4 years of reported time.  4 years of payroll data. 4 years of audit data.  We now have a 4TB OLTP database. Archiving was de-scoped from the initial implementation.
How have we survived this long?  Many of the tables that hold this data have been partitioned, mostly on a time basis – that is to say that different periods of time are different partitions.  In many cases, but by no means all, the long tail of history is hidden by partition elimination.  Partitioning was introduced to improve performance, but the design of the partitioning always had an eye to future archiving.
With or without partitioning, keeping all the data forever is not an option.  Quite apart from data security and performance, the size of the database is becoming a challenge.  Periodically, we have to copy the production database into various development and test environments.  We trim some history from some of those environments, but size is still a challenge.
So we had to look at archive and purge.  So the first step it to understand the application, and then work out with the business users at what point do they at least no longer need the data to operate their process, and at what point can we ultimately purge the data entirely from the database. 
We are about to deliver the first tranche of archiving into one functional area.  Conceptually our process is very simple.  Some data we can purge, often by dropping partitions.  Some data we have had to keep for ad-hoc query.  Where possible, we use partition exchange to exchange historical partitions into archive tables.  We have also been very successful in using table compression to further reduce the overall database size.
Archive and purge is going to be a regular, rolling process - at least monthly.  It is going to be done without any system outage, and it will as far as possible be automated.  Therefore, we have had to build some of these processes into the application not least so the application support team has visibility. The archive process is sensitive to the system date and some application data. So it has had to involve developers.
We have created a meta-data driven PL/SQL package, effectively an API, to do all the partition handling and compression. The developers call that from their processes.
It takes someone who is, if not a DBA, then very database aware to design the concept (and you’ve probably guessed correctly that in this case that DBA is me). That person also needs to have some understanding of the application and how the application data fits together. I think it is also essential, at least for the largest and most critical systems, that the DBA understands how the business uses the application.  The DBA can’t do this hiding in their silo.

Conclusion

I have heard it said that some production DBAs think their job is to make sure nothing happens; by making sure that nothing happens. Business is not about maintaining the status quo.  Businesses talk about bigger, better, cheaper, faster. You can track the pulse of a business by measuring its critical IT systems. A DBA’s job is not to hold the tide back, but to make sure that the systems we tend are fit for purpose, and continue so to be. We need to be proactive. Sometimes, we should go out and be consultants, look for improvements, initiate changes, and advise on developments.
Therefore, if you are a production DBA hiding in your silo, send not to know for whom the bell tolls, it tolls for thee.