I am currently involved with performance tuning on a large database, almost a terabyte of data. One of the things that we found was that a lot of that data was held in two tables. Both with over a billion rows and growing. These tables held financial type information (budgets and estimates etc) and had data going back 15 years. A lot of this data is not required on a day to day basis and in fact only the last 5 years is used, but the rest is required to be kept for legal purposes. If we reduced the size of these tables and a few others that had large amounts of historical data then backup sizes would decrease, database maintenance would be quicker and that queries would be quicker. But what is the right strategy for archiving data?
The following are some strategies that have been suggested to the vendor of the application.
Separate tables - this step involves creating separate archive tables from the main tables. But here do you do a straight 1:1 relationship or do you go and create multiple archive tables 1:N, maybe one for every year. You need to think about this an on-going exercise to archive on a regular basis. How is this going to be achieved and how is it going to be maintained ?
Placement of the archive tables. Do you create them in their own separate archive database or keep them in the current database? If they are in their own database then you can tailor backups and maintenance to exclude this database. Keeping them in the current database doesn't reduce the database backup size and maintenance could still be an issue. Do you create these tables in their own archive schema to logically separate them. Do you place these archive tables on their own partition so they could be potentially moved to slower, less expensive storage?
Indexing of the archived tables. Once you have your separate archive tables you could consider dropping a lot of indexes to reduce space, especially if they will be accessed seldomly.
Archiving is an overlooked area of database design. Retro fitting a solution is not easy. It is something you should consider if you are designing a database that will hold large amounts of historical data (or just a lot of data), to keep your operational database trim and efficient as possible without losing any data.