At that time, the data was being stored in a single SQL table. The volume was around 160 million records. Data ageing was being done by a simple DELETE FROM statement with a condition on the indexed date column.
The solution seemed pretty obvious to me - I could partition the table by days and then, during the daily maintenance, simply truncate the partition containing the oldest data. Since it's a meta-data operation, it should take only milliseconds.
Because the whole operation had to be done without any downtime, I decided to back up the historical data and create a brand-new empty table with the same column definitions. The historical data could be retrieved on demand from the backup; I wasn't planning to ever restore it. Making an existing table partitioned would require rebuilding the indexes (at least the clustered one). I checked on the test environment that this would take more than 2 hours. With the Enterprise edition of MS SQL Server, the ONLINE option is available when recreating indexes, but that was not an option here, as the DB was a Standard edition.
I decided not to partition directly over the column with the reference date, which was of the datetime2 type. Instead, I added an additional column of the tinyint type to serve as a partition index. I couldn't make it a plain computed column, because a partition function can only be built on a computed column that is marked PERSISTED.
I came up with a function that maps the reference date to the partitionIndex column value. It is a pretty simple epoch-like function: take the number of days since 29/05/1982 and compute it modulo the number of partitions, which is the number of retained days plus two (one partition for the current working day and one kept empty for the next day). Although that approach makes the daily maintenance really easy, I realized later that it wasn't the best idea after all. I will explain why in the next article.
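A minimal sketch of such a mapping function; the partition count used here (33, i.e. 31 days of retention plus the two extra partitions) is an assumption, not the value from the original system:

```sql
CREATE FUNCTION dbo.f_getHistoricalDataPartitionIndex (@referenceDate datetime2)
RETURNS tinyint
AS
BEGIN
    -- days elapsed since the arbitrary epoch, wrapped around the partition count
    RETURN DATEDIFF(DAY, '19820529', @referenceDate) % 33;
END;
```

Consecutive days cycle through the values 0-32, so each partition is reused roughly once a month.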
Next, I had to update the stored procedure through which all the inserts are made. I added a call to f_getHistoricalDataPartitionIndex(referenceDate) as the value of the additional column.
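The altered procedure could look roughly like this; the procedure name and payload columns are illustrative, not the original schema:

```sql
ALTER PROCEDURE dbo.p_insertHistoricalData
    @referenceDate datetime2,
    @payload nvarchar(max)
AS
BEGIN
    INSERT INTO dbo.HistoricalData (referenceDate, payload, partitionIndex)
    VALUES (@referenceDate,
            @payload,
            -- the only change: derive the partition index from the reference date
            dbo.f_getHistoricalDataPartitionIndex(@referenceDate));
END;
```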
Now we can introduce a partition function and a partition scheme. For us, it was enough to partition the table over a single filegroup (PRIMARY). The DB is actually an AWS RDS instance, and I'm not sure whether it's even possible to add additional files there. To check whether the definitions have been created, you can use the following query.
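A sketch of the definitions, shortened to four partitions for brevity; the real partition function would list one boundary value per partitionIndex value. The check at the end queries the catalog views:

```sql
-- RANGE RIGHT with boundaries 1..3 yields 4 partitions for values 0, 1, 2, 3
CREATE PARTITION FUNCTION pf_HistoricalData (tinyint)
AS RANGE RIGHT FOR VALUES (1, 2, 3);

-- all partitions placed in the single PRIMARY filegroup
CREATE PARTITION SCHEME ps_HistoricalData
AS PARTITION pf_HistoricalData ALL TO ([PRIMARY]);

-- verify that both definitions exist
SELECT pf.name AS function_name, pf.fanout, ps.name AS scheme_name
FROM sys.partition_functions pf
JOIN sys.partition_schemes ps ON ps.function_id = pf.function_id
WHERE pf.name = 'pf_HistoricalData';
```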
We're ready to make the HistoricalData table partitioned. I did it by creating a clustered index based on the partition scheme. I also recreated the additional indexes that existed in the original table, this time based on the partition scheme, with the partitionIndex column added to all of them.
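The index definitions could look like this; index and column names are illustrative:

```sql
-- creating the clustered index ON the partition scheme is what
-- physically distributes the table across the partitions
CREATE CLUSTERED INDEX IX_HistoricalData_referenceDate
ON dbo.HistoricalData (referenceDate, partitionIndex)
ON ps_HistoricalData (partitionIndex);

-- each former non-clustered index gets partitionIndex appended
-- and is likewise created on the partition scheme
CREATE NONCLUSTERED INDEX IX_HistoricalData_someColumn
ON dbo.HistoricalData (someColumn, partitionIndex)
ON ps_HistoricalData (partitionIndex);
```

Aligning all indexes with the partition scheme is what makes partition-level TRUNCATE and SWITCH possible later.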
You can verify whether everything has been created properly with the following query. Among other details, it also shows how many records there are in each partition. I also found it very useful during the daily maintenance procedure tests.
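A simple version of such a check, based on the sys.partitions catalog view:

```sql
-- one row per partition of the clustered index, with its row count
SELECT p.partition_number, p.rows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.HistoricalData')
  AND p.index_id = 1          -- 1 = the clustered index
ORDER BY p.partition_number;
```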
Finally, we can take advantage of all this effort and implement the daily maintenance procedure, which simply truncates the partition that will be used the next day. On our environment, the procedure is called by a DB job executed every day at 2:00 AM.
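A sketch of the maintenance procedure; names are illustrative, and dynamic SQL is used so the partition number can be computed at run time:

```sql
CREATE PROCEDURE dbo.p_historicalDataDailyMaintenance
AS
BEGIN
    -- partitionIndex value that tomorrow's inserts will use
    DECLARE @idx tinyint =
        dbo.f_getHistoricalDataPartitionIndex(DATEADD(DAY, 1, SYSDATETIME()));

    -- map the column value to the physical partition number
    DECLARE @p int = $PARTITION.pf_HistoricalData(@idx);

    DECLARE @sql nvarchar(200) =
        N'TRUNCATE TABLE dbo.HistoricalData WITH (PARTITIONS ('
        + CAST(@p AS nvarchar(10)) + N'));';
    EXEC sp_executesql @sql;
END;
```

As a metadata-only operation, the truncate completes in milliseconds regardless of how many rows the partition holds.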
All the steps that were taken to partition the existing non-partitioned table:
- Backup - rename the existing table and use a SELECT INTO query to create an empty twin table without any indexes
- Add an additional column over which the table will be partitioned
- Create a function to retrieve the partition index value based on a given date
- Alter the procedure that inserts historical data
- Create a partition function and a partition scheme
- Create a clustered index over the partition scheme (and all the indexes you previously had)
- Alter/Create a procedure to execute daily maintenance - truncate the partition that will be used on the following day.
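The backup step from the list above can be sketched like this; table names are illustrative:

```sql
-- keep the old data under a new name
EXEC sp_rename 'dbo.HistoricalData', 'HistoricalDataBackup';

-- SELECT INTO with an always-false filter creates an empty twin table
-- with the same column definitions but no indexes or constraints
SELECT *
INTO dbo.HistoricalData
FROM dbo.HistoricalDataBackup
WHERE 1 = 0;
```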
What to do when you want to keep the data and you can afford a downtime
- Add an additional column over which the table will be partitioned
- Create a function to retrieve the partition index value based on a given date
- Alter the procedure that inserts historical data
- Create a partition function and a partition scheme
- Recreate the indexes with the DROP_EXISTING option set to ON, this time based on the partition scheme. You can also try the ONLINE option if you have Enterprise edition
- Alter/Create a procedure to execute daily maintenance - truncate the partition that will be used on the following day.
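The rebuild step in the downtime variant can be sketched like this; index and column names are illustrative:

```sql
-- rebuild the existing clustered index directly onto the partition scheme;
-- this is the step that takes hours on a large table
CREATE CLUSTERED INDEX IX_HistoricalData_referenceDate
ON dbo.HistoricalData (referenceDate, partitionIndex)
WITH (DROP_EXISTING = ON)     -- add ONLINE = ON on Enterprise edition
ON ps_HistoricalData (partitionIndex);
```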
What to do when you don't want to have a clustered index
It is still possible to partition the table. I haven't tried it on any working system, but if you have a unique column (or a unique combination of columns), you could ALTER the table to add the constraint and then DROP it while placing the table over the partition scheme.
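A hedged sketch of that trick, assuming a unique id column; note that a unique constraint on a partitioned table must also include the partitioning column:

```sql
-- adding a clustered constraint on the partition scheme
-- physically moves the table onto the partitions
ALTER TABLE dbo.HistoricalData
ADD CONSTRAINT UQ_HistoricalData_id UNIQUE CLUSTERED (id, partitionIndex)
ON ps_HistoricalData (partitionIndex);

-- dropping it with MOVE TO leaves the table as a heap,
-- still placed over the partition scheme
ALTER TABLE dbo.HistoricalData
DROP CONSTRAINT UQ_HistoricalData_id
WITH (MOVE TO ps_HistoricalData (partitionIndex));
```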
What to do when you have an older version of MS SQL Server
Truncating chosen partitions is a feature available since SQL Server 2016. If you have an older version, you can use the SWITCH PARTITION option of the ALTER TABLE statement to achieve immediate data ageing. The idea is simple: create two twin tables on the same partition scheme, with the same columns and indexes, in the same filegroup. Insert your data into one of them and keep the second empty. During the daily maintenance, just switch the corresponding partitions between the tables and truncate the second table.
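The switch-based maintenance can be sketched like this, assuming dbo.HistoricalDataSwap is the identically-defined empty twin table on the same partition scheme (names are illustrative):

```sql
-- physical partition number that tomorrow's inserts will use
DECLARE @p int = $PARTITION.pf_HistoricalData(
    dbo.f_getHistoricalDataPartitionIndex(DATEADD(DAY, 1, SYSDATETIME())));

-- switching a partition between twin tables is a metadata-only operation
ALTER TABLE dbo.HistoricalData
SWITCH PARTITION @p TO dbo.HistoricalDataSwap PARTITION @p;

-- emptying the swap table actually removes the aged data
TRUNCATE TABLE dbo.HistoricalDataSwap;
```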