![]() |
Photo by Jan Antonin Kolar on Unsplash |
At that time, the data was being stored in a single SQL table. The volume was around 160 million records. Data ageing was being done by a simple DELETE FROM statement with a condition on the indexed date column.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
DELETE FROM HistoricalData WHERE referenceDate < dateadd(day, -35, sysutcdatetime()) |
The solution seemed pretty obvious to me - I could partition the table by days and then, during the daily maintenance, simply truncate the partition containing the oldest data. Since it's a meta-data operation, it should take only milliseconds.
Because the whole operation had to be done without any downtime, I decided to back up the historical data, and create a brand new empty table with the same columns definitions. The Historical data could be retrieved on demand from the backup, I wasn't planning to restore it ever. Making an existing table partitioned would require to rebuild the indexes (at least the clustered one). I've checked on the test environment that it would take more than 2 hours to do that. With the Enterprise version of MS SQL Server, the ONLINE option is available when recreating indexes but that was no option here, as the DB was a Standard Edition.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
EXEC sp_rename 'HistoricalData', 'HistoricalDataUnpartitioned'; | |
GO | |
SELECT * INTO [HistoricalData] FROM [HistoricalDataUnpartitioned] WHERE 1 = 2; | |
GO |
I decided not to partition directly over a column with the reference date which was the datetime2 type. Instead, I've added an additional column of the tinyint type that would state for a partition index. I couldn't make it a computed column because you cannot build a partition function on such columns.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ALTER TABLE [HistoricalData] ADD [PartitionIndex] tinyint; | |
GO | |
ALTER TABLE [HistoricalData] ALTER COLUMN [PartitionIndex] tinyInt NOT NULL; | |
GO |
I've come up with a function that would map the current date to the partitionIndex column value. It is a pretty simple epoch-like function: take the number of days since 29/05/1982, modulo the number of partitions, plus two (one for the working day and one empty for the next day). Although that approach makes the daily maintenance really easy I realized later that it wasn't the best idea after all. I will explain why in the next article.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE FUNCTION [dbo].[f_getHistoricalDataPartitionIndex](@date datetime2) | |
RETURNS int with schemabinding | |
AS | |
begin | |
-- We use 37 partitions to keep 35 full days plus a working day. | |
-- We want to always maintain one empty partition for the following day. | |
-- the partition function used by table indexes is based on 37 discrete values 0, 1...36 | |
RETURN convert(int, DATEDIFF(DAY, '5/29/1982', @date) % 37); | |
end; | |
GO |
Next, I had to update a stored procedure through which all the inserts are made. I've added a call to f_getHistoricalDataPartitionIndex(referenceDate) as a value of the additional column.
Now we can introduce a partition function and a partition scheme. For us, it was enough to partition the table over a single filegroup (PRIMARY). The DB is actually an AWS RDS service. I'm not sure if it's even possible to add additional files there.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE PARTITION FUNCTION HistoricalDataPF (tinyint) | |
AS RANGE LEFT FOR VALUES ( | |
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, | |
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, | |
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, | |
31, 32, 33, 34, 35, 35) | |
GO | |
CREATE PARTITION SCHEME HistoricalDataScheme | |
AS PARTITION HistoricalDataPF ALL TO ([PRIMARY]) | |
GO |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT ps.name, pf.name, boundary_id,value | |
FROM sys.partition_schemes ps | |
INNER JOIN sys.partition_functions pf ON pf.function_id=ps.function_id | |
INNER JOIN sys.partition_range_values prf ON pf.function_id=prf.function_id | |
where ps.name = 'HistoricalDataScheme' |
We're ready to make the HistoricalData table partitioned. I've done it by creating a clustered index based on the partition scheme. I've also created additional indexes that were existing in the original table. This time based on the partition scheme. I've also added the partitionIndex column to all of them.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE CLUSTERED INDEX [cidx] ON [HistoricalData] | |
( | |
[ReferenceDate] ASC, | |
[PartitionIndex] ASC | |
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) | |
ON HistoricalDataScheme(PartitionIndex) | |
GO |
You can verify whether everything has been created properly with the following query. Among other details, it also shows how many records there are in each partition. I've found it very useful also during the daily maintenance procedure tests.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT OBJECT_SCHEMA_NAME(pstats.object_id) AS SchemaName | |
, OBJECT_NAME(pstats.object_id) AS TableName | |
, ps.name AS PartitionSchemeName | |
, ds.name AS PartitionFilegroupName | |
, pf.name AS PartitionFunctionName | |
, CASE pf.boundary_value_on_right WHEN 0 THEN 'Range Left' ELSE 'Range Right' END AS PartitionFunctionRange | |
, CASE pf.boundary_value_on_right WHEN 0 THEN 'Upper Boundary' ELSE 'Lower Boundary' END AS PartitionBoundary | |
, prv.value AS PartitionBoundaryValue | |
, c.name AS PartitionKey | |
, CASE | |
WHEN pf.boundary_value_on_right = 0 | |
THEN c.name + ' > ' + CAST(ISNULL(LAG(prv.value) | |
OVER (PARTITION BY pstats.object_id ORDER BY pstats.object_id, pstats.partition_number), | |
'Infinity') AS VARCHAR(100)) + ' and ' + c.name + ' <= ' + | |
CAST(ISNULL(prv.value, 'Infinity') AS VARCHAR(100)) | |
ELSE c.name + ' >= ' + CAST(ISNULL(prv.value, 'Infinity') AS VARCHAR(100)) + ' and ' + c.name + ' < ' + CAST( | |
ISNULL(LEAD(prv.value) | |
OVER (PARTITION BY pstats.object_id ORDER BY pstats.object_id, pstats.partition_number), | |
'Infinity') AS VARCHAR(100)) | |
END AS PartitionRange | |
, pstats.partition_number AS PartitionNumber | |
, pstats.row_count AS PartitionRowCount | |
, p.data_compression_desc AS DataCompression | |
FROM sys.dm_db_partition_stats AS pstats | |
INNER JOIN sys.partitions AS p ON pstats.partition_id = p.partition_id | |
INNER JOIN sys.destination_data_spaces AS dds ON pstats.partition_number = dds.destination_id | |
INNER JOIN sys.data_spaces AS ds ON dds.data_space_id = ds.data_space_id | |
INNER JOIN sys.partition_schemes AS ps ON dds.partition_scheme_id = ps.data_space_id | |
INNER JOIN sys.partition_functions AS pf ON ps.function_id = pf.function_id | |
INNER JOIN sys.indexes AS i ON pstats.object_id = i.object_id AND pstats.index_id = i.index_id AND | |
dds.partition_scheme_id = i.data_space_id AND | |
i.type <= 1 /* Heap or Clustered Index */ | |
INNER JOIN sys.index_columns AS ic | |
ON i.index_id = ic.index_id AND i.object_id = ic.object_id AND ic.partition_ordinal > 0 | |
INNER JOIN sys.columns AS c ON pstats.object_id = c.object_id AND ic.column_id = c.column_id | |
LEFT JOIN sys.partition_range_values AS prv ON pf.function_id = prv.function_id AND pstats.partition_number = | |
(CASE pf.boundary_value_on_right | |
WHEN 0 | |
THEN prv.boundary_id | |
ELSE (prv.boundary_id + 1) END) | |
WHERE | |
ps.name = 'HistoricalDataScheme' | |
ORDER BY TableName, PartitionNumber; | |
Go |
Finally, we can take advantage of all the effort and implement the daily maintenance procedure that would simply truncate the partition that will be used the next day.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE proc [p_HistoricalDataDailyMaintenance] | |
as | |
begin | |
set nocount on; | |
declare @partitionToDelete int | |
-- we need truncate the following partition (based on partition number, not the partitionIndex column value, that's why we need to increment the value) | |
set @partitionToDelete = dbo.f_getHistoricalDataPartitionIndex(DATEADD(DAY, 1, sysutcdatetime())) + 1 | |
truncate table [HistoricalData] with (partitions (@partitionToDelete)) | |
end | |
GO |
All the steps that were taken to partition the existing non-partitioned table:
- Backup - rename the existing table and use SELECT INTO query to create a twin empty table without any indexes
- Add additional column over which the table with be partitioned
- Create a function to retrieve the partition index value based on a given date
- Alter the procedure that inserts historical data
- Create a partition function and a partition scheme
- Create a clustering index over the partition scheme (and all the indexes you previously had)
- Alter/Create a procedure to execute daily maintenance - truncate the partition that will be used on the following day.
What to do when you want to keep the data and you can afford a downtime
- Add additional column over which the table with be partitioned
- Create a function to retrieve the partition index value based on a given date
- Alter the procedure that inserts historical data
- Create a partition function and a partition scheme
- Recreate the indexes with DROP_EXISTING option set to ON this time based on the partition scheme. You can also try the ONLINE option if you have an enterprise edition
- Alter/Create a procedure to execute daily maintenance - truncate the partition that will be used on the following day.
What to do when you don't want to have a clustering index
It is still possible to partition the table. Haven't tried it on any working system but if you have a unique column (or unique columns combination) you could ALTER the table adding the constraint and then DROP it while placing the table over the partition scheme.
What to do when you have an older version of MS SQL Server
Truncating chosen partitions is a feature available since SQL Server 2016. If you have an older version you could use the SWITCH PARTITION option of ALTER TABLE statement to achieve immediate data ageing. The idea is simple: Create two twin tables on the same partition scheme, with the same columns and indexes, in the same filegroup. Insert your data to one of them, keep the second empty. During the daily maintenance just switch corresponding partitions between tables and truncate the second table.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
create proc [p_HistoricalDataDailyMaintenance] | |
as | |
begin | |
set nocount on; | |
declare @partitionToDelete int; | |
-- we need truncate the following partition (based on partition number, not the partitionIndex column value, that's why we need to increment the value) | |
set @partitionToDelete = dbo.f_getHistoricalDataPartitionIndex(DATEADD(DAY, 1, sysutcdatetime())) + 1; | |
alter table [HistoricalData] switch partition @partitionToDelete to [HistoricalDataWork] partition @partitionToDelete; | |
truncate table [HistoricalDataWork]; | |
end | |
go |
Comments
Post a Comment