
Why that was not the best idea?
It's true that maintenance improvement was a success. It wasn't long before I realized introducing an additional column with rolling integer values by which the table had been partitioned maybe is a good idea to ease up moving the sliding window of data to another day but now writing ad-hoc queries in a way they would take advantage of partitioning is not straightforward hard. One would need to add an additional condition in each query on partition index column. Furthermore each time we would have to establish what's today's partition index value.
The tradeoff
If I were to do it again I would certainly not create an additional column with partitionIndex. I would rely directly on datetime2 column. The daily maintenance code would be way more complex but writing ad-hoc SQL queries would be as easy as possible.
Changes in daily maintenance stored procedure
Instead of simply truncating a single partition, every night we would have to:- Truncate the left-most partitionThis file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
TRUNCATE TABLE [HistoricalData] with (partitions (1)); - Split the right-most partition with new boundary (an additional day)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersALTER Partition Scheme HistoricalDataScheme NEXT USED PRIMARY; ALTER Partition Function HistoricalDataPF () SPLIT RANGE ('2020-07-03'); - Merge the old partition with the new boundaryThis file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
ALTER Partition Function HistoricalDataPF() MERGE RANGE ('2020-05-30');
It definitely makes the maintenance code a more complexed - I've just shown an example but in order to really implement that solution we would have to automate the splitting/merge in a way the boundaries are dynamically calculated.
The recap
Let's summarize. We have a table called HistoricalData with among some others there is a referenceDate datetime2 column. We want to maintain 35 partitions with some full days of past data plus a working day.
The goal is to perform daily maintenance within milliseconds and speed up ad-hoc SQL queries.
Let's create an example table this time for the sake of this tutorial. Then let's create a partition scheme, partition function and then a clustered index based on that scheme.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE TABLE [dbo].[HistoricalData]( | |
[id] [int] NOT NULL, | |
[name] [nvarchar](50) NOT NULL, | |
[value] [int] NOT NULL, | |
[referenceDate] [datetime2](7) NOT NULL | |
) ON [PRIMARY] | |
GO | |
CREATE PARTITION FUNCTION HistoricalDataPF (datetime2) | |
AS RANGE LEFT FOR VALUES ( | |
'2020-06-01', '2020-06-02', '2020-06-03', '2020-06-04', '2020-06-05', '2020-06-06' | |
,'2020-06-07', '2020-06-08', '2020-06-09', '2020-06-10', '2020-06-11', '2020-06-12', '2020-06-13' | |
,'2020-06-14', '2020-06-15', '2020-06-16', '2020-06-17', '2020-06-18', '2020-06-19', '2020-06-20' | |
,'2020-06-21', '2020-06-22', '2020-06-23', '2020-06-24', '2020-06-25', '2020-06-26', '2020-06-27' | |
,'2020-06-28', '2020-06-29', '2020-06-30', '2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04' | |
) | |
GO | |
CREATE PARTITION SCHEME HistoricalDataScheme | |
AS PARTITION HistoricalDataPF ALL TO ([PRIMARY]) | |
GO | |
CREATE CLUSTERED INDEX [cidx] ON [dbo].[HistoricalData] | |
( | |
[id] ASC, | |
[referenceDate] ASC | |
) | |
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON HistoricalDataScheme(referenceDate) | |
GO |
After that let's create daily maintenance procedure that would be executed each night right after midnight.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
create procedure DailyMaintenance | |
as | |
begin | |
declare @dayToMerge as datetime2 | |
declare @nextDay as datetime2 | |
select @nextDay = cast(cast(sysutcdatetime() as date) as datetime2); | |
select @dayToMerge = dateadd(day, -34, @nextDay); | |
truncate table [HistoricalData] with (partitions (1)); | |
alter partition scheme HistoricalDataScheme next used [PRIMARY]; | |
alter partition function HistoricalDataPF () split range (@nextDay); | |
alter partition function HistoricalDataPF() merge range (@dayToMerge); | |
end |
Let's test the solution. First, let's create a helper procedure to query partition details.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE PROCEDURE PartitionDetails | |
AS | |
SELECT OBJECT_NAME(pstats.object_id) AS TableName | |
, ps.name AS PartitionSchemeName | |
, ds.name AS PartitionFilegroupName | |
, pf.name AS PartitionFunctionName | |
, prv.value AS PartitionBoundaryValue | |
, c.name AS PartitionKey | |
, CASE | |
WHEN pf.boundary_value_on_right = 0 | |
THEN c.name + ' > ' + CAST(ISNULL(LAG(prv.value) | |
OVER (PARTITION BY pstats.object_id ORDER BY pstats.object_id, pstats.partition_number), | |
'Infinity') AS VARCHAR(100)) + ' and ' + c.name + ' <= ' + | |
CAST(ISNULL(prv.value, 'Infinity') AS VARCHAR(100)) | |
ELSE c.name + ' >= ' + CAST(ISNULL(prv.value, 'Infinity') AS VARCHAR(100)) + ' and ' + c.name + ' < ' + CAST( | |
ISNULL(LEAD(prv.value) | |
OVER (PARTITION BY pstats.object_id ORDER BY pstats.object_id, pstats.partition_number), | |
'Infinity') AS VARCHAR(100)) | |
END AS PartitionRange | |
, pstats.partition_number AS PartitionNumber | |
, pstats.row_count AS PartitionRowCount | |
FROM sys.dm_db_partition_stats AS pstats | |
INNER JOIN sys.partitions AS p ON pstats.partition_id = p.partition_id | |
INNER JOIN sys.destination_data_spaces AS dds ON pstats.partition_number = dds.destination_id | |
INNER JOIN sys.data_spaces AS ds ON dds.data_space_id = ds.data_space_id | |
INNER JOIN sys.partition_schemes AS ps ON dds.partition_scheme_id = ps.data_space_id | |
INNER JOIN sys.partition_functions AS pf ON ps.function_id = pf.function_id | |
INNER JOIN sys.indexes AS i ON pstats.object_id = i.object_id AND pstats.index_id = i.index_id AND | |
dds.partition_scheme_id = i.data_space_id AND | |
i.type <= 1 | |
INNER JOIN sys.index_columns AS ic | |
ON i.index_id = ic.index_id AND i.object_id = ic.object_id AND ic.partition_ordinal > 0 | |
INNER JOIN sys.columns AS c ON pstats.object_id = c.object_id AND ic.column_id = c.column_id | |
LEFT JOIN sys.partition_range_values AS prv ON pf.function_id = prv.function_id AND pstats.partition_number = | |
(CASE pf.boundary_value_on_right | |
WHEN 0 | |
THEN prv.boundary_id | |
ELSE (prv.boundary_id + 1) END) | |
WHERE ps.name = 'HistoricalDataScheme' | |
ORDER BY TableName, PartitionNumber; |
It is the 5th of July at the time I write this sentence. A perfect time to execute our daily maintenance procedure. The effect is as expected, the oldest day has been truncated and the partition has been merged. A new partition has been created for future data.
That's pretty much it. There is one thing that I've never mentioned. Our partitions operate within a single filegroup which is not the most effective way of partitioning the data. I didn't care about that as my DB is a part of AWS RDS service where I'm not sure whether it is even possible to define multiple filegroups.
Comments
Post a Comment