SQL Server : Table partitioning

Partitioning in SQL refers to the process of dividing a large table into smaller, more manageable pieces called partitions. Each partition is a subset of the table's data, and SQL queries can access these partitions separately. The main table still appears as a single entity to the user, but behind the scenes, the data is stored in multiple physical units.

Types of Partitioning

Range Partitioning: Data is partitioned based on a specified range of values in a column. For example, you can partition data based on a date range. Example: Orders in Q1, Q2, Q3, Q4.
List Partitioning: Data is partitioned based on a list of predefined values. Example: Partition orders by country or region.
Hash Partitioning: Data is partitioned by applying a hash function to a column's value. It distributes the data evenly across partitions. Example: Distribute data across 4 partitions based on the hash value of a customer ID.
Composite Partitioning: Combines two or more types of partitioning, like range-hash or range-list partitioning. Example: Partition data by range first, and then by hash within those ranges.

Advantages of Partitioning

Improved Query Performance:
- Faster Access: When a query accesses only a specific partition (partition pruning), it reduces the amount of data scanned, speeding up retrieval.
- Parallel Processing: SQL engines can process different partitions in parallel, which improves query performance for large datasets.
Enhanced Manageability:
- Easier Maintenance: Large tables can be difficult to maintain, but with partitioning, operations like backups, data loads, or deletions can be performed on individual partitions, reducing impact on the whole table.
- Partition-Specific Indexing: You can create indexes on specific partitions, reducing storage and maintenance overhead for unused data.
Scalability:
- Efficient Growth: As tables grow over time, partitioning helps manage large volumes of data more effectively, distributing data and reducing the load on storage systems.
- Retention Policies: You can easily archive or delete older partitions without impacting current data.
Improved Data Availability:
- Partition Independence: If one partition becomes corrupted, the rest of the partitions (and therefore the majority of the data) remain accessible, improving overall data availability.
Optimized Data Maintenance:
- Partition Switching: You can easily move partitions between tables (useful in data warehousing). For example, you can load data into a staging table and then switch partitions into the main table.

Why Partitioning is Needed

Handling Large Datasets: Partitioning becomes critical when working with large tables containing millions or billions of rows. Query performance can degrade without partitioning due to the sheer amount of data.
Managing Historical Data: Partitioning by date or other logical ranges makes it easier to manage historical data (e.g., by dropping old partitions).
Performance Improvements: With partitioning, you can tune specific partitions for performance, improving the overall system response time.

Example of Partitioning

Let's assume we have a table orders with columns order_id, order_date, customer_id, and total_amount. We want to partition the table by order_date on yearly basis.

We will starts entire process step by step.

Step 1 : Create a Database

Step 2 : Create file groups

Step 3 : Add files to the filegroups

Step 4 : Create the partition function

Step 5 : Create the orders table

Step 6 : Create Sample orders table

Step 7 : Insert/Add sample record to Order table

Step 8 : Final Output.

Step 1 : Create a Database

-- In case database already exists, drop
Drop database test_database
-- Create the database
CREATE DATABASE test_database;
GO

Step 2 : Create file groups

A filegroup in SQL Server is a logical grouping of one or more data files that allows you to manage how the data is stored across the physical storage media. It's primarily used to organize and distribute data across multiple files or disks, improving performance, manageability, and recovery options. Here's how it works:

ALTER DATABASE test_database 
ADD FILEGROUP FG_2001;

ALTER DATABASE test_database
ADD FILEGROUP FG_2002;

ALTER DATABASE test_database
ADD FILEGROUP FG_2003;

In this example:

We are adding three filegroups named FG_2001, FG_2002, and FG_2003 to the database test_database.
Later, we could allocate different tables, partitions, or indexes to these filegroups. For instance, if you are working with an "Orders" table, you could store orders from 2001 in FG_2001, orders from 2002 in FG_2002, and so on

Step 3 : Add files to the filegroups

-- Add files to the filegroup FG_2001
ALTER DATABASE test_database
ADD FILE (
	-- Logical name for the file
    NAME = 'File_2001',
	-- Physical file path on the disk
    FILENAME = 'D:\SQLData\File_2001.ndf', 
	 -- Initial size of the file
    SIZE = 5MB,   
	-- Maximum size the file can grow to                      
    MAXSIZE = 100MB, 
	-- How much the file grows when more space is needed                    
    FILEGROWTH = 5MB
 -- Assign the file to filegroup FG_2001                     
) TO FILEGROUP FG_2001;        

-- Add files to the filegroup FG_2002
ALTER DATABASE test_database
ADD FILE (
    NAME = 'File_2002',
    FILENAME = 'D:\SQLData\File_2002.ndf',  
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
) TO FILEGROUP FG_2002;     

-- Add files to the filegroup FG_2003
ALTER DATABASE test_database
ADD FILE (
    NAME = 'File_2003',
    FILENAME = 'D:\SQLData\File_2003.ndf',  
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
) TO FILEGROUP FG_2003;

ADD FILE:
Specifies that you are adding a new file to the database. This new file will hold data associated with the filegroup.
NAME = 'File_2001':
This is the logical name of the file inside the SQL Server. SQL Server uses this name to reference the file internally. Logical names help differentiate between multiple files in the same filegroup or database
FILENAME = 'D:\SQLData\File_2001.ndf':
This is the physical file path where the file is stored on your server’s file system.
In this case, the file will be created on the D: drive in the SQLData folder, and the file will be called File_2001.ndf. The .ndf extension stands for "Non-primary Data File".
Modify the path according to your environment (the disk and folder where you want to store this file).
SIZE = 5MB:
This specifies the initial size of the file, which will start at 5MB when it is created. It reserves 5MB on the disk for this file immediately.
MAXSIZE = 100MB:
This sets the maximum size the file is allowed to grow to. In this case, the file can grow up to 100MB. Once it reaches this limit, no more data can be added to the file unless you increase the maximum size.
If you omit MAXSIZE, the file will grow indefinitely, depending on available disk space.
FILEGROWTH = 5MB:
This defines how much space will be added to the file when it becomes full and needs more space. In this example, the file will grow in increments of 5MB when needed.
This auto-growth mechanism ensures that the file grows gradually instead of throwing an error when it runs out of space.
TO FILEGROUP FG_2001:
This assigns the newly created file to the FG_2001 filegroup. All data associated with this filegroup will be stored in the file File_2001.ndf.

How It Works:

After you run this command, SQL Server will create a physical file (File_2001.ndf) in the specified directory (D:\SQLData).
This file will start with an initial size of 5MB but can grow automatically in 5MB increments up to a maximum size of 100MB.
This file is now part of the FG_2001 filegroup, so any data that is directed to this filegroup will be stored in File_2001.ndf.

Step 4 : Create the partition function

This is used to divide data into partitions based on a range of values—in this case, by date. Partitioning helps in managing large tables by splitting them into smaller, more manageable pieces without affecting the logical structure of the table.

USE test_database; -- Select the test_database to work with

-- Name of the partition function and data type (DATE)
CREATE PARTITION FUNCTION orderDateRangePF (DATE)  
 -- Define the range boundaries for each partition
AS RANGE RIGHT FOR VALUES('2001-12-31', '2002-12-31', '2003-12-31');

CREATE PARTITION FUNCTION orderDateRangePF (DATE):
This creates a partition function named orderDateRangePF. The partition function is used to define how data should be split into partitions based on a specified column (in this case, a column of type DATE).
The partition function will divide the data based on date ranges.
AS RANGE RIGHT:
This specifies that the partition boundaries use a right boundary. In a RANGE RIGHT partition function, each value in the partition boundary belongs to the partition on the right-hand side.
Example: For a boundary value of '2001-12-31', all rows with the date '2001-12-31' will be placed in the partition for that boundary and will be considered part of the next partition (i.e., the partition for the year 2001 includes '2001-12-31').
FOR VALUES ('2001-12-31', '2002-12-31', '2003-12-31'):
This defines the boundary values for the partitions. The values specify the upper limit of each partition for a given year.
For example:
Partition 1: Contains all rows with dates before or on '2001-12-31'.
Partition 2: Contains all rows with dates between '2002-01-01' and '2002-12-31'.
Partition 3: Contains all rows with dates between '2003-01-01' and '2003-12-31'.
Partition 4: Contains all rows with dates from '2004-01-01' and beyond.

Step 5 : Create the Partition Schema

Creates a partition scheme in SQL Server, which maps the partitions (defined in the partition function) to specific filegroups. The partition scheme is what ties the data in each partition to the corresponding filegroup, determining where the data will be physically stored.

-- Name of the partition scheme
CREATE PARTITION SCHEME orderDateRangePS  
-- Use the previously defined partition function (orderDateRangePF)
AS PARTITION orderDateRangePF             
-- Map partitions to filegroups
TO (FG_2001, FG_2002, FG_2003, FG_2003);

CREATE PARTITION SCHEME orderDateRangePS:
This creates a partition scheme named orderDateRangePS. A partition scheme is responsible for defining which filegroups each partition will be stored in.
A partition scheme is a necessary part of partitioning because it links the partition function (which defines how data is split) to the physical storage (filegroups).
AS PARTITION orderDateRangePF:
This tells SQL Server that the partition scheme orderDateRangePS will use the partition function orderDateRangePF (which we defined earlier).
The orderDateRangePF function splits data into partitions based on date ranges, as defined by the boundary values '2001-12-31', '2002-12-31', '2003-12-31'.
TO (FG_2001, FG_2002, FG_2003, FG_2003):
This part specifies the filegroups where each partition's data will be physically stored.
SQL Server uses these filegroups to place the data from each partition into specific files on the disk.

Step 6 : Create Sample orders table

USE test_database;
GO
-- Create the orders table with order_id as part of the primary key
CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    customer_id INT,
    total_amount DECIMAL(10, 2),
	-- Including order_date in the primary key
    PRIMARY KEY CLUSTERED (order_id, order_date)  
)
-- Using the partition scheme
ON orderDateRangePS (order_date);

Step 7 : Insert/Add sample record to Order table

-- Insert 5 records for each partition

-- Records for 2001
INSERT INTO orders (order_id, order_date, customer_id, total_amount) VALUES
(1, '2001-01-15', 101, 150.00),
(2, '2001-03-20', 102, 250.50),
(3, '2001-06-05', 103, 300.00),
(4, '2001-09-10', 104, 175.25),
(5, '2001-11-25', 105, 400.75);

-- Records for 2002
INSERT INTO orders (order_id, order_date, customer_id, total_amount) VALUES
(6, '2002-02-15', 106, 120.00),
(7, '2002-04-20', 107, 220.50),
(8, '2002-07-05', 108, 350.00),
(9, '2002-09-10', 109, 275.25),
(10, '2002-12-25', 110, 450.75);

-- Records for 2003
INSERT INTO orders (order_id, order_date, customer_id, total_amount) VALUES
(11, '2003-01-10', 111, 130.00),
(12, '2003-03-15', 112, 230.50),
(13, '2003-05-20', 113, 320.00),
(14, '2003-08-25', 114, 180.75),
(15, '2003-11-30', 115, 390.00);

-- Records for 2005. 
-- For testing, we will add 5 records beyond our Filegroup/partion range 
INSERT INTO orders (order_id, order_date, customer_id, total_amount) VALUES
(16, '2005-01-05', 116, 160.00),
(17, '2005-02-20', 117, 260.50),
(18, '2005-05-15', 118, 350.00),
(19, '2005-07-10', 119, 240.25),
(20, '2005-12-01', 120, 500.00);

Step 8 : Final Output.

After inserting all the records into the Orders table, let's check the record count in each partition. Run the following query to view the counts for each partition:

SELECT t.NAME  AS TableName,
       i.NAME  AS IndexName,
       p.partition_number,
       p.partition_id,
       i.data_space_id,
       f.function_id,
       f.type_desc,
       fg.NAME AS [filegroup],
       r.boundary_id,
       r.value AS BoundaryValue,
       p.rows,
       r.*
FROM   sys.tables AS t
       INNER JOIN sys.indexes AS i
               ON t.object_id = i.object_id
       INNER JOIN sys.partitions AS p
               ON i.object_id = p.object_id
                  AND i.index_id = p.index_id
       INNER JOIN sys.allocation_units a
               ON a.container_id = p.hobt_id
       INNER JOIN sys.filegroups fg
               ON fg.data_space_id = a.data_space_id
       INNER JOIN sys.partition_schemes AS s
               ON i.data_space_id = s.data_space_id
       INNER JOIN sys.partition_functions AS f
               ON s.function_id = f.function_id
       LEFT OUTER JOIN sys.partition_range_values AS r
                    ON f.function_id = r.function_id
                       AND r.boundary_id = p.partition_number
WHERE  t.NAME = 'orders'
       AND i.type <= 1
       AND a.type = 1 --in row data only

Summary :

Partitioning in SQL Server divides large tables into smaller, manageable parts, improving query performance, manageability, and scalability. By partitioning on the order_date, data is spread across partitions, making operations like querying, updating, and archiving faster and more efficient. Benefits include quicker access to relevant data, optimized resource usage, and the ability to manage older data separately (e.g., moving or archiving). The stages involved are:

Define a partition function to specify range boundaries.
Create a partition scheme to map the partitions to filegroups.
Create the table, specifying the partitioning column and scheme.