Tuesday, November 17, 2020

Retrofitting Partitioning into Existing Applications: Example 3. Workflow: Separate Active and Inactive Rows, and Partial Indexing.

This post is part of a series about the partitioning of database objects.

Workflow

Workflow is an example of a case where you have a roughly constant volume of active data and an increasing quality of historical inactive data that builds up until such time as it is archived.  Workflow requests are created, worked and closed.  

The PeopleSoft workflow table has four statuses.

INSTSTATUS Description
0 Available
1 Selected
2 Worked
3 Cancelled
Over time the majority of rows in the table end up with status 2 as they are worked and closed, and a few end up being cancelled. These rows are now inactive. All the workflow activity focuses on statuses 0 and 1. Partitioning can be used to separate the active rows from the inactive. I chose to range partition the worklist table by status, creating a partition of active worklist rows where the status is less than 2, and a partition of inactive rows where status is greater than or equal to 2. I could also have used list partitioning to create the same effect.
CREATE TABLE PSWORKLIST (
   BUSPROCNAME VARCHAR2(30) NOT NULL,
   ACTIVITYNAME VARCHAR2(30) NOT NULL,
   EVENTNAME VARCHAR2(30) NOT NULL,
   WORKLISTNAME VARCHAR2(30) NOT NULL,
…
   OPRID VARCHAR2(30) NOT NULL,
…
   INSTSTATUS SMALLINT NOT NULL,
…
)
PARTITION BY RANGE (INSTSTATUS) 
(
 PARTITION WL_OPEN VALUES LESS THAN (2) PCTFREE 20, 
 PARTITION WL_CLOSED VALUES LESS THAN (MAXVALUE) PCTFREE 0
) 
ENABLE ROW MOVEMENT
/
Operators query their worklist queue by their operator ID and the open request status; therefore there is an index to support this query. This index can be locally partitioned, i.e. on INSTSTATUS. 
The optimizer prunes the partition containing closed worklist requests because it knows the open requests can't be found there, and only queries the open partition. 
The open partition remains small because as worklist rows are updated to the closed status they are moved to the closed partition. Therefore, row movement must be enabled on the table. Thus, queries for open worklist requests remain small more efficient. 
There is an additional overhead of moving the rows between partitions as the status is updated to closed, but this is outweighed by the savings of only looking for open records in the open partition. 
Additional free space is specified on the open partition because that is where all the application update activity occurs. Conversely, no free space is required for the closed partition because after the rows move there, they are not updated until they are purged. 
From Oracle 12c, it is also possible to partially index a partitioned table. You can choose to build specific partitions in a local index by marking indexing on or off on the table partitions. In this example, it is only necessary to index the open workflow records. The application will never query the closed ones by operator ID, so indexing can be disabled on the closed partition.  Thus saving space and index management overhead.
ALTER TABLE PSWORKLIST MODIFY PARTITION WL_OPEN INDEXING ON;
ALTER TABLE PSWORKLIST MODIFY PARTITION WL_CLOSED INDEXING OFF;

CREATE INDEX PSBPSWORKLIST ON PSWORKLIST (OPRID, INSTSTATUS) 
LOCAL 
INDEXING PARTIAL
/
Here is my worklist table with two partitions, and some sample data.  You can see over 90% of the rows are in the closed partition.
SELECT table_name, partition_name, num_rows, blocks
FROM   dba_tab_statistics
WHERE  table_name = 'PSWORKLIST'
ORDER BY partition_name nulls first
/
TABLE_NAME         PARTITION_NAME                   NUM_ROWS     BLOCKS
------------------ ------------------------------ ---------- ----------
PSWORKLIST                                            100000       2711
PSWORKLIST         WL_CLOSED                           90742       2430
PSWORKLIST         WL_OPEN                              9258        281
There is a notional entry in DBA_IND_STATISTICS for the index on the closed partition, but it says that it holds no rows and consumes no blocks. The index-level statistics for index PSBPSWORKLIST are an estimate of the total for the index if all partitions were indexed (although in fact, the B-tree level would actually still have been 1 if I had built the index in my test case).
SELECT index_name, partition_name, num_rows, blevel, leaf_blocks
FROM   dba_ind_statistics
WHERE  table_name = 'PSWORKLIST'
ORDER BY index_name, partition_name nulls first
/
INDEX_NAME         PARTITION_NAME                   NUM_ROWS     BLEVEL LEAF_BLOCKS
------------------ ------------------------------ ---------- ---------- -----------
PSBPSWORKLIST                                         100000          2         318
                   WL_CLOSED                               0          0           0
                   WL_OPEN                              9258          1          27

PS_PSWORKLIST                                         100000          2         814
The index segment for the closed partition does not physically exist, so it is not reported in DBA_SEGMENTS.
SELECT segment_type, segment_name, partition_name, blocks
FROM   dba_segments
WHERE  segment_name like 'PS_PSWORKLIST'
ORDER BY segment_name
/
SEGMENT_TYPE       SEGMENT_NAME                   PARTITION_NAME                     BLOCKS
------------------ ------------------------------ ------------------------------ ----------
INDEX PARTITION    PSBPSWORKLIST                  WL_OPEN                               128
INDEX              PS_PSWORKLIST                                                        896
When active working requests are queried, the index may be used.
SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(1);

Plan hash value: 3105966310
----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |               |     9 |  1953 |    11   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |               |     9 |  1953 |    11   (0)| 00:00:01 |     1 |     1 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST    |     9 |  1953 |    11   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                        | PSBPSWORKLIST |     9 |       |     1   (0)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------------------------------
However, the optimizer may still judge that it is easier to full scan the small table partition.
SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus IN(0,1);

Plan hash value: 1913856494
-----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            |     9 |  1953 |    86   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE INLIST|            |     9 |  1953 |    86   (0)| 00:00:01 |KEY(I) |KEY(I) |
|*  2 |   TABLE ACCESS FULL    | PSWORKLIST |     9 |  1953 |    86   (0)| 00:00:01 |KEY(I) |KEY(I) |
-----------------------------------------------------------------------------------------------------
A query on closed requests can only full scan the unindexed partition.
SELECT * FROM psworklist WHERE oprid = 'OPRID042' AND inststatus = 2;

Plan hash value: 597831193
-----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            |    86 | 18662 |   718   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|            |    86 | 18662 |   718   (1)| 00:00:01 |     2 |     2 |
|*  2 |   TABLE ACCESS FULL    | PSWORKLIST |    86 | 18662 |   718   (1)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------------
A query across both partitions may choose to use the index where it is available and full scan where it is not.
SELECT * FROM psworklist WHERE oprid = 'OPRID042';

Plan hash value: 3927567812
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |               |   101 | 21917 |   730   (1)| 00:00:01 |       |       |
|   1 |  VIEW                                        | VW_TE_2       |   101 | 58580 |   730   (1)| 00:00:01 |       |       |
|   2 |   UNION-ALL                                  |               |       |       |            |          |       |       |
|   3 |    PARTITION RANGE SINGLE                    |               |    10 |  2170 |    12   (0)| 00:00:01 |     1 |     1 |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PSWORKLIST    |    10 |  2170 |    12   (0)| 00:00:01 |     1 |     1 |
|*  5 |      INDEX RANGE SCAN                        | PSBPSWORKLIST |    10 |       |     2   (0)| 00:00:01 |     1 |     1 |
|   6 |    PARTITION RANGE SINGLE                    |               |    91 | 19747 |   718   (1)| 00:00:01 |     2 |     2 |
|*  7 |     TABLE ACCESS FULL                        | PSWORKLIST    |    91 | 19747 |   718   (1)| 00:00:01 |     2 |     2 |
------------------------------------------------------------------------------------------------------------------------------
In this case, I cannot partition the unique index because the partitioning column does not appear in it. So that must remain a global non-partitioned index.
CREATE UNIQUE  INDEX PS_PSWORKLIST ON PSWORKLIST (BUSPROCNAME,
   ACTIVITYNAME,
   EVENTNAME,
   WORKLISTNAME,
   INSTANCEID) 
/

Conclusion

  • Make sure you understand what your application is doing. 
  • Match the partitioning to the way the application accesses data so that the application queries prune partitions. Even if that means that it is harder to archive data. 
  • If you are not getting partition elimination, you probably should not be partitioning. 
  • Range and list partitioning keep similar data values together, so it follows that dissimilar data values are kept apart in different segments. That can avoid I/O during scans, but if it keeps transactions apart it can also avoid read consistency. 
  • Hash partitioning spreads data out across segments and can be used to avoid some forms of contention.
  • Partitioning can separate data with different usage profiles, such as active rows from inactive rows. They might then have different indexing requirements. 
  • Partial indexing of partitioned tables allows you to choose which partitions should be built in a locally partitioned index.

No comments :