Friday, January 26, 2024

Just because the execution plan says INMEMORY, it doesn't mean it is using In-Memory

Parallel Query

If you are using RAC, and you have in-memory objects populated across nodes (i.e. distribution by ROWID range) or you have objects populated in only 1 node (i.e. distribution by partition or sub-partition) then you need to use parallel query to access data populated on a node to which the query is not connected. 

  • There is no cache fusion with Database In-Memory.  Oracle does not ship In-Memory Compression Units (IMCUs) across the RAC interconnect.
  • Similarly, if you have set PARALLEL_FORCE_LOCAL=TRUE the parallel query will not be able to access the remote nodes.
In-memory improves performance by avoiding physical I/O, but the reduction in CPU consumption can be more significant.  In the cloud, this can save money by reducing your cloud subscription costs.  However, parallel query can be a brutal way of using CPU to complete a query faster.  It often increases total CPU consumption, thus negating some of the benefits of in-memory.

Options:

A query that is not executing in parallel will only be able to access objects in the local in-memory store.  You can ensure that a segment is stored in the in-memory store on every RAC node by specifying DUPLICATE ALL.  Parallel queries will also use the local in-memory store.  
  • This option can improve performance but the in-memory stores consume more memory.  On a 2-node RAC database, it doubles the memory consumption of In-Memory.
  • The DUPLICATE option is only available on Exadata.  On other platforms, it is ignored (see also Oracle Database In-Memory on RAC - Part 3).
Alternatively, you can use database services to create node affinity.  
  • A process can connect using a database service that specifies a specific node or nodes.  
  • Parallel queries can be restricted to specific nodes by setting PARALLEL_INSTANCE_GROUP to use a service (see also Oracle Database In-Memory on RAC - Part 2).
  • In-memory segments can be placed in the in-memory store on specific nodes by distributing them with a specific service (see also How to control where objects are populated into [In-]memory on RAC).
  • You may prefer to create different services for the query processes and in-memory population processes.  In the case of node failure, you probably want the query process connection to fail over to another node.  However, you may not want that to happen for in-memory distribution processes because of the additional memory overhead.
Otherwise, on a 2-node RAC, a non-parallel query has a 50% chance of finding the segment in the in-memory store because it has a 50% chance of connecting to the node where it is stored!

Is It Using In Memory?

I am going to demonstrate this using a table with 2 partitions.
CREATE TABLE t (a number, b number, c VARCHAR2(1000)) PARTITION BY RANGE (b)
(partition t1 VALUES LESS THAN(50)       
,partition t2 VALUES LESS THAN(MAXVALUE)
) INMEMORY;
INSERT INTO t SELECT level, MOD(level,100),  RPAD(TO_CHAR(TO_DATE(level,'j'),'Jsp'),100,'.')
FROM DUAL CONNECT BY LEVEL <= 1e5;
commit;

Serial Query

I am going to generate execute plans for two similar queries that each query different partitions of a table. The execution plans have the same plan hash value. The only difference is that the first query accesses only the first partition, and the second query only accesses the second partition.  
Both plans claim they are doing an INMEMORY full scan of the table.  However, this is only a statement of intent.
explain plan for SELECT sum(a), sum(b), count(*) FROM t WHERE b=42;
…
Plan hash value: 2993254470
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    26 |    11   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE              |      |     1 |    26 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE     |      |   942 | 24492 |    11   (0)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS INMEMORY FULL| T    |   942 | 24492 |    11   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------------
explain plan for SELECT sum(a), sum(b), count(*) FROM t WHERE b=56;
…
Plan hash value: 2993254470
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    26 |    11   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE              |      |     1 |    26 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE     |      |   926 | 24076 |    11   (0)| 00:00:01 |     2 |     2 |
|*  3 |    TABLE ACCESS INMEMORY FULL| T    |   926 | 24076 |    11   (0)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------------
Oracle, distributes the partitions across the in-memory stores in RAC nodes.  In my case, the first partition is on instance 1, and the second partition is on instance 2.
select inst_id, owner, segment_name, partition_name, inmemory_size, bytes, bytes_not_populated, populate_status, inmemory_duplicate
from gv$im_segments where segment_name = 'T' order by inst_id;

   INST_ID OWNER      SEGMENT_NAME PARTITION_NAME INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED POPULATE_STAT INMEMORY_DUPL
---------- ---------- ------------ -------------- ------------- ---------- ------------------- ------------- -------------
         1 SYSADM     T            T1                   6422528    8241152                   0 COMPLETED     NO DUPLICATE
         2 SYSADM     T            T2                   6422528    8241152                   0 COMPLETED     NO DUPLICATE
I will run the queries, capturing the session statistics to a temporary table. I use the Oracle-delivered global temporary table plan_table so that I don't have to create my own table.
delete from plan_table;
insert into plan_table (statement_id, plan_id, id, cost, parent_id) select '_', s.* from v$mystat s;

SELECT sum(a), sum(b), count(*) FROM t WHERE b=42;

insert into plan_table (statement_id, plan_id, id, cost, parent_id) select 'A', s.* from v$mystat s;

SELECT sum(a), sum(b), count(*) FROM t WHERE b=56;

insert into plan_table (statement_id, plan_id, id, cost, parent_id) select 'B', s.* from v$mystat s;
Then I can simply query where the IM statistics are different.
with x (scenario, sid, statistic#, value) as (select statement_id, plan_id, id, cost from plan_table)
select x.statistic#, n.name 
, a.value-x.value diff_a
, b.value-a.value diff_b
from v$statname n, x, x a, x b
where x.scenario = '_'
and x.sid = a.sid and x.statistic# = a.statistic# and a.scenario = 'A'
and x.sid = b.sid and x.statistic# = b.statistic# and b.scenario = 'B'
and (x.value < a.value OR a.value < b.value)
and n.statistic# = x.statistic#
and n.name like 'IM %' and not n.name like 'IM %populate%'
order by x.statistic#;
I only got an in-memory query on instance 2. On instance 1, there is a single IM scan segments disk operation. This is the 'number of times a segment marked for in-memory was accessed entirely from the buffer cache/direct read' (see Popular Statistics with Database In-Memory). This indicates that there was no in-memory query. 
STATISTIC# NAME                                                   DIFF_A     DIFF_B
---------- -------------------------------------------------- ---------- ----------
       772 IM scan CUs no cleanout                                     0          1
       802 IM scan CUs current                                         0          1
       830 IM scan CUs readlist creation accumulated time              0          2
       832 IM scan CUs readlist creation number                        0          1
       838 IM scan delta - only base scan                              0          1
      1376 IM scan CUs pcode aggregation pushdown                      0          3
      1377 IM scan rows pcode aggregated                               0       1000
      1379 IM scan CUs pcode pred evaled                               0          1
      1385 IM scan dict engine results reused                          0          3
      1480 IM scan CUs memcompress for query low                       0          1
      1493 IM scan segments disk                                       1          0
      1494 IM scan bytes in-memory                                     0    5940559
      1495 IM scan bytes uncompressed                                  0    5444950
      1496 IM scan CUs columns accessed                                0          2
      1498 IM scan CUs columns theoretical max                         0          3
      1505 IM scan rows                                                0      50000
      1506 IM simd compare calls                                       0          3
      1512 IM simd decode unpack calls                                 0          6
      1513 IM simd decode symbol calls                                 0          2
      1520 IM simd decode unpack selective calls                       0          6
      1527 IM scan rows valid                                          0      50000
      1533 IM scan rows projected                                      0          1
      1538 IM scan CUs split pieces                                    0          1
      1571 IM scan CUs predicates received                             0          1
      1572 IM scan CUs predicates applied                              0          1
      1577 IM scan segments minmax eligible                            0          1
      1611 IM SubCU-MM CUs Examined                                    0          1

Parallel Query 

I will repeat the test, but use a parallel hint to enable parallel query.
SELECT /*+PARALLEL*/ sum(a), sum(b), count(*) FROM t WHERE b=42;
Now, I get a parallel execution plan
Plan hash value: 943991435
 
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |     1 |    26 |     6   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE                 |          |     1 |    26 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR                |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |     1 |    26 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE              |          |     1 |    26 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR          |          |   942 | 24492 |     6   (0)| 00:00:01 |     1 |     1 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS INMEMORY FULL| T        |   942 | 24492 |     6   (0)| 00:00:01 |     1 |     1 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------
The IM statistics show that both the queries performed an in-memory query.
STATISTIC# NAME                                                   DIFF_A     DIFF_B
---------- -------------------------------------------------- ---------- ----------
       772 IM scan CUs no cleanout                                     3          3
       802 IM scan CUs current                                         3          3
       830 IM scan CUs readlist creation accumulated time              4          4
       832 IM scan CUs readlist creation number                        3          3
       838 IM scan delta - only base scan                              3          3
      1376 IM scan CUs pcode aggregation pushdown                      9          9
      1377 IM scan rows pcode aggregated                            1000       1000
      1379 IM scan CUs pcode pred evaled                               3          3
      1385 IM scan dict engine results reused                          9          9
      1480 IM scan CUs memcompress for query low                       3          3
      1494 IM scan bytes in-memory                              17819283   17821701
      1495 IM scan bytes uncompressed                           16328826   16334850
      1496 IM scan CUs columns accessed                                6          6
      1498 IM scan CUs columns theoretical max                         9          9
      1505 IM scan rows                                           150000     150000
      1506 IM simd compare calls                                       9          9
      1512 IM simd decode unpack calls                                18         18
      1513 IM simd decode symbol calls                                 6          6
      1520 IM simd decode unpack selective calls                      18         18
      1527 IM scan rows valid                                      50000      50000
      1529 IM scan rows range excluded                            100000     100000
      1533 IM scan rows projected                                      3          3
      1538 IM scan CUs split pieces                                    6          3
      1571 IM scan CUs predicates received                             3          3
      1572 IM scan CUs predicates applied                              3          3
      1577 IM scan segments minmax eligible                            3          3
      1611 IM SubCU-MM CUs Examined                                    3          3

Duplicate In-Memory Store 

This time, I will repeat the test with a duplicate in-memory store. The DUPLICATE option stores the segment in the in-memory store on one other RAC node, the DUPLICATE ALL option stores it on all RAC nodes. On a 2-node RAC they come to the same thing.
CREATE TABLE t (a number, b number, c VARCHAR2(1000)) PARTITION BY RANGE (b)
(partition t1 VALUES LESS THAN(50)       
,partition t2 VALUES LESS THAN(MAXVALUE)
) INMEMORY DUPLICATE ALL;
Now, both partitions are stored on both instances.
select inst_id, owner, segment_name, partition_name, inmemory_size, bytes, bytes_not_populated, populate_status, inmemory_duplicate
from gv$im_segments where segment_name = 'T' order by inst_id, segment_name, partition_name;

   INST_ID OWNER      SEGMENT_NAME PARTITION_NAME INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED POPULATE_STAT INMEMORY_DUPL
---------- ---------- ------------ -------------- ------------- ---------- ------------------- ------------- -------------
         1 SYSADM     T            T1                   6422528    8241152                   0 COMPLETED     DUPLICATE
         1 SYSADM     T            T2                   6422528    8241152                   0 COMPLETED     DUPLICATE
         2 SYSADM     T            T1                   6422528    8241152                   0 COMPLETED     DUPLICATE
         2 SYSADM     T            T2                   9568256    8241152                   0 COMPLETED     DUPLICATE 
I will return to the original queries without the parallel hints
SELECT sum(a), sum(b), count(*) FROM t WHERE b=42;
SELECT sum(a), sum(b), count(*) FROM t WHERE b=56;
The in-memory statistics are the same for both queries, indicating that an in-memory query was successfully performed for both partitions because both partitions are stored in in-memory on both instances.
STATISTIC# NAME                                                   DIFF_A     DIFF_B
---------- -------------------------------------------------- ---------- ----------
       772 IM scan CUs no cleanout                                     1          1
       802 IM scan CUs current                                         1          1
       830 IM scan CUs readlist creation accumulated time              3          2
       832 IM scan CUs readlist creation number                        1          1
       838 IM scan delta - only base scan                              1          1
      1376 IM scan CUs pcode aggregation pushdown                      3          3
      1377 IM scan rows pcode aggregated                            1000       1000
      1379 IM scan CUs pcode pred evaled                               1          1
      1385 IM scan dict engine results reused                          3          3
      1480 IM scan CUs memcompress for query low                       1          1
      1494 IM scan bytes in-memory                               5939777    5940563
      1495 IM scan bytes uncompressed                            5442942    5444950
      1496 IM scan CUs columns accessed                                2          2
      1498 IM scan CUs columns theoretical max                         3          3
      1505 IM scan rows                                            50000      50000
      1506 IM simd compare calls                                       3          3
      1512 IM simd decode unpack calls                                 6          6
      1513 IM simd decode symbol calls                                 2          2
      1520 IM simd decode unpack selective calls                       6          6
      1527 IM scan rows valid                                      50000      50000
      1533 IM scan rows projected                                      1          1
      1538 IM scan CUs split pieces                                    2          2
      1571 IM scan CUs predicates received                             1          1
      1572 IM scan CUs predicates applied                              1          1
      1577 IM scan segments minmax eligible                            1          1
      1611 IM SubCU-MM CUs Examined                                    1          1

TL;DR 

The presence of an in-memory operation in an execution plan does not mean that the statement is definitely using in-memory. Rather, it means that it will use an in-memory query if it finds the segment in the in-memory store, and that content is up to date. 
Look at the session level statistics to determine whether the query really did use in-memory as I have demonstrated in this blog, or a SQL Monitor active report (see Oracle Database In-Memory on RAC - Part 1 (revised)). 
Parallel query is used to access an object stored in in-memory on a different node remote to where the session is connected. If the query is not run in parallel it will not be able to access it. This will be indicated by the ' IM scan segments disk' statistics. Alternatives are to duplicate the in-memory store on Exadata or to use services to create node affinity.

My thanks to Andy Rivenes for the initial comment that sent me off into this subject, and to the various articles that he and Maria Colgan have posted on Oracle Database In-Memory blog that I have linked in this note.

No comments :