Friday, July 15, 2016

Complete Refresh of Materialized Views: Atomic, Non-Atomic and Out-of-Place

Recently, and more than once, I have found myself explaining exactly how an Oracle database performs a complete refresh of a materialized view, and what are the implications.  So I thought I would set it out in a blog post.
I am only looking at the case where it is not possible to incrementally refresh using on a materialized view log because the materialized view joins tables together, groups data, or breaches one of the other restrictions (see General Restrictions on Fast Refresh).

Behaviour of DBMS_MVIEW.REFRESH

Materialized views are managed by the delivered DBMS_MVIEW package.  The REFRESH procedure can refresh one or more materialized views.  I want to discuss at the effect of the ATOMIC_REFRESH parameter.  This parameter has been present since at least Oracle 8i when materialized views were called snapshots, so none of this is new.
DBMS_MVIEW.REFRESH (
   { list                 IN     VARCHAR2,
   | tab                  IN     DBMS_UTILITY.UNCL_ARRAY,}
…
   atomic_refresh         IN     BOOLEAN        := true,
…
);
This is what the Oracle 12c documentation says:
"If this parameter is set to true, then the list of materialized views is refreshed in a single transaction. All of the refreshed materialized views are updated to a single point in time. If the refresh fails for any of the materialized views, none of the materialized views are updated.
If this parameter is set to false, then each of the materialized views is refreshed non-atomically in separate transactions.
As part of complete refresh, if truncate is used (non-atomic refresh), unique index rebuild is executed. INDEX REBUILD automatically computes statistics. Thus, statistics are updated for truncated tables."
It is possible to see the behaviour of the refresh procedure by using SQL trace.  My tests were run on a 12c database.  I have cut a lot out of the traces to make them easier to read.

Trace of Atomic Refresh

  • All constraints are deferred to the end of the transaction which is not until all the specified materialized views are refreshed.
  • For each materialized view, all rows are deleted
  • Then the complete content is reinserted.
  • If more than one materialised view is refreshed in a refresh group, or by specifying them in a single execution of refresh, they are refreshed in a single transaction.
Trace file /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_27100_ATOMIC.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
System name: Linux
Node name: localhost.localdomain
Release: 3.8.13-68.1.3.el7uek.x86_64
Version: #2 SMP Wed Apr 22 11:51:54 PDT 2015
Machine: x86_64
=====================
PARSING IN CURSOR #140647220341720 len=73 dep=0 uid=156 oct=47 lid=156 tim=2204121046481 hv=3058798929 ad='c86e9600' sqlid='1rz3cakv534aj'
BEGIN DBMS_MVIEW.REFRESH('MY_EXAMPLE2','C', atomic_refresh=>TRUE); END;
END OF STMT
PARSE #140647220341720:c=0,e=139,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2204121046475
=====================
PARSING IN CURSOR #140647220211192 len=28 dep=1 uid=156 oct=90 lid=156 tim=2204121059425 hv=3879155062 ad='c88f49d0' sqlid='231qv0rmmfdbq'
SET CONSTRAINTS ALL DEFERRED
END OF STMT
PARSE #140647220211192:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204121059418
EXEC #140647220211192:c=0,e=72,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204121059610
CLOSE #140647220211192:c=0,e=9,dep=1,type=0,tim=2204121059737
=====================
PARSING IN CURSOR #140647218302768 len=34 dep=1 uid=156 oct=7 lid=156 tim=2204121175365 hv=812824424 ad='c86e1658' sqlid='46t32c0s75dv8'
 delete from "SCOTT"."MY_EXAMPLE2"
END OF STMT
PARSE #140647218302768:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2149302637,tim=2204121175360
EXEC #140647218302768:c=0,e=303,p=0,cr=3,cu=2,mis=0,r=1,dep=1,og=1,plh=2149302637,tim=2204121175801
pw=0 time=93 us cost=2 size=0 card=1)'
=====================
PARSING IN CURSOR #140647218302768 len=177 dep=1 uid=156 oct=2 lid=156 tim=2204121185717 hv=1441390243 ad='c86d91c8' sqlid='9ctg639aymrp3'
INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "SCOTT"."MY_EXAMPLE2" SELECT 2 id
,      sysdate my_system_date
,      count(*) X
from kill_cpu
connect by n > prior n
start with n = 1
END OF STMT
PARSE #140647218302768:c=9000,e=8028,p=0,cr=8,cu=0,mis=1,r=0,dep=1,og=1,plh=2203518424,tim=2204121185711
*** 2016-07-11 14:44:24.152
EXEC #140647218302768:c=11189000,e=11191474,p=0,cr=4,cu=3,mis=0,r=1,dep=1,og=1,plh=2203518424,tim=2204132377380
CLOSE #140647218302768:c=0,e=18,dep=1,type=0,tim=2204132377809

Trace of Non-Atomic Refresh

  • Again, all constraints are deferred to the end of the transaction, which this time is only as far as the truncate statement.
  • An exclusive lock is obtained on the materialized view to ensure that there are no in-flight transactions on it.
  • The materialized view is then truncated.  The purge materialized view log option is specified in case another materialized view is based on this materialized view
  • The materialized view is repopulated, but in direct path mode because the APPEND hint is specified.
  • Stats are automatically collected on a 12c database (note this statement is one level deeper).
Trace file /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_27100_NOTATOMIC.trc
=====================
PARSING IN CURSOR #140647224204240 len=74 dep=0 uid=156 oct=47 lid=156 tim=2204109695504 hv=2544739527 ad='c8c758b8' sqlid='fm9xzbfbuv967'
BEGIN DBMS_MVIEW.REFRESH('MY_EXAMPLE1','C', atomic_refresh=>FALSE); END;
END OF STMT
PARSE #140647224204240:c=0,e=124,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2204109695500
=====================
PARSING IN CURSOR #140647223494264 len=28 dep=1 uid=156 oct=90 lid=156 tim=2204109716722 hv=3879155062 ad='c88f49d0' sqlid='231qv0rmmfdbq'
SET CONSTRAINTS ALL DEFERRED
END OF STMT
PARSE #140647223494264:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204109716718
EXEC #140647223494264:c=0,e=8,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=2204109716836
CLOSE #140647223494264:c=0,e=15,dep=1,type=0,tim=2204109717294 
=====================
PARSING IN CURSOR #140647218294296 len=59 dep=2 uid=156 oct=26 lid=156 tim=2204109881174 hv=846098398 ad='c8831a50' sqlid='8td4wj8t6wvyy'
LOCK TABLE "SCOTT"."MY_EXAMPLE1" IN EXCLUSIVE MODE  NOWAIT 
END OF STMT
PARSE #140647218294296:c=1000,e=759,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=1,plh=0,tim=2204109881168
EXEC #140647218294296:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=1,plh=0,tim=2204109881317
CLOSE #140647218294296:c=0,e=11,dep=2,type=0,tim=2204109881369
=====================
PARSING IN CURSOR #140647218302768 len=56 dep=1 uid=156 oct=85 lid=156 tim=2204109882394 hv=4078491682 ad='c8838320' sqlid='1hkq5cgtjjq12'
 truncate table "SCOTT"."MY_EXAMPLE1" purge snapshot log
END OF STMT
PARSE #140647218302768:c=3000,e=3255,p=0,cr=6,cu=1,mis=1,r=0,dep=1,og=1,plh=1592878302,tim=2204109882390
=====================
PARSING IN CURSOR #140647218302768 len=185 dep=1 uid=156 oct=2 lid=156 tim=2204109979170 hv=3331603390 ad='c8824c90' sqlid='bag14pr398fxy'
INSERT /*+ BYPASS_RECURSIVE_CHECK APPEND  */ INTO "SCOTT"."MY_EXAMPLE1" SELECT 1 id
,      sysdate my_system_date
,      count(*) X
from kill_cpu
connect by n > prior n
start with n = 1
END OF STMT
PARSE #140647218302768:c=58000,e=58028,p=0,cr=178,cu=0,mis=1,r=0,dep=1,og=1,plh=1420092967,tim=2204109979166
CLOSE #140647224167168:c=1000,e=608,dep=2,type=1,tim=2204120957626
=====================
PARSING IN CURSOR #140647224532056 len=124 dep=2 uid=0 oct=47 lid=0 tim=2204120958103 hv=1731287439 ad='d54ed0e0' sqlid='827tjqdmm2qcg'
BEGIN    dbms_stats.postprocess_stats(:owner, :tabname, :partname,      :subpartname, 2, :rawstats, :selmap, :clist);  END; 
END OF STMT
PARSE #140647224532056:c=0,e=128,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=0,tim=2204120958098

Comparison

While there are lots of advantages to non-atomic refresh, but perhaps the most significant disadvantage for developers is that the data disappears during refresh.  I think it is this that leads to widespread use of atomic refresh, but in the next section I show how this can be overcome.

Atomic Refresh Non-Atomic Refresh
Delete, Insert, Commit Truncate, Insert /*+APPEND*/
Space occupied by deleted rows not available to be reused until the delete is committed, which is not until after the insert. High water marks reset by truncate. All space in table released immediately. Table likely to be about half size of atomically refreshed materialized view.
Significant undo (and therefore also redo on undo) overhead. Minimal undo overhead.
Delete and Insert performed row-by-row. Indexes are maintained during DML. Insert performed in direct path mode. Faster. Indexes rebuilt when insert committed.
Table can be compressed by explicitly rebuilding after refresh, or requires Advanced Compression Licence. Table data can be compressed using basic compression without needing Advanced Compression Licence.
This would also work well with Hybrid Columnar Compression (HCC) on Exadata.
Statistics are not collected. It may be necessary to collect them manually. In 12c, table statistics will automatically be updated.
Content of materialized view always available due to read consistency. Materialized view will appear to be empty while refresh process is running.
Refresh of multiple materialized views can occur in a single transaction so that they remain consistent. Each materialized view refreshed independently.

Non-Atomic Refresh Without Temporarily Losing the Data

In this section I want to explain how to evade the problem of the data disappearing from the materialized view during non-atomic refresh.  Like all the best conjuring tricks, you require two materialized views!  First, I will describe a workaround for use with Oracle 11g or earlier, then I will look at a new feature in Oracle 12c.
I want to create a simple stand-alone demonstration.  I need a materialized view that will take several seconds to refresh so I can check whether the data is still available during that process.  So, I need a long running query, for which I am using Jonathan Lewis's kill_cpu script.  On my database, it runs for around 10 seconds.
set timi on
DROP TABLE kill_cpu PURGE
/
create table kill_cpu(n primary key) organization index as 
select rownum from dual connect by level <=1
/
--this is Jonathan Lewis' killcpu query - duration doubles for every row in kill_cpu table - so the MVs take time to refresh
select count(*) X 
from kill_cpu 
connect by n > prior n 
start with n = 1 
/
I am going to create two identical materialized views based on (almost) the same query.
--create 2 identical MVs.  this time they will be fast to build
DROP MATERIALIZED VIEW my_example1;
DROP MATERIALIZED VIEW my_example2;

CREATE MATERIALIZED VIEW my_example1
COMPRESS PCTFREE 0
BUILD IMMEDIATE
AS 
SELECT 1 id
,      sysdate my_system_date
,      count(*) X 
from kill_cpu 
connect by n > prior n 
start with n = 1 
/

CREATE MATERIALIZED VIEW my_example2
COMPRESS PCTFREE 0
BUILD IMMEDIATE
AS 
SELECT 2 id
,      sysdate my_system_date
,      count(*) X 
from kill_cpu 
connect by n > prior n 
start with n = 1 
/
--this makes the query in the MVs slow to execute
TRUNCATE TABLE kill_cpu;
INSERT INTO kill_cpu
SELECT rownum FROM dual CONNECT BY LEVEL <=24
/
commit
/
I can see my materialized views and when they were last refreshed
alter session set nls_date_Format = 'hh24:mi:ss dd.mm.yyyy';
--list MVs and last refresh dates
column mview_name format a20
select mview_name, last_refresh_date, last_refresh_end_time
from user_mviews
order by last_refresh_date
/

MVIEW_NAME           LAST_REFRESH_DATE   LAST_REFRESH_END_TI
-------------------- ------------------- -------------------
MY_EXAMPLE1          14:44:01 11.07.2016 14:44:12 11.07.2016
MY_EXAMPLE2          14:44:12 11.07.2016 14:44:23 11.07.2016
The following PL/SQL block
  • refreshes the older version of the two materialized views
  • It then recreates a view on the freshly refreshed materialized view.
    • The application should reference this view, and so it will always point to the most up-to-date materialized view.
    • The view will refer to the older materialized until the refresh is complete so the data will never disappear.
    • Complex view merging means that any query will seamlessly reference the underlying table and the performance will be as if the application had referred to the materialized view.
    • In-flight queries will not be affected by the view being rebuilt. They will continue to run against the now older materialized view.
--refresh MVs, collect stats, rebuild view, 
set serveroutput on timi on
DECLARE
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
  l_mview_owner VARCHAR2(128) := 'SCOTT';
  l_mview_name  VARCHAR2(128);
  l_sql CLOB;
BEGIN
  dbms_application_info.read_module(module_name=>l_module, action_name=>l_action);
  dbms_application_info.set_module(module_name=>'MV_EXAMPLE', action_name=>'Initialise');
  WITH x AS (
    SELECT ROW_NUMBER() OVER (ORDER BY last_refresh_date) as ranking
    ,      mview_name, last_refresh_date  
    FROM   user_mviews
    WHERE  owner = l_mview_owner
    AND    mview_name IN('MY_EXAMPLE1','MY_EXAMPLE2')
  ) 
  SELECT mview_name
  INTO   l_mview_name
  FROM   x
  WHERE  ranking = 1;

  dbms_application_info.set_action(action_name=>'Refresh '||l_mview_name);  
  dbms_output.put_line('About to refresh mview '||l_mview_name);
  DBMS_MVIEW.REFRESH(l_mview_name,'C', atomic_refresh=>FALSE);

  --add stats collection here – not necessary in 12c
  --dbms_application_info.set_action(action_name=>'Collect Stats '||l_mview_name);  
  --dbms_stats.gather_Table_stats(ownname=>l_mview_owner, tabname=>l_mview_name, cascade=>TRUE);

  --rebuild view to point to MV just refreshed  
  dbms_application_info.set_action(action_name=>'Rebuild view MY_EXAMPLE');  
  l_sql:= 'CREATE OR REPLACE VIEW my_example AS SELECT * FROM '||l_mview_name;
  dbms_output.put_line('SQL:'||l_sql);
  EXECUTE IMMEDIATE l_sql;

  --reinstate initial settings for module and action
  dbms_application_info.set_module(module_name=>l_module, action_name=>l_action);
END;
/
select mview_name, last_refresh_date, last_refresh_end_time
from user_mviews
order by last_refresh_date
/
The first time I ran the script MY_EXAMPLE1 was refreshed and the view was rebuilt to point to it.
About to refresh mview MY_EXAMPLE1
SQL:CREATE OR REPLACE VIEW my_example AS SELECT * FROM MY_EXAMPLE1

MVIEW_NAME           LAST_REFRESH_DATE   LAST_REFRESH_END_TI
-------------------- ------------------- -------------------
MY_EXAMPLE2          14:44:12 11.07.2016 14:44:23 11.07.2016
MY_EXAMPLE1          10:39:57 12.07.2016 10:40:06 12.07.2016
The next time I ran the script MY_EXAMPLE2 was refreshed and the view was again rebuilt.
About to refresh mview MY_EXAMPLE2
SQL:CREATE OR REPLACE VIEW my_example AS SELECT * FROM MY_EXAMPLE2

MVIEW_NAME           LAST_REFRESH_DATE   LAST_REFRESH_END_TI
-------------------- ------------------- -------------------
MY_EXAMPLE1          10:39:57 12.07.2016 10:40:06 12.07.2016
MY_EXAMPLE2          10:40:50 12.07.2016 10:40:59 12.07.2016

Summary 

There is a trade-off to this approach, but I suggest that the advantages of this approach to using non-atomic refresh outweigh the additional complexities.
Advantages Disadvantanges
Data is always available during materialized view refresh. The refresh and view rebuild process is more complex, though it could be encapsulated into a PL/SQL procedure.
Each materialized view will be half the size of the atomically refresh materialized view because space is freed by the delete is only available after the commit. So the overall space requirement is about the same as atomic refresh. Two copies of the materialized view and its indexes are required.
Refresh will be faster because indexes are not maintained during the insert, but are rebuilt afterwards.
Queries no longer run on view being refreshed, so database does not have to do consistent reads of data blocks. This should prevent query performance being degraded during refresh.
Basic table compression (or HCC on Exadata) could make it considerably smaller. This should also improve query performance by reducing I/O.
I have not reserved any free space in the table in my demonstration because I know the table will never be updated. Further saving space and reducing I/O.

Out-of-Place Refresh in Oracle 12c

Out-of-place refresh is a new option on DBMS_MVIEW.REFRESH in Oracle 12c.  The complete refresh process builds a new table which seamlessly becomes the materialized view, the old table is dropped.  It only works in conjunction with non-atomic refresh.  It also works with incremental (or fast) refresh.
As with my 11g workaround described above, two copies of the data are used. The refresh process creates and populates a new table with indexes, exchanges the new with the old, and drops the old table. For this example, I have enabled the recycle bin so we can see the dropped table.
exec DBMS_MVIEW.REFRESH('MY_EXAMPLE1','C', atomic_refresh=>FALSE, out_of_place=>TRUE);
select object_id, object_type, object_name, last_ddl_time
from user_objects
where object_name LIKE 'MY_EXAMPLE1'
order by 1
/
select original_name, createtime, droptime, dropscn, base_object, purge_object
from user_recyclebin
order by dropscn
/
Each time I refresh the materialized view, the table gets a new object ID, and a new object appears in the recycle bin.
 OBJECT_ID OBJECT_TYPE             OBJECT_NAME     LAST_DDL_TIME
---------- ----------------------- --------------- -------------------
    112366 MATERIALIZED VIEW       MY_EXAMPLE1     09:50:06 14.07.2016
    112382 TABLE                   MY_EXAMPLE1     08:40:34 14.07.2016

ORIGINAL_NAME   CREATETIME          DROPTIME               DROPSCN BASE_OBJECT
--------------- ------------------- ------------------- ---------- -----------
RV$1B6FD        2016-07-14:09:43:43 2016-07-14:09:50:05   68878082      112381
The object ID of the new entry is the previous object ID of the materialized view. A new table was created and the old table was dropped. The intermediate table name is RV$ followed by the object ID in hexadecimal.
 OBJECT_ID OBJECT_TYPE             OBJECT_NAME     LAST_DDL_TIME
---------- ----------------------- --------------- -------------------
    112366 MATERIALIZED VIEW       MY_EXAMPLE1     09:51:41 14.07.2016
    112383 TABLE                   MY_EXAMPLE1     08:40:34 14.07.2016

ORIGINAL_NAME   CREATETIME          DROPTIME               DROPSCN BASE_OBJECT
--------------- ------------------- ------------------- ---------- -----------
RV$1B6FD        2016-07-14:09:43:43 2016-07-14:09:50:05   68878082      112381
RV$1B6FE        2016-07-14:09:50:03 2016-07-14:09:51:40   68878546      112382
A trace of the refresh process shows the data being inserted into another table. However, note that it is a conventional insert, not a direct-path insert. So statistics will not be collected automatically. The DDL statements at create and drop interim tables are truncated in the trace, but can be reinstated with event 10979, level 1 (see Jonathan Lewis's 12c MView refresh blog).  This event also causes additional progress information about the refresh process to be written to the trace file.
PARSING IN CURSOR #139759670421744 len=178 dep=1 uid=156 oct=2 lid=156 tim=2412454493901 hv=3909484904 ad='d56b62a8' sqlid='0k8chjrnhc0b8'
/* MV_REFRESH (ITB) */INSERT INTO "SCOTT"."RV$1B6E0"  SELECT 1 id
,      sysdate my_system_date
,      n
--,      count(*) X
from kill_cpu
connect by n > prior n
start with n = 1
END OF STMT

Summary

Out-of-Place refresh has some advantages and disadvantages over my 11g method.
AdvantagesDisadvantanges
The entire refresh process is encapsulated within the DBMS_MVIEW.REFRESH procedure.
Physical tablespace is still required to hold two copies of the table, but now only during the refresh. The original table is dropped after the new one is built.  If the recycle bin is enabled, every out-of-place refresh will preserve another copy of the table and any indexes. I recommend disabling the recycle bin.
The data is always available while the materialized view is being refreshed. Long running queries running across the out-of-place refresh will continue to run correctly, at least until the data blocks used to hold the dropped table are overwritten, when an "ORA_08103: object no longer exists" error will be raised (again, see Jonathan Lewis's 12c MView refresh blog).
The table is populated in conventional rather than direct path mode. Although the indexes are built after population. This has a number of consequences.
  • The performance of the conventional insert will not be as good as direct-path insert.
  • Simple table compression will not have any effect. HCC is probably not an option.
  • Table statistics are not automatically collected by On-Line Statistics Gathering, but the indexes will have statistics computed as they are rebuilt during each refresh.