Thursday, March 26, 2026

Loading and Processing JSON with PL/SQL

This blog is part of a series about aspects and features of Oracle 26 and Autonomous Transaction Processing Database.

"JSON (JavaScript Object Notation) is a text-based format for storing and exchanging data in a way that’s both human-readable and machine-parsable. … it has grown into a very capable data format that simplifies data interchange across diverse platforms and programming languages."

I have created a demo project that integrates Strava (an app that tracks athletes' activities – in my case, cycling) with an Oracle Autonomous database that does some spatial data processing.  The Strava APIs all return data in JSON.  My application, written in PL/SQL, parses those messages.

Mostly, I want to hold that data in a regular database table, structured conventionally in reasonably named columns.  In different places, I have loaded that JSON data in different ways, depending on requirements and circumstances.  There are three options to choose from:
  1. Directly through a JSON Duality View
  2. Convert each name-value pair in explicit code
  3. Extract Values from JSON in Virtual Columns

Directly through a JSON Duality View

Oracle introduced JSON Relational Duality in Oracle 23ai.  A JSON duality view is a mapping between table data and JSON documents.  It is possible to extract data from a table as a JSON document simply by querying a duality view based on that table.  It is also possible to insert data into the table through the duality views.
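A minimal sketch shows both directions (the table and view names here are illustrative, assuming Oracle 23ai or later):
```sql
-- Hypothetical example table and duality view (illustrative names)
CREATE TABLE demo_gear
(gear_id VARCHAR2(20) NOT NULL
,name    VARCHAR2(60)
,CONSTRAINT demo_gear_pk PRIMARY KEY(gear_id)
);

CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW demo_gear_dv AS
SELECT JSON {'_id'  : g.gear_id
            ,'name' : g.name}
FROM demo_gear g WITH INSERT UPDATE;

-- Inserting a JSON document through the view creates a table row...
INSERT INTO demo_gear_dv VALUES ('{"_id" : "b123", "name" : "Brompton"}');

-- ...and querying the view returns each row as a JSON document.
SELECT d.data FROM demo_gear_dv d;
```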

For example, Strava tracks my gear (the bike I ride, or the shoes I wear).  I can extract details of each item with the Strava API, and I get a simple JSON document in return.  This is what I get for one of my bikes.
{
  "id" : "b4922223",
  "primary" : false,
  "name" : "Saracen",
  "nickname" : "Saracen",
  "resource_state" : 3,
  "retired" : false,
  "distance" : 1321925,
  "converted_distance" : 1321.9,
  "brand_name" : null,
  "model_name" : null,
  "frame_type" : 3,
  "description" : "",
  "weight" : 14
}
I want to import that into a table in my database that corresponds to that document.
CREATE TABLE gear 
(gear_id            VARCHAR2(20) NOT NULL
,is_primary         BOOLEAN      -- "primary" is an Oracle reserved word, so the column is renamed
,name               VARCHAR2(60) 
,nickname           VARCHAR2(60) 
,resource_state     INTEGER
,retired            BOOLEAN
,distance_m         INTEGER      
,distance_km        NUMBER       
,brand_name         VARCHAR2(60) 
,model_name         VARCHAR2(60)
--frame_type
,description        CLOB
,weight             NUMBER       
,last_updated       TIMESTAMP DEFAULT SYSTIMESTAMP
,CONSTRAINT gear_pk PRIMARY KEY(gear_id)
);
I can use a JSON duality view to make the JSON document correspond to my table structure, rather than code it explicitly.
My table uses the column GEAR_ID as its primary key, but the Strava JSON document just has an id.  A JSON duality view must identify each document with a key called '_id', so I map that to the primary key column of this table.  
All the other name-value pairs each map to the corresponding column in the table.  However, I am not bothering to import frame_type.
CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW gear_dv AS
SELECT JSON {'_id'    : g.gear_id
,'primary'            : g.is_primary
,'name'               : g.name
,'nickname'           : g.nickname
,'resource_state'     : g.resource_state
,'retired'            : g.retired
,'distance'           : g.distance_m
,'converted_distance' : g.distance_km
,'brand_name'         : g.brand_name
,'model_name'         : g.model_name
--frame_type
,'description'        : g.description
,'weight'             : g.weight
}
FROM gear g
WITH INSERT UPDATE
/
The JSON document arrives as an HTTP response and is held in a CLOB variable.  That has to be parsed into a JSON object with JSON_OBJECT_T.PARSE().  Then I either update an existing record or insert a new one; in either case, that is done via the duality view.  Note that:
  • The Strava id name is renamed to _id, because the duality view requires that key.  
  • The frame_type and notification_distance name-value pairs are removed.
  j_obj := JSON_OBJECT_T.parse(l_clob);
  l_id  := j_obj.get_string('id');

  BEGIN
    SELECT * INTO r_gear FROM gear WHERE gear_id = p_gear_id FOR UPDATE;
  EXCEPTION
    WHEN no_data_found THEN null;
  END;
  
  IF r_gear.gear_id = p_gear_id THEN
    UPDATE gear_dv d
    SET    d.data = JSON_TRANSFORM
           (l_clob
           ,RENAME '$.id' = '_id'
           ,REMOVE '$.frame_type'
           ,REMOVE '$.notification_distance'
           )
    WHERE  d.data."_id" = l_id;
  ELSE
    INSERT INTO gear_dv
    SELECT JSON_TRANSFORM
           (value
           ,RENAME '$.id' = '_id'
           ,REMOVE '$.frame_type'
           ,REMOVE '$.notification_distance'
           )
    FROM JSON_TABLE(
           l_clob,
           '$[*]'
           COLUMNS (
             value CLOB FORMAT JSON PATH '$'
           )
     );
  END IF;
This approach works well where the JSON document structure closely matches the database table structure, and where I don't have to convert the inbound data with any function.  
However, if, for example, the name of the gear had to be upper case, I might put that into the definition of the duality view, thus
CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW gear_dv AS
SELECT JSON {'_id'    : g.gear_id
,'name'               : UPPER(g.name)
…
}
FROM gear g WITH INSERT UPDATE
/
But then I would not be able to insert the name via the duality view.  I wouldn't get an error; the view simply wouldn't process the name column, so it would be null after an insert and unchanged after an update.
While the duality view is a very elegant way to map the data, it has limitations.  As soon as you need to transform data during the import, you probably have to go back to coding the mapping for each name-value pair.


Convert each name-value pair in explicit code

The more conventional approach is to copy each name-value pair to a column using one of the JSON_OBJECT_T get functions, sometimes passing the value through a function, and possibly with logic to determine whether to copy the data.
Here, I have used a row-type variable and selected the whole current row from the database before updating the data values:
  BEGIN
    SELECT * INTO r_activities 
    FROM   activities 
    WHERE  activity_id = p_activity_id 
    FOR UPDATE;
  EXCEPTION
    WHEN no_data_found THEN r_activities.activity_id := p_activity_id;
  END;
…
  j_obj := JSON_OBJECT_T.parse(l_clob);

  r_activities.activity_id       := j_obj.get_number('id');
  r_activities.athlete_id        := j_obj.get_object('athlete').get_number('id');
  r_activities.start_date_utc    := iso8601_utc(j_obj.get_string('start_date'));
  r_activities.start_date_local  := iso8601_tz(j_obj.get_string('start_date_local'), j_obj.get_string('timezone'));
…
  r_activities.distance_km       := j_obj.get_number('distance')/1000;

  r_activities.gear_id           := j_obj.get_string('gear_id');
  IF r_activities.type IN('Ride','Walk','Hike','VirtualRide','Run') THEN
    j_subobj                     := j_obj.get_object('gear');
    IF j_subobj IS NOT NULL THEN
      r_activities.gear_name     := j_subobj.get_string('name');
    END IF;
  END IF;
…
  r_activities.photo_count       := j_obj.get_object('photos').get_number('count');
…
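The iso8601_utc helper used above is not shown in this post; one possible implementation (an assumption, not necessarily the application's actual code) parses Strava's UTC strings, which always look like '2018-02-16T14:52:54Z':
```sql
-- Hypothetical sketch of the iso8601_utc helper referenced above.
-- Strava's start_date values are always UTC with a trailing 'Z',
-- so the first 19 characters can be parsed directly as a timestamp.
CREATE OR REPLACE FUNCTION iso8601_utc
(p_iso8601 IN VARCHAR2
) RETURN TIMESTAMP DETERMINISTIC IS
BEGIN
  RETURN TO_TIMESTAMP(SUBSTR(p_iso8601,1,19), 'YYYY-MM-DD"T"HH24:MI:SS');
END iso8601_utc;
/
```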
Then the entire row can be inserted or updated at the end from the row-type variable.
  BEGIN  
    INSERT INTO activities VALUES r_activities;
    dbms_output.put_line(sql%rowcount||' activity inserted');
    COMMIT;

  EXCEPTION 
    WHEN DUP_VAL_ON_INDEX THEN
      UPDATE activities
      SET    ROW = r_activities
      WHERE  activity_id = r_activities.activity_id;
      dbms_output.put_line(sql%rowcount||' activity updated');   
      COMMIT;
  END;

Extract Values from JSON in Virtual Columns

The other option is to store the JSON in a CLOB column in the database and convert it on demand via virtual columns.  I have subscribed to receive a message from Strava whenever I log a new activity, or update or delete an existing one.  That message is received by the database using a REST service.  
The message from Strava just tells me that an activity has been created, updated or deleted.  Then I have to process it.  Sometimes, I get multiple messages for the same activity in quick succession.
{
    "aspect_type": "update",
    "event_time": 1516126040,
    "object_id": 1360128428,
    "object_type": "activity",
    "owner_id": 134815,
    "subscription_id": 120475,
    "updates": {
        "title": "Messy"
    }
}
Strava requires that the REST service respond within 2 seconds, so any processing in it must be kept light.  I want to avoid:
  • spending time converting the JSON data while the REST service is running,
  • any malformed or unexpected variation in JSON causing an error in the REST service,
  • concurrent processing of different requests relating to the same activity causing one REST service handler to block another.  
Therefore, my REST service just stores the JSON in a CLOB column on a table and then triggers a scheduler job to process the message.  The subsequent processing needs to access the name-values in the JSON, so I have created virtual columns on the queue table that will only be evaluated on demand.
CREATE TABLE webhook_events
(ID                NUMBER GENERATED ALWAYS AS IDENTITY
,PAYLOAD           CLOB
,processing_status NUMBER DEFAULT 0 NOT NULL
…
,CONSTRAINT webhook_events_pk PRIMARY KEY (id)
);
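The REST handler body can therefore stay minimal: store the payload, commit, and start a one-off job.  A sketch, assuming a processing procedure called process_webhook_events (an illustrative name; :body_text is the ORDS bind variable holding the request body):
```sql
-- Sketch of the lightweight REST handler (illustrative names).
BEGIN
  INSERT INTO webhook_events (payload) VALUES (:body_text);
  COMMIT;
  -- A one-off, auto-dropping job processes the message asynchronously,
  -- keeping the handler well inside Strava's 2-second limit.
  dbms_scheduler.create_job
  (job_name   => dbms_scheduler.generate_job_name('WEBHOOK_')
  ,job_type   => 'STORED_PROCEDURE'
  ,job_action => 'process_webhook_events'
  ,enabled    => TRUE
  );
END;
/
```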

ALTER TABLE webhook_events ADD aspect_type      GENERATED ALWAYS AS (JSON_VALUE(payload, '$."aspect_type"')) VIRTUAL;
ALTER TABLE webhook_events ADD object_type      GENERATED ALWAYS AS (JSON_VALUE(payload, '$."object_type"')) VIRTUAL;
ALTER TABLE webhook_events ADD object_id NUMBER GENERATED ALWAYS AS (JSON_VALUE(payload, '$."object_id"'  )) VIRTUAL;
In Strava, times are held in Unix 'Epoch Time' (the number of non-leap seconds since midnight UTC on 1st Jan 1970).  I have created a deterministic PL/SQL function to convert it to an Oracle timestamp and have referenced it in my virtual column definition.  
One virtual column cannot reference another.  So, I could not reference the virtual column EVENT_TIME in another virtual column EVENT_TIMESTAMP.  Instead, I had to reference the event_time name-value pair in both column definitions.
CREATE OR REPLACE FUNCTION strava.epoch_to_tstz 
(p_epoch_seconds IN NUMBER
) RETURN TIMESTAMP WITH TIME ZONE DETERMINISTIC IS
BEGIN
  RETURN TO_TIMESTAMP_TZ('1970-01-01 00:00:00 UTC', 'YYYY-MM-DD HH24:MI:SS TZR') 
       + NUMTODSINTERVAL(p_epoch_seconds, 'SECOND');
END epoch_to_tstz;
/
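As a quick check, the event_time from the example message above converts as expected:
```sql
-- 1516126040 is the event_time in the example webhook message
SELECT epoch_to_tstz(1516126040) FROM dual;
-- expect 2018-01-16 18:07:20 UTC (display depends on NLS settings)
```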

ALTER TABLE webhook_events ADD event_time NUMBER 
   GENERATED ALWAYS AS (JSON_VALUE(payload, '$."event_time"' )) VIRTUAL;
ALTER TABLE webhook_events ADD event_timestamp TIMESTAMP WITH TIME ZONE 
   GENERATED ALWAYS AS (epoch_to_tstz(JSON_VALUE(payload, '$."event_time"'))) VIRTUAL;
…
If the message is an update, it contains a JSON object listing the updated items and their new values.  The column updates contains this JSON document.
ALTER TABLE webhook_events ADD updates GENERATED ALWAYS AS (JSON_QUERY(payload, '$."updates"' )) VIRTUAL;
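Since updates is itself a JSON document, individual items can be picked out of it with JSON_VALUE; for the example message above, this returns the new title:
```sql
SELECT w.id
,      JSON_VALUE(w.updates, '$."title"') AS new_title  -- 'Messy' in the example message
FROM   webhook_events w
WHERE  w.aspect_type = 'update';
```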
I can now reference the virtual columns in SQL in the queue handler without converting and storing the values in regular columns.
…
  FOR i IN ( --iterate over unprocessed requests
    SELECT h.*, a.activity_id
    FROM webhook_events h
      LEFT OUTER JOIN activities a ON a.activity_id = h.object_id 
    WHERE h.processing_status = 0
    AND   h.object_type = 'activity'
    ORDER BY h.id
    FOR UPDATE OF h.processing_status 
  ) LOOP
…
