"JSON (JavaScript Object Notation) is a text-based format for storing and exchanging data in a way that’s both human-readable and machine-parsable. … it has grown into a very capable data format that simplifies data interchange across diverse platforms and programming languages."
I have created a demo project that integrates Strava (an app that tracks athletes' activities – in my case, cycling) with an Oracle Autonomous database that does some spatial data processing. The Strava APIs all return data in JSON. My application, written in PL/SQL, parses those messages.
Mostly, I want to hold that data in a regular database table, structured conventionally in reasonably named columns. In different places, I have loaded that JSON data in different ways, depending on requirements and circumstances. There are three options to choose from:
- Directly through a JSON Duality View
- Convert each name-value pair in explicit code
- Extract Values from JSON in Virtual Columns
Directly through a JSON Duality View
Oracle introduced JSON Relational Duality in Oracle 23ai. A JSON duality view is a mapping between table data and JSON documents. It is possible to extract data from a table as a JSON document simply by querying a duality view based on that table. It is also possible to insert data into the table through the duality views.
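To make the idea concrete before the Strava example, here is a minimal, self-contained sketch (table and view names are illustrative, not from the demo project) showing both directions: a JSON document inserted through a duality view lands as a relational row, and querying the view returns each row as a JSON document.

```sql
-- Illustrative only: a tiny table and a duality view over it (Oracle 23ai syntax).
CREATE TABLE demo_t
(id  VARCHAR2(10) NOT NULL
,val VARCHAR2(30)
,CONSTRAINT demo_t_pk PRIMARY KEY (id)
);

CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW demo_t_dv AS
SELECT JSON {'_id' : t.id
            ,'val' : t.val}
FROM demo_t t
WITH INSERT UPDATE DELETE;

-- Insert a JSON document; it is stored as a relational row in demo_t.
INSERT INTO demo_t_dv VALUES ('{"_id" : "x1", "val" : "hello"}');

-- Query the view; each row comes back as a JSON document.
SELECT d.data FROM demo_t_dv d;
```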
For example, Strava tracks my gear (the bike I ride, or the shoes I wear). I can extract details of each item with the Strava API, and I get a simple JSON document in return. This is what I get for one of my bikes.
{ "id" : "b4922223"
, "primary" : false
, "name" : "Saracen"
, "nickname" : "Saracen"
, "resource_state" : 3
, "retired" : false
, "distance" : 1321925
, "converted_distance" : 1321.9
, "brand_name" : null
, "model_name" : null
, "frame_type" : 3
, "description" : ""
, "weight" : 14
}

I want to import that into a table in my database that corresponds to that document.
CREATE TABLE gear
(gear_id        VARCHAR2(20) NOT NULL
,primary        BOOLEAN
,name           VARCHAR2(60)
,nickname       VARCHAR2(60)
,resource_state INTEGER
,retired        BOOLEAN
,distance_m     INTEGER
,distance_km    NUMBER
,brand_name     VARCHAR2(60)
,model_name     VARCHAR2(60)
--frame_type
,description    CLOB
,weight         NUMBER
,last_updated   TIMESTAMP DEFAULT SYSTIMESTAMP
,CONSTRAINT gear_pk PRIMARY KEY(gear_id)
);
I can use a JSON duality view to make the JSON document correspond to my table structure, rather than code it explicitly.
My table uses the column GEAR_ID as the primary key, but the Strava JSON document just has an ID. A JSON duality view must have a key value called '_id'. I will map that to the primary key column in this table.
All the other name-value pairs map to the corresponding columns in the table. However, I am not bothering to import 'frame_type'.
CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW gear_dv AS
SELECT JSON {'_id' : g.gear_id
,'primary' : g.primary
,'name' : g.name
,'nickname' : g.nickname
,'resource_state' : g.resource_state
,'retired' : g.retired
,'distance' : g.distance_m
,'converted_distance' : g.distance_km
,'brand_name' : g.brand_name
,'model_name' : g.model_name
--frame_type
,'description' : g.description
,'weight' : g.weight
}
FROM gear g
WITH INSERT UPDATE
/

The JSON document arrives as an HTTP response and is held in a CLOB variable. That has to be parsed into a JSON object with JSON_OBJECT_T.PARSE(). Then I either update an existing record or insert a new one. In either case, that is done via the duality view. Note that:
- The Strava id name is updated to _id to match the duality view, and it has to be _id.
- I have removed the frame_type and notification_distance name-value pairs.
j_obj := JSON_OBJECT_T.parse(l_clob);
l_id  := j_obj.get_string('id');

BEGIN
  SELECT * INTO r_gear FROM gear WHERE gear_id = p_gear_id FOR UPDATE;
EXCEPTION
  WHEN no_data_found THEN null;
END;

IF r_gear.gear_id = p_gear_id THEN
  UPDATE gear_dv d
  SET    d.data = JSON_TRANSFORM (value
                 ,RENAME '$.id' = '_id'
                 ,REMOVE '$.frame_type'
                 ,REMOVE '$.notification_distance')
  FROM   JSON_TABLE(l_clob, '$[*]' COLUMNS (value CLOB FORMAT JSON PATH '$'))
  WHERE  d.data."_id" = l_id;
ELSE
  INSERT INTO gear_dv
  SELECT JSON_TRANSFORM (value
        ,RENAME '$.id' = '_id'
        ,REMOVE '$.frame_type'
        ,REMOVE '$.notification_distance')
  FROM   JSON_TABLE(l_clob, '$[*]' COLUMNS (value CLOB FORMAT JSON PATH '$'));
END IF;

This approach works well where the JSON document structure closely matches the database table structure, and where I don't have to convert the inbound data with any function.
However, if, for example, the name of the gear had to be upper case, I might put that into the definition of the duality view, thus
CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW gear_dv AS
SELECT JSON {'_id' : g.gear_id
,'name' : UPPER(g.name)
…
}
FROM gear g WITH INSERT UPDATE
/

But then I would not be able to insert the name via the duality view. I wouldn't get an error; the view simply wouldn't process the name column. It would be null after an insert, or left unchanged by an update.

While the duality view is a very elegant way to map the data, it has limitations. As soon as you need to transform data during the import, you probably have to go back to coding the mapping for each name-value pair.
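This limitation is easy to demonstrate. A sketch, assuming the view has been redefined with UPPER(g.name) as above (the gear id 'b999' is an illustrative value):

```sql
-- With 'name' derived via UPPER() in the view, the field is effectively
-- read-only through the view: this insert succeeds without error...
INSERT INTO gear_dv VALUES ('{"_id" : "b999", "name" : "test bike"}');

-- ...but the underlying NAME column is not populated (it is null).
SELECT name FROM gear WHERE gear_id = 'b999';
```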
See also
- Oracle 26 Documentation: JSON-Relational Duality Developer's Guide
- That Jeff Smith
Convert each name-value pair
The more conventional approach is to copy each name-value pair to a column using one of the get functions, sometimes passing the value through a function, and possibly with logic to determine whether to copy the data.
Here, I have used a row-type variable and selected the whole current row from the database before updating the data values:
BEGIN
  SELECT * INTO r_activities FROM activities WHERE activity_id = p_activity_id FOR UPDATE;
EXCEPTION
  WHEN no_data_found THEN r_activities.activity_id := p_activity_id;
END;
…
j_obj := JSON_OBJECT_T.parse(l_clob);
r_activities.activity_id      := j_obj.get_number('id');
r_activities.athlete_id       := j_obj.get_object('athlete').get_number('id');
r_activities.start_date_utc   := iso8601_utc(j_obj.get_string('start_date'));
r_activities.start_date_local := iso8601_tz(j_obj.get_string('start_date_local'), j_obj.get_string('timezone'));
…
r_activities.distance_km := j_obj.get_number('distance')/1000;
r_activities.gear_id     := j_obj.get_string('gear_id');
IF r_activities.type IN ('Ride','Walk','Hike','VirtualRide','Run') THEN
  j_subobj := j_obj.get_object('gear');
  IF j_subobj IS NOT NULL THEN
    r_activities.gear_name := j_subobj.get_string('name');
  END IF;
END IF;
…
r_activities.photo_count := j_obj.get_object('photos').get_number('count');
…

Then the entire row can be inserted or updated at the end from the row-type variable.
BEGIN
  INSERT INTO activities VALUES p_activities;
  dbms_output.put_line(sql%rowcount||' activity inserted');
  COMMIT;
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN
    UPDATE activities
    SET    ROW = p_activities
    WHERE  activity_id = p_activities.activity_id;
    dbms_output.put_line(sql%rowcount||' activity updated');
    COMMIT;
END;
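The helper functions iso8601_utc and iso8601_tz referenced above are not shown in this post. As an illustration of what iso8601_utc might look like (my sketch, not the project's actual code), Strava returns UTC times in ISO 8601 form such as 2018-02-16T14:52:54Z, which can be parsed with a format model containing literal T and Z:

```sql
CREATE OR REPLACE FUNCTION iso8601_utc
(p_string IN VARCHAR2
) RETURN TIMESTAMP DETERMINISTIC IS
BEGIN
  -- e.g. '2018-02-16T14:52:54Z' -> a TIMESTAMP in UTC
  RETURN TO_TIMESTAMP(p_string, 'YYYY-MM-DD"T"HH24:MI:SS"Z"');
END iso8601_utc;
/
```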
Extract Values from JSON in Virtual Columns
The other option is to store the JSON in a CLOB column in the database and convert it on demand via virtual columns. I have subscribed to receive a message from Strava whenever I log a new activity, or update or delete an existing one. That message is received by the database using a REST service.
The message from Strava just tells me that an activity has been created, updated or deleted. Then I have to process it. Sometimes, I get multiple messages for the same activity in quick succession.
{ "aspect_type": "update"
, "event_time": 1516126040
, "object_id": 1360128428
, "object_type": "activity"
, "owner_id": 134815
, "subscription_id": 120475
, "updates": { "title": "Messy" }
}

Strava requires that the REST service respond within 2 seconds, so any processing in it must be kept light. I want to avoid:
- spending time converting the JSON data while the REST service is running,
- any malformed or unexpected variation in JSON causing an error in the REST service,
- concurrent processing of different requests relating to the same activity causing one REST service handler to block another.
Therefore, my REST service just stores the JSON in a CLOB column on a table and then triggers a scheduler job to process the message. The subsequent processing needs to access the name-values in the JSON, so I have created virtual columns on the queue table that will only be evaluated on demand.
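The REST handler therefore does little more than an insert and a job submission. A minimal sketch of the tail of such a handler, assuming the webhook_events queue table described below and an already-created scheduler job (the job name and bind variable are illustrative):

```sql
-- Store the raw message as-is; the virtual columns are not evaluated here.
INSERT INTO webhook_events (payload) VALUES (:body_text);
COMMIT;

-- Kick off asynchronous processing in a separate session, so the
-- REST handler can return within Strava's 2-second limit.
DBMS_SCHEDULER.run_job(job_name => 'PROCESS_WEBHOOK_EVENTS', use_current_session => FALSE);
```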
CREATE TABLE webhook_events
(id                NUMBER GENERATED ALWAYS AS IDENTITY
,payload           CLOB
,processing_status NUMBER DEFAULT 0 NOT NULL
…
,CONSTRAINT webhook_events_pk PRIMARY KEY (id)
);

ALTER TABLE webhook_events ADD aspect_type GENERATED ALWAYS AS (JSON_VALUE(payload, '$."aspect_type"')) VIRTUAL;
ALTER TABLE webhook_events ADD object_type GENERATED ALWAYS AS (JSON_VALUE(payload, '$."object_type"')) VIRTUAL;
ALTER TABLE webhook_events ADD object_id NUMBER GENERATED ALWAYS AS (JSON_VALUE(payload, '$."object_id"')) VIRTUAL;

In Strava, times are held in Unix 'Epoch Time' (the number of non-leap seconds since midnight UTC on 1st January 1970). I have created a deterministic PL/SQL function to convert it to an Oracle timestamp and have referenced it in my virtual column definition.
One virtual column cannot reference another. So, I could not reference the virtual column EVENT_TIME in another virtual column EVENT_TIMESTAMP. Instead, I had to reference the event_time name-value pair in both column definitions.
CREATE OR REPLACE FUNCTION strava.epoch_to_tstz
(p_epoch_seconds IN NUMBER
) RETURN TIMESTAMP WITH TIME ZONE DETERMINISTIC IS
BEGIN
  RETURN TO_TIMESTAMP_TZ('1970-01-01 00:00:00 UTC', 'YYYY-MM-DD HH24:MI:SS TZR')
         + NUMTODSINTERVAL(p_epoch_seconds, 'SECOND');
END epoch_to_tstz;
/

ALTER TABLE webhook_events ADD event_time NUMBER GENERATED ALWAYS AS (JSON_VALUE(payload, '$."event_time"')) VIRTUAL;
ALTER TABLE webhook_events ADD event_timestamp TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS (epoch_to_tstz(JSON_VALUE(payload, '$."event_time"'))) VIRTUAL;
…

If the message is an update, it contains a JSON object listing the updated items and their new values. The column updates contains this JSON document.
ALTER TABLE webhook_events ADD updates GENERATED ALWAYS AS (JSON_QUERY(payload, '$."updates"' )) VIRTUAL;
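Since updates is itself a JSON fragment, individual changed values can be picked out of it with a further JSON_VALUE call. A sketch, based on the sample message above, extracting the new activity title from pending update events:

```sql
SELECT JSON_VALUE(updates, '$."title"') AS new_title
FROM   webhook_events
WHERE  aspect_type = 'update';
```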
I can now reference the virtual columns in SQL in the queue handler, without converting and storing the values in regular columns.
…
FOR i IN ( --iterate requests
SELECT h.*, a.activity_id
FROM webhook_events h
LEFT OUTER JOIN activities a ON a.activity_id = h.object_id
WHERE h.processing_status = 0
AND h.object_type = 'activity'
ORDER BY h.id
FOR UPDATE OF h.processing_status
) LOOP
…
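Inside the loop, each message is processed and then flagged so it is not picked up again on the next run. A minimal sketch of the end of the loop body (the status value is illustrative; the project's actual status codes are not shown):

```sql
  -- After the activity has been refreshed from the Strava API,
  -- mark this queued message as processed.
  UPDATE webhook_events
  SET    processing_status = 1   -- illustrative status code
  WHERE  id = i.id;
END LOOP;
COMMIT;
```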