Thursday, February 18, 2021

Spatial Data 3. Analyse a track in proximity to a GPS route

This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.  All of my files are available on GitHub.

Swain's Lane, Highgate
Now I have loaded some data, I am going to start to do something useful with it.  I go out on my bike most mornings, and I usually ride up Swain's Lane in Highgate three times.  How long did each one take?  Over time, have I got faster or slower?

I need a definition of Swain's Lane that I can compare to.  I will start by drawing a route with my favourite GPS software.  A route is just a sequence of route points.  I can then export that as a GPX file.

<?xml version="1.0" encoding="UTF-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <rte>
    <name><![CDATA[Swain's World]]></name>
    <rtept lat="51.569613039632" lon="-0.14770468632509"></rtept>
    <rtept lat="51.569407978151" lon="-0.14832964102552"></rtept>
    <rtept lat="51.567090552402" lon="-0.14674177328872"></rtept>
    <rtept lat="51.567080548869" lon="-0.14592101733016"></rtept>
    <rtept lat="51.569618041121" lon="-0.14773419062425"></rtept>
  </rte>
</gpx>

Geometries Table

I will load the GPX route into a table much as I did with the track files. 
drop table my_geometries purge;

createg table my_geometries
(geom_id    NUMBER NOT NULL 
,descr      VARCHAR2(64)
,gpx        XMLTYPE
,geom       mdsys.sdo_geometry
,geom_27700 mdsys.sdo_geometry
,mbr        mdsys.sdo_geometry
,constraint my_geometries_pk PRIMARY KEY (geom_id)
)
XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW)
/
The difference is that I have a series of route points instead of track points, so the paths in extract() and extractvalue() are slightly different.
delete from my_geometries where geom_id = 2;
INSERT INTO my_geometries (geom_id, descr, gpx) 
VALUES (2,'Swains World Route', XMLTYPE(strava_pkg.getClobDocument('STRAVA','swainsworldroute.gpx')));

UPDATE my_geometries
SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
cast(multiset(
  select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
  from (
    SELECT /*+MATERIALIZE*/ rownum rn
    ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon')) as lng
    ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat')) as lat
    FROM   my_geometries g,
           TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t
    where g.geom_id = 2
    ) pt,
    (select 1 rn from dual union all select 2 from dual) n
	order by pt.rn, n.rn
  ) AS mdsys.sdo_ordinate_array))
WHERE gpx IS NOT NULL
AND   geom IS NULL
/
UPDATE my_geometries
SET mbr = sdo_geom.sdo_mbr(geom)
,   geom_27700 = sdo_cs.transform(geom,27700)
/

Commit;
Set pages 99 lines 180 
Select geom_id, descr, gpx, geom 
from my_geometries
where geom_id = 2;

   GEOM_ID DESCR
---------- ----------------------------------------------------------------
GPX
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         2 Swains World Route
<?xml version="1.0" encoding="US-ASCII"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xml
SDO_GEOMETRY(2002, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(-.14651114, 51.5670769, -.14649237, 51.567298, -.1465782, 51.567563, -.14680618, 51.5680165, -.14697
516, 51.5682533, -.14754379, 51.5688701, -.14807219, 51.5694887))
I am going to build spatial indexes on the geometry columns, so I need to define the upper and lower bound values on the coordinates.
delete from user_sdo_geom_metadata where table_name = 'ACTIVITIES';
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
values ( 
  'ACTIVITIES' , 'GEOM_27700',
  sdo_dim_array(
    sdo_dim_element('Easting',-1000000,1500000,0.05), 
    sdo_dim_element('Northing', -500000,2000000,0.05)),
  27700);
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
values ( 
  'ACTIVITIES' , 'GEOM',
  sdo_dim_array(
    sdo_dim_element('Longitude',-180,180,0.05), 
    sdo_dim_element('Latgitude',-90,90,0.05)),
  4326);
commit;

CREATE INDEX activities_geom ON ACTIVITIES (geom) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;
CREATE INDEX activities_geom_27700 ON ACTIVITIES (geom_27700) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;

Compare Geometries

Now I can compare my Swain's Lane geometry to my activity geometries.  Let's start by looking for rides in December 2020 that went up Swain's Lane
Column activity_id heading 'Activity|ID'
Column activity_name format a30
Column geom_relate heading 'geom|relate' format a6
With a as (
SELECT a.activity_id, a.activity_date, a.activity_name
,      SDO_GEOM.RELATE(a.geom,'anyinteract',g.geom,25) geom_relate
FROM   activities a
,      my_geometries g
WHERE  a.activity_type = 'Ride'
--And    a.activity_id IN(4468006769)
And    a.activity_date >= TO_DATE('01122020','DDMMYYYY')
and    g.geom_id = 2 /*Swains World Route*/
)
Select *
From   a
Where  geom_relate = 'TRUE'
Order by activity_date
/
Where there is a relation between the two geometries then I have a hit.
  Activity                                                    geom
        ID ACTIVITY_DATE       ACTIVITY_NAME                  relate
---------- ------------------- ------------------------------ ------
4419821750 08:44:45 02.12.2020 Loop                           TRUE
4428307816 10:49:25 04.12.2020 Loop                           TRUE
4431920358 09:41:13 05.12.2020 Loop                           TRUE
…
4528825613 09:39:38 28.12.2020 Loop                           TRUE
4534027888 11:29:45 29.12.2020 Loop                           TRUE
4538488655 09:57:55 30.12.2020 Loop                           TRUE

25 rows selected.
Updated 24.11.2021: But I also made a fundamental error here.  SDO_GEOM.RELATE is a spatial function and does not use the spatial index that I created.
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |      1 |        |    862 |00:00:29.88 |     118K|     20 |
|   1 |  SORT ORDER BY                |                  |      1 |     29 |    862 |00:00:29.88 |     118K|     20 |
|   2 |   NESTED LOOPS                |                  |      1 |     29 |    862 |00:00:23.49 |   93227 |     20 |
|   3 |    TABLE ACCESS BY INDEX ROWID| MY_GEOMETRIES    |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |
|*  4 |     INDEX UNIQUE SCAN         | MY_GEOMETRIES_PK |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
|*  5 |    TABLE ACCESS FULL          | ACTIVITIES       |      1 |     29 |    862 |00:00:23.49 |   93225 |     20 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("G"."GEOM_ID"=2)
   5 - filter(("A"."ACTIVITY_TYPE"='Ride' AND "SDO_GEOM"."RELATE"("A"."GEOM",'anyinteract',"G"."GEOM",25)='TRUE'))
Instead, I should use the SDO_ANYINTERACT() spatial operator.
SELECT a.activity_id, a.activity_date, a.activity_name
,      a.distance_km
FROM   activities a
,      my_geometries g
WHERE  a.activity_type = 'Ride'
and    g.geom_id = 2 /*Swains World Route*/
AND    SDO_ANYINTERACT(a.geom, g.geom) = 'TRUE'
Order by activity_date
And now it uses the spatial index on ACTIVITIES.
-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                  |      1 |        |    824 |00:00:07.04 |   34602 |     14 |
|   1 |  SORT ORDER BY                    |                  |      1 |      1 |    824 |00:00:07.04 |   34602 |     14 |
|   2 |   NESTED LOOPS                    |                  |      1 |      1 |    824 |00:00:07.04 |   34602 |     14 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | MY_GEOMETRIES    |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |
|*  4 |     INDEX UNIQUE SCAN             | MY_GEOMETRIES_PK |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
|*  5 |    TABLE ACCESS BY INDEX ROWID    | ACTIVITIES       |      1 |      1 |    824 |00:00:07.04 |   34600 |     14 |
|*  6 |     DOMAIN INDEX (SEL: 0.010000 %)| ACTIVITIES_GEOM  |      1 |        |    824 |00:00:07.03 |   33224 |     14 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("G"."GEOM_ID"=2)
   5 - filter("A"."ACTIVITY_TYPE"='Ride')
   6 - access("MDSYS"."SDO_ANYINTERACT"("A"."GEOM","G"."GEOM")='TRUE')

Analyse Individual Efforts

Now I want to analyse each of my trips up Swain's Lane on a particular day.  I am going to work with the GPX rather than the spatial geometry because I am interested also in time, elevation and heart rate data that is not stored in the spatial geometry.
Also, you can't use analytic functions on spatial geometries.
with x as (
SELECT activity_id
,      TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
FROM   activities a,
       TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
WHERE  a.activity_id IN(4468006769)
), y as (
select x.*, strava_pkg.make_point(lng,lat) loc
from x
)
select lag(loc,1) over (partition by activity_id order by time) last_loc
from   y
/

select lag(loc,1) over (partition by activity_id order by time) last_loc
           *
ERROR at line 13:
ORA-22901: cannot compare VARRAY or LOB attributes of an object type
Instead, I will have to apply analytic functions to the values extracted from the GPX and then create a spatial point.  Thus I will be able to calculate the length of each individual trip by aggregating the distance between each pair of points.
The following query splits out each trip up Swain's Lane in a particular activity and shows the distance, duration, and metrics about elevation, gradient, and heart rate. 
alter session set statistics_level=ALL;
alter session set nls_date_Format = 'hh24:mi:ss dd.mm.yyyy';
break on activity_id skip 1
compute sum of sum_dist on activity_id
compute sum of num_pt on activity_id
compute sum of sum_secs on activity_id
Set lines 180 pages 50 timi on
Column activity_id heading 'Activity|ID'
Column activity_name format a15
column time format a20
column lat format 999.99999999
column lng format 999.99999999
column ele format 9999.9
column hr format 999
column sdo_relate format a10
column num_pts heading 'Num|Pts' format 99999
column sum_dist heading 'Dist.|(km)' format 999.999
column sum_secs heading 'Secs' format 9999
column avg_speed heading 'Avg|Speed|(kmph)' format 99.9
column ele_gain heading 'Ele|Gain|(m)' format 9999.9
column ele_loss heading 'Ele|Loss|(m)' format 9999.9
column avg_grade heading 'Avg|Grade|%' format 99.9
column min_ele heading 'Min|Ele|(m)' format 999.9
column max_ele heading 'Max|Ele|(m)' format 999.9
column avg_hr heading 'Avg|HR' format 999
column max_hr heading 'Max|HR' format 999
WITH geo as ( /*route geometry to compare to*/
select /*MATERIALIZE*/ g.*, 25 tol
,      sdo_geom.sdo_length(geom, unit=>'unit=m') geom_length
from   my_geometries g
where  geom_id = 2 /*Swains World Route*/
), a as ( /*extract all points in activity*/
SELECT a.activity_id, g.geom g_geom, g.tol, g.geom_length
,      TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
       ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
FROM   activities a,
       geo g,
       TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt'
       ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'))) t
Where  a.activity_id IN(4468006769)
AND    SDO_ANYINTERACT(a.geom, g.geom) = 'TRUE' /*activity has relation to reference geometry*/
), b as ( /*smooth elevation*/
Select a.*
,      avg(ele) over (partition by activity_id order by time rows between 2 preceding and 2 following) avg_ele
From   a
), c as ( /*last point*/
Select b.*
,      row_number() over (partition by activity_id order by time) seq
,      lag(time,1) over (partition by activity_id order by time) last_time
,      lag(lat,1) over (partition by activity_id order by time) last_lat
,      lag(lng,1) over (partition by activity_id order by time) last_lng
--,      lag(ele,1) over (partition by activity_id order by time) last_ele
,      lag(avg_ele,1) over (partition by activity_id order by time) last_avg_ele
From   b
), d as ( /*make points*/
SELECT c.* 
,      strava_pkg.make_point(lng,lat) loc
,      strava_pkg.make_point(last_lng,last_lat) last_loc
FROM   c
), e as ( /*determine whether point is inside the polygon*/
select d.*
,      86400*(time-last_time) secs
,      avg_ele-last_avg_ele ele_diff
,      sdo_geom.sdo_distance(loc,last_loc,0.05,'unit=m') dist
,      SDO_GEOM.RELATE(loc,'anyinteract', g_geom, tol) sdo_relate
FROM   d
), f as (
select e.*
,      CASE WHEN sdo_relate != lag(sdo_relate,1) over (partition by activity_id order by time) THEN 1 END sdo_diff
from   e
), g as (
select f.*
,      SUM(sdo_diff) over (partition by activity_id order by time range between unbounded preceding and current row) sdo_seq
from f
where  sdo_relate = 'TRUE'
)
select activity_id, min(time), max(time)
, sum(dist)/1000 sum_dist
, sum(secs) sum_secs
, 3.6*sum(dist)/sum(secs) avg_speed
, sum(greatest(0,ele_diff)) ele_gain
, sum(least(0,ele_diff)) ele_loss
, 100*sum(ele_diff*dist)/sum(dist*dist) avg_grade
, min(ele) min_ele
, max(ele) max_ele
, sum(hr*secs)/sum(secs) avg_Hr
, max(hr) max_hr
, count(*) num_pts
from   g
group by activity_id, sdo_seq, g.geom_length
having sum(dist)>= g.geom_length/2 /*make sure line we find is longer than half route to prevent fragmentation*/
order by 2
/
select * from table(dbms_xplan.display_cursor(null,null,'ADVANCED +IOSTATS -PROJECTION +ADAPTIVE'))
/
4a_1swains.sql
  • In subquery a, I compare the geometry of the activity with the geometry of Swain's Lane using sdo_geom.relate() to confirm that the activity includes Swain's Lane, but then I extract all the points in the activity GPX.
  • GPS is optimised for horizontal accuracy.  Even so, the tolerance for determining whether the track is close to the route has to be set to 25m to allow for noise in the data (Swain's Lane is tree-lined, and has walls on both sides, that both attenuate the GPS signal).  GPS elevation data is notorious for being noisy even under good conditions; you can see this in the variation of height gained on each ascent.  Sub-query b calculates an average elevation across 5 track points (up to +/-2 points).  
  • I need to compare each point in the track to each previous point so I can do some calculations and determine when the track comes into proximity with the Swain's Lane route, subquery c uses analytic functions to determine the previous point.  It is not possible to apply the analytic function to a geometry.
  • Subquery e determines whether a track point is in proximity to the route.  The tolerance, 25m, is set in subquery geo.  Then subquery f flags where the track point is in proximity to the route and the previous one was not.  Finally, subquery g maintains a running total of the number of times the track has gone close enough to the route.  That becomes a sequence number for each ascent of Swain's Lane by which I can group the subsequent analytics.
                                                                     Avg     Ele     Ele   Avg    Min    Max
  Activity                                            Dist.        Speed    Gain    Loss Grade    Ele    Ele  Avg  Max   Num
        ID MIN(TIME)           MAX(TIME)               (km)  Secs (kmph)     (m)     (m)     %    (m)    (m)   HR   HR   Pts
---------- ------------------- ------------------- -------- ----- ------ ------- ------- ----- ------ ------ ---- ---- -----
4468006769 14:55:51 13.12.2020 14:58:17 13.12.2020     .372   147    9.1    36.1      .0   8.6   86.8  122.7  141  153   147
           15:08:13 13.12.2020 15:10:28 13.12.2020     .374   136    9.9    36.2      .0   8.4   86.8  122.8  147  155   136
           15:22:49 13.12.2020 15:25:18 13.12.2020     .369   150    8.9    36.2      .0   8.2   86.8  122.7  147  155   150
**********                                         -------- -----
sum                                                   1.116   433
On my laptop, this query takes about 10s, of which about 8s is spent on the window sort for the analytic functions, and 2s is spent working out whether the track points are in proximity to the route.
Plan hash value: 3042349692

---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name             | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                  |      2 |        |       |       |  5147 (100)|          |      6 |00:00:20.94 |     392 |
|   1 |  SORT ORDER BY                           |                  |      2 |      1 |   104 |       |  5147   (1)| 00:00:01 |      6 |00:00:20.94 |     392 |
|*  2 |   FILTER                                 |                  |      2 |        |       |       |            |          |      6 |00:00:20.94 |     392 |
|   3 |    HASH GROUP BY                         |                  |      2 |      1 |   104 |       |  5147   (1)| 00:00:01 |      6 |00:00:20.94 |     392 |
|   4 |     VIEW                                 |                  |      2 |   8168 |   829K|       |  5144   (1)| 00:00:01 |    866 |00:00:20.93 |     392 |
|   5 |      WINDOW SORT                         |                  |      2 |   8168 |    16M|    21M|  5144   (1)| 00:00:01 |    866 |00:00:20.93 |     392 |
|*  6 |       VIEW                               |                  |      2 |   8168 |    16M|       |  1569   (1)| 00:00:01 |    866 |00:00:20.93 |     392 |
|   7 |        WINDOW SORT                       |                  |      2 |   8168 |  1403K|  1688K|  1569   (1)| 00:00:01 |  10208 |00:00:09.63 |     392 |
|   8 |         VIEW                             |                  |      2 |   8168 |  1403K|       |  1252   (1)| 00:00:01 |  10208 |00:00:06.22 |     392 |
|   9 |          WINDOW SORT                     |                  |      2 |   8168 |  1021K|  1248K|  1252   (1)| 00:00:01 |  10208 |00:00:06.20 |     392 |
|  10 |           VIEW                           |                  |      2 |   8168 |  1021K|       |  1016   (1)| 00:00:01 |  10208 |00:00:06.05 |     392 |
|  11 |            WINDOW SORT                   |                  |      2 |   8168 |  4546K|  5040K|  1016   (1)| 00:00:01 |  10208 |00:00:00.76 |     392 |
|  12 |             NESTED LOOPS                 |                  |      2 |   8168 |  4546K|       |    31   (0)| 00:00:01 |  10208 |00:00:00.41 |     392 |
|  13 |              NESTED LOOPS                |                  |      2 |      1 |   560 |       |     2   (0)| 00:00:01 |      2 |00:00:00.03 |     104 |
|  14 |               TABLE ACCESS BY INDEX ROWID| MY_GEOMETRIES    |      2 |      1 |   112 |       |     1   (0)| 00:00:01 |      2 |00:00:00.01 |       4 |
|* 15 |                INDEX UNIQUE SCAN         | MY_GEOMETRIES_PK |      2 |      1 |       |       |     0   (0)|          |      2 |00:00:00.01 |       2 |
|* 16 |               TABLE ACCESS BY INDEX ROWID| ACTIVITIES       |      2 |      1 |   448 |       |     1   (0)| 00:00:01 |      2 |00:00:00.03 |     100 |
|* 17 |                INDEX UNIQUE SCAN         | ACTIVITIES_PK    |      2 |      1 |       |       |     0   (0)|          |      2 |00:00:00.01 |       4 |
|  18 |              XPATH EVALUATION            |                  |      2 |        |       |       |            |          |  10208 |00:00:00.37 |     288 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SUM("DIST")>="G"."GEOM_LENGTH"/2)
   6 - filter("SDO_RELATE"='TRUE')
  15 - access("GEOM_ID"=2)
  16 - filter("MDSYS"."SDO_ANYINTERACT"("A"."GEOM","G"."GEOM")='TRUE')
  17 - access("A"."ACTIVITY_ID"=4468006769)
I can apply this approach to all my trips up Swain's Lane.  However, I have logged 1115 ascents, and if I attempt to process them in a single SQL query I will have to do some very large window sorts that will spill out of memory (at least they will on my machine).  Instead, it is faster to process each activity separately in a PL/SQL loop (see 4x_allswains2.sql).
I now have a table containing all of my ascents of Swain's Lane and I can see if I am getting faster or slower.  I simply dumped the data into Excel with SQL Developer.  
Unfortunately, I have discovered that I am not going faster!

Friday, February 12, 2021

Spatial Data 2: Convert GPX Track to a Spatial Line Geometry

This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.  All of my files are available on GitHub.

Having loaded my GPS tracks from GPX files into an XML type column, the next stage is to extract the track points and create a spatial geometry column.  

Defining Spatial Geometries

Spatial objects are generically referred to as geometries.  When you define one, you have to specify what kind of geometry it is, and what coordinate system you are using. Later when you compare geometries to each other they have to use the same coordinate system. Otherwise, Oracle will raise an error.  Fortunately, Oracle can convert between coordinate systems.

Various coordinate systems are used for geographical data, they are given EPSG Geodetic Parameter Dataset codes.  Oracle supports various coordinate systems.  As well as older definitions, it also has current definitions where the ESPG code matches the Spatial Reference ID (SDO_SRID).  They can be queried from SDO_COORD_REF_SYS.

I will use two different coordinate systems during this series of blogs

Set lines 150 pages 99
Column coord_ref_sys_name format a35
Column legacy_cs_bounds format a110
select srid, coord_ref_sys_name, coord_ref_sys_kind, legacy_cs_bounds 
from SDO_COORD_REF_SYS where srid IN(4326, 27700)
/
      SRID COORD_REF_SYS_NAME                  COORD_REF_SYS_KIND
---------- ----------------------------------- ------------------------
LEGACY_CS_BOUNDS(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
--------------------------------------------------------------------------------------------------------------
      4326 WGS 84                              GEOGRAPHIC2D
SDO_GEOMETRY(2003, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 3), SDO_ORDINATE_ARRAY(-180, -90, 180, 90))

     27700 OSGB 1936 / British National Grid   PROJECTED

  • "The World Geodetic System (WGS) is a standard for use in cartography, geodesy, and satellite navigation including GPS". The latest revision is WGS 84 (also known as WGS 1984, EPSG:4326). It is the reference coordinate system used by the Global Positioning System (GPS).  Where I am dealing with longitude and latitude, specified in degrees, especially from GPS data, I need to tell Oracle that it is WGS84 by specifying SDO_SRID of 4326.
  • Later on, I will also be using data for Great Britain available from the Ordnance Survey that uses the Ordnance Survey National Grid (also known as British National Grid) reference system.  That requires SDO_SRID to be set to 27700.

See also:

Creating Spatial Points

I have found it useful to create a packaged function to convert longitude and latitude to a spatial data point.  It is a useful shorthand that I use in various places.

create or replace package body strava_pkg as 
k_module  CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;
…
----------------------------------------------------------------------------------------------------
function make_point 
(longitude in number
,latitude  in number)
return sdo_geometry deterministic is
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
begin
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'make_point');

  if longitude is not null and latitude is not null then
    return
      sdo_geometry (
        2001, 4326,
        sdo_point_type (longitude, latitude, null),
        null, null
      );
  else
    return null;
  end if;

  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);
end make_point;
----------------------------------------------------------------------------------------------------
END strava_pkg;
/

strava_pkg.sql

There are two parameters to SDO_GEOMETRY that I always have to specify.

  • The first parameter, SDO_GTYPE, describes the natures of the spatial geometry being defined.  Here it is 2001.  The 2 indicates that it is a 2-dimensional geometry, and the 1 indicates that it is a single point.  See SDO_GEOMETRY Object Type
  • The second parameter, SDO_SRID, defines the coordinate system that I discussed above.  4326 indicates that I am working with longitude and latitude.

XML Namespace

GPS data is often held in GPX or GPS Exchange Format.  This is an XML schema.  GPX has been the de-facto XML standard for the lightweight interchange of GPS data since the initial GPX 1.0 release in 2002.  The GPX 1.1 schema was released in 2004 (see https://www.topografix.com/gpx.asp).  

Garmin has created an extension schema that holds additional athlete training information such as heart rate.

I can extract individual track points from a GPX with SQL using the extract() and extractvalue() functions.  However, I have GPX tracks that use both versions of the Topographix GPX schema (it depends on upon which piece of software emitted the GPX file), and some that also use the Garmin extensions.  

Therefore, I need to register all three schemas with Oracle.  I can download the schema files with wget.

cd /tmp/strava
wget http://www.topografix.com/GPX/1/0/gpx.xsd --output-document=gpx0.xsd
wget http://www.topografix.com/GPX/1/1/gpx.xsd
wget https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

Then I can register the files 

delete from plan_table WHERE statement_id = 'XSD';
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 1, 'gpx0.xsd', 'http://www.topografix.com/GPX/1/0/gpx.xsd');
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 2, 'gpx.xsd', 'http://www.topografix.com/GPX/1/1/gpx.xsd');
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 3, 'TrackPointExtensionv1.xsd', 'https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd');

DECLARE
  xmlSchema xmlType;
  res       boolean;
BEGIN
  FOR i IN (
    SELECT object_alias schemaURL
    ,      object_name  schemaDoc
    FROM   plan_table
    WHERE  statement_id = 'XSD'
    ORDER BY plan_id
  ) LOOP
    --read xsd file
    xmlSchema := XMLTYPE(getCLOBDocument('STRAVA',i.schemaDoc,'AL32UTF8'));
    --if already exists delete XSD
    if (dbms_xdb.existsResource(i.schemaDoc)) then
        dbms_xdb.deleteResource(i.schemaDoc);
    end if;
    --create resource from XSD
    res := dbms_xdb.createResource(i.schemaDoc,xmlSchema);

    -- Delete existing  schema
    dbms_xmlschema.deleteSchema(
      i.schemaURL
    );
    -- Now reregister the schema
    dbms_xmlschema.registerSchema(
      i.schemaURL,
      xmlSchema,
      TRUE,TRUE,FALSE,FALSE
    );
  END LOOP;
End;
/
3a_register_xml_schema.sql

Then I can query the registered schemas.

Set pages 99 lines 160
Column schema_url format a60
Column qual_schema_url format a105
select schema_url, local, hier_type, binary, qual_schema_url
from user_xml_schemas
/

SCHEMA_URL                                                   LOC HIER_TYPE   BIN
------------------------------------------------------------ --- ----------- ---
QUAL_SCHEMA_URL
---------------------------------------------------------------------------------------------------------
https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd YES CONTENTS    NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

http://www.topografix.com/GPX/1/0/gpx.xsd                    YES CONTENTS    NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/0/gpx.xsd

http://www.topografix.com/GPX/1/1/gpx.xsd                    YES CONTENTS    NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/1/gpx.xsd

Extracting GPS Track Points from GPX

A GPS track is a list of points specifying at least time, longitude, latitude and often elevation.  I can extract all the points in a GPX as a set of rows.  However, I must specify the correct namespace for the specific GPX.

Column time_string format a20
SELECT g.activity_id
,      EXTRACTVALUE(VALUE(t), 'trkpt/time') time_string
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
       ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
 FROM activities g,
      TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/trk/trkseg/trkpt'
      ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'
      ))) t
Where  activity_id IN(4468006769)
And rownum <= 10
/

  Activity
        ID TIME_STRING                    LAT           LNG     ELE   HR
---------- -------------------- ------------- ------------- ------- ----
4468006769 2020-12-13T14:31:13Z   51.52963800    -.18753600    30.6   57
           2020-12-13T14:31:14Z   51.52963500    -.18753400    30.6   57
           2020-12-13T14:31:15Z   51.52964100    -.18753100    30.6   57
           2020-12-13T14:31:16Z   51.52964000    -.18752900    30.6   57
           2020-12-13T14:31:17Z   51.52963600    -.18752700    30.6   57
           2020-12-13T14:31:18Z   51.52963200    -.18752700    30.6   57
           2020-12-13T14:31:19Z   51.52962900    -.18752800    30.6   57
           2020-12-13T14:31:20Z   51.52962800    -.18752800    30.6   57
           2020-12-13T14:31:21Z   51.52962800    -.18752900    30.6   57
           2020-12-13T14:31:22Z   51.52962800    -.18753000    30.6   57

I can use this approach to extract all the points from a GPS track and create a spatial line geometry.  I have put the whole process into a packaged procedure strava_pkg.load_activity.

First I need to work out which version of the Topographix schema is in use.  So I can try extracting the creator name with each and see which is not null.

…
IF l_num_rows > 0 THEN
  UPDATE activities
  SET    gpx = XMLTYPE(l_gpx), geom = null, geom_27700 = null, num_pts = 0, xmlns = NULL
  WHERE  activity_id = p_activity_id
  RETURNING extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/0"') 
  ,         extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/1"') 
  INTO      l_xmlns0, l_xmlns1;
  l_num_rows := SQL%rowcount;
END IF;
…

Now I can extract all the points in a GPX as a set of rows and put them into a spatial geometry.  I turn each row with two coordinates into two rows with one point each.  Note that longitude is listed before latitude for each point.  I convert the rows into a list using multiset() and finally cast that as a spatial ordinate array. 

Note that the SDO_GTYPE is 2002 (rather than 2001) because it is a line (rather than a single point) on a two-dimensional coordinate system.

  BEGIN
    UPDATE activities a
    SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
    cast(multiset(
      select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
      from (
        SELECT rownum rn
        ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) as lng
        ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) as lat
        FROM   TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt', 'xmlns="http://www.topografix.com/GPX/1/1"'))) t
        ) pt,
        (select 1 rn from dual union all select 2 from dual) n
	    order by pt.rn, n.rn
      ) AS mdsys.sdo_ordinate_array))
    , xmlns = 'xmlns="http://www.topografix.com/GPX/1/1"'
    WHERE  a.gpx IS NOT NULL
    And    activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
  EXCEPTION
    WHEN e_13034 OR e_29877 THEN 
	  dbms_output.put_line('Exception:'||sqlerrm);
	  l_num_rows := 0;
  END;

I have found it helpful to simplify the line geometry with sdo_util.simplify(). It removes some of the noise in the GPS data and has resolved problems with calculating the length of lines that intersect with areas.

  BEGIN
    UPDATE activities 
    SET    geom = sdo_util.simplify(geom,1)
    WHERE  geom IS NOT NULL
    And    activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
  EXCEPTION
    WHEN e_13034 THEN 
	  dbms_output.put_line('Exception:'||sqlerrm);
  END;

There are a few other fields I also update at this point.  You will see me use them later.

  • NUM_PTS is the number of points in the line geometry.  
  • GEOM_27700 is the result of converting the line to British National Grid reference coordinates.  This helps when comparing it to British boundary data obtained from the Ordnance Survey or other government agencies.
  • MBR is the minimum bounding rectangle for the line.  This is generated to enable me to improve the performance of some spatial queries.  I have found some of the spatial operators to calculate intersections between geometries are quite slow and CPU intensive when applied to GPS tracks and boundary data that both have lots of points.  SDO_GEOM.SDO_MBR simply returns 4 ordinates that define the bounding rectangle.  This can be used to roughly match geometries that might match before doing a proper match.

  UPDATE activities 
  SET    num_pts = SDO_UTIL.GETNUMVERTICES(geom)
  ,      geom_27700 = sdo_cs.transform(geom,27700)
  ,      mbr = sdo_geom.sdo_mbr(geom)
  WHERE  geom IS NOT NULL
  And    activity_id = p_activity_id
  RETURNING num_pts INTO l_num_pts;
  dbms_output.put_line('Activity ID:'||p_activity_id||', '||l_num_pts||' points');
…

Now I can load each GPX and process it into a spatial geometry in one step.  I can process all of the activities in a simple loop.

set serveroutput on timi on
exec strava_pkg.load_activity(4468006769);
Loading Activity: 4468006769
ACTIVITIES/4468006769.gpx.gz - 1286238 bytes
xmlns 1=StravaGPX Android
Activity ID:4468006769, 998 points

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.41
Now my Strava activities are all in spatial geometries and I can start to do some spatial processing.

Wednesday, February 03, 2021

Spatial Data 1: Loading GPX data into XML data types

This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava. All of my files are available on GitHub.

In these posts, I have only shown extracts of some of the scripts I have written.  The full files are available on github.

Upload and Expand Strava Bulk Export

Strava will bulk export all your data to a zipped folder.  It contains various CSV files.  I am interested in activities.csv that contains a row for each activity with various pieces of data including the name of the data file that can be found in the /activities directory.  That file will usually be a .gpx, or it may be zipped as a .gpx.gz file.  GPX is an XML schema that contains sets of longitude/latitude coordinates and may contain other attributes.  

The first job is to upload the Strava export .zip file to somewhere accessible to the database server (in my case /vagrant) and to expand it (to /tmp/strava/).

cd /vagrant
mkdir /tmp/strava
unzip /vagrant/export_1679301.zip -d /tmp/strava

Create Strava Schema 

I need to create a new database schema to hold the various objects I will create, and I have to give it certain privileges.
connect / as sysdba
create user strava identified by strava;
grant connect, resource to strava;
grant create view to strava;
grant select_catalog_role to strava;
grant XDBADMIN to STRAVA;
grant alter session to STRAVA;
alter user strava quota unlimited on users;
alter user strava default tablespace users;

GRANT CREATE ANY DIRECTORY TO strava;
CREATE OR REPLACE DIRECTORY strava as '/tmp/strava';
CREATE OR REPLACE DIRECTORY activities as '/tmp/strava/activities';
CREATE OR REPLACE DIRECTORY exec_dir AS '/usr/bin';

GRANT READ, EXECUTE ON DIRECTORY exec_dir TO strava;
GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
GRANT READ ON DIRECTORY activities TO strava;
  • I need to create database directories for both the CSV files in /tmp/strava and the various GPX files in the /tmp/strava/activities sub-directory.  I will need read privilege on both directories, and also execute privilege on the strava directory so that I can use a pre-processor script.
  • The exec_dir directory points to /usr/bin where the zip executables are located.  I need read and execute privilege on this so I can read directly from zipped files.
  • XDBADMIN: "Allows the grantee to register an XML schema globally, as opposed to registering it for use or access only by its owner. It also lets the grantee bypass access control list (ACL) checks when accessing Oracle XML DB Repository".

Import CSV file via an External Table

I will start by creating an external table to read the Strava activities.csv file, and then copy it into a database table.  This file is a simple comma-separated variable file.  The activity date, name and description are enclosed in double-quotes.  
The first problem that I encountered was that some of the descriptions I typed into Strava contain newline characters and the external table interprets them as the end of the record even though these characters are inside the double-quotes.
4380927517,"23 Nov 2020, 18:03:54",Zwift Crash Recovery,Virtual Ride,"Zwift Crash Recovery
1. recover fit file per https://zwiftinsider.com/retrieve-lost-ride/, 
2. fix corrupt .fit file with https://www.fitfiletools.com",1648,13.48,,false,Other,activities/4682540615.gpx.gz,,10.0,1648.0,1648.0,13480.2001953125,13.199999809265137,
8.179733276367188,91.0,36.20000076293945,12.600000381469727,69.5999984741211,7.099999904632568,0.40652215480804443,,,84.0,62.1943244934082,
,,,150.66201782226562,276.8444519042969,,,,,,,,,,,,158.0,1649.0,,,0.0,,1.0,,,,,,,,,,,,,,,,4907360.0,,,,,,,,,,,
As Chris Saxon points out on AskTom, it is necessary to pre-process the records to replace the newline characters with something else.  I found this awk script to process the record.  So I put it into a shell script nlfix.sh, made it executable and invoked as a pre-processor in the external table definition.
#nlfix.sh
/usr/bin/gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' $*
nlfix.sh
  • Note the full path for gawk is specified.
A database directory is needed for the location of the pre-processor scripts and it is necessary to grant read and execute privileges on it.  I simply put the pre-processor in the same directory as the CSV file so I could use the same strava directory I created earlier.
GRANT READ, EXECUTE ON DIRECTORY strava TO strava;
Now I can define an external table that will read the activities.csv file. 
CREATE TABLE strava.activities_ext
(Activity_ID NUMBER
,Activity_Date DATE
,Activity_Name VARCHAR2(100)
,Activity_Type VARCHAR2(15)
,Activity_Description VARCHAR2(200)
,Elapsed_Time NUMBER
,Distance_km NUMBER
…)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY strava
 ACCESS PARAMETERS 
 (RECORDS DELIMITED BY newline 
  SKIP 1
  DISABLE_DIRECTORY_LINK_CHECK
  PREPROCESSOR strava:'nlfix.sh' 
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' RTRIM
  MISSING FIELD VALUES ARE NULL
  REJECT ROWS WITH ALL NULL FIELDS
  NULLIF = BLANKS
(Activity_ID,Activity_Date date "DD Mon yyyy,HH24:mi:ss"
,Activity_Name,Activity_Type,Activity_Description
,Elapsed_Time,Distance_km
…))
LOCATION ('activities.csv')
) REJECT LIMIT 5
/

Import Activities

Now I can simply copy from the external table to a regular table.  I have omitted a lot of columns that Strava does not populate (at least not in my export) but that appear in the CSV file.
rem 1b_create_activities_ext.sql
spool 1b_create_activities_ext 

CREATE TABLE strava.activities AS
select ACTIVITY_ID,ACTIVITY_DATE,ACTIVITY_NAME,ACTIVITY_TYPE,ACTIVITY_DESCRIPTION,
ELAPSED_TIME,DISTANCE_KM,RELATIVE_EFFORT,COMMUTE_CHAR,ACTIVITY_GEAR,
FILENAME,
ATHLETE_WEIGHT,BIKE_WEIGHT,ELAPSED_TIME2,MOVING_TIME,DISTANCE_M,MAX_SPEED,AVERAGE_SPEED,
ELEVATION_GAIN,ELEVATION_LOSS,ELEVATION_LOW,ELEVATION_HIGH,MAX_GRADE,AVERAGE_GRADE,
--AVERAGE_POSITIVE_GRADE,AVERAGE_NEGATIVE_GRADE,
MAX_CADENCE,AVERAGE_CADENCE,
--MAX_HEART_RATE,
AVERAGE_HEART_RATE,
--MAX_WATTS,
AVERAGE_WATTS,CALORIES,
--MAX_TEMPERATURE,AVERAGE_TEMPERATURE,
RELATIVE_EFFORT2,
TOTAL_WORK,
--NUMBER_OF_RUNS,
--UPHILL_TIME,DOWNHILL_TIME,OTHER_TIME,
PERCEIVED_EXERTION,
--TYPE,
--START_TIME,
WEIGHTED_AVERAGE_POWER,POWER_COUNT,
PREFER_PERCEIVED_EXERTION,PERCEIVED_RELATIVE_EFFORT,
COMMUTE,
--TOTAL_WEIGHT_LIFTED,
FROM_UPLOAD,
GRADE_ADJUSTED_DISTANCE,
--WEATHER_OBSERVATION_TIME,WEATHER_CONDITION,
--WEATHER_TEMPERATURE,APPARENT_TEMPERATURE,
--DEWPOINT,HUMIDITY,WEATHER_PRESSURE,
--WIND_SPEED,WIND_GUST,WIND_BEARING,
--PRECIPITATION_INTENSITY,
--SUNRISE_TIME,SUNSET_TIME,MOON_PHASE,
BIKE
--GEAR,
--PRECIPITATION_PROBABILITY,PRECIPITATION_TYPE,
--CLOUD_COVER,WEATHER_VISIBILITY,UV_INDEX,WEATHER_OZONE,
--JUMP_COUNT,TOTAL_GRIT,AVG_FLOW,
--FLAGGED
FROM strava.activities_ext
/

ALTER TABLE activities ADD CONSTRAINT activities_pk PRIMARY KEY (activity_id);
…
ALTER TABLE activities ADD (gpx XMLTYPE) XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW);
ALTER TABLE activities ADD (geom mdsys.sdo_geometry));
ALTER TABLE activities ADD (geom_27700 mdsys.sdo_geometry));
ALTER TABLE activities ADD (mbr mdsys.sdo_geometry));
ALTER TABLE activities ADD (xmlns VARCHAR2(128));
ALTER TABLE activities ADD (num_pts INTEGER DEFAULT 0);

Spool off
  • I have specified a primary key on activity_id and made a number of other columns not nullable.
  • I have added a new XMLTYPE column GPX into which I will load the GPS data in the .gpx files.  

FIT files

Some applications, such as Garmin and Rouvy generate compressed .fit files, and Strava exports them again (apparently if it can't convert them, although it can convert the .fit files from Zwift to .gpx).  These are binary files, and since I only have a few of them, I have converted them to .gpx files using GPSBabel on my laptop, and then I reuploaded the .gpx files.
for %i in (*.fit.gz) do "C:\Program Files\GnuWin\bin\gzip" -fd %i
for %i in (*.fit) do "C:\Program Files (x86)\GPSBabel\GPSBabel.exe" -i garmin_fit -f "%i" -o gpx -F "%~ni".gpx
I then update the file name in the activities table.
UPDATE activities
SET filename = REPLACE(filename,'.fit.gz','.gpx')
WHERE filename like '%.fit.gz'
/

Compress GPX files (optional)

Some of the GPX files in the Strava export are compressed and some are not.  There is no obvious reason why.  To minimise the space I can gzip the GPX files.
gzip -9v /tmp/strava/activities/*.gpx
If I do compress any .gpx files, then I also need to update the file names in the activities table.
UPDATE activities
Set filename = filename||'.gz'
Where filename like '%.gpx'
/

Load the GPX files into the XML data type.

The next stage is to load each of the GPX files into the activities table.  
create or replace package body strava_pkg as 
k_module      CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;
…
----------------------------------------------------------------------------------------------------
function getClobDocument
(p_directory IN VARCHAR2
,p_filename  IN VARCHAR2
,p_charset   IN VARCHAR2 DEFAULT NULL
) return        CLOB deterministic
is
  l_module VARCHAR2(64); 
  l_action VARCHAR2(64);

  v_filename      VARCHAR2(128);
  v_directory     VARCHAR2(128);
  v_file          bfile;
  v_unzipped      blob := empty_blob();

  v_Content       CLOB := ' ';
  v_src_offset    number := 1 ;
  v_dst_offset    number := 1 ;
  v_charset_id    number := 0;
  v_lang_ctx      number := DBMS_LOB.default_lang_ctx;
  v_warning       number;

  e_22288 EXCEPTION; --file or LOB operation FILEOPEN failed
  PRAGMA EXCEPTION_INIT(e_22288, -22288);
BEGIN
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'getClobDocument');

  IF p_charset IS NOT NULL THEN
    v_charset_id := NLS_CHARSET_ID(p_charset);
  END IF;

  v_filename  := REGEXP_SUBSTR(p_filename,'[^\/]+',1,2);
  v_directory := REGEXP_SUBSTR(p_filename,'[^\/]+',1,1);

  IF v_directory IS NOT NULL and v_filename IS NULL THEN /*if only one parameters then it is actually a filename*/
    v_filename := v_directory; 
    v_directory := '';
  END IF;

  IF p_directory IS NOT NULL THEN
    v_directory := p_directory;
  END IF;

  v_File := bfilename(UPPER(v_directory),v_filename);

  BEGIN
    DBMS_LOB.fileopen(v_File, DBMS_LOB.file_readonly);
  exception 
    when VALUE_ERROR OR e_22288 then
      dbms_output.put_line('Can''t open:'||v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
      v_content := '';
      dbms_application_info.set_module(module_name=>l_module
                                      ,action_name=>l_action);
      return v_content;
  END;

  IF v_filename LIKE '%.gz' THEN
    v_unzipped := utl_compress.lz_uncompress(v_file);
    dbms_lob.converttoclob(
      dest_lob     => v_content,
      src_blob     => v_unzipped,
      amount       => DBMS_LOB.LOBMAXSIZE, 
      dest_offset  => v_dst_offset,
      src_offset   => v_src_offset,
      blob_csid    => dbms_lob.default_csid,
      lang_context => v_lang_ctx,
      warning      => v_warning);
  ELSE --ELSIF v_filename LIKE '%.g__' THEN
    DBMS_LOB.LOADCLOBFROMFILE(v_Content, 
      Src_bfile    => v_File,
      amount       => DBMS_LOB.LOBMAXSIZE, 
      src_offset   => v_src_offset, 
      dest_offset  => v_dst_offset,
      bfile_csid   => v_charset_id, 
      lang_context => v_lang_ctx,
      warning => v_warning);
  END IF;

  dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
  DBMS_LOB.fileclose(v_File);

  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);

  return v_Content;
exception when others then
  dbms_output.put_line(v_directory||'/'||v_filename||' - '||v_dst_offset||' bytes');
  DBMS_LOB.fileclose(v_File);
  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);
  raise;
end getClobDocument;
----------------------------------------------------------------------------------------------------
…
END strava_pkg;
/
I can simply query the contents of the uncompressed GPX file in SQL by calling the function.  In this case, the zipped .gpx file is 65K but decompresses to 1.2Mb.
Set long 1000 lines 200 pages 99 serveroutput on
Column filename  format a30
Column gpx format a100
select activity_id, filename
, getClobDocument('',filename) gpx
from activities
where filename like '%.gpx%'
And activity_id = 4468006769
order by 1
/

ACTIVITY_ID FILENAME                       GPX
----------- ------------------------------ ----------------------------------------------------------------------------------------------------
 4468006769 activities/4468006769.gpx.gz   <?xml version="1.0" encoding="UTF-8"?>
                                           <gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc
                                           ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
                                           .com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
                                           min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
                                           " version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
                                           emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
                                            <metadata>
                                             <time>2020-12-13T14:31:13Z</time>
                                            </metadata>
                                            <trk>
                                             <name>Loop</name>
                                             <type>1</type>
                                             <trkseg>
                                              <trkpt lat="51.5296380" lon="-0.1875360">
                                               <ele>30.6</ele>
                                               <time>2020-12-13T14:31:13Z</time>
                                               <extensions>
                                                <gpxtpx:TrackPointExtension>
                                                 <gpxtpx:hr>57</gpxtpx:hr>
                                                </gpxtpx:TrackPointExtension>
                                               </extensions>
                                              </trkpt>
…

activities/4468006769.gpx.gz - 1286238
Elapsed: 00:00:00.14
I can load the .gpx files into the GPX column of the activities table with a simple update statement.  The CLOB returned from the function is converted to an XML with XMLTYPE.
UPDATE activities
SET gpx = XMLTYPE(strava_pkg.getClobDocument('ACTIVITIES',filename))
WHERE filename like '%.gpx%'
/
I can now query back the same GPX from the database.
Set long 1100 lines 200 pages 99 serveroutput on
select activity_id, filename, gpx
from activities
where filename like '%.gpx%'
And activity_id = 4468006769
order by 1
/

ACTIVITY_ID FILENAME                       GPX
----------- ------------------------------ ----------------------------------------------------------------------------------------------------
 4468006769 activities/4468006769.gpx.gz   <?xml version="1.0" encoding="US-ASCII"?>
                                           <gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc
                                           ation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin
                                           .com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.gar
                                           min.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
                                           " version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlsch
                                           emas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
                                             <metadata>
                                               <time>2020-12-13T14:31:13Z</time>
                                             </metadata>
                                             <trk>
                                               <name>Loop</name>
                                               <type>1</type>
                                               <trkseg>
                                                 <trkpt lat="51.5296380" lon="-0.1875360">
                                                   <ele>30.6</ele>
                                                   <time>2020-12-13T14:31:13Z</time>
                                                   <extensions>
                                                     <gpxtpx:TrackPointExtension>
                                                       <gpxtpx:hr>57</gpxtpx:hr>
                                                     </gpxtpx:TrackPointExtension>
                                                   </extensions>
                                                 </trkpt>
                                                 <trkpt lat="51.5296350" lon="-0.1875340">
…

First Steps in Spatial Data

This is the introductory blog post in a series about using Spatial Data in the Oracle database.

Caveat: Spatial Data has been a part of the Oracle database since at least version 8i.  I have been aware of it for many years, but have never previously used it myself.  Recently, I have recently had some spare time and decided to experiment with it.  These blogs document my first steps.  I have spent a lot of time reading the documentation and using Google to find other people's blogs.  Where I found useful material I have provided links to it.  It is likely that more experienced developers can point out my mistakes, and better methods to achieve results.  In which case, I will gladly publish comments and make corrections to my material.

Index

  1. Loading GPX data into XML data types
  2. Convert GPX Track to a Spatial Line Geometry
  3. Analyse a track in proximity to a GPS route
  4. Obtaining Geographical Data
  5. Searching For Geometries That Intersect Other Geometries
  6. Text Searching Areas by their Name, and the Names of Parent Areas

Introduction

We used to use paper maps!
When not working with Oracle databases, I am a keen cyclist and I ride with a club.  I have also always enjoyed maps having been taught to read Ordnance Survey maps at school.  It is no surprise therefore that I lead rides for my cycling club.  We used to use (and you can still buy) paper maps.  By 2005, I was starting to use a GPS.
Compaq iPaq in an expansion 
jacket with a PCMCIA GPS
Initially, I recorded rides as tracks on PDA.  By 2012, I was regularly using an Android tablet on my handlebar bag for navigation.   The market has caught up and people now attach their phones to their handlebars or have dedicated bike computers with GPS and Bluetooth links to their phones.  The cycling club website includes a library of the routes of previous rides, however, you can only search that by the structured data held for that ride.  So, for example, I can only search for rides in the Chilterns if that word appears in the description.  I cannot do a spatial search.

I also use Strava, an internet service for tracking exercise.  It is mainly used by cyclists and runners.  Activities can be recorded on a phone or other device and are then uploaded, compared, and analysed.  Every time I go out on the bike I upload the activity.  I have also uploaded my back catalogue of GPS tracks.  As a result of the Coronavirus lockdowns, I bought an indoor trainer that I use with Zwift and that also posts data to Strava.  Both Strava and Zwift capture additional data from a heart monitor.  Strava will let you see a certain amount of analysis about your activities and how you compare to other people, and more if you pay for their subscription service.  They will also allow you to export and download all of your data as a set of structured data in CSV files, and also the GPX files and photographs that you uploaded.

Problem Statement

I thought it would be interesting to try to analyse and interrogate that data.  Typical questions might include:

  1. I ride up Swain's Lane in Highgate most days.  How long do I take, and am I getting faster or slower?
  2. I want to go for a ride in the Chilterns, I would like to see tracks of previous rides to get some route ideas.

So I am going to upload my Strava data into an Oracle database, load the GPS tracks currently in GPX files into the database, convert them to Spatial geometries, and then process them.  To answer the first question I will need to provide a working definition of Swain's Lane.  For the second, I need definitions of various areas.  For example, I will take the Chilterns to be the area designed by Natural England as an Area of Outstanding Natural Beauty.  So I will need to import a definition of that and other areas from published data.

The following series of blogs illustrate how I dealt with these and other challenges.