Thursday, February 18, 2021

Spatial Data 3. Analyse a track in proximity to a GPS route

This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.  All of my files are available on GitHub.

Swain's Lane, Highgate
Now I have loaded some data, I am going to start to do something useful with it.  I go out on my bike most mornings, and I usually ride up Swain's Lane in Highgate three times.  How long did each one take?  Over time, have I got faster or slower?

I need a definition of Swain's Lane that I can compare to.  I will start by drawing a route with my favourite GPS software.  A route is just a sequence of route points.  I can then export that as a GPX file.

<?xml version="1.0" encoding="UTF-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <rte>
    <name><![CDATA[Swain's World]]></name>
    <rtept lat="51.569613039632" lon="-0.14770468632509"></rtept>
    <rtept lat="51.569407978151" lon="-0.14832964102552"></rtept>
    <rtept lat="51.567090552402" lon="-0.14674177328872"></rtept>
    <rtept lat="51.567080548869" lon="-0.14592101733016"></rtept>
    <rtept lat="51.569618041121" lon="-0.14773419062425"></rtept>
  </rte>
</gpx>

Geometries Table

I will load the GPX route into a table much as I did with the track files. 
drop table my_geometries purge;

createg table my_geometries
(geom_id    NUMBER NOT NULL 
,descr      VARCHAR2(64)
,gpx        XMLTYPE
,geom       mdsys.sdo_geometry
,geom_27700 mdsys.sdo_geometry
,mbr        mdsys.sdo_geometry
,constraint my_geometries_pk PRIMARY KEY (geom_id)
)
XMLTYPE COLUMN gpx STORE AS SECUREFILE BINARY XML (CACHE DISABLE STORAGE IN ROW)
/
The difference is that I have a series of route points instead of track points, so the paths in extract() and extractvalue() are slightly different.
delete from my_geometries where geom_id = 2;
INSERT INTO my_geometries (geom_id, descr, gpx) 
VALUES (2,'Swains World Route', XMLTYPE(strava_pkg.getClobDocument('STRAVA','swainsworldroute.gpx')));

UPDATE my_geometries
SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
cast(multiset(
  select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
  from (
    SELECT /*+MATERIALIZE*/ rownum rn
    ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon')) as lng
    ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat')) as lat
    FROM   my_geometries g,
           TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t
    where g.geom_id = 2
    ) pt,
    (select 1 rn from dual union all select 2 from dual) n
	order by pt.rn, n.rn
  ) AS mdsys.sdo_ordinate_array))
WHERE gpx IS NOT NULL
AND   geom IS NULL
/
UPDATE my_geometries
SET mbr = sdo_geom.sdo_mbr(geom)
,   geom_27700 = sdo_cs.transform(geom,27700)
/

Commit;
Set pages 99 lines 180 
Select geom_id, descr, gpx, geom 
from my_geometries
where geom_id = 2;

   GEOM_ID DESCR
---------- ----------------------------------------------------------------
GPX
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         2 Swains World Route
<?xml version="1.0" encoding="US-ASCII"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="ViewRanger - //www.viewranger.com" xml
SDO_GEOMETRY(2002, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(-.14651114, 51.5670769, -.14649237, 51.567298, -.1465782, 51.567563, -.14680618, 51.5680165, -.14697
516, 51.5682533, -.14754379, 51.5688701, -.14807219, 51.5694887))
I am going to build spatial indexes on the geometry columns, so I need to define the upper and lower bound values on the coordinates.
delete from user_sdo_geom_metadata where table_name = 'ACTIVITIES';
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
values ( 
  'ACTIVITIES' , 'GEOM_27700',
  sdo_dim_array(
    sdo_dim_element('Easting',-1000000,1500000,0.05), 
    sdo_dim_element('Northing', -500000,2000000,0.05)),
  27700);
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
values ( 
  'ACTIVITIES' , 'GEOM',
  sdo_dim_array(
    sdo_dim_element('Longitude',-180,180,0.05), 
    sdo_dim_element('Latgitude',-90,90,0.05)),
  4326);
commit;

CREATE INDEX activities_geom ON ACTIVITIES (geom) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;
CREATE INDEX activities_geom_27700 ON ACTIVITIES (geom_27700) INDEXTYPE IS MDSYS.SPATIAL_INDEX_v2;

Compare Geometries

Now I can compare my Swain's Lane geometry to my activity geometries.  Let's start by looking for rides in December 2020 that went up Swain's Lane
Column activity_id heading 'Activity|ID'
Column activity_name format a30
Column geom_relate heading 'geom|relate' format a6
With a as (
SELECT a.activity_id, a.activity_date, a.activity_name
,      SDO_GEOM.RELATE(a.geom,'anyinteract',g.geom,25) geom_relate
FROM   activities a
,      my_geometries g
WHERE  a.activity_type = 'Ride'
--And    a.activity_id IN(4468006769)
And    a.activity_date >= TO_DATE('01122020','DDMMYYYY')
and    g.geom_id = 2 /*Swains World Route*/
)
Select *
From   a
Where  geom_relate = 'TRUE'
Order by activity_date
/
Where there is a relation between the two geometries then I have a hit.
  Activity                                                    geom
        ID ACTIVITY_DATE       ACTIVITY_NAME                  relate
---------- ------------------- ------------------------------ ------
4419821750 08:44:45 02.12.2020 Loop                           TRUE
4428307816 10:49:25 04.12.2020 Loop                           TRUE
4431920358 09:41:13 05.12.2020 Loop                           TRUE
…
4528825613 09:39:38 28.12.2020 Loop                           TRUE
4534027888 11:29:45 29.12.2020 Loop                           TRUE
4538488655 09:57:55 30.12.2020 Loop                           TRUE

25 rows selected.
Updated 24.11.2021: But I also made a fundamental error here.  SDO_GEOM.RELATE is a spatial function and does not use the spatial index that I created.
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |      1 |        |    862 |00:00:29.88 |     118K|     20 |
|   1 |  SORT ORDER BY                |                  |      1 |     29 |    862 |00:00:29.88 |     118K|     20 |
|   2 |   NESTED LOOPS                |                  |      1 |     29 |    862 |00:00:23.49 |   93227 |     20 |
|   3 |    TABLE ACCESS BY INDEX ROWID| MY_GEOMETRIES    |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |
|*  4 |     INDEX UNIQUE SCAN         | MY_GEOMETRIES_PK |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
|*  5 |    TABLE ACCESS FULL          | ACTIVITIES       |      1 |     29 |    862 |00:00:23.49 |   93225 |     20 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("G"."GEOM_ID"=2)
   5 - filter(("A"."ACTIVITY_TYPE"='Ride' AND "SDO_GEOM"."RELATE"("A"."GEOM",'anyinteract',"G"."GEOM",25)='TRUE'))
Instead, I should use the SDO_ANYINTERACT() spatial operator.
SELECT a.activity_id, a.activity_date, a.activity_name
,      a.distance_km
FROM   activities a
,      my_geometries g
WHERE  a.activity_type = 'Ride'
and    g.geom_id = 2 /*Swains World Route*/
AND    SDO_ANYINTERACT(a.geom, g.geom) = 'TRUE'
Order by activity_date
And now it uses the spatial index on ACTIVITIES.
-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                  |      1 |        |    824 |00:00:07.04 |   34602 |     14 |
|   1 |  SORT ORDER BY                    |                  |      1 |      1 |    824 |00:00:07.04 |   34602 |     14 |
|   2 |   NESTED LOOPS                    |                  |      1 |      1 |    824 |00:00:07.04 |   34602 |     14 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | MY_GEOMETRIES    |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |
|*  4 |     INDEX UNIQUE SCAN             | MY_GEOMETRIES_PK |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
|*  5 |    TABLE ACCESS BY INDEX ROWID    | ACTIVITIES       |      1 |      1 |    824 |00:00:07.04 |   34600 |     14 |
|*  6 |     DOMAIN INDEX (SEL: 0.010000 %)| ACTIVITIES_GEOM  |      1 |        |    824 |00:00:07.03 |   33224 |     14 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("G"."GEOM_ID"=2)
   5 - filter("A"."ACTIVITY_TYPE"='Ride')
   6 - access("MDSYS"."SDO_ANYINTERACT"("A"."GEOM","G"."GEOM")='TRUE')

Analyse Individual Efforts

Now I want to analyse each of my trips up Swain's Lane on a particular day.  I am going to work with the GPX rather than the spatial geometry because I am interested also in time, elevation and heart rate data that is not stored in the spatial geometry.
Also, you can't use analytic functions on spatial geometries.
with x as (
SELECT activity_id
,      TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
FROM   activities a,
       TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
WHERE  a.activity_id IN(4468006769)
), y as (
select x.*, strava_pkg.make_point(lng,lat) loc
from x
)
select lag(loc,1) over (partition by activity_id order by time) last_loc
from   y
/

select lag(loc,1) over (partition by activity_id order by time) last_loc
           *
ERROR at line 13:
ORA-22901: cannot compare VARRAY or LOB attributes of an object type
Instead, I will have to apply analytic functions to the values extracted from the GPX and then create a spatial point.  Thus I will be able to calculate the length of each individual trip by aggregating the distance between each pair of points.
The following query splits out each trip up Swain's Lane in a particular activity and shows the distance, duration, and metrics about elevation, gradient, and heart rate. 
alter session set statistics_level=ALL;
alter session set nls_date_Format = 'hh24:mi:ss dd.mm.yyyy';
break on activity_id skip 1
compute sum of sum_dist on activity_id
compute sum of num_pt on activity_id
compute sum of sum_secs on activity_id
Set lines 180 pages 50 timi on
Column activity_id heading 'Activity|ID'
Column activity_name format a15
column time format a20
column lat format 999.99999999
column lng format 999.99999999
column ele format 9999.9
column hr format 999
column sdo_relate format a10
column num_pts heading 'Num|Pts' format 99999
column sum_dist heading 'Dist.|(km)' format 999.999
column sum_secs heading 'Secs' format 9999
column avg_speed heading 'Avg|Speed|(kmph)' format 99.9
column ele_gain heading 'Ele|Gain|(m)' format 9999.9
column ele_loss heading 'Ele|Loss|(m)' format 9999.9
column avg_grade heading 'Avg|Grade|%' format 99.9
column min_ele heading 'Min|Ele|(m)' format 999.9
column max_ele heading 'Max|Ele|(m)' format 999.9
column avg_hr heading 'Avg|HR' format 999
column max_hr heading 'Max|HR' format 999
WITH geo as ( /*route geometry to compare to*/
select /*MATERIALIZE*/ g.*, 25 tol
,      sdo_geom.sdo_length(geom, unit=>'unit=m') geom_length
from   my_geometries g
where  geom_id = 2 /*Swains World Route*/
), a as ( /*extract all points in activity*/
SELECT a.activity_id, g.geom g_geom, g.tol, g.geom_length
,      TO_DATE(EXTRACTVALUE(VALUE(t), 'trkpt/time'),'YYYY-MM-DD"T"HH24:MI:SS"Z"') time
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
       ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
FROM   activities a,
       geo g,
       TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt'
       ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'))) t
Where  a.activity_id IN(4468006769)
AND    SDO_ANYINTERACT(a.geom, g.geom) = 'TRUE' /*activity has relation to reference geometry*/
), b as ( /*smooth elevation*/
Select a.*
,      avg(ele) over (partition by activity_id order by time rows between 2 preceding and 2 following) avg_ele
From   a
), c as ( /*last point*/
Select b.*
,      row_number() over (partition by activity_id order by time) seq
,      lag(time,1) over (partition by activity_id order by time) last_time
,      lag(lat,1) over (partition by activity_id order by time) last_lat
,      lag(lng,1) over (partition by activity_id order by time) last_lng
--,      lag(ele,1) over (partition by activity_id order by time) last_ele
,      lag(avg_ele,1) over (partition by activity_id order by time) last_avg_ele
From   b
), d as ( /*make points*/
SELECT c.* 
,      strava_pkg.make_point(lng,lat) loc
,      strava_pkg.make_point(last_lng,last_lat) last_loc
FROM   c
), e as ( /*determine whether point is inside the polygon*/
select d.*
,      86400*(time-last_time) secs
,      avg_ele-last_avg_ele ele_diff
,      sdo_geom.sdo_distance(loc,last_loc,0.05,'unit=m') dist
,      SDO_GEOM.RELATE(loc,'anyinteract', g_geom, tol) sdo_relate
FROM   d
), f as (
select e.*
,      CASE WHEN sdo_relate != lag(sdo_relate,1) over (partition by activity_id order by time) THEN 1 END sdo_diff
from   e
), g as (
select f.*
,      SUM(sdo_diff) over (partition by activity_id order by time range between unbounded preceding and current row) sdo_seq
from f
where  sdo_relate = 'TRUE'
)
select activity_id, min(time), max(time)
, sum(dist)/1000 sum_dist
, sum(secs) sum_secs
, 3.6*sum(dist)/sum(secs) avg_speed
, sum(greatest(0,ele_diff)) ele_gain
, sum(least(0,ele_diff)) ele_loss
, 100*sum(ele_diff*dist)/sum(dist*dist) avg_grade
, min(ele) min_ele
, max(ele) max_ele
, sum(hr*secs)/sum(secs) avg_Hr
, max(hr) max_hr
, count(*) num_pts
from   g
group by activity_id, sdo_seq, g.geom_length
having sum(dist)>= g.geom_length/2 /*make sure line we find is longer than half route to prevent fragmentation*/
order by 2
/
select * from table(dbms_xplan.display_cursor(null,null,'ADVANCED +IOSTATS -PROJECTION +ADAPTIVE'))
/
4a_1swains.sql
  • In subquery a, I compare the geometry of the activity with the geometry of Swain's Lane using sdo_geom.relate() to confirm that the activity includes Swain's Lane, but then I extract all the points in the activity GPX.
  • GPS is optimised for horizontal accuracy.  Even so, the tolerance for determining whether the track is close to the route has to be set to 25m to allow for noise in the data (Swain's Lane is tree-lined, and has walls on both sides, that both attenuate the GPS signal).  GPS elevation data is notorious for being noisy even under good conditions; you can see this in the variation of height gained on each ascent.  Sub-query b calculates an average elevation across 5 track points (up to +/-2 points).  
  • I need to compare each point in the track to each previous point so I can do some calculations and determine when the track comes into proximity with the Swain's Lane route, subquery c uses analytic functions to determine the previous point.  It is not possible to apply the analytic function to a geometry.
  • Subquery e determines whether a track point is in proximity to the route.  The tolerance, 25m, is set in subquery geo.  Then subquery f flags where the track point is in proximity to the route and the previous one was not.  Finally, subquery g maintains a running total of the number of times the track has gone close enough to the route.  That becomes a sequence number for each ascent of Swain's Lane by which I can group the subsequent analytics.
                                                                     Avg     Ele     Ele   Avg    Min    Max
  Activity                                            Dist.        Speed    Gain    Loss Grade    Ele    Ele  Avg  Max   Num
        ID MIN(TIME)           MAX(TIME)               (km)  Secs (kmph)     (m)     (m)     %    (m)    (m)   HR   HR   Pts
---------- ------------------- ------------------- -------- ----- ------ ------- ------- ----- ------ ------ ---- ---- -----
4468006769 14:55:51 13.12.2020 14:58:17 13.12.2020     .372   147    9.1    36.1      .0   8.6   86.8  122.7  141  153   147
           15:08:13 13.12.2020 15:10:28 13.12.2020     .374   136    9.9    36.2      .0   8.4   86.8  122.8  147  155   136
           15:22:49 13.12.2020 15:25:18 13.12.2020     .369   150    8.9    36.2      .0   8.2   86.8  122.7  147  155   150
**********                                         -------- -----
sum                                                   1.116   433
On my laptop, this query takes about 10s, of which about 8s is spent on the window sort for the analytic functions, and 2s is spent working out whether the track points are in proximity to the route.
Plan hash value: 3042349692

---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name             | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                  |      2 |        |       |       |  5147 (100)|          |      6 |00:00:20.94 |     392 |
|   1 |  SORT ORDER BY                           |                  |      2 |      1 |   104 |       |  5147   (1)| 00:00:01 |      6 |00:00:20.94 |     392 |
|*  2 |   FILTER                                 |                  |      2 |        |       |       |            |          |      6 |00:00:20.94 |     392 |
|   3 |    HASH GROUP BY                         |                  |      2 |      1 |   104 |       |  5147   (1)| 00:00:01 |      6 |00:00:20.94 |     392 |
|   4 |     VIEW                                 |                  |      2 |   8168 |   829K|       |  5144   (1)| 00:00:01 |    866 |00:00:20.93 |     392 |
|   5 |      WINDOW SORT                         |                  |      2 |   8168 |    16M|    21M|  5144   (1)| 00:00:01 |    866 |00:00:20.93 |     392 |
|*  6 |       VIEW                               |                  |      2 |   8168 |    16M|       |  1569   (1)| 00:00:01 |    866 |00:00:20.93 |     392 |
|   7 |        WINDOW SORT                       |                  |      2 |   8168 |  1403K|  1688K|  1569   (1)| 00:00:01 |  10208 |00:00:09.63 |     392 |
|   8 |         VIEW                             |                  |      2 |   8168 |  1403K|       |  1252   (1)| 00:00:01 |  10208 |00:00:06.22 |     392 |
|   9 |          WINDOW SORT                     |                  |      2 |   8168 |  1021K|  1248K|  1252   (1)| 00:00:01 |  10208 |00:00:06.20 |     392 |
|  10 |           VIEW                           |                  |      2 |   8168 |  1021K|       |  1016   (1)| 00:00:01 |  10208 |00:00:06.05 |     392 |
|  11 |            WINDOW SORT                   |                  |      2 |   8168 |  4546K|  5040K|  1016   (1)| 00:00:01 |  10208 |00:00:00.76 |     392 |
|  12 |             NESTED LOOPS                 |                  |      2 |   8168 |  4546K|       |    31   (0)| 00:00:01 |  10208 |00:00:00.41 |     392 |
|  13 |              NESTED LOOPS                |                  |      2 |      1 |   560 |       |     2   (0)| 00:00:01 |      2 |00:00:00.03 |     104 |
|  14 |               TABLE ACCESS BY INDEX ROWID| MY_GEOMETRIES    |      2 |      1 |   112 |       |     1   (0)| 00:00:01 |      2 |00:00:00.01 |       4 |
|* 15 |                INDEX UNIQUE SCAN         | MY_GEOMETRIES_PK |      2 |      1 |       |       |     0   (0)|          |      2 |00:00:00.01 |       2 |
|* 16 |               TABLE ACCESS BY INDEX ROWID| ACTIVITIES       |      2 |      1 |   448 |       |     1   (0)| 00:00:01 |      2 |00:00:00.03 |     100 |
|* 17 |                INDEX UNIQUE SCAN         | ACTIVITIES_PK    |      2 |      1 |       |       |     0   (0)|          |      2 |00:00:00.01 |       4 |
|  18 |              XPATH EVALUATION            |                  |      2 |        |       |       |            |          |  10208 |00:00:00.37 |     288 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SUM("DIST")>="G"."GEOM_LENGTH"/2)
   6 - filter("SDO_RELATE"='TRUE')
  15 - access("GEOM_ID"=2)
  16 - filter("MDSYS"."SDO_ANYINTERACT"("A"."GEOM","G"."GEOM")='TRUE')
  17 - access("A"."ACTIVITY_ID"=4468006769)
I can apply this approach to all my trips up Swain's Lane.  However, I have logged 1115 ascents, and if I attempt to process them in a single SQL query I will have to do some very large window sorts that will spill out of memory (at least they will on my machine).  Instead, it is faster to process each activity separately in a PL/SQL loop (see 4x_allswains2.sql).
I now have a table containing all of my ascents of Swain's Lane and I can see if I am getting faster or slower.  I simply dumped the data into Excel with SQL Developer.  
Unfortunately, I have discovered that I am not going faster!

No comments :