Friday, February 12, 2021

Spatial Data 2: Convert GPX Track to a Spatial Line Geometry

This blog is part of a series about my first steps using Spatial Data in the Oracle database.  I am using the GPS data for my cycling activities collected by Strava.  All of my files are available on GitHub.

Having loaded my GPS tracks from GPX files into an XML type column, the next stage is to extract the track points and create a spatial geometry column.  

Defining Spatial Geometries

Spatial objects are generically referred to as geometries.  When you define one, you have to specify what kind of geometry it is, and what coordinate system you are using. Later when you compare geometries to each other they have to use the same coordinate system. Otherwise, Oracle will raise an error.  Fortunately, Oracle can convert between coordinate systems.

Various coordinate systems are used for geographical data, they are given EPSG Geodetic Parameter Dataset codes.  Oracle supports various coordinate systems.  As well as older definitions, it also has current definitions where the ESPG code matches the Spatial Reference ID (SDO_SRID).  They can be queried from SDO_COORD_REF_SYS.

I will use two different coordinate systems during this series of blogs

Set lines 150 pages 99
Column coord_ref_sys_name format a35
Column legacy_cs_bounds format a110
select srid, coord_ref_sys_name, coord_ref_sys_kind, legacy_cs_bounds 
from SDO_COORD_REF_SYS where srid IN(4326, 27700)
/
      SRID COORD_REF_SYS_NAME                  COORD_REF_SYS_KIND
---------- ----------------------------------- ------------------------
LEGACY_CS_BOUNDS(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
--------------------------------------------------------------------------------------------------------------
      4326 WGS 84                              GEOGRAPHIC2D
SDO_GEOMETRY(2003, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 3), SDO_ORDINATE_ARRAY(-180, -90, 180, 90))

     27700 OSGB 1936 / British National Grid   PROJECTED

  • "The World Geodetic System (WGS) is a standard for use in cartography, geodesy, and satellite navigation including GPS". The latest revision is WGS 84 (also known as WGS 1984, EPSG:4326). It is the reference coordinate system used by the Global Positioning System (GPS).  Where I am dealing with longitude and latitude, specified in degrees, especially from GPS data, I need to tell Oracle that it is WGS84 by specifying SDO_SRID of 4326.
  • Later on, I will also be using data for Great Britain available from the Ordnance Survey that uses the Ordnance Survey National Grid (also known as British National Grid) reference system.  That requires SDO_SRID to be set to 27700.

See also:

Creating Spatial Points

I have found it useful to create a packaged function to convert longitude and latitude to a spatial data point.  It is a useful shorthand that I use in various places.

create or replace package body strava_pkg as 
k_module  CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;
…
----------------------------------------------------------------------------------------------------
function make_point 
(longitude in number
,latitude  in number)
return sdo_geometry deterministic is
  l_module VARCHAR2(64);
  l_action VARCHAR2(64);
begin
  dbms_application_info.read_module(module_name=>l_module
                                   ,action_name=>l_action);
  dbms_application_info.set_module(module_name=>k_module
                                  ,action_name=>'make_point');

  if longitude is not null and latitude is not null then
    return
      sdo_geometry (
        2001, 4326,
        sdo_point_type (longitude, latitude, null),
        null, null
      );
  else
    return null;
  end if;

  dbms_application_info.set_module(module_name=>l_module
                                  ,action_name=>l_action);
end make_point;
----------------------------------------------------------------------------------------------------
END strava_pkg;
/

strava_pkg.sql

There are two parameters to SDO_GEOMETRY that I always have to specify.

  • The first parameter, SDO_GTYPE, describes the natures of the spatial geometry being defined.  Here it is 2001.  The 2 indicates that it is a 2-dimensional geometry, and the 1 indicates that it is a single point.  See SDO_GEOMETRY Object Type
  • The second parameter, SDO_SRID, defines the coordinate system that I discussed above.  4326 indicates that I am working with longitude and latitude.

XML Namespace

GPS data is often held in GPX or GPS Exchange Format.  This is an XML schema.  GPX has been the de-facto XML standard for the lightweight interchange of GPS data since the initial GPX 1.0 release in 2002.  The GPX 1.1 schema was released in 2004 (see https://www.topografix.com/gpx.asp).  

Garmin has created an extension schema that holds additional athlete training information such as heart rate.

I can extract individual track points from a GPX with SQL using the extract() and extractvalue() functions.  However, I have GPX tracks that use both versions of the Topographix GPX schema (it depends on upon which piece of software emitted the GPX file), and some that also use the Garmin extensions.  

Therefore, I need to register all three schemas with Oracle.  I can download the schema files with wget.

cd /tmp/strava
wget http://www.topografix.com/GPX/1/0/gpx.xsd --output-document=gpx0.xsd
wget http://www.topografix.com/GPX/1/1/gpx.xsd
wget https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

Then I can register the files 

delete from plan_table WHERE statement_id = 'XSD';
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 1, 'gpx0.xsd', 'http://www.topografix.com/GPX/1/0/gpx.xsd');
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 2, 'gpx.xsd', 'http://www.topografix.com/GPX/1/1/gpx.xsd');
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 3, 'TrackPointExtensionv1.xsd', 'https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd');

DECLARE
  xmlSchema xmlType;
  res       boolean;
BEGIN
  FOR i IN (
    SELECT object_alias schemaURL
    ,      object_name  schemaDoc
    FROM   plan_table
    WHERE  statement_id = 'XSD'
    ORDER BY plan_id
  ) LOOP
    --read xsd file
    xmlSchema := XMLTYPE(getCLOBDocument('STRAVA',i.schemaDoc,'AL32UTF8'));
    --if already exists delete XSD
    if (dbms_xdb.existsResource(i.schemaDoc)) then
        dbms_xdb.deleteResource(i.schemaDoc);
    end if;
    --create resource from XSD
    res := dbms_xdb.createResource(i.schemaDoc,xmlSchema);

    -- Delete existing  schema
    dbms_xmlschema.deleteSchema(
      i.schemaURL
    );
    -- Now reregister the schema
    dbms_xmlschema.registerSchema(
      i.schemaURL,
      xmlSchema,
      TRUE,TRUE,FALSE,FALSE
    );
  END LOOP;
End;
/
3a_register_xml_schema.sql

Then I can query the registered schemas.

Set pages 99 lines 160
Column schema_url format a60
Column qual_schema_url format a105
select schema_url, local, hier_type, binary, qual_schema_url
from user_xml_schemas
/

SCHEMA_URL                                                   LOC HIER_TYPE   BIN
------------------------------------------------------------ --- ----------- ---
QUAL_SCHEMA_URL
---------------------------------------------------------------------------------------------------------
https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd YES CONTENTS    NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd

http://www.topografix.com/GPX/1/0/gpx.xsd                    YES CONTENTS    NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/0/gpx.xsd

http://www.topografix.com/GPX/1/1/gpx.xsd                    YES CONTENTS    NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/1/gpx.xsd

Extracting GPS Track Points from GPX

A GPS track is a list of points specifying at least time, longitude, latitude and often elevation.  I can extract all the points in a GPX as a set of rows.  However, I must specify the correct namespace for the specific GPX.

Column time_string format a20
SELECT g.activity_id
,      EXTRACTVALUE(VALUE(t), 'trkpt/time') time_string
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
       ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
 FROM activities g,
      TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/trk/trkseg/trkpt'
      ,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'
      ))) t
Where  activity_id IN(4468006769)
And rownum <= 10
/

  Activity
        ID TIME_STRING                    LAT           LNG     ELE   HR
---------- -------------------- ------------- ------------- ------- ----
4468006769 2020-12-13T14:31:13Z   51.52963800    -.18753600    30.6   57
           2020-12-13T14:31:14Z   51.52963500    -.18753400    30.6   57
           2020-12-13T14:31:15Z   51.52964100    -.18753100    30.6   57
           2020-12-13T14:31:16Z   51.52964000    -.18752900    30.6   57
           2020-12-13T14:31:17Z   51.52963600    -.18752700    30.6   57
           2020-12-13T14:31:18Z   51.52963200    -.18752700    30.6   57
           2020-12-13T14:31:19Z   51.52962900    -.18752800    30.6   57
           2020-12-13T14:31:20Z   51.52962800    -.18752800    30.6   57
           2020-12-13T14:31:21Z   51.52962800    -.18752900    30.6   57
           2020-12-13T14:31:22Z   51.52962800    -.18753000    30.6   57

I can use this approach to extract all the points from a GPS track and create a spatial line geometry.  I have put the whole process into a packaged procedure strava_pkg.load_activity.

First I need to work out which version of the Topographix schema is in use.  So I can try extracting the creator name with each and see which is not null.

…
IF l_num_rows > 0 THEN
  UPDATE activities
  SET    gpx = XMLTYPE(l_gpx), geom = null, geom_27700 = null, num_pts = 0, xmlns = NULL
  WHERE  activity_id = p_activity_id
  RETURNING extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/0"') 
  ,         extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/1"') 
  INTO      l_xmlns0, l_xmlns1;
  l_num_rows := SQL%rowcount;
END IF;
…

Now I can extract all the points in a GPX as a set of rows and put them into a spatial geometry.  I turn each row with two coordinates into two rows with one point each.  Note that longitude is listed before latitude for each point.  I convert the rows into a list using multiset() and finally cast that as a spatial ordinate array. 

Note that the SDO_GTYPE is 2002 (rather than 2001) because it is a line (rather than a single point) on a two-dimensional coordinate system.

  BEGIN
    UPDATE activities a
    SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
    cast(multiset(
      select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
      from (
        SELECT rownum rn
        ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) as lng
        ,      TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) as lat
        FROM   TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt', 'xmlns="http://www.topografix.com/GPX/1/1"'))) t
        ) pt,
        (select 1 rn from dual union all select 2 from dual) n
	    order by pt.rn, n.rn
      ) AS mdsys.sdo_ordinate_array))
    , xmlns = 'xmlns="http://www.topografix.com/GPX/1/1"'
    WHERE  a.gpx IS NOT NULL
    And    activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
  EXCEPTION
    WHEN e_13034 OR e_29877 THEN 
	  dbms_output.put_line('Exception:'||sqlerrm);
	  l_num_rows := 0;
  END;

I have found it helpful to simplify the line geometry with sdo_util.simplify(). It removes some of the noise in the GPS data and has resolved problems with calculating the length of lines that intersect with areas.

  BEGIN
    UPDATE activities 
    SET    geom = sdo_util.simplify(geom,1)
    WHERE  geom IS NOT NULL
    And    activity_id = p_activity_id;
    l_num_rows := SQL%rowcount;
  EXCEPTION
    WHEN e_13034 THEN 
	  dbms_output.put_line('Exception:'||sqlerrm);
  END;

There are a few other fields I also update at this point.  You will see me use them later.

  • NUM_PTS is the number of points in the line geometry.  
  • GEOM_27700 is the result of converting the line to British National Grid reference coordinates.  This helps when comparing it to British boundary data obtained from the Ordnance Survey or other government agencies.
  • MBR is the minimum bounding rectangle for the line.  This is generated to enable me to improve the performance of some spatial queries.  I have found some of the spatial operators to calculate intersections between geometries are quite slow and CPU intensive when applied to GPS tracks and boundary data that both have lots of points.  SDO_GEOM.SDO_MBR simply returns 4 ordinates that define the bounding rectangle.  This can be used to roughly match geometries that might match before doing a proper match.

  UPDATE activities 
  SET    num_pts = SDO_UTIL.GETNUMVERTICES(geom)
  ,      geom_27700 = sdo_cs.transform(geom,27700)
  ,      mbr = sdo_geom.sdo_mbr(geom)
  WHERE  geom IS NOT NULL
  And    activity_id = p_activity_id
  RETURNING num_pts INTO l_num_pts;
  dbms_output.put_line('Activity ID:'||p_activity_id||', '||l_num_pts||' points');
…

Now I can load each GPX and process it into a spatial geometry in one step.  I can process all of the activities in a simple loop.

set serveroutput on timi on
exec strava_pkg.load_activity(4468006769);
Loading Activity: 4468006769
ACTIVITIES/4468006769.gpx.gz - 1286238 bytes
xmlns 1=StravaGPX Android
Activity ID:4468006769, 998 points

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.41
Now my Strava activities are all in spatial geometries and I can start to do some spatial processing.

No comments :