This blog is part of a series about my first steps using Spatial Data in the Oracle database. I am using the GPS data for my cycling activities collected by Strava. All of my files are available on GitHub.
Having loaded my GPS tracks from GPX files into an XML type column, the next stage is to extract the track points and create a spatial geometry column.
Defining Spatial Geometries
Spatial objects are generically referred to as geometries. When you define one, you have to specify what kind of geometry it is, and what coordinate system you are using. Later when you compare geometries to each other they have to use the same coordinate system. Otherwise, Oracle will raise an error. Fortunately, Oracle can convert between coordinate systems.
Various coordinate systems are used for geographical data, they are given EPSG Geodetic Parameter Dataset codes. Oracle supports various coordinate systems. As well as older definitions, it also has current definitions where the ESPG code matches the Spatial Reference ID (SDO_SRID). They can be queried from SDO_COORD_REF_SYS.
I will use two different coordinate systems during this series of blogs
Set lines 150 pages 99
Column coord_ref_sys_name format a35
Column legacy_cs_bounds format a110
select srid, coord_ref_sys_name, coord_ref_sys_kind, legacy_cs_bounds
from SDO_COORD_REF_SYS where srid IN(4326, 27700)
/
SRID COORD_REF_SYS_NAME COORD_REF_SYS_KIND
---------- ----------------------------------- ------------------------
LEGACY_CS_BOUNDS(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
--------------------------------------------------------------------------------------------------------------
4326 WGS 84 GEOGRAPHIC2D
SDO_GEOMETRY(2003, 4326, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 3), SDO_ORDINATE_ARRAY(-180, -90, 180, 90))
27700 OSGB 1936 / British National Grid PROJECTED
- "The World Geodetic System (WGS) is a standard for use in cartography, geodesy, and satellite navigation including GPS". The latest revision is WGS 84 (also known as WGS 1984, EPSG:4326). It is the reference coordinate system used by the Global Positioning System (GPS). Where I am dealing with longitude and latitude, specified in degrees, especially from GPS data, I need to tell Oracle that it is WGS84 by specifying SDO_SRID of 4326.
- Later on, I will also be using data for Great Britain available from the Ordnance Survey that uses the Ordnance Survey National Grid (also known as British National Grid) reference system. That requires SDO_SRID to be set to 27700.
See also:
- AskTOM: Spatial queries involving latitude and longitude: "[The spatial reference ID] is extremely important. If we do not tell the database what coordinate system we are using, then we will not be able to make meaningful comparisons or measurements. This is especially true for spherical coordinates (long/lat). If we do not say that they are spherical, then any measurement (area, length, distances …) will be totally wrong. Also, we will not be able to correlate them with other geographic shapes, in other coordinate systems."
- What are the differences between 81989 and 27700 CRS for the British National Grid?
- Ordnance Survey Coordinate Tools
Creating Spatial Points
I have found it useful to create a packaged function to convert longitude and latitude to a spatial data point. It is a useful shorthand that I use in various places.
create or replace package body strava_pkg as
k_module CONSTANT VARCHAR2(48) := $$PLSQL_UNIT;
…
----------------------------------------------------------------------------------------------------
function make_point
(longitude in number
,latitude in number)
return sdo_geometry deterministic is
l_module VARCHAR2(64);
l_action VARCHAR2(64);
begin
dbms_application_info.read_module(module_name=>l_module
,action_name=>l_action);
dbms_application_info.set_module(module_name=>k_module
,action_name=>'make_point');
if longitude is not null and latitude is not null then
return
sdo_geometry (
2001, 4326,
sdo_point_type (longitude, latitude, null),
null, null
);
else
return null;
end if;
dbms_application_info.set_module(module_name=>l_module
,action_name=>l_action);
end make_point;
----------------------------------------------------------------------------------------------------
END strava_pkg;
/
There are two parameters to SDO_GEOMETRY that I always have to specify.
- The first parameter, SDO_GTYPE, describes the natures of the spatial geometry being defined. Here it is 2001. The 2 indicates that it is a 2-dimensional geometry, and the 1 indicates that it is a single point. See SDO_GEOMETRY Object Type
- The second parameter, SDO_SRID, defines the coordinate system that I discussed above. 4326 indicates that I am working with longitude and latitude.
XML Namespace
GPS data is often held in GPX or GPS Exchange Format. This is an XML schema. GPX has been the de-facto XML standard for the lightweight interchange of GPS data since the initial GPX 1.0 release in 2002. The GPX 1.1 schema was released in 2004 (see https://www.topografix.com/gpx.asp).
Garmin has created an extension schema that holds additional athlete training information such as heart rate.
I can extract individual track points from a GPX with SQL using the extract() and extractvalue() functions. However, I have GPX tracks that use both versions of the Topographix GPX schema (it depends on upon which piece of software emitted the GPX file), and some that also use the Garmin extensions.
Therefore, I need to register all three schemas with Oracle. I can download the schema files with wget.
cd /tmp/strava
wget http://www.topografix.com/GPX/1/0/gpx.xsd --output-document=gpx0.xsd
wget http://www.topografix.com/GPX/1/1/gpx.xsd
wget https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
Then I can register the files
delete from plan_table WHERE statement_id = 'XSD';
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 1, 'gpx0.xsd', 'http://www.topografix.com/GPX/1/0/gpx.xsd');
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 2, 'gpx.xsd', 'http://www.topografix.com/GPX/1/1/gpx.xsd');
insert into plan_table (statement_id, plan_id, object_name, object_alias)
values ('XSD', 3, 'TrackPointExtensionv1.xsd', 'https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd');
DECLARE
xmlSchema xmlType;
res boolean;
BEGIN
FOR i IN (
SELECT object_alias schemaURL
, object_name schemaDoc
FROM plan_table
WHERE statement_id = 'XSD'
ORDER BY plan_id
) LOOP
--read xsd file
xmlSchema := XMLTYPE(getCLOBDocument('STRAVA',i.schemaDoc,'AL32UTF8'));
--if already exists delete XSD
if (dbms_xdb.existsResource(i.schemaDoc)) then
dbms_xdb.deleteResource(i.schemaDoc);
end if;
--create resource from XSD
res := dbms_xdb.createResource(i.schemaDoc,xmlSchema);
-- Delete existing schema
dbms_xmlschema.deleteSchema(
i.schemaURL
);
-- Now reregister the schema
dbms_xmlschema.registerSchema(
i.schemaURL,
xmlSchema,
TRUE,TRUE,FALSE,FALSE
);
END LOOP;
End;
/
Then I can query the registered schemas.
Set pages 99 lines 160
Column schema_url format a60
Column qual_schema_url format a105
select schema_url, local, hier_type, binary, qual_schema_url
from user_xml_schemas
/
SCHEMA_URL LOC HIER_TYPE BIN
------------------------------------------------------------ --- ----------- ---
QUAL_SCHEMA_URL
---------------------------------------------------------------------------------------------------------
https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd YES CONTENTS NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/https://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd
http://www.topografix.com/GPX/1/0/gpx.xsd YES CONTENTS NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/0/gpx.xsd
http://www.topografix.com/GPX/1/1/gpx.xsd YES CONTENTS NO
http://xmlns.oracle.com/xdb/schemas/STRAVA/www.topografix.com/GPX/1/1/gpx.xsd
Extracting GPS Track Points from GPX
A GPS track is a list of points specifying at least time, longitude, latitude and often elevation. I can extract all the points in a GPX as a set of rows. However, I must specify the correct namespace for the specific GPX.
Column time_string format a20
SELECT g.activity_id
, EXTRACTVALUE(VALUE(t), 'trkpt/time') time_string
, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) lat
, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) lng
, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/ele')) ele
, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/extensions/gpxtpx:TrackPointExtension/gpxtpx:hr'
,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"')) hr
FROM activities g,
TABLE(XMLSEQUENCE(extract(g.gpx,'/gpx/trk/trkseg/trkpt'
,'xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"'
))) t
Where activity_id IN(4468006769)
And rownum <= 10
/
Activity
ID TIME_STRING LAT LNG ELE HR
---------- -------------------- ------------- ------------- ------- ----
4468006769 2020-12-13T14:31:13Z 51.52963800 -.18753600 30.6 57
2020-12-13T14:31:14Z 51.52963500 -.18753400 30.6 57
2020-12-13T14:31:15Z 51.52964100 -.18753100 30.6 57
2020-12-13T14:31:16Z 51.52964000 -.18752900 30.6 57
2020-12-13T14:31:17Z 51.52963600 -.18752700 30.6 57
2020-12-13T14:31:18Z 51.52963200 -.18752700 30.6 57
2020-12-13T14:31:19Z 51.52962900 -.18752800 30.6 57
2020-12-13T14:31:20Z 51.52962800 -.18752800 30.6 57
2020-12-13T14:31:21Z 51.52962800 -.18752900 30.6 57
2020-12-13T14:31:22Z 51.52962800 -.18753000 30.6 57
I can use this approach to extract all the points from a GPS track and create a spatial line geometry. I have put the whole process into a packaged procedure strava_pkg.load_activity.
First I need to work out which version of the Topographix schema is in use. So I can try extracting the creator name with each and see which is not null.
…
IF l_num_rows > 0 THEN
UPDATE activities
SET gpx = XMLTYPE(l_gpx), geom = null, geom_27700 = null, num_pts = 0, xmlns = NULL
WHERE activity_id = p_activity_id
RETURNING extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/0"')
, extractvalue(gpx,'/gpx/@version', 'xmlns="http://www.topografix.com/GPX/1/1"')
INTO l_xmlns0, l_xmlns1;
l_num_rows := SQL%rowcount;
END IF;
…
Now I can extract all the points in a GPX as a set of rows and put them into a spatial geometry. I turn each row with two coordinates into two rows with one point each. Note that longitude is listed before latitude for each point. I convert the rows into a list using multiset() and finally cast that as a spatial ordinate array.
Note that the SDO_GTYPE is 2002 (rather than 2001) because it is a line (rather than a single point) on a two-dimensional coordinate system.
BEGIN
UPDATE activities a
SET geom = mdsys.sdo_geometry(2002,4326,null,mdsys.sdo_elem_info_array(1,2,1),
cast(multiset(
select CASE n.rn WHEN 1 THEN pt.lng WHEN 2 THEN pt.lat END ord
from (
SELECT rownum rn
, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lon')) as lng
, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'trkpt/@lat')) as lat
FROM TABLE(XMLSEQUENCE(extract(a.gpx,'/gpx/trk/trkseg/trkpt', 'xmlns="http://www.topografix.com/GPX/1/1"'))) t
) pt,
(select 1 rn from dual union all select 2 from dual) n
order by pt.rn, n.rn
) AS mdsys.sdo_ordinate_array))
, xmlns = 'xmlns="http://www.topografix.com/GPX/1/1"'
WHERE a.gpx IS NOT NULL
And activity_id = p_activity_id;
l_num_rows := SQL%rowcount;
EXCEPTION
WHEN e_13034 OR e_29877 THEN
dbms_output.put_line('Exception:'||sqlerrm);
l_num_rows := 0;
END;
I have found it helpful to simplify the line geometry with sdo_util.simplify(). It removes some of the noise in the GPS data and has resolved problems with calculating the length of lines that intersect with areas.
BEGIN
UPDATE activities
SET geom = sdo_util.simplify(geom,1)
WHERE geom IS NOT NULL
And activity_id = p_activity_id;
l_num_rows := SQL%rowcount;
EXCEPTION
WHEN e_13034 THEN
dbms_output.put_line('Exception:'||sqlerrm);
END;
There are a few other fields I also update at this point. You will see me use them later.
- NUM_PTS is the number of points in the line geometry.
- GEOM_27700 is the result of converting the line to British National Grid reference coordinates. This helps when comparing it to British boundary data obtained from the Ordnance Survey or other government agencies.
- MBR is the minimum bounding rectangle for the line. This is generated to enable me to improve the performance of some spatial queries. I have found some of the spatial operators to calculate intersections between geometries are quite slow and CPU intensive when applied to GPS tracks and boundary data that both have lots of points. SDO_GEOM.SDO_MBR simply returns 4 ordinates that define the bounding rectangle. This can be used to roughly match geometries that might match before doing a proper match.
UPDATE activities
SET num_pts = SDO_UTIL.GETNUMVERTICES(geom)
, geom_27700 = sdo_cs.transform(geom,27700)
, mbr = sdo_geom.sdo_mbr(geom)
WHERE geom IS NOT NULL
And activity_id = p_activity_id
RETURNING num_pts INTO l_num_pts;
dbms_output.put_line('Activity ID:'||p_activity_id||', '||l_num_pts||' points');
…
set serveroutput on timi on
exec strava_pkg.load_activity(4468006769);
Loading Activity: 4468006769
ACTIVITIES/4468006769.gpx.gz - 1286238 bytes
xmlns 1=StravaGPX Android
Activity ID:4468006769, 998 points
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.41
No comments :
Post a Comment