This blog is part of a series about my first steps in using Spatial Data in the Oracle database. I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.
I have loaded basic data for all countries, and detailed data for the UK and other countries where I have recorded activities. The next step is to determine which activities pass through which areas. Generically, the question is simply whether one geometry intersects with another. I can test this in SQL with the sdo_geom.relate() function.
WHERE SDO_GEOM.RELATE(a.geom,'anyinteract',m.geom) = 'TRUE'
However, working out whether an activity, with several thousand points, is within an area defined with several thousand points can be CPU intensive and time-consuming. Larger areas such as UK counties average over 20,000 points.
I have 60,000 defined areas, of which, over 20,000 of which are for the UK. I have 2700 activities recorded on Strava, with an average of 2700 points, but some have over 10,000 points. It isn't viable to compare every activity with every area. Comparing these large geometries can take a significant time, too long to do the spatial queries every time I want to interrogate the data, and too long for an on-line application.
Pre-processing Geometry Intersections
However, the data, once loaded is static. Definitions of areas can change, but it is rare. Activities do not change. Therefore, I have decided to pre-process the data to produce a table of matching activities and areas.
CREATE TABLE activity_areas
(activity_id NUMBER NOT NULL
,area_code VARCHAR2(4) NOT NULL
,area_number NUMBER NOT NULL
,geom_length NUMBER
,CONSTRAINT ACTIVITY_AREAS_PK PRIMARY KEY (activity_id, area_code, area_number)
,CONSTRAINT ACTIVITY_AREAS_FK FOREIGN KEY (activity_id) REFERENCES ACTIVITIES (activity_id)
,CONSTRAINT ACTIVITY_AREAS_FK2 FOREIGN KEY (area_code, area_number)
REFERENCES MY_AREAS (area_code, area_number)
);
Recursive Search
I have written the search as a PL/SQL procedure to search areas that match a particular activity.- I pass the ID of the activity to be processed to the procedure.
- I can specify the area code and number, or the parent area code and number, at which to search through the areas. I usually leave them to default to null so the search starts with areas at the root of the hierarchy that therefore have no parents (i.e. sovereign countries).
- The procedure then calls itself recursively for each area that it finds matches the activity, to search its children. This way, I limit the total number of comparisons required.
- For every area and activity, I have calculated the minimum bounding rectangle using sdo_geom.sdo_mbr() and stored it in another geometry column on the same row. This geometry contains just 5 points (the last point is the same as the first to close the rectangle). I can compare two rectangles very quickly, and if they don't intersect overlap then there is no need to see if the actual geometries overlap. This approach filters out geometries that cannot match, so that fewer geometries then have to be compared in full, thus significantly improving the performance of the search.
AND SDO_GEOM.RELATE(a.mbr,'anyinteract',m.mbr) = 'TRUE'
- I have found that it is necessary to have the MBR comparison earlier in the predicate clauses than the GEOM comparison.
…
PROCEDURE activity_area_search
(p_activity_id INTEGER
,p_area_code my_areas.area_code%TYPE DEFAULT NULL
,p_area_number my_areas.area_number%TYPE DEFAULT NULL
,p_query_type VARCHAR2 DEFAULT 'P'
,p_level INTEGER DEFAULT 0
) IS
BEGIN
FOR i IN(
SELECT m.*
, CASE WHEN m.geom_27700 IS NOT NULL THEN sdo_geom.sdo_length(SDO_GEOM.sdo_intersection(m.geom_27700,a.geom_27700,5), unit=>'unit=km')
WHEN m.geom IS NOT NULL THEN sdo_geom.sdo_length(SDO_GEOM.sdo_intersection(m.geom,a.geom,5), unit=>'unit=km')
END geom_length
, (SELECT MIN(m2.area_level) FROM my_areas m2
WHERE m2.parent_area_code = m.area_code AND m2.parent_area_number = m.area_number) min_child_level
FROM my_areas m
, activities a
WHERE ( (p_query_type = 'P' AND parent_area_code = p_area_code AND parent_area_number = p_area_number)
OR (p_query_type = 'A' AND area_code = p_area_code AND area_number = p_area_number)
OR (p_query_type = 'A' AND p_area_number IS NULL AND area_code = p_area_code)
OR (p_area_code IS NULL AND p_area_number IS NULL AND parent_area_code IS NULL AND parent_area_number IS NULL))
AND a.activity_id = p_activity_id
and SDO_GEOM.RELATE(a.mbr,'anyinteract',m.mbr) = 'TRUE'
and SDO_GEOM.RELATE(a.geom,'anyinteract',m.geom) = 'TRUE'
) LOOP
IF i.area_level>0 OR i.num_children IS NULL THEN
BEGIN
INSERT INTO activity_areas
(activity_id, area_code, area_number, geom_length)
VALUES
(p_activity_id, i.area_code, i.area_number, i.geom_length);
EXCEPTION
WHEN dup_val_on_index THEN
UPDATE activity_areas
SET geom_length = i.geom_length
WHERE activity_id = p_activity_id
AND area_code = i.area_code
AND area_number = i.area_number;
END;
END IF;
IF i.num_children > 0 THEN
strava_pkg.activity_area_search(p_activity_id, i.area_code, i.area_number, 'P', p_level+1);
END IF;
END LOOP;
END activity_area_search;
…
SQL> exec strava_pkg.activity_area_search(4372796838);
Searching 4372796838:-
Found SOV-1159320701:United Kingdom, 2.895 km
.Searching 4372796838:SOV-1159320701
.Found GEOU-1159320743:England, 2.851 km
..Searching 4372796838:GEOU-1159320743
..Found GLA-117537:Greater London, 2.851 km
...Searching 4372796838:GLA-117537
...Found LBO-50724:City of Westminster, 1.732 km
....Searching 4372796838:LBO-50724
....Found LBW-117484:Abbey Road, 1.435 km
....Found LBW-50639:Maida Vale, 0.298 km
....Done 4372796838:LBO-50724: 0.415 secs).
...Found LBO-50632:Camden, 1.119 km
....Searching 4372796838:LBO-50632
....Found LBW-117286:Kilburn, 0.273 km
....Found LBW-117288:Swiss Cottage, 1.033 km
....Found LBW-117287:West Hampstead, 0.084 km
....Done 4372796838:LBO-50632: 0.521 secs).
...Done 4372796838:GLA-117537: 3.368 secs).
..Done 4372796838:GEOU-1159320743: 4.372 secs).
.Done 4372796838:SOV-1159320701: 4.750 secs).
Done 4372796838:-: 5.532 secs).
PL/SQL procedure successfully completed.
--process unmatched activities
set pages 99 lines 180 timi on serveroutput on
column activity_name format a60
BEGIN
FOR i IN (
SELECT a.activity_id, activity_date, activity_name
, distance_km, num_pts, ROUND(num_pts/NULLIF(distance_km,0),0) ppkm
FROM activities a
WHERE activity_id NOT IN (SELECT DISTINCT activity_id FROM activity_areas)
AND num_pts>0
) LOOP
strava_pkg.activity_area_search(i.activity_id);
commit;
END LOOP;
END;
/