This blog is part of a series about my first steps in using Spatial Data in the Oracle database. I am using the GPS data from my cycling activities collected by Strava. All of my files are available on GitHub.
I can generate the full hierarchy of an area with a PL/SQL function (strava_pkg.name_heirarchy_fn) by navigating up the linked list and discarding repeated names. I could make that available in a virtual column. However, I cannot build a text index on a function or a virtual column.
Text Index Option 1: Store Hierarchy on Table, and Create a Multi-Column Text Index
I could store the hierarchy of an area on the my_areas table, and generate the area from PL/SQL function strava_pkg. name_heirarchy_fn.
DECLARE
l_clob CLOB;
l_my_areas my_areas%ROWTYPE;
BEGIN
select m.*
into l_my_areas
FROM my_areas m
WHERE area_code = 'CPC'
And area_number = '40307';
dbms_output.put_line(strava_pkg.name_heirarchy_fn(l_my_areas.area_code,l_my_areas.area_number));
dbms_output.put_line(strava_pkg.name_heirarchy_fn(l_my_areas.parent_area_code,l_my_areas.parent_area_number));
END;
/
If I pass the code and number for a particular area, I can get its full hierarchy including its name. I can see the parish of Streatley, is in the Unitary Authority of West Berkshire, which is in England, and England is in the United Kingdom. If I pass the code and number of its parent, I just get the hierarchy up to its parent.
Streatley, West Berkshire, England, United Kingdom
West Berkshire, England, United Kingdom
I can store the hierarchy on my_areas, though I have to store results on a temporary table, rather than update it directly. Otherwise, I get a mutation error.
ALTER TABLE my_areas add name_heirarchy VARCHAR(4000)
/
CREATE GLOBAL TEMPORARY TABLE my_areas_temp ON COMMIT PRESERVE ROWS AS
SELECT area_code, area_number, strava_pkg.name_heirarchy_fn(parent_area_code,parent_area_number) name_heirarchy
FROM my_areas WHERE parent_area_code IS NOT NULL AND parent_area_number IS NOT NULL
/
MERGE INTO my_areas u
USING (SELECT * FROM my_areas_temp) s
ON (u.area_code = s.area_code AND u.area_number = s.area_number)
WHEN MATCHED THEN UPDATE
SET u.name_heirarchy = s.name_heirarchy
/
Then I can create a multi-column text index on the name
begin
ctx_ddl.create_preference('my_areas_lexer', 'BASIC_LEXER');
ctx_ddl.set_attribute('my_areas_lexer', 'mixed_case', 'NO');
ctx_ddl.create_preference('my_areas_datastore', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('my_areas_datastore', 'columns', 'name, name_heirarchy');
end;
/
CREATE INDEX my_areas_name_txtidx ON my_areas (name) INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore my_areas_datastore lexer my_areas_lexer sync(on commit)');
The index will sync if I have cause to update the hierarchy.
Text Index Option 2: Index a user_datastore based on the result of a PL/SQL function
Alternatively, I can build a text index on a combination of data from various sources by creating a PL/SQL procedure that combines the data and returns the string to be indexed.
I have created a procedure (strava_pkg.name_heirarchy_txtidx) that returns a string containing the hierarchy of a given area, and then I will create a text index on that. The format of the parameters must be exactly as follows:
- The rowid of the row being indexed is passed to the procedure;
- The string to be indexed is passed back as a CLOB parameter.
…
PROCEDURE name_heirarchy_txtidx
(p_rowid in rowid
,p_dataout IN OUT NOCOPY CLOB
) IS
l_count INTEGER := 0;
BEGIN
FOR i IN (
SELECT area_code, area_number, name, matchable
FROM my_areas m
START WITH rowid = p_rowid
CONNECT BY NOCYCLE prior m.parent_area_code = m.area_code
AND prior m.parent_area_number = m.area_number
) LOOP
IF i.matchable >= 1 THEN
l_count := l_count + 1;
IF l_count > 1 THEN
p_dataout := p_dataout ||', '|| i.name;
ELSE
p_dataout := i.name;
END IF;
END IF;
END LOOP;
END name_heirarchy_txtidx;
…
As an example, if I pass a particular rowid to the procedure, I obtain the full hierarchy of areas as before.
set serveroutput on
DECLARE
l_rowid ROWID;
l_clob CLOB;
BEGIN
select rowid
into l_rowid
FROM my_areas m
WHERE area_code = 'CPC'
And area_number = '40307';
strava_pkg.name_heirarchy_txtidx(l_rowid, l_clob);
dbms_output.put_line(l_clob);
END;
/
Streatley, West Berkshire, England, United Kingdom
PL/SQL procedure successfully completed.
The procedure is referenced as an attribute to a user datastore, I can then build a text index on the user datastore.
BEGIN
ctx_ddl.create_preference('my_areas_lexer', 'BASIC_LEXER');
ctx_ddl.set_attribute('my_areas_lexer', 'mixed_case', 'NO');
ctx_ddl.create_preference('my_areas_datastore', 'user_datastore');
ctx_ddl.set_attribute('my_areas_datastore', 'procedure', 'strava_pkg.name_heirarchy_txtidx');
ctx_ddl.set_attribute('my_areas_datastore', 'output_type', 'CLOB');
END;
/
CREATE INDEX my_areas_name_txtidx on my_areas (name) INDEXTYPE IS ctxsys.context
PARAMETERS ('datastore my_areas_datastore lexer my_areas_lexer');
Text Search examples
Both options produce an index that I can use in the same way. I can search for a particular name, for example, the village of Streatley.
SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
FROM my_areas m
WHERE CONTAINS(name,'streatley',1)>0
/
I get the two Streatleys, one in Berkshire, and the other in Bedfordshire.
SCORE(1) AREA AREA_NUMBER NAME SUFFIX NAME_HEIRARCHY
---------- ---- ----------- -------------------- ---------- ------------------------------------------------------------
16 CPC 41076 Streatley CP Streatley, Central Bedfordshire, England, United Kingdom
16 CPC 40307 Streatley CP Streatley, West Berkshire, England, United Kingdom
As I have indexed the full hierarchy, I can be more precise and search for both the village and the county, even though they are two different columns in the my_areas table.
SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
FROM my_areas m
WHERE CONTAINS(name,'streatley and berks%',1)>0
/
Now I just get one result. The Streatley in Berkshire.
SCORE(1) AREA AREA_NUMBER NAME SUFFIX NAME_HEIRARCHY
---------- ---- ----------- -------------------- ---------- ------------------------------------------------------------
11 CPC 40307 Streatley CP Streatley, West Berkshire, England, United Kingdom
Searching For the Top of Hierarchies
My search query works satisfactorily if my search identifies areas with no children, but supposing I search for something higher up the hierarchy, like Berkshire?
SELECT score(1), area_Code, area_number, name, suffix, name_heirarchy
FROM my_areas m
WHERE CONTAINS(name,'berkshire',1)>0
/
I get 184 areas, of different types within the areas called Berkshire, because the name of the parent area appears in the hierarchy of all the children and so is returned by the text index.
Area Area
SCORE(1) Code Number NAME SUFFIX NAME_HEIRARCHY
---------- ---- ----------- ------------------------- ---------- -----------------------------------------------------------
11 UTA 101678 Windsor and Maidenhead (B) Windsor and Maidenhead, Berkshire, England, United Kingdom
11 UTA 101680 Wokingham (B) Wokingham, Berkshire, England, United Kingdom
11 UTA 101681 Reading (B) Reading, Berkshire, England, United Kingdom
11 UTA 101685 West Berkshire West Berkshire, England, United Kingdom
11 UTW 40258 Norreys Ward Norreys, Wokingham, Berkshire, England, United Kingdom
11 UTW 40261 Barkham Ward Barkham, Wokingham, Berkshire, England, United Kingdom
…
However, I am just interested in the highest points on each part of the hierarchy I have identified. So, I exclude any result where its parent is also in the result set.
WITH x AS (
SELECT area_code, area_number, parent_area_code, parent_area_number, name, name_heirarchy
FROM my_areas m
WHERE CONTAINS(name,'berkshire',1)>0
) SELECT * FROM x WHERE NOT EXISTS (
SELECT 'x' FROM x x1
WHERE x1.area_code = x.parent_area_code
AND x1.area_number = x.parent_area_number
)
/
In this case, I still get two results because the boundaries of the unitary authority of West Berkshire are not entirely within the ceremonial county of Berkshire (some parts of Hungerford and Lambourne were exchanged with Wiltshire in 1990), hence I could not make Berkshire the parent of West Berkshire.
Area Area
Code Number SCORE NAME_HEIRARCHY
---- ----------- ---------- ------------------------------------------------------------
UTA 101685 11 West Berkshire, England, United Kingdom
CCTY 7 11 Berkshire, England, United Kingdom
Text Searching for Activities that pass through Areas
It is a simple extension to join the pre-processed areas through which activities pass to the areas found by the text search, and then exclude areas whose parent was also found in the same activity.
WITH x AS (
SELECT aa.activity_id, m.area_code, m.area_number, m.parent_area_code, m.parent_area_number, m.name, m.name_heirarchy
FROM my_areas m, activity_areas aa
WHERE m.area_Code = aa.area_code
AND m.area_number = aa.area_number
AND CONTAINS(name,'berkshire',1)>0
)
SELECT a.activity_id, a.activity_date, a.activity_name, a.activity_type, a.distance_km
, x.area_Code, x.area_number, x.name, x.name_heirarchy
FROM x, activities a
WHERE x.activity_id = a.activity_id
AND a.activity_date between TO_DATE('01022019','DDMMYYYY') and TO_DATE('28022019','DDMMYYYY')
AND NOT EXISTS (
SELECT 'x' FROM x x1
WHERE x1.area_code = x.parent_area_code
AND x1.area_number = x.parent_area_number
AND x1.activity_id = x.activity_id)
ORDER BY a.activity_date
/
Now I can see the rides in Berkshire in February 2019. I get two rows returned for the ride that was in both Berkshire and West Berkshire.
Activity Activity Activity Distance Area Area
ID Date ACTIVITY_NAME Type (km) Code Number NAME NAME_HEIRARCHY
---------- --------- --------------------------------------------- -------- -------- ---- ------ --------------- -------------------------
2156308823 17-FEB-19 MV - Aldworth, CLCTC Aldworth-Reading Ride 120.86 CCTY 7 Berkshire England, United Kingdom
2156308823 17-FEB-19 MV - Aldworth, CLCTC Aldworth-Reading Ride 120.86 UTA 101685 West Berkshire England, United Kingdom
2172794879 24-FEB-19 MV - Maidenhead Ride 48.14 CCTY 7 Berkshire England, United Kingdom
2173048214 24-FEB-19 CLCTC: Maidenhead - Turville Heath Ride 53.15 CCTY 7 Berkshire England, United Kingdom
2173048406 24-FEB-19 Maidenhead - Burnham Beeches - West Drayton Ride 27.92 CCTY 7 Berkshire England, United Kingdom
…
References
I found these references useful while creating the Text index:- Boyko Dimitrov: Full text search across multiple database columns with Oracle Text
- Oracle Blog about Oracle Text: Getting started Part 3 - Index maintenance
- Jonathan Lewis (at Redgate): Text Indexes
No comments :
Post a Comment