dev:search_changes

Differences

This shows you the differences between two versions of the page.

dev:search_changes [2012/09/26 12:48] – created tsbere
dev:search_changes [2022/02/10 13:34] (current) – external edit 127.0.0.1
Line 96: Line 96:
  
 Always use the individual field_entry rows as we currently do, but check whether "simple" is indexed on the field(s). If it is, use *only* the "simple" ts_config as an override. Otherwise, build atoms as above.
 +
 +If the phrase neither starts nor ends with a *, add a word boundary at both the start and the end. Otherwise, omit the word boundary on whichever side has a *.
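
A minimal Python sketch of this boundary rule, for illustration only. The helper name `phrase_regex` is hypothetical, and Python's `\b` stands in for the PostgreSQL word-boundary tokens so the sketch can be run locally; the boundary tokens are parameters so that `[[:<:]]` and `[[:>:]]` can be substituted.

```python
import re

def phrase_regex(phrase, start_boundary=r"\b", end_boundary=r"\b"):
    """Build a regex for a quoted phrase, honoring a leading/trailing *."""
    # A * on either side means "no word boundary on that side".
    leading_star = phrase.startswith("*")
    trailing_star = phrase.endswith("*")
    core = phrase.strip("*")
    prefix = "" if leading_star else start_boundary
    suffix = "" if trailing_star else end_boundary
    # Escape the phrase text so it is matched literally.
    return prefix + re.escape(core) + suffix

# Without a *, "the assist" does not match inside "the assistant".
no_star = re.search(phrase_regex("the assist"), "the assistant")
# With a trailing *, the end boundary is skipped and the match succeeds.
with_star = re.search(phrase_regex("the assist*"), "the assistant")
```

Passing the boundary tokens as arguments keeps the rule itself separate from the regex dialect that ends up running the pattern.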
  
 === Negated phrase searches ===
Line 113: Line 115:
 Move to an all-in-one plperlu function (maybe eventually a C func?) that does the various rel bumps based on array inputs. These may be less needed with the new method of doing things.
  
 +===== Example searches =====
 +
 +Some simplified examples, without full query output, to illustrate parts of the above.
 +
 +NOTE: The examples below assume the current behavior of the - modifier.
 +
 +NOTE 2: The tsquery examples below are simplified. In practice each atom would be passed through to_tsquery, which would stem it as appropriate.
 +
 +==== keyword: martin luther -king ====
 +
 +=== Current ===
 +
 +Search metabib.keyword_field_entry for index_vector with tsquery "martin & luther & !king"; rank with the same tsquery.
 +
 +Test Issue: If there are multiple keyword indexes with different weights and even one of them contains martin and luther but not king, then the record will be returned.
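
The issue can be illustrated with a small Python simulation that uses plain sets in place of tsvectors. The row contents and the helper name `row_matches` are made up for illustration; this is not how PostgreSQL evaluates a tsquery internally.

```python
def row_matches(atoms, required, negated):
    """True when every required atom is present and no negated atom is."""
    return required <= atoms and not (atoms & negated)

# Two keyword index rows for the same record (record ID 1).
rows = [
    (1, {"martin", "luther"}),          # e.g. a row built from the title
    (1, {"martin", "luther", "king"}),  # e.g. a row built from a subject
]
required, negated = {"martin", "luther"}, {"king"}

# Matching each row independently: the first row satisfies the query,
# so the record is returned even though "king" appears elsewhere in it.
per_row_hits = {rec for rec, atoms in rows
                if row_matches(atoms, required, negated)}

# Matching against the union of the record's atoms excludes it.
combined = {}
for rec, atoms in rows:
    combined.setdefault(rec, set()).update(atoms)
combined_hits = {rec for rec, atoms in combined.items()
                 if row_matches(atoms, required, negated)}
```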
 +
 +=== Proposed ===
 +
 +Search metabib.combined_keyword_field_entry for index_vector with tsquery "martin & luther & !king" and metabib_field set to NULL. Join to metabib.keyword_field_entry on the record ID for ranking using tsquery "martin | luther".
 +
 +Issues solved: The combined table has all of the atoms within it, so matching on it is a one-shot. We then go back to the non-combined table for ranking, which I think should be faster in the long run, as we only load the records we previously identified.
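
As a rough sketch, the two tsquery strings used in the proposed approach could be derived from the user's input as follows. The function name `build_tsqueries` is hypothetical, and the sketch ignores stemming, quoting, and the to_tsquery step mentioned in NOTE 2. Negated terms are moved to the end of the match query, which does not change its meaning.

```python
def build_tsqueries(search):
    """Return (match_query, rank_query) for a simple keyword search."""
    required, negated = [], []
    for token in search.split():
        # A leading - marks a term that must not be present.
        if token.startswith("-") and len(token) > 1:
            negated.append(token[1:])
        else:
            required.append(token)
    # Matching: all required terms AND none of the negated terms.
    match_query = " & ".join(required + ["!" + term for term in negated])
    # Ranking: any required term contributes; negated terms are dropped.
    rank_query = " | ".join(required)
    return match_query, rank_query

match_query, rank_query = build_tsqueries("martin luther -king")
```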
 +
 +==== title: the assist ====
 +
 +=== Current ===
 +
 +Search metabib.title_field_entry for index_vector with tsquery "the & assist"; rank with the same tsquery.
 +
 +Test Issue: 'assistant' stems to 'assist', and tends to come up more often.
 +
 +=== Proposed ===
 +
 +Search metabib.combined_title_field_entry for index_vector with tsquery "the & assist" and metabib_field set to NULL. Join to metabib.title_field_entry on the record ID for ranking using tsquery "the | assist".
 +
 +Issues solved: The new stock config will likely have 'A' weight atoms for 'assist' where that word was present exactly, but only 'C' or 'D' weight atoms where 'assistant' was present, so exact 'assist' matches should rank higher by default.
 +
 +==== title: "the assist" ====
 +
 +=== Current ===
 +
 +Search metabib.title_field_entry for index_vector with tsquery "the & assist" plus a WHERE clause regex match on "the assist"; rank with the same tsquery.
 +
 +Test Issue: "the assistant" contains "the assist"
 +
 +=== Proposed ===
 +
 +Search metabib.title_field_entry for index_vector with tsquery "the & assist" plus a WHERE clause regex match on "<nowiki>[[:<:]]</nowiki>the assist<nowiki>[[:>:]]</nowiki>" (or equivalent); rank with tsquery "the | assist".
 +
 +Issues solved: The word boundaries will ensure that "the assist" is not followed by "ant".
 +
 +==== title|general|eng: the assist ====
 +
 +=== Current ===
 +
 +Search metabib.title_field_entry for index_vector with tsquery "the & assist" and field set to the IDs of the general or eng title fields; rank with the same tsquery.
 +
 +Test Issue: Generally the same as the standard title search, but limited to those two indexes.
 +
 +=== Proposed ===
 +
 +Search metabib.combined_title_field_entry for a combined index_vector (string_agg(index_vector::text, ' ')::tsvector) with tsquery "the & assist" and metabib_field set to the IDs of the general or eng title fields. Join to metabib.title_field_entry on the record and field IDs for ranking using tsquery "the | assist".
 +
 +Issues solved: Generally the same as the standard title search, but limited to those two indexes.
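
The field-restricted match can be pictured with the following Python sketch, which again uses sets in place of tsvectors. The field IDs (6, 7, 8, 9), the row layout, and the name `search_fields` are assumptions made for illustration; the set union plays the role of the string_agg step.

```python
def search_fields(combined_rows, field_ids, required):
    """Match required atoms against the atoms aggregated per record.

    combined_rows: iterable of (record_id, metabib_field, atoms) tuples.
    Only rows whose metabib_field is in field_ids are aggregated.
    """
    per_record = {}
    for record_id, field, atoms in combined_rows:
        if field in field_ids:
            per_record.setdefault(record_id, set()).update(atoms)
    return sorted(rec for rec, atoms in per_record.items()
                  if required <= atoms)

rows = [
    (1, 6, {"the"}),     # record 1, hypothetical "general" title field
    (1, 7, {"assist"}),  # record 1, hypothetical "eng" title field
    (2, 6, {"the"}),     # record 2, "general" title field
    (2, 9, {"assist"}),  # record 2, some other title field
]
hits = search_fields(rows, {6, 7}, {"the", "assist"})
```

Record 2 is excluded because its 'assist' atom comes from a field outside the requested set.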
dev/search_changes.1348678124.txt.gz · Last modified: 2022/02/10 13:34 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International

© 2008-2022 GPLS and others. Evergreen is open source software, freely licensed under GNU GPLv2 or later.
The Evergreen Project is a U.S. 501(c)3 non-profit organization.