Search Modifications

Summary

This proposal changes how Evergreen performs searches through the QueryParser.pm module.

Search modifications will alter the way searches perform in the following ways:

1. Exact Match: Now matches exactly what is typed in, after normalization rules are applied.

        Example:
                * Before: Search for: title | exact match | horses | results found 44: Horses, Runway horses., Crazy over horses, etc.
                * Now: Search for: title | exact match | horses | results found 1: Horses

                * Before: Search for: subject | exact match | art greek | results found 5: Greek Art, The Development of Attic black-figure, A pillage of art, etc.
                * Now: Search for: subject | exact match | art greek | results found 3: Greek art, A handbook of Greek Art, and The art of Crete and early Greece.

2. Contains Phrase: Now requires that the phrase appear in the index being searched. (This behavior appears to be in master already. The only change is that the phrase is normalized and checked against a table of normalized values.)

         Example:
               * Before: subject | contains phrase | art greek | results found 1: Greek Art
               * Now: subject | contains phrase | art greek | results found 3: Greek art, The art of Crete.., A handbook of Greek art

3. Normalized Indexes: Some index entries consist entirely of characters that normalization removes. The search now detects this case and runs without text normalization.

         Example:
               * Before: contains | !!! | results found 0: No results found
               * Now:  contains | !!! | results found 1: !!!
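
The three behaviors above can be illustrated with a minimal Python sketch. It is not Evergreen code: simple_normalize is a hypothetical, much-simplified stand-in for the search_normalize stored procedure, the sample values are made up for illustration, and the fallback for queries that normalize to nothing is only one reading of item 3.

```python
import re


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for search_normalize: lowercase the text,
    turn punctuation into spaces, and collapse runs of whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(no_punct.split())


def exact_match(query: str, values: list[str]) -> list[str]:
    """Keep only values whose normalized form equals the normalized query."""
    wanted = simple_normalize(query)
    return [v for v in values if simple_normalize(v) == wanted]


def contains_phrase(query: str, values: list[str]) -> list[str]:
    """Keep values whose normalized form contains the normalized phrase.

    If normalization strips the whole query (for example "!!!"),
    fall back to matching the raw text instead (assumed behavior).
    """
    wanted = simple_normalize(query)
    if not wanted:
        raw = query.strip()
        return [v for v in values if raw and raw in v]
    return [v for v in values if wanted in simple_normalize(v)]


# Made-up sample index values, loosely based on the examples above.
titles = ["Horses", "Runway horses.", "Crazy over horses", "!!!"]
subjects = ["Art, Greek.", "Art, Greek -- History", "Greek language"]

print(exact_match("horses", titles))
print(contains_phrase("art greek", subjects))
print(contains_phrase("!!!", titles))
```

With these sample values, the exact match keeps only "Horses", the phrase search keeps the two subjects that normalize to text containing "art greek", and the "!!!" query is matched against the raw values because normalization leaves nothing to search for.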

These changes are organization specific and can be turned on or off. The setting will be added to the Admin (.) - Local Administration - Library Settings Editor menu.

  • Configuration changes are added to 950.seed_values in the SQL build scripts. An upgrade script is available in upgrades.

The code changes center on the QueryParser.pm Perl module at /OpenILS/Application/Storage/Driver/Pg/QueryParser.pm.

         Functions Added:
               * /Application/Storage/QueryParser.pm - search_mods
                       - Creates an object that lets the query parser track whether search mods are on or off.
               * /Application/Storage/Driver/Pg/QueryParser.pm - search_mod
                       - Creates additions for the main SQL statement returned from toSQL. Specifically modifies anything created within the flatten subroutine.
               * /Application/Storage/Driver/Pg/QueryParser.pm - naco_normalize
                       - Makes a call to the database to normalize a string using the search_normalize stored procedure.
               * /Application/Storage/Driver/Pg/QueryParser.pm - remove_search_characters
                       - Removes the search characters that dictate what type of search is to be performed on the query.
               * /Application/Storage/Driver/Pg/QueryParser.pm - quote_value
                       - Escapes all characters for SQL consumption.
               * /WWW/EGCatLoader/Search.pm - get_search_mod
                       - Gets the search modification setting.
         Functions Modified:
               * /Application/Storage/QueryParser.pm - new
                       - Adds a check for search mods that determines whether the element is created inside the query parser.
               * /Application/Storage/Driver/Pg/QueryParser.pm - flatten
                       - Appends pieces of SQL queries to the $from and $where strings.
               * /Application/Storage/Publisher/metabib.pm - query_parser_fts_wrapper
                       - Passes the search_mods setting on to query_parser_fts.
               * /WWW/EGCatLoader/Search.pm - load_rresults
                       - Collects information about the search_mod configuration and loads it into $args for getting $results.
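
As a rough illustration of how search_mod and quote_value might cooperate with flatten, the Python sketch below builds one extra JOIN fragment and one extra WHERE fragment. It is a sketch under assumptions, not the Perl implementation: the function bodies, the fe alias for the existing field-entry table, and the LIKE-based condition are hypothetical, and only the normalized_*_field_entry table naming and its id and value columns come from this proposal.

```python
import re


def quote_value(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes.
    Illustrative only; bind parameters are safer in real code."""
    return "'" + value.replace("'", "''") + "'"


def like_pattern(phrase: str) -> str:
    """Escape LIKE wildcards in the phrase and wrap it in % for a substring match."""
    escaped = (
        phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return f"%{escaped}%"


def search_mod_fragments(search_class: str, phrase: str, alias: str = "n"):
    """Return hypothetical (from, where) fragments to append to a query.

    search_class becomes part of a table name, so it is validated
    rather than quoted.
    """
    if not re.fullmatch(r"[a-z_]+", search_class):
        raise ValueError(f"unexpected search class: {search_class!r}")
    table = f"normalized_{search_class}_field_entry"
    # "fe" is an assumed alias for the matching *_field_entry table.
    from_part = f" JOIN {table} AS {alias} ON ({alias}.id = fe.id)"
    where_part = f" AND {alias}.value LIKE {quote_value(like_pattern(phrase))}"
    return from_part, where_part


from_sql, where_sql = search_mod_fragments("title", "o'brien")
print(from_sql)
print(where_sql)
```

Keeping the fragments separate mirrors the description of flatten appending to the $from and $where strings; when search mods are off, a caller would simply append nothing.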
         

The remaining changes are to the database. They add 5 tables of normalized text fields, which are populated by triggers on the field_entry tables. The text fields are indexed using an extension called pg_trgm.

  • Added 5 new tables that are mapped to the *_field_entry tables. Each table contains an id that is one to one with the matching *_field_entry table. The other two columns contain the source and a value, which is the normalized form of the value field in the *_field_entry table.
  • Added an extension to index the normalized_*_field_entry tables. The index allows searching with the LIKE operator and is optimized for phrase matching as well as fuzzy matching. For more information, see the PostgreSQL pg_trgm documentation.
  • Added a trigger to populate the normalized_*_field_entry tables from the *_field_entry tables.
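
The table-plus-trigger arrangement described above can be tried out with a small self-contained Python sketch. It uses SQLite as a stand-in for PostgreSQL purely so the example runs anywhere; the trigger name, the sample row, and the simple_normalize function are hypothetical, and no pg_trgm index is involved here.

```python
import re
import sqlite3


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for search_normalize: lowercase the text,
    turn punctuation into spaces, and collapse runs of whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(no_punct.split())


conn = sqlite3.connect(":memory:")
# Expose the Python normalizer to SQL so the trigger can call it.
conn.create_function("simple_normalize", 1, simple_normalize)

conn.executescript(
    """
    CREATE TABLE title_field_entry (
        id INTEGER PRIMARY KEY, source INTEGER, value TEXT
    );
    -- One row per title_field_entry row, sharing the same id.
    CREATE TABLE normalized_title_field_entry (
        id INTEGER PRIMARY KEY, source INTEGER, value TEXT
    );
    CREATE TRIGGER normalize_title_entry
    AFTER INSERT ON title_field_entry
    BEGIN
        INSERT INTO normalized_title_field_entry (id, source, value)
        VALUES (NEW.id, NEW.source, simple_normalize(NEW.value));
    END;
    """
)

conn.execute(
    "INSERT INTO title_field_entry (id, source, value) VALUES (?, ?, ?)",
    (1, 100, "Runway horses."),
)

# Phrase lookup against the normalized copy using LIKE.
rows = conn.execute(
    "SELECT id, source, value FROM normalized_title_field_entry"
    " WHERE value LIKE ?",
    ("%horses%",),
).fetchall()
print(rows)
```

A fuller version would also handle updates and deletes on the source table so the normalized copy stays one to one with it.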

Pull Request

Blue Print

Deliverable

  1. Option to enable the modified search behavior.
  2. Newly indexed tables that allow more search options (in future releases):
    • Ability to use an indexed search on tables with LIKE and ILIKE.
    • New operator to fuzzy match values, offering the ability to recognize spelling errors.
    • Relevance functions that score results even when a term is misspelled ("nieghbor" will be scored similarly to "neighbor").
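
To give a feel for why trigram indexing helps with misspellings, here is a minimal Python sketch of trigram similarity. It approximates the approach documented for pg_trgm (lowercase each word, pad it with two leading spaces and one trailing space, then compare sets of three-character sequences), but it is an ASCII-only illustration and not the extension's actual code.

```python
import re


def trigrams(text: str) -> set[str]:
    """Return the set of trigrams for every word in the text.
    Each word is padded with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("neighbor", "nieghbor"))
print(similarity("neighbor", "horses"))
```

In this sketch "neighbor" and "nieghbor" share 5 of 13 distinct trigrams, while "neighbor" and "horses" share none, so a threshold on the score can separate likely misspellings from unrelated terms.
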
dev/proposal/search_modifications.txt · Last modified: 2022/02/10 13:34 by 127.0.0.1

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International

© 2008-2022 GPLS and others. Evergreen is open source software, freely licensed under GNU GPLv2 or later.
The Evergreen Project is a U.S. 501(c)3 non-profit organization.