scratchpad:brush_up_search
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
scratchpad:brush_up_search [2016/04/23 20:09] – created klussier | scratchpad:brush_up_search [2022/02/10 13:34] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 3: | Line 3: | ||
=====Apostrophe Searching===== | =====Apostrophe Searching===== | ||
- | **The problem:** Evergreen uses a modified NACO normalization scheme to better handle apostrophe searching for French records. The problem is that this normalization scheme doesn' | + | **The problem:** Evergreen uses a modified NACO normalization scheme to better handle apostrophe searching for French records. The problem is that this normalization scheme doesn' |
- | **Who should consider this tweak:** Evergreen sites where the majority of records are in English and are okay with search behavior where a search for a French word like ' | + | **Who should consider this adjustment:** Evergreen sites with a database |
**How To:** | **How To:** | ||
- | - Add mappings to NACO Normalize for all indexes that are currently mapped to the default Search Normalize function. | + | Add mappings to NACO Normalize for all indexes that are currently mapped to the default Search Normalize function. |
+ | <code sql> | ||
+ | -- Remap every index that uses normalizer 17 (Search Normalize) to normalizer 1 (NACO Normalize). | ||
+ | -- These ids are the ones used on this page; confirm them in config.index_normalizer first. | ||
+ | UPDATE config.metabib_field_index_norm_map | ||
+ | SET norm = 1 | ||
+ | WHERE norm = 17; | ||
+ | </code> | ||
+ | A [[scratchpad: | ||
+ | =====Synonym Dictionary===== | ||
+ | |||
+ | **The problem:** Although stemming can allow users to find records with some variations of their search terms, it will only find variations that share the same stems. There are other common word variations (e.g. color/ | ||
+ | |||
+ | **Who should consider this adjustment: | ||
+ | |||
+ | **About Postgres dictionaries: | ||
+ | |||
+ | See [[http:// | ||
+ | |||
+ | Dictionaries are used to eliminate words that should not be considered in a search (stop words), and to normalize words so that different derived forms of the same word will match. A successfully normalized word is called a lexeme. Aside from improving search quality, normalization and removal of stop words reduce the size of the tsvector representation of a document, thereby improving performance. | ||
+ | |||
+ | PostgreSQL provides predefined dictionaries for many languages. There are also several predefined templates that can be used to create new dictionaries with custom parameters. | ||
+ | |||
+ | The synonym dictionary template is used to create dictionaries that replace a word with a synonym. Phrases are not supported. | ||
+ | |||
+ | Note: after creating a new dictionary or adding to an existing dictionary, [[scratchpad: | ||
+ | |||
+ | |||
+ | **How to:** | ||
+ | |||
+ | Below are the steps used at the North of Boston Library Exchange (NOBLE) when creating a synonym dictionary. You can replace the use of the word ' | ||
+ | |||
+ | **1. Create our own synonym dictionary on disk** | ||
+ | |||
+ | Copy the sample dictionary or create a new file: | ||
+ | |||
+ | cd / | ||
+ | sudo cp synonym_sample.syn synonym_noble.syn | ||
+ | |||
+ | The NOBLE test file looks like this: | ||
+ | |||
+ | <code> | ||
+ | color colour | ||
+ | colour color | ||
+ | 19th nineteenth | ||
+ | nineteenth 19th | ||
+ | 20th twentieth | ||
+ | twentieth 20th | ||
+ | indices index* | ||
+ | </code> | ||
+ | |||
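+ | An asterisk (*) can be placed at the end of a synonym in the configuration file to mark the synonym as a prefix. The asterisk is ignored when the entry is used in to_tsvector(), but in to_tsquery() the result becomes a query item with the prefix-match marker. | ||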
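Because each entry must be a single word followed by a single synonym, a quick format check before loading the file can catch phrases early. The following is only a sketch: the `check_syn` function name and the temporary file are made up for illustration, and the check assumes that every non-blank line should hold exactly two whitespace-separated tokens.

```shell
# Hypothetical check: report any non-blank line that does not hold
# exactly two whitespace-separated tokens (word, then synonym).
check_syn() {
  awk 'NF > 0 && NF != 2 {
         printf "line %d: expected 2 tokens, found %d\n", NR, NF
         bad = 1
       }
       END { exit bad + 0 }' "$1"
}

# Sample entries taken from the test file shown above.
syn_file=$(mktemp)
cat > "$syn_file" <<'EOF'
color colour
colour color
19th nineteenth
indices index*
EOF

if check_syn "$syn_file"; then
  syn_status="ok"
else
  syn_status="invalid"
fi
echo "$syn_status"
```

A line such as `nineteenth century 19th` would be reported with its line number, and the function would return a non-zero status.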
+ | |||
+ | |||
+ | **2. Create a dictionary in the public schema** | ||
+ | |||
+ | <code sql> | ||
+ | -- Connect first from a shell: psql -U evergreen -h localhost | ||
+ | CREATE TEXT SEARCH DICTIONARY public.synonym_noble (template=pg_catalog.synonym, | ||
+ | </code> | ||
+ | |||
+ | This command creates a dictionary based on the template ‘pg_catalog.synonym’. | ||
+ | |||
+ | A synonym dictionary replaces one word with another word. Phrases are not supported. | ||
+ | |||
+ | <code> | ||
+ | color colour | ||
+ | colour color | ||
+ | </code> | ||
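The one-word-in, one-word-out behaviour can be imitated outside the database to preview what a file will do. This is a rough shell sketch, not how PostgreSQL implements the template: `lookup_syn` is a hypothetical helper, matching is folded to lower case (as the synonym template does by default), and a trailing asterisk is dropped from the result.

```shell
# Hypothetical helper: print the synonym listed for a word, or nothing.
lookup_syn() {
  # $1 = synonym file, $2 = word to look up
  awk -v w="$2" '
    BEGIN { w = tolower(w) }
    NF == 2 && tolower($1) == w {
      syn = tolower($2)
      sub(/\*$/, "", syn)   # drop the prefix marker
      print syn
      exit
    }' "$1"
}

syn_file=$(mktemp)
cat > "$syn_file" <<'EOF'
color colour
colour color
indices index*
EOF

from_colour=$(lookup_syn "$syn_file" Colour)    # color
from_indices=$(lookup_syn "$syn_file" indices)  # index
from_other=$(lookup_syn "$syn_file" library)    # empty: word not in the file
echo "$from_colour $from_indices [$from_other]"
```

A word that is not listed produces no output here, which mirrors the dictionary passing unknown words on to the next dictionary in the configuration.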
+ | |||
+ | To see all dictionaries in the Evergreen database: | ||
+ | |||
+ | <code sql> | ||
+ | select * from pg_ts_dict; | ||
+ | </code> | ||
+ | |||
+ | There are more dictionaries in the pg_catalog schema, which you can list from a psql shell opened as the evergreen user. | ||
+ | |||
+ | Note: the postgres user’s password is disabled and needs to remain that way. | ||
+ | |||
+ | <code sql> | ||
+ | % psql -U evergreen -h localhost | ||
+ | evergreen=# \dFd (show dictionaries) | ||
+ | evergreen=# \dF (show configurations) | ||
+ | evergreen=# \q | ||
+ | </code> | ||
+ | |||
+ | Test a dictionary by passing a term to a dictionary: | ||
+ | <code sql> | ||
+ | select ts_lexize(' | ||
+ | | ||
+ | ----------- | ||
+ | | ||
+ | </code> | ||
+ | |||
+ | **3. Create a Configuration in the public schema** | ||
+ | |||
+ | A text search configuration binds a parser together with a set of dictionaries to process the parser' | ||
+ | |||
+ | Create the new configuration using the default configuration as a template: | ||
+ | |||
+ | <code sql> | ||
+ | -- Connect first from a shell: psql -U evergreen -h localhost | ||
+ | CREATE TEXT SEARCH CONFIGURATION public.synonym_noble (copy=default); | ||
+ | </code> | ||
+ | |||
+ | The copy option (copy=default) names the existing configuration that the new configuration is copied from. | ||
+ | |||
+ | To see all configurations in the Evergreen database: | ||
+ | |||
+ | <code sql> | ||
+ | select * from pg_ts_config; | ||
+ | </code> | ||
+ | |||
+ | After copying the default configuration, | ||
+ | |||
+ | <code sql> | ||
+ | ALTER TEXT SEARCH CONFIGURATION public.synonym_noble | ||
+ | ALTER MAPPING FOR asciiword | ||
+ | WITH synonym_noble; | ||
+ | |||
+ | ALTER TEXT SEARCH CONFIGURATION public.synonym_noble | ||
+ | ALTER MAPPING FOR asciihword | ||
+ | WITH synonym_noble; | ||
+ | |||
+ | ALTER TEXT SEARCH CONFIGURATION public.synonym_noble | ||
+ | ALTER MAPPING FOR hword_asciipart | ||
+ | WITH synonym_noble; | ||
+ | |||
+ | </code> | ||
+ | |||
+ | You can change all the mappings with one command: | ||
+ | |||
+ | <code sql> | ||
+ | ALTER TEXT SEARCH CONFIGURATION public.synonym_noble | ||
+ | ALTER MAPPING FOR asciiword, asciihword, hword_asciipart | ||
+ | WITH synonym_noble; | ||
+ | </code> | ||
+ | |||
+ | To see the new configuration in the Evergreen database, open the psql shell: | ||
+ | |||
+ | <code sql> | ||
+ | % psql -U evergreen -h localhost | ||
+ | evergreen=# \dF+ synonym_noble | ||
+ | |||
+ | |||
+ | Text search configuration " | ||
+ | Parser: " | ||
+ | Token | ||
+ | -----------------+--------------- | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | </code> | ||
+ | |||
+ | |||
+ | **4. Add the configuration to config.ts_config_list** | ||
+ | |||
+ | This table in Evergreen’s config schema lists each full-text configuration that will be referenced in config.metabib_class_ts_map. | ||
+ | |||
+ | <code sql> | ||
+ | |||
+ | INSERT into config.ts_config_list values (' | ||
+ | |||
+ | </code> | ||
+ | |||
+ | Verify that the configuration was added to the list: | ||
+ | |||
+ | <code sql> | ||
+ | select * from config.ts_config_list; | ||
+ | |||
+ | | ||
+ | -------------------+-------------------- | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | </code> | ||
+ | |||
+ | **5. Create a mapping in config.metabib_class_ts_map** | ||
+ | |||
+ | This mapping table relates each metabib class (keyword, title, etc.) to the text search configuration whose dictionaries are used when that class is indexed. | ||
+ | |||
+ | <code sql> | ||
+ | INSERT into config.metabib_class_ts_map (field_class, | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | </code> | ||
+ | |||
+ | **6. [[scratchpad: | ||
+ | |||
+ | See sample [[scratchpad: |
scratchpad/brush_up_search.1461456596.txt.gz · Last modified: 2022/02/10 13:33 (external edit)