documentation:indexing
Differences
This shows you the differences between two versions of the page.
| documentation:indexing [2012/10/17 09:40] – [Normalization Functions] typo fix csharp | documentation:indexing [2022/02/10 13:34] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | FIXME - Add Information on Virtual Index Definitions in 3.1+ | ||
| + | |||
| ======= Bibliographic Indexing in Evergreen ======== | ======= Bibliographic Indexing in Evergreen ======== | ||
| Indexing and searching bibliographic data in Evergreen are complex processes. | Indexing and searching bibliographic data in Evergreen are complex processes. | ||
| Line 100: | Line 102: | ||
| </ | </ | ||
| - | Each indexed datum coming from a bibliographic record in Evergreen is extracted based on an Indexed Field Definition. | + | Each indexed datum coming from a bibliographic record in Evergreen is extracted based on an Indexed Field Definition. | 
| ==== Field Class ==== | ==== Field Class ==== | ||
| Line 111: | Line 113: | ||
| Indexed data from different fields will probably be considered to have different importance when calculating the relevance of a matched query term.  For instance, a match in a translated title may be considered less important than a match in the title proper. | Indexed data from different fields will probably be considered to have different importance when calculating the relevance of a matched query term.  For instance, a match in a translated title may be considered less important than a match in the title proper. | ||
| - | By supplying a higher or lower relative **weight**, one field can be made more or less important, in relevance ranking terms, than others. | + | By supplying a higher or lower relative **weight**, one field can be made more or less important, in relevance ranking terms, than others. | 
| Evergreen ships with all Indexed Field Definition weights set to 1 by default. | Evergreen ships with all Indexed Field Definition weights set to 1 by default. | ||
| Line 163: | Line 165: | ||
| First, these aliases provide a mechanism for internationalizing the user-supplied search constraints; | First, these aliases provide a mechanism for internationalizing the user-supplied search constraints; | ||
| - | In a similar manner, aliases can be used to map [[http:// | + | In a similar manner, aliases can be used to map [[http:// | 
| Line 195: | Line 197: | ||
| Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | ||
| - | Twelve | + | Twenty-one | 
| ^ Name ^ Description ^ | ^ Name ^ Description ^ | ||
| + | |Approximate High Date Normalize|Normalize the value to the nearest date-ish value, rounding up| | ||
| + | |Approximate Low Date Normalize|Normalize the value to the nearest date-ish value, rounding down| | ||
| + | |Coded Value Map Normalizer|Applies coded_value_map mapping of values| | ||
| |Down-case|Convert text to lower case.| | |Down-case|Convert text to lower case.| | ||
| |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | ||
| |First word|Include only the first space-separated word of a string.| | |First word|Include only the first space-separated word of a string.| | ||
| + | |Generic Mapping Normalizer|Map values or sets of values to new values.| | ||
| |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | ||
| |Left truncation|Discard the specified number of characters from the left side of the string.| | |Left truncation|Discard the specified number of characters from the left side of the string.| | ||
| - | |NACO Normalize|Apply NACO normalization rules to the extracted text.  See http:// | + | |NACO Normalize|Apply NACO normalization rules to the extracted text.  See https:// | 
| - | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | + | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | 
| |Normalize date range|Split date ranges in the form of " | |Normalize date range|Split date ranges in the form of " | ||
| - | |Replace|Replace all occurances | + | |Normalize date range|Normalize the value to NULL if it is not a number| | 
| + | |Replace|Replace all occurrences | ||
| + | |Remove Parenthesized Substring|Remove any parenthesized substrings from the extracted text, such as the agency code preceding authority record control numbers in subfield 0.| | ||
| |Right truncation|Include only the specified number of characters from the left side of the string.| | |Right truncation|Include only the specified number of characters from the left side of the string.| | ||
| + | |Search Normalize|Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.| | ||
| |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | ||
| + | |Trim Surrounding Space|Trim leading and trailing spaces from extracted text.| | ||
| + | |Trim Trailing Punctuation|Eliminate extraneous trailing commas and periods in text.| | ||
| |Up-case|Convert text to upper case.| | |Up-case|Convert text to upper case.| | ||
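
To make the table above concrete, here is a minimal Python sketch of what a few of the simpler normalizers do to an extracted string, following only the descriptions given in the table. It is not Evergreen's code: the real normalizers are stored procedures inside Postgres. Every function name here is hypothetical, and the trailing-punctuation rule in particular is a simplifying assumption.

```python
import unicodedata


def down_case(text: str) -> str:
    """Down-case: convert text to lower case."""
    return text.lower()


def strip_diacritics(text: str) -> str:
    """Strip Diacritics: convert to NFD, then drop non-spacing combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def first_word(text: str) -> str:
    """First word: keep only the first space-separated word."""
    return text.strip(" ").split(" ", 1)[0]


def left_truncation(text: str, count: int) -> str:
    """Left truncation: discard `count` characters from the left side."""
    return text[count:]


def right_truncation(text: str, count: int) -> str:
    """Right truncation: keep only `count` characters from the left side."""
    return text[:count]


def trim_surrounding_space(text: str) -> str:
    """Trim Surrounding Space: remove leading and trailing spaces."""
    return text.strip(" ")


def trim_trailing_punctuation(text: str) -> str:
    """Trim Trailing Punctuation: drop trailing commas and periods.

    Assumption: every trailing comma or period is removed; the real
    normalizer may be more selective.
    """
    return text.rstrip(",.")


def apply_normalizers(text: str, steps) -> str:
    """Apply a sequence of single-argument normalizers in order."""
    for step in steps:
        text = step(text)
    return text


if __name__ == "__main__":
    chain = [
        trim_surrounding_space,
        trim_trailing_punctuation,
        strip_diacritics,
        down_case,
    ]
    print(apply_normalizers("  Café Müller, ", chain))
```

In this sketch the order of the chain matters: trimming spaces first lets the punctuation step see the trailing comma, and stripping diacritics before down-casing produces the same result as the reverse order for this input, though that need not hold for every script.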
documentation/indexing.1350481218.txt.gz · Last modified: 2022/02/10 13:33 (external edit)