documentation:indexing
Differences
This shows you the differences between two versions of the page.
documentation:indexing [2012/10/17 09:40] – [Normalization Functions] typo fix csharp | documentation:indexing [2022/02/10 13:34] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | FIXME - Add Information on Virtual Index Definitions in 3.1+ | ||
+ | |||
======= Bibliographic Indexing in Evergreen ======== | ======= Bibliographic Indexing in Evergreen ======== | ||
Indexing and searching bibliographic data in Evergreen are complex processes. | Indexing and searching bibliographic data in Evergreen are complex processes. | ||
Line 100: | Line 102: | ||
</ | </ | ||
- | Each indexed datum coming from a bibliographic record in Evergreen is extracted based on an Indexed Field Definition. | + | Each indexed datum coming from a bibliographic record in Evergreen is extracted based on an Indexed Field Definition. |
==== Field Class ==== | ==== Field Class ==== | ||
Line 111: | Line 113: | ||
Indexed data from different fields will probably be considered to have different importance when calculating the relevance of a matched query term. For instance, a match in a translated title may be considered less important than a match in the title proper. | Indexed data from different fields will probably be considered to have different importance when calculating the relevance of a matched query term. For instance, a match in a translated title may be considered less important than a match in the title proper. | ||
- | By supplying a higher or lower relative **weight**, one field can be made more or less important, in relevance ranking terms, than others. | + | By supplying a higher or lower relative **weight**, one field can be made more or less important, in relevance ranking terms, than others. |
Evergreen ships with all Indexed Field Definition weights set to 1 by default. | Evergreen ships with all Indexed Field Definition weights set to 1 by default. | ||
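As a rough illustration of the idea of relative weights described above, the following Python sketch shows how a per-field weight could scale the contribution of a match to a relevance score. It is not Evergreen's actual ranking formula: the function `weighted_score`, the field labels, and the simple "weight times match count" model are assumptions made for illustration only. Unlisted fields fall back to a weight of 1, mirroring the shipped default.

```python
from typing import Dict

DEFAULT_WEIGHT = 1.0  # mirrors the shipped default of 1 for every field definition


def weighted_score(match_counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum the matches per field, scaling each by that field's relative weight.

    Hypothetical model: score = sum(weight[field] * matches[field]).
    Fields without an explicit weight use DEFAULT_WEIGHT.
    """
    return sum(
        weights.get(field, DEFAULT_WEIGHT) * count
        for field, count in match_counts.items()
    )


# Illustrative weights: the title proper counts more than a translated title.
example_weights = {"title_proper": 2.0, "title_translated": 0.5}

score_proper = weighted_score({"title_proper": 1}, example_weights)
score_translated = weighted_score({"title_translated": 1}, example_weights)
score_unweighted = weighted_score({"subject": 3}, example_weights)
```

With these illustrative weights, a single match in the title proper outranks a single match in the translated title, while a field with no explicit weight behaves as if its weight were 1.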
Line 163: | Line 165: | ||
First, these aliases provide a mechanism for internationalizing the user-supplied search constraints; | First, these aliases provide a mechanism for internationalizing the user-supplied search constraints; | ||
- | In a similar manner, aliases can be used to map [[http:// | + | In a similar manner, aliases can be used to map [[http:// |
Line 195: | Line 197: | ||
Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | ||
- | Twelve | + | Twenty-one |
^ Name ^ Description ^ | ^ Name ^ Description ^ | ||
+ | |Approximate High Date Normalize|Normalize the value to the nearest date-ish value, rounding up| | ||
+ | |Approximate Low Date Normalize|Normalize the value to the nearest date-ish value, rounding down| | ||
+ | |Coded Value Map Normalizer|Applies coded_value_map mapping of values| | ||
|Down-case|Convert text to lower case.| | |Down-case|Convert text to lower case.| | ||
|Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | ||
|First word|Include only the first space-separated word of a string.| | |First word|Include only the first space-separated word of a string.| | ||
+ | |Generic Mapping Normalizer|Map values or sets of values to new values.| | ||
|ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | ||
|Left truncation|Discard the specified number of characters from the left side of the string.| | |Left truncation|Discard the specified number of characters from the left side of the string.| | ||
- | |NACO Normalize|Apply NACO normalization rules to the extracted text. See http:// | + | |NACO Normalize|Apply NACO normalization rules to the extracted text. See https:// |
- | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | + | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. |
|Normalize date range|Split date ranges in the form of " | |Normalize date range|Split date ranges in the form of " | ||
- | |Replace|Replace all occurances | + | |Normalize date range|Normalize the value to NULL if it is not a number| |
+ | |Replace|Replace all occurrences | ||
+ | |Remove Parenthesized Substring|Remove any parenthesized substrings from the extracted text, such as the agency code preceding authority record control numbers in subfield 0.| | ||
|Right truncation|Include only the specified number of characters from the left side of the string.| | |Right truncation|Include only the specified number of characters from the left side of the string.| | ||
+ | |Search Normalize|Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.| | ||
|Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | ||
+ | |Trim Surrounding Space|Trim leading and trailing spaces from extracted text.| | ||
+ | |Trim Trailing Punctuation|Eliminate extraneous trailing commas and periods in text.| | ||
|Up-case|Convert text to upper case.| | |Up-case|Convert text to upper case.| | ||
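To make a few of the table entries concrete, here is a minimal Python sketch of what several of these normalizers do to a string. These are not the Evergreen stored procedures (which live in the database and may differ in edge cases); the function names, the `apply_normalizers` helper, and the chaining convention are assumptions made for illustration.

```python
import unicodedata
from typing import Callable, List, Sequence, Tuple


def down_case(text: str) -> str:
    """Down-case: convert text to lower case."""
    return text.lower()


def strip_diacritics(text: str) -> str:
    """Convert to NFD form and drop non-spacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def first_word(text: str) -> str:
    """Keep only the first whitespace-separated word (empty string if none)."""
    words = text.split()
    return words[0] if words else ""


def left_truncation(text: str, count: int) -> str:
    """Discard `count` characters from the left side of the string."""
    return text[count:]


def right_truncation(text: str, count: int) -> str:
    """Keep only the first `count` characters, counted from the left."""
    return text[:count]


def trim_surrounding_space(text: str) -> str:
    """Trim leading and trailing whitespace."""
    return text.strip()


def isbn10_to_isbn13(isbn10: str) -> str:
    """Translate an ISBN-10 to ISBN-13 using the 978 prefix and a new check digit."""
    digits = isbn10.replace("-", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    core = "978" + digits[:9]  # drop the old check digit
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


# A normalizer step is a function plus any extra positional parameters.
Step = Tuple[Callable[..., str], Sequence]


def apply_normalizers(text: str, steps: List[Step]) -> str:
    """Apply each normalizer in order, feeding the output of one into the next."""
    for func, params in steps:
        text = func(text, *params)
    return text


normalized = apply_normalizers(
    "  Émile Zola  ",
    [(trim_surrounding_space, ()), (strip_diacritics, ()), (down_case, ())],
)
```

Because each step consumes the previous step's output, the order of the chain matters: trimming before taking the first word, for example, gives a different result than truncating first.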
documentation/indexing.txt · Last modified: 2022/02/10 13:34 by 127.0.0.1