documentation:indexing
Differences
This shows you the differences between two versions of the page.
documentation:indexing [2016/07/27 09:39] – [Field Aliases] rjs7 | documentation:indexing [2022/02/10 13:34] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | FIXME - Add Information on Virtual Index Definitions in 3.1+ | ||
+ | |||
======= Bibliographic Indexing in Evergreen ======== | ======= Bibliographic Indexing in Evergreen ======== | ||
Indexing and searching bibliographic data in Evergreen are complex processes. | Indexing and searching bibliographic data in Evergreen are complex processes. | ||
Line 195: | Line 197: | ||
Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | ||
- | Twelve | + | Twenty-one |
^ Name ^ Description ^ | ^ Name ^ Description ^ | ||
+ | |Approximate High Date Normalize|Normalize the value to the nearest date-ish value, rounding up| | ||
+ | |Approximate Low Date Normalize|Normalize the value to the nearest date-ish value, rounding down| | ||
+ | |Coded Value Map Normalizer|Applies coded_value_map mapping of values| | ||
|Down-case|Convert text to lower case.| | |Down-case|Convert text to lower case.| | ||
|Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | ||
|First word|Include only the first space-separated word of a string.| | |First word|Include only the first space-separated word of a string.| | ||
+ | |Generic Mapping Normalizer|Map values or sets of values to new values.| | ||
|ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | ||
|Left truncation|Discard the specified number of characters from the left side of the string.| | |Left truncation|Discard the specified number of characters from the left side of the string.| | ||
Line 206: | Line 212: | ||
|NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | ||
|Normalize date range|Split date ranges in the form of " | |Normalize date range|Split date ranges in the form of " | ||
- | |Replace|Replace all occurances | + | |Normalize date range|Normalize the value to NULL if it is not a number| |
+ | |Replace|Replace all occurrences | ||
+ | |Remove Parenthesized Substring|Remove any parenthesized substrings from the extracted text, such as the agency code preceding authority record control numbers in subfield 0.| | ||
|Right truncation|Include only the specified number of characters from the left side of the string.| | |Right truncation|Include only the specified number of characters from the left side of the string.| | ||
+ | |Search Normalize|Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.| | ||
|Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | ||
+ | |Trim Surrounding Space|Trim leading and trailing spaces from extracted text.| | ||
+ | |Trim Trailing Punctuation|Eliminate extraneous trailing commas and periods in text.| | ||
|Up-case|Convert text to upper case.| | |Up-case|Convert text to upper case.| | ||
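
To make a few of the descriptions in the table concrete, here is a minimal Python sketch of what some of the simpler normalizers do to a piece of extracted text. It is an illustration only: the real normalizers are stored procedures inside the database, and every function name below (`down_case`, `strip_diacritics`, `first_word`, `left_truncate`, `right_truncate`, `trim_surrounding_space`, `apply_chain`) is hypothetical and chosen for this example, not taken from Evergreen.

```python
import unicodedata


def down_case(text):
    # "Down-case": convert text to lower case.
    return text.lower()


def strip_diacritics(text):
    # "Strip Diacritics": convert to NFD form, then drop non-spacing
    # combining marks (Unicode general category "Mn").
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def first_word(text):
    # "First word": keep only the first space-separated word.
    words = [w for w in text.split(" ") if w]
    return words[0] if words else ""


def left_truncate(text, count):
    # "Left truncation": discard `count` characters from the left side.
    return text[count:]


def right_truncate(text, count):
    # "Right truncation": keep only `count` characters from the left side.
    return text[:count]


def trim_surrounding_space(text):
    # "Trim Surrounding Space": remove leading and trailing whitespace.
    return text.strip()


def apply_chain(text, normalizers):
    # Hypothetical helper: run the text through each normalizer in turn.
    # The order used by callers here is an assumption made for illustration.
    for normalize in normalizers:
        text = normalize(text)
    return text


if __name__ == "__main__":
    chain = [trim_surrounding_space, strip_diacritics, down_case, first_word]
    print(apply_chain("  Éte Indien ", chain))
```

Chaining small, single-purpose functions like this mirrors the way the table lists each normalizer as one narrow transformation; how Evergreen itself orders and parameterizes them is not covered by this sketch.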
documentation/indexing.1469626776.txt.gz · Last modified: 2022/02/10 13:33 (external edit)