documentation:indexing
Differences
This shows you the differences between two versions of the page.
| documentation:indexing [2016/07/27 09:39] – [Field Aliases] rjs7 | documentation:indexing [2022/02/10 13:34] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | FIXME - Add Information on Virtual Index Definitions in 3.1+ | ||
| + | |||
| ======= Bibliographic Indexing in Evergreen ======== | ======= Bibliographic Indexing in Evergreen ======== | ||
| Indexing and searching bibliographic data in Evergreen are complex processes. | Indexing and searching bibliographic data in Evergreen are complex processes. | ||
| Line 195: | Line 197: | ||
| Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | Normalizer functions are in-database stored procedures, and can be written in any programming language supported by Postgres. | ||
| - | Twelve | + | Twenty-one |
| ^ Name ^ Description ^ | ^ Name ^ Description ^ | ||
| + | |Approximate High Date Normalize|Normalize the value to the nearest date-ish value, rounding up| | ||
| + | |Approximate Low Date Normalize|Normalize the value to the nearest date-ish value, rounding down| | ||
| + | |Coded Value Map Normalizer|Applies coded_value_map mapping of values| | ||
| |Down-case|Convert text to lower case.| | |Down-case|Convert text to lower case.| | ||
| |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | ||
| |First word|Include only the first space-separated word of a string.| | |First word|Include only the first space-separated word of a string.| | ||
| + | |Generic Mapping Normalizer|Map values or sets of values to new values.| | ||
| |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | ||
| |Left truncation|Discard the specified number of characters from the left side of the string.| | |Left truncation|Discard the specified number of characters from the left side of the string.| | ||
| Line 206: | Line 212: | ||
| |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | ||
| |Normalize date range|Split date ranges in the form of " | |Normalize date range|Split date ranges in the form of " | ||
| - | |Replace|Replace all occurances | + | |Integer or NULL|Normalize the value to NULL if it is not a number| |
| + | |Replace|Replace all occurrences | ||
| + | |Remove Parenthesized Substring|Remove any parenthesized substrings from the extracted text, such as the agency code preceding authority record control numbers in subfield 0.| | ||
| |Right truncation|Include only the specified number of characters from the left side of the string.| | |Right truncation|Include only the specified number of characters from the left side of the string.| | ||
| + | |Search Normalize|Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.| | ||
| |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | ||
| + | |Trim Surrounding Space|Trim leading and trailing spaces from extracted text.| | ||
| + | |Trim Trailing Punctuation|Eliminate extraneous trailing commas and periods in text.| | ||
| |Up-case|Convert text to upper case.| | |Up-case|Convert text to upper case.| | ||
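
The normalizer table only names and describes each function. To make a few of those descriptions concrete, here is a minimal Python sketch of what Strip Diacritics, ISBN 10/13 conversion, Left truncation and Right truncation do to extracted text. It is an approximation written from the descriptions above, not Evergreen's code: the real normalizers are PostgreSQL stored procedures, and every function name below is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical name. Decompose to NFD, then drop non-spacing combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def _compact(isbn: str) -> str:
    # Hypothetical helper: remove hyphens and spaces, upper-case a trailing "x".
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_to_isbn13(isbn10: str) -> str:
    """Hypothetical name. Prefix 978, drop the old check digit, recompute it."""
    digits = _compact(isbn10)
    if len(digits) != 10:
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... modulo 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Hypothetical name. Only 978-prefixed ISBN-13s have an ISBN-10 form."""
    digits = _compact(isbn13)
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError(f"no ISBN-10 form for: {isbn13!r}")
    core = digits[3:12]
    # ISBN-10 check digit: weights 10 down to 2, modulo 11; a value of 10 is written as X.
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


def left_truncate(text: str, n: int) -> str:
    # "Left truncation": discard n characters from the left side.
    return text[n:]


def right_truncate(text: str, n: int) -> str:
    # "Right truncation": keep only the first n characters, counted from the left.
    return text[:n]


if __name__ == "__main__":
    print(strip_diacritics("Ångström"))            # Angstrom
    print(isbn10_to_isbn13("0-306-40615-2"))      # 9780306406157
    print(isbn13_to_isbn10("978-0-306-40615-7"))  # 0306406152
```

The sketch also shows why the Right truncation description mentions the left side of the string: truncating on the right keeps the leftmost characters.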
documentation/indexing.1469626776.txt.gz · Last modified: 2022/02/10 13:33 (external edit)