documentation:indexing
Differences
This shows you the differences between two versions of the page.
documentation:indexing [2010/06/15 16:14] – created miker | documentation:indexing [2018/04/25 13:20] – klussier | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ======= Indexing in Evergreen ======== | + | FIXME - Add Information on Virtual Index Definitions in 3.1+ |
+ | |||
+ | ======= | ||
Indexing and searching bibliographic data in Evergreen are complex processes. | Indexing and searching bibliographic data in Evergreen are complex processes. | ||
Line 100: | Line 102: | ||
</ | </ | ||
- | Each indexed datum coming from a bibliographic record in Evergreen is extracted based on an Indexed Field Definition. | + | Each indexed datum coming from a bibliographic record in Evergreen is extracted based on an Indexed Field Definition. |
==== Field Class ==== | ==== Field Class ==== | ||
Line 111: | Line 113: | ||
Indexed data from different fields will probably be considered to have different importance when calculating the relevance of a matched query term. For instance, a match in a translated title may be considered less important than a match in the title proper. | Indexed data from different fields will probably be considered to have different importance when calculating the relevance of a matched query term. For instance, a match in a translated title may be considered less important than a match in the title proper. | ||
- | By supplying a higher or lower relative **weight**, one field can be made more or less important, in relevance ranking terms, than others. | + | By supplying a higher or lower relative **weight**, one field can be made more or less important, in relevance ranking terms, than others. |
Evergreen ships with all Indexed Field Definition weights set to 1 by default. | Evergreen ships with all Indexed Field Definition weights set to 1 by default. | ||
Line 163: | Line 165: | ||
First, these aliases provide a mechanism for internationalizing the user-supplied search constraints; | First, these aliases provide a mechanism for internationalizing the user-supplied search constraints; | ||
- | In a similar manner, aliases can be used to map CQL context set match points, which have standard names external to any specific search backend, to appropriate match points in any given Evergreen installation. | + | In a similar manner, aliases can be used to map [[http:// |
Line 191: | Line 193: | ||
</ | </ | ||
- | Data extracted from a bibliographic record, for indexing purposes, will normally require some normalization. | + | Data extracted from a bibliographic record, for indexing purposes, will normally require some normalization. | ||
- | Normalizer functions are in-database | + | Normalizer functions are in-database |
- | Twelve | + | Twenty-one |
^ Name ^ Description ^ | ^ Name ^ Description ^ | ||
+ | |Approximate High Date Normalize|Normalize the value to the nearest date-ish value, rounding up| | ||
+ | |Approximate Low Date Normalize|Normalize the value to the nearest date-ish value, rounding down| | ||
+ | |Coded Value Map Normalizer|Applies coded_value_map mapping of values| | ||
|Down-case|Convert text to lower case.| | |Down-case|Convert text to lower case.| | ||
|Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | |Extract Dewey-like number|Extract a string of numeric characters that resembles a DDC number.| | ||
|First word|Include only the first space-separated word of a string.| | |First word|Include only the first space-separated word of a string.| | ||
+ | |Generic Mapping Normalizer|Map values or sets of values to new values.| | ||
|ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | |ISBN 10/13 conversion|Translate ISBN10 to ISBN13, and vice versa, for indexing purposes.| | ||
|Left truncation|Discard the specified number of characters from the left side of the string.| | |Left truncation|Discard the specified number of characters from the left side of the string.| | ||
- | |NACO Normalize|Apply NACO normalization rules to the extracted text. See http:// | + | |NACO Normalize|Apply NACO normalization rules to the extracted text. See https:// |
- | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. | + | |NACO Normalize -- retain first comma|Apply NACO normalization rules to the extracted text, retaining the first comma. |
|Normalize date range|Split date ranges in the form of " | |Normalize date range|Split date ranges in the form of " | ||
- | |Replace|Replace all occurances | + | |Normalize date range|Normalize the value to NULL if it is not a number| |
+ | |Replace|Replace all occurrences | ||
+ | |Remove Parenthesized Substring|Remove any parenthesized substrings from the extracted text, such as the agency code preceding authority record control numbers in subfield 0.| | ||
|Right truncation|Include only the specified number of characters from the left side of the string.| | |Right truncation|Include only the specified number of characters from the left side of the string.| | ||
+ | |Search Normalize|Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.| | ||
|Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | |Strip Diacritics|Convert text to NFD form and remove non-spacing combining marks.| | ||
+ | |Trim Surrounding Space|Trim leading and trailing spaces from extracted text.| | ||
+ | |Trim Trailing Punctuation|Eliminate extraneous trailing commas and periods in text.| | ||
|Up-case|Convert text to upper case.| | |Up-case|Convert text to upper case.| | ||
Line 260: | Line 271: | ||
===== Search-oriented Index Definition Example ===== | ===== Search-oriented Index Definition Example ===== | ||
[[search_idx_def_example|Adding a Local Subjects (690) search index]] | [[search_idx_def_example|Adding a Local Subjects (690) search index]] | ||
- | + | ===== Facet-oriented Index Definition Example ===== | |
- | + | ||
- | ===== Search-oriented Index Definition Example ===== | + | |
[[facet_idx_def_example|Adding a Material Type (947$t) facet index]] | [[facet_idx_def_example|Adding a Material Type (947$t) facet index]] | ||
- | |||
===== Query Parser ===== | ===== Query Parser ===== | ||
- | [[technical: | + | [[documentation: |
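
Several of the normalizers listed in the table earlier on this page are simple text transformations, and a short sketch may make their chained effect easier to picture. The Python below is only an illustration written from the table's one-line descriptions. Evergreen's real normalizers run inside the database, and every function name here (''down_case'', ''strip_diacritics'', ''apply_normalizers'', and so on) is hypothetical rather than an Evergreen identifier. The trailing-punctuation rule in particular is a deliberate simplification.

```python
import unicodedata


def down_case(text):
    """Down-case: convert text to lower case."""
    return text.lower()


def strip_diacritics(text):
    """Strip Diacritics: convert to NFD and drop non-spacing combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def first_word(text):
    """First word: keep only the first space-separated word."""
    return text.strip().split(" ", 1)[0]


def left_truncation(text, count):
    """Left truncation: discard `count` characters from the left side."""
    return text[count:]


def right_truncation(text, count):
    """Right truncation: keep only `count` characters from the left side."""
    return text[:count]


def trim_surrounding_space(text):
    """Trim Surrounding Space: remove leading and trailing whitespace."""
    return text.strip()


def trim_trailing_punctuation(text):
    """Trim Trailing Punctuation (simplified assumption): remove every
    trailing comma, period, and space, with no special cases."""
    return text.rstrip(" ,.")


def apply_normalizers(value, chain):
    """Apply each normalizer in `chain` to `value`, in order."""
    for normalizer in chain:
        value = normalizer(value)
    return value


if __name__ == "__main__":
    chain = [
        trim_surrounding_space,
        trim_trailing_punctuation,
        strip_diacritics,
        down_case,
    ]
    print(apply_normalizers("  Café Society, . ", chain))
```

Run as a script, this prints ''cafe society''. Order matters in such a chain: here the surrounding spaces are trimmed before the trailing punctuation so that the comma and period are actually at the end of the string when they are removed.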
documentation/indexing.txt · Last modified: 2022/02/10 13:34 by 127.0.0.1