Introduction to Elasticsearch search highlight configuration

Little scum · Posted on 2/14/2021 11:43:47 AM

Highlighters
When we use a search tool in daily life, the parts of each returned result that match our query are often marked in a special color. This is result highlighting. Highlighting lets users see clearly where the query matched.

ES uses highlight to highlight one or more fields in search results.
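As a minimal sketch of what such a request looks like, the snippet below builds a search body with a `highlight` section for a single field. The field name `content` and the search text are made up for illustration, and the helper `build_highlight_search` is not part of any client library; it only assembles the JSON body you would send to the `_search` endpoint.

```python
import json


def build_highlight_search(field: str, text: str) -> dict:
    """Build a search body that matches `text` in `field` and highlights that field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            # An empty object means "use the default highlight settings" for this field.
            "fields": {field: {}}
        },
    }


# "content" is a hypothetical field name used only for this example.
body = build_highlight_search("content", "elasticsearch")
print(json.dumps(body, indent=2))
```

In the response, each hit that matched would carry a `highlight` object next to `_source`, keyed by field name.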

.NET/C# Use Elasticsearch debugging to view request and response information
https://www.itsvse.com/thread-9561-1-1.html

Highlight parameters

Parameter	Description
boundary_chars	A string containing each boundary character. The default is .,!? \t\n.
boundary_max_scan	How far to scan for boundary characters. The default is 20.
boundary_scanner	Specifies how to break up the highlighted fragments. Takes one of three values: chars, sentence, or word.
boundary_scanner_locale	The locale used when searching for sentence and word boundaries. This parameter takes the form of a language tag, for example "en-US", "fr-FR", or "ja-JP".
encoder	Indicates whether the snippet should be HTML encoded: default (no encoding) or html (HTML-escape the snippet text, then insert the highlight tags).
fields	Specifies the fields to retrieve highlights for. Fields can be specified using wildcards. For example, you can specify comment_* to get highlights for all text and keyword fields that start with comment_.
force_source	Highlight based on the source even if the field is stored separately. The default value is false.
fragmenter	Specifies how the text should be broken up into highlighted fragments: simple or span.
fragment_offset	Controls the margin from which you want to start highlighting. Only valid when using the fvh highlighter.
fragment_size	The size of a highlighted fragment, in characters. The default is 100.
highlight_query	Highlight matches for a query other than the search query. This is especially useful when using a rescore query, because rescore queries are not taken into account by highlighting by default.
matched_fields	Combine matches on multiple fields to highlight a single field. This is most useful for multi-fields that analyze the same string in different ways. All matched_fields must have term_vector set to with_positions_offsets, but only the field into which the matches are combined is loaded, so only that field benefits from having store set to yes. Only valid for the fvh highlighter.
no_match_size	If there is no matching fragment to highlight, the amount of text you want to return from the beginning of the field. The default is 0 (returns nothing).
number_of_fragments	The maximum number of fragments to return. If set to 0, no fragments are returned; instead, the entire field content is highlighted and returned. This is convenient when you need to highlight short text, such as a title or address, that does not need to be split into fragments. If number_of_fragments is 0, fragment_size is ignored. The default is 5.
order	When set to score, the highlighted fragments are sorted by score, so the most relevant fragments are output first. By default, fragments are output in the order in which they appear in the field (order: none). Each highlighter applies its own logic to calculate the relevance score.
phrase_limit	Controls the number of matching phrases in a document that are considered. This prevents the fvh highlighter from analyzing too many phrases and consuming too much memory. Raising the limit increases query time and memory consumption. The default is 256.
pre_tags	Used with post_tags to define the HTML tags that wrap the highlighted text. By default, highlighted text is wrapped in <em> and </em> tags. Specified as an array of strings.
post_tags	Used with pre_tags to define the HTML tags that wrap the highlighted text. By default, highlighted text is wrapped in <em> and </em> tags. Specified as an array of strings.
require_field_match	By default, only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields. The default value is true.
tags_schema	Set to styled to use the built-in tag schema.
type	The highlighter to use: unified, plain, or fvh. The default is unified.
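To show how several of these parameters fit together, here is a rough sketch of a `highlight` section built as a Python dict. The field names `title` and `comment_*` and the chosen values are assumptions for illustration only; the parameter names themselves come from the table above.

```python
import json


def build_highlight_options() -> dict:
    """Assemble a highlight section that combines global and per-field settings."""
    return {
        # Global settings: apply to every field unless overridden per field.
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"],
        "order": "score",  # most relevant fragments first
        "fields": {
            # Short field: return the whole value highlighted, no fragmenting.
            "title": {"number_of_fragments": 0},
            # Wildcard: every text/keyword field whose name starts with comment_.
            "comment_*": {
                "fragment_size": 150,
                "number_of_fragments": 3,
                # Return the first 150 characters when nothing matches.
                "no_match_size": 150,
            },
        },
    }


opts = build_highlight_options()
print(json.dumps({"highlight": opts}, indent=2))
```

Settings placed at the top level of `highlight` act as defaults, while settings inside a field entry apply to that field only.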

Elasticsearch supports three highlighters: unified, plain, and fvh (fast vector highlighter). The default is unified. You can specify the type of highlighter to use for each field.
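Choosing a highlighter per field could look roughly like the sketch below. The helper `highlight_with_types` and the field names are hypothetical; the only thing taken from the text is that `type` accepts unified, plain, or fvh.

```python
from typing import Dict

ALLOWED_TYPES = {"unified", "plain", "fvh"}


def highlight_with_types(field_types: Dict[str, str]) -> dict:
    """Build a highlight section that sets the highlighter type for each field."""
    for field, hl_type in field_types.items():
        if hl_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown highlighter type for {field!r}: {hl_type!r}")
    return {"fields": {field: {"type": t} for field, t in field_types.items()}}


# "title" and "body" are made-up field names for this example.
per_field = highlight_with_types({"title": "plain", "body": "unified"})
print(per_field)
```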

(1) Unified highlighter
The unified highlighter uses the Lucene Unified Highlighter. It breaks the text down into sentences and uses the BM25 algorithm to score individual sentences as if they were documents in a corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter.

(2) Plain highlighter
The plain highlighter uses the standard Lucene highlighter. It attempts to reflect the query matching logic by taking into account word importance and any word positioning criteria in phrase queries.

(3) FVH highlighter
The fvh highlighter uses the Lucene Fast Vector Highlighter. It can be used on fields that have term_vector set to with_positions_offsets in the mapping.
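A minimal sketch of the two pieces the fvh highlighter needs: a mapping that stores term vectors with positions and offsets, and a highlight section that asks for `fvh` on that field. The field name `comment` and both helper functions are assumptions for illustration; the first body would be sent when creating the index, the second as part of a search request.

```python
import json


def fvh_mapping(field: str) -> dict:
    """Index-creation body: a text field with the term vectors fvh requires."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "term_vector": "with_positions_offsets"}
            }
        }
    }


def fvh_highlight(field: str, phrase_limit: int = 256) -> dict:
    """Highlight section that selects the fvh highlighter for one field."""
    return {
        "phrase_limit": phrase_limit,  # 256 is the default noted in the table
        "fields": {field: {"type": "fvh"}},
    }


mapping = fvh_mapping("comment")
highlight = fvh_highlight("comment")
print(json.dumps(mapping, indent=2))
print(json.dumps(highlight, indent=2))
```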

