Discussion:
Stemming acronyms ending in "s"; keyword marker token filter; minimal english stemmer
Loren
2014-01-22 16:10:24 UTC
Using the minimal_english stemmer
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html>,
acronym tokens like "irs" and "nps" get stemmed to "ir" and "np". I can use
the keyword marker token filter
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-marker-tokenfilter.html>
to specify a list of acronyms to protect, but I do not know them all in
advance so I will be constantly tweaking the list and reindexing.
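
For reference, here is a minimal sketch of the explicit-list approach described above, written as a Python function that builds the index settings. The filter and analyzer names (protect_acronyms, minimal_stem, acronym_safe_english) are made up for illustration, and the "language" key on the stemmer filter is an assumption; some versions document it as "name". The one firm requirement is that the keyword marker comes before the stemmer in the filter chain.

```python
import json

# Illustrative list only: the two acronyms mentioned in this post.
PROTECTED_ACRONYMS = ["irs", "nps"]


def build_analysis_settings(protected):
    """Build index settings that mark `protected` tokens as keywords
    before the stemmer runs, so the stemmer leaves them untouched."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; "keyword_marker" is the filter type.
                    "protect_acronyms": {
                        "type": "keyword_marker",
                        "keywords": sorted(protected),
                    },
                    # Hypothetical filter name; the "language" key is an
                    # assumption and may need to be "name" on some versions.
                    "minimal_stem": {
                        "type": "stemmer",
                        "language": "minimal_english",
                    },
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "acronym_safe_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Order matters: mark keywords first, stem afterwards.
                        "filter": ["lowercase", "protect_acronyms", "minimal_stem"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_analysis_settings(PROTECTED_ACRONYMS), indent=2))
```

Every change to the list means new settings and a reindex, which is the maintenance cost mentioned above.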

Ideally, I would like to be able to either tell the keyword marker to
protect tokens 1-4 characters in length, or tell the minimal english
stemmer to ignore tokens shorter than 5 characters.

Is either of those options possible?
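
To make the requested behaviour concrete, here is a small Python simulation of a length-based guard. It is only a sketch: naive_plural_stem is a simplified stand-in that strips one trailing "s" and is not the actual minimal_english algorithm, and both function names and the min_length parameter are hypothetical.

```python
def naive_plural_stem(token):
    """Simplified stand-in for a plural stemmer: strip one trailing 's'.
    This is NOT the real minimal_english algorithm, only an illustration."""
    if len(token) >= 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def stem_with_length_guard(tokens, min_length=5):
    """Stem only tokens of at least `min_length` characters;
    shorter tokens (e.g. 1-4 character acronyms) pass through unchanged."""
    return [t if len(t) < min_length else naive_plural_stem(t) for t in tokens]


if __name__ == "__main__":
    tokens = ["irs", "nps", "books", "glass"]
    print([naive_plural_stem(t) for t in tokens])  # without the guard
    print(stem_with_length_guard(tokens))          # with the guard
```

The trade-off is that short ordinary plurals such as "cats" would also be left unstemmed under this rule.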
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e385b457-6eed-4a98-975d-9cf19375c39f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Adrien Grand
2014-01-23 21:51:46 UTC
I think it would be nice to support protecting tokens based on their
length. Maybe you can open an issue about it?
Loren
2014-01-23 22:01:06 UTC
Done!

https://github.com/elasticsearch/elasticsearch/issues/4877