Discussion: Compression in Elasticsearch documents
a***@gmail.com
2015-04-14 16:47:59 UTC
I would like to know if Elasticsearch documents/indices are stored in a
compressed format on disk. If so, what compression options are available,
and what are their performance overheads?

Also, are these compression options configurable?

Thanks
Ajay
Adrien Grand
2015-04-14 17:13:18 UTC
Hi,

Data are both duplicated to suit different access patterns and compressed.
There are so many compression schemes in place that it would be hard to be
exhaustive, but for instance we have frame-of-reference compression for
postings lists, LZ4 for the document store, and bit packing for numeric doc
values, among others.

There are no configuration options available for compression besides
disabling features that you don't need (such as norms on fields that you
don't score on). In the next major version of Elasticsearch (2.0) there
will be a setting to enable heavier compression, though (which in practice
will use DEFLATE instead of LZ4 for the document store):
https://github.com/elastic/elasticsearch/pull/8863
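
For illustration, here is a minimal sketch of those two knobs using the
Python client. The index, type and field names are made up, and the
index.codec setting only exists from 2.0 on:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Example names only. "index.codec": "best_compression" requires the
# upcoming 2.0 release; disabling norms already works on 1.x.
es.indices.create(index="my_index", body={
    "settings": {
        # use DEFLATE instead of LZ4 for the document store (2.0+)
        "index.codec": "best_compression"
    },
    "mappings": {
        "my_type": {
            "properties": {
                "title": {
                    "type": "string",
                    # this field is not used for scoring, so skip norms
                    "norms": {"enabled": False}
                }
            }
        }
    }
})
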
--
Adrien
a***@gmail.com
2015-04-14 17:35:34 UTC
Hi Adrien,

Thanks for the quick response.

When I loaded nearly 45M documents of test data with 3 replicas (each
document roughly 2 KB or more in size), I got the following storage info:

health status index       pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_insert   5   3   44985382            0    414.9gb        106.4gb

This suggests there was hardly any compression on physical storage, hence
my question: how do I find/estimate how much storage would be used for X
documents with an average size of Y kilobytes each? From the above result,
there appears to be no compression at all on the stored data.
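
For reference, here is the rough arithmetic behind that impression (the
2 KB average is only approximate):

docs = 44985382
avg_doc_kb = 2.0                        # "approx 2K+ bytes" per document
raw_gb = docs * avg_doc_kb / 1024 ** 2  # ~86 GB of raw source
pri_store_gb = 106.4                    # pri.store.size from _cat/indices
per_copy_gb = 414.9 / (1 + 3)           # ~103.7 GB per copy (1 primary + 3 replicas)
print(raw_gb, pri_store_gb, per_copy_gb)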

Thanks

Ajay
Adrien Grand
2015-04-15 07:14:22 UTC
Compression ratios depend so much on the data that you can't really know
what the compression ratio will be without indexing sample documents.
However, once you have indexed enough documents (e.g. 100k), you can expect
the store size to keep growing linearly with the number of documents.
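
For example, here is a rough sketch of that extrapolation with the Python
client (using your index name; the sample should consist of real documents
so that the ratio is representative):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# After indexing a representative sample (e.g. 100k real documents), read
# back the primary store size and scale linearly to the target doc count.
stats = es.indices.stats(index="test_insert", metric="store,docs")
primaries = stats["indices"]["test_insert"]["primaries"]
sample_docs = primaries["docs"]["count"]
sample_bytes = primaries["store"]["size_in_bytes"]

target_docs = 45000000
estimated_gb = sample_bytes / float(sample_docs) * target_docs / 1024 ** 3
print("estimated primary store: %.1f GB (x4 with 3 replicas)" % estimated_gb)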

Most of the time the largest part of the index is the document store. In
your case I assume that LZ4 is too lightweight a compression algorithm to
compress your data efficiently. The high-compression option coming in
Elasticsearch 2.0 might help.
--
Adrien