Logstash: jdbc_page_size doesn't dump all my data to Elasticsearch


The reason you are seeing this is an ordering problem: your query does not control the order in which rows are returned, and in general PostgreSQL does not guarantee that consecutive, unordered paging calls return disjoint sets of rows. The result is that some rows are never fetched at all, and some rows are fetched multiple times :( Even when the data is not modified between these calls, the background vacuum worker may change how the data is laid out in the physical file, and so reproduce the situation described.

One option is to add an ordering to your statement, SELECT * FROM tom_test2 ORDER BY id, and page through your data. But be aware: in this case your upload to Elasticsearch is not guaranteed to be an exact replica of the table at a single moment in time. Each page is a separate query, so rows in an upcoming page can be updated while Logstash is still processing an earlier one. For example, while you are uploading rows 1 to 10000, an update may hit rows 10001 to 20000, and later the reverse may happen, so the uploaded data can be inconsistent.
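A minimal sketch of the ordered, paged variant as a Logstash jdbc input. The driver path, connection string, and credentials are placeholders, and it assumes id is a unique column in tom_test2 (without a unique sort key the page boundaries are still not fully deterministic). The page size of 10000 only mirrors the example above.

```
input {
  jdbc {
    # Placeholder connection settings: replace with your own.
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "logstash"
    jdbc_password => "changeme"

    # ORDER BY makes every page a slice of the same, stable sequence.
    statement => "SELECT * FROM tom_test2 ORDER BY id"

    # Paging splits the statement into several LIMIT/OFFSET queries.
    jdbc_paging_enabled => true
    jdbc_page_size => 10000
  }
}
```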

The other option, if you want to fetch all the data in one go and can afford generous memory use on Logstash, is to control the jdbc_fetch_size parameter instead and keep the same SELECT * FROM tom_test2. With this approach you create a single query result set but "pump" it in pieces, and data modifications during the "pumping" will not affect you: you fetch the state as of the moment the query started.
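A minimal sketch of the single-query variant, again with placeholder connection settings. The fetch size of 10000 is an arbitrary illustrative value, and how the fetch size is honored depends on the JDBC driver, so treat this as a starting point rather than a tuned configuration.

```
input {
  jdbc {
    # Placeholder connection settings: replace with your own.
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "logstash"
    jdbc_password => "changeme"

    # One query, one result set; no paging, so no ORDER BY is required.
    statement => "SELECT * FROM tom_test2"

    # Hint to the driver for how many rows to pull per round trip.
    jdbc_fetch_size => 10000
  }
}
```

jdbc_paging_enabled is left at its default (false) here, so the statement is sent to the database once instead of being split into pages.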


This happens because ordering is not guaranteed between the queries issued when paging with jdbc_page_size, as the documentation of jdbc_paging_enabled warns.

I recommend using jdbc_fetch_size instead of jdbc_page_size; the documentation also suggests it for large result sets.

P.S.: sometimes ;) questions like this are better answered by the Elastic maintainers at http://discuss.elastic.co