
Improve performance of PostgreSQL array queries


You store your data in a structured data management storage container (i.e. PostgreSQL), but due to the nature of your data (i.e. large but irregularly sized collections of like data) you actually store your data outside of that container. As you have noticed, PostgreSQL is not good at retrieving data from large arrays of irregular and unpredictable size; the fact that the arrays are stored externally is already testament to the fact that your requirements are not aligned with where PostgreSQL excels. It is very likely that there are much better solutions than PostgreSQL for storing and reading your arrays. The fact that the results from analyzing the arrays through prediction models are stored in tables in a PostgreSQL database hints at a hybrid solution: store your data in some form that allows efficient access in the patterns that you need, then store the results in PostgreSQL for further processing.
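As a minimal sketch of that hybrid layout (assuming Python with psycopg2; the database name, table, and columns are hypothetical, not taken from your setup), the arrays could live on disk as one `.npy` file each, keyed by an identifier, while PostgreSQL holds only a compact results table:

```python
import psycopg2  # standard PostgreSQL driver for Python

# Hypothetical hybrid layout: one .npy file per array on disk,
# keyed by array_id, while PostgreSQL stores only the model output.
conn = psycopg2.connect("dbname=predictions user=postgres")
with conn, conn.cursor() as cur:  # commits the transaction on success
    cur.execute("""
        CREATE TABLE IF NOT EXISTS prediction_results (
            array_id   text PRIMARY KEY,  -- matches the .npy file name
            model_name text NOT NULL,
            score      double precision,
            created_at timestamptz DEFAULT now()
        )
    """)
conn.close()
```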

Since you do not provide any details on the prediction models, it is impossible to be specific in this answer, but I hope this will help you on your way.

If your prediction models are written in some language for which a PostgreSQL driver is available, then store your data in some format that is suited to that language, do your predictions, and write the results to a table in PostgreSQL. This works for C and C++ with libpq (or libpqxx), and for Java, C#, Python, etc. through high-level drivers such as JDBC, Npgsql, and psycopg2, respectively.
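To make that concrete, here is a short Python sketch of the whole loop; the `predict` function, the `arrays/` directory, and the `prediction_results` table from the sketch above are all assumptions for illustration:

```python
from pathlib import Path

import numpy as np
import psycopg2

def predict(arr: np.ndarray) -> float:
    """Stand-in for the real prediction model (hypothetical)."""
    return float(arr.mean())

conn = psycopg2.connect("dbname=predictions user=postgres")
with conn, conn.cursor() as cur:
    # Read each array in a format that is efficient for the language...
    for path in Path("arrays").glob("*.npy"):
        arr = np.load(path)        # fast, memory-mappable binary format
        score = predict(arr)       # run the model outside PostgreSQL
        # ...and store only the compact result in PostgreSQL.
        cur.execute(
            "INSERT INTO prediction_results (array_id, model_name, score) "
            "VALUES (%s, %s, %s) "
            "ON CONFLICT (array_id) DO UPDATE SET score = EXCLUDED.score",
            (path.stem, "mean-model", score),
        )
conn.close()
```

The point of this shape is that PostgreSQL never sees the raw arrays; it only receives the small, regular rows it is good at storing and querying.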

If your prediction model is written in MATLAB, then store your arrays in a MATLAB format (e.g. .mat files) and connect to PostgreSQL only for the results. If it is written in R, you can use the PL/R extension, which runs R inside PostgreSQL, or an R driver such as RPostgreSQL.

The key here is that you should store the arrays in a form that allows for efficient use by your prediction models. Match your data storage to the prediction models, not the other way around.