Django database planning - time series data Django database planning - time series data database database

Django database planning - time series data


One thing that may help you with HUGE tables is partitioning. Judging by the postgresql tag that you set for your question, take a look here: http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

But for a start I would go with a one, simple table, supported by a reasonable set of indexes. From what I understand, each data entry in the table will be identified by race session id, player id and time indicator. Those columns should be covered with indexes according to your querying requirements.

As for your two questions:1) You store those informations as simple integers. Remember to set a proper data types for those columns. For e.g. if you are 100% sure that some values will be very small, you can use smallint data type. More on integer data types here: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html#DATATYPE-INT

2) That won't be a problem if you every var list will be different row in the table. You will be able to insert as many as you'd like.

So, to sum things up. I would start with a VERY simple single table schema. From django perspective this would look something like this:

class RaceTelemetryData(models.Model):  user = models.ForeignKey(..., index_db=True)  race = models.ForeignKey(YourRaceModel, index_db=True)  time = models.IntegerField()  gas = models.IntegerField()  speed = models.SmallIntegerField()  # and so on...

Additionaly, you should create an index (manually) for (user_id, race_id, time) columns, so looking up, data about one race session (and sorting it) would be quick.

In the future, if you'll find the performance of this single table too slow, you'll be able to experiment with additional indexes, or partitioning. PostgreSQL is quite flexible in modifying existing database structures, so you shouldn't have many problems with it.

If you decide to add a new variable to the collection, you will simply need to add a new column to the table.

EDIT:

In the end you end up with one table, that has at least these columns:user_id - To specify which users data this row is about.race_id - To specify which race data this row is about.time - To identify the correct order in which to represent the data.

This way, when you want to get information on Joe's 5th race, you would look up rows that have user_id = 'Joe_ID' and race_id = 5, then sort all those rows by the time column.