Decompress and read Dukascopy .bi5 tick files

python csv pandas binary lzma

The code below should do the trick. First it opens the file with lzma, which decompresses it, and then it uses struct to unpack the binary data one record at a time.

    import lzma
    import struct
    import pandas as pd

    def bi5_to_df(filename, fmt):
        chunk_size = struct.calcsize(fmt)
        data = []
        with lzma.open(filename) as f:
            while True:
                chunk = f.read(chunk_size)
                if chunk:
                    data.append(struct.unpack(fmt, chunk))
                else:
                    break
        df = pd.DataFrame(data)
        return df
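If you want to try the read loop without a real tick file, here is a minimal sketch that writes a tiny synthetic file and reads it back. The helper name `read_records`, the file name and the sample values are all made up for illustration, and it uses `struct.iter_unpack` on the whole payload instead of reading chunk by chunk.

```python
import lzma
import os
import struct
import tempfile

FMT = ">3I2f"  # format string used in this answer; 20 bytes per record


def read_records(filename, fmt=FMT):
    """Hypothetical helper: return every record of an LZMA-compressed file as a tuple."""
    with lzma.open(filename) as f:
        payload = f.read()  # decompress the whole file at once
    # iter_unpack steps through the buffer in calcsize(fmt)-sized pieces
    return list(struct.iter_unpack(fmt, payload))


# Made-up records; the float values are chosen to be exact in 32 bits.
sample = [(210, 110218, 110216, 1.5, 2.25), (362, 110219, 110216, 1.0, 0.75)]

# Write the synthetic file. lzma.open auto-detects the container when reading.
path = os.path.join(tempfile.mkdtemp(), "sample.bi5")
with lzma.open(path, "wb") as f:
    for row in sample:
        f.write(struct.pack(FMT, *row))

records = read_records(path)
print(records)
```

Note that `iter_unpack` raises `struct.error` if the payload length is not a multiple of the record size, so a truncated file fails loudly instead of silently dropping a partial record.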

The most important thing is to know the right format. I googled around and tried to guess, and '>3i2f' (or '>3I2f') works quite well. (It's big-endian: 3 ints followed by 2 floats. What you suggest, 'i4f', doesn't produce sensible floats, regardless of whether it's read as big- or little-endian.) For struct and format syntax see the docs.

    df = bi5_to_df('13h_ticks.bi5', '>3i2f')
    df.head()
    Out[177]:
          0       1       2     3     4
    0   210  110218  110216  1.87  1.12
    1   362  110219  110216  1.00  5.85
    2   875  110220  110217  1.00  1.12
    3  1408  110220  110218  1.50  1.00
    4  1884  110221  110219  3.94  1.00
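To make the format string a bit more concrete, here is a small sketch of what '>3I2f' means to struct. The record values are made up; the point is only the record size and the effect of the byte-order prefix.

```python
import struct

FMT = ">3I2f"  # ">" = big-endian, "3I" = three unsigned 32-bit ints, "2f" = two 32-bit floats

# With an explicit byte-order prefix struct uses standard sizes and no padding.
record_size = struct.calcsize(FMT)
print(record_size)  # 20

# Pack one made-up record and unpack it again.
packed = struct.pack(FMT, 210, 110218, 110216, 1.5, 2.25)
unpacked = struct.unpack(FMT, packed)
print(unpacked)  # (210, 110218, 110216, 1.5, 2.25)

# Reading the same bytes as little-endian gives different (nonsense) integers.
wrong_order = struct.unpack("<3I2f", packed)
print(wrong_order[:3])
```

This is also a quick way to test a guessed format: if the integers or floats look like garbage, the byte order or the field types are probably wrong.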

Update

To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy, I compiled and ran test_read_bi5 from there. The first lines of the output are:

    time, bid, bid_vol, ask, ask_vol
    2012-Dec-03 01:00:03.581000, 131.945, 1.5, 131.966, 1.5
    2012-Dec-03 01:00:05.142000, 131.943, 1.5, 131.964, 1.5
    2012-Dec-03 01:00:05.202000, 131.943, 1.5, 131.964, 2.25
    2012-Dec-03 01:00:05.321000, 131.944, 1.5, 131.964, 1.5
    2012-Dec-03 01:00:05.441000, 131.944, 1.5, 131.964, 1.5

And bi5_to_df on the same input file gives:

    bi5_to_df('01h_ticks.bi5', '>3I2f').head()
    Out[295]:
          0       1       2     3    4
    0  3581  131966  131945  1.50  1.5
    1  5142  131964  131943  1.50  1.5
    2  5202  131964  131943  2.25  1.5
    3  5321  131964  131944  1.50  1.5
    4  5441  131964  131944  1.50  1.5

So everything seems to be fine (ninety47's code reorders columns).

Also, it's probably more accurate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
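Judging from the comparison above, the raw columns appear to be: milliseconds since the start of the hour, ask, bid, ask volume, bid volume. Here is a rough sketch of turning the raw frame into something like the ninety47 output. The helper `label_ticks`, the column names and the `price_divisor` parameter are my own; the divisor of 1000 is only inferred from this one file (131945 vs. 131.945) and will differ for other instruments.

```python
import pandas as pd


def label_ticks(raw, hour_start, price_divisor):
    """Hypothetical helper: name and scale the columns of a raw '>3I2f' frame."""
    ticks = raw.copy()
    ticks.columns = ["ms", "ask", "bid", "ask_vol", "bid_vol"]
    # Column 0 is the offset in milliseconds from the start of the hour.
    ticks["time"] = pd.Timestamp(hour_start) + pd.to_timedelta(ticks["ms"], unit="ms")
    # Prices are stored as integers; the divisor is an assumption per instrument.
    ticks["ask"] = ticks["ask"] / price_divisor
    ticks["bid"] = ticks["bid"] / price_divisor
    # Same column order as the ninety47 output.
    return ticks[["time", "bid", "bid_vol", "ask", "ask_vol"]]


# Three rows copied from the bi5_to_df output for 01h_ticks.bi5 above.
raw = pd.DataFrame([
    (3581, 131966, 131945, 1.50, 1.5),
    (5142, 131964, 131943, 1.50, 1.5),
    (5202, 131964, 131943, 2.25, 1.5),
])
ticks = label_ticks(raw, "2012-12-03 01:00:00", 1000)
print(ticks)
```

The date and hour are not stored in the file itself, so `hour_start` has to come from wherever you got the file (its name or download path).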


    import requests
    import struct
    from lzma import LZMADecompressor, FORMAT_AUTO

    # download the compressed EURUSD 2020/06/15/10h_ticks.bi5 file
    res = requests.get("https://www.dukascopy.com/datafeed/EURUSD/2020/06/15/10h_ticks.bi5", stream=True)
    print(res.headers.get('content-type'))

    rawdata = res.content
    decomp = LZMADecompressor(FORMAT_AUTO, None, None)
    decompresseddata = decomp.decompress(rawdata)

    # each record is 20 bytes: 3 unsigned ints + 2 floats, network (big-endian) order
    firstrow = struct.unpack('!IIIff', decompresseddata[0: 20])
    print("firstrow:", firstrow)
    # firstrow: (436, 114271, 114268, 0.9399999976158142, 0.75)
    # time = 2020/06/15/10h + (1 month, the URL month is zero-based) + 436 milliseconds

    secondrow = struct.unpack('!IIIff', decompresseddata[20: 40])
    print("secondrow:", secondrow)
    # secondrow: (537, 114271, 114267, 4.309999942779541, 2.25)
    # time = 2020/06/15/10h + (1 month, the URL month is zero-based) + 537 milliseconds
    # ask = 114271 / 100000 = 1.14271
    # bid = 114267 / 100000 = 1.14267
    # askvolume = 4.31
    # bidvolume = 2.25

    # note that month 00 is January
    # "https://www.dukascopy.com/datafeed/EURUSD/2020/00/15/10h_ticks.bi5" for January
    # "https://www.dukascopy.com/datafeed/EURUSD/2020/01/15/10h_ticks.bi5" for February

    # iterating over all records
    print(len(decompresseddata), int(len(decompresseddata) / 20))
    for i in range(0, int(len(decompresseddata) / 20)):
        print(struct.unpack('!IIIff', decompresseddata[i * 20: (i + 1) * 20]))
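Since the zero-based month is easy to get wrong, here is a small sketch of building the URL from a datetime. The helper name `bi5_url` is hypothetical, and I am assuming the day and hour go into the path unchanged, which matches the example URLs in this answer but is not something I have checked beyond that.

```python
from datetime import datetime

BASE_URL = "https://www.dukascopy.com/datafeed"


def bi5_url(symbol, hour):
    """Hypothetical helper: build the tick-file URL for one instrument and one hour."""
    # The month in the path is zero-based (00 = January), unlike datetime.month.
    # Assumption: day and hour are used as-is, zero-padded to two digits.
    return (
        f"{BASE_URL}/{symbol}/{hour.year:04d}/{hour.month - 1:02d}/"
        f"{hour.day:02d}/{hour.hour:02d}h_ticks.bi5"
    )


january_url = bi5_url("EURUSD", datetime(2020, 1, 15, 10))
print(january_url)
# https://www.dukascopy.com/datafeed/EURUSD/2020/00/15/10h_ticks.bi5
```

With this, the file requested as 2020/06/15 in the code above corresponds to `datetime(2020, 7, 15, 10)`.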


Did you try using numpy to parse the data before transferring it to pandas? It may be the long way around, but it will allow you to manipulate and clean the data before you do the analysis in pandas, and the integration between the two is pretty straightforward.
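A minimal sketch of what that could look like, assuming the 20-byte big-endian record layout ('>3I2f') reported in the other answers. The field names and the helper `parse_ticks` are my own labels, and the two records are packed by hand here instead of being read from a real file.

```python
import struct

import numpy as np

# Structured dtype mirroring '>3I2f'; the field names are assumptions.
TICK_DTYPE = np.dtype([
    ("ms", ">u4"), ("ask", ">u4"), ("bid", ">u4"),
    ("ask_vol", ">f4"), ("bid_vol", ">f4"),
])


def parse_ticks(payload):
    """Hypothetical helper: interpret decompressed bytes as a structured array."""
    ticks = np.frombuffer(payload, dtype=TICK_DTYPE)
    # Copy into native byte order so downstream code gets an ordinary array.
    return ticks.astype(TICK_DTYPE.newbyteorder("="))


# Two hand-packed records standing in for decompressed file contents.
payload = (
    struct.pack(">3I2f", 436, 114271, 114268, 1.5, 0.75)
    + struct.pack(">3I2f", 537, 114271, 114267, 4.5, 2.25)
)
ticks = parse_ticks(payload)

# Cleaning and filtering can happen here, before pandas is involved.
large = ticks[ticks["ask_vol"] > 2.0]
print(ticks["ms"], large["bid"])
```

The resulting array can then be handed to `pd.DataFrame(ticks)`, which should pick up the field names as column names.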
