How to correlate two time series with gaps and different time bases?


My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.

My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).

import numpy
from scipy.ndimage import gaussian_filter

"""
sig1 and sig2 are assumed to be large, 1D numpy arrays.
sig1 is sampled at times t1, sig2 is sampled at times t2.
t_start, t_end is your desired sampling interval.
t_len is your desired number of measurements.
"""
t = numpy.linspace(t_start, t_end, t_len)
# numpy.interp expects t1 and t2 to be increasing.
sig1 = numpy.interp(t, t1, sig1)
sig2 = numpy.interp(t, t2, sig2)
# Now sig1 and sig2 are sampled at the same points.

"""
Rectify and smooth, so 'peaks' will stand out.
This makes big assumptions about your data;
these assumptions seem true-ish based on your plots.
"""
sigma = 10  # Tune this parameter to get the right smoothing
sig1, sig2 = abs(sig1), abs(sig2)
sig1, sig2 = gaussian_filter(sig1, sigma), gaussian_filter(sig2, sigma)

"""
Now sig1 and sig2 should look smoothly varying, with humps at each 'event'.
Hopefully we can search a small range of shifts to find the maximum of the
cross-correlation. This assumes your data are *nearly* lined up already.
"""
max_xc = -numpy.inf
best_shift = 0
for shift in range(-10, 11):  # Tune this search range (symmetric: -10..10)
    # numpy.roll wraps around: samples pushed off one end reappear at the other.
    xc = (numpy.roll(sig1, shift) * sig2).sum()
    if xc > max_xc:
        max_xc = xc
        best_shift = shift
print('Best shift:', best_shift)

"""
If best_shift is at the edges of your search range,
you should expand the search range.
"""
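To try the approach without real data, here is a minimal, self-contained sketch that wraps the same steps in a function and runs it on synthetic signals. The function name estimate_shift, the helper bursts, and all the numbers (sampling density, event positions, the 0.5 time-unit delay) are made up for illustration; they are not from the question.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def estimate_shift(t1, sig1, t2, sig2, t, sigma=10, max_shift=10):
    """Return the shift (in samples of t) to apply to sig1 so it best matches sig2."""
    # Resample both signals onto the common, uniform time base t.
    a = np.interp(t, t1, sig1)
    b = np.interp(t, t2, sig2)
    # Rectify and smooth so each event becomes a single hump.
    a = gaussian_filter(np.abs(a), sigma)
    b = gaussian_filter(np.abs(b), sigma)
    shifts = np.arange(-max_shift, max_shift + 1)
    # np.roll wraps around the ends, which is only harmless for small shifts.
    xc = [np.sum(np.roll(a, s) * b) for s in shifts]
    return int(shifts[int(np.argmax(xc))])


# Synthetic data (hypothetical): two noisy bursts on two different,
# irregular time bases, with the second series delayed by 0.5 time units.
rng = np.random.default_rng(0)
t1 = np.sort(rng.uniform(0, 100, 4000))
t2 = np.sort(rng.uniform(0, 100, 3500))


def bursts(times, delay):
    """Gaussian envelopes around t=30 and t=70 (plus delay), filled with noise."""
    envelope = (np.exp(-0.5 * ((times - 30 - delay) / 2) ** 2)
                + np.exp(-0.5 * ((times - 70 - delay) / 2) ** 2))
    return envelope * rng.standard_normal(times.size)


sig1 = bursts(t1, 0.0)
sig2 = bursts(t2, 0.5)
t = np.linspace(0, 100, 1001)  # uniform grid, 0.1 time units per sample
best_shift = estimate_shift(t1, sig1, t2, sig2, t, sigma=10, max_shift=20)
print("Best shift:", best_shift, "samples =", best_shift * (t[1] - t[0]), "time units")
```

The true delay here corresponds to 5 grid samples, so the result should land near that, though the noise can move it by a few samples. If the answer is far off, sigma or the search range probably needs tuning.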


If the data contain gaps of unknown size that differ between the two time series, I would give up on correlating entire sequences and instead cross-correlate pairs of short windows from each series, say overlapping windows twice the length of a typical event (300 samples long). Find candidate matches with high cross-correlation across all window pairs, then impose a sequential ordering constraint on those candidates to get sequences of matched windows.

From there you have smaller problems that are easier to analyze.
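A rough sketch of the window-matching idea, assuming both series have already been resampled to a common rate and (ideally) rectified and smoothed. The function names, the 50% overlap, and the score threshold are illustrative assumptions, and the ordering step is one possible reading of "sequential ordering constraint": keep the highest-scoring chain of matches whose window positions increase in both series.

```python
import numpy as np


def normalized_xcorr_peak(a, b):
    """Peak of the normalized cross-correlation of two equal-length windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a flat window carries no information
    return float(np.max(np.correlate(a, b, mode="full")) / denom)


def candidate_matches(sig1, sig2, win=300, step=150, threshold=0.6):
    """Score every pair of windows; keep (start1, start2, score) above threshold.

    This is brute force (every window against every window), so for very
    long series you would want to prune pairs or use FFT-based correlation.
    """
    matches = []
    for i in range(0, len(sig1) - win + 1, step):
        for j in range(0, len(sig2) - win + 1, step):
            score = normalized_xcorr_peak(sig1[i:i + win], sig2[j:j + win])
            if score >= threshold:
                matches.append((i, j, score))
    return matches


def ordered_matches(cands):
    """Highest-scoring chain with start1 and start2 both strictly increasing."""
    cands = sorted(cands)
    if not cands:
        return []
    best = [c[2] for c in cands]   # best total score of a chain ending at k
    prev = [-1] * len(cands)       # predecessor of k in that chain
    for k in range(len(cands)):
        for m in range(k):
            in_order = cands[m][0] < cands[k][0] and cands[m][1] < cands[k][1]
            if in_order and best[m] + cands[k][2] > best[k]:
                best[k] = best[m] + cands[k][2]
                prev[k] = m
    k = int(np.argmax(best))
    chain = []
    while k != -1:
        chain.append(cands[k])
        k = prev[k]
    return chain[::-1]
```

Calling ordered_matches(candidate_matches(sig1, sig2)) gives a list of (start1, start2, score) triples. The difference start2 - start1 for each matched pair is a local estimate of the offset, and jumps in that difference hint at where the gaps are.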


This isn't a technical answer, but it might help you come up with one:

  • Convert the plot to an image, and open it in a decent image editor like GIMP or Photoshop
  • break the plots into discrete images whenever there's a gap
  • put the first series of plots in a horizontal line
  • put the second series in a horizontal line right underneath it
  • visually identify the first correlated event
  • if the two events are not lined up vertically:
    • select whichever instance is further to the left and everything to the right of it on that row
    • drag those things to the right until they line up

This is pretty much how an audio editor works, so if you converted the data into a simple audio format like an uncompressed WAV file, you could manipulate it directly in something like Audacity. (It'll sound horrible, of course, but you'll be able to move the data plots around pretty easily.)
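If you want to try the WAV route, something like the following would write a series out as 16-bit PCM using scipy.io.wavfile. This is only a sketch: the function name series_to_wav and the 8000 Hz rate are arbitrary choices (the rate only sets how fast the data scroll by in the editor), and the series is assumed to be resampled to a uniform time base first.

```python
import numpy as np
from scipy.io import wavfile


def series_to_wav(path, signal, rate=8000):
    """Write a 1D series as a 16-bit PCM WAV file, one audio sample per data point.

    Returns (offset, peak) so the original scale can be restored later:
    original ~= pcm / 32767 * peak + offset.
    """
    signal = np.asarray(signal, dtype=float)
    offset = float(np.nanmean(signal))
    centered = np.nan_to_num(signal - offset)  # NaN gaps become silence
    peak = float(np.max(np.abs(centered)))
    if peak > 0:
        centered = centered / peak             # scale into [-1, 1]
    pcm = np.round(centered * 32767).astype(np.int16)
    wavfile.write(path, rate, pcm)
    return offset, peak
```

Writing each series to its own file, for example series_to_wav("sig1.wav", sig1) and series_to_wav("sig2.wav", sig2), lets you import both as separate tracks and slide one against the other.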

Actually, Audacity has a scripting language called Nyquist, too, so if you don't need the program to detect the correlations (or you're at least willing to defer that step for the time being), you could probably use some combination of Audacity's markers and Nyquist to automate the alignment and export the clean data in your format of choice once you tag the correlation points.