
Python baseline correction library


I found an answer to my question, just sharing it for everyone who stumbles upon this.

There is an algorithm called "Asymmetric Least Squares Smoothing" by P. Eilers and H. Boelens (2005). The paper is freely available and you can find it on Google.
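For reference, the objective the algorithm minimizes (paraphrasing the paper) is an asymmetrically weighted, penalized least-squares fit of a smooth baseline z to the signal y:

S = Σ_i w_i (y_i − z_i)² + λ Σ_i (Δ² z_i)²,  where w_i = p if y_i > z_i and w_i = 1 − p otherwise.

Points above the current baseline estimate (the peaks) get the small weight p, so the fit hugs the lower envelope of the signal, while λ penalizes curvature in z.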

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
  # Python 2 (uses xrange); see the Python 3 version below
  L = len(y)
  D = sparse.csc_matrix(np.diff(np.eye(L), 2))
  w = np.ones(L)
  for i in xrange(niter):
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    w = p * (y > z) + (1-p) * (y < z)
  return z


The following code works on Python 3.6.

This is adapted from the accepted answer to avoid the dense matrix diff computation (which can easily cause memory issues) and uses range instead of the Python 2 xrange.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
  L = len(y)
  D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
  w = np.ones(L)
  for i in range(niter):
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    w = p * (y > z) + (1-p) * (y < z)
  return z
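To see what a call looks like end to end, here is a minimal usage sketch. The synthetic signal and the parameter values lam=1e4 and p=0.01 are my own illustration, not from the answer:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative synthetic signal: a peak on a slowly drifting baseline, plus noise
x = np.linspace(0, 100, 1000)
peak = 50 * np.exp(-((x - 50) ** 2) / 20)
drift = 0.01 * (x - 30) ** 2
y = peak + drift + np.random.normal(0, 0.5, x.size)

z = baseline_als(y, lam=1e4, p=0.01)  # estimated baseline (parameters illustrative)
corrected = y - z                     # baseline-corrected signal

plt.plot(x, y, label="raw")
plt.plot(x, z, label="estimated baseline")
plt.plot(x, corrected, label="corrected")
plt.legend()
plt.show()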


Recently, I needed to use this method. The code from the answers above works well, but it uses more memory than necessary. So, here is my version with optimized memory usage.

def baseline_als_optimized(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
    D = lam * D.dot(D.transpose())  # Precompute this term since it does not depend on `w`
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w)  # Do not create a new matrix, just update diagonal values
        Z = W + D
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z

According to my benchmarks below, it is also about 1.5 times faster.

%%timeit -n 1000 -r 10
y = randn(1000)
baseline_als(y, 10000, 0.05)  # function from @jpantina's answer
# 20.5 ms ± 382 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

%%timeit -n 1000 -r 10
y = randn(1000)
baseline_als_optimized(y, 10000, 0.05)
# 13.3 ms ± 874 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

NOTE 1: The original article says:

To emphasize the basic simplicity of the algorithm, the number of iterations has been fixed to 10. In practical applications one should check whether the weights show any change; if not, convergence has been attained.

So the more correct way to stop the iteration is to check that ||w_new - w|| < tolerance and break as soon as the weights stop changing; see the sketch below.
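Here is what that stopping rule could look like, built on the optimized function above. This is my own variant; the function name and the tol value are assumptions, not from the paper:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als_converged(y, lam, p, niter=10, tol=1e-6):
    # Same as baseline_als_optimized, but break early once the weights stabilize
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
    D = lam * D.dot(D.transpose())
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w)
        Z = W + D
        z = spsolve(Z, w*y)
        w_new = p * (y > z) + (1-p) * (y < z)
        if np.linalg.norm(w_new - w) < tol:  # weights no longer change: converged
            break
        w = w_new
    return z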

NOTE 2: Another useful quote (from @glycoaddict's comment) gives an idea of how to choose the parameter values.

There are two parameters: p for asymmetry and λ for smoothness. Both have to be tuned to the data at hand. We found that generally 0.001 ≤ p ≤ 0.1 is a good choice (for a signal with positive peaks) and 10² ≤ λ ≤ 10⁹, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear for log λ. Often visual inspection is sufficient to get good parameter values.
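One way to follow that advice is to scan λ on a grid that is linear in log λ and inspect the fits by eye. This loop is my own illustration; it assumes y and baseline_als_optimized from the snippets above:

import numpy as np
import matplotlib.pyplot as plt

p = 0.01                     # asymmetry, inside the suggested 0.001–0.1 range
lams = np.logspace(2, 9, 8)  # 10^2 ... 10^9, evenly spaced in log(lambda)

fig, axes = plt.subplots(len(lams), 1, sharex=True, figsize=(6, 16))
for ax, lam in zip(axes, lams):
    z = baseline_als_optimized(y, lam, p)  # baseline estimate for this lambda
    ax.plot(y, label="raw")
    ax.plot(z, label="lam = %.0e" % lam)
    ax.legend(loc="upper right")
plt.show()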