
Python baseline correction library


I found an answer to my question, just sharing it for everyone who stumbles upon this.

There is an algorithm called "Asymmetric Least Squares Smoothing" by P. Eilers and H. Boelens (2005). The paper is freely available and you can find it on Google.
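For reference, the objective the algorithm minimizes (paraphrasing the paper) is an asymmetrically weighted, penalized least-squares fit of a smooth baseline z to the signal y:

S = Σ_i w_i (y_i − z_i)² + λ Σ_i (Δ² z_i)²,  where w_i = p if y_i > z_i and w_i = 1 − p otherwise.

Points above the current baseline estimate (the peaks) get the small weight p, so the fit hugs the lower envelope of the signal, while λ penalizes curvature in z.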

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
  # Python 2 (uses xrange); see the Python 3 version below
  L = len(y)
  D = sparse.csc_matrix(np.diff(np.eye(L), 2))
  w = np.ones(L)
  for i in xrange(niter):
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    w = p * (y > z) + (1-p) * (y < z)
  return z


The following code works on Python 3.6.

This is adapted from the accepted answer to avoid the dense matrix diff computation (which can easily cause memory issues) and uses range instead of the Python 2 xrange.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
  L = len(y)
  D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
  w = np.ones(L)
  for i in range(niter):
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    w = p * (y > z) + (1-p) * (y < z)
  return z
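To see what a call looks like end to end, here is a minimal usage sketch. The synthetic signal and the parameter values lam=1e4 and p=0.01 are my own illustration, not from the answer:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative synthetic signal: a peak on a slowly drifting baseline, plus noise
x = np.linspace(0, 100, 1000)
peak = 50 * np.exp(-((x - 50) ** 2) / 20)
drift = 0.01 * (x - 30) ** 2
y = peak + drift + np.random.normal(0, 0.5, x.size)

z = baseline_als(y, lam=1e4, p=0.01)  # estimated baseline (parameters illustrative)
corrected = y - z                     # baseline-corrected signal

plt.plot(x, y, label="raw")
plt.plot(x, z, label="estimated baseline")
plt.plot(x, corrected, label="corrected")
plt.legend()
plt.show()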


Recently, I needed to use this method. The code from the answers above works well, but it uses more memory than necessary. So, here is my version with optimized memory usage.

def baseline_als_optimized(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
    D = lam * D.dot(D.transpose())  # Precompute this term since it does not depend on `w`
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w)  # Do not create a new matrix, just update diagonal values
        Z = W + D
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z

According to my benchmarks below, it is also about 1.5 times faster.

%%timeit -n 1000 -r 10
y = randn(1000)
baseline_als(y, 10000, 0.05)  # function from @jpantina's answer
# 20.5 ms ± 382 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

%%timeit -n 1000 -r 10
y = randn(1000)
baseline_als_optimized(y, 10000, 0.05)
# 13.3 ms ± 874 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

NOTE 1: The original article says:

To emphasize the basic simplicity of the algorithm, the number of iterations has been fixed to 10. In practical applications one should check whether the weights show any change; if not, convergence has been attained.

So the more correct way to stop the iteration is to check that ||w_new - w|| < tolerance and break as soon as the weights stop changing; see the sketch below.
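Here is what that stopping rule could look like, built on the optimized function above. This is my own variant; the function name and the tol value are assumptions, not from the paper:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als_converged(y, lam, p, niter=10, tol=1e-6):
    # Same as baseline_als_optimized, but break early once the weights stabilize
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
    D = lam * D.dot(D.transpose())
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w)
        Z = W + D
        z = spsolve(Z, w*y)
        w_new = p * (y > z) + (1-p) * (y < z)
        if np.linalg.norm(w_new - w) < tol:  # weights no longer change: converged
            break
        w = w_new
    return z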

NOTE 2: Another useful quote (from @glycoaddict's comment) gives an idea of how to choose the parameter values.

There are two parameters: p for asymmetry and λ for smoothness. Both have to be tuned to the data at hand. We found that generally 0.001 ≤ p ≤ 0.1 is a good choice (for a signal with positive peaks) and 10² ≤ λ ≤ 10⁹, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear for log λ. Often visual inspection is sufficient to get good parameter values.
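One way to follow that advice is to scan λ on a grid that is linear in log λ and inspect the fits by eye. This loop is my own illustration; it assumes y and baseline_als_optimized from the snippets above:

import numpy as np
import matplotlib.pyplot as plt

p = 0.01                     # asymmetry, inside the suggested 0.001–0.1 range
lams = np.logspace(2, 9, 8)  # 10^2 ... 10^9, evenly spaced in log(lambda)

fig, axes = plt.subplots(len(lams), 1, sharex=True, figsize=(6, 16))
for ax, lam in zip(axes, lams):
    z = baseline_als_optimized(y, lam, p)  # baseline estimate for this lambda
    ax.plot(y, label="raw")
    ax.plot(z, label="lam = %.0e" % lam)
    ax.legend(loc="upper right")
plt.show()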