Best way to join / merge by range in pandas

Setup
Consider the dataframes A and B:

import numpy as np
import pandas as pd

A = pd.DataFrame(dict(
        A_id=range(10),
        A_value=range(5, 105, 10)
    ))
B = pd.DataFrame(dict(
        B_id=range(5),
        B_low=[0, 30, 30, 46, 84],
        B_high=[10, 40, 50, 54, 84]
    ))

A

   A_id  A_value
0     0        5
1     1       15
2     2       25
3     3       35
4     4       45
5     5       55
6     6       65
7     7       75
8     8       85
9     9       95

B

   B_id  B_low  B_high
0     0      0      10
1     1     30      40
2     2     30      50
3     3     46      54
4     4     84      84

numpy
The ✌easiest✌ way is to use numpy broadcasting.
We look for every instance of A_value being greater than or equal to B_low while at the same time A_value is less than or equal to B_high.

a = A.A_value.values
bh = B.B_high.values
bl = B.B_low.values

# compare every A_value against every (B_low, B_high) pair
i, j = np.where((a[:, None] >= bl) & (a[:, None] <= bh))

# i and j are the positions of the matching rows in A and B
pd.DataFrame(
    np.column_stack([A.values[i], B.values[j]]),
    columns=A.columns.append(B.columns)
)

   A_id  A_value  B_id  B_low  B_high
0     0        5     0      0      10
1     3       35     1     30      40
2     3       35     2     30      50
3     4       45     2     30      50
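To make the broadcasting step easier to follow, here is a minimal sketch that rebuilds the same value arrays from the setup and inspects the intermediate boolean matrix. The name mask is introduced only for this illustration and is not part of the answer's code.

```python
import numpy as np

# Same values as A_value, B_low and B_high in the setup.
a = np.arange(5, 105, 10)
bl = np.array([0, 30, 30, 46, 84])
bh = np.array([10, 40, 50, 54, 84])

# a[:, None] has shape (10, 1); bl and bh have shape (5,).
# Broadcasting compares every A_value with every (B_low, B_high) pair,
# producing a (10, 5) boolean matrix: rows index A, columns index B.
mask = (a[:, None] >= bl) & (a[:, None] <= bh)

# np.where returns the row and column positions of the True cells,
# that is, the matching (A row, B row) pairs.
i, j = np.where(mask)

print(mask.shape)              # (10, 5)
print(i.tolist(), j.tolist())  # [0, 3, 3, 4] [0, 1, 2, 2]
```

Note that the matrix holds len(A) * len(B) booleans, so memory use grows with the product of the two frame sizes.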

To address the comments and give something akin to a left join, I appended the part of A that doesn't match.

pd.concat(
    [
        pd.DataFrame(
            np.column_stack([A.values[i], B.values[j]]),
            columns=A.columns.append(B.columns)
        ),
        # rows of A whose position never appears in i had no match
        A[~np.isin(np.arange(len(A)), np.unique(i))]
    ],
    ignore_index=True, sort=False
)

    A_id  A_value  B_id  B_low  B_high
0      0        5   0.0    0.0    10.0
1      3       35   1.0   30.0    40.0
2      3       35   2.0   30.0    50.0
3      4       45   2.0   30.0    50.0
4      1       15   NaN    NaN     NaN
5      2       25   NaN    NaN     NaN
6      5       55   NaN    NaN     NaN
7      6       65   NaN    NaN     NaN
8      7       75   NaN    NaN     NaN
9      8       85   NaN    NaN     NaN
10     9       95   NaN    NaN     NaN


I'm not sure this is more efficient, but you can use SQL directly (from the sqlite3 module, for instance) with pandas (inspired by this question), like so:

import sqlite3

import numpy as np
import pandas as pd

conn = sqlite3.connect(":memory:")

df2 = pd.DataFrame(np.random.randn(10, 5), columns=["col1", "col2", "col3", "col4", "col5"])
df1 = pd.DataFrame(np.random.randn(10, 5), columns=["col1", "col2", "col3", "col4", "col5"])

# copy both frames into the in-memory database
df1.to_sql("df1", conn, index=False)
df2.to_sql("df2", conn, index=False)

qry = "SELECT * FROM df1, df2 WHERE df1.col1 > 0 and df1.col1<0.5"
tt = pd.read_sql_query(qry, conn)

You can adapt the query as needed for your application.
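Applied to the A and B frames from the question, the same sqlite3 approach might look like the sketch below. It assumes an inclusive range match and uses a LEFT JOIN so that unmatched rows of A are kept. The table names A and B and the variable result are chosen for illustration only.

```python
import sqlite3

import pandas as pd

# Frames from the question's setup.
A = pd.DataFrame(dict(A_id=range(10), A_value=range(5, 105, 10)))
B = pd.DataFrame(dict(
    B_id=range(5),
    B_low=[0, 30, 30, 46, 84],
    B_high=[10, 40, 50, 54, 84],
))

conn = sqlite3.connect(":memory:")
A.to_sql("A", conn, index=False)
B.to_sql("B", conn, index=False)

# BETWEEN is inclusive on both ends, matching the >= / <= test.
# LEFT JOIN keeps every row of A; rows with no matching range get NULLs.
qry = """
SELECT A.*, B.*
FROM A
LEFT JOIN B
  ON A.A_value BETWEEN B.B_low AND B.B_high
ORDER BY A.A_id, B.B_id
"""
result = pd.read_sql_query(qry, conn)
conn.close()

print(result)
```

Replacing LEFT JOIN with JOIN would return only the matching pairs.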


I don't know how efficient it is, but someone wrote a wrapper that lets you use SQL syntax with pandas objects. It is called pandasql. The documentation explicitly states that joins are supported. This might at least be easier to read, since SQL syntax is very readable.
