
How to convert a for loop into parallel processing in Python?


Here is a quick solution - I didn't try to optimize your code at all; I just fed it into a multiprocessing pool. This runs your function on each row individually, returns each row with the new columns added, and builds a new dataframe from that output.

import multiprocessing as mp
import datetime

import pandas as pd


def func(arg):
    idx, row = arg
    # gcd is the geodesic-distance helper from the original code
    dist = gcd.dist(row['destination'], row['ping_location'])
    row['gc_distance'] = dist

    # pull the destination address out of the address_fields_dest string
    temp_idx = str(row['address_fields_dest']).find(":")
    pos_start = temp_idx + 3
    pos_end = str(row['address_fields_dest']).find(",") - 2
    row['destination address'] = str(row['address_fields_dest'])[pos_start:pos_end]

    # calculate velocity, v = d/t, where t is the difference between the
    # destination time and the ping creation time
    timediff = abs(row['dest_data_generate_time'] - row['event_time'])
    row['velocity km/hr'] = 0
    # only compute a velocity if the two timestamps are more than a minute apart
    if timediff > datetime.timedelta(minutes=1):
        row['velocity km/hr'] = dist / timediff.total_seconds() * 3600.0
    return row


if __name__ == '__main__':
    pool = mp.Pool(processes=mp.cpu_count())
    new_rows = pool.map(func, [(idx, row) for idx, row in data_all.iterrows()])
    pool.close()
    pool.join()
    # each returned row is a Series; stitch them back together into a dataframe
    data_all_new = pd.concat(new_rows, axis=1).T
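
More generally, the same recipe works for any for loop whose iterations don't depend on each other: move the loop body into a function, then hand the iterable to Pool.map. Here is a minimal, self-contained sketch of that pattern (slow_square and the inputs are just placeholders standing in for the per-item work of your own loop):

import multiprocessing as mp

def slow_square(x):
    # stand-in for whatever work the original loop body does per item
    return x * x

if __name__ == '__main__':
    inputs = range(10)

    # sequential version:
    # results = [slow_square(x) for x in inputs]

    # parallel version: items are distributed across worker processes
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(slow_square, inputs)

    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Keep in mind that every item (here, every dataframe row) has to be pickled and shipped to a worker process, so this only pays off when the per-item work is heavier than that overhead.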