How to convert a for loop into parallel processing in Python?
Here is a quick solution - I didn't try to optimize your code at all; I just fed it into a multiprocessing pool. This runs your function on each row individually, returns the row with the new properties added, and builds a new dataframe from that output.
```python
import datetime
import multiprocessing as mp

import pandas as pd

def func(arg):
    idx, row = arg
    dist = gcd.dist(row['destination'], row['ping_location'])
    row['gc_distance'] = dist

    # parse the destination address out of the address_fields_dest string
    temp_idx = str(row['address_fields_dest']).find(":")
    pos_start = temp_idx + 3
    pos_end = str(row['address_fields_dest']).find(",") - 2
    row['destination address'] = str(row['address_fields_dest'])[pos_start:pos_end]

    ##### calculate velocity which is: v = d/t
    ## time is the difference btwn destination time and the ping creation time
    timediff = abs(row['dest_data_generate_time'] - row['event_time'])
    row['velocity km/hr'] = 0
    ## only compute a velocity if the time diff btwn destination and event ping
    ## is more than a minute long, to avoid dividing by a near-zero interval
    if timediff > datetime.timedelta(minutes=1):
        row['velocity km/hr'] = dist / timediff.total_seconds() * 3600.0
    return row

# create the pool AFTER func is defined, so forked workers can find it
pool = mp.Pool(processes=mp.cpu_count())
new_rows = pool.map(func, [(idx, row) for idx, row in data_all.iterrows()])
pool.close()

# a list of row Series becomes a DataFrame again via the constructor
# (pd.concat on the raw list would stack them into one long Series)
data_all_new = pd.DataFrame(new_rows)
```
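To see the pattern in isolation, here is a minimal, self-contained sketch of the same idea on a toy dataframe. The names `toy_func`, `df`, and the `value`/`double` columns are illustrative, not from your code; the `if __name__ == '__main__':` guard is there because multiprocessing requires it on platforms that spawn rather than fork worker processes.

```python
import multiprocessing as mp

import pandas as pd

def toy_func(arg):
    # same shape as above: receive an (index, row) pair, add a column, return the row
    idx, row = arg
    row['double'] = row['value'] * 2
    return row

if __name__ == '__main__':
    df = pd.DataFrame({'value': [1, 2, 3]})
    with mp.Pool(processes=2) as pool:
        new_rows = pool.map(toy_func, list(df.iterrows()))
    # rebuild a DataFrame from the returned row Series
    df_new = pd.DataFrame(new_rows)
    print(df_new['double'].tolist())  # -> [2, 4, 6]
```

Note that the worker function must be defined at module level so it can be pickled and sent to the pool workers; a lambda or nested function will fail with a pickling error.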