How to access a field of a namedtuple using a variable for the field name?
The 'getattr' answer works, but there is another option which is slightly faster.
idx = {name: i for i, name in enumerate(list(df), start=1)}
for row in df.itertuples(name=None):
    example_value = row[idx['product_price']]
Explanation
Build a dictionary that maps each column name to its position in the row tuple. Call 'itertuples' with "name=None" so that it yields plain tuples instead of named tuples. Then read each value from the tuple using the index stored in the dictionary under its column name.
- Make a dictionary to find the indexes.
idx = {name: i for i, name in enumerate(list(df), start=1)}
- Use the dictionary to access the desired values by name in the row tuples
for row in df.itertuples(name=None):
    example_value = row[idx['product_price']]
Note: Use start=0 in enumerate if you call itertuples with index=False, because the row tuples then no longer begin with the index value.
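As a minimal sketch of the index=False case described in the note: the two-column frame and its column names below are made up for illustration and are not from the original example.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({
    "product_name": ["apple", "pear"],
    "product_price": [1.5, 2.0],
})

# With index=False the tuples hold only the column values,
# so column positions start at 0 instead of 1.
idx = {name: i for i, name in enumerate(list(df), start=0)}

prices = []
for row in df.itertuples(index=False, name=None):
    prices.append(row[idx["product_price"]])

print(prices)
```

With the default index=True, the same lookup needs start=1 because position 0 of each tuple is the index value.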
Here is a working example that runs and times both methods.
import numpy as np
import pandas as pd
import timeit

data_length = 3 * 10**5

fake_data = {
    "id_code": list(range(data_length)),
    "letter_code": np.random.choice(list('abcdefgz'), size=data_length),
    "pine_cones": np.random.randint(low=1, high=100, size=data_length),
    "area": np.random.randint(low=1, high=100, size=data_length),
    "temperature": np.random.randint(low=1, high=100, size=data_length),
    "elevation": np.random.randint(low=1, high=100, size=data_length),
}
df = pd.DataFrame(fake_data)


def iter_with_idx():
    # Plain tuples, values looked up by position via the idx dictionary.
    result_data = []
    idx = {name: i for i, name in enumerate(list(df), start=1)}
    for row in df.itertuples(name=None):
        row_calc = row[idx['pine_cones']] / row[idx['area']]
        result_data.append(row_calc)
    return result_data


def iter_with_getattr():
    # Named tuples, values looked up by field name via getattr.
    result_data = []
    for row in df.itertuples():
        row_calc = getattr(row, 'pine_cones') / getattr(row, 'area')
        result_data.append(row_calc)
    return result_data


dict_idx_method = timeit.timeit(iter_with_idx, number=100)
get_attr_method = timeit.timeit(iter_with_getattr, number=100)

print(f'Dictionary index Method {dict_idx_method:0.4f} seconds')
print(f'Get attribute method {get_attr_method:0.4f} seconds')
Result:
Dictionary index Method 49.1814 seconds
Get attribute method 80.1912 seconds
I assume the difference comes from the lower overhead of creating a plain tuple rather than a named tuple, and of accessing a value by index rather than through getattr, but both of those are just guesses. If anyone knows better, please comment.
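One rough way to probe those two guesses separately is to time construction and access on a single row, without pandas. This is only a sketch: the Row type, the sample values, and the loop count are assumptions made for illustration, and the numbers will vary by machine and Python version, so it does not settle the question.

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for one row produced by itertuples.
Row = namedtuple("Row", ["Index", "pine_cones", "area"])
values = [0, 12, 4]

plain_row = tuple(values)
named_row = Row._make(values)
idx = {"Index": 0, "pine_cones": 1, "area": 2}

loops = 100_000

# Guess 1: building a plain tuple vs. building a named tuple.
t_make_plain = timeit.timeit(lambda: tuple(values), number=loops)
t_make_named = timeit.timeit(lambda: Row._make(values), number=loops)

# Guess 2: dictionary lookup plus index access vs. getattr by field name.
t_get_index = timeit.timeit(lambda: plain_row[idx["area"]], number=loops)
t_get_attr = timeit.timeit(lambda: getattr(named_row, "area"), number=loops)

print(f"create plain tuple  {t_make_plain:0.4f} seconds")
print(f"create named tuple  {t_make_named:0.4f} seconds")
print(f"access by index     {t_get_index:0.4f} seconds")
print(f"access by getattr   {t_get_attr:0.4f} seconds")
```

Comparing the first pair of timings against the second gives a hint about which step contributes more on a given setup.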
I have not explored how the number of columns versus the number of rows affects the timing results.
Since Python 3.6, one can inherit from typing.NamedTuple and add item access by field name:
import typing as tp


class HistoryItem(tp.NamedTuple):
    inp: str
    tsb: float
    rtn: int
    frequency: int = None

    def __getitem__(self, item):
        if isinstance(item, int):
            item = self._fields[item]
        return getattr(self, item)

    def get(self, item, default=None):
        try:
            return self[item]
        except (KeyError, AttributeError, IndexError):
            return default


item = HistoryItem("inp", 10, 10, 10)
print(item[0])  # 'inp'
print(item["inp"])  # 'inp'
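A possible variant, sketched here as an assumption rather than as part of the answer above: route only string keys through getattr and hand everything else to ordinary tuple indexing, so that slices keep working. The field values and the "missing" key are made up for illustration.

```python
import typing as tp


class HistoryItem(tp.NamedTuple):
    inp: str
    tsb: float
    rtn: int
    frequency: tp.Optional[int] = None

    def __getitem__(self, item):
        # Strings are treated as field names; anything else (int, slice)
        # falls through to ordinary tuple indexing.
        if isinstance(item, str):
            return getattr(self, item)
        return tuple.__getitem__(self, item)

    def get(self, item, default=None):
        # Unknown field names raise AttributeError,
        # out-of-range positions raise IndexError.
        try:
            return self[item]
        except (AttributeError, IndexError):
            return default


item = HistoryItem("inp", 10.0, 10)
field = "tsb"  # field name held in a variable

print(item[field])
print(item[:2])
print(item.get("missing", "n/a"))
```

The call to tuple.__getitem__ is written out explicitly here as a deliberate choice, so the sketch does not depend on how zero-argument super() behaves inside a typing.NamedTuple class body.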