How to create a dictionary of key : column_name and value : unique values in column in python from a dataframe
With a DataFrame like this:
import pandas as pd

df = pd.DataFrame(
    [
        ["Women", "Slip on", 7, "Black", "Clarks"],
        ["Women", "Slip on", 8, "Brown", "Clarcks"],
        ["Women", "Slip on", 7, "Blue", "Clarks"],
    ],
    columns=["Category", "Sub Category", "Size", "Color", "Brand"],
)
print(df)
Output:
  Category Sub Category  Size  Color    Brand
0    Women      Slip on     7  Black   Clarks
1    Women      Slip on     8  Brown  Clarcks
2    Women      Slip on     7   Blue   Clarks
You can build your new dict by mapping each key to the values of the corresponding DataFrame column, like this:
new_dict = {"color_list": list(df["Color"]), "size_list": list(df["Size"])}
# OR:
# new_dict = {"color_list": [k for k in df["Color"]], "size_list": [k for k in df["Size"]]}
print(new_dict)
Output:
{'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8, 7]}
To keep only the unique values, you can use set (note that a set does not preserve the original order), like this:
new_dict = {"color_list": list(set(df["Color"])), "size_list": list(set(df["Size"]))}
print(new_dict)
Output:
{'color_list': ['Brown', 'Blue', 'Black'], 'size_list': [8, 7]}
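Because set makes no promise about order, the lists above can come back shuffled. If the order of first appearance matters, a minimal sketch (assuming Python 3.7+, where dicts keep insertion order, and a small stand-in DataFrame made up for this example) could look like this:

```python
import pandas as pd

# Stand-in data for illustration only; not the DataFrame from the answer
df = pd.DataFrame({"Color": ["Black", "Brown", "Blue", "Black"],
                   "Size": [7, 8, 7, 8]})

# dict.fromkeys() keeps the first occurrence of each value, in order
colors_ordered = list(dict.fromkeys(df["Color"]))

# Series.unique() also returns values in order of appearance
sizes_ordered = df["Size"].unique().tolist()

new_dict = {"color_list": colors_ordered, "size_list": sizes_ordered}
print(new_dict)
# {'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8]}
```

Either approach should work; the second stays inside pandas, while the first works on any iterable of hashable values.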
Or, as @Ami Tavory said in his answer, to get the unique values of every column in your DataFrame, you can simply do this:
new_dict = {k: list(df[k].unique()) for k in df.columns}
print(new_dict)
Output:
{'Brand': ['Clarks', 'Clarcks'], 'Category': ['Women'], 'Color': ['Black', 'Brown', 'Blue'], 'Size': [7, 8], 'Sub Category': ['Slip on']}
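One detail worth knowing: list(df[k].unique()) keeps NumPy scalar types (for example numpy.int64 for the Size column), which the standard json module cannot serialize. If the dict needs to be dumped to JSON or compared against plain Python values, a small variation using tolist() may help. This is a sketch; the name unique_by_column is just illustrative:

```python
import json

import pandas as pd

df = pd.DataFrame(
    [
        ["Women", "Slip on", 7, "Black", "Clarks"],
        ["Women", "Slip on", 8, "Brown", "Clarcks"],
        ["Women", "Slip on", 7, "Blue", "Clarks"],
    ],
    columns=["Category", "Sub Category", "Size", "Color", "Brand"],
)

# .tolist() converts the NumPy array to a list of native Python objects
unique_by_column = {col: df[col].unique().tolist() for col in df.columns}

# Native ints and strs serialize without a custom encoder
print(json.dumps(unique_by_column))
```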
I am trying to create a dictionary of key:value pairs where key is the column name of a dataframe and value will be a list containing all the unique values in that column.
You could use a simple dictionary comprehension for that.
Say you start with
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 4, 5]})
Then the following comprehension solves it:
>>> {c: list(df[c].unique()) for c in df.columns}
{'a': [1, 2], 'b': [1, 4, 5]}
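If the DataFrame can contain missing values, note that unique() reports NaN as one of the values. A hedged sketch of the same comprehension with missing values dropped first (the helper name unique_values_by_column and the sample frame are made up for this example):

```python
import pandas as pd

# Hypothetical frame with a missing value in column 'a'
df = pd.DataFrame({'a': [1, None, 1], 'b': [1, 4, 5]})


def unique_values_by_column(frame):
    """Map each column name to a list of its unique non-missing values."""
    return {c: frame[c].dropna().unique().tolist() for c in frame.columns}


result = unique_values_by_column(df)
print(result)
```

Column 'a' is stored as float here because of the missing value, so its unique value comes back as 1.0 rather than 1.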
If I understand your question correctly, you may need set instead of list. In this piece of code, you could add set to get the unique values of the given list:
# col_list and footwear_data come from the question's code
unique_lists = {}
for col in col_list[1:]:
    list_name = ''.join([str(col), '_list'])
    # set() keeps one copy of each value in the column
    unique_lists[list_name] = set(footwear_data[col])
Sample usage:
>>> a_list = [7, 8, 7, 9, 10, 9]
>>> set(a_list)
{8, 9, 10, 7}
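As the output shows, a set does not come back in any particular order. If a predictable order is needed, one simple option is to sort the result, roughly like this:

```python
a_list = [7, 8, 7, 9, 10, 9]

# set() removes the duplicates, sorted() returns them as an ordered list
unique_sorted = sorted(set(a_list))
print(unique_sorted)  # [7, 8, 9, 10]
```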