A Quick Tutorial to Encode List Variables

Using pandas functions

Photo by Mika Baumeister on Unsplash
import pandas as pd

# Each record holds a person and the list of cities attached to them
data = [
    {'cities': ['Athens', 'London', 'Berlin'], 'person': 'John'},
    {'cities': ['Athens', 'London'], 'person': 'Nick'},
    {'cities': ['Berlin', 'London'], 'person': 'Helen'}
]
df = pd.DataFrame(data)
The original DataFrame
The desired DataFrame
cities_df = pd.DataFrame(df['cities'].tolist())
cities_df: the DataFrame resulting from the cities column
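To make the effect of tolist() concrete, here is a minimal sketch that rebuilds the example data and inspects the result. It assumes the same three-person data set shown earlier. The point to notice is that lists of different lengths are padded with missing values, and the new columns are simply numbered by position.

```python
import pandas as pd

# Same example data as in the tutorial
df = pd.DataFrame({
    'cities': [['Athens', 'London', 'Berlin'],
               ['Athens', 'London'],
               ['Berlin', 'London']],
    'person': ['John', 'Nick', 'Helen'],
})

# One column per list position; shorter lists are padded with missing values
cities_df = pd.DataFrame(df['cities'].tolist())
print(cities_df)

# Nick has only two cities, so his third position is missing
print(pd.isna(cities_df.iloc[1, 2]))
```

The column labels here are positions (0, 1, 2), not city names, which is why further steps are needed to reach one column per city.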

Note: before applying Step 1, check the type of the column. When we load the data from a file, we can't be sure that the cities column actually holds Python lists. Depending on where the data comes from (e.g. data crawling), each value might be stored as text, such as "['Athens', 'London']", and therefore be loaded as type str. In that case, an additional preprocessing step is needed before applying the tolist() function to the column, so that each value is evaluated as a list.
Using the literal_eval() function of the ast module, we can convert each element of the cities column from a string to a list. After that, we are ready to use the tolist() function on our data.

import ast
# Parse each string such as "['Athens', 'London']" into a real list
df['cities'] = df['cities'].apply(ast.literal_eval)
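As a small illustration of the string case described in the note, the sketch below builds a DataFrame whose cities values are text rather than lists and then converts them. The `raw` DataFrame is a made-up stand-in for data loaded from a file, not part of the original example.

```python
import ast
import pandas as pd

# Hypothetical stand-in for a column loaded from a file: each value is a str
raw = pd.DataFrame({
    'cities': ["['Athens', 'London', 'Berlin']", "['Athens', 'London']"],
    'person': ['John', 'Nick'],
})
type_before = type(raw['cities'].iloc[0])
print(type_before)

# literal_eval only evaluates Python literals (lists, strings, numbers, ...),
# so unlike eval() it does not execute arbitrary code
raw['cities'] = raw['cities'].apply(ast.literal_eval)
type_after = type(raw['cities'].iloc[0])
print(type_after)
```

If the column already contains lists, this step should be skipped, since literal_eval() expects a string as input.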
cities_obj = cities_df.stack()
The result after applying stack()
cities_df = pd.get_dummies(cities_obj)
The result after applying get_dummies()
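# sum(level=0) was removed in pandas 2.0; grouping on the first index level is the equivalent
cities_df = cities_df.groupby(level=0).sum()
The result after applying sum()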
cities_df = pd.concat([df, cities_df], axis=1)
The result after using concat()
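Putting the steps together, the whole transformation can be sketched as one small function. This is only an illustrative wrapper: the name `encode_list_column` is hypothetical, and passing `dtype=int` to get_dummies() is an added assumption so that the indicator columns come out as 0/1 integers regardless of the pandas version.

```python
import pandas as pd

def encode_list_column(df, column):
    """Return df with one 0/1 indicator column per distinct value in a list column."""
    # Step 1: spread each list over positional columns, keeping the original index
    expanded = pd.DataFrame(df[column].tolist(), index=df.index)
    # Step 2: stack the positions into one long Series indexed by (row, position)
    stacked = expanded.stack()
    # Step 3: one indicator column per distinct value
    dummies = pd.get_dummies(stacked, dtype=int)
    # Step 4: collapse the positions back to one row per original row
    per_row = dummies.groupby(level=0).sum()
    # Step 5: attach the indicator columns to the original DataFrame
    return pd.concat([df, per_row], axis=1)

data = [
    {'cities': ['Athens', 'London', 'Berlin'], 'person': 'John'},
    {'cities': ['Athens', 'London'], 'person': 'Nick'},
    {'cities': ['Berlin', 'London'], 'person': 'Helen'},
]
encoded = encode_list_column(pd.DataFrame(data), 'cities')
print(encoded)
```

Because the last aggregation is a sum, a city that appears twice in the same list would be counted as 2 rather than 1; if strict 0/1 flags are needed, using max() instead of sum() in Step 4 is one option.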



