A Quick Tutorial to Encode List Variables

Using pandas functions

Photo by Mika Baumeister on Unsplash

Step 0: Generate some data

import pandas as pd

data = [
{'cities': ['Athens', 'London', 'Berlin'], 'person': 'John'},
{'cities': ['Athens', 'London'], 'person': 'Nick'},
{'cities': ['Berlin', 'London'], 'person': 'Helen'}
]
df = pd.DataFrame(data)
The original DataFrame
The desired DataFrame

Step 1: Transform the column to a list

cities_df = pd.DataFrame(df['cities'].tolist())
cities_df: the DataFrame resulting from the cities column
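If the cities column holds string representations of lists (for example, after reading the data from a CSV file), convert them to real lists first:
import ast
df['cities'] = df['cities'].apply(ast.literal_eval)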
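
To make Step 1 concrete, here is a minimal sketch of both cases: parsing string-encoded lists and then expanding them into columns. The `raw` frame and its values are hypothetical, made up to mimic list data that came back from a CSV file as text.

```python
import ast
import pandas as pd

# Hypothetical input: the lists arrived as strings, e.g. after a CSV round trip.
raw = pd.DataFrame({
    'person': ['John', 'Nick'],
    'cities': ["['Athens', 'London', 'Berlin']", "['Athens', 'London']"],
})

# Parse each string back into a real Python list.
raw['cities'] = raw['cities'].apply(ast.literal_eval)

# One column per list position; shorter lists are padded with missing values.
expanded = pd.DataFrame(raw['cities'].tolist())
print(expanded)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.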

Step 2: Transform columns to indexes

cities_obj = cities_df.stack()
The result after applying stack()
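
As a rough illustration of what stack() does here, the sketch below rebuilds a frame shaped like cities_df by hand (the `wide` name is an assumption for this example) and inspects the stacked result. The explicit `dropna()` is an addition of this sketch, so the padding values are removed regardless of the pandas version in use.

```python
import pandas as pd

# Same shape as the cities_df built in Step 1 (values copied from the sample data).
wide = pd.DataFrame([
    ['Athens', 'London', 'Berlin'],
    ['Athens', 'London', None],
    ['Berlin', 'London', None],
])

# stack() returns a Series with a two-level index: (original row, list position).
# dropna() removes the padding entries if the installed pandas keeps them.
stacked = wide.stack().dropna()
print(stacked)

# The first index level still identifies the original row.
print(stacked.index.get_level_values(0).tolist())
```

Keeping the original row label as the first index level is what allows the later steps to collapse the values back to one row per person.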

Step 3: Convert to dummy variables

cities_df = pd.get_dummies(cities_obj)
The result after applying get_dummies()
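
The sketch below shows get_dummies() on a small hand-built Series with the same two-level index as in Step 2. The values are illustrative, and `dtype=int` is an optional choice made here to get 0/1 integers, since recent pandas versions return booleans by default.

```python
import pandas as pd

# A small stacked Series, indexed by (row, list position) as in Step 2.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)])
stacked = pd.Series(['Athens', 'London', 'Athens'], index=index)

# One indicator column per distinct value; the index is carried over unchanged.
dummies = pd.get_dummies(stacked, dtype=int)
print(dummies)
```

Each input entry becomes one row with a single 1, so at this point there are still several rows per person.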

Step 4: Sum on the index level

cities_df = cities_df.groupby(level=0).sum()
The result after applying sum()
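
Summing on the first index level collapses the per-position rows back to one row per original record. The sketch below uses `groupby(level=0).sum()`, which is the form that works on current pandas releases (the `level` argument of `sum()` was removed in pandas 2.0). The `dummies` frame is a hand-built stand-in for the Step 3 output.

```python
import pandas as pd

# Dummy rows indexed by (row, list position), as produced in Step 3.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)])
dummies = pd.DataFrame({'Athens': [1, 0, 1], 'London': [0, 1, 0]}, index=index)

# Group on the first index level (the original row) and add up the indicators.
per_row = dummies.groupby(level=0).sum()
print(per_row)
```

If a list can contain the same value more than once, the sum will exceed 1 for that value; using `.max()` in place of `.sum()` would keep a strict 0/1 indicator.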

Step 5: Re-join result with the original data

cities_df = pd.concat([df, cities_df], axis=1)
The result after using concat()
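
The join in this step relies on both frames sharing the same index. A minimal sketch with two toy frames (the names `people` and `flags` and their values are made up for illustration):

```python
import pandas as pd

# Toy frames sharing the default 0..n-1 index (values are illustrative).
people = pd.DataFrame({'person': ['John', 'Nick']})
flags = pd.DataFrame({'Athens': [1, 1], 'Berlin': [1, 0]})

# axis=1 places the frames side by side, matching rows by index label.
combined = pd.concat([people, flags], axis=1)
print(combined)
```

Because rows are matched by label rather than by position, the indicator frame should be built from the same index as the original data, which is the case when the steps above are applied to an unmodified df.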

Wrap-up
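
Putting the five steps together, one possible way to package them is a small helper function. This is a sketch rather than a definitive implementation: the name `encode_list_column` is hypothetical, and the explicit `index=df.index`, `dropna()`, and `dtype=int` are choices made here for robustness across pandas versions.

```python
import pandas as pd


def encode_list_column(df, column):
    """Return df with one 0/1 indicator column per distinct value in df[column]."""
    # Step 1: expand each list into its own columns, keeping df's index.
    wide = pd.DataFrame(df[column].tolist(), index=df.index)
    # Step 2: move the columns into a second index level and drop padding.
    stacked = wide.stack().dropna()
    # Step 3: one indicator column per distinct value.
    dummies = pd.get_dummies(stacked, dtype=int)
    # Step 4: collapse back to one row per original record.
    per_row = dummies.groupby(level=0).sum()
    # Step 5: attach the indicators to the original data.
    return pd.concat([df, per_row], axis=1)


data = [
    {'cities': ['Athens', 'London', 'Berlin'], 'person': 'John'},
    {'cities': ['Athens', 'London'], 'person': 'Nick'},
    {'cities': ['Berlin', 'London'], 'person': 'Helen'},
]
result = encode_list_column(pd.DataFrame(data), 'cities')
print(result)
```

The function returns a new DataFrame and leaves the input untouched, so it can be applied to several list columns in turn.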


Data enthusiast, Machine Learning fan. Doing data work at the BBC
