How To Use Pandas Case When

Pandas Case When: An Introduction

Pandas is a popular data analysis library for Python that provides efficient and easy-to-use data structures for data manipulation and analysis.

A common task is assigning values to a data frame column conditionally, the equivalent of a CASE WHEN expression in SQL.

The case when operation is a way to apply conditional statements to data in a Pandas data frame.

It allows us to specify multiple conditions and corresponding values, and to update the values in the data frame based on those conditions.

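Pandas code usually performs this operation with NumPy's np.select function, which works directly on data frame columns; np.where covers the simpler case of a single condition.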
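When there is only one condition with two possible outcomes, np.where is enough. The sketch below is illustrative only: the `scores` frame, the `Passed` column, and the 70-point cutoff are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cid'],
                       'Grade': [91, 58, 70]})

# np.where(condition, value_if_true, value_if_false) behaves like
# CASE WHEN Grade >= 70 THEN 'Yes' ELSE 'No' END
scores['Passed'] = np.where(scores['Grade'] >= 70, 'Yes', 'No')

print(scores)
```

With three or more outcomes, nesting np.where calls quickly becomes hard to read, which is where a list of conditions and values is the clearer choice.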

Here is an example of how to use case when in Pandas.

Let's say we have a data frame that contains information about students and their grades.

We want to categorize the students based on their grades into "good", "average", and "poor" categories.

We can achieve this using the following code:

import pandas as pd
import numpy as np

# Create the data frame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jill'],
                   'Grade': [85, 75, 95, 65]})

# Define the conditions, one per category
conditions = [
    (df['Grade'] >= 90),
    (df['Grade'] >= 75) & (df['Grade'] < 90),
    (df['Grade'] < 75)
]

# Define the values for each condition
values = ['Good', 'Average', 'Poor']

# Apply the "case when" operation
df['Performance'] = np.select(conditions, values, default='Unknown')

# View the result
print(df)

This will produce the following output:

   Name  Grade Performance
0  John     85     Average
1  Jane     75     Average
2   Jim     95        Good
3  Jill     65        Poor

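As you can see, the new Performance column has been filled in based on the conditions specified in the conditions list. Note that Jane's grade of 75 satisfies the second condition (>= 75), so she is rated "Average".

The np.select function checks the conditions in order and, for each row, assigns the value from the values list that corresponds to the first condition that is true. Rows that match none of the conditions receive the default value, here "Unknown".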
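Because np.select uses the first matching condition, the order of the list matters, and later conditions do not have to repeat bounds that earlier ones already excluded. A minimal sketch of that behavior, reusing the grades from the example above and moving the "Poor" category into the default (a choice made here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jill'],
                   'Grade': [85, 75, 95, 65]})

# Conditions are checked top to bottom. A grade of 95 satisfies both,
# but only the first match ('Good') is used.
conditions = [
    df['Grade'] >= 90,
    df['Grade'] >= 75,
]
values = ['Good', 'Average']

# Rows matching no condition (grades below 75) fall through to the default.
df['Performance'] = np.select(conditions, values, default='Poor')

print(df['Performance'].tolist())
# ['Average', 'Average', 'Good', 'Poor']
```

Swapping the two conditions would label every grade of 75 or above as "Average", so the most specific condition should come first.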

Summary

In conclusion, the case when operation in Pandas provides a convenient way to apply conditional statements to data frames.

It is useful in many data analysis scenarios where we need to categorize data based on certain conditions.

Because np.select is vectorized and takes plain lists of conditions and values, it stays concise and fast even on large data frames.

TrackingJoy