Using the SQL Like Condition in Pandas for Data Filtering

Pandas is an open-source library for data analysis and manipulation. It provides data structures for efficiently storing large datasets and tools for working with them.

One of the most important operations when working with a dataset is filtering data based on certain conditions.

In this blog post, we will discuss how to use the "like" condition in Pandas to filter data based on patterns in strings.

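In SQL, the "like" condition (the LIKE operator) compares a string column against a pattern built from two wildcards: % matches any sequence of characters, and _ matches any single character.

Pandas has no LIKE operator of its own, but it provides similar functionality through the str.contains method, which tests whether each value in a string column contains a given pattern. By default, str.contains treats the pattern as a regular expression rather than as a SQL wildcard pattern.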
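To make the relationship between the two pattern styles concrete, here is a minimal sketch that translates a SQL LIKE pattern into a regular expression that str.contains can use. The helper name like_to_regex is hypothetical (it is not part of Pandas), and the sketch assumes only the % and _ wildcards, with no support for SQL's ESCAPE clause.

```python
import re
import pandas as pd

def like_to_regex(like_pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regular expression.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    parts = []
    for ch in like_pattern:
        if ch == '%':
            parts.append('.*')           # % matches any run of characters, including none
        elif ch == '_':
            parts.append('.')            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is matched literally
    # LIKE must match the whole value, so anchor both ends
    return '^' + ''.join(parts) + '$'

cities = pd.Series(['New York', 'San Francisco', 'London', 'Paris', 'Berlin'])

# SQL equivalent: WHERE city LIKE '%on%'
like_mask = cities.str.contains(like_to_regex('%on%'))
like_matches = cities[like_mask].tolist()
```

Anchoring with ^ and $ matters here: LIKE compares against the entire value, while str.contains on its own looks for the pattern anywhere in the string.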

Let's start by creating a sample DataFrame to work with:

import pandas as pd

data = {
    'Name': ['John', 'Jane', 'Jim', 'Jessica', 'Jack'],
    'City': ['New York', 'San Francisco', 'London', 'Paris', 'Berlin']
}

df = pd.DataFrame(data)

To filter data based on a pattern in the 'Name' column, we can use the str.contains method and pass in the pattern we want to match:

pattern = 'J'
filtered_df = df[df['Name'].str.contains(pattern, case=False)]

In this example, we set case=False to ignore case when matching the pattern.

The resulting DataFrame, filtered_df, contains every row whose 'Name' value includes the letter 'J' in either upper or lower case. In this sample every name starts with 'J', so all five rows are returned.
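It can help to look at what str.contains actually returns before it is used as a filter. The sketch below uses a small made-up Series (not the article's DataFrame) that includes a missing value, and relies on the na parameter of str.contains to decide how missing values are treated.

```python
import pandas as pd

# Illustrative data with one missing value
names = pd.Series(['John', 'jane', None, 'Alice'])

# str.contains returns one True/False per row. A missing value would otherwise
# produce a missing result, which cannot be used directly for boolean indexing,
# so na=False treats it as "no match".
mask = names.str.contains('J', case=False, na=False)

matched = names[mask].tolist()
```

Here 'jane' matches only because case=False is set, and the missing value is simply excluded from the result.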

Because the pattern is treated as a regular expression, it's also possible to match any one of several alternatives by separating them with the | operator:

pattern = 'J|i'
filtered_df = df[df['Name'].str.contains(pattern, case=False)]

This returns all rows where the 'Name' column contains either the letter 'J' or the letter 'i', again ignoring case.
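When the alternatives come from a list of plain search terms, one possible approach is to escape each term and join them with |, so that characters with a special meaning in regular expressions are matched literally. The search terms below are made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jessica', 'Jack']})

# Hypothetical search terms, treated as literal text
terms = ['oh', 'ss']

# re.escape neutralises regex metacharacters; '|' then means "either term"
pattern = '|'.join(re.escape(term) for term in terms)

either = df[df['Name'].str.contains(pattern, case=False)]
either_names = either['Name'].tolist()
```

For a single literal term, passing regex=False to str.contains avoids regular-expression interpretation altogether.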

You can also match patterns at the start or end of a string using the ^ and $ symbols respectively:

start_pattern = '^J'
filtered_df = df[df['Name'].str.contains(start_pattern, case=False)]

end_pattern = 'n$'
filtered_df = df[df['Name'].str.contains(end_pattern, case=False)]

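In this example, the first filter returns all rows where the 'Name' column starts with the letter 'J', while the second returns all rows where the 'Name' column ends with the letter 'n' (only 'John' in the sample data). Because both results are assigned to filtered_df, the second replaces the first; use separate variable names if you need to keep both.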
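For plain prefixes and suffixes that need no regular-expression features, Pandas also offers str.startswith and str.endswith. The sketch below shows them as an alternative; note that these calls match literal text and are case-sensitive here, so they are not a drop-in replacement for the case=False filters above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Jessica', 'Jack']})

# Literal, case-sensitive prefix and suffix checks
starts_with_ja = df[df['Name'].str.startswith('Ja')]
ends_with_n = df[df['Name'].str.endswith('n')]

prefix_names = starts_with_ja['Name'].tolist()
suffix_names = ends_with_n['Name'].tolist()
```

Keeping the two results in separate variables means neither overwrites the other.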

Summary

In conclusion, the str.contains method is Pandas' counterpart to the SQL "like" condition and a practical tool for filtering data based on patterns in string columns.

It provides a simple and flexible way to extract specific rows from a DataFrame based on the contents of string columns: match a pattern anywhere in the value, combine alternatives with |, or anchor the match to the start or end of the string with ^ and $.
