Pandas str.contains Method

Pandas is a powerful library for data analysis and manipulation in Python.

One of the most important operations when working with a dataset is filtering data based on certain conditions.

In this blog post, we will look at the str.contains method in Pandas, which filters rows based on patterns in string columns.

str.contains Method

The str.contains method tests whether each value in a string column contains a given pattern.

It plays a role similar to the LIKE operator in SQL-style filtering, except that the pattern is treated as a regular expression by default.

The method returns a Boolean mask indicating, for each element of the string column, whether the pattern was found anywhere in that string.

This Boolean mask can then be used to index into the DataFrame to extract the rows of interest.

Let's start by creating a sample DataFrame to work with:

import pandas as pd

data = {
    'Name': ['John', 'Jane', 'Jim', 'Jessica', 'Jack'],
    'City': ['New York', 'San Francisco', 'London', 'Paris', 'Berlin']
}

df = pd.DataFrame(data)

To filter data based on a pattern in the 'Name' column, we can use the str.contains method and pass in the pattern we want to match:

pattern = 'J'
filtered_df = df[df['Name'].str.contains(pattern, case=False)]

In this example, we set case=False so that the match ignores case: both 'J' and 'j' count as matches.

The resulting DataFrame, filtered_df, contains every row whose 'Name' value includes the letter 'J'. In the sample data that is all five rows, since every name begins with 'J'.
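Because every sample name begins with 'J', the filter above does not show what case=False changes. The following is a minimal sketch, assuming the same sample data, that searches for a lowercase 'j' with and without case=False and prints the resulting masks. The variable names mask_sensitive and mask_insensitive are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Jim', 'Jessica', 'Jack'],
    'City': ['New York', 'San Francisco', 'London', 'Paris', 'Berlin'],
})

# Default matching is case-sensitive: no name contains a lowercase 'j'.
mask_sensitive = df['Name'].str.contains('j')

# case=False lets 'j' match the capital 'J' in every name.
mask_insensitive = df['Name'].str.contains('j', case=False)

# Each mask is a Boolean Series aligned with df's index.
print(mask_sensitive.tolist())
print(mask_insensitive.tolist())

# Indexing with a mask keeps only the rows where it is True.
print(df[mask_insensitive])
```

Printing the mask before indexing with it is a quick way to check that a pattern matches the rows you expect.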

Match Multiple Patterns

It's also possible to match multiple patterns by separating them with |, the regular expression "or" operator:

pattern = 'J|i'
filtered_df = df[df['Name'].str.contains(pattern, case=False)]

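This will return all rows where the 'Name' column contains either the letter 'J' or the letter 'i', in upper or lower case. With the sample data every row still matches, because every name contains 'J'.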
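When the alternatives come from a list rather than a hand-written string, the pattern can be assembled in code. The sketch below assumes a hypothetical list of search terms and escapes each one with re.escape, so that characters such as '.' or '$' in a term are matched literally instead of being read as regex syntax.

```python
import re

import pandas as pd

names = pd.Series(['John', 'Jane', 'Jim', 'Jessica', 'Jack'], name='Name')

# Hypothetical search terms, chosen only for illustration.
terms = ['Jo', 'ss']

# re.escape neutralizes regex metacharacters; '|' joins the terms as alternatives.
pattern = '|'.join(re.escape(term) for term in terms)

# Case-sensitive by default: keep names containing 'Jo' or 'ss'.
matches = names[names.str.contains(pattern)]
print(matches.tolist())
```

For a single literal substring, passing regex=False to str.contains is a simpler way to switch off regex interpretation.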

Match Patterns At Start And End

You can also match patterns at the start or end of a string using the ^ and $ symbols respectively:

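# Names that start with 'J' (every name in the sample data)
start_pattern = '^J'
starts_with_j = df[df['Name'].str.contains(start_pattern, case=False)]

# Names that end with 'n' (only 'John' in the sample data)
end_pattern = 'n$'
ends_with_n = df[df['Name'].str.contains(end_pattern, case=False)]

In this example, the first filter returns all rows where the 'Name' column starts with the letter 'J', while the second returns all rows where it ends with the letter 'n'. Each result is stored under its own name so that the second filter does not overwrite the first.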
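For plain prefixes and suffixes, Pandas also offers str.startswith and str.endswith, which take literal text rather than a regular expression. The following sketch, assuming the same sample names, compares them with the anchored patterns. Note that these two methods are case-sensitive, and the prefix 'Ji' is chosen here only so that the result differs from the full list.

```python
import pandas as pd

names = pd.Series(['John', 'Jane', 'Jim', 'Jessica', 'Jack'], name='Name')

# Literal prefix and suffix checks; no regular expression is involved.
starts_with_ji = names.str.startswith('Ji')
ends_with_n = names.str.endswith('n')

# The same checks written as anchored regex patterns for str.contains.
starts_with_ji_regex = names.str.contains('^Ji')
ends_with_n_regex = names.str.contains('n$')

print(names[starts_with_ji].tolist())
print(names[ends_with_n].tolist())

# Both approaches select the same rows for this data.
print(starts_with_ji.equals(starts_with_ji_regex))
print(ends_with_n.equals(ends_with_n_regex))
```

The literal methods avoid any need to escape special characters, while the anchored patterns remain the better fit when the prefix or suffix is itself a pattern.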

Summary

The str.contains method filters data based on patterns in string columns in Pandas.

It returns a Boolean mask that can be used to extract the rows of a DataFrame whose string values contain a substring, match one of several alternatives, or start or end with a given pattern.
