Pandas JSON Normalizing
- Author: Brent
Normalizing JSON Data with Pandas: A Step-by-Step Guide
When working with data in JSON format, it is often necessary to normalize the data to make it more organized and accessible.
Normalizing data involves converting nested structures into a flat table with a single row for each record.
In this guide, we will explore how to normalize JSON data using the json_normalize function in Pandas.
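As a minimal sketch of what "flat" means here, the snippet below normalizes a single record that contains one nested dictionary. The record and its address field are hypothetical and are not part of the student data used in the rest of this guide. Nested keys end up as dotted column names.

```python
import pandas as pd

# Hypothetical record (not from this guide): one nested "address" dictionary.
record = {"name": "John", "address": {"city": "Austin", "zip": "78701"}}

# json_normalize accepts a single dictionary and returns a one-row data frame.
flat = pd.json_normalize(record)

# Nested keys are joined to their parent key with a dot.
print(flat.columns.tolist())
```

The separator can be changed with the sep argument if dotted names are inconvenient.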
Importing the Data
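The first step in normalizing JSON data with Pandas is to load the data into a Pandas data frame.
We can use the read_json function for this. It expects a file path or a file-like object, so an in-memory JSON string is wrapped in StringIO first.
For example:
import json
from io import StringIO
import pandas as pd
data = {'students': [{'name': 'John', 'age': 25, 'courses': ['Math', 'Science']},
                     {'name': 'Jane', 'age': 28, 'courses': ['History', 'Literature']}]}
# Serialize the dictionary to a JSON string, then read it back as a data frame
df = pd.read_json(StringIO(json.dumps(data)))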
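To see why a further normalizing step is needed, it can help to inspect what read_json returns for this input. The sketch below assumes the same sample data as above and only prints a few properties of the result.

```python
import json
from io import StringIO

import pandas as pd

data = {'students': [{'name': 'John', 'age': 25, 'courses': ['Math', 'Science']},
                     {'name': 'Jane', 'age': 28, 'courses': ['History', 'Literature']}]}

df = pd.read_json(StringIO(json.dumps(data)))

# One column named "students", one row per student.
print(df.shape)
print(df.columns.tolist())

# Each cell still holds a nested dictionary, not separate columns.
print(type(df.loc[0, 'students']))
```

The name, age, and courses fields are still packed inside each cell at this point.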
Normalizing the Data
Next, we can use the json_normalize function to normalize the data.
This function takes a dictionary or a list of dictionaries (not a data frame) and returns a flat data frame, so we pass it the students column converted to a list.
For example:
import json
from io import StringIO
import pandas as pd
data = {'students': [{'name': 'John', 'age': 25, 'courses': ['Math', 'Science']},
                     {'name': 'Jane', 'age': 28, 'courses': ['History', 'Literature']}]}
df = pd.read_json(StringIO(json.dumps(data)))
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import
normalized_df = pd.json_normalize(df['students'].tolist())
This will produce a data frame with the following structure:
   name  age                courses
0  John   25        [Math, Science]
1  Jane   28  [History, Literature]
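When the data is already a Python dictionary, as it is in this guide, the read_json round trip is optional. A minimal sketch of the same step that passes the list of student records straight to json_normalize:

```python
import pandas as pd

data = {'students': [{'name': 'John', 'age': 25, 'courses': ['Math', 'Science']},
                     {'name': 'Jane', 'age': 28, 'courses': ['History', 'Literature']}]}

# json_normalize accepts a list of dictionaries directly.
normalized_df = pd.json_normalize(data['students'])

print(normalized_df.columns.tolist())

# Lists are left intact: each "courses" cell is still a Python list.
print(normalized_df.loc[0, 'courses'])
```

Note that only nested dictionaries are flattened into columns. Nested lists such as courses stay as list values unless a record path is given.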
Specifying the Path to the JSON Data
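The json_normalize function can also be used to normalize specific nested elements of the JSON data.
To do this, we pass two extra arguments: record_path, the key (or list of keys) leading to the nested list, and meta, a list of parent fields to repeat on each row.
Because each course is a plain string rather than a dictionary, the resulting column is named 0, so we rename it to courses.
For example:
import json
from io import StringIO
import pandas as pd
data = {'students': [{'name': 'John', 'age': 25, 'courses': ['Math', 'Science']},
                     {'name': 'Jane', 'age': 28, 'courses': ['History', 'Literature']}]}
df = pd.read_json(StringIO(json.dumps(data)))
# One output row per course; name and age are copied from the parent record
normalized_df = pd.json_normalize(df['students'].tolist(), record_path='courses', meta=['name', 'age'])
# The unnamed column of course strings is labeled 0 by default
normalized_df = normalized_df.rename(columns={0: 'courses'})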
This will produce a data frame with the following structure:
      courses  name age
0        Math  John  25
1     Science  John  25
2     History  Jane  28
3  Literature  Jane  28
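The path can also be a list of keys when the records sit more than one level deep. The sketch below uses hypothetical data (a school containing classes, each with its own students) that is not part of this guide's example. It assumes the record_path and meta parameters of pd.json_normalize, where a meta entry may itself be a list of keys.

```python
import pandas as pd

# Hypothetical data: students are nested two levels down, under classes.
school = {
    'school': 'Springfield High',
    'classes': [
        {'room': 'A1', 'students': [{'name': 'John', 'age': 25},
                                    {'name': 'Jane', 'age': 28}]},
        {'room': 'B2', 'students': [{'name': 'Ann', 'age': 30}]},
    ],
}

nested_df = pd.json_normalize(
    school,
    record_path=['classes', 'students'],   # walk classes, then students
    meta=['school', ['classes', 'room']],  # fields to copy onto each row
)

print(nested_df)
```

Each student becomes one row, and the meta fields from the enclosing levels are repeated alongside it. A nested meta path is reported as a dotted column name.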
Summary
Normalizing JSON data with Pandas comes down to the json_normalize function.
Whether you need to flatten whole records or expand specific nested elements into their own rows, this function handles both.