Output: {"Name":{"0":"Alice","1":"Bob","2":"Charlie"},"Date":{"0":"2021-01-01T00:00:00.000Z","1":"2022-02-01T00:00:00.000Z","2":"2023-03-01T00:00:00.000Z"},"Age":{"0":25,"1":30,"2":35}}
Pandas Json
JSON (**J**ava**S**cript **O**bject **N**otation, JavaScript Object Notation) is a syntax for storing and exchanging text information, similar to XML.
JSON is smaller, faster, and easier to parse than XML. For more JSON content, refer to the (#).
Pandas provides powerful methods for handling JSON-formatted data, supporting reading data from JSON files or strings and converting it to a DataFrame, as well as converting a DataFrame back to JSON format.
| **Operation** | **Method** | **Description** |
| --- | --- | --- |
| Read data from JSON file/string | `pd.read_json()` | Reads and loads data from JSON into a DataFrame |
| Convert DataFrame to JSON | `DataFrame.to_json()` | Converts a DataFrame to JSON-formatted data, allowing specification of the structure |
| Support for JSON structure | `orient` parameter | Supports multiple structure types, such as `split`, `records`, `columns` |
Pandas can handle JSON data conveniently. This tutorial uses [sites.json](https://static.jyshare.com/download/sites.json) as an example, with the following content:
## Example
[{"id": "A001", "name": "", "url": "www.", "likes": 61}, {"id": "A002", "name": "Google", "url": "www.google.com", "likes": 124}, {"id": "A003", "name": "Taobao", "url": "www.taobao.com", "likes": 45}]
* * *
## Load Data from JSON File/String
### pd.read_json() - Read JSON Data
read_json() is used to read and load data from JSON format into a DataFrame. It supports loading data from a JSON file, a JSON string, or a JSON URL.
**Syntax:**
import pandas as pd df = pd.read_json( path_or_buffer, # JSON file path, JSON string, or URL orient=None, # Structure of the JSON data, default is 'columns' dtype=None, # Force data types for columns convert_axes=True, # Whether to convert axes to appropriate data types convert_dates=True, # Whether to parse dates as date types keep_default_na=True # Whether to keep default missing value markers)
**Parameter Description:**
| **Parameter** | **Description** | **Default Value** |
| --- | --- | --- |
| `path_or_buffer` | Path to the JSON file, JSON string, or URL | Required parameter |
| `orient` | Defines the format of the JSON data. Common values are `split`, `records`, `index`, `columns`, `values`. | `None` (automatically inferred from the file) |
| `dtype` | Force data types for columns | `None` |
| `convert_axes` | Whether to convert axes to appropriate data types | `True` |
| `convert_dates` | Whether to parse dates as date types | `True` |
| `keep_default_na` | Whether to keep default missing value markers (e.g., `NaN`) | `True` |
**Common orient parameter options:**
| **orient Value** | **JSON Format Example** | **Description** |
| --- | --- | --- |
| `split` | `{"index":["a","b"],"columns":["A","B"],"data":[[1,2],[3,4]]}` | Uses keys `index`, `columns`, and `data` for structure |
| `records` | `[{"A":1,"B":2},{"A":3,"B":4}]` | Each record is a dictionary representing a row of data |
| `index` | `{"a":{"A":1,"B":2},"b":{"A":3,"B":4}}` | Uses index as keys, with values as dictionaries |
| `columns` | `{"A":{"a":1,"b":3},"B":{"a":2,"b":4}}` | Uses column names as keys, with values as dictionaries |
| `values` | `[[1,2],[3,4]]` | Returns only the data, without index or column names |
Load data from a JSON file:
## Example
import pandas as pd
df = pd.read_json('sites.json')
print(df.to_string())
to_string() is used to return data in DataFrame type. We can also directly process JSON strings.
## Example
import pandas as pd
data =[
{
"id": "A001",
"name": "",
"url": "www.",
"likes": 61
},
{
"id": "A002",
"name": "Google",
"url": "www.google.com",
"likes": 124
},
{
"id": "A003",
"name": "Taobao",
"url": "www.taobao.com",
"likes": 45
}
]
df = pd.DataFrame(data)
print(df)
The output of the above example is:
id name url likes 0 A001 www. 611 A002 Google www.google.com 1242 A003 Taobao www.taobao.com 45
JSON objects have the same format as Python dictionaries, so we can directly convert a Python dictionary to a DataFrame:
## Example
import pandas as pd
# JSON in dictionary format
s ={
"col1":{"row1":1,"row2":2,"row3":3},
"col2":{"row1":"x","row2":"y","row3":"z"}
}
# Read JSON and convert to DataFrame
df = pd.DataFrame(s)
print(df)
The output of the above example is:
col1 col2 row1 1 x row2 2 y row3 3 z
Read JSON data from a URL:
## Example
import pandas as pd
URL ='https://static.jyshare.com/download/sites.json'
df = pd.read_json(URL)
print(df)
The output of the above example is:
id name url likes 0 A001 www. 611 A002 Google www.google.com 1242 A003 Taobao www.taobao.com 45
Load data from a JSON string:
## Example
import pandas as pd
# JSON string
json_data ='''
[
{"Name": "Alice", "Age": 25, "City": "New York"},
{"Name": "Bob", "Age": 30, "City": "Los Angeles"},
{"Name": "Charlie", "Age": 35, "City": "Chicago"}
]
'''
# Read data from JSON string
df = pd.read_json(json_data)
print(df)
Output:
Name Age City0 Alice 25 New York1 Bob 30 Los Angeles2 Charlie 35 Chicago
Read from JSON data (specifying orient as records):
## Example
import pandas as pd
# JSON data
json_data ='''
[
{"Name": "Alice", "Age": 25, "City": "New York"},
{"Name": "Bob", "Age": 30, "City": "Los Angeles"},
{"Name": "Charlie", "Age": 35, "City": "Chicago"}
]
'''
# Read data from JSON string, specifying orient='records'
df = pd.read_json(json_data, orient='records')
print(df)
Output:
Name Age City0 Alice 25 New York1 Bob 30 Los Angeles2 Charlie 35 Chicago
### Nested JSON Data
Suppose there is a nested JSON data file **nested_list.json**:
## nested_list.json File Content
{
"school_name":"ABC primary school",
"class":"Year 1",
"students":[
{
"id":"A001",
"name":"Tom",
"math":60,
"physics":66,
"chemistry":61
},
{
"id":"A002",
"name":"James",
"math":89,
"physics":76,
"chemistry":51
},
{
"id":"A003",
"name":"Jenny",
"math":79,
"physics":90,
"chemistry":78
}]
}
Use the following code to format the complete content:
## Example
import pandas as pd
df = pd.read_json('nested_list.json')
print(df)
The output of the above example is:
school_name class students 0 ABC primary school Year 1 {'id': 'A001', 'name': 'Tom', 'math': 60, 'phy... 1 ABC primary school Year 1 {'id': 'A002', 'name': 'James', 'math': 89, 'p...2 ABC primary school Year 1 {'id': 'A003', 'name': 'Jenny', 'math': 79, 'p...
At this point, we need to use the json_normalize() method to completely parse the nested data:
## Example
import pandas as pd
import json
# Load data using Python JSON module
with open('nested_list.json','r')as f:
data = json.loads(f.read())
# Flatten the data
df_nested_list = pd.json_normalize(data, record_path =['students'])
print(df_nested_list)
The output of the above example is:
id name math physics chemistry 0 A001 Tom 60 66 611 A002 James 89 76 512 A003 Jenny 79 90 78
data = json.loads(f.read()) loads data using the Python JSON module.
json_normalize() uses the parameter **record_path** set to **['students']** to flatten the nested JSON data **students**.
The result does not yet include the school_name and class elements. If you need to display them, you can use the meta parameter to show this metadata:
## Example
import pandas as pd
import json
# Load data using Python JSON module
with open('nested_list.json','r')as f:
data = json.loads(f.read())
# Flatten the data
df_nested_list = pd.json_normalize(
data,
record_path =['students'],
meta=['school_name','class']
)
print(df_nested_list)
The output of the above example is:
id name math physics chemistry school_name class0 A001 Tom 60 66 61 ABC primary school Year 11 A002 James 89 76 51 ABC primary school Year 12 A003 Jenny 79 90 78 ABC primary school Year 1
Next, let's try reading more complex JSON data, which contains nested lists and dictionaries. The data file **nested_mix.json** is as follows:
## nested_mix.json File Content
{
"school_name":"local primary school",
"class":"Year 1",
"info":{
"president":"John Kasich",
"address":"ABC road, London, UK",
"contacts":{
"email":"admin@e.com",
"tel":"123456789"
}
},
"students":[
{
"id":"A001",
"name":"Tom",
"math":60,
"physics":66,
"chemistry":61
},
{
"id":"A002",
"name":"James",
"math":89,
"physics":76,
"chemistry":51
},
{
"id":"A003",
"name":"Jenny",
"math":79,
"physics":90,
"chemistry":78
}]
}
Convert nested_mix.json file to DataFrame:
## Example
import pandas as pd
import json
# Load data using Python JSON module
with open('nested_mix.json','r')as f:
data = json.loads(f.read())
df = pd.json_normalize(
data,
record_path =['students'],
meta=[
'class',
['info','president'],
['info','contacts','tel']
]
)
print(df)
The output of the above example is:
id name math physics chemistry class info.president info.contacts.tel 0 A001 Tom 60 66 61 Year 1 John Kasich 1234567891 A002 James 89 76 51 Year 1 John Kasich 1234567892 A003 Jenny 79 90 78 Year 1 John Kasich 123456789
### Read a Set of Data from Nested Data
Below is the example file **nested_deep.json**. We will only read the **math** field from the nested data:
## nested_deep.json File Content
{
"school_name":"local primary school",
"class":"Year 1",
"students":[
{
"id":"A001",
"name":"Tom",
"grade":{
"math":60,
"physics":66,
"chemistry":61
}
},
{
"id":"A002",
"name":"James",
"grade":{
"math":89,
"physics":76,
"chemistry":51
}
},
{
"id":"A003",
"name":"Jenny",
"grade":{
"math":79,
"physics":90,
"chemistry":78
}
}]
}
Here we need to use the **glom** module to handle data nesting. The **glom** module allows us to use . to access attributes of nested objects.
For the first use, we need to install glom:
pip3 install glom
## Example
import pandas as pd
from glom import glom
df = pd.read_json('nested_deep.json')
data = df['students'].apply(lambda row: glom(row,'grade.math'))
print(data)
The output of the above example is:
0 601 892 79Name: students, dtype: int64
* * *
## Convert DataFrame to JSON
### DataFrame.to_json() - Convert DataFrame to JSON Data
The to_json() method is used to convert a DataFrame to JSON-formatted data, allowing specification of the JSON structure.
Syntax:
df.to_json( path_or_buffer=None, # Output file path or file object. If None, returns a JSON string orient=None, # JSON format, supports 'split', 'records', 'index', 'columns', 'values' date_format=None, # Date format, supports 'epoch', 'iso' default_handler=None, # Custom handler for non-standard types lines=False, # Whether to output each row as a separate line (applicable to 'records' or 'split') encoding='utf-8' # Encoding format)
**Parameter Description:**
| **Parameter** | **Description** | **Default Value** |
| --- | --- | --- |
| `path_or_buffer` | Output file path or file object. If `None`, returns a JSON string | `None` |
| `orient` | Specifies the JSON format structure. Supports `split`, `records`, `index`, `columns`, `values` | `None` (default is `columns`) |
| `date_format` | Date format, supports `'epoch'` or `'iso'` format | `None` |
| `default_handler` | Custom handler for non-standard types (e.g., `datetime`) | `None` |
| `lines` | Whether to output each row as a separate line (applicable to `records` or `split`) | `False` |
| `encoding` | Encoding format for the output file | `'utf-8'` |
## Example
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
'Name': ['Alice','Bob','Charlie'],
'Age': [25,30,35],
'City': ['New York','Los Angeles','Chicago']
})
# Convert DataFrame to JSON string
json_str = df.to_json()
print(json_str)
Convert DataFrame to JSON file (specifying orient='records'):
## Example
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
'Name': ['Alice','Bob','Charlie'],
'Age': [25,30,35],
'City': ['New York','Los Angeles','Chicago']
})
# Convert DataFrame to JSON file, specifying orient='records'
df.to_json('data.json', orient='records', lines=True)
# Output the generated file content:
# [
# {"Name":"Alice","Age":25,"City":"New York"},
# {"Name":"Bob","Age":30,"City":"Los Angeles"},
# {"Name":"Charlie","Age":35,"City":"Chicago"}
# ]
Convert DataFrame to JSON and specify date format:
## Example
import pandas as pd
# Create DataFrame with date data
df = pd.DataFrame({
'Name': ['Alice','Bob','Charlie'],
'Date': pd.to_datetime(['2021-01-01','2022-02-01','2023-03-01']),
'Age': [25,30,35]
})
# Convert DataFrame to JSON, specifying date format as 'iso'
json_str = df.to_json(date_format='iso')
print(json_str)
YouTip