Pandas Df Dropna

`df.dropna()` is a Pandas method that removes rows or columns containing missing values.

In data analysis, missing values (`NaN`, `None`, or `NaT` in pandas) are a common problem. The `dropna()` method provides a simple way to clean data. You can delete rows or columns that contain missing values, or filter data based on the number of non-missing values each row or column holds.

* * *

## Basic Syntax and Parameters

`dropna()` is a method of DataFrame, called via the dot operator `.`.

### Syntax Format

```python
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```

### Parameter Description

| Parameter | Type | Required | Description | Default Value |
| --- | --- | --- | --- | --- |
| axis | int or str | Optional | Specifies whether to delete rows or columns. `0` or `'index'` deletes rows containing missing values; `1` or `'columns'` deletes columns containing missing values. | 0 |
| how | str | Optional | Specifies the deletion condition. `'any'` deletes a row (or column) if it has any missing value; `'all'` deletes it only if all of its values are missing. | 'any' |
| thresh | int | Optional | Sets the minimum number of non-missing values. If a row (or column) has fewer than `thresh` non-missing values, it is deleted. | None |
| subset | array-like | Optional | Specifies which labels to check for missing values. If `axis=0`, `subset` lists column names; if `axis=1`, `subset` lists row index labels. | None |
| inplace | bool | Optional | If `True`, modifies the original DataFrame directly and returns `None`; if `False`, returns a new DataFrame and leaves the original data unchanged. | False |

### Return Value Description

* Returns a new DataFrame (if `inplace=False`), or `None` (if `inplace=True`).
* The resulting DataFrame omits the rows or columns that met the deletion condition set by `how` or `thresh`.
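To make the parameter table concrete, here is a minimal sketch that applies each parameter to one small DataFrame. The data and variable names are invented for illustration and do not come from the original tutorial. Arguments are passed by keyword, which is the safest form across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [np.nan, np.nan, 6.0, np.nan],
    "C": [7.0, 8.0, 9.0, np.nan],
})

# how='any' (default): drop every row with at least one missing value
rows_any = df.dropna()                    # keeps row 2 only

# how='all': drop a row only when every value in it is missing
rows_all = df.dropna(how="all")           # drops row 3 only

# thresh=2: keep rows that have at least 2 non-missing values
rows_thresh = df.dropna(thresh=2)         # keeps rows 0 and 2

# subset: only look at column "A" when deciding which rows to drop
rows_subset = df.dropna(subset=["A"])     # keeps rows 0 and 2

# axis=1 with thresh: keep columns with at least 3 non-missing values
cols_thresh = df.dropna(axis=1, thresh=3)     # keeps column "C"

# axis=1 with subset: check only row label 0 when deciding which columns to drop
cols_subset = df.dropna(axis=1, subset=[0])   # keeps columns "A" and "C"

# inplace=True: the call returns None and modifies the object it is called on
df_copy = df.copy()
result = df_copy.dropna(inplace=True)

print(rows_thresh)
print(cols_subset)
print(result)
```

All calls except the last return new objects, so `df` itself still has its original four rows and three columns at the end of the sketch.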
* * *

## Examples

Let's master the usage of `dropna()` through a series of examples.

### Example 1: Deleting Rows with Missing Values

This is the most common usage: deleting any row that contains missing values.

```python
import pandas as pd
import numpy as np

# Create a DataFrame containing missing values
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Qian Qi'],
    'Age': [25, 30, np.nan, 35, 28],            # Wang Wu's age is missing
    'Salary': [5000, 6000, 5500, np.nan, 7000],  # Zhao Liu's salary is missing
    'Department': ['Tech', 'Marketing', 'Tech', 'Marketing', 'Tech']
}
df = pd.DataFrame(data)

print("Original Data:")
print(df)
print("=" * 50)

# Delete rows containing missing values (default behavior)
df_cleaned = df.dropna()
print("Data after deleting missing values:")
print(df_cleaned)
```

**Expected Output:**

```
Original Data:
        Name   Age  Salary  Department
0  Zhang San  25.0  5000.0        Tech
1      Li Si  30.0  6000.0   Marketing
2    Wang Wu   NaN  5500.0        Tech
3   Zhao Liu  35.0     NaN   Marketing
4    Qian Qi  28.0  7000.0        Tech
==================================================
Data after deleting missing values:
        Name   Age  Salary  Department
0  Zhang San  25.0  5000.0        Tech
1      Li Si  30.0  6000.0   Marketing
4    Qian Qi  28.0  7000.0        Tech
```

**Code Analysis:**

1. We created a DataFrame containing missing values, where Wang Wu's age is `NaN` and Zhao Liu's salary is `NaN`.
2. Using t
