df.reset_index() is a pandas DataFrame method that resets the row index.
During data processing, the index may become non-contiguous or stop meeting your needs. reset_index() can reset it to the default integer index (0, 1, 2, ...) and, optionally, convert the current index into a column. This is useful after cleaning, sorting, or filtering data.
Basic Syntax and Parameters
reset_index() is a DataFrame method, called with dot notation (for example, df.reset_index()).
Syntax Format
    DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

Parameter Description

| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| level | int, str, tuple or list | Optional | Index level(s) to reset. For a MultiIndex, you can reset only certain levels. By default, all levels are reset. | None |
| drop | bool | Optional | If True, discard the original index without converting it to a column; if False, keep the original index as a new column. | False |
| inplace | bool | Optional | If True, modify the original DataFrame directly and return None; if False, return a new DataFrame and leave the original unchanged. | False |
| col_level | int or str | Optional | If the columns are a MultiIndex, the column level into which the index labels are inserted. | 0 |
| col_fill | str | Optional | If the columns are a MultiIndex, the value used to fill the other column levels when the original index is converted to a column. | '' |
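
The col_level and col_fill parameters only matter when the columns are a MultiIndex, which none of the examples below cover. The following is a minimal sketch with made-up data (the subject names and the 'Info' fill value are assumptions chosen for illustration). It shows where the former index label ends up in the column labels.

```python
import pandas as pd

# Hypothetical data: two-level column labels (subject, metric)
columns = pd.MultiIndex.from_tuples([("Math", "Score"), ("English", "Score")])
df = pd.DataFrame(
    [[85, 90], [92, 88]],
    index=pd.Index(["Zhang San", "Li Si"], name="Name"),
    columns=columns,
)

# Defaults: the index name goes into column level 0, the other level is filled with ''
default_reset = df.reset_index()

# Put the index name into level 1 and fill level 0 with 'Info'
custom_reset = df.reset_index(col_level=1, col_fill="Info")

print(default_reset.columns.tolist()[0])  # ('Name', '')
print(custom_reset.columns.tolist()[0])   # ('Info', 'Name')
```

With single-level columns, both parameters can be left at their defaults.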
Return Value Description
- Returns a new DataFrame (if inplace=False), or None (if inplace=True).
- The resulting DataFrame has its index reset, and the original index (if drop=False) is retained as a new column.
Examples
Let's go through a series of examples to fully master the usage of reset_index().
Example 1: Basic Usage - Reset Index
The most common use case is resetting the index to the default integer index.

Example

    import pandas as pd

    # Create a DataFrame
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu'],
        'Score': [85, 92, 78]
    }
    df = pd.DataFrame(data)
    print("Original Data (Default Index):")
    print(df)
    print("=" * 50)

    # Sort data, which leaves the index out of order
    df_sorted = df.sort_values(by='Score', ascending=False)
    print("Sorted Data (Discontinuous Index):")
    print(df_sorted)
    print("=" * 50)

    # Reset index
    df_reset = df_sorted.reset_index()
    print("After Resetting Index:")
    print(df_reset)

Expected Output:

    Original Data (Default Index):
            Name  Score
    0  Zhang San     85
    1      Li Si     92
    2    Wang Wu     78
    ==================================================
    Sorted Data (Discontinuous Index):
            Name  Score
    1      Li Si     92
    0  Zhang San     85
    2    Wang Wu     78
    ==================================================
    After Resetting Index:
       index       Name  Score
    0      1      Li Si     92
    1      0  Zhang San     85
    2      2    Wang Wu     78

Code Explanation:

- After sorting, the index becomes 1, 0, 2 instead of the continuous 0, 1, 2.
- Using reset_index(), the index is reset to 0, 1, 2.
- The original index is preserved as a new column named "index", inserted as the first column.
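
The column is called "index" only because the original index has no name. As a small sketch (the names student_id and original_row are hypothetical, chosen for illustration), naming the index first or renaming the column afterwards gives a more descriptive label:

```python
import pandas as pd

df = pd.DataFrame({"Score": [92, 85]}, index=[1, 0])

# Unnamed index: the new column is called "index"
unnamed = df.reset_index()

# Named index: the new column takes the index name
named = df.rename_axis("student_id").reset_index()

# Alternatively, rename the column after resetting
renamed = df.reset_index().rename(columns={"index": "original_row"})

print(list(unnamed.columns))  # ['index', 'Score']
print(list(named.columns))    # ['student_id', 'Score']
print(list(renamed.columns))  # ['original_row', 'Score']
```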
Example 2: Using drop Parameter to Discard Original Index
If you don't need to retain the original index, you can use the drop=True parameter.

Example

    import pandas as pd

    # Create a DataFrame
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu'],
        'Score': [85, 92, 78]
    }
    df = pd.DataFrame(data)

    # Sort data
    df_sorted = df.sort_values(by='Score', ascending=False)
    print("Sorted Data:")
    print(df_sorted)
    print("=" * 50)

    # Reset index, discarding original index
    df_reset = df_sorted.reset_index(drop=True)
    print("After Resetting Index (Discarding Original Index):")
    print(df_reset)

Expected Output:

    Sorted Data:
            Name  Score
    1      Li Si     92
    0  Zhang San     85
    2    Wang Wu     78
    ==================================================
    After Resetting Index (Discarding Original Index):
            Name  Score
    0      Li Si     92
    1  Zhang San     85
    2    Wang Wu     78

Code Explanation:

- With drop=True, the original index is discarded and not kept as a new column.
- This is useful when you don't need to preserve original index information.
Example 3: Reset Index After Filtering Data
After filtering data, the index might be discontinuous. Using reset_index() tidies it up.

Example

    import pandas as pd

    # Create a DataFrame with all student scores
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Qian Qi'],
        'Score': [85, 92, 78, 90, 88]
    }
    df = pd.DataFrame(data)
    print("Original Data:")
    print(df)
    print("=" * 50)

    # Filter students with scores greater than 85
    df_filtered = df[df['Score'] > 85]
    print("Filtered Data (Discontinuous Index):")
    print(df_filtered)
    print("=" * 50)

    # Reset index
    df_reset = df_filtered.reset_index(drop=True)
    print("After Resetting Index:")
    print(df_reset)

Expected Output:

    Original Data:
            Name  Score
    0  Zhang San     85
    1      Li Si     92
    2    Wang Wu     78
    3   Zhao Liu     90
    4    Qian Qi     88
    ==================================================
    Filtered Data (Discontinuous Index):
           Name  Score
    1     Li Si     92
    3  Zhao Liu     90
    4   Qian Qi     88
    ==================================================
    After Resetting Index:
           Name  Score
    0     Li Si     92
    1  Zhao Liu     90
    2   Qian Qi     88

Code Explanation:

- After filtering, the retained indices are 1, 3, 4, not the continuous 0, 1, 2.
- Using reset_index(drop=True) gives continuous indices, making subsequent processing easier.
Example 4: Reset Index After Removing Duplicate Rows
After removing duplicate rows, you can also use reset_index() to tidy up the index.

Example

    import pandas as pd

    # Create a DataFrame with duplicate rows
    data = {
        'Name': ['Zhang San', 'Li Si', 'Zhang San', 'Wang Wu'],
        'City': ['Beijing', 'Shanghai', 'Beijing', 'Guangzhou']
    }
    df = pd.DataFrame(data)
    print("Original Data (With Duplicate Rows):")
    print(df)
    print("=" * 50)

    # Remove duplicate rows
    df_unique = df.drop_duplicates()
    print("After Removing Duplicates (Discontinuous Index):")
    print(df_unique)
    print("=" * 50)

    # Reset index
    df_reset = df_unique.reset_index(drop=True)
    print("After Resetting Index:")
    print(df_reset)

Expected Output:

    Original Data (With Duplicate Rows):
            Name       City
    0  Zhang San    Beijing
    1      Li Si   Shanghai
    2  Zhang San    Beijing
    3    Wang Wu  Guangzhou
    ==================================================
    After Removing Duplicates (Discontinuous Index):
            Name       City
    0  Zhang San    Beijing
    1      Li Si   Shanghai
    3    Wang Wu  Guangzhou
    ==================================================
    After Resetting Index:
            Name       City
    0  Zhang San    Beijing
    1      Li Si   Shanghai
    2    Wang Wu  Guangzhou

Code Explanation:

- After removing duplicates, the retained indices are 0, 1, 3, not the continuous 0, 1, 2.
- Resetting the index makes the data cleaner.
Example 5: Reset Index After Handling Missing Values
After deleting rows with missing values, you can also reset the index.

Example

    import pandas as pd
    import numpy as np

    # Create a DataFrame with missing values
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
        'Score': [85, np.nan, 92, 78]
    }
    df = pd.DataFrame(data)
    print("Original Data:")
    print(df)
    print("=" * 50)

    # Delete rows with missing values
    df_cleaned = df.dropna()
    print("After Deleting Missing Values (Discontinuous Index):")
    print(df_cleaned)
    print("=" * 50)

    # Reset index
    df_reset = df_cleaned.reset_index(drop=True)
    print("After Resetting Index:")
    print(df_reset)

Expected Output:

    Original Data:
            Name  Score
    0  Zhang San   85.0
    1      Li Si    NaN
    2    Wang Wu   92.0
    3   Zhao Liu   78.0
    ==================================================
    After Deleting Missing Values (Discontinuous Index):
            Name  Score
    0  Zhang San   85.0
    2    Wang Wu   92.0
    3   Zhao Liu   78.0
    ==================================================
    After Resetting Index:
            Name  Score
    0  Zhang San   85.0
    1    Wang Wu   92.0
    2   Zhao Liu   78.0

Code Explanation:

- After deleting the second row (Li Si's score is NaN), the retained indices are 0, 2, 3.
- Using reset_index(drop=True) gives continuous indices 0, 1, 2.
Example 6: Using inplace Parameter for In-Place Modification
Using inplace=True modifies the original DataFrame directly.

Example

    import pandas as pd

    # Create a DataFrame
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu'],
        'Score': [85, 92, 78]
    }
    df = pd.DataFrame(data)

    # Filter data, then sort by score so the index is out of order
    df_filtered = df[df['Score'] > 80].sort_values(by='Score', ascending=False)
    print("Filtered Data (Not Reset):")
    print(df_filtered)
    print(f"Data ID: {id(df_filtered)}")
    print("=" * 50)

    # Use inplace=True to reset index in place
    df_filtered.reset_index(drop=True, inplace=True)
    print("After Resetting with inplace=True:")
    print(df_filtered)
    print(f"Data ID: {id(df_filtered)}")

Expected Output:

    Filtered Data (Not Reset):
            Name  Score
    1      Li Si     92
    0  Zhang San     85
    Data ID: 140234567890
    ==================================================
    After Resetting with inplace=True:
            Name  Score
    0      Li Si     92
    1  Zhang San     85
    Data ID: 140234567890

Code Explanation:

- With inplace=True, the same DataFrame object is modified, so both id() values match. The number shown is illustrative and varies between runs.
- The call itself returns None, so do not assign its result back to the variable.
- This avoids creating a second DataFrame variable, although it does not necessarily reduce memory use.
Example 7: Resetting Multi-Level Index
For a MultiIndex, you can reset specific levels or all levels.

Example

    import pandas as pd

    # Create a DataFrame with MultiIndex
    df = pd.DataFrame({
        'Score': [85, 92, 78, 88, 90, 95]
    }, index=pd.MultiIndex.from_tuples([
        ('Class 1', 'Zhang San'), ('Class 1', 'Li Si'), ('Class 1', 'Wang Wu'),
        ('Class 2', 'Zhao Liu'), ('Class 2', 'Qian Qi'), ('Class 2', 'Sun Ba')
    ], names=['Class', 'Name']))
    print("Original Data (MultiIndex):")
    print(df)
    print("=" * 50)

    # Reset all index levels
    df_reset_all = df.reset_index()
    print("Reset All Index Levels:")
    print(df_reset_all)
    print("=" * 50)

    # Reset only the first index level (Class)
    df_reset_first = df.reset_index(level=0)
    print("Reset Only Class Index:")
    print(df_reset_first)

Expected Output:

    Original Data (MultiIndex):
                       Score
    Class   Name
    Class 1 Zhang San     85
            Li Si         92
            Wang Wu       78
    Class 2 Zhao Liu      88
            Qian Qi       90
            Sun Ba        95
    ==================================================
    Reset All Index Levels:
         Class       Name  Score
    0  Class 1  Zhang San     85
    1  Class 1      Li Si     92
    2  Class 1    Wang Wu     78
    3  Class 2   Zhao Liu     88
    4  Class 2    Qian Qi     90
    5  Class 2     Sun Ba     95
    ==================================================
    Reset Only Class Index:
                 Class  Score
    Name
    Zhang San  Class 1     85
    Li Si      Class 1     92
    Wang Wu    Class 1     78
    Zhao Liu   Class 2     88
    Qian Qi    Class 2     90
    Sun Ba     Class 2     95

Code Explanation:

- On a DataFrame with a MultiIndex, reset_index() converts the index levels into columns.
- The level parameter lets you choose which index level to reset. With level=0, Class becomes a column while Name remains the index.
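
The level parameter also accepts a level name, and it can be combined with drop=True to remove a level entirely rather than turning it into a column. A minimal sketch, reusing a shortened version of the data above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Score": [85, 92, 88]},
    index=pd.MultiIndex.from_tuples(
        [("Class 1", "Zhang San"), ("Class 1", "Li Si"), ("Class 2", "Zhao Liu")],
        names=["Class", "Name"],
    ),
)

# Select the level by name: Name becomes a column, Class stays as the index
by_name = df.reset_index(level="Name")
print(by_name.index.name, list(by_name.columns))  # Class ['Name', 'Score']

# Drop the Class level entirely: it is neither an index level nor a column afterwards
dropped = df.reset_index(level="Class", drop=True)
print(dropped.index.name, list(dropped.columns))  # Name ['Score']
```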
Notes
- reset_index() does not modify the original DataFrame by default. To modify in place, use the inplace=True parameter.
- By default (drop=False), the original index is retained as a new column. If not needed, use drop=True.
- Before using reset_index(drop=True), make sure the index holds no information you still need, because the dropped index cannot be recovered from the result.
- For time-series data, after resetting the index, you may need to set the time index again (for example, with set_index()).
- In chained operations (e.g., filtering then sorting), the index may change after each operation. Using reset_index() appropriately keeps the code predictable.
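
The last two notes can be sketched together. The example below uses made-up daily readings (the Date and Value names are assumptions for illustration): it resets the index once at the end of a filter-and-sort chain, then restores the time index with set_index().

```python
import pandas as pd

# Hypothetical daily readings indexed by date
ts = pd.DataFrame(
    {"Value": [3, 1, 2]},
    index=pd.DatetimeIndex(["2024-01-01", "2024-01-02", "2024-01-03"], name="Date"),
)

# Chain: filter, sort, then reset once at the end
top = (
    ts[ts["Value"] > 1]
    .sort_values(by="Value", ascending=False)
    .reset_index()  # Date becomes an ordinary column, index is 0, 1
)
print(top)

# Restore the time index when date-based operations are needed again
restored = top.set_index("Date").sort_index()
print(restored)
```

Resetting once at the end of the chain, rather than after every step, keeps the intermediate index available for as long as it is useful.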