YouTip LogoYouTip

Pandas Pd Cut

[![Image 1: Python math Module](#) Pandas Common Functions](#)\n\n* * *\n\n`pd.cut()` is a function in the Pandas library used for **binning continuous data**. It divides numerical data into discrete categories based on specified intervals, making it very suitable for grouped statistics and visualization in data analysis.\n\nBinning operations are very common in data analysis, for example, dividing age into "Youth", "Middle-aged", and "Elderly", or dividing scores into "Fail", "Pass", "Good", and "Excellent".\n\n**Word meaning**: `cut` means "to cut", and here it refers to "cutting" continuous data intervals into multiple discrete intervals.\n\n* * *\n\n## Basic Syntax and Parameters\n\n`pd.cut()` is a top-level function in the Pandas library, used to discretize continuous variables.\n\n### Syntax Format\n\npd.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)\n### Parameter Description\n\n* **Parameter**: `x`\n * Type: Array-like object, such as list, Series, array, etc.\n * Description: The continuous numerical data to be binned.\n\n* **Parameter**: `bins`\n * Type: Integer, scalar (interval boundary), or IntervalIndex.\n * Description: The binning method. Can be an integer (indicating the number of equal-width bins) or a list (specifying specific interval boundaries).\n\n* **Parameter**: `right`\n * Type: Boolean.\n * Description> Whether to use right-closed interval. Default is `True` (i.e., [a, b] form). If set to `False`, it is left-closed right-open interval (a, b].\n\n* **Parameter**: `labels`\n * Type: Array or None.\n * Description: Specify custom labels for each interval. If not specified, interval representation is used (such as "(0, 10]").\n\n* **Parameter**: `retbins`\n * Type> Boolean.\n * Description> If `True`, the returned result includes the bin boundaries. Default is `False`.\n\n* **Parameter**: `include_lowest`\n * Type> Boolean.\n * Description> If `True`, the first interval includes the left boundary. Default is `False`.\n\n### Function Description\n\n* **Return value**: Returns a Categorical type Series, with each value corresponding to its interval.\n* **Effect**: Maps continuous numerical values to discrete categories (intervals).\n\n* * *\n\n## Examples\n\nLet's thoroughly master the usage of `pd.cut()` through a series of examples from simple to complex.\n\n### Example 1: Basic Usage - Equal-width Binning\n\n## Example\n\nimport pandas as pd\n\nimport numpy as np\n\n# 1. Create numerical data\n\n scores = pd.Series([55,70,82,45,90,65,78,88,92,58])\n\nprint("=== Original Scores ===")\n\nprint(scores)\n\n# 2. Divide scores into 3 equal-width intervals\n\n result = pd.cut(scores, bins=3)\n\nprint("n=== pd.cut(scores, bins=3) Equal-width Binning ===")\n\nprint(result)\n\n# 3. View bin boundaries\n\n result_bins = pd.cut(scores, bins=3, retbins=True)\n\nprint("n=== Return Bin Boundaries ===")\n\nprint(f"Bin boundaries: {result_bins}")\n\n**Expected Output:**\n\n=== Original Scores ===0 551 702 823 454 905 656 787 888 929 58=== pd.cut(scores, bins=3) Equal-width Binning ===0 (44.667, 60.667]1 (60.667, 76.333]2 (76.333, 92.0]3 (44.667, 60.667]4 (76.333, 92.0]5 (60.667, 76.333]6 (60.667, 76.333]7 (76.333, 92.0]8 (76.333, 92.0]9 (44.667, 60.667] dtype: category Categories (3, interval): [(44.667, 60.667] < (60.667, 76.333] < (76.333, 92.0]=== Return Bin Boundaries ===Bin boundaries: [44.667 60.667 76.333 92. ]\n**Code Analysis:**\n\n1. `bins=3` means automatically dividing the data into 3 equal-width intervals.\n2. Pandas will automatically calculate interval boundaries based on the minimum and maximum values of the data.\n3. The returned type is Categorical, with each value being an Interval object.\n4. Right-closed interval is used by default, i.e., (a, b] form.\n\n### Example 2: Custom Interval Boundaries\n\nIn practical applications, we often need to define bin boundaries based on business requirements.\n\n## Example\n\nimport pandas as pd\n\nimport numpy as np\n\n# Create score data\n\n scores = pd.Series([55,70,82,45,90,65,78,88,92,58,73,67,81,49,95])\n\n# Custom bin boundaries: Fail, Pass, Good, Excellent\n\n bins =[0,60,75,90,100]\n\n labels =['Fail','Pass','Good','Excellent']\n\nresult = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)\n\nprint("=== Original Scores ===")\n\nprint(scores.values)\n\nprint("n=== Binning Result ===")\n\nprint(result)\n\n# Count number of people in each grade\n\nprint("n=== Score Distribution Statistics ===")\n\nprint(result.value_counts().sort_index())\n\n**Expected Output:**\n\n=== Original Scores=== Binning Result0 Pass1 Good2 Good3 Fail4 Excellent...12 Good13 Fail14 Excellent dtype: category Categories (4, interval): [Fail < Pass < Good **Important Notes:**\n> \n> \n> * If data values exceed the range specified by `bins`, `NaN` will be produced\n> * The boundaries of `bins` must be monotonically increasing\n> * When using custom labels, the number of labels must be 1 less than the number of boundaries\n> * The binned result is of Categorical type, and string methods can be used to process it\n\n* * Pandas Common Functions](#)
← Pandas Pd To NumericPandas Pd Merge β†’