Summary of using the case_when() method in Pandas

In Python data analysis, Pandas is a powerful library for processing and analyzing data. It provides a wide variety of methods and functions that make data conversion and manipulation easier. In this article, we explore the case_when() method in depth. It can be used to create new columns conditionally, much like the CASE WHEN statement in SQL. We discuss how the method is used and provide plenty of sample code.

What is the case_when() method?

case_when() is a method of the Pandas Series (available since pandas 2.2) that builds a new column based on conditions. It is typically used to derive a new data column from certain characteristics or conditions of the data, much like an if-else chain.

case_when() is vectorized: it evaluates each condition over the whole column at once, so there is no need to loop over rows or write a long chain of conditional statements. When the logic is easier to express in plain Python, apply() with a custom function is a row-by-row alternative, as Example 3 shows.

The syntax of the case_when() method

The syntax of the case_when() method is as follows:

Series.case_when(caselist)

Parameter description:

caselist: A list of (condition, replacement) tuples. Each condition is a boolean array-like object (or a callable that returns one), and the replacement is the value to use where that condition is True. Conditions are checked in order, and the first one that matches wins.

The method has no default parameter. Rows that match no condition keep the value from the Series the method is called on, so a common way to supply a default is to call case_when() on a Series that is already filled with the default value.

Sample code

The following examples demonstrate how to use the case_when() method.

Example 1: Basic usage

Suppose there is a dataset containing student scores, and we want to assign each student a grade based on the score.

We can use the case_when() method to do this:

import pandas as pd

# Create a sample dataset
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Fraction': [85, 70, 95, 60, 75]}

df = pd.DataFrame(data)

# Pair each condition with its replacement value
caselist = [(df['Fraction'] >= 90, 'excellent'),
            ((df['Fraction'] >= 80) & (df['Fraction'] < 90), 'good'),
            (df['Fraction'] < 80, 'Pass')]

# Start from a Series filled with the default value 'Failed', then apply the cases
df['grade'] = pd.Series('Failed', index=df.index).case_when(caselist)

# Output result
print(df)

The code above creates a new grade column from the students' scores and assigns each student the grade that matches the conditions. The three conditions cover every non-missing score, so the 'Failed' default is never used here.

Example 2: Use default values

Sometimes a row does not meet any condition. A default value handles these cases:

import pandas as pd

# Create a sample dataset (Eva's score is missing)
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Fraction': [85, 70, 95, 60, None]}

df = pd.DataFrame(data)

# Pair each condition with its replacement value
caselist = [(df['Fraction'] >= 90, 'excellent'),
            ((df['Fraction'] >= 80) & (df['Fraction'] < 90), 'good'),
            (df['Fraction'] < 80, 'Pass')]

# Start from a Series filled with the default value 'unknown', then apply the cases
df['grade'] = pd.Series('unknown', index=df.index).case_when(caselist)

# Output result
print(df)

In this example, the dataset contains a missing value. A comparison against a missing value is False, so that row matches no condition and keeps the default value 'unknown'.
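For comparison, numpy.select is a different function that does accept a default argument directly, which can be convenient on pandas versions older than 2.2. The following is a sketch of the same grading logic with numpy.select, not a use of case_when().

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 70, 95, 60, None])  # None is stored as NaN

conditions = [scores >= 90, scores >= 80, scores < 80]
choices = ["excellent", "good", "Pass"]

# Comparisons against NaN are False, so the last row falls through to `default`.
grades = np.select(conditions, choices, default="unknown")
print(grades.tolist())  # ['good', 'Pass', 'excellent', 'Pass', 'unknown']
```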

Example 3: Use the apply() method

The apply() method is a row-by-row alternative to case_when(): a custom function receives each row of the dataset and returns a value based on multiple conditions.

Here is an example of calculating students' final grades based on their scores and attendance:

import pandas as pd

# Create a sample dataset
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Fraction': [85, 70, 95, 60, 75],
        'Attendance': [90, 80, 95, 70, 85]}

df = pd.DataFrame(data)

# Custom function that checks multiple conditions for a single row
def calculate_grade(row):
    if row['Fraction'] >= 90 and row['Attendance'] >= 90:
        return 'excellent'
    elif row['Fraction'] >= 80 and row['Attendance'] >= 80:
        return 'good'
    elif row['Fraction'] >= 60 and row['Attendance'] >= 70:
        return 'Pass'
    else:
        return 'Failed'

# Use the apply() method to run the function on each row
df['Final Results'] = df.apply(calculate_grade, axis=1)

# Output result
print(df)

In this example, we define a custom function, calculate_grade(), that determines the final grade from multiple conditions, and use the apply() method to run it on each row of the dataset.

Example 4: Combining multiple conditions

Sometimes, a new column needs to be generated based on a combination of multiple conditions.

For example, we can decide which award each student receives based on both score and attendance:

import pandas as pd

# Create a sample dataset
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Fraction': [85, 70, 95, 60, 75],
        'Attendance': [90, 80, 95, 70, 85]}

df = pd.DataFrame(data)

# Pair each combined condition with its replacement value
caselist = [((df['Fraction'] >= 90) & (df['Attendance'] >= 90), 'Scholarship'),
            ((df['Fraction'] >= 80) & (df['Attendance'] >= 80), 'Honorary Award'),
            ((df['Fraction'] >= 60) & (df['Attendance'] >= 70), 'qualified')]

# Start from a Series filled with the default value 'Not won', then apply the cases
df['award'] = pd.Series('Not won', index=df.index).case_when(caselist)

# Output result
print(df)

In this example, each condition combines two columns with the & operator, and case_when() evaluates the conditions over the whole dataset at once. No apply() call is needed: calling case_when() inside a row-wise lambda would fail, because row['Fraction'] is a single number there rather than a Series.

Example 5: Generate new columns from multiple columns

Sometimes, a new column needs to be generated from the values of multiple columns.

For example, a total score column can be generated from two score columns:

import pandas as pd

# Create a sample dataset
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Score 1': [85, 70, 95, 60, 75],
        'Score 2': [90, 80, 85, 70, 90]}

df = pd.DataFrame(data)

# Use the apply() method to generate a new column
df['Total Score'] = df.apply(lambda row: row['Score 1'] + row['Score 2'], axis=1)

# Output result
print(df)

In this example, the apply() method adds the scores of the two columns to generate a new total column.
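For simple arithmetic like this, the columns can also be added directly without apply(). A short sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score 1": [85, 70, 95, 60, 75],
    "Score 2": [90, 80, 85, 70, 90],
})

# Column arithmetic is element-wise, so no row-by-row function is needed.
df["Total Score"] = df["Score 1"] + df["Score 2"]

# Equivalent form that scales to any number of score columns.
total_via_sum = df[["Score 1", "Score 2"]].sum(axis=1)

print(df["Total Score"].tolist())  # [175, 150, 180, 130, 165]
```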

Example 6: Handling Missing Values

The case_when() method can also be used to handle missing values.

For example, students can be assigned a grade that takes into account whether the score is missing:

import pandas as pd
import numpy as np

# Create a sample dataset (Bob's score is missing)
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Fraction': [85, np.nan, 95, 60, 75]}

df = pd.DataFrame(data)

# Pair each condition with its replacement value; notna() excludes missing scores
caselist = [(df['Fraction'].notna() & (df['Fraction'] >= 90), 'excellent'),
            (df['Fraction'].notna() & (df['Fraction'] >= 80), 'good'),
            (df['Fraction'].notna() & (df['Fraction'] >= 60), 'Pass')]

# Start from a Series filled with the default value 'Failed', then apply the cases
df['grade'] = pd.Series('Failed', index=df.index).case_when(caselist)

# Output result
print(df)

In this example, the notna() method checks whether each score is present before the grade conditions are applied. A student whose score is missing matches no condition and receives the default value 'Failed'.

Summary

In this article, we explored the case_when() method in depth, covering both basic and more advanced usage. The method is useful in data analysis and data conversion tasks for creating new columns based on conditions, combining multiple conditions, and handling missing values. We hope the example code and explanations in this article help you understand and apply the case_when() method, and handle a variety of data analysis and data processing tasks more flexibly.