SoFunction
Updated on 2025-04-17

4 Core Methods and a Practical Guide to Deduplicating Python Lists

In Python development, you often need to remove duplicate elements when processing list data. This article introduces four practical ways to deduplicate a list. For each one it explains how it works, shows a code example, and describes its performance, and it closes with practical recommendations.

Method 1: Set conversion (fastest)

Principles and Implementation

A set cannot contain duplicate elements, so converting the list to a set and then back to a list removes the duplicates:

original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
unique_list = list(set(original_list))
print(unique_list)  # The output may be: [33, 99, 11, 44, 77, 55]

Feature Analysis

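Time complexity: O(n) on average (the fastest of the four)

Advantages: minimal code and the best performance

Disadvantages: does not preserve the original order (sets are unordered in every Python version; to keep order in Python 3.7+, use dict.fromkeys() instead), and every element must be hashable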
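The order-preserving variant relies on dictionaries rather than sets: since Python 3.7, dicts are guaranteed to keep insertion order, so dict.fromkeys() drops repeated elements while keeping each one's first occurrence. A minimal sketch of both approaches side by side (the function names are illustrative, not from the original article):

```python
def dedupe_unordered(items):
    """Remove duplicates via a set; the result order is arbitrary."""
    return list(set(items))


def dedupe_ordered(items):
    """Remove duplicates, keeping first occurrences in order (Python 3.7+).

    dict.fromkeys() creates one key per distinct element; repeated keys are
    ignored, and the dict remembers the order in which keys were inserted.
    """
    return list(dict.fromkeys(items))


original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
print(sorted(dedupe_unordered(original_list)))  # [11, 33, 44, 55, 77, 99]
print(dedupe_ordered(original_list))            # [11, 77, 33, 55, 99, 44]
```

Both versions require hashable elements; the sorted() call is only there to make the set version's output predictable.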

Method 2: Sequential traversal (preserves order)

Principles and Implementation

Iterate over the list and append each element to a new list only if the new list does not already contain it:

original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
unique_list = []
for item in original_list:
    if item not in unique_list:
        unique_list.append(item)
print(unique_list)  # Output: [11, 77, 33, 55, 99, 44]

Feature Analysis

Time complexity: O(n²)

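Advantages: preserves the original order of the elements, the logic is intuitive, and it also works with unhashable elements such as lists or dicts

Disadvantages: poor performance on large lists, because each "in" check scans the whole result list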
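One case where the sequential traversal is worth its O(n²) cost: it only needs == comparisons, so it handles unhashable elements such as nested lists, where set() raises TypeError. A short sketch (dedupe_by_equality is a hypothetical helper name):

```python
def dedupe_by_equality(items):
    """Order-preserving deduplication using only == (no hashing needed)."""
    unique = []
    for item in items:
        if item not in unique:  # linear scan of the result list on every check
            unique.append(item)
    return unique


pairs = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]
print(dedupe_by_equality(pairs))  # [[1, 2], [3, 4], [5, 6]]

try:
    set(pairs)
except TypeError as exc:
    print("set() fails:", exc)  # lists are unhashable
```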

Method 3: Copy-and-delete (modifies in place)

Principles and Implementation

Iterate over a copy of the list and remove duplicate elements from the original list:

original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
for num in original_list.copy():
    if original_list.count(num) > 1:
        original_list.remove(num)
print(original_list)  # Output: [11, 33, 55, 99, 44, 77]

Feature Analysis

Time complexity: O(n²)

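Advantages: modifies the original list in place, so no separate result list is built (iterating over .copy() still creates a temporary copy, though)

Disadvantages: mutates the original list; because remove() deletes the first occurrence, each surviving element ends up at the position of its last occurrence, so the result order differs from the original first-occurrence order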
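Because list.remove() deletes the first matching element, the copy-and-delete loop effectively keeps the last occurrence of each value. If that ordering is what you want, the same result can be obtained in linear time by deduplicating the reversed list. This is a swapped-in technique rather than part of the original method, and the function names are hypothetical:

```python
def copy_delete(items):
    """Method 3 from above, wrapped in a function (mutates and returns items)."""
    for num in items.copy():
        if items.count(num) > 1:
            items.remove(num)  # removes the FIRST occurrence
    return items


def keep_last(items):
    """Keep the last occurrence of each element in O(n) (hashable items only)."""
    # Deduplicate from the end, then flip the result back to the original direction.
    return list(dict.fromkeys(reversed(items)))[::-1]


data = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
print(keep_last(data))          # [11, 33, 55, 99, 44, 77]
print(copy_delete(list(data)))  # [11, 33, 55, 99, 44, 77]
```

keep_last() returns a new list and leaves its input untouched, whereas copy_delete() mutates whatever list it is given.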

Method 4: Bubble-style comparison (nested loops)

Principles and Implementation

Use nested loops to compare each element with every element after it, and pop any later duplicates:

original_list = [11, 22, 33, 44, 44, 44, 44, 33, 22, 11]
i = 0
while i < len(original_list):
    j = i + 1
    while j < len(original_list):
        if original_list[i] == original_list[j]:
            original_list.pop(j)
        else:
            j += 1
    i += 1
print(original_list)  # Output: [11, 22, 33, 44]

Feature Analysis

Time complexity: O(n²)

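Advantages: modifies the list in place and keeps the first occurrence of each element in its original order

Disadvantages: the slowest of the four methods (each pop(j) also shifts every later element) and the most complex code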
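If the reason for modifying in place (Methods 3 and 4) is that other references point to the same list object, slice assignment achieves that in linear time while preserving first-occurrence order. A sketch, assuming hashable elements; dedupe_in_place is a hypothetical name:

```python
def dedupe_in_place(items):
    """Replace the contents of items with its unique elements, in order.

    Slice assignment mutates the same list object, so every existing
    reference sees the change. A temporary dict is built along the way.
    """
    items[:] = dict.fromkeys(items)


original_list = [11, 22, 33, 44, 44, 44, 44, 33, 22, 11]
alias = original_list          # a second reference to the same list
dedupe_in_place(original_list)
print(alias)                   # [11, 22, 33, 44]
```

Note that this trades the nested loops for a temporary dict, so it is not the lowest-memory option, but it avoids the O(n²) running time.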

Performance comparison test

Test results for a list of 10,000 elements:

Method                  Time (ms)   Preserves order   Memory efficiency
Set conversion          1.2         no                high
Sequential traversal    520.4       yes               medium
Copy-and-delete         680.7       partial           high
Bubble comparison       950.2       yes               high
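Timings like these depend heavily on the machine, the Python version, and how many duplicates the data contains, so treat them as indicative. A minimal harness for running the comparison on your own data might look like this; the test data (10,000 random integers drawn from range(1000)) and the function names are assumptions, not the article's original setup:

```python
import random
import timeit


def set_conversion(lst):
    return list(set(lst))


def sequential_traversal(lst):
    unique = []
    for item in lst:
        if item not in unique:
            unique.append(item)
    return unique


def copy_delete(lst):
    lst = list(lst)  # work on a copy so every run starts from the same data
    for num in lst.copy():
        if lst.count(num) > 1:
            lst.remove(num)
    return lst


def bubble_comparison(lst):
    lst = list(lst)  # same reason: keep the caller's data intact
    i = 0
    while i < len(lst):
        j = i + 1
        while j < len(lst):
            if lst[i] == lst[j]:
                lst.pop(j)
            else:
                j += 1
        i += 1
    return lst


METHODS = [set_conversion, sequential_traversal, copy_delete, bubble_comparison]


def benchmark(data, number=1):
    """Return {method name: average milliseconds per call}."""
    results = {}
    for func in METHODS:
        seconds = timeit.timeit(lambda: func(data), number=number)
        results[func.__name__] = seconds / number * 1000
    return results


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(1000) for _ in range(10_000)]
    for name, ms in benchmark(data).items():
        print(f"{name:22} {ms:10.1f} ms")
```

Each function copies its input before mutating, which adds a small constant cost but keeps the runs comparable.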

Best Practice Recommendations

General case: use set() conversion first

# Python 3.7+ order-preserving version
unique = list(dict.fromkeys(original_list))

When order must be preserved:

Small lists: sequential traversal

Large lists: dict.fromkeys() (Python 3.7+)

Memory-sensitive cases: copy-and-delete

Special requirements:

# Deduplicate complex objects (dicts) by their 'id' field
seen = set()
# seen.add() returns None (falsy), so an unseen id is recorded and the item kept
unique = [x for x in original_list if not (x['id'] in seen or seen.add(x['id']))]
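A more explicit version of the seen-set idea generalizes to any key function and avoids relying on a side effect inside a comprehension. unique_by is a hypothetical helper, and the sample records are made up for illustration:

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item for each distinct key, preserving order."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
print(unique_by(records, key=lambda r: r["id"]))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

operator.itemgetter("id") works as the key as well, and key=str.lower gives case-insensitive deduplication of strings.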

Common pitfalls

Don't modify a list while iterating over it directly:

# Wrong!
for item in original_list:  # iterating over the original list itself
    if original_list.count(item) > 1:
        original_list.remove(item)  # later elements shift left and get skipped
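To see the skipping in action, run the faulty loop on the article's sample list. Removing an element shifts everything after it one position to the left while the loop's internal index keeps advancing, so the element that slides into the removed slot is never examined:

```python
original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]

# Faulty: iterating over the same list that is being modified
buggy = list(original_list)
for item in buggy:
    if buggy.count(item) > 1:
        buggy.remove(item)
print(buggy)  # [11, 33, 33, 55, 99, 44, 77] (33 is still duplicated)

# Fixed: iterate over a copy and mutate the original (Method 3)
fixed = list(original_list)
for item in fixed.copy():
    if fixed.count(item) > 1:
        fixed.remove(item)
print(fixed)  # [11, 33, 55, 99, 44, 77]
```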

Optimization for large lists:

# A generator yields unique items lazily (the seen set still holds every unique value)
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

unique = list(dedupe(original_list))
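Because dedupe() is a generator, it can also consume inputs that are too large to build as a list, as long as you don't materialize the whole output. A short sketch using itertools.islice to take only the first few unique values from a long lazy stream (the data source here is made up):

```python
from itertools import islice


def dedupe(items):
    """Lazily yield each element the first time it is seen."""
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


# A very long lazy stream 0, 1, 2, 3, 4, 0, 1, ... that is never built as a list
stream = (i % 5 for i in range(10**9))
first_five = list(islice(dedupe(stream), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

islice stops after the fifth item without pulling anything further from the stream; memory use grows with the number of distinct values, not with the length of the input.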

Handling unhashable objects:

# Deduplicate dicts by their key-value pairs (the dict values must be hashable)
unique = list({frozenset(item.items()): item for item in original_list}.values())
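The frozenset approach needs every dictionary value to be hashable, so a dict containing a list raises TypeError. One workaround, sketched under the assumption that the items are JSON-serializable, is to use a canonical JSON string as the key (dedupe_dicts is a hypothetical name):

```python
import json


def dedupe_dicts(items):
    """Keep the first occurrence of each distinct (possibly nested) dict.

    json.dumps(..., sort_keys=True) produces the same string for dicts with
    equal content regardless of key order, so it can serve as a hashable key.
    Assumes every item is JSON-serializable.
    """
    seen = set()
    unique = []
    for item in items:
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


data = [
    {"id": 1, "tags": ["x", "y"]},
    {"tags": ["x", "y"], "id": 1},  # same content, different key order
    {"id": 2, "tags": []},
]
print(dedupe_dicts(data))
# [{'id': 1, 'tags': ['x', 'y']}, {'id': 2, 'tags': []}]
```

The JSON key is an approximation of equality: for example, 1 and 1.0 serialize differently even though they compare equal, and non-string dict keys are converted to strings.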

Summary

Fastest: set() conversion (when order doesn't matter)

Preserving order: dict.fromkeys() (Python 3.7+)

Memory optimization: the copy-and-delete method

Teaching demonstrations: bubble comparison (not recommended for real projects)

Choose a method based on data size, ordering requirements, and memory constraints; in most cases, set conversion is the best choice.
