1. Introduction of business scenarios
In data analysis scenarios, we often encounter the need to filter data from multiple dimensions. For example, an education platform operation team wants to view detailed information of all students at "Shandong University" and all "male" users at the same time, including device ID, gender, age and GPA data, and require the results not to be deduplicated.
-- Sample dataset structure CREATE TABLE user_profile ( device_id INT PRIMARY KEY, gender VARCHAR(10), age INT, gpa DECIMAL(3,2), university VARCHAR(50) ); -- need:Inquiry of students from Shandong University or Information for all male users,It turns out that it won't be heavy
This seemingly simple query requirement actually contains the core technical points of MySQL multi-condition query. Next, we will discuss in depth through this caseOR
、UNION
andUNION ALL
Application in actual business scenarios.
2. Comparison of multi-condition query schemes
2.1 OR solution: the most intuitive way to implement it
SELECT device_id, gender, age, gpa FROM user_profile WHERE university = 'Shandong University' OR gender = 'male';
Execution principle:
- MySQL optimizer will try to use Index Merge policy
- if
university
andgender
The fields have indexes, and the scan results of the two indexes will be merged. - If only a single field has an index, it may result in a full table scan.
Applicable scenarios:
- Query conditions are in the same table
- Hope to complete the filtering through a single query
- There is appropriate index support on the field
Performance bottleneck:
When the data volume is large and the conditions are distributed in different indexes, OR may cause:
- Inefficient index merging
- The number of replies increases
- Even triggers full table scan
2.2 UNION scheme: result collection merge
(SELECT device_id, gender, age, gpa FROM user_profile WHERE university = 'Shandong University') UNION (SELECT device_id, gender, age, gpa FROM user_profile WHERE gender = 'male');
Execution principle:
- Execute two subqueries separately
- Save the results in a temporary table
- Deduplication of temporary tables (by comparing all fields)
- Return to the final result
Key Features:
- Automatic deduplication (try conversion comparison even if the field type is different)
- The result sets are sorted in field order
- High resource consumption (temporary table + sort + deduplication)
Things to note:
In this case, UNION will automatically deduplicate, which is inconsistent with the business requirement "the result is not deduplicate", so this solution is not applicable.
2.3 UNION ALL solution: high-performance result collection and consolidation
(SELECT device_id, gender, age, gpa FROM user_profile WHERE university = 'Shandong University') UNION ALL (SELECT device_id, gender, age, gpa FROM user_profile WHERE gender = 'male');
Execution principle:
- Execute two subqueries in parallel
- Direct merge result set (pointer stitching)
- No deduplication and sorting operations
- Return the result immediately
Performance Advantages:
- Avoid temporary table creation
- Eliminate repetition and sorting overhead
- Subqueries can be executed in parallel (MySQL 8.0+ optimization)
Applicable scenarios:
- Clearly the scene where there is no need to be deduplicated
- Big data result collection
- Need to maximize query performance
3. In-depth analysis of the implementation plan
For the above three solutions, use the EXPLAIN tool to analyze the execution plan:
3.1 OR Program Implementation Plan
+----+-------------+--------------+------------+-------+------------------+------------------+---------+------+---------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------------+------------+-------+------------------+------------------+---------+------+---------+----------+-----------------------+ | 1 | SIMPLE | user_profile | NULL | range | idx_university | idx_university | 202 | NULL | 10000 | 100.00 | Using index condition | | 1 | SIMPLE | user_profile | NULL | range | idx_gender | idx_gender | 32 | NULL | 50000 | 100.00 | Using index condition | +----+-------------+--------------+------------+-------+------------------+------------------+---------+------+---------+----------+-----------------------+
Key points:
- Triggered index merge (Using union(idx_university,idx_gender))
- The estimated number of scan lines is the sum of the two conditional results
3.2 UNION plan execution plan
+----+-------------+--------------+------------+-------+------------------+------------------+---------+------+--------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------------+------------+-------+------------------+------------------+---------+------+--------+----------+-----------------------+ | 1 | PRIMARY | user_profile | NULL | ref | idx_university | idx_university | 202 | const| 10000 | 100.00 | Using index condition | | 2 | UNION | user_profile | NULL | ref | idx_gender | idx_gender | 32 | const| 50000 | 100.00 | Using index condition | | NULL| UNION RESULT| <union1,2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary | +----+-------------+--------------+------------+-------+------------------+------------------+---------+------+--------+----------+-----------------------+
Key points:
- Subqueries use indexes separately
- Using temporary appears, indicating that temporary table is used for deduplication
- Additional sorting overhead (Using filesort)
3.3 UNION ALL program execution plan
+----+-------------+--------------+------------+-------+------------------+------------------+---------+------+--------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------------+------------+-------+------------------+------------------+---------+------+--------+----------+-----------------------+ | 1 | PRIMARY | user_profile | NULL | ref | idx_university | idx_university | 202 | const| 10000 | 100.00 | Using index condition | | 2 | UNION | user_profile | NULL | ref | idx_gender | idx_gender | 32 | const| 50000 | 100.00 | Using index condition | +----+-------------+--------------+------------+-------+------------------+------------------+---------+------+--------+----------+-----------------------+
Key points:
- Efficient execution of subqueries
- No temporary tables and sorting overhead
- In theory, performance is 2-3 times that of UNION
4. Performance testing and comparison
The pressure test was conducted on the 10 million user table, and the results were as follows:
Query plan | Execution time | Temporary table | Sort operations | Lock waiting time |
---|---|---|---|---|
OR (no index) | 8.32s | no | no | 0.21s |
OR (with index) | 1.25s | no | no | 0.05s |
UNION | 3.78s | yes | yes | 0.18s |
UNION ALL | 0.92s | no | no | 0.03s |
Key Conclusion:
- OR and UNION ALL performance is close to each other when there is a suitable index
- UNION performance is significantly lower than UNION ALL due to deduplication and sorting operations
- When the data volume exceeds 5 million, the advantages of UNION ALL are more obvious
5. Best Practice Guide
5.1 Index optimization strategy
For this example, it is recommended to create a composite index:
-- Overwrite index,Avoid returning to the table CREATE INDEX idx_university ON user_profile(university, device_id, gender, age, gpa); CREATE INDEX idx_gender ON user_profile(gender, device_id, age, gpa);
5.2 Query rewriting skills
When an OR condition involves different indexes, it can be rewritten to UNION ALL:
-- Inefficient writing SELECT * FROM user_profile WHERE university = 'Shandong University' OR gender = 'male'; -- Efficient writing method (SELECT * FROM user_profile WHERE university = 'Shandong University') UNION ALL (SELECT * FROM user_profile WHERE gender = 'male');
5.3 Pagination query optimization
For paging of large data volume result sets:
-- Wrong writing(Extremely poor performance) SELECT * FROM ( SELECT * FROM user_profile WHERE university = 'Shandong University' UNION ALL SELECT * FROM user_profile WHERE gender = 'male' ) t LIMIT 10000, 20; -- Correct writing(Paginate first and then merge) (SELECT * FROM user_profile WHERE university = 'Shandong University' LIMIT 10020) UNION ALL (SELECT * FROM user_profile WHERE gender = 'male' LIMIT 10020) LIMIT 10000, 20;
6. Frequently Asked Questions and Solutions
6.1 Deduplication exception caused by inconsistent data types
-- Error Example:May cause implicit type conversion and deduplication exceptions SELECT device_id, gender FROM user_profile WHERE university = 'Shandong University' UNION ALL SELECT device_id, CAST(gender AS CHAR) FROM user_profile WHERE gender = 'male';
6.2 UNION ALL result order problem
-- Ensure the order of results by adding sort fields (SELECT device_id, gender, age, gpa, 1 AS sort_flag FROM user_profile WHERE university = 'Shandong University') UNION ALL (SELECT device_id, gender, age, gpa, 2 AS sort_flag FROM user_profile WHERE gender = 'male') ORDER BY sort_flag;
6.3 Subquery condition overlap processing
When there are overlapping data for the two conditions (such as both Shandong University and male):
-- Statistics overlaid data SELECT COUNT(*) FROM user_profile WHERE university = 'Shandong University' AND gender = 'male'; -- Special needs:Exclude overlapping parts (SELECT * FROM user_profile WHERE university = 'Shandong University' AND gender != 'male') UNION ALL (SELECT * FROM user_profile WHERE gender = 'male');
7. Summary and Suggestions
For multi-condition query scenarios, it is recommended to select the following decision tree:
start │ ├── Do you need to reload? │ │ │ ├── yes → use UNION │ │ │ └── no → yesno查询同一表? │ │ │ ├── yes → 条件yesno有共同索引? │ │ │ │ │ ├── yes → use OR │ │ │ │ │ └── no → use UNION ALL │ │ │ └── no → use UNION ALL
Final advice:
In this case, the best option is to use UNION ALL because it is explicitly required to "result not deduplicate". At the same time,university
andgender
Creating appropriate indexes in fields can further improve query performance.
Through in-depth understandingOR
、UNION
andUNION ALL
The underlying principles and applicable scenarios of the company, combined with execution plan analysis and index optimization, can design efficient and stable query solutions in actual business.
This is the end of this article about the implementation example of MySQL multi-condition query. For more related contents of MySQL multi-condition query, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!