MySQL's GROUP BY is the core feature for data aggregation analysis. It groups the result set by the specified columns and, combined with aggregate functions, computes statistics for each group. The following is a detailed walkthrough from basic syntax to advanced usage:
1. Basic syntax and core functions
SELECT grouping_columns, aggregate_function(column) FROM table_name [WHERE condition] GROUP BY grouping_columns [HAVING group_filter_condition] [ORDER BY sort_columns];
Core functions:
- Data grouping: Divide rows into logical groups by the values of one or more columns.
- Aggregate calculation: Apply an aggregate function (e.g. COUNT, SUM, AVG, MAX, MIN) to each group to compute statistics.
- Result filtering: Use HAVING to filter the results after grouping (as opposed to WHERE, which filters rows before grouping).
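To see how the clauses fit together, here is a minimal sketch that uses every part of the template in a single query. It assumes the employees table used in the examples of this article; the hire_date column and the thresholds are hypothetical and chosen only for illustration.

```sql
-- Assumed schema: employees(department, salary, hire_date).
-- hire_date and the literal thresholds are hypothetical.
SELECT department,
       COUNT(*)    AS emp_count,
       AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'   -- row filter, applied before grouping
GROUP BY department               -- one output row per department
HAVING COUNT(*) >= 3              -- group filter, applied after aggregation
ORDER BY avg_salary DESC;         -- sort the aggregated rows
```

The clauses are evaluated in the order WHERE, GROUP BY, HAVING, ORDER BY, which is why only HAVING and ORDER BY can refer to aggregate results.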
2. Basic usage examples
1. Single column grouping statistics
Count the employees and compute the average salary in each department:
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary FROM employees GROUP BY department;
2. Multi-column combination grouping
Count employees by department and job title:
SELECT department, job_title, COUNT(*) FROM employees GROUP BY department, job_title;
3. Use in combination with WHERE
Compute the average salary per department, counting only employees whose salary exceeds RMB 2,000:
SELECT department, AVG(salary) FROM employees WHERE salary > 2000 GROUP BY department;
3. Advanced features and extensions
1. Filtering groups with the HAVING clause
Keep only departments with more than 5 employees:
SELECT department, COUNT(*) AS emp_count FROM employees GROUP BY department HAVING emp_count > 5;
2. Generating summary rows with WITH ROLLUP
Generate salary subtotals and a grand total by department and job title:
SELECT department, job_title, SUM(salary) FROM employees GROUP BY department, job_title WITH ROLLUP;
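In the ROLLUP output, subtotal and grand-total rows carry NULL in the rolled-up columns, which is hard to tell apart from real NULL data. A possible way to label them, assuming MySQL 8.0.1 or later where the GROUPING() function is available, is sketched below; the label strings and the aliases dept_label and job_label are illustrative.

```sql
-- GROUPING(col) returns 1 on rows where col has been rolled up, 0 otherwise.
SELECT IF(GROUPING(department), 'All departments', department) AS dept_label,
       IF(GROUPING(job_title), 'All positions', job_title)     AS job_label,
       SUM(salary) AS total_salary
FROM employees
GROUP BY department, job_title WITH ROLLUP;
```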
3. Merging column values with GROUP_CONCAT
List all products purchased by each user (comma-separated):
SELECT user_id, GROUP_CONCAT(product_name SEPARATOR ', ') FROM orders GROUP BY user_id;
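GROUP_CONCAT also accepts DISTINCT and an ORDER BY inside the call, and its result is truncated at group_concat_max_len (1024 bytes by default). The following sketch, using the same orders table, shows one way to get a de-duplicated, sorted list; the value 8192 is an arbitrary example.

```sql
-- Raise the per-session result limit; 8192 is an illustrative value.
SET SESSION group_concat_max_len = 8192;

SELECT user_id,
       GROUP_CONCAT(DISTINCT product_name
                    ORDER BY product_name
                    SEPARATOR ', ') AS products
FROM orders
GROUP BY user_id;
```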
4. Grouping by an expression or function
Count orders by year:
SELECT YEAR(order_date) AS year, COUNT(*) FROM orders GROUP BY YEAR(order_date);
4. Precautions and common errors
ONLY_FULL_GROUP_BY mode
MySQL 5.7.5 and later (including 8.0) enable this mode by default. It requires every non-aggregated column in the SELECT list to appear in GROUP BY (or to be functionally dependent on the grouped columns); otherwise an error is reported.
-- Incorrect: salary is neither aggregated nor grouped
SELECT department, salary FROM employees GROUP BY department;
-- Fix: add an aggregate function or add the column to GROUP BY
SELECT department, MAX(salary) FROM employees GROUP BY department;
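When any value from the group is acceptable, ANY_VALUE() is another way to satisfy this mode without changing the grouping. A small sketch, reusing the employees table (the alias sample_salary is illustrative):

```sql
-- Check whether ONLY_FULL_GROUP_BY is part of the current SQL mode.
SELECT @@SESSION.sql_mode;

-- ANY_VALUE() tells MySQL that an arbitrary salary from each group is fine.
-- Which row's value is returned is not deterministic.
SELECT department, ANY_VALUE(salary) AS sample_salary
FROM employees
GROUP BY department;
```

Because the returned value is arbitrary, this suits columns that are known to be constant within each group rather than columns where a specific row matters.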
The difference between WHERE and HAVING
- WHERE filters rows before grouping; aggregate functions cannot be used in it.
- HAVING filters groups after grouping; it is normally used with conditions on aggregates.
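The difference can be illustrated with one query that uses both clauses. This is a sketch on the employees table; the threshold 8000 is hypothetical.

```sql
-- Not allowed: an aggregate in WHERE fails with
-- ERROR 1111 (HY000): Invalid use of group function
-- SELECT department FROM employees WHERE AVG(salary) > 8000 GROUP BY department;

SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 2000            -- filters individual rows before grouping
GROUP BY department
HAVING AVG(salary) > 8000;     -- filters whole groups after aggregation
```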
Performance optimization suggestions
- Create an index on the grouping column (e.g. ALTER TABLE employees ADD INDEX(department)).
- Avoid grouping a large table directly; first narrow the data range with a temporary table or a subquery.
5. Classic case scenarios
1. Aggregate by time dimension
Total monthly sales:
SELECT YEAR(sale_date) AS year, MONTH(sale_date) AS month, SUM(amount) FROM sales GROUP BY year, month;
2. Multi-level statistics
Analyze the total and average order amount for each customer in each year:
SELECT customer_id, YEAR(order_date), SUM(total_amount), AVG(total_amount) FROM orders GROUP BY customer_id, YEAR(order_date);
3. Data deduplication
Find email addresses that are used by more than one user:
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;
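The query above returns only the duplicated email values. To inspect the affected rows themselves, the grouped result can be joined back to the table. The sketch below assumes that users has a unique id column, which the original example does not show.

```sql
-- Assumption: users has a unique id column (not shown in the original example).
SELECT u.id, u.email
FROM users AS u
JOIN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS dup ON dup.email = u.email
ORDER BY u.email, u.id;
```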
6. Optimization of aggregation efficiency
Optimizing GROUP BY aggregation efficiency in MySQL has to be approached from several angles, including index design, query logic, and execution engine characteristics. The following summarizes 9 core optimization strategies based on current optimization practices and database engine features:
1. Index optimization strategy
Composite indexes that exactly match the grouping columns
• Create a composite index whose columns match GROUP BY exactly and in the same order (e.g. an index on (a,b) for GROUP BY a,b). MySQL can then resolve the grouping from the index and, for eligible queries, use a loose index scan, which can cut disk I/O by 90% or more in favorable cases.
• Typical case: when grouping by (department, job_title), a composite index idx_dept_job lets the query skip the full table scan and complete the grouping directly through the index.
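A minimal sketch of this setup on the employees table follows. Whether the optimizer actually picks the index depends on table statistics, so the Extra values mentioned in the comments are what one would hope to see, not guaranteed output.

```sql
CREATE INDEX idx_dept_job ON employees (department, job_title);

-- Grouping with COUNT(*): ideally no "Using temporary" in the Extra column,
-- because rows are read in index order.
EXPLAIN
SELECT department, job_title, COUNT(*)
FROM employees
GROUP BY department, job_title;

-- Loose index scan applies to a narrower class of queries, for example
-- those without aggregates or with only MIN()/MAX();
-- when used, Extra shows "Using index for group-by".
EXPLAIN
SELECT department, job_title
FROM employees
GROUP BY department, job_title;
```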
Covering indexes to avoid table lookups
• Make sure that the SELECT columns and the columns used in aggregate functions are all contained in the index. For example, with an index on (category, sales), a query such as SELECT category, SUM(sales) ... GROUP BY category can be computed entirely from the index without accessing the data rows.
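As a sketch of the covering-index case: the text names only the columns category and sales, so the table name products and the index name below are hypothetical.

```sql
-- Hypothetical table and index names; only the column names come from the text.
CREATE INDEX idx_category_sales ON products (category, sales);

-- If the index covers the query, the Extra column of EXPLAIN
-- should contain "Using index".
EXPLAIN
SELECT category, SUM(sales)
FROM products
GROUP BY category;
```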
Functional indexes for complex grouping
• For grouping expressions (e.g. YEAR(date_col)), create a generated (virtual) column and index it, or create a functional index (functional key parts require MySQL 8.0.13 or later). For example:
ALTER TABLE orders ADD COLUMN year_date INT AS (YEAR(order_date)) VIRTUAL;
CREATE INDEX idx_year ON orders(year_date);
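The functional-index alternative could look like the following sketch, assuming MySQL 8.0.13 or later and that order_date is a DATE or DATETIME column. The index name idx_order_year is illustrative.

```sql
-- Functional key part: the expression needs its own pair of parentheses.
CREATE INDEX idx_order_year ON orders ((YEAR(order_date)));

-- The GROUP BY expression must match the indexed expression exactly
-- for the optimizer to consider the index.
SELECT YEAR(order_date) AS order_year, COUNT(*) AS order_count
FROM orders
GROUP BY YEAR(order_date);
```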
2. Query design and execution optimization
Reduce the number and complexity of grouping fields
• Each additional grouping field increases the number of groups and the cost of sorting or hashing them. Prefer merging related fields (for example, merging province and city into a single region field).
• Avoid applying functions to columns in GROUP BY, because an ordinary index on the column can then no longer be used. Rewrite the query to group on a stored field instead, e.g. change GROUP BY DATE(created_at) to GROUP BY created_at_date, where created_at_date is a precomputed column.
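One way to obtain such a precomputed column is a stored generated column, sketched below. The table name events and the index name are hypothetical, and the sketch assumes created_at is a DATETIME column.

```sql
-- Hypothetical table "events"; created_at is assumed to be DATETIME.
ALTER TABLE events
    ADD COLUMN created_at_date DATE AS (DATE(created_at)) STORED;

CREATE INDEX idx_created_at_date ON events (created_at_date);

-- Group on the plain column so that the index can be used.
SELECT created_at_date, COUNT(*) AS event_count
FROM events
GROUP BY created_at_date;
```

MySQL keeps the generated column in sync with created_at automatically, so the application does not have to maintain it.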
Staged filtering and aggregation
• Filter out irrelevant rows first, for example in a subquery, and then group what remains:
SELECT department, AVG(salary) FROM (SELECT * FROM employees WHERE salary > 5000) AS filtered GROUP BY department; -- claimed to be about 40% more efficient than filtering with HAVING
In-memory sorting and temporary table optimization
• Adjust the tmp_table_size and max_heap_table_size parameters (the recommendation here is about 20% of physical memory; because the limit applies to each in-memory temporary table, size it with the number of concurrent sessions in mind) to keep temporary tables from spilling to disk.
• Monitor the Created_tmp_disk_tables status variable. If on-disk temporary tables appear frequently, optimize the indexes or split the query.
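The monitoring and tuning steps above can be sketched with standard statements. The 64 MB value is only an example, not a recommendation from the text.

```sql
-- Compare how many temporary tables were created in memory and on disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Inspect the current limits.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise both limits for the current session only (67108864 bytes = 64 MB).
SET SESSION tmp_table_size = 67108864;
SET SESSION max_heap_table_size = 67108864;
```

Setting the values per session limits the change to the connection that runs the heavy aggregation instead of raising memory use server-wide.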
3. Advanced optimization technology
Partitioned tables to speed up big-data processing
• Partition by time or business dimension (e.g. monthly) so that a GROUP BY query whose WHERE clause filters on the partition key scans only the relevant partitions. For example, after partitioning a 1-billion-row log table by event_date, the monthly statistics time drops from minutes to seconds.
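A possible shape for such a partitioned table is sketched below. Apart from event_date, every name here (event_logs, event_type, the partition names) is hypothetical, and the partition boundaries are example values.

```sql
-- Hypothetical log table partitioned by month on event_date.
-- The partitioning column must be part of every unique key, including the primary key.
CREATE TABLE event_logs (
    id         BIGINT      NOT NULL AUTO_INCREMENT,
    event_date DATE        NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    PRIMARY KEY (id, event_date)
)
PARTITION BY RANGE COLUMNS (event_date) (
    PARTITION p2024_01 VALUES LESS THAN ('2024-02-01'),
    PARTITION p2024_02 VALUES LESS THAN ('2024-03-01'),
    PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- The WHERE condition on event_date allows partition pruning,
-- so only p2024_01 needs to be scanned before grouping.
SELECT event_type, COUNT(*) AS event_count
FROM event_logs
WHERE event_date >= '2024-01-01' AND event_date < '2024-02-01'
GROUP BY event_type;
```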
Materialized views and result caching
• For high-frequency aggregate queries, use a materialized view. MySQL has no native materialized views, so this is emulated with a summary table (e.g. built with CREATE TABLE mv AS SELECT... and refreshed periodically) to reduce the load of real-time computation.
• Cache repeated query results in the application layer (e.g. cache daily summary data in Redis) to reduce database load.
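Since MySQL has no built-in materialized views, the summary-table approach could be sketched as follows. It reuses the sales table with region and revenue columns from this article; the name mv_sales_by_region is hypothetical, and how the refresh is scheduled (cron job, event scheduler) is left open.

```sql
-- Build the snapshot once.
CREATE TABLE mv_sales_by_region AS
SELECT region, SUM(revenue) AS total_revenue
FROM sales
GROUP BY region;

-- Periodic refresh: rebuild the snapshot from the base table.
TRUNCATE TABLE mv_sales_by_region;
INSERT INTO mv_sales_by_region (region, total_revenue)
SELECT region, SUM(revenue)
FROM sales
GROUP BY region;

-- Reports read the small summary table instead of aggregating sales.
SELECT region, total_revenue FROM mv_sales_by_region;
```

A full rebuild is the simplest refresh strategy, but readers may briefly see an empty table between the TRUNCATE and the INSERT; incremental updates avoid this at the cost of more logic.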
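Parallel query (MySQL 8.0+)
• Where the server supports parallel query, complex grouping can be processed by multiple threads. Community MySQL 8.0 has no parallel_query flag in optimizer_switch, so the first statement below only works on distributions that provide such a switch; stock MySQL offers innodb_parallel_read_threads (8.0.14+), which parallelizes only certain clustered-index scans such as SELECT COUNT(*) without a WHERE clause:
SET SESSION optimizer_switch='parallel_query=on'; -- distribution-specific; not accepted by community MySQL
SELECT region, SUM(revenue) FROM sales GROUP BY region; -- uses multiple CPU cores where parallel query is available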
4. Diagnostic tools and precautions
• Execution plan analysis
Use EXPLAIN FORMAT=JSON and look at key indicators such as using_index (whether a covering index is used), using_temporary_table (whether a temporary table is used), and using_filesort (whether an extra sort pass is needed).
• Strict mode to avoid errors
Keep ONLY_FULL_GROUP_BY enabled to prevent misuse of non-aggregated columns, which would otherwise produce nondeterministic results.
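As a starting point for the plan analysis, both EXPLAIN formats can be run against one of the earlier queries. This is a sketch on the employees table; the actual plan depends on the indexes and data present.

```sql
-- JSON format: look inside the grouping_operation object of the output.
EXPLAIN FORMAT=JSON
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;

-- Traditional format: the same information appears in the Extra column
-- as "Using index", "Using temporary" and "Using filesort".
EXPLAIN
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;
```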
Performance optimization comparison
Scenario | Time before optimization | Optimization | Time after optimization |
---|---|---|---|
Behavior analysis for millions of users | 12.8 s | Create a covering index on (user_id, action_time) | 1.2 s |
Aggregation of a billion-row log table | 3 minutes | Daily partitioning + parallel query | 8 seconds |
Combining the strategies above makes it possible to address GROUP BY performance bottlenecks systematically. In practice, use EXPLAIN analysis together with A/B testing to choose the optimization that best fits the business scenario.
7. Expand knowledge
- NULL value handling: GROUP BY treats NULL as a group of its own.
- Combined with sorting: Use ORDER BY after grouping to sort the results (e.g. in descending order of average salary).
- Dynamic grouping: Use CASE WHEN to implement conditional grouping (e.g. statistics by salary range).
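The three points can be combined in one query. The following is a sketch on the employees table; the band boundaries 5000 and 10000 and the labels are hypothetical.

```sql
-- Salary bands are illustrative; NULL salaries get their own labeled group.
SELECT CASE
           WHEN salary IS NULL THEN 'unknown'
           WHEN salary < 5000  THEN 'low'
           WHEN salary < 10000 THEN 'medium'
           ELSE 'high'
       END AS salary_band,
       COUNT(*)    AS emp_count,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY salary_band          -- MySQL allows grouping by a SELECT alias
ORDER BY avg_salary DESC;     -- sort the groups by their average salary
```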
By combining these features flexibly, GROUP BY can meet complex data analysis needs. In practice, pair it with index optimization and careful query design to improve execution efficiency.