SoFunction
Updated on 2025-04-23

Advanced usage of GROUP BY in MySQL

In MySQL, GROUP BY is the core feature for aggregate analysis. It groups a result set by the specified columns and, combined with aggregate functions, computes statistics for each group. The following walks through it from basic syntax to advanced usage:

1. Basic syntax and core functions

SELECT grouping_columns, aggregate_function(column)
FROM table_name
[WHERE condition]
GROUP BY grouping_columns
[HAVING group_filter_condition]
[ORDER BY sort_columns];

Core functions

  • Data grouping: divide rows into logical groups by the value of one or more columns.
  • Aggregation: apply an aggregate function (e.g. COUNT, SUM, AVG, MAX, MIN) to each group to compute statistics.
  • Result filtering: use HAVING to filter the grouped results (unlike WHERE, which filters rows before grouping).

2. Basic usage examples

1. Single column grouping statistics

Count the employees and compute the average salary in each department:

SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

2. Multi-column combination grouping

Count employees by department and job title:

SELECT department, job_title, COUNT(*) AS emp_count
FROM employees
GROUP BY department, job_title;

3. Use in combination with WHERE

Compute the average salary per department, counting only employees whose salary exceeds RMB 2,000:

SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 2000
GROUP BY department;

3. Advanced features and extensions

1. HAVING clause filters groups

Keep only the departments with more than 5 employees:

SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING emp_count > 5;

2. WITH ROLLUP generates summary rows

Generate a salary subtotal for each department and a grand total, in addition to the sum for each department and position:

SELECT department, job_title, SUM(salary) AS total_salary
FROM employees
GROUP BY department, job_title WITH ROLLUP;
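
The summary rows produced by WITH ROLLUP carry NULL in the rolled-up columns, which is hard to tell apart from genuine NULL values in the data. The following is a minimal sketch, assuming MySQL 8.0.1 or later (where the GROUPING() function is available) and the same employees table; the label strings are illustrative only:

```sql
SELECT
    -- GROUPING(col) returns 1 on rows where col has been rolled up
    IF(GROUPING(department), 'ALL DEPARTMENTS', department) AS department,
    IF(GROUPING(job_title),  'ALL TITLES',      job_title)  AS job_title,
    SUM(salary) AS total_salary
FROM employees
GROUP BY department, job_title WITH ROLLUP;
```

The last row of the result is the grand total, and each department's block ends with its own subtotal row.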

3. GROUP_CONCAT merges column values

List all products purchased by each user (comma-separated):

SELECT user_id, GROUP_CONCAT(product_name SEPARATOR ', ') AS products
FROM orders
GROUP BY user_id;
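
GROUP_CONCAT also accepts DISTINCT and an ORDER BY inside the call, and its result is truncated at group_concat_max_len (1024 bytes by default). A small sketch under the same assumed orders table; the value 8192 is only an illustration:

```sql
-- Raise the result length limit for the current session
SET SESSION group_concat_max_len = 8192;

SELECT user_id,
       GROUP_CONCAT(DISTINCT product_name      -- drop repeated product names
                    ORDER BY product_name      -- stable, alphabetical order
                    SEPARATOR ', ') AS products
FROM orders
GROUP BY user_id;
```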

4. Group by expression/function

Count orders by year:

SELECT YEAR(order_date) AS year, COUNT(*) AS order_count
FROM orders
GROUP BY YEAR(order_date);

4. Precautions and common errors

ONLY_FULL_GROUP_BY mode
MySQL 5.7.5 and later enable this mode by default. It requires every non-aggregated column in SELECT to appear in GROUP BY (or to be functionally dependent on the GROUP BY columns); otherwise the query fails with an error.

-- Error example (salary is neither aggregated nor grouped)
SELECT department, salary FROM employees GROUP BY department;
-- Fix: wrap the column in an aggregate function or add it to GROUP BY
SELECT department, MAX(salary) FROM employees GROUP BY department;
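
When any single value from the group is acceptable, ANY_VALUE() is another way to satisfy this mode without changing the grouping. A hedged sketch against the same employees table (the alias sample_salary is made up for illustration):

```sql
-- Check whether ONLY_FULL_GROUP_BY is active in the current session
SELECT @@SESSION.sql_mode;

-- ANY_VALUE() suppresses the error; which row's salary is returned
-- for each department is not deterministic
SELECT department, ANY_VALUE(salary) AS sample_salary
FROM employees
GROUP BY department;
```

This is only appropriate when the chosen value truly does not matter; otherwise an explicit aggregate such as MAX() or MIN() states the intent more clearly.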

The difference between WHERE and HAVING

  • WHERE filters rows before grouping and cannot contain aggregate functions.
  • HAVING filters groups after grouping and is where conditions on aggregate results belong.
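
The two clauses can appear in the same query, each doing its own job. A minimal sketch, assuming the employees table also has a hire_date column (hypothetical, not shown elsewhere in this article) and using illustrative thresholds:

```sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'   -- row filter: applied before grouping
GROUP BY department
HAVING AVG(salary) > 8000         -- group filter: applied after aggregation
ORDER BY avg_salary DESC;
```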

Performance optimization suggestions

  • Create an index on the grouping column (e.g. ALTER TABLE employees ADD INDEX (department)).
  • Avoid grouping a large table directly; first narrow the data with a WHERE filter, a subquery, or a temporary table.

5. Classic case scenarios

1. Aggregate by time dimension

Total sales per month:

SELECT YEAR(sale_date) AS year, MONTH(sale_date) AS month, SUM(amount) AS total_amount
FROM sales
GROUP BY year, month;

2. Multi-level statistics

Compute each customer's total and average order amount per year:

SELECT customer_id, YEAR(order_date) AS order_year,
       SUM(total_amount) AS total_spent, AVG(total_amount) AS avg_order
FROM orders
GROUP BY customer_id, YEAR(order_date);

3. Duplicate detection

Find email addresses used by more than one user:

SELECT email, COUNT(*) AS user_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
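
The grouped query only reports which values are duplicated. To inspect the individual rows behind them, one option is to join the grouped result back to the table. A sketch, assuming users has an id primary key (an assumption; the article does not show the table definition):

```sql
SELECT u.id, u.email
FROM users AS u
JOIN (
    -- email values that occur more than once
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS dup ON dup.email = u.email
ORDER BY u.email, u.id;
```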

6. Optimization of aggregation efficiency

Optimizing GROUP BY aggregation in MySQL involves several dimensions: index design, query logic, and the characteristics of the execution engine. The following summarizes 9 core optimization strategies:

1. Index optimization strategy

Composite index that exactly matches the grouping columns
• Create a composite index whose columns match GROUP BY in the same order (e.g. an index on (a, b) for GROUP BY a, b). MySQL can then resolve the grouping by scanning the index, using a loose index scan when the query qualifies, which avoids a temporary table and can cut disk I/O sharply (by more than 90% in favorable cases).
• Typical case: when grouping by (department, job_title), a composite index idx_dept_job lets the query skip the full table scan and complete the grouping directly from the index.

Covering indexes avoid lookups back to the table
• Make sure the SELECT columns and the columns used in aggregate functions are all contained in the index. For example, with an index on (category, sales), a query such as SELECT category, SUM(sales) grouped by category can be computed entirely from the index without reading table rows.
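
A minimal sketch of the covering-index case, using a hypothetical product_sales table and index name (neither appears in the original):

```sql
-- Index columns: grouping column first, aggregated column second
CREATE INDEX idx_category_sales ON product_sales (category, sales);

EXPLAIN
SELECT category, SUM(sales) AS total_sales
FROM product_sales
GROUP BY category;
-- When the index covers the query, the Extra column of the plan
-- is expected to include "Using index"
```

Whether the optimizer actually chooses the index depends on the data and statistics, so the EXPLAIN output should be checked rather than assumed.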

Use a functional index for grouping expressions
• For grouping expressions (e.g. YEAR(date_col)), create an indexed generated column (MySQL 5.7+) or a functional index (MySQL 8.0.13+). For example:

ALTER TABLE orders ADD COLUMN year_date INT AS (YEAR(order_date)) VIRTUAL;
CREATE INDEX idx_year ON orders(year_date);
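
An alternative to the generated column is a functional index, which indexes the expression directly. A sketch assuming MySQL 8.0.13 or later; the index name idx_order_year is illustrative:

```sql
-- Functional index: note the double parentheses around the expression
CREATE INDEX idx_order_year ON orders ((YEAR(order_date)));

-- The GROUP BY expression must match the indexed expression
-- for the optimizer to consider the index
SELECT YEAR(order_date) AS order_year, COUNT(*) AS order_count
FROM orders
GROUP BY YEAR(order_date);
```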

2. Query design and execution optimization

Reduce the number and complexity of grouping columns
• Each additional grouping column widens the sort key and usually increases the number of groups, so sorting and temporary tables become more expensive. Prefer merging related columns (for example, combine province and city into a single region column).
• Avoid applying functions to columns in GROUP BY, because an ordinary index on the raw column can then no longer be used for grouping. Group on a precomputed column instead, e.g. replace GROUP BY DATE(created_at) with GROUP BY created_at_date, where created_at_date is a precomputed column.

Staged filtering and aggregation
• Filter out irrelevant rows first, then group what remains:

SELECT department, AVG(salary) AS avg_salary
FROM (SELECT department, salary FROM employees WHERE salary > 5000) AS filtered
GROUP BY department;  -- Reportedly about 40% more efficient than filtering with HAVING

In-memory sorting and temporary table tuning
• Adjust the tmp_table_size and max_heap_table_size parameters (the suggested setting is 20% of physical memory) so that temporary tables do not spill to disk. These limits apply to each in-memory temporary table, so account for concurrent connections when sizing them.
• Monitor the Created_tmp_disk_tables status variable. If on-disk temporary tables appear frequently, optimize the indexes or split the query.
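
The monitoring and tuning steps above might look like this in practice. The 64 MB value is purely illustrative and should be sized for the actual workload:

```sql
-- Compare in-memory and on-disk temporary table counts since server start
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- Raise both limits for the current session only
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
```

A high ratio of Created_tmp_disk_tables to Created_tmp_tables suggests that grouping queries are spilling to disk.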

3. Advanced optimization technology

Partitioned tables speed up large-data processing
• Partition by time or business dimension (for example, by month) so that GROUP BY scans only the relevant partitions. For example, after partitioning a 1-billion-row log table by event_date, monthly statistics that filter on event_date drop from minutes to seconds.
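
A sketch of monthly range partitioning, using a hypothetical event_log table (the table name, columns, and partition names are assumptions for illustration):

```sql
CREATE TABLE event_log (
    id         BIGINT      NOT NULL AUTO_INCREMENT,
    event_date DATE        NOT NULL,
    event_type VARCHAR(32) NOT NULL,
    -- the partitioning column must be part of every unique key
    PRIMARY KEY (id, event_date)
)
PARTITION BY RANGE COLUMNS (event_date) (
    PARTITION p2025_01 VALUES LESS THAN ('2025-02-01'),
    PARTITION p2025_02 VALUES LESS THAN ('2025-03-01'),
    PARTITION p_max    VALUES LESS THAN (MAXVALUE)
);

-- The range condition on event_date lets the optimizer prune
-- the scan down to partition p2025_01
SELECT event_type, COUNT(*) AS events
FROM event_log
WHERE event_date >= '2025-01-01' AND event_date < '2025-02-01'
GROUP BY event_type;
```

Pruning only happens when the WHERE clause constrains the partitioning column; a GROUP BY without such a filter still reads every partition.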

Materialized views and result caching
• For high-frequency aggregate queries, use a materialized view. MySQL has no native materialized views, so emulate one with a summary table (e.g. CREATE TABLE mv AS SELECT ...) that is refreshed regularly, reducing the pressure of real-time calculation.
• Cache repeated query results in the application layer (for example, cache daily summary data in Redis) to reduce database load.
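
One way to hand-maintain such a summary is a dedicated table that a scheduled job refreshes. The sketch below reuses the sales table from the earlier examples; the daily_sales_summary table and the daily schedule are assumptions:

```sql
-- Summary table acting as a manually refreshed materialized view
CREATE TABLE daily_sales_summary (
    sale_day     DATE           NOT NULL PRIMARY KEY,
    total_amount DECIMAL(18, 2) NOT NULL
);

-- Refresh step, run on a schedule (cron job or MySQL event):
-- recompute yesterday's row, replacing it if it already exists
REPLACE INTO daily_sales_summary (sale_day, total_amount)
SELECT DATE(sale_date), SUM(amount)
FROM sales
WHERE sale_date >= CURRENT_DATE - INTERVAL 1 DAY
  AND sale_date <  CURRENT_DATE
GROUP BY DATE(sale_date);
```

Reports then read from daily_sales_summary instead of aggregating the full sales table on every request.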

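Parallel query (vendor-specific)
• Community MySQL 8.0 has no parallel_query optimizer switch and does not parallelize GROUP BY in general. Some MySQL-compatible services offer parallel query through their own settings; on Amazon Aurora MySQL, for example:

SET SESSION aurora_parallel_query = ON;  -- Aurora MySQL only
SELECT region, SUM(revenue) FROM sales GROUP BY region;  -- eligible queries are processed in parallel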

4. Diagnostic tools and precautions

Execution plan analysis
Use EXPLAIN FORMAT=JSON and check key indicators such as using_index (whether a covering index is used), using_temporary_table (whether a temporary table is used), and using_filesort (whether a filesort is needed).
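
As a sketch, the plan for one of the earlier grouping queries can be inspected like this (the comments describe where the indicators usually appear in the JSON output):

```sql
EXPLAIN FORMAT=JSON
SELECT department, job_title, COUNT(*) AS emp_count
FROM employees
GROUP BY department, job_title;
-- In the output, look under "grouping_operation" for
--   "using_temporary_table" and "using_filesort",
-- and under "table" for "using_index" and "access_type"
```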

Strict mode to avoid errors
Keep ONLY_FULL_GROUP_BY enabled so that misuse of non-aggregated columns cannot produce nondeterministic results.

Performance optimization comparison case

| Scenario | Time before optimization | Optimization | Time after optimization |
|---|---|---|---|
| Behavior analysis over millions of users | 12.8 s | Create a covering index on (user_id, action_time) | 1.2 s |
| Aggregation over a billion-row log table | 3 minutes | Daily partitioning + parallel query | 8 seconds |

Combining the strategies above addresses GROUP BY performance bottlenecks systematically. In practice, pair them with EXPLAIN analysis and A/B testing to select the optimization that best fits the business scenario.

7. Additional notes

  • NULL handling: GROUP BY treats NULL as a group of its own.
  • Combined sorting: use ORDER BY after grouping to sort the results (for example, in descending order of average salary).
  • Dynamic grouping: use CASE WHEN to implement conditional grouping (for example, statistics by salary range).
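
The three points can be combined in one query. A sketch against the employees table; the salary thresholds and band labels are illustrative assumptions:

```sql
SELECT
    CASE
        WHEN salary IS NULL THEN 'unknown'   -- give the NULL group a readable label
        WHEN salary < 5000  THEN 'low'
        WHEN salary < 10000 THEN 'mid'
        ELSE 'high'
    END AS salary_band,
    COUNT(*)    AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY salary_band          -- MySQL allows grouping by a SELECT alias
ORDER BY emp_count DESC;
```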

By combining these features flexibly, GROUP BY can meet complex data analysis needs. In practice, pair it with index optimization and careful query design to improve execution efficiency.
