In MySQL, the field containing a NULL value itself does not directly cause the index to fail, butOperations involving NULL in query conditions may affect the use of indexes, depending on the data distribution and the selection of the optimizer.
The following is a detailed analysis:
1. The mechanism for processing NULL values by index
Index Store NULL Values
- If the field is allowed to be NULL and has an index,NULL values will be recorded in the index。
- The B+Tree index of InnoDB treats NULL as a special value that coexists with other non-NULL values in the index structure.
Exceptions to the only index
- For unique indexes (
UNIQUE
), MySQL allows insertion of multiple NULL values because NULL is considered "unknown value" and does not conflict with each other. - For example, a unique index
UNIQUE(email)
Multiple lines are allowedemail
is NULL.
2. Scenarios involving NULL in query conditions
1. IS NULL or IS NOT NULL
Whether to use indexes:
Depends on the distribution ratio of the NULL value. The optimizer will decide whether to use the index based on statistics such as index cardinality.
- High NULL ratio: If most of the values of a column are NULL, the optimizer may consider that the full table scan is faster than the index scan, thus giving up the index.
- Low NULL ratio: If the NULL value is small, the optimizer may choose to locate data through index.
Example:
-- Assumptions `address` column has index and 90% The value of NULL EXPLAIN SELECT * FROM users WHERE address IS NULL;
result:type
Listed asALL
(Full table scan), the index does not take effect.
2. Equivalent query (= NULL or = value)
= NULL
invalid:
In SQL standard= NULL
Will returnUNKNOWN
, should be usedIS NULL
。
Non-NULL equivalent query:
SELECT * FROM users WHERE email = 'user@'; -- like email Indexed and not NULL,Index takes effect
3. Range query or comparison operator
<
, >
, BETWEEN
:
If a NULL value is included in the query condition, the optimizer may cause the index to be abandoned.
For example:
-- Assumptions `price` There is an index and some values are NULL SELECT * FROM products WHERE price > 100; -- NULL Values will be filtered,But whether the index takes effect depends on the non NULL Distribution of values
3. The impact of data distribution on index use
The optimizer passes statistics (e.g.cardinality
) Evaluate query cost. The following scenarios may cause index failure:
High NULL ratio:
If most of the values of a column are NULL, the optimizer believes that the full table scan is faster.
Low distinction:
Even if the column is not NULL, but the value repeat rate is high (such as gender columns), the optimizer may abandon the index.
4. Methods to verify whether the index takes effect
useEXPLAIN
Analyze query plan:
EXPLAIN SELECT * FROM users WHERE address IS NULL;
Key fields:
-
type
:ref
orrange
Indicates the use of index,ALL
Indicates full table scan. -
key
: Displays the actual index used. -
Extra
: If displayedUsing index condition
, means that the index pushdown (ICP) is effective.
5. Optimization suggestions
Avoid storing large amounts of NULL in index columns:
If NULL has no practical meaning, the field can be set toNOT NULL
And assign default values (such as empty string, 0).
For example:
ALTER TABLE users MODIFY address VARCHAR(100) NOT NULL DEFAULT '';
-
Override index optimization
IS NULL
Query:
CREATE INDEX idx_address ON users (address) INCLUDE (name); -- MySQL 8.0+ support INCLUDE
If you need to query frequentlyIS NULL
, create overlay indexes that contain query fields to avoid back to tables.
- Forced index use:
SELECT * FROM users USE INDEX (idx_address) WHERE address IS NULL;
- Regularly update statistics:
ANALYZE TABLE users; -- Update index statistics,Help optimizers make more accurate decisions
VI. Sample Analysis
1. Data table structure
CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(50), salary INT, bonus INT, -- allow NULL,and 80% The value of NULL INDEX idx_bonus (bonus) );
2. Query the scene
-- Query bonus for NULL Employees EXPLAIN SELECT * FROM employees WHERE bonus IS NULL;
Possible results: The optimizer selects full table scan (type: ALL
), because the NULL value accounts for too high.
3. Optimization plan
-
Plan 1:for
bonus
Set the default value 0 to reduce the NULL ratio. - Plan 2: Forced use of indexes (need to test whether performance is improved):
SELECT * FROM employees USE INDEX (idx_bonus) WHERE bonus IS NULL;
Summarize
- The index will not be invalidated due to the NULL value in the field, but when query conditions involve NULL, the optimizer may abandon the index due to the data distribution.
- Key factors: The ratio of NULL value, query condition type, index design.
- Optimization direction: Reduce NULL values, design indexes reasonably, use overlay indexes or update statistics.
By rationally designing the table structure and index, the query performance containing NULL value fields can be significantly improved.
The above is personal experience. I hope you can give you a reference and I hope you can support me more.