SoFunction
Updated on 2025-05-12

Will index failure occur when the Mysql field is NULL

In MySQL, the field containing a NULL value itself does not directly cause the index to fail, butOperations involving NULL in query conditions may affect the use of indexes, depending on the data distribution and the selection of the optimizer.

The following is a detailed analysis:

1. The mechanism for processing NULL values ​​by index

Index Store NULL Values

  • If the field is allowed to be NULL and has an index,NULL values ​​will be recorded in the index
  • The B+Tree index of InnoDB treats NULL as a special value that coexists with other non-NULL values ​​in the index structure.

Exceptions to the only index

  • For unique indexes (UNIQUE), MySQL allows insertion of multiple NULL values ​​because NULL is considered "unknown value" and does not conflict with each other.
  • For example, a unique indexUNIQUE(email)Multiple lines are allowedemailis NULL.

2. Scenarios involving NULL in query conditions

1. IS NULL or IS NOT NULL

Whether to use indexes

Depends on the distribution ratio of the NULL value. The optimizer will decide whether to use the index based on statistics such as index cardinality.

  • High NULL ratio: If most of the values ​​of a column are NULL, the optimizer may consider that the full table scan is faster than the index scan, thus giving up the index.
  • Low NULL ratio: If the NULL value is small, the optimizer may choose to locate data through index.

Example

-- Assumptions `address` column has index and 90% The value of NULL
EXPLAIN SELECT * FROM users WHERE address IS NULL;

resulttypeListed asALL(Full table scan), the index does not take effect.

2. Equivalent query (= NULL or = value)

= NULLinvalid

In SQL standard= NULLWill returnUNKNOWN, should be usedIS NULL

Non-NULL equivalent query

SELECT * FROM users WHERE email = 'user@';  -- like email Indexed and not NULL,Index takes effect

3. Range query or comparison operator

<, >, BETWEEN

If a NULL value is included in the query condition, the optimizer may cause the index to be abandoned.

For example:

-- Assumptions `price` There is an index and some values ​​are NULL
SELECT * FROM products WHERE price &gt; 100;  -- NULL Values ​​will be filtered,But whether the index takes effect depends on the non NULL Distribution of values

3. The impact of data distribution on index use

The optimizer passes statistics (e.g.cardinality) Evaluate query cost. The following scenarios may cause index failure:

High NULL ratio

If most of the values ​​of a column are NULL, the optimizer believes that the full table scan is faster.

Low distinction

Even if the column is not NULL, but the value repeat rate is high (such as gender columns), the optimizer may abandon the index.

4. Methods to verify whether the index takes effect

useEXPLAINAnalyze query plan:

EXPLAIN SELECT * FROM users WHERE address IS NULL;

Key fields

  • typereforrangeIndicates the use of index,ALLIndicates full table scan.
  • key: Displays the actual index used.
  • Extra: If displayedUsing index condition, means that the index pushdown (ICP) is effective.

5. Optimization suggestions

Avoid storing large amounts of NULL in index columns

If NULL has no practical meaning, the field can be set toNOT NULLAnd assign default values ​​(such as empty string, 0).

For example:

ALTER TABLE users MODIFY address VARCHAR(100) NOT NULL DEFAULT '';
  • Override index optimizationIS NULLQuery
CREATE INDEX idx_address ON users (address) INCLUDE (name);  -- MySQL 8.0+ support INCLUDE

If you need to query frequentlyIS NULL, create overlay indexes that contain query fields to avoid back to tables.

  • Forced index use
SELECT * FROM users USE INDEX (idx_address) WHERE address IS NULL;
  • Regularly update statistics
ANALYZE TABLE users;  -- Update index statistics,Help optimizers make more accurate decisions

VI. Sample Analysis

1. Data table structure

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  salary INT,
  bonus INT,  -- allow NULL,and 80% The value of NULL
  INDEX idx_bonus (bonus)
);

2. Query the scene

-- Query bonus for NULL Employees
EXPLAIN SELECT * FROM employees WHERE bonus IS NULL;

Possible results: The optimizer selects full table scan (type: ALL), because the NULL value accounts for too high.

3. Optimization plan

  • Plan 1:forbonusSet the default value 0 to reduce the NULL ratio.
  • Plan 2: Forced use of indexes (need to test whether performance is improved):
SELECT * FROM employees USE INDEX (idx_bonus) WHERE bonus IS NULL;

Summarize

  • The index will not be invalidated due to the NULL value in the field, but when query conditions involve NULL, the optimizer may abandon the index due to the data distribution.
  • Key factors: The ratio of NULL value, query condition type, index design.
  • Optimization direction: Reduce NULL values, design indexes reasonably, use overlay indexes or update statistics.

By rationally designing the table structure and index, the query performance containing NULL value fields can be significantly improved.

The above is personal experience. I hope you can give you a reference and I hope you can support me more.