SoFunction
Updated on 2025-05-09

A Summary of Using HyperLogLog in Redis

Redis's HyperLogLog is a probabilistic data structure used to count the number of unique elements (the cardinality), for example the number of unique visitors (UV) to a website. Its main advantage is its very small memory footprint: no matter how many elements you add, a single HyperLogLog takes at most about 12 KB of memory.

Here is a detailed explanation of HyperLogLog:

🧠 1. What is HyperLogLog?

HyperLogLog is a cardinality estimation algorithm: it estimates the number of distinct elements in a collection (the cardinality). A Set stores every element it contains, whereas a HyperLogLog stores no elements at all, only a compact statistical state.

✅ Advantage: it counts unique elements at very large scale with extremely low memory usage
❌ Disadvantage: the count is an estimate, not an exact value; the standard error is 0.81%
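The 0.81% figure is the standard error of the HyperLogLog estimate, and it depends only on the number of registers (buckets) m. Redis uses m = 2^14, which gives:

```latex
\mathrm{SE} \approx \frac{1.04}{\sqrt{m}}, \qquad m = 2^{14} = 16384, \qquad \frac{1.04}{\sqrt{16384}} = \frac{1.04}{128} = 0.008125 \approx 0.81\%
```

Because this is a standard error, it describes the typical deviation rather than a hard bound: larger errors are possible, but increasingly unlikely.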

🛠️ 2. How to use

1. Add data

PFADD key element [element ...]
PFADD myloglog user1 user2 user3

2. Query the cardinality

PFCOUNT key [key ...]
PFCOUNT myloglog
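PFCOUNT accepts several keys. In that case it returns the estimated cardinality of their union, not the sum of the individual counts. This matters for UV statistics, because a user who visits on two different days should be counted once. The sketch below uses exact Python sets as a stand-in for two HyperLogLogs (the user IDs are made-up sample data) to show the difference:

```python
# Exact Python sets stand in for two daily HyperLogLogs (made-up sample data).
day1 = {"user1", "user2", "user3"}
day2 = {"user2", "user3", "user4"}

# Summing the per-day counts double-counts users who came back on day 2.
sum_of_counts = len(day1) + len(day2)

# A multi-key PFCOUNT estimates this instead: distinct users across both days.
union_count = len(day1 | day2)

print(sum_of_counts, union_count)  # prints: 6 4
```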

3. Merge multiple HyperLogLogs

PFMERGE destkey sourcekey1 [sourcekey2 ...]
PFMERGE totalLog userLog1 userLog2
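Merging is cheap because a HyperLogLog's state is just a fixed-size array of registers, each holding the longest zero-bit run seen for its bucket. Taking the per-register maximum gives exactly the state you would get by adding all the elements to a single HyperLogLog, which is why the merged key estimates the cardinality of the union. Below is a minimal sketch on plain Python lists; `merge_registers` is a hypothetical helper, and the 4-register arrays are illustrative only (Redis uses 16,384 registers in its own packed encoding).

```python
from typing import List


def merge_registers(*sources: List[int]) -> List[int]:
    """Register-wise maximum of several HyperLogLog register arrays (toy representation)."""
    if not sources:
        return []
    size = len(sources[0])
    if any(len(s) != size for s in sources):
        raise ValueError("all register arrays must have the same length")
    # zip(*sources) groups the i-th register of every source together.
    return [max(values) for values in zip(*sources)]


# Hypothetical register states for two keys (4 registers for readability).
user_log_1 = [0, 3, 1, 2]
user_log_2 = [2, 1, 1, 4]
print(merge_registers(user_log_1, user_log_2))  # prints: [2, 3, 1, 4]
```

If destkey already exists, Redis treats it as one more source, so its existing contents are included in the merged result.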

📊 3. Practical example

# Add some user IDs
PFADD uv:20250414 user1 user2 user3 user4 user5

# Query today's UV
PFCOUNT uv:20250414
# Returns 5

# Add duplicate user IDs (they are not counted again)
PFADD uv:20250414 user1 user2

# Check again
PFCOUNT uv:20250414
# Still 5
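The same daily-UV pattern can be driven from Python. This sketch assumes the redis-py client (`pip install redis`) and a reachable Redis server; the helper names `uv_key`, `week_keys`, `record_visit` and `weekly_uv` are hypothetical, and the key format follows the `uv:20250414` example above. The helpers take the client as a parameter, so the key-building logic can be checked without a server.

```python
from datetime import date, timedelta
from typing import List


def uv_key(day: date) -> str:
    """Daily key in the article's format, e.g. uv:20250414 (hypothetical helper)."""
    return f"uv:{day:%Y%m%d}"


def week_keys(last_day: date) -> List[str]:
    """Keys for the 7 days ending on last_day, newest first."""
    return [uv_key(last_day - timedelta(days=i)) for i in range(7)]


def record_visit(client, day: date, user_id: str) -> int:
    """PFADD one visitor; returns 1 if the HyperLogLog changed, else 0."""
    return client.pfadd(uv_key(day), user_id)


def weekly_uv(client, last_day: date) -> int:
    """Multi-key PFCOUNT: estimated unique visitors across the whole week."""
    return client.pfcount(*week_keys(last_day))
```

With a running server, usage would look like `r = redis.Redis(host="localhost", port=6379)`, then `record_visit(r, date(2025, 4, 14), "user1")` and `weekly_uv(r, date(2025, 4, 14))`. Passing all seven keys to one PFCOUNT call counts a user who visited on several days only once.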

🧮 4. How it works (optional reading)

HyperLogLog is a probabilistic algorithm. The core idea is:
Hash each element and look at the run of zero bits at the start of the hash value. The longest run observed so far is used to estimate the cardinality.

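  • The hash function spreads inputs uniformly, so each hash bit behaves like a fair coin flip
  • A hash that begins with k zero bits appears with probability 1/2^k, so seeing one suggests that roughly 2^k distinct elements have been added
  • Redis splits elements across 2^14 (16,384) registers (buckets), each tracking its own longest run, and combines them into a single estimate; this greatly reduces the variance of relying on one observation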
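To make the idea concrete, here is a toy HyperLogLog in pure Python. It is a teaching sketch under stated assumptions, not Redis's implementation: SHA-1 truncated to 64 bits stands in for Redis's hash function, registers are a plain list instead of Redis's packed encoding, and the estimator is the one from the original HyperLogLog paper with its small-range (linear counting) correction; Redis's own estimator differs in its details. The class and method names are hypothetical. Like Redis, it uses 2^14 registers: the low 14 bits of the hash choose the register, and the zero-bit run in the remaining bits sets the register's value. This sketch counts the run from the low end of those bits; since every hash bit is equally random, either end works the same way.

```python
import hashlib
import math
from typing import Iterable


class ToyHyperLogLog:
    """Teaching sketch of HyperLogLog; not Redis's implementation."""

    P = 14          # index bits, as in Redis
    M = 1 << P      # 16,384 registers
    Q = 64 - P      # remaining hash bits used for the zero-run length

    def __init__(self) -> None:
        self.registers = [0] * self.M

    @staticmethod
    def _hash64(element: str) -> int:
        # Stand-in hash: first 8 bytes of SHA-1 as an unsigned 64-bit integer.
        digest = hashlib.sha1(element.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little")

    def add(self, element: str) -> int:
        """Mimics PFADD's return value: 1 if a register changed, else 0."""
        h = self._hash64(element)
        index = h & (self.M - 1)   # low 14 bits pick the register
        rest = h >> self.P         # remaining 50 bits
        # rank = trailing zero bits + 1 (position of the lowest set bit);
        # an all-zero remainder gets the maximum rank Q + 1.
        rank = (rest & -rest).bit_length() if rest else self.Q + 1
        if rank > self.registers[index]:
            self.registers[index] = rank
            return 1
        return 0

    def add_all(self, elements: Iterable[str]) -> int:
        """Adds many elements; returns 1 if any register changed."""
        changed = 0
        for element in elements:
            changed |= self.add(element)
        return changed

    def count(self) -> int:
        m = self.M
        alpha = 0.7213 / (1 + 1.079 / m)
        # alpha * m * (harmonic mean of 2^register over all registers)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: linear counting on empty registers.
            return round(m * math.log(m / zeros))
        return round(raw)
```

Usage: create `hll = ToyHyperLogLog()`, call `hll.add_all(f"user{i}" for i in range(100_000))`, and `hll.count()` should land close to 100,000. Adding the same users again leaves every register unchanged, which is why duplicates are never counted twice.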

💡 5. Applicable scenarios

  • Website UV statistics (unique visitors per day or per hour)
  • Counting active users
  • Any deduplicated count where an exact result is not required

⚠️ 6. Things to note

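  • You cannot retrieve the elements that were added (they are not stored)
  • The count is an estimate, not an exact value; the standard error is 0.81%
  • Memory use is capped at about 12 KB per key no matter how many elements are added (a HyperLogLog that has seen only a few elements uses a compact sparse encoding and takes even less)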
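Where the 12 KB figure comes from: in its dense encoding, Redis keeps 16,384 registers of 6 bits each, and 6 bits can hold any run length up to 63. The arithmetic, written out in Python (the constant names are illustrative):

```python
REGISTERS = 2 ** 14      # 16,384 registers
BITS_PER_REGISTER = 6    # dense encoding: each register holds a value 0..63

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
print(dense_bytes, dense_bytes / 1024)  # prints: 12288 12.0
```

Redis stores a small header alongside the registers, so the actual value is slightly larger than 12,288 bytes.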

🧪 7. Comparison with Set

| Item | Set | HyperLogLog |
|---|---|---|
| Accuracy | Exact | Estimated (standard error 0.81%) |
| Memory | Grows with the data | At most about 12 KB |
| Deduplicated counting | ✅ | ✅ (probabilistic) |
| Viewing elements | ✅ | ❌ (elements are not stored) |
| Use cases | Small, controllable collections | Very large-scale counting |

