HyperLogLog

Cardinality Counting: HyperLogLog


1. 参考文章

Redis New Data Structure: HyperLogLog
Count-Distinct Problem


2. HyperLogLog in Redis

使用Redis中的数据结构HyperLogLog, 测试比较统计结果和实际数据之间的差异。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
package com.quantums.redis;

import redis.clients.jedis.Jedis;

/**
* Redis : HyperLogLog可以用于统计不重复的网页user view count
* 当数据到一定程度时,只能统计大概数据
*/
public class TestHyperLogLog {
private static Jedis jedis = JedisUtil.getJedis();


public static void main(String[] ar) {
/**初始化**/
jedis.del("homepage");
System.out.println("初始数量: "+jedis.pfcount("homepage"));

System.out.printf("%-13s %-14s %-10s\n","实际数据","HyperLogLog","误差率");

long actual = 0;
for (long i = 1; i <= 100000; i++) {
jedis.pfadd("homepage", "user" + i);
actual++;
if (actual % 10000 == 0) {
long hypers = jedis.pfcount("homepage");
double error = ((double) (actual - hypers)) / actual * 100;
System.out.printf("%-15d %-15d %-1.2f%%\n", actual, hypers, error);
}
}
}
}

测试结果:

1
2
3
4
5
6
7
8
9
10
11
实际数据        HyperLogLog     误差率       
10000 10064 -0.64%
20000 20100 -0.50%
30000 30222 -0.74%
40000 40165 -0.41%
50000 49827 0.35%
60000 59807 0.32%
70000 69995 0.01%
80000 80118 -0.15%
90000 89931 0.08%
100000 99715 0.29%