离散分布随机采样Alias Method

目标:将在一个离散分布上采样的时间复杂度由O(n)/O(log(n))优化至O(1)

wikipedia:
“In computing, the alias method is a family of efficient algorithms for sampling from a discrete probability distribution, published in 1974 by A. J. Walker.”

Darts, Dice, and Coins: Sampling from a Discrete Distribution
Alias Method离散分布随机取样
The Alias Method: Efficient Sampling with Many Discrete Outcomes

1. 常用方法

1
2
3
4
5
6
7
8
import numpy as np

p = np.linspace(1,10)
cdf = np.cumsum(p)
cdf

samp_k = sum(cdf < np.random.rand()*cdf[len(p)-1]) + 1;
samp_k

2. Alias Method

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
import numpy        as np
import numpy.random as npr

def alias_setup(probs):
K = len(probs)
q = np.zeros(K)
J = np.zeros(K, dtype=np.int)

smaller = []
larger = []
for kk, prob in enumerate(probs):
q[kk] = K*prob
if q[kk] < 1.0:
smaller.append(kk)
else:
larger.append(kk)

while len(smaller) > 0 and len(larger) > 0:
small = smaller.pop()
large = larger.pop()

J[small] = large
q[large] = q[large] - (1.0 - q[small])

if q[large] < 1.0:
smaller.append(large)
else:
larger.append(large)

return J, q

def alias_draw(J, q):
K = len(J)

kk = int(np.floor(npr.rand()*K))

if npr.rand() < q[kk]:
return kk
else:
return J[kk]

K = 5
N = 1000

probs = npr.dirichlet(np.ones(K), 1).ravel()

J, q = alias_setup(probs)

X = np.zeros(N)
for nn in range(N):
X[nn] = alias_draw(J, q)
`