# PageRank Algorithm: Parallel Implementation

1. How the PageRank algorithm is parallelized

2. Distributed programming with MapReduce

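## 1. How PageRank Is Distributed

Put simply, the distributed version of PageRank achieves parallelism by expressing the computation as matrix operations.

1). Store each column of the adjacency matrix as one data row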

$$
\begin{bmatrix}
0.0375000 & 0.0375 & 0.0375 & 0.0375 \\
0.3208333 & 0.0375 & 0.0375 & 0.8875 \\
0.3208333 & 0.4625 & 0.0375 & 0.0375 \\
0.3208333 & 0.4625 & 0.8875 & 0.0375
\end{bmatrix}
$$

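| Key (column number) | Value (entries of that column) |
|---|---|
| `1` | `0.037499994,0.32083333,0.32083333,0.32083333` |
| `2` | `0.037499994,0.037499994,0.4625,0.4625` |
| `3` | `0.037499994,0.037499994,0.037499994,0.88750005` |
| `4` | `0.037499994,0.88750005,0.037499994,0.037499994` |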
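The column-to-row conversion above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `columns_as_records` and the tab-separated `key<TAB>v1,v2,...` layout are inferred from the sample rows, not taken from the original implementation.

```python
# Matrix from the text: entry [i][j] is the weight from page j+1 to page i+1.
G = [
    [0.0375000, 0.0375, 0.0375, 0.0375],
    [0.3208333, 0.0375, 0.0375, 0.8875],
    [0.3208333, 0.4625, 0.0375, 0.0375],
    [0.3208333, 0.4625, 0.8875, 0.0375],
]


def columns_as_records(matrix):
    """Return one 'column_number<TAB>v1,v2,...' line per matrix column.

    Hypothetical helper: keys are 1-based column numbers, values are the
    column entries from top to bottom.
    """
    records = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        records.append(f"{j + 1}\t" + ",".join(str(v) for v in column))
    return records


records = columns_as_records(G)
for line in records:
    print(line)
```

Storing one column per record means a mapper that receives a record has every weight leaving one page, which is all it needs to distribute that page's pr value.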

2). Iterate: approximate the principal eigenvector of the matrix (the pr values) by repeated multiplication

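Map phase:

• input: the adjacency matrix and the pr values

• output: key is the row number of pr; value is the product terms of the adjacency matrix and the pr values that still have to be summed

Reduce phase:

• input: key is the row number of pr; value is the product terms of the adjacency matrix and the pr values that still have to be summed

• output: key is the row number of pr; value is the computed result, i.e. the new pr value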
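One way to read the map and reduce phases above is the following single-process sketch in plain Python. It is not Hadoop code, and the names `COLUMNS`, `map_phase`, `reduce_phase`, and `iterate` are hypothetical; it assumes each mapper sees one stored column together with the pr value of that column's page.

```python
from collections import defaultdict

# Column records as stored in step 1: key = column number, value = that column.
COLUMNS = {
    1: [0.0375, 0.3208333, 0.3208333, 0.3208333],
    2: [0.0375, 0.0375, 0.4625, 0.4625],
    3: [0.0375, 0.0375, 0.0375, 0.8875],
    4: [0.0375, 0.8875, 0.0375, 0.0375],
}


def map_phase(columns, pr):
    """Emit (row number, partial product) for every matrix entry."""
    for col, values in columns.items():
        for row, value in enumerate(values, start=1):
            yield row, value * pr[col]


def reduce_phase(pairs):
    """Sum all partial products that share the same row number."""
    sums = defaultdict(float)
    for row, partial in pairs:
        sums[row] += partial
    return dict(sums)


def iterate(columns, pr):
    """One PageRank iteration: map, then reduce."""
    return reduce_phase(map_phase(columns, pr))


pr0 = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}  # initial pr values, all ones
pr1 = iterate(COLUMNS, pr0)
print(pr1)
```

The grouping by row number that `reduce_phase` does by hand is what the shuffle step of a MapReduce framework would provide between the two phases.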

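$$
\begin{bmatrix}
0.0375000 & 0.0375 & 0.0375 & 0.0375 \\
0.3208333 & 0.0375 & 0.0375 & 0.8875 \\
0.3208333 & 0.4625 & 0.0375 & 0.0375 \\
0.3208333 & 0.4625 & 0.8875 & 0.0375
\end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 0.150000 \\ 1.283333 \\ 0.858333 \\ 1.708333 \end{bmatrix}
$$

$$
\begin{bmatrix}
0.0375000 & 0.0375 & 0.0375 & 0.0375 \\
0.3208333 & 0.0375 & 0.0375 & 0.8875 \\
0.3208333 & 0.4625 & 0.0375 & 0.0375 \\
0.3208333 & 0.4625 & 0.8875 & 0.0375
\end{bmatrix}
\begin{bmatrix} 0.150000 \\ 1.283333 \\ 0.858333 \\ 1.708333 \end{bmatrix}
=
\begin{bmatrix} 0.1500000 \\ 1.6445833 \\ 0.7379167 \\ 1.4675000 \end{bmatrix}
$$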

… 10 iterations

$$
\begin{bmatrix} 0.1500000 \\ 1.4955721 \\ 0.8255034 \\ 1.5289245 \end{bmatrix}
$$
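The repeated multiplication can be reproduced locally with a short Python sketch. The helper names `mat_vec` and `power_iterate` are hypothetical, and the step count of 10 with an all-ones start vector is an assumption about how the figures above were produced.

```python
# Matrix from the text, written row by row.
G = [
    [0.0375000, 0.0375, 0.0375, 0.0375],
    [0.3208333, 0.0375, 0.0375, 0.8875],
    [0.3208333, 0.4625, 0.0375, 0.0375],
    [0.3208333, 0.4625, 0.8875, 0.0375],
]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_iterate(matrix, start, steps):
    """Apply the matrix to the start vector `steps` times."""
    pr = list(start)
    for _ in range(steps):
        pr = mat_vec(matrix, pr)
    return pr


pr = power_iterate(G, [1.0, 1.0, 1.0, 1.0], 10)
print([round(x, 7) for x in pr])
```

Because every column of the matrix sums to 1, the entries of the pr vector keep summing to 4 at every step, which is a handy sanity check when debugging a distributed run.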

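3). Normalize the PR values

$$
\begin{bmatrix} 0.150000 \\ 1.4955721 \\ 0.8255034 \\ 1.5289245 \end{bmatrix}
\Big/ \,(0.15 + 1.4955721 + 0.8255034 + 1.5289245)
=
\begin{bmatrix} 0.0375000 \\ 0.3738930 \\ 0.2063759 \\ 0.3822311 \end{bmatrix}
$$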
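The normalization is a division of each pr value by the sum of all of them. A minimal Python check using the figures from the text (the variable names are illustrative only):

```python
# pr values after the iterations, as given in the text.
pr = [0.1500000, 1.4955721, 0.8255034, 1.5289245]

total = sum(pr)                       # 4.0 up to floating-point rounding
normalized = [x / total for x in pr]  # entries now sum to 1

print([round(x, 7) for x in normalized])
```

After this step the values can be read as each page's share of the total rank.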

MapReduce workflow breakdown
