Why does my "set" contain only one element? For example, for the first 5 lines of input it should contain 4 elements, because those lines share the same URL and have four different IPs. I also tried a for-each loop instead of the Iterator, but that did not work either. Can anyone help?

Mapper

    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {
            private Text IP = new Text();
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                String[] tokens = line.split(",");
                word.set(tokens[2]);
                IP.set(tokens[0]);
                context.write(word, IP);
            }
        }

Reducer

        public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                Set<String> set = new HashSet<String>();
                Iterator<Text> iterator = values.iterator();
                while (iterator.hasNext()) {
                    set.add(iterator.next().toString());
                }
                int a = set.size();
                String str = String.format("%d", a);
                context.write(key, new Text(str));
            }
        }

Job

        public static void main(String[] args) throws Exception {
            Job job = new Job();
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
1 Answer
森林海
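The Reducer works fine, but the Combiner is not doing what you think it does. Here is what happens when the combiner is enabled:

Mapper output:
("GET / HTTP/1.1", "10.31.0.1")
("GET / HTTP/1.1", "10.31.0.2")

Combiner input:
("GET / HTTP/1.1", {"10.31.0.1", "10.31.0.2"})

Combiner output:
("GET / HTTP/1.1", "2") //You have the right answer here...

Reducer input:
("GET / HTTP/1.1", {"2"}) //...but then it gets passed into the Reducer again

Reducer output:
("GET / HTTP/1.1", "1")

Only one element reaches the Reducer (the count string produced by the Combiner), so the set has size one and the result is "1".

Remove the combiner (delete job.setCombinerClass(IntSumReducer.class);) and this will work.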
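The trace above can be reproduced without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class name CombinerPitfallDemo and the helper countDistinct are hypothetical names introduced here, and countDistinct only mimics the distinct-count logic of IntSumReducer.reduce.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation (no Hadoop involved): shows why reusing the
// distinct-count reducer as a combiner collapses the final result to 1.
final class CombinerPitfallDemo {

    // Mimics IntSumReducer.reduce: count distinct values, emit the count as a string.
    static String countDistinct(List<String> values) {
        Set<String> set = new HashSet<>(values);
        return String.valueOf(set.size());
    }

    public static void main(String[] args) {
        // Values the mapper emitted for the key "GET / HTTP/1.1".
        List<String> mapperOutput = List.of("10.31.0.1", "10.31.0.2");

        // Without a combiner, the reducer sees the raw IPs.
        System.out.println("no combiner:   " + countDistinct(mapperOutput)); // 2

        // With the combiner, the reducer only sees the combiner's single output "2".
        String combined = countDistinct(mapperOutput);
        System.out.println("with combiner: " + countDistinct(List.of(combined))); // 1
    }
}
```

The second call counts the distinct elements of a list holding the single string "2", which is why the job reports 1 for every key.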
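Other suggested changes:
Have the Reducer output an IntWritable instead of converting the number to Text. The map output value stays Text, so the job would then also need job.setMapOutputValueClass(Text.class) alongside job.setOutputValueClass(IntWritable.class).
Use a Set<Text> instead of a Set<String>, to save the expensive Text -> String conversions. Note that Hadoop reuses the same Text instance while iterating over values, so each value has to be copied (for example with new Text(value)) before it is added to the set.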
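If a combiner is still wanted to cut shuffle traffic, one option that goes beyond the answer above is a separate combiner that only removes duplicate IPs and leaves the counting to the Reducer. Its output has the same type as its input, so running it zero, one, or several times does not change the final count. The sketch below simulates that idea in plain Java under those assumptions; DedupCombinerDemo, combine, and reduce are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation (no Hadoop involved) of a combiner-safe design:
// the combiner deduplicates, the reducer counts.
final class DedupCombinerDemo {

    // Hypothetical combiner: emits each distinct IP once (IPs in, IPs out).
    static List<String> combine(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // Reducer: counts the distinct IPs among everything it receives.
    static int reduce(List<String> values) {
        Set<String> distinct = new LinkedHashSet<>(values);
        return distinct.size();
    }

    public static void main(String[] args) {
        // IPs emitted by two map tasks for the same URL key.
        List<String> split1 = List.of("10.31.0.1", "10.31.0.2", "10.31.0.1");
        List<String> split2 = List.of("10.31.0.2", "10.31.0.3");

        // Each map task's output is combined locally, then merged for the reducer.
        List<String> reducerInput = new ArrayList<>(combine(split1));
        reducerInput.addAll(combine(split2));

        System.out.println(reduce(reducerInput)); // 3
    }
}
```

The Reducer still deduplicates, because the same IP can arrive from different map tasks, as 10.31.0.2 does here.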
