Recall that the point of the job is to count how many times each word occurs in the input files.
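As a minimal sketch of what such a word-count job computes, the snippet below tallies words across a few in-memory strings that stand in for input files. The function name `count_words`, the sample strings, and the tokenization rule (lowercase, split on non-word characters) are assumptions for illustration, not part of the original job.

```python
from collections import Counter
import re

def count_words(texts):
    """Count word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Assumption: words are lowercase runs of word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

# Hypothetical stand-ins for the contents of two input files.
sample_files = ["the quick brown fox", "the lazy dog and the fox"]
word_counts = count_words(sample_files)
```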
In the previous job, the input, parser, and output steps were used to parse multiple XML files into relational records. The following steps describe how to create the assembly.
When input data is loaded into the Hadoop file system (HDFS), it is first partitioned and then distributed to map workers via the job tracker.
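The partition-then-distribute flow can be imitated on a single machine. The sketch below is not the Hadoop API: a thread pool plays the role of the job tracker handing splits to map workers, and the names `partition`, `map_worker`, and `run_job` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(lines, num_splits):
    """Split the input lines into num_splits round-robin chunks."""
    return [lines[i::num_splits] for i in range(num_splits)]

def map_worker(chunk):
    """Map step: produce partial word counts for one chunk."""
    partial = Counter()
    for line in chunk:
        partial.update(line.split())
    return partial

def run_job(lines, num_workers=2):
    """Partition the input, run map workers in parallel, merge the results."""
    splits = partition(lines, num_workers)
    # The pool stands in for the job tracker assigning splits to workers.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_worker, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

result = run_job(["a b a", "b c", "a"])
```

A real cluster would also shuffle intermediate keys to separate reduce workers; here the merge loop at the end of `run_job` takes that place.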