Data science Software Course Training in Ameerpet Hyderabad

Wednesday 3 May 2017

Pig: Word Count Using Pig Data Flow

Word count using a Pig data flow:

[cloudera@quickstart ~]$ cat comment
hadoop is great
spark is great
hadoop and spark combination is great
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal comment piglab
[cloudera@quickstart ~]$

grunt> ls piglab
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/comment<r 1> 69
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/emp<r 1> 158
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/results1 <dir>
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/results2 <dir>
grunt> cat piglab/comment
hadoop is great
spark is great
hadoop and spark combination is great
grunt>


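 -- Load each line of the file as a single chararray field.
 lines = load 'piglab/comment' as (line:chararray);

 -- Split each line into words; FLATTEN turns each bag of words into one row per word.
 words = foreach lines generate
         FLATTEN(TOKENIZE(line)) as word;
 -- Collect identical words into one group.
 gwords = group words by word;
 -- Emit each word with the number of rows in its group.
 wcnt = foreach gwords generate
        group as word, COUNT(words) as cnt;
 dump wcnt;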

(is,3)
(and,1)
(great,3)
(spark,2)
(hadoop,2)
(combination,1)
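To make each relation in the script concrete, here is a small Python sketch that mimics the same data flow on the three sample lines in memory. It is only a local model, not how Pig executes the script. The function name `word_count` is hypothetical, and Pig's TOKENIZE is approximated by plain whitespace splitting, which is enough for this input.

```python
from typing import Dict, List

# The three lines of the sample file piglab/comment shown above.
COMMENT = [
    "hadoop is great",
    "spark is great",
    "hadoop and spark combination is great",
]


def word_count(lines: List[str]) -> Dict[str, int]:
    # words = foreach lines generate FLATTEN(TOKENIZE(line)) as word;
    # Assumption: TOKENIZE is approximated here by str.split() on whitespace.
    words = [word for line in lines for word in line.split()]

    # gwords = group words by word;
    # Each key maps to the bag of matching words, like Pig's (group, words) tuples.
    gwords: Dict[str, List[str]] = {}
    for word in words:
        gwords.setdefault(word, []).append(word)

    # wcnt = foreach gwords generate group as word, COUNT(words) as cnt;
    return {group: len(bag) for group, bag in gwords.items()}


if __name__ == "__main__":
    # Print in the same (word,cnt) shape that dump uses.
    for word, cnt in word_count(COMMENT).items():
        print(f"({word},{cnt})")
```

The counts match the dump above. The sketch prints words in first-seen order, while the row order of Pig's output may differ, since it depends on how the grouped data is distributed.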
