SreeRam Hadoop Notes: Pig : Word Count Using Pig Data Flow

Wednesday, 3 May 2017

Pig : Word Count Using Pig Data Flow

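Word count using a Pig data flow: copy a small text file into HDFS, then load, tokenize, group, and count it from the Grunt shell.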

[cloudera@quickstart ~]$ cat comment
hadoop is great
spark is great
hadoop and spark combination is great
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal comment piglab
[cloudera@quickstart ~]$

grunt> ls piglab
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/comment<r 1> 69
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/emp<r 1> 158
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/results1 <dir>
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/results2 <dir>
grunt> cat piglab/comment
hadoop is great
spark is great
hadoop and spark combination is great
grunt>

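-- Load each line of the file as a single chararray field.
lines = load 'piglab/comment' as (line:chararray);

-- Split each line into words and flatten the bag, giving one word per row.
words = foreach lines generate
            FLATTEN(TOKENIZE(line)) as word;
-- Group identical words together, then count the rows in each group.
gwords = group words by word;
wcnt = foreach gwords generate
            group as word, COUNT(words) as cnt;

-- Print the result.
dump wcnt;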

(is,3)
(and,1)
(great,3)
(spark,2)
(hadoop,2)
(combination,1)
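
To sanity-check these counts without a cluster, the same flow can be sketched in plain Python. This is only an illustration of the tokenize, group, count steps, not how Pig executes them; the variable names simply mirror the relations used in the script, and the input is typed in by hand rather than read from HDFS.

```python
from collections import defaultdict

# Same three lines as piglab/comment (hard-coded here for the sketch).
lines = [
    "hadoop is great",
    "spark is great",
    "hadoop and spark combination is great",
]

# words: like FLATTEN(TOKENIZE(line)), one word per row.
# str.split() splits on whitespace only, while TOKENIZE also splits on
# a few other characters (such as commas); this input only has spaces.
words = [word for line in lines for word in line.split()]

# gwords: like "group words by word", each key maps to its bag of rows.
gwords = defaultdict(list)
for word in words:
    gwords[word].append(word)

# wcnt: like COUNT(words), the size of each bag.
wcnt = {word: len(bag) for word, bag in gwords.items()}

# Sorted only to make the printout stable; Pig's dump order is not sorted.
for word, cnt in sorted(wcnt.items()):
    print((word, cnt))
```

The six (word, count) pairs match the dump output above, though the order differs.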
