Word Count Using Pig DataFlow:
[cloudera@quickstart ~]$ cat comment
hadoop is great
spark is great
hadoop and spark combination is great
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal comment piglab
[cloudera@quickstart ~]$
grunt> ls piglab
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/comment<r 1> 69
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/emp<r 1> 158
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/results1 <dir>
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/results2 <dir>
grunt> cat piglab/comment
hadoop is great
spark is great
hadoop and spark combination is great
grunt> lines = load 'piglab/comment' as (line:chararray);
grunt> words = foreach lines generate FLATTEN(TOKENIZE(line)) as word;
grunt> gwords = group words by word;
grunt> wcnt = foreach gwords generate group as word, COUNT(words) as cnt;
grunt> dump wcnt;
(is,3)
(and,1)
(great,3)
(spark,2)
(hadoop,2)
(combination,1)
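As a quick sanity check, the same counts can be reproduced locally with standard Unix tools, without touching HDFS. This is only a sketch: the scratch file and the `counts` variable are names made up for this illustration, and the lines come out in alphabetical order rather than in the order Pig printed them.

```shell
# Recreate the three-line sample input in a scratch file (hypothetical location).
tmpfile=$(mktemp)
printf '%s\n' 'hadoop is great' 'spark is great' \
  'hadoop and spark combination is great' > "$tmpfile"

# tr puts one word per line (roughly what FLATTEN(TOKENIZE(line)) does),
# sort + uniq -c plays the role of group + COUNT,
# awk reshapes each line into Pig's (word,cnt) tuple notation.
counts=$(tr -s ' ' '\n' < "$tmpfile" | sort | uniq -c | awk '{print "(" $2 "," $1 ")"}')
echo "$counts"

rm -f "$tmpfile"
```

The six tuples should match the Pig output above, only in a different order.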
------------------------------------