Data science Software Course Training in Ameerpet Hyderabad

Data science Software Course Training in Ameerpet Hyderabad

Wednesday, 3 May 2017

Pig : How to perform grouping by Multiple Columns


 how to perform grouping by multiple columns.
-------------------------------------------
 task: mutiple grouping with mulitiple aggregations .

 sql:
  select dno, sex , sum(sal) ,
         count(*), avg(sal), max(sal),
           min(sal) from emp
         group by dno, sex;

grunt> emp = load 'piglab/emp' using PigStorage(',')
>>   as (id:int, name:chararray,  sal:int,
>>     sex:chararray, dno:int);
grunt>

grunt> e = foreach emp generate dno, sex, sal;
grunt> grp = group e by dno, sex;
2017-05-03 18:54:14,501 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 25>  Syntax error, unexpected symbol at or near ';'
Details at logfile: /home/cloudera/pig_1493862740272.log
grunt>

above grouping is invalid.
 pig does not allow grouping by multiple fields.


solution:
  make mulitple fields as a tuple.
  and group it by the tuple field.

grunt> grp = group e by (dno,sex);
grunt> describe grp;
grp: {group: (dno: int,sex: chararray),e: {(dno: int,sex: chararray,sal: int)}}
grunt> dump grp
((11,m),{(11,m,50000),(11,m,40000)})
((12,f),{(12,f,50000)})
((12,m),{(12,m,40000),(12,m,10000),(12,m,50000)})
((13,f),{(13,f,80000),(13,f,90000)})

grunt> res = foreach grp generate
>>         group.dno as dno,
>>        group.sex as gender,
>>        SUM(e.sal) as tot,
>>        COUNT(e) as cnt,
>>       AVG(e.sal) as avg,
>>       MAX(e.sal) as max, MIN(e.sal) as min;
grunt>dump res
(11,m,90000,2,45000.0,50000,40000)
(12,f,50000,1,50000.0,50000,50000)
(12,m,100000,3,33333.333333333336,50000,10000)
(13,f,170000,2,85000.0,90000,80000)
-----------------------------------------












1 comment:

  1. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
    Big Data Hadoop Training in electronic city

    ReplyDelete