how to perform grouping by multiple columns.
-------------------------------------------
task: mutiple grouping with mulitiple aggregations .
sql:
select dno, sex , sum(sal) ,
count(*), avg(sal), max(sal),
min(sal) from emp
group by dno, sex;
grunt> emp = load 'piglab/emp' using PigStorage(',')
>> as (id:int, name:chararray, sal:int,
>> sex:chararray, dno:int);
grunt>
grunt> e = foreach emp generate dno, sex, sal;
grunt> grp = group e by dno, sex;
2017-05-03 18:54:14,501 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 25> Syntax error, unexpected symbol at or near ';'
Details at logfile: /home/cloudera/pig_1493862740272.log
grunt>
above grouping is invalid.
pig does not allow grouping by multiple fields.
solution:
make mulitple fields as a tuple.
and group it by the tuple field.
grunt> grp = group e by (dno,sex);
grunt> describe grp;
grp: {group: (dno: int,sex: chararray),e: {(dno: int,sex: chararray,sal: int)}}
grunt> dump grp
((11,m),{(11,m,50000),(11,m,40000)})
((12,f),{(12,f,50000)})
((12,m),{(12,m,40000),(12,m,10000),(12,m,50000)})
((13,f),{(13,f,80000),(13,f,90000)})
grunt> res = foreach grp generate
>> group.dno as dno,
>> group.sex as gender,
>> SUM(e.sal) as tot,
>> COUNT(e) as cnt,
>> AVG(e.sal) as avg,
>> MAX(e.sal) as max, MIN(e.sal) as min;
grunt>dump res
(11,m,90000,2,45000.0,50000,40000)
(12,f,50000,1,50000.0,50000,50000)
(12,m,100000,3,33333.333333333336,50000,10000)
(13,f,170000,2,85000.0,90000,80000)
-----------------------------------------
No comments:
Post a Comment