Data science Software Course Training in Ameerpet Hyderabad

Data science Software Course Training in Ameerpet Hyderabad

Wednesday, 3 May 2017

Pig : How to perform grouping by Multiple Columns

 how to perform grouping by multiple columns.
 task: mutiple grouping with mulitiple aggregations .

  select dno, sex , sum(sal) ,
         count(*), avg(sal), max(sal),
           min(sal) from emp
         group by dno, sex;

grunt> emp = load 'piglab/emp' using PigStorage(',')
>>   as (id:int, name:chararray,  sal:int,
>>     sex:chararray, dno:int);

grunt> e = foreach emp generate dno, sex, sal;
grunt> grp = group e by dno, sex;
2017-05-03 18:54:14,501 [main] ERROR - ERROR 1200: <line 7, column 25>  Syntax error, unexpected symbol at or near ';'
Details at logfile: /home/cloudera/pig_1493862740272.log

above grouping is invalid.
 pig does not allow grouping by multiple fields.

  make mulitple fields as a tuple.
  and group it by the tuple field.

grunt> grp = group e by (dno,sex);
grunt> describe grp;
grp: {group: (dno: int,sex: chararray),e: {(dno: int,sex: chararray,sal: int)}}
grunt> dump grp

grunt> res = foreach grp generate
>>         group.dno as dno,
>> as gender,
>>        SUM(e.sal) as tot,
>>        COUNT(e) as cnt,
>>       AVG(e.sal) as avg,
>>       MAX(e.sal) as max, MIN(e.sal) as min;
grunt>dump res

1 comment:

  1. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
    Big Data Hadoop Training in electronic city