Data science Software Course Training in Ameerpet Hyderabad

Monday 8 May 2017

Pig : Subsetting using Filter, Limit, Sample

Techniques of subsetting relations:
 i) filter: used for conditional filtering.
 ii) limit: takes the first n tuples.
 iii) sample: takes a random sample of tuples.
     Each tuple is kept independently with the given probability,
     so the sample size is only approximate.

filter: conditional subsetting.
       
 grunt> e1 = filter emp by sex=='m';
 grunt> dump e1;
(101,aaaa,40000,m,11)
(103,cccc,50000,m,12)
(105,ee,10000,m,12)
(106,dkd,40000,m,12)
(108,iiii,50000,m,11)


 e2 = filter emp by (sex=='m' and sal>=40000);
 e3 = filter sales by (pid=='p1' or pid=='p4'
            or pid=='p20');
 e4 = filter sales by (city!='hyd'
            and city!='del'
            and city!='pune');
 e5 = filter emp by (
          (sex=='m' and (city=='hyd' or city=='del'))
              or
          (sex=='f' and (city=='blore' or city=='chennai'))
              or
          city=='noida'
         );
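For context, here is a minimal end-to-end sketch of a filter, from load to dump. The file name emp.txt, the comma delimiter, and the field names are assumptions; the schema is only inferred from the dump output of e1 shown earlier.

```pig
-- Hypothetical input: emp.txt, comma-delimited, one employee per line.
-- Field names and types are assumptions inferred from the e1 dump output.
emp = load 'emp.txt' using PigStorage(',')
      as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

-- Equality is tested with ==, not a single =.
males = filter emp by sex == 'm';

-- Conditions combine with and / or; parentheses make the grouping explicit.
well_paid_males = filter emp by (sex == 'm' and sal >= 40000);

dump well_paid_males;
```

Declaring sal as int in the schema lets the comparison sal >= 40000 run numerically rather than on raw bytes.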
---------------------------------------
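limit: fetches the first n tuples.
grunt> f3 = limit emp 3;
grunt> dump f3;
 -- note: without a preceding order by, which n tuples come back is not guaranteed.
 -- limitation: there is no operator to fetch the last n tuples directly.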
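A common workaround for the "last n" limitation is to sort in descending order and then take the first n. The sketch below assumes the same hypothetical emp.txt file and inferred schema (id, name, sal, sex, dno), and treats "last" as "highest id".

```pig
-- Hypothetical input and schema, inferred from the earlier dump output.
emp = load 'emp.txt' using PigStorage(',')
      as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

-- Sort so that the tuples wanted "last" come first ...
by_id_desc = order emp by id desc;

-- ... then keep the first 3 tuples of that ordering.
last3 = limit by_id_desc 3;

dump last3;
```

"Last" only has a meaning relative to a sort key, so the key (id here) has to be chosen to match what you want.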
-----------------------------
sample: used to get random samples. The number is the fraction (0 to 1) of tuples to keep.
  s1 = sample sales 0.1;   -- roughly 10% of sales
  s2 = sample sales 0.5;   -- roughly 50% of sales
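To see that the sample size is only approximate, you can count the sampled tuples. This is a small sketch; the file name sales.txt and the two-column schema are assumptions (only pid and city appear in the examples above).

```pig
-- Hypothetical input: sales.txt, comma-delimited, with columns pid and city.
sales = load 'sales.txt' using PigStorage(',')
        as (pid:chararray, city:chararray);

-- Keep each tuple with probability 0.1.
s1 = sample sales 0.1;

-- Group everything into one bag and count it.
grp = group s1 all;
cnt = foreach grp generate COUNT_STAR(s1);

dump cnt;
```

Running the script more than once will usually give a different count each time, because the sample is drawn at random on every run.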
----------------------------------