Data science Software Course Training in Ameerpet Hyderabad

Monday, 8 May 2017

Pig : Subsetting using Filter, Limit, Sample

Techniques for subsetting relations:
 i) filter: used for conditional filtering.
 ii) limit : takes the first n tuples.
 iii) sample: takes a random sample of tuples.
     each tuple is kept independently with the given probability,
     so no tuple repeats and the sample size is approximate.

filter: conditional subsetting.
       
 grunt> e1 = filter emp by sex=='m';
 grunt> dump e1;
(101,aaaa,40000,m,11)
(103,cccc,50000,m,12)
(105,ee,10000,m,12)
(106,dkd,40000,m,12)
(108,iiii,50000,m,11)


 e2 = filter emp by (sex=='m' and sal>=40000);
 e3 = filter sales by (pid=='p1' or pid=='p4'
            or pid=='p20');
 e4 = filter sales by (city!='hyd'
            and  city!='del'
            and  city!='pune');
 -- e5 assumes emp also carries a city field
 e5 = filter emp  by  (
          (sex=='m'  and (city=='hyd' or city=='del'))
              or
           (sex=='f'  and (city=='blore' or
                           city=='chennai'))
              or
           city=='noida'
         );
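
The filters above assume that emp and sales are already loaded with named fields. A minimal sketch of that setup could look like the following. The file names, the delimiter, and the sales schema are assumptions and not from the original post; the emp field names are inferred from the dump output shown earlier.

```pig
-- hypothetical paths and schemas; adjust to your own data
emp = load 'emp.txt' using PigStorage(',')
      as (id:int, name:chararray, sal:int, sex:chararray, dno:int);
sales = load 'sales.txt' using PigStorage(',')
      as (pid:chararray, city:chararray, amt:int);

-- same condition as e2
e2 = filter emp by (sex=='m' and sal>=40000);

-- e4 written with not (...): the same condition by De Morgan's law
e4b = filter sales by not (city=='hyd' or city=='del' or city=='pune');

-- e3 written with the in operator (assumes Pig 0.12 or later)
e3b = filter sales by pid in ('p1', 'p4', 'p20');

dump e2;
```

Of the five male tuples listed in the dump of e1, e2 keeps 101, 103, 106 and 108, and drops 105 because its sal is 10000.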
---------------------------------------
-- limit: fetches the first n tuples.
grunt> f3 = limit emp 3;
grunt> dump f3;
 -- limitation: limit by itself cannot fetch the last n tuples.
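
One common workaround for the "last n" limitation is to sort in descending order and then apply limit. This is only a sketch: it assumes emp has a sortable field named id (a hypothetical name here), and "last" then means the largest id values rather than the last lines of the input file.

```pig
-- assumes emp is already loaded with fields named id and sal (hypothetical names)
emp_desc = order emp by id desc;   -- largest id first
last3 = limit emp_desc 3;          -- first 3 of the reversed order
dump last3;

-- the same pattern gives the top 3 salaries
by_sal = order emp by sal desc;
top3 = limit by_sal 3;
dump top3;
```

Without a preceding order, Pig does not promise which tuples limit returns, so the order step is what makes the result predictable.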
-----------------------------
sample: used to get random samples.
  s1 = sample sales 0.1;   -- keeps roughly 10% of the tuples
  s2 = sample sales 0.5;   -- keeps roughly 50% of the tuples
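
Because sample keeps each tuple with a probability, the number of tuples returned varies from run to run. The sketch below shows a roughly equivalent filter built on the RANDOM() built-in, and one possible way to pick an exact number of random tuples. The relation names s1b, tagged, shuffled and pick5 are made up for illustration, and sales is assumed to be loaded already.

```pig
-- roughly what sample does: keep a tuple when a random draw falls under the threshold
s1b = filter sales by RANDOM() <= 0.1;

-- exactly 5 random tuples: attach a random key, sort by it, take the first 5
tagged = foreach sales generate *, RANDOM() as r;
shuffled = order tagged by r;
pick5 = limit shuffled 5;
dump pick5;
```

The second approach sorts the whole relation, so it costs more than sample on large inputs; it is worth it only when the exact count matters.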
----------------------------------