Pig : Subsetting using Filter, Limit, Sample
Techniques for subsetting relations:
i) filter: used for conditional filtering.
ii) limit: takes the first n tuples.
iii) sample: takes a random sample of tuples.
Each sample statement draws from the full relation, so separate samples can overlap (a "with replacement" model).
filter: conditional subsetting.
e1 = filter emp by sex=='m';
grunt> dump e1;
(101,aaaa,40000,m,11)
(103,cccc,50000,m,12)
(105,ee,10000,m,12)
(106,dkd,40000,m,12)
(108,iiii,50000,m,11)
e2 = filter emp by (sex=='m' and sal>=40000);
e3 = filter sales by (pid=='p1' or pid=='p4'
or pid=='p20');
e4 = filter sales by (city!='hyd'
and city!='del'
and city!='pune');
e5 = filter emp by (
(sex=='m' and (city=='hyd' or city=='del'))
or
(sex=='f' and (city=='blore' or
city=='chennai'))
or
city=='noida'
);
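For context, here is a minimal sketch of how emp might be loaded before applying the e2 filter shown earlier. The file name 'emp.txt', the comma delimiter, and the field names id, name, and dno are assumptions inferred from the dump output of e1; the original does not state the schema.

```pig
-- hypothetical path and schema; adjust both to the actual data
emp = load 'emp.txt' using PigStorage(',')
      as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

-- keep only male employees earning at least 40000
e2 = filter emp by (sex == 'm' and sal >= 40000);
dump e2;
```

Given the e1 output above, e2 would be expected to contain the tuples with ids 101, 103, 106, and 108, since 105 has a salary of 10000.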
---------------------------------------
limit: fetches the first n tuples.
grunt> f3 = limit emp 3;
grunt> dump f3;
-- limitation: limit cannot fetch the last n tuples directly.
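One common workaround is to sort the relation in descending order and then apply limit. The following is only a sketch: it assumes emp has a numeric field named id (a hypothetical name, since the original does not give the schema) that defines what "last" means.

```pig
-- sort so that the "last" tuples come first; id is an assumed field name
emp_desc = order emp by id desc;

-- the first 3 tuples of the reversed relation are the last 3 by id
last3 = limit emp_desc 3;
dump last3;
```

Without an order step, Pig does not guarantee which n tuples limit returns, so ordering first also makes the result of a plain limit reproducible.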
-----------------------------
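sample: used to get a random sample; the number is the fraction of tuples to keep (between 0 and 1).
s1 = sample sales 0.1;  -- roughly 10% of the tuples
s2 = sample sales 0.5;  -- roughly 50% of the tuples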
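Because sample is probabilistic, the number of tuples returned is only approximately the requested fraction and can change from run to run. A quick way to check the actual sample size is sketched below; the relation names s1_all and s1_count are introduced here purely for illustration.

```pig
s1 = sample sales 0.1;

-- put every sampled tuple into a single group
s1_all = group s1 all;

-- count all tuples in that group, including those with null fields
s1_count = foreach s1_all generate COUNT_STAR(s1);
dump s1_count;
```

Running the same statements again will usually give a slightly different count.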
----------------------------------