Foreach Operator:
-------------------
grunt> emp = load 'piglab/emp' using PigStorage(',')
>> as (id:int, name:chararray, sal:int,
>> sex:chararray, dno:int);
i) Copying data from one relation to another:
grunt> emp2 = emp;
or
grunt> emp2 = foreach emp generate *;
-------------------------------------
ii) Selecting only the required fields:
grunt> e2 = foreach emp generate name,sal,dno;
grunt> describe e2
e2: {name: chararray,sal: int,dno: int}
grunt>
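The following is a plain-Python model of what `foreach emp generate name,sal,dno` computes. It is not Pig code. For every input tuple it emits a new tuple that holds only the named fields, in the order they are listed. The schema and the single row `(101,'aaaa',40000,'m',11)` are taken from the `illustrate` output in these notes. The helper name `generate` is hypothetical.

```python
# Plain-Python model (not Pig) of FOREACH ... GENERATE selecting fields.
EMP_SCHEMA = ["id", "name", "sal", "sex", "dno"]

# Sample row taken from the illustrate output of emp.
emp = [(101, "aaaa", 40000, "m", 11)]


def generate(rows, schema, fields):
    """Hypothetical helper: emit one output tuple per input tuple,
    keeping only `fields`, in the order they are listed."""
    positions = [schema.index(f) for f in fields]
    return [tuple(row[p] for p in positions) for row in rows]


e2 = generate(emp, EMP_SCHEMA, ["name", "sal", "dno"])
print(e2)  # [('aaaa', 40000, 11)]
```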
iii) Changing the field order:
grunt> describe emp;
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e3 = foreach emp generate id,name,dno,sex,sal;
grunt> describe e3;
e3: {id: int,name: chararray,dno: int,sex: chararray,sal: int}
grunt> illustrate e3
---------------------------------------------------------------------
| emp | id:int | name:chararray | sal:int | sex:chararray | dno:int |
---------------------------------------------------------------------
|     | 101    | aaaa           | 40000   | m             | 11      |
---------------------------------------------------------------------
--------------------------------------------------------------------
| e3 | id:int | name:chararray | dno:int | sex:chararray | sal:int |
--------------------------------------------------------------------
|    | 101    | aaaa           | 11      | m             | 40000   |
--------------------------------------------------------------------
grunt>
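Reordering changes only the positions of fields. The values stay the same, and the schema is rearranged along with them, which is what `describe e3` reports. Below is a plain-Python sketch of that idea (not Pig). It assumes the emp schema and row shown above. The helper `reorder` is hypothetical.

```python
# Plain-Python model (not Pig) of reordering fields with FOREACH ... GENERATE.
EMP_SCHEMA = [("id", "int"), ("name", "chararray"), ("sal", "int"),
              ("sex", "chararray"), ("dno", "int")]
emp = [(101, "aaaa", 40000, "m", 11)]  # row shown by illustrate


def reorder(schema, rows, order):
    """Hypothetical helper: return (new_schema, new_rows) with the
    fields rearranged into `order`; the values themselves are unchanged."""
    names = [name for name, _ in schema]
    positions = [names.index(f) for f in order]
    new_schema = [schema[p] for p in positions]
    new_rows = [tuple(row[p] for p in positions) for row in rows]
    return new_schema, new_rows


e3_schema, e3 = reorder(EMP_SCHEMA, emp, ["id", "name", "dno", "sex", "sal"])
print(e3_schema)  # the same (field, type) pairs that describe e3 lists
print(e3)         # [(101, 'aaaa', 11, 'm', 40000)]
```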
iv) Generating new fields:
grunt> e4 = foreach emp generate *,
>> sal*0.1 as tax, sal*0.2 as hra,
>> sal+hra-tax as net;
-- The statement above fails: a field alias created in a GENERATE clause
-- (here tax and hra) cannot be referenced in the same GENERATE clause.
Solution: create the new fields in two FOREACH steps.
grunt> e4 = foreach emp generate *,
>> sal*0.1 as tax, sal*0.2 as hra;
grunt> e4 = foreach e4 generate *,
>> sal+hra-tax as net;
grunt> dump e4
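The two-step fix works because each FOREACH reads only the fields of its input relation. `tax` and `hra` become ordinary input fields only after the first statement has produced them. Here is a plain-Python model of that (not Pig), using the emp row from the `illustrate` output. The dict-based rows are an illustrative assumption.

```python
# Plain-Python model (not Pig) of the two-step derivation of tax, hra, net.
emp = [{"id": 101, "name": "aaaa", "sal": 40000, "sex": "m", "dno": 11}]

# Step 1, like `generate *, sal*0.1 as tax, sal*0.2 as hra`:
# every expression reads only the input row r; tax and hra are new keys
# in the output row and are not visible while that row is being built.
step1 = [{**r, "tax": r["sal"] * 0.1, "hra": r["sal"] * 0.2} for r in emp]

# Step 2, like `foreach e4 generate *, sal+hra-tax as net`:
# tax and hra are now fields of the input row, so they can be used.
e4 = [{**r, "net": r["sal"] + r["hra"] - r["tax"]} for r in step1]

for row in e4:
    print(row)
```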
v) Converting data types:
[ explicit casting ]
grunt> e5 = foreach e4 generate
>> id,name,sal, (int)tax, (int)hra,
>> (int)net, sex, dno;
grunt> describe e5
e5: {id: int,name: chararray,sal: int,tax:
int,hra: int,net: int,sex: chararray,dno: int}
grunt>
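`sal*0.1` produces a double, and the `(int)` casts above turn it back into an int. The sketch below models that in plain Python (not Pig). It assumes that Pig's `(int)` cast of a double behaves like Java's narrowing conversion and truncates toward zero, and that a null stays null. The helper `to_int` is hypothetical. The e4 row follows the field order of `generate *, tax, hra` and then `net`, with values computed from sal=40000.

```python
# Plain-Python model (not Pig) of the explicit casts in e5.
# Assumption: (int) on a double truncates toward zero, as Java's cast does;
# Python's int() also truncates toward zero.
def to_int(value):
    """Hypothetical model of (int)value; a null (None) stays null."""
    return None if value is None else int(value)


# e4 row order: id, name, sal, sex, dno, tax, hra, net
e4_row = (101, "aaaa", 40000, "m", 11, 4000.0, 8000.0, 44000.0)
id_, name, sal, sex, dno, tax, hra, net = e4_row

# e5 order: id, name, sal, (int)tax, (int)hra, (int)net, sex, dno
e5_row = (id_, name, sal, to_int(tax), to_int(hra), to_int(net), sex, dno)
print(e5_row)        # (101, 'aaaa', 40000, 4000, 8000, 44000, 'm', 11)
print(to_int(-2.7))  # -2 under truncation toward zero
```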
vi) Renaming fields:
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e6 = foreach emp generate
>> id as ecode, name , sal as income, sex as gender, dno;
grunt> describe e6
e6: {ecode: int,name: chararray,income: int,gender: chararray,dno: int}
grunt>
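`expr as alias` changes only the field's name in the output schema. The values pass through unchanged. A plain-Python sketch of this (not Pig), using the emp schema and row from these notes. The mapping `RENAMES` is a hypothetical stand-in for the `as` clauses.

```python
# Plain-Python model (not Pig) of renaming fields with `expr AS alias`.
EMP_SCHEMA = ["id", "name", "sal", "sex", "dno"]

# Hypothetical mapping standing in for: id as ecode, sal as income, sex as gender.
RENAMES = {"id": "ecode", "sal": "income", "sex": "gender"}

e6_schema = [RENAMES.get(field, field) for field in EMP_SCHEMA]
print(e6_schema)  # ['ecode', 'name', 'income', 'gender', 'dno']

# Values are untouched; only the names used to address them change.
emp_row = (101, "aaaa", 40000, "m", 11)
e6_row = dict(zip(e6_schema, emp_row))
print(e6_row["income"])  # 40000
```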
vii) Conditional transformations:
----------------------------------
[cloudera@quickstart ~]$ cat > test1
100 200
300 120
400 220
300 500
10 90
10 5
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal test1 piglab
[cloudera@quickstart ~]$
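-- No USING clause, so PigStorage() splits fields on tabs (its default);
-- the columns in test1 must therefore be tab-separated.
grunt> r1 = load 'piglab/test1' as (a:int, b:int);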
grunt> r2 = foreach r1 generate *,
>> (a>b ? a:b) as big;
grunt> dump r2
(100,200,200)
(300,120,300)
(400,220,400)
(300,500,500)
(10,90,90)
(10,5,10)
grunt>
The ternary (conditional) operator, called bincond in Pig, is used for
conditional transformations. Both branches should return the same type.
Syntax:
(criteria ? TrueValue : FalseValue)
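Python writes `(criteria ? TrueValue : FalseValue)` as `TrueValue if criteria else FalseValue`. The plain-Python model below (not Pig) applies that expression to the test1 rows from the session above and reproduces the `dump r2` output.

```python
# Plain-Python model (not Pig) of r2: (a>b ? a:b) as big.
# Rows are the contents of test1 shown above.
r1 = [(100, 200), (300, 120), (400, 220), (300, 500), (10, 90), (10, 5)]

# Python's conditional expression: <value if true> if <criteria> else <value if false>
r2 = [(a, b, a if a > b else b) for a, b in r1]
for row in r2:
    print(row)
```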
-- Nested conditions: largest (big) and smallest (small) of three columns
grunt> cat piglab/samp1
100 200 300
400 500 900
100 120 23
123 900 800
grunt> s1 = load 'piglab/samp1' as (a:int, b:int, c:int);
grunt> s2 = foreach s1 generate *,
>> (a>b ? (a>c ? a:c): (b>c ? b:c)) as big,
>> (a<b ? (a<c ? a:c): (b<c ? b:c)) as small;
grunt> dump s2
(100,200,300,300,100)
(400,500,900,900,400)
(100,120,23,120,23)
(123,900,800,900,123)
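The nested bincond expressions pick the maximum and minimum of three values. The plain-Python model below (not Pig) transcribes them one-for-one and reproduces the `dump s2` output for the samp1 rows. The final assertion checks that the nesting agrees with Python's built-in `max`/`min`. The helper names `big` and `small` simply mirror the Pig aliases.

```python
# Plain-Python model (not Pig) of the nested bincond expressions in s2.
s1 = [(100, 200, 300), (400, 500, 900), (100, 120, 23), (123, 900, 800)]


def big(a, b, c):
    # (a>b ? (a>c ? a:c) : (b>c ? b:c))
    return (a if a > c else c) if a > b else (b if b > c else c)


def small(a, b, c):
    # (a<b ? (a<c ? a:c) : (b<c ? b:c))
    return (a if a < c else c) if a < b else (b if b < c else c)


s2 = [(a, b, c, big(a, b, c), small(a, b, c)) for a, b, c in s1]
for row in s2:
    print(row)

# The nested conditions compute the same thing as max/min of three values.
assert all(row[3] == max(row[:3]) and row[4] == min(row[:3]) for row in s2)
```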
----------------
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e7 = foreach emp generate
id, name , sal, (sal>=70000 ? 'A':
(sal>=50000 ? 'B':
(sal>=30000 ? 'C':'D')))
as grade,
(sex=='m' ? 'Male':'Female') as sex,
(dno==11 ? 'Marketing':
(dno==12 ? 'Hr':
(dno==13 ? 'Finance':'Others')))
as dname;
grunt> store e7 into 'piglab/e7'
>> using PigStorage(',');
grunt> ls piglab/e7
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/e7/_SUCCESS<r 1>  0
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/e7/part-m-00000<r 1>  228
grunt> cat piglab/e7/part-m-00000
101,aaaa,40000,C,Male,Marketing
102,bbbbbb,50000,B,Female,Hr
103,cccc,50000,B,Male,Hr
104,dd,90000,A,Female,Finance
105,ee,10000,D,Male,Hr
106,dkd,40000,C,Male,Hr
107,sdkfj,80000,A,Female,Finance
108,iiii,50000,B,Male,Marketing
grunt>
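Chained bincond expressions read like an if / else-if ladder. The first condition that holds decides the value, which is why 40000 gets 'C' and not 'D'. The plain-Python model below (not Pig) follows the e7 expressions. For the department names, a dict lookup with an 'Others' default stands in for the nested `dno==...` conditions. Nulls are not modelled here: Python treats `None == 'm'` as False, and that does not necessarily match Pig's null handling.

```python
# Plain-Python model (not Pig) of the chained conditions in e7.
def grade(sal):
    # (sal>=70000 ? 'A' : (sal>=50000 ? 'B' : (sal>=30000 ? 'C' : 'D')))
    if sal >= 70000:
        return "A"
    if sal >= 50000:
        return "B"
    if sal >= 30000:
        return "C"
    return "D"


def gender(sex):
    # (sex=='m' ? 'Male' : 'Female'); every value other than 'm' maps
    # to 'Female' here (nulls are not modelled)
    return "Male" if sex == "m" else "Female"


# Dict lookup standing in for the nested dno==11/12/13 conditions.
DEPTS = {11: "Marketing", 12: "Hr", 13: "Finance"}


def dname(dno):
    # any dno not listed falls through to 'Others'
    return DEPTS.get(dno, "Others")


row = (101, "aaaa", 40000, "m", 11)  # emp row from the illustrate output
id_, name, sal, sex, dno = row
print(",".join(map(str, (id_, name, sal, grade(sal), gender(sex), dname(dno)))))
# 101,aaaa,40000,C,Male,Marketing
```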
--------------------------
-- Cleaning nulls with conditional transformations.
[cloudera@quickstart ~]$ cat > damp
100,200,
,300,500
500,,700
500,,
,,700
,800,
1,2,3
10,20,30
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal damp piglab
[cloudera@quickstart ~]$
grunt> d = load 'piglab/damp'
>> using PigStorage(',')
>> as (a:int, b:int, c:int);
grunt> dump d
(100,200,)
(,300,500)
(500,,700)
(500,,)
(,,700)
(,800,)
(1,2,3)
(10,20,30)
grunt> d1 = foreach d generate *, a+b+c as tot;
grunt> dump d1
(100,200,,)
(,300,500,)
(500,,700,)
(500,,,)
(,,700,)
(,800,,)
(1,2,3,6)
(10,20,30,60)
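-- Arithmetic with a null operand yields null, so tot is null in every
-- row that has a missing value. Replace nulls with 0 before adding:
grunt> d2 = foreach d generate
>> (a is null ? 0:a) as a,
>> (b is null ? 0:b) as b,
>> (c is null ? 0:c) as c;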
grunt> res = foreach d2 generate *, a+b+c as d;
grunt> dump res
(100,200,0,300)
(0,300,500,800)
(500,0,700,1200)
(500,0,0,500)
(0,0,700,700)
(0,800,0,800)
(1,2,3,6)
(10,20,30,60)
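The plain-Python model below (not Pig) reproduces both results for the damp rows. It uses `None` as a stand-in for Pig's null. `add3` models `a+b+c`, where any null operand makes the sum null, and `zero_if_null` models `(x is null ? 0 : x)`. Both helper names are hypothetical.

```python
# Plain-Python model (not Pig) of null propagation and the fix in d2/res.
# None stands in for Pig's null; rows are the damp file shown above.
d = [(100, 200, None), (None, 300, 500), (500, None, 700), (500, None, None),
     (None, None, 700), (None, 800, None), (1, 2, 3), (10, 20, 30)]


def add3(a, b, c):
    # Like a+b+c in Pig: a null operand makes the whole result null.
    if any(x is None for x in (a, b, c)):
        return None
    return a + b + c


d1 = [(a, b, c, add3(a, b, c)) for a, b, c in d]


def zero_if_null(x):
    # (x is null ? 0 : x)
    return 0 if x is None else x


d2 = [tuple(zero_if_null(x) for x in row) for row in d]
res = [(a, b, c, add3(a, b, c)) for a, b, c in d2]
for row in res:
    print(row)
```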
----------------------------------------
foreach:
-- copy one relation to another
-- selecting fields
-- changing field order
-- generating new fields.
-- changing data types.
-- renaming fields
-- conditional transformations.
----------------------------------