Foreach Operator:
-------------------
grunt> emp = load 'piglab/emp' using PigStorage(',')
>> as (id:int, name:chararray, sal:int,
>> sex:chararray, dno:int);
i) copying data from one relation to another relation:
emp2 = emp
or
emp2 = foreach emp generate *;
-------------------------------------
ii) selecting the required fields:
grunt> e2 = foreach emp generate name,sal,dno;
grunt> describe e2
e2: {name: chararray,sal: int,dno: int}
grunt>
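Fields can also be selected by position instead of by name. The following is a minimal sketch, assuming the same emp schema as above; the alias e2p is made up for illustration and is not part of the original session. It should project the same three fields as e2.

```pig
-- positions start at 0: $0 is id, $1 is name, $2 is sal, $3 is sex, $4 is dno
e2p = foreach emp generate $1, $2, $4;
describe e2p;
```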
iii) changing field order:
grunt> describe emp;
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e3 = foreach emp generate id,name,dno,sex,sal;
grunt> describe e3;
e3: {id: int,name: chararray,dno: int,sex: chararray,sal: int}
grunt> illustrate e3
---------------------------------------------------------------------
| emp | id:int | name:chararray | sal:int | sex:chararray | dno:int |
---------------------------------------------------------------------
|     | 101    | aaaa           | 40000   | m             | 11      |
---------------------------------------------------------------------
---------------------------------------------------------------------
| e3  | id:int | name:chararray | dno:int | sex:chararray | sal:int |
---------------------------------------------------------------------
|     | 101    | aaaa           | 11      | m             | 40000   |
---------------------------------------------------------------------
grunt>
iv) generating new fields.
grunt> e4 = foreach emp generate *,
>> sal*0.1 as tax, sal*0.2 as hra,
>> sal+hra-tax as net;
-- The above statement fails, because a new field alias
-- cannot be reused within the same statement.
solution:
grunt> e4 = foreach emp generate *,
>> sal*0.1 as tax, sal*0.2 as hra;
grunt> e4 = foreach e4 generate *,
>> sal+hra-tax as net;
grunt> dump e4
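If a second foreach is not wanted, one alternative is to repeat the expressions rather than refer to the new aliases. This is only a sketch under the same emp schema; the alias e4b is hypothetical and chosen to avoid overwriting e4.

```pig
-- tax and hra are still generated, but net is computed from sal directly,
-- so no alias defined in this statement is referenced inside it
e4b = foreach emp generate *,
      sal*0.1 as tax, sal*0.2 as hra,
      sal + sal*0.2 - sal*0.1 as net;
```

The two-step form above keeps each formula in one place, which is easier to maintain when the tax or hra rate changes.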
v) converting data types:
[ explicit casting ]
grunt> e5 = foreach e4 generate
>> id,name,sal, (int)tax, (int)hra,
>> (int)net, sex, dno;
grunt> describe e5
e5: {id: int,name: chararray,sal: int,tax: int,hra: int,net: int,sex: chararray,dno: int}
grunt>
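Explicit casts are not limited to (int). As a small sketch, assuming the emp relation loaded at the top, the same cast syntax can widen an int to double or turn it into a chararray; the alias e5b is made up for this example.

```pig
-- cast sal from int to double, and dno from int to chararray
e5b = foreach emp generate id, (double)sal as sal, (chararray)dno as dno;
describe e5b;
```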
vi) renaming fields.
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e6 = foreach emp generate
>> id as ecode, name , sal as income, sex as gender, dno;
grunt> describe e6
e6: {ecode: int,name: chararray,income: int,gender: chararray,dno: int}
grunt>
vii) conditional transformations:
----------------------------------
[cloudera@quickstart ~]$ cat > test1
100 200
300 120
400 220
300 500
10 90
10 5
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal test1 piglab
[cloudera@quickstart ~]$
grunt> r1 = load 'piglab/test1' as
(a:int, b:int);
grunt> r2 = foreach r1 generate *,
(a>b ? a:b) as big;
grunt> dump r2
(100,200,200)
(300,120,300)
(400,220,400)
(300,500,500)
(10,90,90)
(10,5,10)
grunt>
The ternary (conditional) operator is used for
conditional transformations.
syntax:
(criteria ? TrueValue : FalseValue)
--nested conditions
grunt> cat piglab/samp1
100 200 300
400 500 900
100 120 23
123 900 800
grunt> s1 = load 'piglab/samp1'
as (a:int, b:int, c:int);
grunt> s2 = foreach s1 generate *,
(a>b ? (a>c ? a:c): (b>c ? b:c)) as big,
(a<b ? (a<c ? a:c): (b<c ? b:c))
as small;
grunt> dump s2
(100,200,300,300,100)
(400,500,900,900,400)
(100,120,23,120,23)
(123,900,800,900,123)
----------------
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e7 = foreach emp generate
id, name , sal, (sal>=70000 ? 'A':
(sal>=50000 ? 'B':
(sal>=30000 ? 'C':'D')))
as grade,
(sex=='m' ? 'Male':'Female') as sex,
(dno==11 ? 'Marketing':
(dno==12 ? 'Hr':
(dno==13 ? 'Finance':'Others')))
as dname;
grunt> store e7 into 'piglab/e7'
using PigStorage(',');
grunt> ls piglab/e7
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/e7/_SUCCESS<r 1> 0
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/e7/part-m-00000<r 1>228
grunt> cat piglab/e7/part-m-00000
101,aaaa,40000,C,Male,Marketing
102,bbbbbb,50000,B,Female,Hr
103,cccc,50000,B,Male,Hr
104,dd,90000,A,Female,Finance
105,ee,10000,D,Male,Hr
106,dkd,40000,C,Male,Hr
107,sdkfj,80000,A,Female,Finance
108,iiii,50000,B,Male,Marketing
grunt>
--------------------------
-- cleaning nulls using conditional transformations.
[cloudera@quickstart ~]$ cat > damp
100,200,
,300,500
500,,700
500,,
,,700
,800,
1,2,3
10,20,30
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal damp piglab
[cloudera@quickstart ~]$
grunt> d = load 'piglab/damp'
using PigStorage(',')
as (a:int, b:int, c:int);
grunt> dump d
(100,200,)
(,300,500)
(500,,700)
(500,,)
(,,700)
(,800,)
(1,2,3)
(10,20,30)
-- a+b+c is null whenever any one of a, b, c is null
grunt> d1 = foreach d generate *, a+b+c as tot;
grunt> dump d1
(100,200,,)
(,300,500,)
(500,,700,)
(500,,,)
(,,700,)
(,800,,)
(1,2,3,6)
(10,20,30,60)
grunt> d2 = foreach d generate
(a is null ? 0:a) as a,
(b is null ? 0:b) as b,
(c is null ? 0:c) as c;
grunt> res = foreach d2 generate *, a+b+c as d;
grunt> dump res
(100,200,0,300)
(0,300,500,800)
(500,0,700,1200)
(500,0,0,500)
(0,0,700,700)
(0,800,0,800)
(1,2,3,6)
(10,20,30,60)
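The null replacement and the total can also be written in a single foreach by repeating the null checks inside the sum. This is a sketch only, assuming the relation d loaded above; the alias res2 and the field name tot are chosen for illustration.

```pig
-- inside the sum, a, b and c refer to the input fields of d,
-- so each one is guarded against null again
res2 = foreach d generate
       (a is null ? 0 : a) as a,
       (b is null ? 0 : b) as b,
       (c is null ? 0 : c) as c,
       (a is null ? 0 : a) + (b is null ? 0 : b) + (c is null ? 0 : c) as tot;
```

The two-step version (d2, then res) avoids repeating the checks and is usually easier to read when there are many fields.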
----------------------------------------
foreach:
-- copying one relation to another relation
-- selecting fields
-- changing field order
-- generating new fields.
-- changing data types.
-- renaming fields
-- conditional transformations.
----------------------------------