Data science Software Course Training in Ameerpet Hyderabad

Monday, 8 May 2017

Pig: Foreach Operator


Foreach Operator:
-------------------
grunt> emp = load 'piglab/emp' using PigStorage(',')
>>  as (id:int, name:chararray, sal:int,
>>     sex:chararray, dno:int);
i) copying data from one relation to
   another relation:
 emp2 = emp;
 or
 emp2 = foreach emp generate *;
-------------------------------------
ii) selecting only the required fields:
grunt> e2 = foreach emp generate name,sal,dno;
grunt> describe e2
e2: {name: chararray,sal: int,dno: int}
grunt>
iii) changing field order:
grunt> describe emp;
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e3 = foreach emp generate id,name,dno,sex,sal;
grunt> describe e3;
e3: {id: int,name: chararray,dno: int,sex: chararray,sal: int}
grunt> illustrate e3
---------------------------------------------------------------------------------------------
| emp     | id:int     | name:chararray     | sal:int     | sex:chararray     | dno:int     |
---------------------------------------------------------------------------------------------
|         | 101        | aaaa               | 40000       | m                 | 11          |
---------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
| e3     | id:int     | name:chararray     | dno:int     | sex:chararray     | sal:int     |
--------------------------------------------------------------------------------------------
|        | 101        | aaaa               | 11          | m                 | 40000       |
--------------------------------------------------------------------------------------------
grunt>
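Selecting fields and reordering fields are both projections. The output keeps only the fields named after `generate`, in the order they are listed. The sketch below models this in plain Python, not in Pig. It assumes a relation is a list of dicts, and `project` is a hypothetical helper. The sample row is the one printed by `illustrate` above.

```python
# Assumption: a relation is modelled as a list of dicts keyed by field name.
emp = [{"id": 101, "name": "aaaa", "sal": 40000, "sex": "m", "dno": 11}]

def project(relation, fields):
    """Hypothetical model of `foreach rel generate f1, f2, ...`:
    keep only the listed fields, in the listed order."""
    return [tuple(row[f] for f in fields) for row in relation]

e2 = project(emp, ["name", "sal", "dno"])               # selecting fields
e3 = project(emp, ["id", "name", "dno", "sex", "sal"])  # changing field order
print(e2)
print(e3)
```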
iv) generating new fields.
grunt> e4 = foreach emp generate * ,
>>        sal*0.1 as tax, sal*0.2 as hra,
>>        sal+hra-tax as net;
 -- The statement above fails: aliases created in a generate
 -- clause (tax, hra) cannot be referenced in the same statement.
Solution: compute net in a second foreach.
grunt> e4 = foreach emp generate *,
>>     sal*0.1 as tax, sal*0.2 as hra;
grunt> e4 = foreach e4 generate *,
                     sal+hra-tax as net;
grunt> dump e4
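The two-statement fix works because the second `foreach` reads `tax` and `hra` as ordinary input fields. Below is a plain-Python model of the same two passes. It is a sketch, not Pig's evaluator, and modelling a relation as a list of dicts is an assumption. It uses the sample employee row shown earlier.

```python
emp = [{"id": 101, "name": "aaaa", "sal": 40000, "sex": "m", "dno": 11}]

# Pass 1, like: e4 = foreach emp generate *, sal*0.1 as tax, sal*0.2 as hra;
step1 = [{**row, "tax": row["sal"] * 0.1, "hra": row["sal"] * 0.2} for row in emp]

# Pass 2, like: e4 = foreach e4 generate *, sal+hra-tax as net;
# tax and hra are now fields of the input rows, so they can be referenced.
e4 = [{**row, "net": row["sal"] + row["hra"] - row["tax"]} for row in step1]

print(e4[0])
```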
v) converting data types:
  [ explicit casting ]
grunt> e5 = foreach e4 generate
>>        id,name,sal, (int)tax, (int)hra,
>>        (int)net, sex, dno;
grunt> describe e5
e5: {id: int,name: chararray,sal: int,tax: int,hra: int,net: int,sex: chararray,dno: int}
grunt>
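`sal*0.1` produces a double, so the `(int)` casts in `e5` drop the fractional part. The Python sketch below shows the effect. It assumes Pig behaves like a Java-style `(int)` cast, truncating toward zero as Python's `int()` also does, and that a null stays null. `to_int` and the salary 12345 are hypothetical.

```python
def to_int(x):
    """Hypothetical model of (int)x: None (Pig null) stays None,
    otherwise the fractional part is dropped (truncation toward zero)."""
    return None if x is None else int(x)

sal = 12345                 # hypothetical salary
tax = sal * 0.1             # a double, roughly 1234.5
hra = sal * 0.2             # a double, roughly 2469.0
net = sal + hra - tax       # a double, roughly 13579.5

print(to_int(tax), to_int(hra), to_int(net), to_int(None))
```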
vi) renaming fields.
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e6 = foreach emp generate
>>    id as ecode, name , sal as income, sex as gender, dno;
grunt> describe e6
e6: {ecode: int,name: chararray,income: int,gender: chararray,dno: int}
grunt>
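Renaming with `as` keeps every value and its position and changes only the field name. Here is a plain-Python model of `e6` (a sketch, not Pig). It assumes dict-based rows and uses a hypothetical `rename` helper. Fields without an `as` keep their original names.

```python
emp = [{"id": 101, "name": "aaaa", "sal": 40000, "sex": "m", "dno": 11}]

def rename(relation, mapping):
    """Hypothetical model of `generate id as ecode, ...`: same values,
    same order; fields missing from `mapping` keep their names."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]

e6 = rename(emp, {"id": "ecode", "sal": "income", "sex": "gender"})
print(list(e6[0]))
```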
vii) conditional transformations:
----------------------------------
[cloudera@quickstart ~]$ cat > test1
100 200
300 120
400 220
300 500
10 90
10 5
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal test1 piglab
[cloudera@quickstart ~]$
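-- no "using" clause, so PigStorage's default tab delimiter applies:
-- the columns in test1 must be tab-separated.
grunt> r1 = load 'piglab/test1' as
            (a:int, b:int);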
grunt> r2 = foreach r1 generate *,
          (a>b ?  a:b) as big;
grunt> dump r2
(100,200,200)
(300,120,300)
(400,220,400)
(300,500,500)
(10,90,90)
(10,5,10)
grunt>

 The ternary (conditional) operator, called bincond in Pig,
 is used for conditional transformations.
 syntax:
  (condition ? value_if_true : value_if_false)
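Python's conditional expression has the same three parts in a different order: `value_if_true if condition else value_if_false`. The sketch below reproduces the `r2` example on the `test1` rows. It is a plain-Python model, not Pig, and `big` is a hypothetical helper name.

```python
def big(a, b):
    # Pig: (a>b ? a:b)   Python: a if a > b else b
    return a if a > b else b

rows = [(100, 200), (300, 120), (400, 220), (300, 500), (10, 90), (10, 5)]
r2 = [(a, b, big(a, b)) for a, b in rows]
for t in r2:
    print(t)
```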


--nested conditions
grunt> cat piglab/samp1
100 200 300
400 500 900
100 120 23
123 900 800
grunt> s1 = load 'piglab/samp1'
       as (a:int, b:int, c:int);
grunt> s2 = foreach s1 generate *,
      (a>b ? (a>c ? a:c): (b>c ? b:c)) as big,
       (a<b ? (a<c ? a:c): (b<c ? b:c))
            as small;
grunt> dump s2
(100,200,300,300,100)
(400,500,900,900,400)
(100,120,23,120,23)
(123,900,800,900,123)
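The nested bincond first compares `a` with `b`, then compares the winner's branch against `c`. The plain-Python sketch below mirrors both expressions exactly and checks them against `samp1`. `biggest` and `smallest` are hypothetical names. The test also confirms that the results agree with Python's built-in `max`/`min`.

```python
def biggest(a, b, c):
    # Pig: (a>b ? (a>c ? a:c) : (b>c ? b:c))
    return (a if a > c else c) if a > b else (b if b > c else c)

def smallest(a, b, c):
    # Pig: (a<b ? (a<c ? a:c) : (b<c ? b:c))
    return (a if a < c else c) if a < b else (b if b < c else c)

rows = [(100, 200, 300), (400, 500, 900), (100, 120, 23), (123, 900, 800)]
s2 = [(a, b, c, biggest(a, b, c), smallest(a, b, c)) for a, b, c in rows]
for t in s2:
    print(t)
```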
----------------
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> e7 = foreach emp generate
    id, name , sal, (sal>=70000 ? 'A':
                     (sal>=50000 ? 'B':
                    (sal>=30000 ? 'C':'D')))
                     as grade,
    (sex=='m'  ? 'Male':'Female') as sex,
    (dno==11 ? 'Marketing':
       (dno==12 ? 'Hr':
           (dno==13 ? 'Finance':'Others')))
               as   dname;
grunt> store e7 into 'piglab/e7'
                 using PigStorage(',');
grunt> ls piglab/e7
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/e7/_SUCCESS<r 1> 0
hdfs://quickstart.cloudera:8020/user/cloudera/piglab/e7/part-m-00000<r 1> 228
grunt> cat piglab/e7/part-m-00000
101,aaaa,40000,C,Male,Marketing
102,bbbbbb,50000,B,Female,Hr
103,cccc,50000,B,Male,Hr
104,dd,90000,A,Female,Finance
105,ee,10000,D,Male,Hr
106,dkd,40000,C,Male,Hr
107,sdkfj,80000,A,Female,Finance
108,iiii,50000,B,Male,Marketing
grunt>
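The nested conditions in `e7` are evaluated top-down, so the first salary band that matches wins. The plain-Python sketch below is a model, not Pig, and all helper names are hypothetical. It produces the same comma-separated line that `store ... using PigStorage(',')` wrote for employee 101. For `dname` it swaps the nested `dno==` checks for a dict lookup with a default, which gives the same mapping.

```python
def grade(sal):
    # Same order as the nested bincond: the first matching band wins.
    return "A" if sal >= 70000 else ("B" if sal >= 50000 else ("C" if sal >= 30000 else "D"))

def gender(sex):
    return "Male" if sex == "m" else "Female"

def dname(dno):
    # Dict lookup with a default in place of the nested dno== checks.
    return {11: "Marketing", 12: "Hr", 13: "Finance"}.get(dno, "Others")

def to_line(row):
    # Like PigStorage(','): output fields joined with commas.
    eid, name, sal, sex, dno = row
    return ",".join(str(v) for v in (eid, name, sal, grade(sal), gender(sex), dname(dno)))

print(to_line((101, "aaaa", 40000, "m", 11)))
```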
--------------------------
-- cleaning nulls using conditional transformations.
[cloudera@quickstart ~]$ cat > damp
100,200,
,300,500
500,,700
500,,
,,700
,800,
1,2,3
10,20,30
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal damp piglab
[cloudera@quickstart ~]$
grunt> d = load 'piglab/damp'
    using PigStorage(',')
    as (a:int, b:int, c:int);
grunt> dump d
(100,200,)
(,300,500)
(500,,700)
(500,,)
(,,700)
(,800,)
(1,2,3)
(10,20,30)

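-- arithmetic with a null operand returns null, so tot is null
-- for every row that has a missing value
grunt> d1 = foreach d generate *, a+b+c as tot;
grunt> dump d1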
(100,200,,)
(,300,500,)
(500,,700,)
(500,,,)
(,,700,)
(,800,,)
(1,2,3,6)
(10,20,30,60)
grunt> d2 = foreach d generate
     (a is null ? 0:a)  as a,
     (b is null ? 0:b) as b,
     (c is null ? 0:c) as c;
grunt> res = foreach d2 generate *, a+b+c as tot;
grunt> dump res
(100,200,0,300)
(0,300,500,800)
(500,0,700,1200)
(500,0,0,500)
(0,0,700,700)
(0,800,0,800)
(1,2,3,6)
(10,20,30,60)
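Below is a plain-Python model of the two results above, with `None` standing in for Pig's null. It is a sketch, not Pig: `add3` and `zero_if_null` are hypothetical helpers. `add3` reproduces the null propagation seen in `d1`, and `zero_if_null` mirrors `(x is null ? 0:x)` before the sum is taken.

```python
def add3(a, b, c):
    # Models a+b+c in Pig: any null operand makes the result null.
    if a is None or b is None or c is None:
        return None
    return a + b + c

def zero_if_null(x):
    # Pig: (x is null ? 0:x)
    return 0 if x is None else x

d = [(100, 200, None), (None, 300, 500), (500, None, 700), (500, None, None),
     (None, None, 700), (None, 800, None), (1, 2, 3), (10, 20, 30)]

d1 = [(a, b, c, add3(a, b, c)) for a, b, c in d]
d2 = [tuple(zero_if_null(x) for x in row) for row in d]
res = [(a, b, c, a + b + c) for a, b, c in d2]
for t in res:
    print(t)
```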
----------------------------------------
foreach:
  -- copying a relation to another relation
  -- selecting fields
  -- changing field order
  -- generating new fields
  -- changing data types
  -- renaming fields
  -- conditional transformations.
----------------------------------