Data science Software Course Training in Ameerpet Hyderabad


Wednesday, 17 May 2017

Pig: UDFs using Python

We can keep multiple functions in one Python program (.py file).

 transform.py
-------------------------
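from pig_util import outputSchema

# Capitalize the first letter and lower-case the rest.
@outputSchema("name:chararray")
def firstUpper(x):
    if not x:      # null or empty input: return it unchanged
        return x
    return x[0].upper() + x[1:].lower()

# Expand the gender code; anything other than 'm' becomes 'Female'.
@outputSchema("sex:chararray")
def gender(x):
    if x == 'm':
        return 'Male'
    return 'Female'

# Map a department number to its name; unknown numbers become "Others".
@outputSchema("dname:chararray")
def dept(dno):
    dname = "Others"
    if dno == 11:
        dname = "Marketing"
    elif dno == 12:
        dname = "Hr"
    elif dno == 13:
        dname = "Finance"
    return dname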

-----------------
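-- Register the script; its functions are then called as myprog.<function>.
register 'transform.py' using jython as myprog;

-- emp is assumed to be loaded already with the fields id, name, sal, sex, dno.
res = foreach emp generate
     id, myprog.firstUpper(name) as name,
     sal, myprog.gender(sex) as sex,
     myprog.dept(dno) as dname;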
-------------------------------------
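The functions in transform.py are plain Python, so their logic can be checked outside Pig before the script is registered. Below is a minimal sketch, assuming it is run with ordinary Python where pig_util is not importable. The stand-in outputSchema decorator and the sample emp rows are hypothetical and are not part of Pig.

```python
# Stand-in for Pig's decorator (an assumption, for local testing only):
# it records the schema string and returns the function unchanged.
def outputSchema(schema):
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap

@outputSchema("name:chararray")
def firstUpper(x):
    return x[0].upper() + x[1:].lower()

@outputSchema("sex:chararray")
def gender(x):
    return 'Male' if x == 'm' else 'Female'

@outputSchema("dname:chararray")
def dept(dno):
    return {11: 'Marketing', 12: 'Hr', 13: 'Finance'}.get(dno, 'Others')

# Hypothetical emp rows: (id, name, sal, sex, dno)
emp = [
    (101, 'aMAR', 30000, 'm', 11),
    (102, 'LATHA', 40000, 'f', 14),
]

# Same projection as the foreach ... generate statement.
res = [(i, firstUpper(n), s, gender(g), dept(d)) for (i, n, s, g, d) in emp]
for row in res:
    print(row)
```

Running it prints one transformed tuple per input row, which makes it easy to spot a wrong mapping before the job is submitted to the cluster.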


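Pig unions:

To union two relations whose columns differ, first project both to the same column layout, filling each missing column with null.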


 data1 = load 'file1' using PigStorage(',')
   as (name:chararray, city:chararray);

 data2 = load 'file2' using PigStorage(',')
   as (name:chararray, sal:int);

-- Cast each null so both relations end up with the schema
-- (name:chararray, city:chararray, sal:int).
d1 = foreach data1 generate
        name, city, (int)null as sal;
d2 = foreach data2 generate
        name, (chararray)null as city, sal;

d = union d1, d2;
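
To see what the padded union yields, the same steps can be mimicked in plain Python. This is only an illustrative sketch: the sample rows are hypothetical, None stands in for Pig's null, and Pig itself does not guarantee any row order after a union.

```python
# Hypothetical contents of file1 (name, city) and file2 (name, sal).
data1 = [('Ravi', 'Hyderabad'), ('Mani', 'Pune')]
data2 = [('Giri', 50000), ('Vani', 60000)]

# Pad each relation to the common layout (name, city, sal).
d1 = [(name, city, None) for (name, city) in data1]
d2 = [(name, None, sal) for (name, sal) in data2]

# A union simply puts the rows of both relations together.
d = d1 + d2
for row in d:
    print(row)
```

Every row of the result has three fields, with None in the column that its source file did not provide.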



