SreeRam Hadoop Notes: Pig Lab 11: Executing Scripts and UDFs.

Excecuting scripts(pig) :-
____________________________

three commands (operators) are used to execute scripts.

i) pig
ii) exec.
iii) run.

i) Pig:-

to execute script from Command Line(operating sys).

$ pig script1.pig
--> script will be executed,
but relation aliases are not available with grunt shell. so that we can not reuse them.

ii) exec:

--> to execute script from grunt shell.
still aliases will not be available with grunt.
so "No reuse".

grunt> exec script1.pig

__________________________________

iii) run:

---> to execute script from grunt shell,
Aliases will be available with grunt. So we can reuse them.

grunt> run script1.pig
______________________________

run:
adv --> aliases will be available.
disadv --> overriding previous aliases with same name.

exec:
adv --> aliases will not be available.
so no -0verriding.
disadv --> no reusability.

pig :
adv --->
-- used for production operators.
--- can be called other evenvironments , like shell script.

disadv --> aliases will not be reflected into grunt.

_________________________________

Pig Udfs:
_____________j

User defined functions.

adv:
i) Custom functionality.
ii) Reusabilty .

Udf life cycle:

step 1) Develop UDF class
step 2) Export into jar file.
step 3) register jar file into pig.
step 4) create temporary function for the UDF class.
step 5) call the function.
__________________________

[training@localhost ~]$ cat > samp1
101,ravi
102,mani
103,Deva
104,Devi
105,AmAr
[training@localhost ~]$ hadoop fs -copyFromLocal samp1 piglab
[training@localhost ~]$

grunt> s = load 'piglab/samp1'
>> using PigStorage(',')
>> as (id:int, name:chararray);

eclipse navigations:
i) create java project.

file --> new --> java project.
ex: PigDemo

ii) create package>

pigDemo ---> new ---> package.

ex: pig.test

iii) configure pig jar.

src -- build path --> configure build path --> libraries ---> add external jars.

/usr/lib/pig/pig-core.jar

iv) create jar class

pig.test ---> new --> class

FirstUpper

v) export into jar.

pigdemo ---> export --> java --java jar --
/home/training/Desktop/pigudfs.jar

_____________________
package pig.test;
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class FirstUpper extends EvalFunc<String>
{
public String exec(Tuple v) throws IOException
{ // raVI --> Ravi

String name = (String)v.get(0);
String fc = name.substring(0,1).toUpperCase();
String rc = name.substring(1).toLowerCase();
String n = fc+rc;
return n;
}

}

grunt> register Desktop/pigudfs.jar;

grunt> define cconvert pig.test.FirstUpper();

grunt> r = foreach s generate
>> id, cconvert(name) as name;

grunt> dump r;

______________________________

[training@localhost ~]$ cat > f1
100 200 120
300 450 780
120 56 90
1000 3456 789
[training@localhost ~]$ hadoop fs -copyFromLocal f1 piglab
[training@localhost ~]$

task:
write udf , to find max value for a row.

package pig.test;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class RowMax extends EvalFunc<Integer>
{
public Integer exec(Tuple v) throws IOException
{
int a = (Integer)v.get(0);
int b = (Integer)v.get(1);
int c = (Integer)v.get(2);
int big =0;
if (a>big) big=a;
if (b>big) big=b;
if (c>big) big=c;
return new Integer(big);
}

}

export into jar.

/home/training/Desktop/pigudfs.jar

grunt> s1 = load 'piglab/f1'
>> as (a:int, b:int, c:int);
grunt> register Desktop/pigudfs.jar;
grunt> define rowmax pig.test.RowMax();
grunt> r1 = foreach s1 generate *,
>> rowmax(*) as rmax;
grunt> dump r1
(100,200,120,200)
(300,450,780,780)
(120,56,90,120)
(1000,3456,789,3456)

[training@localhost ~]$ cat f2
-10,-30,-56,-23,-21,-5
1,2,3,45,67,9
[training@localhost ~]$ hadoop fs -copyFromLocal f2 piglab
[training@localhost ~]$

package pig.test;

import java.io.IOException;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class DynRowMax extends EvalFunc<Integer>
{
public Integer exec(Tuple v) throws IOException
{
List<Object> olist = v.getAll();
int max = 0;// 10 30 20
int cnt=0;
for( Object o : olist){
cnt++;
int val= (Integer)o;
if (cnt==1) max = val;
max = Math.max(val, max);
}
return new Integer(max);
}

}

export into jar /home/training/Desktop/pigudfs.jar

grunt> register Desktop/pigudfs.jar;
grunt> define dynmax pig.test.DynRowMax();
grunt> ss = load 'piglab/f2'
>> using PigStorage(',')
>> as (a:int, b:int, c:int, d:int,
>> e:int, f:int);
grunt> define rmax pig.test.RowMax();
grunt> rr = foreach ss generate *,
>> dynmax(*) as max;
grunt> dump rr

SreeRam Hadoop Notes

Data science Software Course Training in Ameerpet Hyderabad

Saturday, 23 July 2016

Pig Lab 11: Executing Scripts and UDFs.

1 comment: