Data science Software Course Training in Ameerpet Hyderabad

Tuesday 16 May 2017

Pig: Order [Sorting], exec, run, pig


 order :-
   sorts the tuples of a relation in ascending (default) or descending order.

 emp = load 'piglab/emp'
     using PigStorage(',')
     as (id:int, name:chararray,
    sal:int, sex:chararray, dno:int);

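 e1 = order emp by name;                     -- ascending by name (default)
 e2 = order emp by sal desc;                 -- highest salary first
 e3 = order emp by sal desc, sex, dno desc;  -- sal descending, then sex ascending, then dno descending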

 ---------------------------------------
sql:
   select * from emp order by sal desc limit 3;

pig equivalent:
 e = order emp by sal desc;
 top3 = limit e 3;


limitation:

 101,aaa,30000,.....
 102,bbb,90000,....
 103,cccc,90000,....
 104,dd,90000,....
 105,ee,80000,.....

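 the above process is correct only
  if there are no duplicate salaries.
 here top3 returns 102, 103 and 104 (all 90000),
  so the 2nd and 3rd highest salaries (80000 and 30000) are missed.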
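 one possible way to keep tied salaries is a join: take the top 3 distinct salary values, then join them back to emp. this is only a sketch; it assumes emp is loaded as shown earlier, and the aliases sals, dsals, ssals, top3sal, j and top3emp are made up for illustration.

```pig
-- keep only the salary column and remove duplicate values
sals    = foreach emp generate sal;
dsals   = distinct sals;

-- top 3 distinct salary values
ssals   = order dsals by sal desc;
top3sal = limit ssals 3;

-- join back to emp so that every employee earning one of those salaries is kept
j       = join emp by sal, top3sal by sal;
top3emp = foreach j generate emp::id as id, emp::name as name,
                             emp::sal as sal, emp::sex as sex, emp::dno as dno;
```

 with the five sample rows above, all five employees would be kept, because their salaries fall into only three distinct values.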





--------------------------
 sql:
   select dno, sum(sal) as tot from emp
     group by dno
     order by tot desc
     limit 3;


 e =  foreach emp generate dno, sal;
 grp = group e by dno;
 res = foreach grp generate
         group as dno, SUM(e.sal) as tot;
 sorted = order res by tot desc;
 top3 = limit sorted 3;

limitation:

 dump res
 11,2l
 12,3l
 13,3l
 14,3l
 15,3l
 16,0.5l

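 the above process is correct only
  if no two departments have the same sum.
 here four departments (12, 13, 14, 15) share the highest sum,
  so limit 3 keeps only three of them and drops the rest.

solutions to the above two problems:
 i) udf
 ii) joins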
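 a minimal sketch of the join approach for the department totals, assuming res has been built as shown above. the aliases tots, dtots, stots, top3tot, j2 and top3dno are hypothetical names, not part of the original notes.

```pig
-- distinct department totals
tots    = foreach res generate tot;
dtots   = distinct tots;

-- top 3 distinct totals
stots   = order dtots by tot desc;
top3tot = limit stots 3;

-- join back to res so that every department with one of those totals is kept
j2      = join res by tot, top3tot by tot;
top3dno = foreach j2 generate res::dno as dno, res::tot as tot;
```

 the join output is not sorted; add an order by tot desc on top3dno if the final result must be in descending order.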

-------------------------------------

 executing scripts in pig.
--------------------------
 3 commands :
 i) pig
 ii) exec
 iii) run

pig:
  executes a script file from the command prompt.
  aliases of the script's relations are not available in
   the grunt shell.

 [cloudera@quickstart ~]$ pig  script1.pig

exec:
  executes a script from the grunt shell.
  aliases of the script's relations are still not available in grunt,
  so we cannot reuse them.

grunt> exec script1.pig

run:
  executes a script from the grunt shell.
  aliases of the script's relations are available in grunt,
  so you can reuse them.
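 a short grunt session showing the reuse that run allows. this is only an illustration: it assumes script1.pig defines an alias named e (for example e = order emp by sal desc;), which the notes do not actually state.

```pig
-- assumption: script1.pig defines the alias e
grunt> run script1.pig
-- e is now known to grunt, so it can be used in new statements
grunt> top3 = limit e 3;
grunt> dump top3;
```

 if the same script were started with exec instead, the limit statement would fail because e would not be defined in the grunt session.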

--------------------------------------

 command   environment      aliases available in grunt

 pig       command prompt   no

 exec      grunt            no

 run       grunt            yes
-----------------------------------------

when to use what.

  to deploy scripts from a shell script or oozie,
   use "pig".

  to execute from grunt without overriding aliases already defined in the current session,
   use "exec".

  to execute from grunt and reuse the script's aliases afterwards,
   use "run".

---------------------------------





























