Hadoop Job Operations

March 15, 2018


Job Operations

Submitting a Workflow, Coordinator or Bundle Job:-

- Submitting bundle jobs is only supported in Oozie 3.0 or later. Similarly, all of the bundle operations below are supported only in Oozie 3.0 or later.

Example:-

$ oozie job -oozie http://localhost:11000/oozie -config job.properties -submit

job: 14-20090525161321-oozie-joe

- The properties for the job must be provided in a file, either a Java Properties file (.properties) or a Hadoop XML configuration file (.xml), and the file must be specified with the -config option.

- The workflow application path must be specified in the file with the oozie.wf.application.path property.

- The coordinator application path must be specified in the file with the oozie.coord.application.path property.

- The bundle application path must be specified in the file with the oozie.bundle.application.path property, and the specified path must be an HDFS path.

- The job will be created, but it will not be started; it will be in PREP status.
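The property rules above can be checked on the client before submitting. The following is a minimal Python sketch of such a pre-flight check; parse_properties and detect_job_kind are hypothetical helpers, the parser only understands simple key=value lines (not the full Java Properties syntax), and both the "exactly one application path" rule and the hdfs:// requirement for every job kind are assumptions of this sketch. The Oozie server performs the authoritative validation on submit.

```python
# Minimal pre-flight check for an Oozie job.properties file (hypothetical helpers).
# parse_properties only understands simple key=value lines; it is not a full
# Java Properties parser (no line continuations, escapes or ':' separators).

APP_PATH_KEYS = {
    "oozie.wf.application.path": "workflow",
    "oozie.coord.application.path": "coordinator",
    "oozie.bundle.application.path": "bundle",
}


def parse_properties(text):
    """Return a dict of key -> value, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def detect_job_kind(props):
    """Return 'workflow', 'coordinator' or 'bundle' based on the application path key."""
    found = [key for key in APP_PATH_KEYS if key in props]
    if len(found) != 1:
        # Assumption of this sketch: exactly one application path is set.
        raise ValueError(f"expected exactly one application path, found {found}")
    path = props[found[0]]
    if not path.startswith("hdfs://"):
        # Stricter than Oozie, which can be configured for other file systems.
        raise ValueError(f"application path is not an HDFS path: {path}")
    return APP_PATH_KEYS[found[0]]


sample = """
# job.properties
nameNode=hdfs://localhost:8020
oozie.wf.application.path=hdfs://localhost:8020/user/joe/workflows/mapreduce
"""
print(detect_job_kind(parse_properties(sample)))  # workflow
```

The nameNode key in the sample is only an illustrative extra property; the check looks solely at the three application path keys named above.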
 

Starting a Workflow, Coordinator or Bundle Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -start 14-20090525161321-oozie-joe

- The start option starts a previously submitted workflow job, coordinator job or bundle job that is in PREP status.

- After the command is executed, the workflow job, coordinator job or bundle job will be in RUNNING status.

Running a Workflow, Coordinator or Bundle Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run

job: 15-20090525161321-oozie-joe

- The run option creates and starts a workflow job, coordinator job or bundle job.

- The parameters for the job and the workflow, coordinator or bundle application path are specified in the same way as for the submit option.

Suspending a Workflow, Coordinator or Bundle Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -suspend 14-20090525161321-oozie-joe

- The suspend option suspends a workflow job in RUNNING status.

- After the command is executed, the workflow job will be in SUSPENDED status.

- The suspend option suspends a coordinator or bundle job in RUNNING, RUNNINGWITHERROR or PREP status.

- When a coordinator job is suspended, its running coordinator actions stay in RUNNING and their workflows are suspended.

- If the coordinator job is in RUNNING status, it transitions to SUSPENDED status; if it is in RUNNINGWITHERROR status, it transitions to SUSPENDEDWITHERROR; and if it is in PREP status, it transitions to PREPSUSPENDED status.

- When a bundle job is suspended, its running coordinators are also suspended.

- If the bundle job is in RUNNING status, it transitions to SUSPENDED status; if it is in RUNNINGWITHERROR status, it transitions to SUSPENDEDWITHERROR; and if it is in PREP status, it transitions to PREPSUSPENDED status.
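The suspend rules above amount to a small lookup table per job type. This Python sketch encodes them; status_after_suspend is a hypothetical helper that only covers the transitions listed in this section, while the status strings are the ones Oozie reports.

```python
# Lookup-table sketch of the suspend transitions listed above.
# status_after_suspend is a hypothetical helper; the status strings are Oozie's.

WORKFLOW_SUSPEND = {"RUNNING": "SUSPENDED"}

# Coordinator and bundle jobs follow the same suspend transitions.
COORD_BUNDLE_SUSPEND = {
    "RUNNING": "SUSPENDED",
    "RUNNINGWITHERROR": "SUSPENDEDWITHERROR",
    "PREP": "PREPSUSPENDED",
}


def status_after_suspend(job_type, status):
    """Return the status a job moves to after -suspend, or raise if not allowed."""
    if job_type == "workflow":
        table = WORKFLOW_SUSPEND
    elif job_type in ("coordinator", "bundle"):
        table = COORD_BUNDLE_SUSPEND
    else:
        raise ValueError(f"unknown job type: {job_type!r}")
    if status not in table:
        raise ValueError(f"a {job_type} job in {status} status cannot be suspended")
    return table[status]


print(status_after_suspend("bundle", "PREP"))  # PREPSUSPENDED
```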

Resuming a Workflow, Coordinator or Bundle Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -resume 14-20090525161321-oozie-joe

- The resume option resumes a workflow job in SUSPENDED status.

- After the command is executed, the workflow job will be in RUNNING status.
 

Killing a Workflow, Coordinator or Bundle Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -kill 14-20090525161321-oozie-joe

- The kill option kills a workflow job in PREP, SUSPENDED or RUNNING status, and a coordinator or bundle job in PREP, RUNNING, PREPSUSPENDED, SUSPENDED or PAUSED status.

- After the command is executed, the job will be in KILLED status.
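The kill rules can be expressed as a set of allowed statuses per job type. The Python sketch below is a hypothetical helper that accepts only the statuses named in this section; the real server may accept statuses this article does not mention.

```python
# Sketch of the kill rules listed above (hypothetical helper, Oozie status names).
# Only the statuses named in this article are included.

KILLABLE = {
    "workflow": {"PREP", "RUNNING", "SUSPENDED"},
    "coordinator": {"PREP", "RUNNING", "PREPSUSPENDED", "SUSPENDED", "PAUSED"},
    "bundle": {"PREP", "RUNNING", "PREPSUSPENDED", "SUSPENDED", "PAUSED"},
}


def status_after_kill(job_type, status):
    """Return KILLED if the job may be killed from its current status, else raise."""
    allowed = KILLABLE.get(job_type)
    if allowed is None:
        raise ValueError(f"unknown job type: {job_type!r}")
    if status not in allowed:
        raise ValueError(f"a {job_type} job in {status} status is not killable here")
    return "KILLED"


print(status_after_kill("coordinator", "PAUSED"))  # KILLED
```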

Changing the End Time/Concurrency/Pause Time of a Coordinator Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value endtime=2011-12-01T05:00Z

- The change option changes a coordinator job that is not in KILLED status.

- Valid value names are endtime, concurrency and pausetime.

- Repeated value names are not allowed.

- The new end time must not be before the job's start time or its last action time.

- The new concurrency value has to be a valid integer.

- After the command is executed, the job's end time, concurrency or pause time should be changed.

- If the end time of an already SUCCEEDED job is changed, its status becomes RUNNING again.
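The checks above can be mirrored on the client side. This Python sketch parses a -value string and applies them; parse_change_value and parse_oozie_time are hypothetical helpers, and it assumes that several name=value pairs are separated by ";" and that times use the yyyy-MM-ddTHH:mmZ form shown in the examples.

```python
# Sketch of the -change checks listed above for a coordinator job.
# Assumptions: several pairs are separated by ';' (escaped as \; in a shell),
# and times use the yyyy-MM-ddTHH:mmZ form from the examples. parse_change_value
# is a hypothetical helper, not Oozie's own parser.
from datetime import datetime, timezone

VALID_NAMES = {"endtime", "concurrency", "pausetime"}


def parse_oozie_time(text):
    """Parse '2011-12-01T05:00Z' as a UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def parse_change_value(value, start_time, last_action_time):
    changes = {}
    for pair in value.split(";"):
        name, sep, raw = pair.partition("=")
        name, raw = name.strip(), raw.strip()
        if not sep or name not in VALID_NAMES:
            raise ValueError(f"invalid value name in {pair!r}")
        if name in changes:
            raise ValueError(f"repeated value name: {name}")
        if name == "concurrency":
            changes[name] = int(raw)  # int() raises ValueError for non-integers
        else:
            changes[name] = parse_oozie_time(raw)
    end = changes.get("endtime")
    if end is not None and (end < start_time or end < last_action_time):
        raise ValueError("new end time is before the start time or last action time")
    return changes


start = parse_oozie_time("2011-01-01T00:00Z")
last_action = parse_oozie_time("2011-06-01T00:00Z")
print(parse_change_value("endtime=2011-12-01T05:00Z;concurrency=100", start, last_action))
```

On a shell command line the ";" separator has to be escaped or quoted, for example -value "endtime=2011-12-01T05:00Z;concurrency=100". Otherwise the shell treats it as a command separator.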
 

Changing the Pause Time of a Bundle Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value pausetime=2011-12-01T05:00Z

- The change option changes a bundle job that is not in KILLED status.

- The only valid value name is pausetime, the pause time of the bundle job.

- Repeated value names are not allowed.

- After the command is executed, the job's pause time should be changed.
 

Rerunning a Workflow Job:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -config job.properties -rerun 14-20090525161321-oozie-joe

- The rerun option reruns a completed (SUCCEEDED, FAILED or KILLED) job, skipping the specified nodes.

- After the command is executed, the job will be in RUNNING status.
 

Rerunning a Coordinator Action or Multiple Actions:-

Example:-

$ oozie job -rerun <coord-job-id> [-nocleanup] [-refresh] [-action 1,3-4,7-40] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z,2009-11-10T01:00Z,2009-12-31T22:00Z]

(Either -action or -date is required to rerun; if neither -action nor -date is given, an exception will be thrown.)

- The rerun option reruns a terminated (TIMEDOUT, SUCCEEDED, KILLED or FAILED) coordinator action when the coordinator job is not in FAILED or KILLED state.

- After the command is executed, the rerun coordinator action will be in WAITING status.
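The -action and -date arguments use a compact list syntax: comma-separated items, "a-b" for an action number range, and "start::end" for a date range. The Python sketch below expands that syntax and enforces the "-action or -date is required" rule. parse_action_list, parse_date_list and rerun_scope are hypothetical helpers for illustration only. The Oozie server does its own parsing.

```python
# Sketch of how the -action and -date arguments of a coordinator rerun expand.
# parse_action_list, parse_date_list and rerun_scope are hypothetical helpers
# that only illustrate the syntax; Oozie does its own parsing on the server.
from datetime import datetime, timezone


def parse_action_list(spec):
    """Expand '1,3-4,7-40' into a sorted list of action numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {part}")
            numbers.update(range(lo, hi + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


def parse_date_list(spec):
    """Turn 'A::B,C' into [(A, B), (C, C)]; a single date is a one-point range."""
    def to_dt(text):
        return datetime.strptime(text.strip(), "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)

    ranges = []
    for part in spec.split(","):
        start, sep, end = part.partition("::")
        lo = to_dt(start)
        hi = to_dt(end) if sep else lo
        ranges.append((lo, hi))
    return ranges


def rerun_scope(action=None, date=None):
    """Mirror the rule that -action or -date must be given."""
    if action is None and date is None:
        raise ValueError("-action or -date is required to rerun")
    return {
        "actions": parse_action_list(action) if action else None,
        "dates": parse_date_list(date) if date else None,
    }


scope = rerun_scope(action="1,3-4,7-40")
print(scope["actions"][:5])  # [1, 3, 4, 7, 8]
```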
 
Rerunning a Bundle Job:-

Example:-

$ oozie job -rerun <bundle-job-id> [-nocleanup] [-refresh] [-coordinator c1,c3,c4] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z,2009-11-10T01:00Z,2009-12-31T22:00Z]

(Either -coordinator or -date is required to rerun; if neither -coordinator nor -date is given, an exception will be thrown.)

- The rerun option reruns the coordinator actions belonging to the specified coordinators within the specified date range.

- After the command is executed, the rerun coordinator actions will be in WAITING status.
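This Python sketch shows one way to read the selection rule for a bundle rerun: keep the actions of the listed coordinators whose nominal time falls inside one of the date ranges. The data model, with each action as a (coordinator_name, nominal_time) pair, is hypothetical. It is also an assumption that an action must match both filters when both -coordinator and -date are given.

```python
# Sketch of which coordinator actions a bundle rerun could target.
# Hypothetical data model: each action is a (coordinator_name, nominal_time) pair.
# Assumption: when both -coordinator and -date are given, an action must match both.
from datetime import datetime, timezone


def select_bundle_actions(actions, coordinators=None, date_ranges=None):
    if coordinators is None and date_ranges is None:
        raise ValueError("-coordinator or -date is required to rerun")
    selected = []
    for coord, nominal in actions:
        if coordinators is not None and coord not in coordinators:
            continue
        if date_ranges is not None and not any(lo <= nominal <= hi for lo, hi in date_ranges):
            continue
        selected.append((coord, nominal))
    return selected


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


actions = [("c1", utc(2009, 1, 5)), ("c2", utc(2009, 1, 5)), ("c3", utc(2009, 7, 1))]
picked = select_bundle_actions(actions, coordinators={"c1", "c3", "c4"},
                               date_ranges=[(utc(2009, 1, 1), utc(2009, 5, 31))])
print(picked)  # only the c1 action is selected
```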
 

Checking the Status of a Workflow, Coordinator or Bundle Job or a Coordinator Action:-

Example:-

$ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe

Workflow Name :  map-reduce-wf
App Path      :  hdfs://localhost:8020/user/joe/workflows/mapreduce
Status        :  SUCCEEDED
Run           :  0
User          :  joe
Group         :  users
Created       :  2009-05-26 05:01 +0000
Started       :  2009-05-26 05:01 +0000
Ended         :  2009-05-26 05:01 +0000

Actions
Action Name  Type        Status  Transition  External Id           External Status  Error Code  Start Time              End Time
hadoop1      map-reduce  OK      end         job-20090428135-0524  SUCCEEDED        -           2009-05-26 05:01 +0000  2009-05-26 05:01 +0000

- The info option can display information about a workflow job, a coordinator job or a coordinator action.

- The info command may time out if the number of coordinator actions is very high.

- In that case, info should be used with the -offset and -len options.

- The -offset and -len options specify the offset and the number of actions to display when checking a workflow job or coordinator job.
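Paging with -offset and -len works like slicing a long action list into windows. The Python sketch below models this on an in-memory list. It assumes the offset is 1-based, so -offset 1 starts at the first action. page and iter_pages are hypothetical helpers. Each page corresponds to one call of the form oozie job -oozie http://localhost:11000/oozie -info <coord-job-id> -offset N -len L.

```python
# Sketch of paging through a long action list with -offset and -len.
# Assumption: offset is 1-based, so -offset 1 starts at the first action.
# page and iter_pages are hypothetical helpers working on an in-memory list.

def page(actions, offset=1, length=None):
    """Return the actions an '-offset N -len L' request would cover."""
    if offset < 1:
        raise ValueError("offset starts at 1")
    start = offset - 1
    end = None if length is None else start + length
    return actions[start:end]


def iter_pages(actions, length):
    """Yield consecutive pages, as repeated -info calls with a growing offset would."""
    if length < 1:
        raise ValueError("length must be positive")
    offset = 1
    while offset <= len(actions):
        yield page(actions, offset, length)
        offset += length


actions = [f"action-{n}" for n in range(1, 11)]
for chunk in iter_pages(actions, 4):
    print(chunk)
```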
 

Checking the Server Logs of a Workflow, Coordinator or Bundle Job

Example:-

$ oozie job -oozie http://localhost:11000/oozie -log 14-20090525161321-oozie-joe
 

Checking the Server Logs of Particular Actions of a Coordinator Job:-

Example:-

$ oozie job -log <coord-job-id> [-action 1,3-4,7-40]

(-action is optional.)

Checking the Status of Multiple Workflow Jobs:-

Example:-

$ oozie jobs -oozie http://localhost:11000/oozie -localtime -len 2 -filter status=RUNNING

- A filter can be specified after all options.

- The filter option syntax is: [NAME=VALUE][;NAME=VALUE]*

- Valid filter names are:

name: the workflow application name

user: the user that submitted the job

group: the group for the job

status: the status of the job

frequency: the frequency of the coordinator job

unit: the time unit of the frequency, which takes one of the values months, days, hours or minutes.
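A -filter string is a list of NAME=VALUE terms separated by ";". The Python sketch below parses such a string and evaluates it against a job record. It assumes that terms with different names are ANDed and that repeated values for the same name are ORed, which is how the Oozie command-line documentation describes filter combination. parse_filter and matches are hypothetical helpers, not part of the Oozie client.

```python
# Sketch of building and evaluating a -filter string: [NAME=VALUE][;NAME=VALUE]*.
# Assumption: terms with different names are ANDed and repeated names are ORed.
# parse_filter and matches are hypothetical helpers, not part of the Oozie client.

VALID_FILTER_NAMES = {"name", "user", "group", "status", "frequency", "unit"}


def parse_filter(spec):
    """Group filter values by name, e.g. 'status=RUNNING;user=joe'."""
    filters = {}
    for term in spec.split(";"):
        if not term:
            continue
        name, sep, value = term.partition("=")
        if not sep or name not in VALID_FILTER_NAMES:
            raise ValueError(f"invalid filter term: {term!r}")
        filters.setdefault(name, []).append(value)
    return filters


def matches(job, filters):
    """True if, for every filter name, the job's field equals one of the given values."""
    return all(job.get(name) in values for name, values in filters.items())


f = parse_filter("status=RUNNING;status=SUSPENDED;user=joe")
print(matches({"status": "SUSPENDED", "user": "joe"}, f))  # True
print(matches({"status": "RUNNING", "user": "ann"}, f))  # False
```

When a filter has more than one term, quote it on the shell command line, for example -filter "status=RUNNING;user=joe". Otherwise the shell treats ";" as a command separator.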
 

Checking the Status of Multiple Coordinator Jobs:-

Example:-

$ oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator

Job ID   App Name   Status     Freq   Unit     Started   Next Materialized
...      ...        SUCCEEDED  1440   MINUTE   ...       ...
 

Checking the Status of Multiple Bundle Jobs:-

Example:-

$ oozie jobs -oozie http://localhost:11000/oozie -jobtype bundle

Job ID                    Bundle Name   Status    Kickoff            Created   User   Group
0000027-110 oozie-chao-B  BUNDLE-TEST   RUNNING   2012-01-15 00:24   2011-03   joe    users
 

Admin Operations:-

Checking the Status of the Oozie System

Example:-

$ oozie admin -oozie http://localhost:11000/oozie -status

Safe mode: OFF

- It returns the current status of the Oozie system.
 

Checking the Status of the Oozie System (in Oozie 2.0 or later)

Example:-

$ oozie admin -oozie http://localhost:11000/oozie -systemmode

Safe mode: ON

- It returns the current status of the Oozie system.

Displaying the Build Version of the Oozie System

Example:-

$ oozie admin -oozie http://localhost:11000/oozie -version

Oozie server build version: 2.0.2.1-0.20.1.3092118008--

- It returns the Oozie server build version.
 

Validate Operations

Example:-

$ oozie validate myApp/workflow.xml

- It performs an XML Schema validation on the specified workflow XML file.
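Before calling oozie validate, a quick local check can catch malformed files early. The Python sketch below uses the standard library's xml.etree.ElementTree. It is not a substitute for the schema validation that oozie validate performs: it only checks that the file is well-formed, that the root element is workflow-app, and that the root has a name attribute. quick_check_workflow is a hypothetical helper.

```python
# Local sanity check before running `oozie validate` (hypothetical helper).
# It only checks well-formedness, the root element and its name attribute;
# it does NOT perform the XML Schema validation that oozie validate does.
import xml.etree.ElementTree as ET


def quick_check_workflow(xml_text):
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    local_tag = root.tag.rsplit("}", 1)[-1]  # drop a '{namespace}' prefix
    if local_tag != "workflow-app":
        raise ValueError(f"root element is {local_tag!r}, expected 'workflow-app'")
    name = root.get("name")
    if not name:
        raise ValueError("workflow-app has no name attribute")
    return name


doc = """<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
  <start to="end"/>
  <end name="end"/>
</workflow-app>"""
print(quick_check_workflow(doc))  # map-reduce-wf
```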
 

Pig Operations

Submitting a Pig Job through HTTP:-

Example:-

$ oozie pig -oozie http://localhost:11000/oozie -file pigScriptFile -config job.properties -X -param_file params

job: 14-20090525161321-oozie-joe-W

$ cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/pig/lib/

- The parameters for the job must be provided in a Java Properties file (.properties).

- The JobTracker, NameNode and lib path must be specified in this file.

- The Pig script file is a local file.

- All JAR files, including the Pig JAR, and all other files needed by the Pig job need to be uploaded to HDFS under the lib path beforehand.

- The workflow.xml is created internally by the Oozie server.

- The job will be created and run right away.
 

MapReduce Operations:-

Submitting a MapReduce Job

Example:-

$ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties

- The parameters must be in a Java Properties file, and this file must be specified for a MapReduce job.

- The properties file must specify mapred.mapper.class, mapred.
 

Rerun:-

- Reloads the configuration.

- Creates a new workflow instance with the same ID.

- Deletes the actions that are not skipped from the DB, and copies the data for skipped actions from the old workflow instance to the new one.

- The action handler skips the nodes given in the config, using the same exit transition as before.
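The rerun steps above can be modelled on a toy in-memory record. In this Python sketch the dictionary layout, the rerun_workflow name and the hadoop2 and queue values are all hypothetical; the real logic runs inside the Oozie server. It shows a new instance that keeps the same ID and the reloaded configuration, copies the skipped actions together with their exit transitions, and leaves out every other action so that it runs again.

```python
# Toy model of the rerun steps above. The in-memory "DB" layout and the
# rerun_workflow name are hypothetical; the real logic runs inside the Oozie server.
from copy import deepcopy


def rerun_workflow(old_instance, skip_nodes, reloaded_conf):
    """Build a new instance with the same id that keeps only the skipped actions."""
    new_instance = {
        "id": old_instance["id"],        # same workflow id as before
        "conf": dict(reloaded_conf),     # configuration reloaded for the rerun
        "actions": {},
    }
    for name, action in old_instance["actions"].items():
        if name in skip_nodes:
            # Skipped actions are copied, including the exit transition they took.
            new_instance["actions"][name] = deepcopy(action)
        # Actions that are not skipped are not copied, so they will run again.
    return new_instance


old = {
    "id": "14-20090525161321-oozie-joe",
    "conf": {},
    "actions": {
        "hadoop1": {"status": "OK", "transition": "hadoop2"},
        "hadoop2": {"status": "ERROR", "transition": "fail"},
    },
}
new = rerun_workflow(old, skip_nodes={"hadoop1"}, reloaded_conf={"queue": "default"})
print(sorted(new["actions"]))  # ['hadoop1']
```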
 

Workflow Rerun:-

- Config

- Pre-conditions

- Rerun

Config:-

- oozie.wf.application.path

- Exactly one of the following two configurations must be set; both should not be defined at the same time:

oozie.wf.rerun.skip.nodes
oozie.wf.rerun.failnodes

- Skip nodes are a comma-separated list of action names, and they can be any action nodes, including decision nodes.

- The valid value of oozie.wf.rerun.failnodes is either true or false.

- If a secured Hadoop version is used, the following two properties need to be specified as well:

dfs.namenode.kerberos.principal
mapreduce.jobtracker.kerberos.principal
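The configuration rules for a workflow rerun can be checked before running the command. This Python sketch is a hypothetical helper built only from the rules listed above. It requires exactly one of the two rerun keys and accepts only the lowercase strings true or false for failnodes, which is a simplification of its own. When secured=True it also requires the two Kerberos principal properties.

```python
# Sketch of the rerun configuration rules above (hypothetical helper).
# Simplification: only lowercase "true"/"false" are accepted for failnodes.

SKIP = "oozie.wf.rerun.skip.nodes"
FAIL = "oozie.wf.rerun.failnodes"
KERBEROS_KEYS = ("dfs.namenode.kerberos.principal", "mapreduce.jobtracker.kerberos.principal")


def check_rerun_conf(conf, secured=False):
    """Validate a rerun config dict and return what will be rerun."""
    if "oozie.wf.application.path" not in conf:
        raise ValueError("oozie.wf.application.path is required")
    if (SKIP in conf) == (FAIL in conf):
        raise ValueError(f"set exactly one of {SKIP} or {FAIL}")
    if secured:
        missing = [key for key in KERBEROS_KEYS if key not in conf]
        if missing:
            raise ValueError(f"missing Kerberos properties: {missing}")
    if FAIL in conf:
        if conf[FAIL] not in ("true", "false"):
            raise ValueError(f"{FAIL} must be true or false")
        return {"failnodes": conf[FAIL] == "true"}
    # Skip nodes: a comma-separated list of action (or decision) node names.
    nodes = [n.strip() for n in conf[SKIP].split(",") if n.strip()]
    return {"skip_nodes": nodes}


conf = {
    "oozie.wf.application.path": "hdfs://localhost:8020/user/joe/workflows/mapreduce",
    SKIP: "hadoop1, decision1",
}
print(check_rerun_conf(conf))  # {'skip_nodes': ['hadoop1', 'decision1']}
```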

Pre-conditions:-

- The workflow with ID WFID should exist.

- The workflow with ID WFID should be in SUCCEEDED, KILLED or FAILED status.

- If specified, the nodes in the config oozie.wf.rerun.skip.nodes must have completed successfully.
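The three pre-conditions can be written as one guard function. In this Python sketch the jobs dictionary is a hypothetical stand-in for Oozie's job store, and check_rerun_preconditions is a hypothetical helper. The sketch also assumes that "completed successfully" means the action's status is OK, the status shown for a successful action in the info output earlier.

```python
# Sketch of the rerun pre-conditions above. The jobs dict stands in for
# Oozie's job store; check_rerun_preconditions is a hypothetical helper.
# Assumption: "completed successfully" means the action's status is OK.

TERMINAL = {"SUCCEEDED", "KILLED", "FAILED"}


def check_rerun_preconditions(jobs, wf_id, skip_nodes):
    job = jobs.get(wf_id)
    if job is None:
        raise LookupError(f"workflow {wf_id} does not exist")
    if job["status"] not in TERMINAL:
        raise ValueError(f"workflow {wf_id} is {job['status']}, not SUCCEEDED/KILLED/FAILED")
    not_ok = [node for node in skip_nodes if job["actions"].get(node) != "OK"]
    if not_ok:
        raise ValueError(f"skip nodes that did not complete successfully: {not_ok}")


jobs = {"wf-1": {"status": "FAILED", "actions": {"hadoop1": "OK", "hadoop2": "ERROR"}}}
check_rerun_preconditions(jobs, "wf-1", ["hadoop1"])  # passes without raising
```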





© Copyright nasscom. All Rights Reserved.