Cloudera Enterprise 6.0 Beta

Configuring Oozie

This page explains how to configure Oozie for upgrades in an unmanaged deployment, without Cloudera Manager.

Configuring Oozie after Upgrading from a CDH 5 Release

  Note: If you are installing Oozie for the first time, skip this section and go to Configuring Oozie after a New Installation.

Step 1: Update Configuration Files

  1. Edit the new Oozie CDH 6 oozie-site.xml, and set all customizable properties to the values you set in the previous oozie-site.xml.
  2. If necessary, do the same for the oozie-log4j.properties, oozie-env.sh, and adminusers.txt files.
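
One way to check that no customized property was left behind is to compare the property names in the two files. The following is a minimal sketch, not part of the Oozie tooling: the function names are hypothetical, and it assumes each <name> element sits on a single line, as in the examples on this page.

```shell
#!/bin/sh
# Print the property names defined in a Hadoop-style XML file, one per line.
# Assumption: every <name>...</name> element is written on a single line.
list_property_names() {
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1" | sort -u
}

# Print the names that are set in the old file but absent from the new one.
# Usage: missing_properties OLD_SITE_XML NEW_SITE_XML
missing_properties() {
    old_names=$(mktemp) && new_names=$(mktemp) || return 1
    list_property_names "$1" > "$old_names"
    list_property_names "$2" > "$new_names"
    # comm -23 keeps the lines that appear only in the first (sorted) file.
    comm -23 "$old_names" "$new_names"
    rm -f "$old_names" "$new_names"
}
```

For example, if you saved the CDH 5 file to a backup location of your choosing, running missing_properties against that copy and /etc/oozie/conf/oozie-site.xml lists the properties you still need to carry over. The sketch compares names only, not values.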

Step 2: Upgrade the Database

  Important:
  • Do not proceed before you have edited the configuration files as instructed in Step 1.
  • Before running the database upgrade tool, copy or symbolically link the JDBC driver JAR for the database you are using into the /var/lib/oozie/ directory.
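
Linking the driver can be scripted. The sketch below is illustrative only: link_jdbc_driver is a hypothetical helper, and the location of the driver JAR depends on how you obtained it.

```shell
#!/bin/sh
# Symbolically link a JDBC driver JAR into the Oozie library directory.
# Usage: link_jdbc_driver DRIVER_JAR [TARGET_DIR]
# TARGET_DIR defaults to /var/lib/oozie, the directory named on this page.
link_jdbc_driver() {
    driver_jar=$1
    target_dir=${2:-/var/lib/oozie}
    if [ ! -f "$driver_jar" ]; then
        echo "JDBC driver JAR not found: $driver_jar" >&2
        return 1
    fi
    # -f replaces a stale link left over from an earlier driver version.
    ln -sf "$driver_jar" "$target_dir/$(basename "$driver_jar")"
}
```

Writing to /var/lib/oozie/ normally requires root privileges, so run the function from a root shell. A typical call might pass a path such as /usr/share/java/mysql-connector-java.jar, which is an assumed location and may differ on your system.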

Oozie CDH provides a command-line tool to perform the database schema and data upgrade. The tool uses Oozie configuration files to connect to the database and perform the upgrade.

The database upgrade tool works in two modes: it can do the upgrade in the database or it can produce an SQL script that a database administrator can run manually. If you use the tool to perform the upgrade, you must do it as a database user who has permissions to run DDL operations in the Oozie database.

  • To perform the upgrade directly in the database, run the Oozie database upgrade tool as the oozie Unix user:

    $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -run

    You will see output such as the following (the output of the script may differ slightly depending on your database vendor):

    Validate DB Connection
    DONE
    Check DB schema exists
    DONE
    Verify there are not active Workflow Jobs
    DONE
    Check OOZIE_SYS table does not exist
    DONE
    Get Oozie DB version
    DONE
    Upgrade SQL schema
    DONE
    Upgrading to db schema for Oozie 5.0.0-beta1-cdh6.0.0 Beta
    Update db.version in OOZIE_SYS table to 3
    DONE
    Converting text columns to bytea for all tables
    DONE
    Get Oozie DB version
    DONE
    
    Oozie DB has been upgraded to Oozie version 'Oozie 5.0.0-beta1-cdh6.0.0 Beta'
    
    The SQL commands have been written to: /tmp/ooziedb-8676029205446760413.sql
    
  • Alternatively, to produce an SQL script for a database administrator to run manually, create the upgrade script as the oozie Unix user:

    $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile SCRIPT

    For example, to name your script oozie-upgrade.sql, run the following command:

    $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile oozie-upgrade.sql

    You will see output such as the following (the output of the script may differ slightly depending on your database vendor):

    Validate DB Connection
    DONE
    Check DB schema exists
    DONE
    Verify there are not active Workflow Jobs
    DONE
    Check OOZIE_SYS table does not exist
    DONE
    Get Oozie DB version
    DONE
    Upgrade SQL schema
    DONE
    Upgrading to db schema for Oozie 5.0.0-beta1-cdh6.0 Beta
    Update db.version in OOZIE_SYS table to 3
    DONE
    Converting text columns to bytea for all tables
    DONE
    Get Oozie DB version
    DONE
    
    The SQL commands have been written to: oozie-upgrade.sql
    
    WARN: The SQL commands have NOT been executed. You must use the '-run' option.

    Note: The .sql file is created only if there is something to upgrade. If you run
    the tool with the -sqlfile option after the -run option, the file is not created.
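
When the script is generated with -sqlfile, a database administrator still has to apply it with the database's own client. The sketch below only prints a candidate command for PostgreSQL or MySQL/MariaDB. The helper name is hypothetical, the oozie user and database names follow the examples on this page, and the host is whatever you pass in. Oracle is left out of the sketch.

```shell
#!/bin/sh
# Print (do not run) a client command that would apply SQL_FILE.
# Usage: sql_apply_command VENDOR DB_HOST SQL_FILE
sql_apply_command() {
    vendor=$1 host=$2 sql_file=$3
    case "$vendor" in
        postgresql) echo "psql -h $host -U oozie -d oozie -f $sql_file" ;;
        mysql)      echo "mysql -h $host -u oozie -p oozie < $sql_file" ;;
        *)          echo "unsupported vendor: $vendor" >&2; return 1 ;;
    esac
}
```

Printing the command rather than running it lets the administrator review it first and substitute a database user that has permission to run DDL operations.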

Step 3: Upgrade the Oozie Shared Library

  Important: This step is required. The current version of Oozie does not work with shared libraries from an earlier version.

The Oozie installation contains a shared library for YARN named oozie-sharelib-yarn.

To upgrade the shared library, install the Oozie CDH 6 shared libraries. For example:

$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn

where FS_URI is the HDFS URI of the filesystem on which the shared library should be installed. The URI should be in the format hdfs://<HOST>:<PORT>.
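
A malformed URI is an easy mistake to make here, so a script that wraps this step may want to check the value first. The following sketch uses a hypothetical helper, is_hdfs_uri, that accepts only the hdfs://<HOST>:<PORT> form described above. It deliberately rejects other forms, such as a URI without a port.

```shell
#!/bin/sh
# Succeed only if the argument looks like hdfs://<HOST>:<PORT>.
is_hdfs_uri() {
    printf '%s\n' "$1" | grep -Eq '^hdfs://[A-Za-z0-9.-]+:[0-9]+$'
}

# Example use (host name and port are placeholders, not defaults):
#   FS_URI=hdfs://namenode.example.com:8020
#   is_hdfs_uri "$FS_URI" &&
#       sudo oozie-setup sharelib create -fs "$FS_URI" \
#            -locallib /usr/lib/oozie/oozie-sharelib-yarn
```

On a configured cluster node, hdfs getconf -confKey fs.defaultFS reports the default filesystem URI, which is usually the value to use.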

Step 4: Start the Oozie Server

To start Oozie, run the following command:

$ sudo service oozie start

Check the Oozie log (/var/log/oozie/oozie-audit.log) to verify that Oozie has started successfully.
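
In addition to reading the log, you can ask the server for its status with the Oozie client. The sketch below assumes that the server listens on the default URL http://localhost:11000/oozie and that oozie admin -status reports a line beginning with "System mode: NORMAL" when the server is healthy. The helper name is hypothetical.

```shell
#!/bin/sh
# Succeed when the status text on standard input reports NORMAL mode.
oozie_is_normal() {
    grep -q '^System mode: NORMAL'
}

# Example use (URL is an assumed default; adjust host and port):
#   oozie admin -oozie http://localhost:11000/oozie -status | oozie_is_normal \
#       && echo "Oozie is up"
```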

Step 5: Upgrade the Oozie Client

Although older Oozie clients work with the new Oozie server, you need to install the new version of the Oozie client to use all the functionality of the Oozie server.

To upgrade the Oozie client, if you have not already done so, follow the steps under Installing Oozie.

Configuring Oozie after a New Installation

  Note: If you are upgrading Oozie from a CDH 5 release, see Configuring Oozie after Upgrading from a CDH 5 Release.

When you install Oozie from an RPM or Debian package, Oozie server creates all configuration, documentation, and runtime files in the standard Linux directories, as follows.

Type of File       Where Installed
-----------------  ------------------------------------------------------
binaries           /usr/lib/oozie/
configuration      /etc/oozie/conf/
documentation      SLES: /usr/share/doc/packages/oozie/
                   RHEL: /usr/share/doc/oozie-5.0.0-beta1+cdh6.x+0
                   Other platforms: /usr/share/doc/oozie/
examples TAR.GZ    SLES: /usr/share/doc/packages/oozie/
                   RHEL: /usr/share/doc/oozie-5.0.0-beta1+cdh6.x+0
                   Other platforms: /usr/share/doc/oozie/
sharelib           /usr/lib/oozie/
data               /var/lib/oozie/
logs               /var/log/oozie/
temp               /var/tmp/oozie/
PID file           /var/run/oozie/

Deciding Which Database to Use

Oozie has a built-in Derby database, but Cloudera recommends that you use a PostgreSQL, MariaDB, MySQL, or Oracle database instead, for the following reasons:
  • Derby runs in embedded mode and it is not possible to monitor its health.
  • Though it might be possible, Cloudera currently has no live backup strategy for the embedded Derby database.
  • Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not happen with server-based databases.
See Database Requirements for tested database versions.

Configuring Oozie to Use PostgreSQL

Use the procedure that follows to configure Oozie to use PostgreSQL instead of Apache Derby.

  1. Install PostgreSQL
  2. Create the Oozie User and Oozie Database
  3. Configure PostgreSQL to Accept Network Connections for the Oozie User
  4. Reload the PostgreSQL Configuration
  5. Configure Oozie to Use PostgreSQL

Install PostgreSQL

See the PostgreSQL documentation to install it.

Create the Oozie User and Oozie Database

For example, using the PostgreSQL psql command-line tool:

$ psql -U postgres
Password for user postgres: *****

postgres=# CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'oozie' 
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE

postgres=# CREATE DATABASE "oozie" WITH OWNER = oozie
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF-8'
 LC_CTYPE = 'en_US.UTF-8'
 CONNECTION LIMIT = -1;
CREATE DATABASE

postgres=# \q

Configure PostgreSQL to Accept Network Connections for the Oozie User

  1. Edit the postgresql.conf file and set the listen_addresses property to *, to make sure that the PostgreSQL server starts listening on all your network interfaces. Also make sure that the standard_conforming_strings property is set to off.
  2. Edit the PostgreSQL data/pg_hba.conf file as follows:
    host    oozie         oozie         0.0.0.0/0             md5
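
If you script this change, it helps to make it idempotent so that repeated runs do not duplicate the entry. The following is a sketch with a hypothetical helper, add_hba_entry. The optional second argument lets you narrow the address range from 0.0.0.0/0 to the subnet of your Oozie server, which is a suggestion of this sketch rather than a requirement stated on this page.

```shell
#!/bin/sh
# Append the Oozie entry to pg_hba.conf unless the same line already exists.
# Usage: add_hba_entry HBA_FILE [CIDR]
add_hba_entry() {
    hba_file=$1
    cidr=${2:-0.0.0.0/0}
    entry="host    oozie         oozie         $cidr             md5"
    # -F: match a fixed string; -x: the whole line must match.
    grep -Fxq "$entry" "$hba_file" 2>/dev/null ||
        printf '%s\n' "$entry" >> "$hba_file"
}
```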

Reload the PostgreSQL Configuration

$ sudo -u postgres pg_ctl reload -s -D /opt/PostgreSQL/8.4/data

Configure Oozie to Use PostgreSQL

Edit the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:postgresql://localhost:5432/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note: In the JDBC URL property, replace localhost with the hostname where PostgreSQL is running. In the case of PostgreSQL, unlike MySQL or Oracle, you do not need to download and install the JDBC driver separately because it is license-compatible with Oozie and bundled with it.
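
Replacing localhost can be done by hand or with a small script. The sketch below uses a hypothetical helper, set_jdbc_host, that rewrites only JDBC URLs of the form jdbc:<vendor>://localhost:<port>/..., which covers the PostgreSQL, MariaDB, and MySQL examples on this page. It assumes the host name contains no "#" or "&" characters.

```shell
#!/bin/sh
# Replace "localhost" in JDBC URLs inside an oozie-site.xml file.
# Usage: set_jdbc_host SITE_FILE DB_HOST
set_jdbc_host() {
    site_file=$1 db_host=$2
    tmp_file=$(mktemp) || return 1
    # Touch only the host part of URLs such as jdbc:postgresql://localhost:5432/oozie.
    sed "s#\(jdbc:[a-z]*://\)localhost:#\1$db_host:#" "$site_file" > "$tmp_file" &&
        cat "$tmp_file" > "$site_file"   # cat keeps the file's owner and mode
    rm -f "$tmp_file"
}
```

Keep a copy of oozie-site.xml before running a script like this against it.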

Configuring Oozie to Use MariaDB

Use the procedure that follows to configure Oozie to use MariaDB instead of Apache Derby.

  1. Install and Start MariaDB
  2. Create the Oozie Database and Oozie MariaDB User
  3. Configure Oozie to Use MariaDB
  4. Add the MariaDB JDBC Driver JAR to Oozie

Install and Start MariaDB

For more information, see Installing MariaDB Server.

Create the Oozie Database and Oozie MariaDB User

For example, using the MariaDB mysql command-line tool:

$ mysql -u root -p
Enter password:

MariaDB [(none)]> create database oozie default character set utf8;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> exit
Bye
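
The example above uses oozie as the password for illustration. To use a password of your own without retyping each statement, you could generate the same SQL from a script. This is a sketch with a hypothetical helper, oozie_db_sql, and it emits exactly the three statements shown above.

```shell
#!/bin/sh
# Print the SQL that creates the oozie database and user with a given password.
# Usage: oozie_db_sql PASSWORD
oozie_db_sql() {
    password=$1
    # Refuse passwords that would break out of the quoted SQL literal.
    case "$password" in
        *"'"*) echo "password must not contain a single quote" >&2; return 1 ;;
    esac
    cat <<EOF
create database oozie default character set utf8;
grant all privileges on oozie.* to 'oozie'@'localhost' identified by '$password';
grant all privileges on oozie.* to 'oozie'@'%' identified by '$password';
EOF
}
```

The output can be piped to the client, for example oozie_db_sql "$OOZIE_DB_PASSWORD" | mysql -u root -p, where OOZIE_DB_PASSWORD is a variable you define. Use the same password in the oozie.service.JPAService.jdbc.password property.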

Configure Oozie to Use MariaDB

Edit properties in the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note: In the JDBC URL property, replace localhost with the hostname where MariaDB is running.

Add the MariaDB JDBC Driver JAR to Oozie

Cloudera recommends that you use the MySQL JDBC driver for MariaDB. Copy or symbolically link the MySQL JDBC driver JAR to the /var/lib/oozie/ directory.

  Note: You must manually download the MySQL JDBC driver JAR file.

Configuring Oozie to Use MySQL

Use the procedure that follows to configure Oozie to use MySQL instead of Apache Derby.

  1. Install and Start MySQL 5.x
  2. Create the Oozie Database and Oozie MySQL User
  3. Configure Oozie to Use MySQL
  4. Add the MySQL JDBC Driver JAR to Oozie

Install and Start MySQL 5.x

See the MySQL 5.x documentation to install and start it.

Create the Oozie Database and Oozie MySQL User

For example, using the MySQL mysql command-line tool:

$ mysql -u root -p
Enter password:

mysql> create database oozie default character set utf8;
Query OK, 1 row affected (0.00 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye

Configure Oozie to Use MySQL

Edit properties in the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note: In the JDBC URL property, replace localhost with the hostname where MySQL is running.

Add the MySQL JDBC Driver JAR to Oozie

Copy or symbolically link the MySQL JDBC driver JAR into one of the following directories:
  • For installations that use packages: /var/lib/oozie/
  • For installations that use parcels: /opt/cloudera/parcels/CDH/lib/oozie/lib/
  Note: You must manually download the MySQL JDBC driver JAR file.

Configuring Oozie to Use Oracle

Use the procedure that follows to configure Oozie to use Oracle 11g instead of Apache Derby.

  1. Install and Start Oracle 11g
  2. Create the Oozie Oracle User and Grant Privileges
  3. Configure Oozie to Use Oracle
  4. Add the Oracle JDBC Driver JAR to Oozie

Install and Start Oracle 11g

Use Oracle's instructions.

Create the Oozie Oracle User and Grant Privileges

The following example uses the Oracle sqlplus command-line tool, and shows the privileges Cloudera recommends. Oozie needs CREATE SESSION to start and manage workflows. The additional roles are needed for creating and upgrading the Oozie database.

$ sqlplus system@localhost

Enter password: ******

SQL> create user oozie identified by oozie default tablespace users temporary tablespace temp;

User created.

SQL> grant alter index to oozie;
grant alter table to oozie;
grant create index to oozie;
grant create sequence to oozie;
grant create session to oozie;
grant create table to oozie;
grant drop sequence to oozie;
grant select dictionary to oozie;
grant drop table to oozie;
alter user oozie quota unlimited on users; 
alter user oozie quota unlimited on system;

SQL> exit

$
  Important:

For security reasons, do not make the following grant:

grant select any table to oozie;

Configure Oozie to Use Oracle

Edit the oozie-site.xml file as follows.

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>oracle.jdbc.OracleDriver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:oracle:thin:@//myhost:1521/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note: In the JDBC URL property, replace myhost with the hostname where Oracle is running and replace oozie with the TNS name of the Oracle database.

Add the Oracle JDBC Driver JAR to Oozie

Copy or symbolically link the Oracle JDBC driver JAR into the /var/lib/oozie/ directory.

  Note: You must manually download the Oracle JDBC driver JAR file.

Creating the Oozie Database Schema

After configuring Oozie database information and creating the corresponding database, create the Oozie database schema. Oozie provides a database tool for this purpose.
  Note: The Oozie database tool uses Oozie configuration files to connect to the database to perform the schema creation. Before using the tool, make sure that you have created a database and configured Oozie to work with it as described above.

The Oozie database tool works in two modes: it can create the database, or it can produce an SQL script that a database administrator can run to create the database manually. If you use the tool to create the database schema, you must have the permissions needed to execute DDL operations.

As the oozie Unix user, run the Oozie database tool against the database:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run

You should see output such as the following (the output of the script may differ slightly depending on your database vendor):

Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '5.0.0-beta1-cdh6.0.0 Beta'

The SQL commands have been written to: /tmp/ooziedb-5737263881793872034.sql

Alternatively, to produce an SQL script instead of creating the schema directly, generate the create script as the oozie Unix user:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile SCRIPT

For example:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql

You should see output such as the following (the output of the script may differ slightly depending on your database vendor):

Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '5.0.0-beta1-cdh6.0.0 Beta'

The SQL commands have been written to: oozie-create.sql

WARN: The SQL commands have NOT been executed, you must use the '-run' option

  Important: If you used the -sqlfile option instead of -run, the Oozie database schema has not been created. You must run the oozie-create.sql script against your database. The .sql file is created only if there is something to create or upgrade. If you run the tool with the -sqlfile option after the -run option, the file is not created.
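
In an automated setup it is easy to miss the warning at the end of the tool output. The sketch below uses a hypothetical helper, sql_was_executed, that inspects captured output for the warning text shown in the sample output above. It assumes the wording of that warning is unchanged in your version.

```shell
#!/bin/sh
# Succeed if the captured ooziedb.sh output on standard input does NOT
# contain the "NOT been executed" warning.
sql_was_executed() {
    ! grep -q 'The SQL commands have NOT been executed'
}

# Example use (the log file name is arbitrary):
#   sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql \
#       | tee ooziedb-create.log
#   sql_was_executed < ooziedb-create.log ||
#       echo "Schema not created yet: run oozie-create.sql against the database"
```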

Enabling the Oozie Web Console

To enable the Oozie web console, download and add the ExtJS library to the Oozie server.

Step 1: Download the Library

Download the ExtJS version 2.2 library from https://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.

Step 2: Install the Library

Copy the ext-2.2.zip file into /usr/lib/oozie/embedded-oozie-server/webapp.
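
Steps 1 and 2 can be combined in a short script. The following is a sketch only: install_extjs is a hypothetical helper that copies an already downloaded archive into the web application directory named above, and checks that both exist before copying.

```shell
#!/bin/sh
# Copy the ExtJS archive into the Oozie web application directory.
# Usage: install_extjs ZIP_FILE [WEBAPP_DIR]
install_extjs() {
    zip_file=$1
    webapp_dir=${2:-/usr/lib/oozie/embedded-oozie-server/webapp}
    [ -f "$zip_file" ] || { echo "not found: $zip_file" >&2; return 1; }
    [ -d "$webapp_dir" ] || { echo "no such directory: $webapp_dir" >&2; return 1; }
    cp "$zip_file" "$webapp_dir/"
}

# Example use, from a root shell:
#   curl -fLO https://archive.cloudera.com/gplextras/misc/ext-2.2.zip
#   install_extjs ext-2.2.zip
```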

Step 3: Configure SPNEGO authentication (in Kerberos clusters only)

The web console shares a port with the Oozie REST API, and the API allows operations on Oozie jobs (submission, inspection, and kill). SPNEGO authentication requires that the Kerberos realm trusts the client browser credentials and that the client web browser is configured to pass these credentials. If this configuration is not possible, use the Hue Oozie Dashboard instead of the Oozie Web Console.

See How to Configure Browsers for Kerberos Authentication and Configuring a Dedicated MIT KDC for Cross-Realm Trust.

Configuring Oozie with Kerberos Security

To configure Oozie with Kerberos security, see Oozie Authentication.

Installing the Oozie Shared Library in Hadoop HDFS

The Oozie installation includes the shared library for YARN (oozie-sharelib-yarn), which contains all of the JARs required to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.

  Important: If Hadoop is configured with Kerberos security enabled, you must first configure Oozie with Kerberos Authentication. For instructions, see Oozie Security Configuration. Before running the commands in the following instructions, you must run the sudo -u oozie kinit -k -t /etc/oozie/oozie.keytab and kinit -k hdfs commands. Then, instead of using commands in the form sudo -u user command, just enter the command at the prompt. For example, $ hadoop fs -mkdir /user/oozie

To install the Oozie shared library in Hadoop HDFS in the oozie user home directory:

$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn

where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on. For example: hdfs://<HOST>:<PORT>.

Configuring Oozie Support for MapReduce Uber JARs

An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR.

  Important: When you build an application JAR, do not include CDH JARs, because they are already provided. If you do, upgrading CDH can break your application. To avoid this situation, set the Maven dependency scope to provided. For more information, see Using the CDH 6 Maven Repository.

You can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming) by setting the following property in the oozie-site.xml file:

...
    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>true</value>
    </property>
...

When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.
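
To confirm what the server configuration currently says, you can read the property back from oozie-site.xml. The sketch below uses a hypothetical helper, get_property, and assumes each <name> and <value> element is written on a single line, as in the examples on this page. It is a plain text scan, not an XML parser.

```shell
#!/bin/sh
# Print the value of a named property from a Hadoop-style XML file.
# Usage: get_property SITE_FILE PROPERTY_NAME
get_property() {
    awk -v want="$2" '
        /<name>/ {
            line = $0
            sub(/.*<name>/, "", line)
            sub(/<\/name>.*/, "", line)
            found = (line == want)      # remember whether this is our property
        }
        /<value>/ && found {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}
```

For example, get_property /etc/oozie/conf/oozie-site.xml oozie.action.mapreduce.uber.jar.enable prints the configured value, or nothing if the property is not set in the file.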

Configuring Oozie to Run against a Federated Cluster

To run Oozie against a federated HDFS cluster using ViewFS, configure the oozie.service.HadoopAccessorService.supported.filesystems property in oozie-site.xml as follows:

<property>
     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
     <value>hdfs,viewfs</value>
</property>
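
The property value is a comma-separated list of filesystem schemes. A deployment script could check that a scheme is present before relying on it. The helper below, supports_scheme, is hypothetical and assumes the list contains no spaces, as in the example above.

```shell
#!/bin/sh
# Succeed if SCHEME appears as a whole entry in a comma-separated list.
# Usage: supports_scheme LIST SCHEME
supports_scheme() {
    # Wrapping both sides in commas prevents partial matches such as "view".
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```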
Page generated March 7, 2018.