Cloudera Enterprise 6.0 Beta | Other versions

Configuring an External Database for Sqoop 2

  Note: This page contains references to CDH 5 components or features that have been removed from CDH 6. These references are only applicable if you are managing a CDH 5 cluster with Cloudera Manager 6. For more information, see the Deprecated Items page in the Cloudera Enterprise 6 Release Notes.
Sqoop 2 has a built-in Derby database, but Cloudera recommends that you use a PostgreSQL database instead, for the following reasons:
  • Derby runs in embedded mode and it is not possible to monitor its health.
  • Though it might be possible, Cloudera currently has no live backup strategy for the embedded Derby database.
  • Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not happen with server-based databases.
See Database Requirements for tested database versions.
  Note:

Cloudera currently has no recommended way to migrate data from an existing Derby database into the new PostgreSQL database.

Use the procedure that follows to configure Sqoop 2 to use PostgreSQL instead of Apache Derby.

  1. Install PostgreSQL
  2. Create the Sqoop 2 User and Sqoop 2 Database
  3. Configure Sqoop 2 to use PostgreSQL

Install PostgreSQL

See the PostgreSQL documentation to install it.

Create the Sqoop 2 User and Sqoop 2 Database

$ psql -U postgres
Password for user postgres: *****

postgres=# CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop'
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE

postgres=# CREATE DATABASE "sqoop" WITH OWNER = sqoop
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF8'
 LC_CTYPE = 'en_US.UTF8'
 CONNECTION LIMIT = -1;
CREATE DATABASE

postgres=# \q

Configure Sqoop 2 to use PostgreSQL

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the Sqoop 2 service.
  2. Click the Configuration tab.
  3. Select Scope > Sqoop 2 Server.
  4. Select Category > Database.
  5. Set the following properties:
    • Sqoop Repository Database Type - postgresql
    • Sqoop Repository Database Host - the hostname on which you installed the PostgreSQL server. If the port is non-default for your database type, use host:port notation.
    • Sqoop Repository Database Name, User, Password - the properties you specified in Create the Sqoop 2 User and Sqoop 2 Database.
  6. Click Save Changes to commit the changes.
  7. Restart the service.
Page generated March 7, 2018.