Cloudera Enterprise 6.0 Beta

Troubleshooting for Spark 2

Troubleshooting for Spark mainly involves checking configuration settings and application code to diagnose performance and scalability issues.

Error instantiating Hive metastore class

A Hive compatibility issue in Cloudera Distribution of Apache Spark 2.0 Release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 Release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.

When a job fails because of this Hive compatibility issue, the stack trace begins like this:

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1545)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.

The solution is to upgrade to Cloudera Distribution of Apache Spark 2.0 Release 2 or higher. Upgrading to CDH 6 also solves the problem, because CDH 6 includes a later version of Spark 2.

Wrong version of Python

Spark 2 requires Python 2.7 or higher. Some Linux distributions come with Python 2.6 by default, so you might need to install a newer version of Python on all hosts in the cluster. If the correct version of Python is not picked up by default, set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables to point to the correct Python executable before running the pyspark command.
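
The check described above can be sketched in Python. This is a minimal illustration, not a Cloudera-provided tool: the helper names python_version and pyspark_env are hypothetical, and the sketch assumes that the chosen interpreter exists at the same path on every host in the cluster.

```python
import os
import subprocess
import sys


def python_version(executable):
    """Return (major, minor) as reported by the given Python executable."""
    out = subprocess.check_output(
        [executable, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    major, minor = out.decode().strip().split(".")
    return int(major), int(minor)


def pyspark_env(executable, minimum=(2, 7), base_env=None):
    """Build an environment that points PySpark at `executable`.

    Raises ValueError if the interpreter is older than `minimum`.
    """
    version = python_version(executable)
    if version < minimum:
        raise ValueError(
            "%s is Python %d.%d; need %d.%d or higher"
            % ((executable,) + version + minimum)
        )
    # Copy the starting environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = executable
    env["PYSPARK_DRIVER_PYTHON"] = executable
    return env


if __name__ == "__main__":
    # Illustrative choice: reuse the interpreter that runs this script.
    env = pyspark_env(sys.executable)
    print(env["PYSPARK_PYTHON"])
    # To start the shell with this environment (assumes pyspark is on PATH):
    # subprocess.call(["pyspark"], env=env)
```

Verifying the version before launching turns a confusing runtime failure into an immediate, readable error. The same two variables can equally be exported in the shell before running pyspark.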

API changes that are not backward-compatible

Between Spark 1.6 (part of CDH 5) and Spark 2.0 (part of CDH 6), some APIs have changed in ways that are not backward compatible. Recompile all CDH 5 Spark applications under CDH 6 to take advantage of Spark 2 capabilities. If you encounter compilation errors, check whether the corresponding function has changed in Spark 2, and if so, change your code to use the latest function name, parameters, and return type.
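
As a rough aid for this review, the following Python sketch scans application source text for a few Spark 1.6 names that were deprecated or superseded in Spark 2.0. The mapping is illustrative and far from complete, and the function name find_legacy_calls is hypothetical. It is a plain text search, not a substitute for recompiling and testing.

```python
import re

# Illustrative, incomplete mapping of Spark 1.6 names to Spark 2 replacements.
SPARK2_REPLACEMENTS = {
    "registerTempTable": "createOrReplaceTempView",
    "unionAll": "union",
    "HiveContext": "SparkSession.builder.enableHiveSupport()",
}

# Match any of the legacy names as a whole word.
_PATTERN = re.compile(
    r"\b(%s)\b" % "|".join(re.escape(name) for name in SPARK2_REPLACEMENTS)
)


def find_legacy_calls(source):
    """Return (line_number, old_name, suggestion) for each legacy name found."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((number, name, SPARK2_REPLACEMENTS[name]))
    return hits


if __name__ == "__main__":
    sample = 'df.registerTempTable("t")\nboth = df.unionAll(other)\n'
    for number, name, suggestion in find_legacy_calls(sample):
        print("line %d: %s -> %s" % (number, name, suggestion))
```

Because the search is textual, it also reports matches inside comments and strings. Treat each hit as a prompt to check the Spark 2 API documentation.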

A Spark component does not work or is unstable

Certain components from the Spark ecosystem are explicitly not supported with the Spark 2 that is included in CDH 6. Check the compatibility matrix for Spark to confirm that all the components you are using are intended to work with Spark in CDH 6.

Page generated March 7, 2018.