Cloudera Enterprise 6.0 Beta

Spark 2 New Features and Changes

The following sections describe what's new and changed in Spark for CDH 6. Because CDH 6 includes Spark 2.x, this list covers features introduced across several releases of the Cloudera Distribution of Apache Spark 2 parcel, which was available separately for CDH 5.

What's New in Apache Spark for CDH 6 Beta

  • Support for using Spark 2 jobs to read and write data on the Azure Data Lake Store (ADLS) cloud service.

  • Cloudera Distribution of Apache Spark 2.2 requires JDK 8.

  • New direct connector to Kafka that uses the new Kafka consumer API. See Spark 2 Kafka Integration for details.

  • New SparkSession object replaces HiveContext and SQLContext.
    • Most of the Hive logic has been reimplemented in Spark.
    • Some Hive dependencies still exist:
      • SerDe support.
      • UDF support.
  • Added support for the unified Dataset API.
  • Faster Spark SQL through whole-stage code generation.
  • More complete SQL syntax, including support for subqueries.
  • Added the spark-csv library.
  • Backport of SPARK-5847. The root for metrics is now the application name (spark.app.name) instead of the application ID. The application ID has to be looked up to find the matching application name, and it changes each time a streaming job is stopped and restarted.
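
To illustrate why the change to the metrics root matters, here is a minimal Python sketch that builds metric keys as a metrics sink might see them. It assumes keys of the form <root>.<instance>.<metric>. The helper metric_key, the application name, and the two application IDs are hypothetical values chosen for illustration, and the sketch does not call any Spark API.

```python
def metric_key(root, instance, metric):
    """Join the parts of a metric key with dots (assumed layout, for illustration only)."""
    return ".".join([root, instance, metric])


# Hypothetical values: one streaming job that was stopped and restarted once,
# so it ran under two different application IDs but one application name.
app_name = "clickstream"
app_ids = ["application_1520400000000_0001", "application_1520400000000_0002"]

# Rooted at the application ID: each restart yields a different key for the same metric.
keys_by_id = {metric_key(app_id, "driver", "jvm.heap.used") for app_id in app_ids}

# Rooted at the application name: the key is identical across restarts.
keys_by_name = {metric_key(app_name, "driver", "jvm.heap.used") for _ in app_ids}

print(sorted(keys_by_id))
print(sorted(keys_by_name))
```

With the application name as the root, a dashboard or alert that tracks a given metric key can keep following the same series after a streaming job restarts, instead of being pointed at a new key each time.
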
Page generated March 7, 2018.