Installing the Spark Indexer

The Spark indexer uses a Spark or MapReduce ETL batch job to move data from HDFS files into Apache Solr. As part of this process, the indexer uses Morphlines to extract and transform data.

To use the Spark indexer, the solr-crunch package must be installed on each host from which you submit batch indexing jobs.

By default, this tool is included with Cloudera Search when you install CDH using parcels in a Cloudera Manager deployment. If you are using a package-based installation and the tool is not present on your system, install it with the appropriate command for your operating system:

  • RHEL-compatible:
    sudo yum install solr-crunch
  • Ubuntu/Debian:
    sudo apt-get install solr-crunch
  • SLES:
    sudo zypper install solr-crunch
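
To confirm that the package is present after installation, you can list the files it installed with your package manager. This is a quick verification sketch; the exact file layout varies by release:

    # RHEL-compatible and SLES systems: list files installed by the solr-crunch package
    rpm -ql solr-crunch
    # Ubuntu/Debian systems
    dpkg -L solr-crunch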

For information on using Spark to batch index documents, see Spark Indexing.
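
As a rough illustration, a batch indexing job is typically launched with spark-submit, pointing the tool at a morphline configuration file that performs the extraction and transformation described above. The driver class name, jar path, flags, and file names below are assumptions based on common CrunchIndexerTool usage, not the authoritative syntax; see Spark Indexing for the exact options supported by your release:

    # Illustrative sketch only: the class name, jar path, flags, and paths are assumptions.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class org.apache.solr.crunch.CrunchIndexerTool \
      --files morphline.conf \
      /opt/cloudera/parcels/CDH/lib/solr/contrib/crunch/search-crunch.jar \
      --morphline-file morphline.conf \
      --pipeline-type spark \
      hdfs:///user/$USER/input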
