Indexing Data

Sphinx uses SpatialHadoop to index data, so a stable version of SpatialHadoop is required to produce the spatially indexed data that Sphinx reads.

Spatial Data format

Sphinx expects spatial data in Well-Known Text (WKT) format. SpatialHadoop supports indexing WKT spatial data, which makes it straightforward to index data for Sphinx using SpatialHadoop.
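As a rough illustration of what WKT records look like, the sketch below formats a point and a polygon as WKT strings. The helper names `point_wkt` and `polygon_wkt` are hypothetical and are not part of Sphinx or SpatialHadoop; how the WKT value is combined with other columns in an input line depends on your own data and is not shown here.

```python
def point_wkt(x, y):
    """Format a single coordinate pair as a WKT point."""
    return f"POINT ({x} {y})"


def polygon_wkt(ring):
    """Format a list of (x, y) pairs as a WKT polygon with one outer ring.

    WKT requires the ring to be closed, so the first vertex is repeated
    at the end if the caller did not already close it.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


if __name__ == "__main__":
    print(point_wkt(30, 10))
    print(polygon_wkt([(0, 0), (4, 0), (4, 4), (0, 4)]))
```
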

Data directory hierarchy:

SpatialHadoop writes the indexed data to a separate folder. Each file in that folder is one partition, and a separate "_master" file holds the index metadata.

For partitioned data, Cloudera's Impala creates one directory per partition, named "[column_name]=[column_value]". To load the data correctly for Sphinx, the data is placed in the table directory, with each partition file in its own directory that follows Impala's naming convention.
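The mapping from SpatialHadoop partition files to Impala-style partition directories can be sketched as follows. This is only an illustration under assumptions: the partition column name `partition_id`, the use of a running number as the column value, the table directory, and the file names are all hypothetical, and the script only computes target paths without touching HDFS.

```python
import posixpath


def impala_partition_path(table_dir, column, value, file_name):
    """Build the target path for one partition file.

    The partition directory follows Impala's
    "[column_name]=[column_value]" naming convention.
    """
    return posixpath.join(table_dir, f"{column}={value}", file_name)


def plan_layout(table_dir, column, partition_files):
    """Map each partition file name to its own partition directory.

    The column value is simply the file's position in the list, which is
    an assumption made for this sketch.
    """
    return {
        name: impala_partition_path(table_dir, column, i, name)
        for i, name in enumerate(partition_files)
    }


if __name__ == "__main__":
    # Hypothetical table directory and partition file names.
    layout = plan_layout(
        "/user/hive/warehouse/points",
        "partition_id",
        ["part-00000", "part-00001"],
    )
    for source, target in layout.items():
        print(source, "->", target)
```

The "_master" file is left out of the list on purpose, since it holds index metadata rather than partition data.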

Index file:

The index file can be placed anywhere in HDFS, because Sphinx's query for creating a spatial table takes the HDFS path to the index file as a table property. See Sphinx's spatial queries for more information.
