Indexing Data
Sphinx uses SpatialHadoop to index data, so a stable version of SpatialHadoop must be used to produce the spatially indexed data for Sphinx.
Sphinx expects spatial data to be in Well-Known Text (WKT) format. SpatialHadoop supports indexing WKT spatial data, which makes it straightforward for Sphinx to index data using SpatialHadoop.
SpatialHadoop writes the indexed data to a separate folder: each file is one partition, and a separate "_master" file holds the metadata of the index.
For partitioned data, Cloudera's Impala creates one directory per partition, named "[column_name]=[column_value]". To add the data correctly for Sphinx, the data is placed in the table directory, and each partition file goes into its own directory that follows Impala's naming convention.
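The rearrangement described above can be planned before touching HDFS. The sketch below is a hypothetical helper, not part of Sphinx or SpatialHadoop: it takes a listing of the SpatialHadoop output folder and computes where each partition file would go under the Impala table directory. It assumes a single string partition column (the name is up to you) and uses each partition file's own name as the partition value; both choices are illustrative assumptions. Files whose names start with "_" or "." (which Hadoop treats as hidden, including the "_master" metadata file) are left out of the plan.

```python
import posixpath
from typing import Dict, Iterable


def plan_impala_layout(index_files: Iterable[str], table_dir: str,
                       column_name: str) -> Dict[str, str]:
    """Map each SpatialHadoop partition file to a destination path inside an
    Impala-style "column_name=column_value" partition directory.

    Hypothetical helper: the partition value is assumed to be the partition
    file's own name, and column_name is assumed to be a string column.
    """
    plan: Dict[str, str] = {}
    for src in index_files:
        name = posixpath.basename(src)
        # Hadoop treats names starting with "_" or "." as hidden; this skips
        # the "_master" index metadata file (and markers such as "_SUCCESS").
        if name.startswith(("_", ".")):
            continue
        # One directory per partition, following Impala's naming convention.
        partition_dir = posixpath.join(table_dir, f"{column_name}={name}")
        plan[src] = posixpath.join(partition_dir, name)
    return plan


if __name__ == "__main__":
    files = ["/data/idx/part-00000", "/data/idx/part-00001", "/data/idx/_master.grid"]
    for src, dst in plan_impala_layout(files, "/user/hive/warehouse/points", "pid").items():
        print(f"{src} -> {dst}")
```

Each source/destination pair could then be applied with `hdfs dfs -mkdir -p` on the partition directory followed by `hdfs dfs -mv`. Keeping the planning step separate from the moves makes it easy to review the resulting layout before changing anything in HDFS.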
The index file can be placed anywhere within HDFS, because Sphinx's query for creating a spatial table takes the HDFS path to the index file as a table property; see Sphinx's spatial queries for more info.