From 43522d7edffbe9b915fd12707f3c02e85a4d7e99 Mon Sep 17 00:00:00 2001 From: Lizy Date: Fri, 29 Mar 2019 18:56:14 +0530 Subject: [PATCH 1/3] Changes to the How-tos: * How to load data from External Data Stores - Added a stem sentence to explain that the data load from CSV file using SQL is done from the local file system. * How to Load Data into SnappyData Tables - Added a Troubleshooting tip as suggested by Trilok and Amogh. Organized content and edited based on suggestions from Grammarly. --- .../load_data_from_external_data_stores.md | 8 ++-- .../howto/load_data_into_snappydata_tables.md | 37 ++++++++++--------- 2 files changed, 25 insertions(+), 20 deletions(-) diff --git a/docs/howto/load_data_from_external_data_stores.md b/docs/howto/load_data_from_external_data_stores.md index 36c15b501b..13287bc179 100644 --- a/docs/howto/load_data_from_external_data_stores.md +++ b/docs/howto/load_data_from_external_data_stores.md @@ -3,7 +3,9 @@ SnappyData comes bundled with the libraries to access HDFS (Apache compatible). You can load your data using SQL or DataFrame API. -## Example - Loading data from CSV file using SQL +## Example - Loading Data from CSV File using SQL + +The following example demonstrates how you can load data from a CSV file in the local file system using SQL: ```pre // Create an external table based on CSV file @@ -14,7 +16,7 @@ CREATE TABLE CUSTOMER using column options() as (select * from CUSTOMER_STAGING_ ``` !!! Tip - Similarly, you can create an external table for all data sources and use SQL "insert into" query to load data. For more information on creating external tables refer to, [CREATE EXTERNAL TABLE](../reference/sql_reference/create-external-table/) + Similarly, you can create an external table for all data sources and use SQL "insert into" query to load data. For more information on creating external tables, refer to [CREATE EXTERNAL TABLE](../reference/sql_reference/create-external-table/). 
## Example - Loading CSV Files from HDFS using API @@ -73,7 +75,7 @@ val df = session.createDataFrame(rdd, ds.schema) df.write.format("column").saveAsTable("columnTable") ``` -## Importing Data using JDBC from a relational DB +## Importing Data using JDBC from Rrelational DB !!! Note Before you begin, you must install the corresponding JDBC driver. To do so, copy the JDBC driver jar file in **/jars** directory located in the home directory and then restart the cluster. diff --git a/docs/howto/load_data_into_snappydata_tables.md b/docs/howto/load_data_into_snappydata_tables.md index cbd0b7864c..23c59ece31 100644 --- a/docs/howto/load_data_into_snappydata_tables.md +++ b/docs/howto/load_data_into_snappydata_tables.md @@ -3,16 +3,13 @@ SnappyData relies on the Spark SQL Data Sources API to parallelly load data from a wide variety of sources. By integrating the loading mechanism with the Query engine (Catalyst optimizer) it is often possible to push down filters and projections all the way to the data source minimizing data transfer. Here is the list of important features: -**Support for many Sources**
There is built-in support for many data sources as well as data formats. Data can be accessed from S3, file system, HDFS, Hive, RDB, etc. And the loaders have built-in support to handle CSV, Parquet, ORC, Avro, JSON, Java/Scala Objects, etc as the data formats. +* **Support for many Sources**
There is built-in support for many data sources as well as data formats. Data can be accessed from S3, file system, HDFS, Hive, RDB, etc. Moreover, loaders have built-in support to handle CSV, Parquet, ORC, Avro, JSON, Java/Scala Objects, etc. as the data formats. +* **Access virtually any modern data store**
Virtually all major data providers have a native Spark connector that complies with the Data Sources API. For example, you can load data from an RDB like Amazon Redshift, or from stores such as Cassandra, Redis, Elastic Search, and Neo4J. While these connectors are not built-in, you can easily deploy them as dependencies into a SnappyData cluster. These connectors are typically registered on spark-packages.org. +* **Avoid Schema wrangling**
Spark supports schema inference, which means all you need to do is point to the external source in your 'create table' DDL (or Spark SQL API), and the schema definition is learned by reading in the data. There is no need to define each column and type explicitly. This is extremely useful when dealing with disparate, complex, and wide data sets. +* **Read nested, sparse data sets**
When data is accessed from a source, the schema inference occurs by not just reading a header but often by reading the entire data set. For instance, when reading JSON files, the structure could change from document to document. The inference engine builds up the schema as it reads each record and keeps unioning them to create a unified schema. This approach allows developers to become very productive with disparate data sets. -**Access virtually any modern data store**
Virtually all major data providers have a native Spark connector that complies with the Data Sources API. For e.g. you can load data from any RDB like Amazon Redshift, Cassandra, Redis, Elastic Search, Neo4J, etc. While these connectors are not built-in, you can easily deploy these connectors as dependencies into a SnappyData cluster. All the connectors are typically registered in spark-packages.org - -**Avoid Schema wrangling**
Spark supports schema inference. Which means, all you need to do is point to the external source in your 'create table' DDL (or Spark SQL API) and schema definition is learned by reading in the data. There is no need to explicitly define each column and type. This is extremely useful when dealing with disparate, complex and wide data sets. - -**Read nested, sparse data sets**
When data is accessed from a source, the schema inference occurs by not just reading a header but often by reading the entire data set. For instance, when reading JSON files the structure could change from document to document. The inference engine builds up the schema as it reads each record and keeps unioning them to create a unified schema. This approach allows developers to become very productive with disparate data sets. - -**Load using Spark API or SQL**
You can use SQL to point to any data source or use the native Spark Scala/Java API to load. -For instance, you can first [create an external table](../reference/sql_reference/create-external-table.md). +## Loading Data using Spark API or SQL +You can use SQL to point to any data source or use the native Spark Scala/Java API to load. For instance, you can first [create an external table](../reference/sql_reference/create-external-table.md). ```pre CREATE EXTERNAL TABLE USING OPTIONS @@ -20,15 +17,17 @@ CREATE EXTERNAL TABLE USING OPTIONS +For example, `snc.sparkContext.hadoopConfiguration.set("fs.s3a.connection.maximum", "1000")` \ No newline at end of file From 7cccce19a6476b8900f87b62c55fa0fe681738c7 Mon Sep 17 00:00:00 2001 From: Lizy Date: Fri, 29 Mar 2019 19:01:22 +0530 Subject: [PATCH 2/3] Minor edit as suggested by chandresh. --- docs/programming_guide/tables_in_snappydata.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/programming_guide/tables_in_snappydata.md b/docs/programming_guide/tables_in_snappydata.md index c80b890d9b..d4e4ecdd45 100644 --- a/docs/programming_guide/tables_in_snappydata.md +++ b/docs/programming_guide/tables_in_snappydata.md @@ -31,7 +31,7 @@ CREATE TABLE [IF NOT EXISTS] table_name ) [AS select_statement]; -DROP TABLE [IF EXISTS] table_name +DROP TABLE [IF EXISTS] table_name; ``` Refer to the [Best Practices](../best_practices/design_schema.md) section for more information on partitioning and colocating data and [CREATE TABLE](../reference/sql_reference/create-table.md) for information on creating a row/column table.
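[Editor's note] As a sketch of how the `fs.s3a.connection.maximum` tweak mentioned in the patch above fits into an actual S3 load (the bucket path and table names below are hypothetical, not taken from the docs), the setting would be applied on the session's Hadoop configuration before creating an external table over S3:

```pre
// Raise the S3A connection pool limit before reading from S3
snc.sparkContext.hadoopConfiguration.set("fs.s3a.connection.maximum", "1000")

// Stage the S3 data as an external table (bucket and file names are hypothetical)
snc.sql("CREATE EXTERNAL TABLE ORDERS_STAGING USING csv OPTIONS(path 's3a://my-bucket/orders.csv')")

// Load it into a SnappyData column table
snc.sql("CREATE TABLE ORDERS USING column OPTIONS() AS (SELECT * FROM ORDERS_STAGING)")
```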
From 26c2499f12f74b141c79f8c44798f0097408d067 Mon Sep 17 00:00:00 2001 From: Lizy Date: Thu, 4 Apr 2019 12:48:10 +0530 Subject: [PATCH 3/3] minor edits --- docs/howto/load_data_from_external_data_stores.md | 2 +- docs/reference/command_line_utilities/modify_disk_store.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/howto/load_data_from_external_data_stores.md b/docs/howto/load_data_from_external_data_stores.md index 13287bc179..6fa9616ab9 100644 --- a/docs/howto/load_data_from_external_data_stores.md +++ b/docs/howto/load_data_from_external_data_stores.md @@ -75,7 +75,7 @@ val df = session.createDataFrame(rdd, ds.schema) df.write.format("column").saveAsTable("columnTable") ``` -## Importing Data using JDBC from Rrelational DB +## Importing Data using JDBC from Relational DB !!! Note Before you begin, you must install the corresponding JDBC driver. To do so, copy the JDBC driver jar file in **/jars** directory located in the home directory and then restart the cluster. diff --git a/docs/reference/command_line_utilities/modify_disk_store.md b/docs/reference/command_line_utilities/modify_disk_store.md index 93aef166dc..fad38926d7 100644 --- a/docs/reference/command_line_utilities/modify_disk_store.md +++ b/docs/reference/command_line_utilities/modify_disk_store.md @@ -16,6 +16,8 @@ Snappy>create region --name=regionName --type=PARTITION_PERSISTENT_OVERFLOW **For non-secured cluster** +## Description + The following table describes the options used for `snappy modify-disk-store`: | Items | Description | @@ -27,8 +29,6 @@ The following table describes the options used for `snappy modify-disk-store`: !!! Note The name of the disk store, the directories its files are stored in, and the region to target are all required arguments. -## Description - ## Examples **Secured cluster**
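[Editor's note] The "Importing Data using JDBC from Relational DB" section renamed by the patch above shows only the driver-installation note here; a minimal sketch of the load itself, assuming a generic PostgreSQL source (the URL, driver class, and table names are hypothetical), could look like:

```pre
-- Register an external table over the JDBC source
CREATE EXTERNAL TABLE CUSTOMER_JDBC USING jdbc OPTIONS(
  url 'jdbc:postgresql://dbhost:5432/sales',
  driver 'org.postgresql.Driver',
  dbtable 'customer');

-- Copy the rows into a SnappyData column table
CREATE TABLE CUSTOMER USING column OPTIONS() AS (SELECT * FROM CUSTOMER_JDBC);
```

This mirrors the CSV example earlier in the series: stage the source as an external table, then load it into a managed table with `insert into` or `create table as select`.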