diff --git a/docs/Web Scraper Cloud.md b/docs/Web Scraper Cloud.md
index 0a412df..7596d70 100644
--- a/docs/Web Scraper Cloud.md
+++ b/docs/Web Scraper Cloud.md
@@ -110,8 +110,8 @@ element click selector. If the timeout is reached, no data will be scraped from
 [scheduler]: Web%20Scraper%20Cloud/Scheduler.md
 [api]: Web%20Scraper%20Cloud/API.md
 [parser]: Web%20Scraper%20Cloud/Parser.md
-[data-export]: Web%20Scraper%20Cloud/Data%20Export.md
-[image-export]: Web%20Scraper%20Cloud/Image%20Export.md
+[data-export]: Web%20Scraper%20Cloud/Data%20export.md
+[image-export]: Web%20Scraper%20Cloud/Image%20export.md
 [scraping-job-performance-graph]: ./images/cloud/scraping-job-performance-graph.png?raw=true
 [parallel-tasks]: images/cloud/parallel-tasks.png
 [Subscription manager]: https://cloud.webscraper.io/subscription-manager
diff --git a/docs/Web Scraper Cloud/Data Export.md b/docs/Web Scraper Cloud/Data export.md
similarity index 95%
rename from docs/Web Scraper Cloud/Data Export.md
rename to docs/Web Scraper Cloud/Data export.md
index d91658f..20da091 100644
--- a/docs/Web Scraper Cloud/Data Export.md
+++ b/docs/Web Scraper Cloud/Data export.md
@@ -21,7 +21,7 @@ be also downloaded while the scraper is running.
 ### Automated data export

 Set up automated data export to `Dropbox`, `Google Sheets` or `S3`
-via the `Data Export` section. Currently exported data will be in CSV format. Data
+via the `Data export` section. Currently exported data will be in CSV format. Data
 will be exported to `Apps/Web Scraper` in your `Dropbox` , `Google Drive/Web Scraper` in
 `Google Sheets` and `bucket/web-scraper` in `S3`.

@@ -69,8 +69,8 @@ using a CSV reader library when reading CSV files programmatically.

 ## Opening CSV file with a spreadsheet program

-We recommend using [Libre Office Calc] [libre-office-calc] when opening CSV
-files. Microsoft office often is incorrectly interpreting CSV files formatted in
+We recommend using [LibreOffice Calc] [libre-office-calc] when opening CSV
+files. Microsoft Office often is incorrectly interpreting CSV files formatted in
 RFC 4180 standard. Mostly this is related to text including newline characters.

 In case when a CSV file is incorrectly opened by Microsoft Excel try using data
diff --git a/docs/Web Scraper Cloud/Image Export.md b/docs/Web Scraper Cloud/Image export.md
similarity index 82%
rename from docs/Web Scraper Cloud/Image Export.md
rename to docs/Web Scraper Cloud/Image export.md
index 63324ae..31630ab 100644
--- a/docs/Web Scraper Cloud/Image Export.md
+++ b/docs/Web Scraper Cloud/Image export.md
@@ -1,24 +1,24 @@
-# Image Export
+# Image export

 Web Scraper Cloud supports automated image export to `Amazon S3, Google Cloud Storage, and Azure Blob Storage`. This feature is available exclusively for `Scale` plan users.
 Image downloading is performed during the execution of the scraping job. As pages are processed, associated images are downloaded in parallel with data extraction.

-## Image Export Configuration
+## Image export configuration

-The Image Export tab will be visible when the sitemap contains at least one `Image` selector.
+The image export tab will be visible when the sitemap contains at least one `Image` selector.

-![Fig. 1: Image Export Tab in Web Scraper Cloud][image-export-tab-web-scraper-cloud]
+![Fig. 1: Image export tab in Web Scraper Cloud][image-export-tab-web-scraper-cloud]

-## Exported Image Location
+## Exported image location

 Images are exported to the same path as the data export, within an `images` subfolder. For example, if data is exported to `bucket/web-scraper/my-sitemap` in S3, images will be exported to `bucket/web-scraper/my-sitemap/images`.

-## Image Columns
+## Image columns

 Each `Image` selector creates a separate column in the exported data. The column name follows the format `{image_selector_id}_stored_filename`. File names are generated using the SHA-256 hash of the image URL.

-![Fig. 2: Image Export Column Name][image-export-column-name]
+![Fig. 2: Image export column name][image-export-column-name]

-## Image Column Structure Based on Selector Configuration
+## Image column structure based on selector configuration

 Example image selector ID: `product_image`