This free tool lets you extract Amazon product data directly from search results pages. In just a few simple steps, you can easily grab product titles, prices, ratings, reviews, and more.
- Python 3.11 or later.
- The required dependencies installed (see the steps below).
- Open a terminal and navigate to the project directory.
- Install the dependencies by running:

```shell
pip install -r requirements.txt
```
To start scraping Amazon data, just provide a search query. You can also specify the Amazon domain and the number of pages you want to scrape.
```shell
python main.py "<your_search_query>" --domain="<amazon_domain>" --pages=<number_of_pages>
```

- `<your_search_query>`: the search keyword (e.g., "coffee maker").
- `<amazon_domain>`: the Amazon domain you want to scrape (default: `com`, for Amazon US).
- `<number_of_pages>`: the number of pages to scrape (optional; if omitted, all available pages are scraped).
For example, to scrape "coffee maker" data from the Amazon US domain, limited to the first 3 pages of results, run:

```shell
python main.py "coffee maker" --domain="com" --pages=3
```

After scraping, the extracted data is saved as amazon_data.csv in the project directory. The CSV file includes the following details:
- Name: the product title.
- Current Price: the product price (empty if out of stock).
- Rating: the average customer rating.
- Reviews: the total number of customer reviews.
- ASIN: the Amazon Standard Identification Number.
- Link: a direct URL to the product page on Amazon.
Here's an example of how the data looks:
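Once a run has produced amazon_data.csv, the file can be loaded with Python's standard csv module. The sketch below parses an in-memory sample with the columns described above; the row values are made up for illustration, and a real run would open amazon_data.csv instead:

```python
import csv
import io

# Illustrative stand-in for amazon_data.csv, using the columns listed above.
# The row values are invented; replace this with open("amazon_data.csv").
sample_csv = io.StringIO(
    "Name,Current Price,Rating,Reviews,ASIN,Link\n"
    "Coffee Maker X,29.99,4.5,1200,B000000000,"
    "https://www.amazon.com/dp/B000000000\n"
)

rows = list(csv.DictReader(sample_csv))
for row in rows:
    print(row["Name"], row["Current Price"], row["Rating"])
```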
Scraping Amazon data isn't always easy. Here are some of the challenges you may encounter:
- Advanced anti-bot measures: Amazon blocks bots using CAPTCHAs, invisible bot-detection techniques, and behavioral analysis (such as tracking mouse movements).
- Frequent page structure updates: Amazon frequently changes its HTML structure, IDs, and class names, so you have to update your scraper regularly to match new page layouts.
- High resource consumption: scraping JavaScript-heavy pages with tools like Playwright or Selenium can consume significant system resources. Handling dynamic content and running multiple browser instances can degrade performance, especially when scraping large volumes of data.
Here's an example of what happens when Amazon detects an automated scraping attempt:
As shown above, Amazon blocked the request to prevent any further data scraping. This is a common problem that many scrapers run into.
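At a minimum, a scraper can detect when it has been blocked and back off instead of hammering the site. The helper below is a hypothetical sketch (not part of the tool above); the status codes and marker strings are common signatures of Amazon's robot-check page, but treat them as illustrative assumptions:

```python
def looks_blocked(status_code: int, html: str) -> bool:
    """Return True when a response looks like an Amazon bot-detection page."""
    # 403/503 responses typically accompany a block page.
    if status_code in (403, 503):
        return True
    # Text that commonly appears on Amazon's CAPTCHA / robot-check page.
    block_markers = (
        "Enter the characters you see below",
        "api-services-support@amazon.com",
    )
    return any(marker in html for marker in block_markers)


print(looks_blocked(503, ""))                                       # True
print(looks_blocked(200, "<p>Enter the characters you see below"))  # True
print(looks_blocked(200, "<h1>Coffee Maker X</h1>"))                # False
```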
The Bright Data Amazon Scraper API is the ultimate solution for scraping Amazon product data at scale. Here's why:
- No infrastructure management: no need to deal with proxies or unblocking mechanisms.
- Geolocation scraping: scrape from any geographic region.
- Global IP coverage: access over 72 million real-user IPs across 195 countries, with 99.99% uptime.
- Flexible data delivery: receive data in formats such as JSON, NDJSON, CSV, or .gz via Amazon S3, Google Cloud, Azure, Snowflake, or SFTP.
- Privacy compliance: fully compliant with GDPR, CCPA, and other data protection laws.
- 24/7 support: a dedicated support team is available around the clock to help with any API-related questions or issues.
You also get 20 free API calls to test the product and make sure it fits your needs.
For a detailed guide on setting up the Amazon Scraper API, check out the Step-by-Step Setup Guide.
You can customize your data collection using the following API parameters:
| Parameter | Type | Description | Example |
|---|---|---|---|
| `limit` | integer | Limits the number of results returned per input. | `limit=10` |
| `include_errors` | boolean | Includes an error report in the output for troubleshooting. | `include_errors=true` |
| `notify` | url | A URL that gets notified once the collection is complete. | `notify=https://notify-me.com/` |
| `format` | enum | The data delivery format. Supported formats: JSON, NDJSON, JSONL, CSV. | `format=json` |
💡 Additional delivery methods: data can also be delivered via webhook or retrieved through the API.
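To see how these parameters come together, here is a minimal sketch that builds a trigger URL from some of the options above. The dataset ID is a placeholder and no request is actually sent:

```python
from urllib.parse import urlencode

BASE = "https://api.brightdata.com/datasets/v3/trigger"

# Optional parameters from the table above; "DATASET_ID" is a placeholder.
params = {
    "dataset_id": "DATASET_ID",
    "limit": 10,
    "include_errors": "true",
    "format": "json",
}
trigger_url = f"{BASE}?{urlencode(params)}"
print(trigger_url)
```

The resulting URL can then be used with a POST request, as the scripts later in this article do.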
Collect detailed Amazon product data by providing product URLs.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `url` | string | The Amazon product URL to scrape data from | Yes |

- Average response time per input: 13 seconds
Here's a sample of the output you'll receive after scraping Amazon product data:

```json
{
    "url": "https://www.amazon.com/KitchenAid-Protective-Dishwasher-Stainless-8-72-Inch/dp/B07PZF3QS3",
    "title": "KitchenAid All Purpose Kitchen Shears with Protective Sheath...",
    "seller_name": "Amazon.com",
    "brand": "KitchenAid",
    "description": "These all-purpose shears from KitchenAid are a valuable addition...",
    "initial_price": 11.99,
    "final_price": 8.99,
    "currency": "USD",
    "availability": "In Stock",
    "reviews_count": 77557,
    "rating": 4.8,
    "categories": [
        "Home & Kitchen",
        "Kitchen & Dining",
        "Kitchen Utensils & Gadgets",
        "Shears"
    ],
    "asin": "B07PZF3QS3",
    "images": [
        "https://m.media-amazon.com/images/I/41E7ALk+uXL._AC_SL1200_.jpg",
        "https://m.media-amazon.com/images/I/710B9HpzMPL._AC_SL1500_.jpg"
    ],
    "delivery": [
        "FREE delivery Friday, October 25 on orders shipped by Amazon over $35",
        "Or fastest Same-Day delivery Today 10 AM - 3 PM. Order within 4 hrs 46 mins"
    ]
}
```

Below is a Python script that triggers an Amazon product data collection and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(api_token, dataset_id, datasets):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = (
        f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}"
    )
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        time.sleep(10)
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None


def store_data(data, filename="amazon_products_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_l7q7dkf244hwjntr0"
    datasets = [
        {
            "url": "https://www.amazon.com/Quencher-FlowState-Stainless-Insulated-Smoothie/dp/B0CRMZHDG8"
        },
        {
            "url": "https://www.amazon.com/KitchenAid-Protective-Dishwasher-Stainless-8-72-Inch/dp/B07PZF3QS3"
        },
        {
            "url": "https://www.amazon.com/TruSkin-Naturals-Vitamin-Topical-Hyaluronic/dp/B01M4MCUAF"
        },
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
In addition to product URLs, you can collect Amazon reviews by specifying parameters such as a date range, a keyword, and the number of reviews to scrape.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `url` | string | The Amazon product URL to scrape reviews from. | Yes |
| `days_range` | number | The number of past days to consider when collecting reviews (no limit if left empty). | No |
| `keyword` | string | Filters reviews by a specific keyword. | No |
| `num_of_reviews` | number | The number of reviews to scrape (all available reviews if unspecified). | No |

- Average response time per input: 1 minute 1 second
Here's a sample of the output you'll receive when scraping Amazon reviews:

```json
{
    "url": "https://www.amazon.com/RORSOU-R10-Headphones-Microphone-Lightweight/dp/B094NC89P9/",
    "product_name": "RORSOU R10 On-Ear Headphones with Microphone...",
    "product_rating": 4.5,
    "product_rating_object": {
        "one_star": 386,
        "two_star": 237,
        "three_star": 584,
        "four_star": 1493,
        "five_star": 7630
    },
    "rating": 5,
    "author_name": "Amazon Customer",
    "review_header": "Great Sound For the Price!",
    "review_text": "I bought these headphones twice...",
    "badge": "Verified Purchase",
    "review_posted_date": "September 7, 2024",
    "helpful_count": 3
}
```

Below is a Python script that triggers an Amazon review data collection and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(api_token, dataset_id, datasets):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = (
        f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}"
    )
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        time.sleep(10)
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None


def store_data(data, filename="amazon_reviews_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_le8e811kzy4ggddlq"
    datasets = [
        {
            "url": "https://www.amazon.com/RORSOU-R10-Headphones-Microphone-Lightweight/dp/B094NC89P9/",
            "days_range": 0,
            "num_of_reviews": 4,
            "keyword": "great",
        },
        {
            "url": "https://www.amazon.com/Solar-Eclipse-Glasses-Certified-Viewing/dp/B08GB3QC1H",
            "days_range": 0,
            "num_of_reviews": 4,
            "keyword": "",
        },
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Find Amazon products by providing search keywords.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `keyword` | string | The keyword used to search for products | Yes |
| `url` | string | The domain URL to search on | Yes |
| `pages_to_search` | number | The number of pages to search | No |

- Average response time per input: 1 second
Here's a sample of the output you'll receive after running a keyword search on Amazon:

```json
{
    "asin": "B08H75RTZ8",
    "url": "https://www.amazon.com/Microsoft-Xbox-Gaming-Console-video-game/dp/B08H75RTZ8/ref=sr_1_1",
    "name": "Xbox Series X 1TB SSD Console - Includes Xbox Wireless Controller...",
    "sponsored": "false",
    "initial_price": 479,
    "final_price": 479,
    "currency": "USD",
    "sold": 2000,
    "rating": 4.8,
    "num_ratings": 28675,
    "variations": null,
    "badge": null,
    "brand": null,
    "delivery": ["FREE delivery"],
    "keyword": "X-box",
    "image": "https://m.media-amazon.com/images/I/616klipzdtL._AC_UY218_.jpg",
    "domain": "https://www.amazon.com/",
    "bought_past_month": 2000,
    "page_number": 1,
    "rank_on_page": 1,
    "timestamp": "2024-10-20T10:39:37.679Z",
    "input": {
        "keyword": "X-box",
        "url": "https://www.amazon.com",
        "pages_to_search": 1
    }
}
```

Below is a Python script that triggers a keyword-based Amazon product search and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(api_token, dataset_id, datasets):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = (
        f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}"
    )
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None
        time.sleep(10)


def store_data(data, filename="amazon_keywords_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_lwdb4vjm1ehb499uxs"
    datasets = [
        {"keyword": "X-box", "url": "https://www.amazon.com", "pages_to_search": 1},
        {"keyword": "PS5", "url": "https://www.amazon.de"},
        {
            "keyword": "car cleaning kit",
            "url": "https://www.amazon.es",
            "pages_to_search": 4,
        },
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Find detailed information about Amazon sellers by providing specific seller URLs.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `url` | string | The Amazon seller URL | Yes |

- Average response time per input: 1 second
Here's a sample of the output you'll receive after scraping seller information:

```json
{
    "input": {
        "url": "https://www.amazon.com/sp?seller=A33W53J5GVPZ8K"
    },
    "seller_id": "A33W53J5GVPZ8K",
    "seller_name": "Peckomatic",
    "description": "Peckomatic is committed to providing each customer with the highest standard of customer service.",
    "detailed_info": [
        {"title": "Business Name"},
        {"title": "Business Address"}
    ],
    "stars": "4.5 out of 5 stars",
    "feedbacks": [
        {
            "date": "By Kao y. on November 16, 2021.",
            "stars": "5 out of 5 stars",
            "text": "It say not to exceed 10lbs total but I did anyway. My bird was 8lbs + the 3lb box = 11lbs. Bird arrived in great condition."
        },
        {
            "date": "By JL on June 9, 2021.",
            "stars": "1 out of 5 stars",
            "text": "How this seller packages its items is not acceptable..."
        }
    ],
    "rating_positive": "89%",
    "feedbacks_percentages": {
        "star_5": "80%",
        "star_4": "9%",
        "star_3": "7%",
        "star_2": "0%",
        "star_1": "5%"
    },
    "products_link": "https://www.amazon.com/s?me=A33W53J5GVPZ8K",
    "buisness_name": "Francis Kunnumpurath",
    "buisness_address": "2612 State Route 80, Lafayette, NY, 13084, US",
    "rating_count_lifetime": 44,
    "country": "US"
}
```

Below is a Python script that triggers an Amazon seller data collection and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(api_token, dataset_id, datasets):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = (
        f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}"
    )
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None
        time.sleep(10)


def store_data(data, filename="amazon_seller_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_lhotzucw1etoe5iw1k"
    # Define the dataset with seller URLs
    datasets = [
        {"url": "https://www.amazon.com/sp?seller=A33W53J5GVPZ8K"},
        {"url": "https://www.amazon.com/sp?seller=A33YXLPENB0JBD"},
        {"url": "https://www.amazon.com/sp?seller=A33ZG27WW2U3E6"},
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Find Amazon's best-selling products by providing a Best Sellers category URL.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `category_url` | string | The Best Sellers category URL to scrape from | Yes |

- Average response time per input: 6 minutes 49 seconds
Here's a sample of the output you'll receive after scraping Amazon Best Sellers data:

```json
{
    "title": "Amazon Basics Multipurpose Copy Printer Paper, 8.5\" x 11\", 1 Ream, 500 Sheets, White",
    "seller_name": "Amazon.com",
    "brand": "Amazon Basics",
    "initial_price": 9.99,
    "final_price": 7.41,
    "currency": "USD",
    "availability": "In Stock",
    "reviews_count": 178695,
    "rating": 4.8,
    "categories": [
        "Office Products",
        "Paper",
        "Copy & Multipurpose Paper"
    ],
    "asin": "B01FV0F8H8",
    "buybox_seller": "Amazon.com",
    "discount": "-26%",
    "root_bs_rank": 1,
    "url": "https://www.amazon.com/AmazonBasics-Multipurpose-Copy-Printer-Paper/dp/B01FV0F8H8?th=1&psc=1",
    "image_url": "https://m.media-amazon.com/images/I/81x0cTHWQJL._AC_SL1500_.jpg",
    "delivery": [
        "FREE delivery Friday, October 25",
        "Same-Day delivery Today 10 AM - 3 PM"
    ],
    "features": [
        "1 ream (500 sheets) of 8.5 x 11 white copier and printer paper",
        "Works with laser/inkjet printers, copiers, and fax machines",
        "Smooth 20lb weight paper for consistent ink and toner distribution"
    ],
    "bought_past_month": 100000,
    "root_bs_category": "Office Products",
    "bs_category": "Copy & Multipurpose Paper",
    "bs_rank": 1,
    "amazon_choice": true,
    "badge": "Amazon's Choice",
    "seller_url": "https://www.amazon.com/sp?ie=UTF8&seller=ATVPDKIKX0DER&asin=B01FV0F8H8",
    "timestamp": "2024-10-20T13:30:56.666Z"
}
```

Below is a Python script that triggers an Amazon Best Sellers data collection and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(api_token, dataset_id, datasets):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}&type=discover_new&discover_by=best_sellers_url&limit_per_input=3"
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        time.sleep(10)
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None


def store_data(data, filename="amazon_bestsellers_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_l7q7dkf244hwjntr0"
    datasets = [
        {
            "category_url": "https://www.amazon.com/gp/bestsellers/office-products/ref=pd_zg_ts_office-products"
        },
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Discover and collect Amazon product data by providing specific category URLs. You can customize the search with sorting options and location-based filters.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `url` | string | The category URL to scrape products from | Yes |
| `sort_by` | string | The criterion used to sort product results | No |
| `zipcode` | string | A ZIP code for location-specific product results | No |

- Average response time per input: 16 minutes 16 seconds
Here's a sample of the data you'll receive after scraping products from a given category:

```json
{
    "title": "Quilted Makeup Bag Floral Makeup Bag Cotton Makeup Bag",
    "brand": "WYJ",
    "price": 9.99,
    "currency": "USD",
    "availability": "In Stock",
    "rating": 5,
    "reviews_count": 1,
    "categories": [
        "Beauty & Personal Care",
        "Cosmetic Bags"
    ],
    "asin": "B0DC3WX7RM",
    "seller_name": "yisenshangmaoyouxiangongsi",
    "number_of_sellers": 1,
    "url": "https://www.amazon.com/WYJ-Quilted-Coquette-Aesthetic-Blue/dp/B0DC3WX7RM",
    "image_url": "https://m.media-amazon.com/images/I/71SI04tB6QL._AC_SL1500_.jpg",
    "product_dimensions": "8.7\"L x 2.8\"W x 5.1\"H",
    "item_weight": "2.5 Ounces",
    "variations": [
        {
            "name": "Pink",
            "asin": "B0DC3RKYPF",
            "price": 9.99
        },
        {
            "name": "Blue",
            "asin": "B0DC3WX7RM",
            "price": 9.99
        },
        {
            "name": "Purple",
            "asin": "B0DC47CDDT",
            "price": 9.99
        }
    ],
    "badge": "#1 New Release",
    "top_review": "I love everything about this bag! It's made well and a good size. Super cute!"
}
```

Below is a Python script that triggers a product collection from the specified category URLs and saves the data to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(api_token, dataset_id, datasets):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}&type=discover_new&discover_by=category_url&limit_per_input=4"
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None
        time.sleep(10)


def store_data(data, filename="amazon_category_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_l7q7dkf244hwjntr0"
    datasets = [
        {
            "url": "https://www.amazon.com/s?i=luggage-intl-ship",
            "sort_by": "Best Sellers",
            "zipcode": "10001",
        },
        {
            "url": "https://www.amazon.com/s?i=baby-products-intl-ship",
            "sort_by": "Avg. Customer Review",
            "zipcode": "",
        },
        {
            "url": "https://www.amazon.com/s?rh=n%3A16225012011&fs=true&ref=lp_16225012011_sar",
            "sort_by": "Price: Low to High",
            "zipcode": "",
        },
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Discover products using specific keywords.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `keyword` | string | The keyword to search products for | Yes |

- Average response time per input: 2 minutes 46 seconds
Here's a sample of the output you'll receive after searching for products by keyword:

```json
{
    "title": "SYLVANIA ECO LED Light Bulb, A19 60W Equivalent, 750 Lumens, 2700K, Non-Dimmable, Frosted, Soft White - 8 Count (Pack of 1)",
    "brand": "LEDVANCE",
    "seller_name": "Amazon.com",
    "initial_price": 13.99,
    "final_price": 12.12,
    "currency": "USD",
    "discount": "-13%",
    "rating": 4.7,
    "reviews_count": 48418,
    "availability": "In Stock",
    "url": "https://www.amazon.com/Sylvania-40821-Equivalent-Efficient-Temperature/dp/B08FRSS4BF",
    "image_url": "https://m.media-amazon.com/images/I/81wKhRO66oL._AC_SL1500_.jpg",
    "delivery": [
        "FREE delivery Friday, October 25 on orders shipped by Amazon over $35",
        "Or Prime members get FREE delivery Tomorrow, October 21. Order within 8 hrs 8 mins. Join Prime"
    ],
    "features": [
        "60W Incandescent Replacement Bulb - 750 Lumens",
        "Long-lasting – 7 years lifespan",
        "Energy-saving – Estimated energy cost of $1.08 per year"
    ],
    "discovery_input": {
        "keyword": "light bulb"
    },
    "input": {
        "url": "https://www.amazon.com/Sylvania-40821-Equivalent-Efficient-Temperature/dp/B08FRSS4BF"
    }
}
```

Below is a Python script that triggers a keyword-based Amazon product collection and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(
    api_token, dataset_id, datasets, dataset_type="discover_new", discover_by="keyword"
):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}&type={dataset_type}&discover_by={discover_by}"
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None
        time.sleep(10)


def store_data(data, filename="amazon_keyword_data.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_l7q7dkf244hwjntr0"
    # Define the dataset with keywords
    datasets = [{"keyword": "light bulb"}, {"keyword": "dog toys"}]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Collect product data across major Amazon domains by providing URLs.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `url` | string | The Amazon product URL | Yes |

- Average response time per input: under 1 second
Here's a sample of the output you'll receive after collecting product data:

```json
{
    "title": "Toys of Wood Oxford Wooden Stacking Rings – Learning to Count – Counting Game with 45 Rings – Wooden Toy for Ages 3 and Above",
    "brand": "Toys of Wood Oxford",
    "seller_name": "Toys of Wood Oxford",
    "initial_price": 23.99,
    "currency": "EUR",
    "final_price": 23.99,
    "availability": "Only 20 left in stock.",
    "rating": 4.5,
    "reviews_count": 1677,
    "asin": "B078TNNZK3",
    "url": "https://www.amazon.de/dp/B078TNNZK3?th=1&psc=1",
    "image_url": "https://m.media-amazon.com/images/I/815t1-d+7BL._AC_SL1500_.jpg",
    "product_dimensions": "43.31 x 11.61 x 11.51 cm; 830 g",
    "categories": [
        "Toys",
        "Baby & Toddler Toys",
        "Early Development & Activity Toys",
        "Sorting, Stacking & Plugging Toys"
    ],
    "delivery": [
        "FREE delivery Friday, 25 October on eligible first order",
        "Or fastest delivery Thursday, 24 October. Order within 4 hrs 40 mins"
    ],
    "features": [
        "Sturdy and stable base plate with 9 pins and 45 beautiful large wooden rings and 10 removable square number plates in rainbow colours.",
        "Great for learning counting, sorting, and matching colors and numbers, as well as practicing simple mathematics.",
        "Made from sustainable wood with eco-friendly and non-toxic paints. Complies with EN71 / CPSA standards."
    ],
    "top_review": "Sehr lehrreich",
    "variations": [
        {
            "name": "Caterpillar Threading Toy",
            "price": 13.99,
            "currency": "EUR"
        },
        {
            "name": "Pack of 15",
            "price": 16.99,
            "currency": "EUR"
        },
        {
            "name": "Pack of 45",
            "price": 23.99,
            "currency": "EUR"
        }
    ],
    "product_rating_object": {
        "one_star": 35,
        "two_star": 0,
        "three_star": 82,
        "four_star": 227,
        "five_star": 1308
    }
}
```

Below is a Python script that triggers a product collection across major Amazon domains and saves the results to a JSON file:
```python
import json
import requests
import time


def trigger_datasets(
    api_token, dataset_id, datasets, dataset_type="trigger", discover_by="url"
):
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}&type={dataset_type}&discover_by={discover_by}"
    # Sending API request to trigger dataset collection
    response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
    if response.status_code == 200:
        print("Data collection triggered successfully!")
        snapshot_id = response.json().get("snapshot_id")
        return snapshot_id if snapshot_id else print("No snapshot ID returned.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def get_snapshot_data(api_token, snapshot_id):
    headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = (
        f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    )
    # Polling until the snapshot data is ready
    while True:
        response = requests.get(snapshot_url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 202:
            print("Snapshot still processing... retrying.")
        else:
            print(f"Error: {response.status_code} - {response.text}")
            return None
        time.sleep(10)


def store_data(data, filename="amazon_products_global_dataset.json"):
    if data:
        with open(filename, "w") as file:
            json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
    else:
        print("No data to store.")


if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
    DATASET_ID = "gd_lwhideng15g8jg63s7"
    # Define the dataset with URLs
    datasets = [
        {"url": "https://www.amazon.com/dp/B0CHHSFMRL/"},
        {
            "url": "https://www.amazon.de/-/en/dp/B078TNNZK3/ref=sspa_dk_browse_2/?_encoding=UTF8&ie=UTF8&sp_csd=d2lkZ2V0TmFtZT1zcF9icm93c2VfdGhlbWF0aWM%3D&pd_rd_w=fHlOu&content-id=amzn1.sym.642a11a6-0e1e-47fa-93c2-5dc9d607a7a1&pf_rd_p=642a11a6-0e1e-47fa-93c2-5dc9d607a7a1&pf_rd_r=4JX920KFM8Q7PR83HJ7V&pd_rd_wg=K1OVN&pd_rd_r=be656f87-1a09-4144-b7cf-4e932d6a73c4&ref_=sspa_dk_browse&th=1"
        },
        {
            "url": "https://www.amazon.co.jp/X-TRAK-Folding-Bicycle-Carbon-Adjustable/dp/B0CWV9YTLV/ref=sr_1_1_sspa?crid=3MKZ2ALHSLFOM&dib=eyJ2IjoiMSJ9.YnBVPwJ7nLxlNGHktwDTFM5v2evnsXlnZTJHJKuG8dLeeRCILpy0Knr3ofiKpUGQYi6xR6y4tgdtal85DJ8u6DD_n9r1oVCXdVo0NFmNAfStU6E-MhBig5p_gZGjluAYv5HgUIoEPl0v3iMiRxZNRfivqB-utxOkPOOfXIBHLemry17XcltUDTQqtJv-kP-ZqdP29mjD2cRlbkALtHPKU44MvBC9WUrNcUHAMrlAxtTAByuriywMqz-w2P0HCeehcZTJ1EiLf2VR8cxCiwuaUbIOU3tr1kDN6D7yYPrgRn4.6AOdSmJsksZkqLg8kNM6EvWxIFOijCsP2zo5NLHn1P4&dib_tag=se&keywords=Bicycles&qid=1716973495&sprefix=%2Caps%2C851&sr=8-1-spons&sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&psc=1"
        },
        {
            "url": "https://www.amazon.in/Watches-Women%EF%BC%8CLadies-Stainless-Waterproof-Luminous/dp/B0D31HBWG1/ref=sr_1_2_sspa?dib=eyJ2IjoiMSJ9.1zFa2vTCZdD-bv6Knt_pWqvcRZPSSTPDwgMClRJNsWqdyGdCmryjEAfWpd-ZhwhC3vvNx9A0G2Gt1R952e7huzlukge2bmJETNf-kHBoWS5kV6g0pUVapEyDOEAGcw5ZvWlkeuLQ9oIwuhckRC6ARCt2yglYV-1HpP7lVGXotK6K6tjrdKxUSAOZJSXeOGP3dGuYPTjo9sllOrwA7FC2GG00aDcsSTzURENFj1c2rS-vNHkYmxOL1JYuwDWK2PJdMpsmkJw3jeMdgaiw7jG5ppMfAjwiETVldQzhHGVUFV8.manfNZwtTUhvDuSGdh32APM1_SmnNiKgOGabyA7rXBo&dib_tag=se&qid=1716973272&rnid=2563505031&s=watch&sr=1-2-spons&sp_csd=d2lkZ2V0TmFtZT1zcF9hdGZfYnJvd3Nl&psc=1"
        },
    ]
    # Trigger dataset collection
    snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
    if snapshot_id:
        # Retrieve the data once the snapshot is ready
        data = get_snapshot_data(API_TOKEN, snapshot_id)
        if data:
            store_data(data)
```

You can download this sample JSON file to see the complete output.
Discover products by providing specific category URLs.

| Parameter | Type | Description | Required |
|---|---|---|---|
| `url` | string | The category URL to scrape products from | Yes |
| `sort_by` | string | The criterion used to sort results | No |
| `zipcode` | string | A ZIP code for location-specific results | No |

- Average response time per input: 3 minutes 57 seconds
商品データ収集後に受け取る出力例です:
{
"title": "De'Longhi Stilosa EC230.BK, Traditional Barista Pump Espresso Machine, Espresso and Cappuccino, 2 cups, Black",
"brand": "De'Longhi",
"seller_name": "Hughes Electrical",
"initial_price": 104.99,
"final_price": 94,
"currency": "GBP",
"availability": "Only 1 left in stock.",
"rating": 3.9,
"reviews_count": 395,
"asin": "B085J8LV4F",
"url": "https://www.amazon.co.uk/dp/B085J8LV4F?th=1&psc=1",
"image_url": "https://m.media-amazon.com/images/I/715gqhkOEiL._AC_SL1500_.jpg",
"categories": [
"Cooking & Dining",
"Coffee, Tea & Espresso",
"Coffee Machines",
"Espresso & Cappuccino Machines"
],
"delivery": [
"FREE delivery 25 - 28 October",
"Or fastest delivery Tomorrow, 22 October. Order within 3 hrs 59 mins"
],
"features": [
"Unleash your inner barista and create all your coffee shop favourites at home",
"15-bar pump espresso maker with a stainless steel boiler for perfect coffee extraction",
"Steam arm to create frothy cappuccinos and smooth lattes",
"Combination of matt and glossy black finish with an anti-drip system"
],
"input": {
"url": "https://www.amazon.co.uk/DeLonghi-EC230-BK-Traditional-Espresso-Cappuccino/dp/B085J8LV4F/ref=sr_1_4"
},
"discovery_input": {
"url": "https://www.amazon.co.uk/b/?_encoding=UTF8&node=10706951&ref_=Oct_d_odnav_d_13528598031_1",
"sort_by": "Best Sellers",
"zipcode": ""
}
}

Below is a Python script that triggers product collection by category URL and saves the results to a JSON file:
import json
import requests
import time
def trigger_datasets(api_token, dataset_id, datasets, dataset_type="discover_new", discover_by="category_url", limit_per_input=4):
headers = {
"Authorization": f"Bearer {api_token}",
"Content-Type": "application/json",
}
    trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}&type={dataset_type}&discover_by={discover_by}&limit_per_input={limit_per_input}"
# Sending API request to trigger dataset collection
response = requests.post(
trigger_url, headers=headers, data=json.dumps(datasets))
if response.status_code == 200:
print("Data collection triggered successfully!")
snapshot_id = response.json().get("snapshot_id")
return snapshot_id if snapshot_id else print("No snapshot ID returned.")
else:
print(f"Error: {response.status_code} - {response.text}")
return None
def get_snapshot_data(api_token, snapshot_id):
headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
# Polling until the snapshot data is ready
while True:
response = requests.get(snapshot_url, headers=headers)
if response.status_code == 200:
return response.json()
elif response.status_code == 202:
print("Snapshot still processing... retrying.")
else:
print(f"Error: {response.status_code} - {response.text}")
return None
time.sleep(10)
def store_data(data, filename="amazon_category_url_data.json"):
if data:
with open(filename, "w") as file:
json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
else:
print("No data to store.")
if __name__ == "__main__":
    API_TOKEN = "YOUR_API_TOKEN"
DATASET_ID = "gd_lwhideng15g8jg63s7"
# Define the dataset with category URLs, sort_by, and zipcodes
datasets = [
{"url": "https://www.amazon.com/s?i=luggage-intl-ship",
"sort_by": "Featured", "zipcode": "10001"},
{"url": "https://www.amazon.de/-/en/b/?node=1981001031&ref_=Oct_d_odnav_d_355007011_2&pd_rd_w=OjE3S&content-id=amzn1.sym.0069bc39-a323-47d6-a8fb-7558e4a563e4&pf_rd_p=0069bc39-a323-47d6-a8fb-7558e4a563e4&pf_rd_r=6YXZ7HGFNNEAF0GSDPDH&pd_rd_wg=0yR1G&pd_rd_r=a95cb46c-78ef-4b7b-845d-49fe04556440", "sort_by": "Price: Low to High", "zipcode": ""},
{"url": "https://www.amazon.co.uk/b/?_encoding=UTF8&node=10706951&bbn=11052681&ref_=Oct_d_odnav_d_13528598031_1&pd_rd_w=LghVp&content-id=amzn1.sym.7414f21e-2c95-4394-9a75-8c1b3641bcea&pf_rd_p=7414f21e-2c95-4394-9a75-8c1b3641bcea&pf_rd_r=EE0PQWMSY2J0G8M032EB&pd_rd_wg=7snrU&pd_rd_r=349e1e79-8bf8-4e00-947d-17eab2942b8d", "sort_by": "Best Sellers", "zipcode": ""},
{"url": "https://www.amazon.co.jp/-/en/b/?node=377403011&ref_=Oct_d_odnav_d_15314601_0&pd_rd_w=ajUV4&content-id=amzn1.sym.0d505cca-fde9-497c-b5f8-e827c26fad17&pf_rd_p=0d505cca-fde9-497c-b5f8-e827c26fad17&pf_rd_r=92HSETNKKN3RTA615BV7&pd_rd_wg=AwOOk&pd_rd_r=629211d8-6768-478c-94a2-829a0a0ca2a6", "sort_by": "", "zipcode": ""}
]
# Trigger dataset collection
snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
if snapshot_id:
# Retrieve the data once the snapshot is ready
data = get_snapshot_data(API_TOKEN, snapshot_id)
if data:
            store_data(data)

You can download this sample JSON file to see the full output.
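The scripts in this guide save each snapshot as JSON. If you would rather work with a flat CSV (like the `amazon_data.csv` produced by the free scraper earlier), the records can be exported with Python's standard `csv` module. Below is a minimal sketch, assuming the snapshot is a list of product records shaped like the sample output above; `save_as_csv` and its default field list are illustrative, not part of the API, and missing keys are written as blank cells:

```python
import csv

def save_as_csv(records, filename="amazon_products.csv",
                fields=("title", "brand", "final_price", "currency",
                        "rating", "asin", "url")):
    # Write one row per product record, keeping only the selected fields.
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields))
        writer.writeheader()
        for record in records:
            # Missing keys become empty cells instead of raising KeyError.
            writer.writerow({key: record.get(key, "") for key in fields})

# Example with a record shaped like the category-URL output above
sample = [{
    "title": "De'Longhi Stilosa EC230.BK",
    "brand": "De'Longhi",
    "final_price": 94,
    "currency": "GBP",
    "rating": 3.9,
    "asin": "B085J8LV4F",
    "url": "https://www.amazon.co.uk/dp/B085J8LV4F",
}]
save_as_csv(sample, "demo_products.csv")
```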
Discovers products using specific keywords across Amazon domains.
| Parameter | Type | Description | Required |
|---|---|---|---|
| `keywords` | `string` | The keywords to search products with | Yes |
| `domain` | `string` | The Amazon domain to search on | Yes |
| `pages_to_search` | `number` | Number of pages to search | No |
- Average response time per input: 56 seconds
Here is a sample of the output you receive after searching for products by keyword:
{
"title": "Mitutoyo 500-197-30 Electronic Digital Caliper AOS Absolute Scale Digital Caliper, 0 to 8\"/0 to 200mm Measuring Range, 0.0005\"/0.01mm Resolution",
"brand": "Mitutoyo",
"seller_name": "Everly Home & Gift",
"initial_price": 157.97,
"final_price": 137.77,
"currency": "USD",
"availability": "In Stock",
"rating": 4.8,
"reviews_count": 88,
"asin": "B01N6C3EGR",
"url": "https://www.amazon.com/dp/B01N6C3EGR?th=1&psc=1",
"image_url": "https://m.media-amazon.com/images/I/61Gigoh3LbL._SL1500_.jpg",
"categories": [
"Industrial & Scientific",
"Test, Measure & Inspect",
"Dimensional Measurement",
"Calipers",
"Digital Calipers"
],
"delivery": [
"FREE delivery Saturday, October 26",
"Or Prime members get FREE delivery Tomorrow, October 22"
],
"features": [
"Hardened stainless steel construction for protection of caliper components",
"Digital, single-value readout LCD display in metric units for readability",
"Measuring Range 0 to 8\"/0 to 200mm",
"Measurement Accuracy +/-0.001",
"Resolution 0.0005\"/0.01mm"
],
"input": {
"url": "https://www.amazon.com/Mitutoyo-500-197-30-Electronic-Measuring-Resolution/dp/B01N6C3EGR"
},
"discovery_input": {
"keywords": "Mitutoyo",
"domain": "https://www.amazon.com",
"pages_to_search": 1
}
}

Below is a Python script that triggers product collection by keyword search and saves the results to a JSON file:
import json
import requests
import time
def trigger_datasets(
api_token, dataset_id, datasets, dataset_type="discover_new", discover_by="keywords"
):
headers = {
"Authorization": f"Bearer {api_token}",
"Content-Type": "application/json",
}
    trigger_url = f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={dataset_id}&type={dataset_type}&discover_by={discover_by}"
# Sending API request to trigger dataset collection
response = requests.post(trigger_url, headers=headers, data=json.dumps(datasets))
if response.status_code == 200:
print("Data collection triggered successfully!")
snapshot_id = response.json().get("snapshot_id")
return snapshot_id if snapshot_id else print("No snapshot ID returned.")
else:
print(f"Error: {response.status_code} - {response.text}")
return None
def get_snapshot_data(api_token, snapshot_id):
headers = {"Authorization": f"Bearer {api_token}"}
    snapshot_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
# Polling until the snapshot data is ready
while True:
response = requests.get(snapshot_url, headers=headers)
if response.status_code == 200:
return response.json()
elif response.status_code == 202:
print("Snapshot still processing... retrying.")
else:
print(f"Error: {response.status_code} - {response.text}")
return None
time.sleep(10)
def store_data(data, filename="amazon_global_dataset_by_keyword.json"):
if data:
with open(filename, "w") as file:
json.dump(data, file, indent=4)
        print(f"Data saved in {filename}.")
else:
print("No data to store.")
if __name__ == "__main__":
API_TOKEN = "YOUR_API_TOKEN"
DATASET_ID = "gd_lwhideng15g8jg63s7"
# Define the dataset with keywords, domain, and pages_to_search
datasets = [
{
"keywords": "Mitutoyo",
"domain": "https://www.amazon.com",
"pages_to_search": 1,
},
{
"keywords": "smart watch",
"domain": "https://www.amazon.co.uk",
"pages_to_search": 2,
},
{
"keywords": "football",
"domain": "https://www.amazon.in",
"pages_to_search": 4,
},
{
"keywords": "baby cloth",
"domain": "https://www.amazon.de",
"pages_to_search": 3,
},
]
# Trigger dataset collection
snapshot_id = trigger_datasets(API_TOKEN, DATASET_ID, datasets)
if snapshot_id:
# Retrieve the data once the snapshot is ready
data = get_snapshot_data(API_TOKEN, snapshot_id)
if data:
            store_data(data)

You can download this sample JSON file to see the full output.
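One practical note: `get_snapshot_data` in the scripts above polls in a `while True` loop, so a snapshot that never becomes ready would block forever. Capping the number of attempts is a small, self-contained change. The sketch below uses a hypothetical `poll_with_limit` helper; the `fetch` callable stands in for a single `requests.get` to the snapshot endpoint, returning the parsed data once ready and `None` while the API is still responding with status 202:

```python
import time

def poll_with_limit(fetch, max_attempts=30, delay=10):
    # Call fetch() until it returns data or the attempt budget is spent.
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts:
            time.sleep(delay)
    raise TimeoutError(f"Snapshot not ready after {max_attempts} attempts")

# Simulated fetch that becomes ready on the third call
responses = iter([None, None, {"status": "ready"}])
data = poll_with_limit(lambda: next(responses), max_attempts=5, delay=0)
print(data)  # {'status': 'ready'}
```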