This Jupyter Notebook uses the Google Custom Search API (Programmable Search Engine) to discover Instagram accounts related to nail artists in different cities.
It queries Google with prompts like:

```
site:instagram.com HAIRSTYLE KOREA
```
…and extracts username, followers (when available), and bio (from snippet text) from the search results.
- Query Google for Instagram profiles by location.
- Extract basic profile info from SERP snippets (no profile clicks needed).
- Filter accounts by minimum follower count (configurable).
- Load lists of cities from a JSON file, so non-technical collaborators can edit cities easily.
- Export results to CSV/JSON.
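The query-building step above can be sketched as follows. This is a minimal illustration, not the project's actual API: `build_query` is a hypothetical helper, and in the real notebook the site, keyword, and city values come from the JSON files under queries/ (sites.json, keywords.json, cities.json).

```python
# Illustrative helper (not from the notebook): assemble a Google query
# from a site restriction, a keyword, and a city.
def build_query(site: str, keyword: str, city: str) -> str:
    return f"site:{site} {keyword} {city}"

print(build_query("instagram.com", "NAIL ARTIST", "Seoul"))
# site:instagram.com NAIL ARTIST Seoul
```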
```
├── .env                 # Environment variables (API keys, configs)
├── .gitignore           # Git ignore rules
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
├── artist_list.ipynb    # Main Jupyter notebook
│
├── data/                # Data folder (raw/processed data)
│   └── models/          # Trained/stored models
│
├── queries/             # JSON configuration files for building queries
│   ├── cities.json      # Cities (by continent) + top cities
│   ├── keywords.json    # Keywords
│   ├── patterns.json    # Regex patterns for parsing snippets
│   ├── sites.json       # Sites/domains
│   └── views.json       # Other filtering options
│
└── out/                 # Output folder (results, exports)
```

- Open the project in PyCharm Pro.
- Create a new virtual environment (Python 3.11+).
- Install dependencies:

```
pip install -r requirements.txt
```
- Create a Programmable Search Engine.
- Enable "Search the entire web" and bias results toward `instagram.com/*`.
- Copy the Search engine ID (`CX_ID`).
- In Google Cloud Console:
  - Create an API key.
  - Enable the Custom Search API.
  - (Optional but recommended) Restrict the key to the Custom Search API.
- Copy .env.example to .env and fill in:
```
GOOGLE_API_KEY=your_api_key
GOOGLE_CX=your_search_engine_id
PAGES=num_of_pages
FOLLOWER_MIN=follower_restrictions
# add more filter criteria here
```

- All cities are stored in cities.json.
- It has two sections:
- `all_cities`: grouped by continent.
- `top_cities`: a smaller set of priority cities.
- Example:
```json
{
  "all_cities": {
    "Asia": ["Tokyo", "Seoul", "Shanghai"],
    "Europe": ["London", "Paris"]
  },
  "top_cities": ["Tokyo", "London"]
}
```

- Non-technical collaborators can safely edit this file without touching Python code.
- Open the notebook:
  artist_list.ipynb
- Run through the cells:
- Load .env keys.
- Fetch search results from Google.
- Parse usernames, bios, and followers.
- (Optional) Loop through cities.json for multiple queries.
- Filter results (e.g., accounts with ≥2000 followers).
- Export to out/filtered.csv and out/filtered.json.
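The fetch step can be sketched with the standard library only. The endpoint URL and the `key`, `cx`, `q`, `start`, and `num` parameters are from the Custom Search JSON API; the function names `page_params` and `fetch_serp` are illustrative, not taken from the notebook:

```python
import json
import os
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/customsearch/v1"

def page_params(query: str, page: int) -> dict:
    """Query parameters for one results page (page is 0-indexed)."""
    return {
        "key": os.getenv("GOOGLE_API_KEY", ""),
        "cx": os.getenv("GOOGLE_CX", ""),
        "q": query,
        "start": page * 10 + 1,  # the API pages results in steps of 10: 1, 11, 21, ...
        "num": 10,
    }

def fetch_serp(query: str, pages: int = 1) -> list:
    """Collect raw result items across pages (up to 10 items per page)."""
    items = []
    for page in range(pages):
        url = API_URL + "?" + urllib.parse.urlencode(page_params(query, page))
        with urllib.request.urlopen(url, timeout=10) as resp:
            items.extend(json.load(resp).get("items", []))
    return items
```

Each returned item carries `title`, `link`, and `snippet` fields; the username can usually be read from the `instagram.com` URL path in `link`, while the bio and (sometimes) follower count come from `snippet`.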
- Example code (loading cities):

```python
import json

with open("queries/cities.json", "r", encoding="utf-8") as f:
    city_data = json.load(f)

# Flatten all continents into one list
all_cities = [city for cities in city_data["all_cities"].values() for city in cities]

# Or just use the priority list
top_cities = city_data["top_cities"]
```

- Be mindful of API quotas: the free tier allows 100 queries/day.
- Each query returns up to 10 results. To fetch multiple pages, set PAGES=2 or higher in .env or code.
- Google snippets don’t always contain follower counts. Some rows may have None.
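Follower extraction from snippet text might look like the sketch below. The regex is an assumption covering common snippet forms such as "2,145 Followers" or "12.3K Followers" (the project's real patterns live in queries/patterns.json); it returns None when no count is present, matching the note above.

```python
import re

# Assumed snippet formats: "2,145 Followers", "12.3K Followers", "1M Followers".
# Real snippets vary; queries/patterns.json holds the project's actual regexes.
_FOLLOWERS = re.compile(r"([\d.,]+)\s*([KM]?)\s*Followers", re.IGNORECASE)
_MULT = {"": 1, "K": 1_000, "M": 1_000_000}

def parse_followers(snippet: str):
    """Return an int follower count, or None if the snippet lacks one."""
    m = _FOLLOWERS.search(snippet)
    if not m:
        return None
    number = float(m.group(1).replace(",", ""))
    return int(number * _MULT[m.group(2).upper()])

print(parse_followers("12.3K Followers, 80 Following - nail art in Seoul"))  # 12300
print(parse_followers("bio text with no counts"))  # None
```

Rows with a parsed count can then be filtered against `FOLLOWER_MIN` from .env, keeping only accounts at or above the threshold and leaving None rows for manual review.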