GTM Scout

Automated outreach pipeline that finds engineers using a competitor's technology and generates personalised LinkedIn messages using Claude. Configure any hunter company, competitor, and technology via CLI flags — no code changes needed.

A frontend dashboard is planned to allow simple configuration.

How it works

Company discovery — finds companies known to use the target technology via StackShare API + GitHub org search (Mock mode uses a hardcoded list of target companies)
Lead discovery — finds engineers at those companies in relevant roles via LinkedIn/Proxycurl (Mock mode uses a hardcoded lead list)
Filter + score — drops non-technical roles and scores leads 0–100 based on seniority, technology signal, and engagement. Uses deterministic filtering with configurable weights
AI personalisation — generates a LinkedIn connection note (≤300 chars) and a follow-up email for each qualified lead using Claude. The LLM automatically figures out the pitch angle based on the hunter and competitor context
Dry-run trigger — logs what would have been sent to console and output/dry_run_log.jsonl
Report — saves all leads and messages to a SQLite database and exports a CSV + JSON report to output/

Quickstart

# 1. Clone the repo.
git clone https://github.com/<YOUR_USERNAME>/gtm-hunter.git
cd gtm-hunter

# 2. Create and activate a virtual environment.
python3 -m venv .venv && source .venv/bin/activate

# 3. Install dependencies.
pip install -r requirements.txt

# 4. Add your Anthropic API key to .env.
cp .env.example .env
# edit .env and set ANTHROPIC_API_KEY.

# 5. Run the pipeline in mock mode (no other API keys needed).
python main.py run --mock \
  --hunter-company "<HUNTER_COMPANY>" \
  --competitor-company "<COMPETITOR_COMPANY>" \
  --competitor-technology "<COMPETITOR_TECHNOLOGY>"

Campaign configuration

The pipeline is fully configurable via CLI flags. You can target any competitor without touching the code:

python main.py run --mock

# Target a different competitor
python main.py run --mock \
  --hunter-company "<HUNTER_COMPANY>" \
  --competitor-company "<COMPETITOR_COMPANY>" \
  --competitor-technology "<COMPETITOR_TECHNOLOGY>"

Flag	Required	Default	Description
`--hunter-company`	yes	—	Your company — who the messages are sent from
`--competitor-company`	yes	—	The competitor whose users you're targeting
`--competitor-technology`	yes	—	The technology the competitor is known for
`--mock`	no	off	Use mock data instead of real APIs
`--dry-run` / `--live`	no	`--dry-run`	Log messages vs. send them
`--max-companies`	no	`10`	How many target companies to process
`--max-leads`	no	set in `.env`	Max leads to process across all companies

Output

After a run you'll find:

output/
├── dry_run_log.jsonl       ← one JSON line per lead with both messages
├── report_{id}.csv         ← leads + messages in spreadsheet format
└── report_{id}.json        ← same data as JSON

The SQLite database gtm_hunter.db stores all leads, messages, and run reports.

The messages that were generated in the testing run: view here

Live mode

To run against real APIs, fill in the optional keys in .env and drop the --mock flag:

python main.py run

Key	Used for
`STACKSHARE_API_KEY`	Find companies using the target technology via StackShare
`GITHUB_TOKEN`	Find companies via GitHub org search (higher rate limits)
`PROXYCURL_API_KEY`	Find engineers at target companies via LinkedIn

Note

Live mode has not been end-to-end tested. Some APIs (StackShare, Proxycurl) require paid access or manual approval to obtain keys. The mock mode fully demonstrates the pipeline logic and is the intended way to run this prototype.

Tech stack

Why Python and not a no-code tool like n8n? n8n is great for wiring together pre-built integrations, but this pipeline needs custom scoring logic, prompt engineering, structured DB writes, and the flexibility to swap sources or extend behaviour without fighting a GUI. Python gives full control over every step — the scoring weights, the prompt templates, the retry logic, the output format — with no black-box nodes in between. It's also easier for an engineering team to review, version, and test.

Why LLM and not an agent? The pipeline has a clear, fixed workflow — filter, score, generate, log. There's no need for autonomous decision-making or tool use (AI Agent). A plain LLM call at the personalisation step is more predictable, cheaper, and easier to debug.

Tool	Why
Anthropic Claude	Generates personalised LinkedIn invites and follow-up emails
SQLAlchemy	ORM for storing leads, messages, and reports
SQLite	Simple file-based DB, good for this demo. For production, we would swap SQLite with something like postgres
Pydantic	Data validation and typed models throughout the pipeline
httpx + tenacity	Async HTTP calls with automatic retries for external APIs
Click	CLI interface

Note

Scoring weights are in scoring_weights.json — adjust them to change how leads are ranked without touching code. Pipeline behaviour (threshold, max leads, dry run) is controlled via .env.

Limitations

The pipeline relies on the quality of data from external APIs (StackShare, GitHub, LinkedIn). Inaccurate or incomplete data may lead to missed leads or false positives.
This is a prototype and has not been tested at scale. Performance may degrade with large datasets or API rate limits.
The project doesn't include safety checks for the generated messages. In a production system, we would want to implement filters to ensure messages are appropriate and won't harm our brand reputation.
Logs: The dry-run logs are currently stored in a JSONL file, which is simple but not ideal for querying or analysis. In a production system, we would want to implement a more robust logging solution, potentially with structured logging to a service like Datadog.

Future improvements

The project was built with JSON for the output as well as the scoring weights, giving us the flexibility to add a frontend dashboard in the future where we can adjust the scoring weights and thresholds without having to change the code.
Implement a feedback loop to flag leads where message generation fails, allowing for prompt tweaks and retries.
Add more data sources for company and lead discovery to improve coverage.
Implement a more sophisticated scoring algorithm, potentially using LLMs to predict lead quality based on their profile and activity after the deterministic scoring step.
Add a cron job to run the pipeline on a regular basis and update the database and reports.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
db		db
pipeline		pipeline
prompts		prompts
results		results
.gitignore		.gitignore
README.md		README.md
config.py		config.py
main.py		main.py
requirements.txt		requirements.txt
scoring_weights.json		scoring_weights.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GTM Scout

How it works

Quickstart

Campaign configuration

Output

Live mode

Tech stack

Limitations

Future improvements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GTM Scout

How it works

Quickstart

Campaign configuration

Output

Live mode

Tech stack

Limitations

Future improvements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages