Automated pipeline that discovers newly registered .ai domains, validates whether they are real products vs parked domains, scores launch quality, and serves results through an API/dashboard.
- Multi-stage backend pipeline: discovery -> validation -> scoring -> persistence.
- FastAPI layer for query/reporting workflows (a minimal sketch follows this list).
- Scheduler-driven runs with retry-aware behavior.
- Security-hardened docs/config flow (no secrets in repo).
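A minimal sketch of what the query layer could look like, under stated assumptions: the `/domains` route, the `DomainRecord` model, and the `min_score` parameter are hypothetical illustrations, not the project's actual API.

```python
# Hypothetical sketch of a FastAPI query endpoint; route, model, and
# parameter names are illustrative, not the project's actual API.
from fastapi import FastAPI, Query
from pydantic import BaseModel

app = FastAPI(title="ai-domain-pipeline")

class DomainRecord(BaseModel):
    domain: str
    score: float   # 0.0-1.0 launch-quality score
    status: str    # e.g. "validated", "parked", "pending"

# In-memory stand-in for the persistence layer.
_RESULTS: list[DomainRecord] = []

@app.get("/domains", response_model=list[DomainRecord])
def list_domains(min_score: float = Query(0.0, ge=0.0, le=1.0)) -> list[DomainRecord]:
    """Return scored domains at or above a minimum quality threshold."""
    return [r for r in _RESULTS if r.score >= min_score]
```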
Certificate Transparency (CT) logs are noisy: most newly seen domains are not actionable startups. The project focuses on reducing false positives quickly so investigation time goes to likely launches.
- Discovery: pulls candidate domains from CT logs and related sources.
- Validation: checks DNS/HTTP/SSL + basic content and parked/for-sale heuristics (sketched after this list).
- Scoring: weighted quality-signal model with guardrails (a rule-based sketch follows the decision bullets below).
- Orchestration: scheduled and manual runs.
- Serving: REST endpoints + lightweight dashboard assets.
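A minimal sketch of the parked/for-sale heuristic, assuming hypothetical marker phrases; the real validator layers DNS and SSL checks on top of this content check.

```python
# Minimal sketch of the parked/for-sale heuristic. The marker phrases
# are hypothetical examples; the real validator also runs DNS and SSL
# checks before content heuristics.
import requests

PARKED_MARKERS = (
    "this domain is for sale",
    "buy this domain",
    "domain parking",
)

def looks_parked(domain: str, timeout: float = 5.0) -> bool:
    """Fetch the landing page and look for common parked-page phrases."""
    try:
        resp = requests.get(f"https://{domain}", timeout=timeout)
    except requests.RequestException:
        # Unreachable hosts are handled by the DNS/HTTP checks, not here.
        return False
    body = resp.text.lower()
    return any(marker in body for marker in PARKED_MARKERS)
```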
- Precision over recall in first-pass scoring: high early precision prevents analyst fatigue; the tradeoff is missing some weak-signal startups.
- Explicit pipeline stages over one monolith: improves debugging and component replacement; the tradeoff is more wiring/config surface.
- Deterministic rule scaffolding before heavy LLM usage: controls cost and keeps decisions auditable; the tradeoff is slower adaptation to edge patterns (a sketch of the rule-based score follows).
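To make the rule scaffolding concrete, a hedged sketch of a weighted signal score with a guardrail; the signal names, weights, and the 0.3 cap are hypothetical, not the project's tuned model.

```python
# Hedged sketch of a weighted quality score with a guardrail; signal
# names and weights are hypothetical, not the project's actual model.
SIGNAL_WEIGHTS = {
    "has_real_content": 0.4,
    "valid_ssl": 0.2,
    "responsive_http": 0.2,
    "not_parked": 0.2,
}

def quality_score(signals: dict[str, bool]) -> float:
    """Combine boolean signals into a [0, 1] score.

    Guardrail: a parked-looking page caps the score regardless of other
    signals, keeping first-pass precision high.
    """
    score = sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))
    if not signals.get("not_parked"):
        score = min(score, 0.3)  # hard cap for parked/for-sale pages
    return round(score, 2)
```

The hard cap is what "guardrails" means here: no combination of positive signals can promote a page that still looks parked, which keeps every decision auditable from the weights alone.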
```bash
cd docker
docker-compose up -d
```

```bash
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp ../.env.example .env
python3 main.py
```

Prerequisites: Python 3.11+, backend dependencies installed from backend/requirements.txt.
```bash
cd backend
python3 -m pytest tests -q
```

- API won't start: verify required env vars in .env and that the dependency install completed.
- No discoveries: run manual discovery and inspect logs from the scheduler/discovery services.
- Low-quality output: inspect validation and scoring thresholds before broadening rules.
- How I balanced false-positive reduction vs missing early startups.
- Why I kept scoring explainable instead of opaque.
- One failure mode (source noise/timeouts) and how I mitigated it (see the retry sketch below).
- What I would change next (better source weighting + richer observability).
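As a concrete shape for the timeout mitigation, a minimal sketch of bounded retries with exponential backoff around a source fetch; the helper name and parameters are hypothetical, not the project's actual code.

```python
# Minimal sketch of the timeout mitigation: bounded retries with
# exponential backoff around a flaky source fetch. Helper name and
# parameters are hypothetical.
import time
import requests

def fetch_with_retry(url: str, attempts: int = 3, base_delay: float = 1.0) -> str:
    """Fetch a discovery source, retrying transient failures with backoff."""
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # surface the failure after the final attempt
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # satisfies type checkers
```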
- DECISIONS.md
- BUILD_LOG.md
- KNOWN_LIMITATIONS.md
- DEMO.md
- SECURITY.md