learnwithparam/mini-perplexity

Search-Augmented Generation with Web Scraping

Build your own mini Perplexity: a LangGraph pipeline that searches the web with DuckDuckGo, scrapes the top results with BeautifulSoup, and synthesizes a cited answer with an LLM, all orchestrated inside a reproducible Jupyter notebook.

Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.

What You'll Learn

  • Orchestrate a multi-step research pipeline with langgraph.StateGraph
  • Pull fresh web results with DuckDuckGo and extract article content with BeautifulSoup
  • Prompt an LLM to synthesize a cited answer from scraped context
  • Add a refinement pass that tightens the final answer
  • Design a ResearchState TypedDict that carries query, results, context, and answers across nodes
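As a rough illustration of the last point, the sketch below shows one way a ResearchState TypedDict could look, and how a node can read from it and return only the key it updates. The field names (`results`, `context`, `answer`, `refined_answer`), the `SearchResult` shape, and the `build_context_node` helper are assumptions for illustration, not the notebook's actual definitions.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    """One search hit. Field names are illustrative assumptions."""
    title: str
    url: str
    snippet: str


class ResearchState(TypedDict, total=False):
    """State passed between nodes. total=False lets nodes fill keys in stages."""
    query: str
    results: list[SearchResult]
    context: str
    answer: str
    refined_answer: str


def build_context_node(state: ResearchState) -> ResearchState:
    """Hypothetical node: read `results`, return only the key it updates."""
    joined = "\n\n".join(r["snippet"] for r in state.get("results", []))
    return {"context": joined}


state: ResearchState = {
    "query": "what is langgraph?",
    "results": [
        {"title": "A", "url": "https://example.com/a", "snippet": "first"},
        {"title": "B", "url": "https://example.com/b", "snippet": "second"},
    ],
}

# Merged by hand here; inside a graph, the framework applies the partial update.
state = {**state, **build_context_node(state)}
print(state["context"])
```

Keeping every key optional means the initial state can hold just the query, with each later node adding its own piece.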

Tech Stack

  • Python 3.10+ with uv for dependency management
  • LangGraph + LangChain for pipeline orchestration
  • duckduckgo-search for free, key-less web search
  • BeautifulSoup + requests for content extraction
  • OpenAI via langchain-openai for synthesis and refinement
  • Jupyter for interactive exploration

Getting Started

Prerequisites

  • Python 3.10+
  • uv (installed automatically by make setup)
  • An OpenAI API key

Quick Start

# One command to set up and launch the notebook
make dev

# Or step by step:
make setup          # Create .env and install dependencies
# Edit .env with your OPENAI_API_KEY
make notebook       # Launch Jupyter

Open mini-perplexity.ipynb and run the cells top to bottom.
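Before running the cells, it can help to confirm that the key is visible to the Python process. The check below is a minimal sketch that assumes `.env` has already been loaded into the environment (by the notebook or your shell); `has_openai_key` is a hypothetical helper, not part of the repository.

```python
import os
from collections.abc import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; edit .env and reload the environment.")
```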

With Docker

make build
make up             # Notebook at http://localhost:8888
make logs
make down

Challenges

Work through these incrementally to build the pipeline:

  1. Search Node - Wrap DuckDuckGo search and return the top N results
  2. Content Extraction - Fetch each URL and pull the main article text with BeautifulSoup
  3. State Design - Define a ResearchState TypedDict the graph can pass between nodes
  4. Synthesis Node - Prompt an LLM with the query + scraped context to produce a cited answer
  5. Refinement Node - Add a second LLM pass that tightens and formats the answer
  6. Graph Wiring - Compose the nodes into a LangGraph StateGraph with explicit edges
  7. Citations - Return numbered source citations the user can follow
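Challenges 4 and 7 fit together: if the scraped sources are numbered before they go into the prompt, the model can cite them as [n], and the same numbering can be returned to the user as a source list. The sketch below uses only the standard library and leaves out the search, scraping, and LLM calls. The `Source` shape, the helper names, the prompt wording, and the `max_chars` cutoff are all assumptions for illustration; the notebook's own prompt and data shapes may differ.

```python
from typing import TypedDict


class Source(TypedDict):
    """One scraped page. Field names are illustrative assumptions."""
    title: str
    url: str
    text: str


def build_context(sources: list[Source], max_chars: int = 1500) -> str:
    """Number each source so the model can cite it as [1], [2], ..."""
    blocks = []
    for i, src in enumerate(sources, start=1):
        excerpt = src["text"][:max_chars]  # crude cap on prompt size
        blocks.append(f"[{i}] {src['title']} ({src['url']})\n{excerpt}")
    return "\n\n".join(blocks)


def build_prompt(query: str, context: str) -> str:
    """Assemble a synthesis prompt that asks for inline [n] citations."""
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


def format_citations(sources: list[Source]) -> str:
    """Render the source list with the same numbering used in the prompt."""
    return "\n".join(
        f"[{i}] {s['title']} - {s['url']}" for i, s in enumerate(sources, start=1)
    )


sources: list[Source] = [
    {"title": "LangGraph overview", "url": "https://example.com/a",
     "text": "LangGraph models a workflow as a graph of nodes."},
    {"title": "Scraping basics", "url": "https://example.com/b",
     "text": "BeautifulSoup parses HTML into a tree."},
]

prompt = build_prompt("What is LangGraph?", build_context(sources))
citations = format_citations(sources)
print(prompt)
print(citations)
```

Because both helpers enumerate the same list in the same order, a [2] in the model's answer always points at the second entry of the printed source list.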

Makefile Targets

make help           Show all available commands
make setup          Initial setup (create .env, install deps)
make dev            Setup and launch notebook (one command!)
make notebook       Launch the Jupyter notebook
make build          Build Docker image
make up             Start notebook container
make down           Stop container
make clean          Remove venv and caches
