Merged
73 changes: 73 additions & 0 deletions documentation/Project Architecture_ Discourse Universe.md
@@ -0,0 +1,73 @@
# **Project Architecture: Discourse Universe**

**A 3D Gravitational Visualization of Public Sentiment**

## **1\. Executive Summary & Concept**

**Discourse Universe** is a 3D data visualization tool that maps abstract, chaotic human discourse from social media into an intuitive, physical solar system.

* **The Planets (Macro-View):** The 10 most discussed topics form the system: the \#1 topic is the central sun, and the other nine orbit it as planets. Planet size correlates with the volume of mentions.
* **The Cubes (Micro-View):** Instead of spheres, each planet is a 3D cube. The 6 faces of the cube represent the 6 dominant perspectives, themes, or sentiments driving that specific topic, summarized into human-readable labels by an LLM.

## **2\. Motivation & Societal Value**

While standard sentiment dashboards (bar charts, line graphs) are built for data analysts, Discourse Universe acts as an "empathy machine" built for the general public. It solves two major issues with modern media consumption:

1. **Correcting the Distortion of Scale:** Social algorithms often make fringe outrage seem like the most important issue in the world. By utilizing a gravitational physics model, users intuitively grasp scale. A manufactured culture-war asteroid is visually dwarfed by a massive gas-giant representing housing costs.
2. **Forcing Nuance:** By requiring every topic to be viewed through 6 distinct faces (e.g., Financial, Ethical, Skeptical), the UI forces users to confront the reality that issues are multi-dimensional, breaking binary "For vs. Against" echo chambers.

## **3\. System Architecture: The 7-Day Rolling Window**

To prevent semantic drift (where topics change so fast the visualization becomes chaotic) and to eliminate massive API/compute costs, the system uses a **Daily Refresh with a 7-Day Rolling Window**.

* **Pacing:** The backend pipeline runs only once every 24 hours.
* **Data Scope:** It processes the last 168 hours (7 days) of data, providing deep semantic stability.
* **Output:** It generates a single, static data.json file that dictates the entire state of the 3D solar system for the next 24 hours.
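
The rolling window can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline code: the function name `posts_in_window` and the `created_at` field are assumptions, and each post is assumed to carry a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=7)  # 168 hours, as described above


def posts_in_window(posts, now):
    """Keep only posts created within the trailing 7-day window.

    `posts` is a list of dicts with a hypothetical 'created_at' datetime field.
    """
    cutoff = now - WINDOW
    return [p for p in posts if cutoff <= p["created_at"] <= now]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"id": 1, "created_at": now - timedelta(days=2)},
        {"id": 2, "created_at": now - timedelta(days=9)},  # outside the window
    ]
    print([p["id"] for p in posts_in_window(sample, now)])
```

Because each daily run shares six of its seven days with the previous run, consecutive snapshots should change gradually rather than abruptly.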

## **4\. The AI / NLP Pipeline (Two-Phase Processing)**

The core challenge is categorizing 100,000+ posts into clean topics and labeling them without spending hundreds of dollars on LLM tokens. The solution is separating the sorting from the labeling.

### **Phase 1: Traditional NLP (The Sorter)**

* **Technology:** BERTopic utilizing HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise).
* **Why HDBSCAN?** Instead of forcing 100,000 posts into 10 buckets (which creates garbage clusters), HDBSCAN finds dense conversational neighborhoods and throws the rest into an "Outlier/Noise" bucket (Topic \-1).
* **The Workflow:** We embed the text, cluster it, discard the noise, and keep only the 10 largest remaining clusters (The Planets). We then run a secondary clustering pass on each planet to find its 6 largest sub-clusters (The Faces).
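
The "discard the noise, keep the 10 largest" step might look like the sketch below. It assumes the clustering stage has already produced one integer label per post, with \-1 marking outliers; the function name `largest_clusters` is hypothetical and the embedding and clustering themselves are out of scope here.

```python
from collections import Counter

NOISE_LABEL = -1  # the outlier bucket (Topic -1)


def largest_clusters(labels, k=10):
    """Return [(label, size), ...] for the k largest non-noise clusters."""
    counts = Counter(label for label in labels if label != NOISE_LABEL)
    return counts.most_common(k)


if __name__ == "__main__":
    labels = [-1, -1, 0, 0, 0, 1, 1, 2]
    print(largest_clusters(labels, k=2))
```

The returned sizes are the raw volumes that later drive planet size.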

### **Phase 2: The LLM (The Explainer)**

* **Technology:** GPT-4o-mini (or Anthropic's Claude Haiku) via API.
* **The Workflow:** We take the top 50 most representative posts from each of the 6 faces across all 10 planets (60 groups total). We prompt the LLM: *"Read these 50 posts. Output a JSON object with a 2-word title and a 1-sentence summary of the core perspective."*
* **Efficiency:** By only passing the most representative posts of the 60 pre-sorted clusters, we only make 60 small LLM calls per day.
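
A minimal sketch of the per-face request and response handling follows. The prompt wording is taken from the workflow above; the JSON key names `title` and `summary`, and the helper names `build_prompt` and `parse_label`, are assumptions. The actual API call is deliberately left out, since it depends on which provider is used.

```python
import json

PROMPT_HEADER = (
    "Read these posts. Output a JSON object with a 2-word title "
    "and a 1-sentence summary of the core perspective."
)


def build_prompt(posts, limit=50):
    """Number the most representative posts and append them to the instruction."""
    numbered = "\n".join(
        f"{i}. {text}" for i, text in enumerate(posts[:limit], start=1)
    )
    return f"{PROMPT_HEADER}\n\n{numbered}"


def parse_label(raw):
    """Pull a {'title', 'summary'} dict out of a model reply, or return None.

    Models sometimes wrap JSON in extra prose, so only the outermost
    brace-delimited span is parsed.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not {"title", "summary"} <= data.keys():
        return None
    return {
        "title": str(data["title"]).strip(),
        "summary": str(data["summary"]).strip(),
    }
```

Returning `None` on a malformed reply lets the daily job retry or skip a single face instead of failing the whole run.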

## **5\. Frontend & 3D Visualization**

* **Technology:** React, Three.js (via React Three Fiber).
* **Behavior:** The frontend is entirely static. It fetches data.json on load.
* **Interactivity:** Users can watch the orbits, hover over planets for macro-stats, and click a planet to lock the camera. Once locked, the user can click-and-drag to rotate the cube and read the 6 LLM-generated summaries mapped to the faces as textures. Filters (e.g., "Politics", "Sports") instantly reload the 3D scene using pre-calculated category data from the JSON file.

## **6\. Hosting & Cost Breakdown**

Because the heavy lifting is completely isolated to a daily background job, the hosting architecture is practically free.

| Component | Tech Stack | Estimated Monthly Cost |
| :---- | :---- | :---- |
| **Data Ingestion** | Reddit API / Bluesky AT Protocol | $0 (Free tiers) |
| **Pipeline Runner** | GitHub Actions (Cron Job) | $0 (Free tier) |
| **LLM Summarization** | OpenAI (GPT-4o-mini) | \~$0.20 (\~$0.007/day) |
| **Frontend Hosting** | Vercel / Netlify | $0 (Static hosting) |
| **Total** | | **Under $1.00 / month** |

## **7\. Critiques & Limitations**

To maintain analytical integrity, the project must acknowledge the following blind spots:

* **Demographic Bias:** Reddit and Bluesky do not represent the global population. They skew younger, more male (Reddit), and highly Western/tech-centric. This visualizes the *Internet's* discourse, not humanity's.
* **Margin Flattening:** When an LLM summarizes a cluster of 500 posts into one sentence, it naturally prioritizes the loudest, most consensus-driven voice in that group. Highly nuanced or minority opinions within that sub-cluster will be washed out.

## **8\. Development Roadmap (Cursor MVP)**

* **Stage 1: Data Ingestion (5-8 hrs):** Write Python scripts to authenticate with Reddit/Bluesky APIs, pull 7 days of data, clean text (regex/spam filtering), and store in SQLite.
* **Stage 2: The NLP Engine (15-20 hrs):** Implement BERTopic and HDBSCAN. Tune hyperparameters (like min\_cluster\_size) until the 10 topics and 6 sub-themes make logical sense on raw text.
* **Stage 3: LLM Integration & Automation (3-5 hrs):** Write the prompt to force JSON extraction from the LLM. Format the final output to data.json. Wrap the script in a GitHub Actions YAML file to run daily.
* **Stage 4: 3D Web Frontend (10-15 hrs):** Initialize a React Three Fiber project. Build the orbital physics, map the JSON data to cube dimensions and textures, and implement the click-to-rotate interaction mechanics.
84 changes: 84 additions & 0 deletions documentation/Technical Implementation Canvas.md
@@ -0,0 +1,84 @@
# **Technical Implementation Canvas: Perspectiverse**

**Core Philosophy:** Lightweight, local-first data processing, outputting to a static, serverless frontend.

## **1\. Environment & Architecture**

* **IDE:** Cursor
* **Version Control:** Git / GitHub
* **Python Manager:** uv (for ultra-fast virtual environments and dependency management)
* **LLM Engine:** Ollama (running locally to avoid all API costs and rate limits)
* **Frontend:** Vite \+ React \+ React Three Fiber (compiles to static HTML/JS for GitHub Pages)

### **Directory Structure**

perspectiverse/
├── docs/ \# Markdown documentation
├── pipeline/ \# Python backend code
│ ├── data/ \# Local storage (SQLite / raw JSONs)
│ ├── run\_pipeline.py \# Main orchestrator script
│ └── requirements.txt \# Managed via uv
├── public/ \# Static web assets
│ ├── index.html
│ └── data.json \# THE BRIDGE: Output of pipeline, input for frontend
└── src/ \# React / Three.js frontend code

## **2\. Stage-by-Stage Implementation**

### **Stage 1: Local Setup & Tooling (1-2 Hours)**

1. **Initialize Git:** git init
2. **Setup Python:** uv venv and activate it.
3. **Install Core Python Libs:** uv pip install atproto pandas bertopic ollama
4. **Install Local AI:** Download Ollama for your OS, and pull a fast, lightweight model suitable for JSON extraction: ollama pull llama3 (or phi3 for even faster local processing).
5. **Setup Frontend:** Run npm create vite@latest . \-- \--template react in the root (or a subfolder); npm 7+ needs the extra \-- before the template flag. Then install Three.js: npm install three @react-three/fiber @react-three/drei.

### **Stage 2: Bluesky Data Ingestion (3-5 Hours)**

* **Goal:** Pull a random sample of English posts from the last 7 days.
* **The Script:** Write a Python script using atproto.
* **The Logic:**
  1. Authenticate with an App Password.
  2. Query the Bluesky search API for posts between \[Date \- 7 days\] and \[Today\].
  3. Fetch a large batch (e.g., 20,000 posts).
  4. Use Python's random.sample() to randomly select 10,000 posts to ensure a diverse, non-chronological slice of the internet.
  5. Clean text (remove URLs, handles) and save to a local posts.csv or SQLite.
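
The cleaning and sampling steps above can be sketched with the standard library alone. The regular expressions and the helper names `clean_text` and `sample_posts` are illustrative assumptions; real posts will likely need extra rules (hashtags, emoji, spam), and the fetching itself is not shown. This sketch cleans before sampling so that posts which become empty are never selected.

```python
import random
import re

URL_RE = re.compile(r"https?://\S+")
HANDLE_RE = re.compile(r"@[\w.-]+")  # e.g. @name.bsky.social


def clean_text(text):
    """Strip URLs and @handles, then collapse leftover whitespace."""
    text = URL_RE.sub("", text)
    text = HANDLE_RE.sub("", text)
    return " ".join(text.split())


def sample_posts(texts, n=10_000, seed=None):
    """Clean every post, drop empties, and draw up to n posts at random."""
    cleaned = [c for c in (clean_text(t) for t in texts) if c]
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    return rng.sample(cleaned, min(n, len(cleaned)))
```

Capping the sample at `len(cleaned)` avoids the `ValueError` that `random.sample()` raises when asked for more items than exist.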

### **Stage 3: The Sorter (Traditional NLP) (5-8 Hours)**

* **Goal:** Find the 10 Planets and 60 Faces using BERTopic.
* **The Script:**
  1. Load the 10,000 cleaned posts.
  2. Embed and cluster using BERTopic (which by default embeds locally with all-MiniLM-L6-v2, a small and very fast model).
  3. Filter out Topic \-1 (the noise).
  4. Isolate the 10 largest remaining clusters (Planets). Save their sizes (Volume).
  5. For each of the 10 Planets, take its posts and run a K-Means cluster (n\_clusters=6) to find the 6 Faces.
  6. Extract the top 20 most representative posts for each of the 60 Faces.
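
One plausible way to define "most representative" for the last step is closeness to the cluster centroid. The sketch below ranks posts by cosine similarity between each post's embedding and the mean embedding of its face. This is an assumption about the selection rule, not necessarily what BERTopic does internally, and the helper names are hypothetical. Plain lists stand in for embedding vectors.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_representative(posts, vectors, top_n=20):
    """Return the top_n posts whose embeddings sit closest to the centroid."""
    center = centroid(vectors)
    ranked = sorted(
        zip(posts, vectors),
        key=lambda pair: cosine(pair[1], center),
        reverse=True,
    )
    return [post for post, _ in ranked[:top_n]]
```

In practice the same computation would be vectorized with NumPy; the pure-Python version is only meant to make the ranking rule explicit.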

### **Stage 4: The Explainer (Local LLM via Ollama) (4-6 Hours)**

* **Goal:** Generate the human-readable labels for the cubes.
* **The Script:**
  1. Use the official ollama Python library.
  2. Loop through the 60 Faces. For each, send the 20 representative posts to your local Llama-3 model.
  3. **The Prompt:** *"Read these posts. Output ONLY a valid JSON object with two keys: 'title' (max 3 words) and 'summary' (max 1 sentence)."*
  4. Parse the JSON response.
  5. **Compile final output:** Assemble the planetary sizes, cluster coordinates, and the 60 LLM labels into a single data.json file. Save this to the /public folder.
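
The final compile step could be sketched as below. The input shape (a list of planets, each with a `size` and a list of labeled faces) and the function name `assemble_output` are assumptions; the output field names `total_volume_percent` and `volume_percent` are borrowed from the UI canvas's data.json example, and cluster coordinates are omitted for brevity.

```python
import json


def assemble_output(planets, last_updated):
    """Turn raw cluster sizes and LLM labels into the data.json structure.

    planets: [{'name', 'size', 'faces': [{'title', 'summary', 'size'}, ...]}]
    Percentages are relative to the selected planets, not to all posts.
    """
    total = sum(p["size"] for p in planets)
    ranked = sorted(planets, key=lambda p: p["size"], reverse=True)
    topics = []
    for idx, planet in enumerate(ranked, start=1):
        face_total = sum(f["size"] for f in planet["faces"])
        topics.append({
            "id": idx,
            "name": planet["name"],
            "total_volume_percent": round(100 * planet["size"] / total, 1),
            "perspectives": [
                {
                    "title": f["title"],
                    "summary": f["summary"],
                    "volume_percent": round(100 * f["size"] / face_total, 1),
                }
                for f in planet["faces"]
            ],
        })
    return {"last_updated": last_updated, "topics": topics}


def write_output(data, path="public/data.json"):
    """Serialize the assembled structure for the static frontend."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)
```

Sorting by size before assigning ids means topic 1 is always the sun, which keeps the frontend logic simple.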

### **Stage 5: The 3D Frontend (10-15 Hours)**

* **Goal:** Visualize data.json in the browser.
* **The Logic:**
  1. **The Sun:** Find the topic with the largest volume and place it at \[0,0,0\].
  2. **The Planets:** Map the remaining 9 topics to orbital paths around the sun based on size.
  3. **The Cubes:** Render each planet as a BoxGeometry.
  4. **The Textures:** Use React Three Fiber's HTML overlays or Canvas textures to map the LLM JSON summaries to the 6 faces of each respective cube.
  5. **Interaction:** Add an onClick handler to lock the camera to a cube, and use pointer events (onPointerDown, onPointerMove, onPointerUp) to rotate it by dragging, since React Three Fiber meshes have no onDrag event.
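
The sun and orbit assignment is simple enough to prototype outside the browser. The sketch below is written in Python for consistency with the pipeline and would need porting to JavaScript for the frontend. It assumes larger topics orbit closer to the sun, which is only one reading of "based on size"; `layout_orbits`, `inner_radius`, and `spacing` are hypothetical names and values.

```python
def layout_orbits(topics, inner_radius=6.0, spacing=4.0):
    """Place the largest topic at the centre and the rest on evenly spaced orbits.

    topics: [{'name': str, 'volume': number}, ...]
    Returns one {'name', 'orbit_radius'} entry per topic, largest first.
    """
    ranked = sorted(topics, key=lambda t: t["volume"], reverse=True)
    layout = []
    for rank, topic in enumerate(ranked):
        # Rank 0 is the sun at the origin; every later rank moves one orbit out.
        radius = 0.0 if rank == 0 else inner_radius + (rank - 1) * spacing
        layout.append({"name": topic["name"], "orbit_radius": radius})
    return layout
```

Precomputing radii like this in the pipeline is also an option, since it would keep the frontend free of layout decisions.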

### **Stage 6: Deployment (1-2 Hours)**

* **Backend:** Remains completely local. Whenever you want to update the data, you run python pipeline/run\_pipeline.py on your laptop. It overwrites public/data.json.
* **Frontend:**
  1. Commit the updated data.json and the frontend code to Git.
  2. Configure GitHub Pages in your repo settings to deploy via GitHub Actions, and add a workflow that runs on pushes to your main branch.
  3. On each push, the workflow builds the Vite/React app and publishes it to a live URL for the world to see.
109 changes: 109 additions & 0 deletions documentation/UI & 3D Implementation Canvas_ Perspectiverse.md
@@ -0,0 +1,109 @@
# **UI & 3D Implementation Canvas: Perspectiverse**

**Core UI Philosophy:** A split-screen analytical experience. The left 2/3 of the screen is an immersive, passive 3D observatory. The right 1/3 is an active, data-rich analytical sidebar that reacts to 3D interactions.

## **1\. The Visual Mechanics**

### **The 3D Canvas (Left 2/3)**

* **The Sun:** The \#1 topic sits at \[0,0,0\].
* **The Planets:** Topics 2-10 orbit the center at varying distances.
* **The Labels:** Floating 3D text (or screen-space HTML) sits slightly above each planet, showing only the 2-word topic name (e.g., "AI Regulation").
* **The Animation:** All planets orbit the center continuously. Simultaneously, each planet rotates on its own X/Y/Z axes so the user can see all 6 faces spinning.
* **The Geometry (The Spiky Cubes):**
  * A standard cube has flat faces. We will modify this so each of the 6 faces acts like the base of a 4-sided pyramid.
  * The height (extrusion) of that pyramid's peak is driven directly by the percentage of conversation volume for that perspective.
  * *Result:* A perspective with 50% volume creates a massive spike on one side, while the remaining 5 perspectives (10% each) have tiny bumps.
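
The volume-to-height mapping can be checked numerically before any 3D work. This sketch assumes a simple linear mapping in which a perspective holding 100% of the volume would reach `max_height`; both the function name `spike_heights` and the default of 2.0 scene units are placeholders.

```python
def spike_heights(volume_percents, max_height=2.0):
    """Scale each face's pyramid height linearly with its share of the volume."""
    if len(volume_percents) != 6:
        raise ValueError("a cube needs exactly 6 perspectives")
    total = sum(volume_percents)
    if total <= 0:
        raise ValueError("volumes must sum to a positive number")
    # Normalising by the total tolerates rounding drift (e.g. a sum of 99.9).
    return [max_height * v / total for v in volume_percents]


if __name__ == "__main__":
    print(spike_heights([50, 10, 10, 10, 10, 10]))
```

With the 50/10/10/10/10/10 split from the example above, the dominant spike is five times taller than each of the others.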

### **The Analytical Sidebar (Right 1/3)**

* **Default State (No Selection):** Welcome text, project philosophy, instructions on how to navigate the 3D space, and high-level global stats (total posts analyzed this week).
* **Topic Selected State (Clicking a Planet):**
  * The 3D camera smoothly flies to and locks onto the selected planet.
  * The sidebar updates: Shows an isolated, spinning render of the spiky cube at the top.
  * Below it: A list of the 6 perspectives with their full titles, 1-sentence summaries, and a horizontal bar chart showing their volume %.
* **Perspective Selected State (Clicking a Bar Chart / Face):**
  * The sidebar drills down further.
  * It displays the specific perspective details.
  * Below that: A scrollable feed of the actual representative Bluesky posts that formed this cluster, sorted by like count descending.

## **2\. Technical Stack Implications**

To pull this off without writing thousands of lines of complex WebGL code, we use the React ecosystem.

* **UI Framework:** React (via Vite).
* **Styling:** Tailwind CSS (perfect for rapidly building the sidebar and horizontal bar charts).
* **3D Engine:** Three.js wrapped in @react-three/fiber (R3F). R3F allows you to build 3D scenes using React components.
* **3D Helpers:** @react-three/drei. This is a library of pre-built R3F tools. We will use it for:
  * \<OrbitControls\> (Camera movement).
  * \<Html\> (Pinning DOM text labels to 3D coordinates).
  * \<CameraControls\> (For smooth flying/zooming when a planet is clicked).
* **State Management:** React useState / useContext (To pass the "selected ID" from the 3D canvas to the Sidebar).

### **Changes Needed to Your Backend data.json:**

To power this UI, your Python pipeline must output a very specific JSON structure. It needs the aggregate data *and* the raw posts.

{
"last\_updated": "2026-06-04",
"total\_posts": 100000,
"topics": \[
{
"id": 1,
"name": "Artificial Intelligence",
"total\_volume\_percent": 35.5,
"perspectives": \[
{
"id": "1A",
"title": "Job Replacement Fear",
"summary": "Users are heavily anxious about recent layoffs attributed to automation.",
"volume\_percent": 45.0,
"representative\_posts": \[
{"author": "user1.bsky", "text": "Just lost my copywriting gig to an LLM...", "likes": 402},
{"author": "user2.bsky", "text": "The tech bros don't care about the working class.", "likes": 150}
\]
}
// ... 5 more perspectives
\]
}
\]
}
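
Since the frontend depends entirely on this structure, the pipeline could run a small sanity check before writing the file. The sketch below is one possible validator, with assumptions stated: every topic should have exactly 6 perspectives, each perspective should carry the keys shown in the example, and the six `volume_percent` values should sum to roughly 100. The name `validate_snapshot` and the tolerance are hypothetical.

```python
REQUIRED_PERSPECTIVE_KEYS = {
    "id", "title", "summary", "volume_percent", "representative_posts",
}


def validate_snapshot(data, tolerance=0.5):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for topic in data.get("topics", []):
        name = topic.get("name", "<unnamed>")
        perspectives = topic.get("perspectives", [])
        if len(perspectives) != 6:
            problems.append(
                f"{name}: expected 6 perspectives, found {len(perspectives)}"
            )
        for p in perspectives:
            missing = REQUIRED_PERSPECTIVE_KEYS - p.keys()
            if missing:
                problems.append(f"{name}: perspective missing {sorted(missing)}")
        total = sum(p.get("volume_percent", 0) for p in perspectives)
        if perspectives and abs(total - 100) > tolerance:
            problems.append(f"{name}: volume_percent sums to {total}, not 100")
    return problems
```

Failing the pipeline run when this list is non-empty would keep a malformed file from ever reaching the deployed site.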

## **3\. Stages of Buildout (From Simple to Complex)**

### **Phase 1: The 2D React Shell & State (2-4 hours)**

* **Goal:** Build the layout without worrying about 3D math yet.
* **Tasks:**
  1. Set up Vite \+ React \+ Tailwind.
  2. Create a CSS Grid or Flexbox layout: Left div (about 2/3 width), Right div (about 1/3 width), matching the split described above.
  3. Load dummy data.json into a React state variable.
  4. Build the Sidebar components that toggle based on a selectedTopic state variable. Ensure the horizontal bar charts map correctly to the data.

### **Phase 2: The Basic 3D Solar System (4-6 hours)**

* **Goal:** Get flat cubes orbiting on the screen.
* **Tasks:**
  1. Install three, @react-three/fiber, and @react-three/drei.
  2. Map through your topics array and render a standard \<mesh\> with a \<boxGeometry\> for each.
  3. Use useFrame to calculate circular orbits around the center based on time and distance.
  4. Use Drei's \<Html\> component to attach floating labels above each cube.
  5. Add onClick handlers to the meshes that update the selectedTopic React state, verifying that clicking a cube changes the sidebar.
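
The orbit math that useFrame would evaluate each frame is just a point on a circle. The sketch below shows it in Python so the numbers can be checked in isolation; in the frontend the same formula would be ported to JavaScript and fed the scene's elapsed time. The function name `orbit_position` and the `period_seconds` and `phase` parameters are assumptions.

```python
import math


def orbit_position(radius, elapsed_seconds, period_seconds, phase=0.0):
    """Return (x, y, z) on a circular orbit in the XZ plane around the origin.

    period_seconds is the time for one full revolution; phase (radians)
    offsets the starting angle so planets do not all begin in a line.
    """
    angle = phase + 2 * math.pi * elapsed_seconds / period_seconds
    return (radius * math.cos(angle), 0.0, radius * math.sin(angle))
```

Deriving the position from absolute elapsed time, rather than adding a small step every frame, keeps the orbit speed independent of the frame rate.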

### **Phase 3: Building the "Spiky Cube" Geometry (5-8 hours)**

* **Goal:** Translate percentage data into physical shape.
* **Tasks:**
  * *The Technical Solution:* Don't try to mathematically distort a standard box. Instead, create a custom React component called \<SpikyCube\>.
  * Inside \<SpikyCube\>, render a small central BoxGeometry.
  * Attach 6 ConeGeometry meshes (with 4 radial segments, making them square pyramids) to the 6 faces of the central box.
  * Map the volume\_percent of your 6 perspectives to the scale-y (height) of those 6 respective pyramids.
  * Apply a useFrame rotation to the whole group so the lopsided star/cube tumbles through space.
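
Placing the six pyramids comes down to one position and one rotation per face. The sketch below works these out in Python so they can be verified before being copied into the \<SpikyCube\> component. It assumes a cone whose apex points along \+Y by default and whose geometry is centered on its own midpoint; the names `FACE_ROTATIONS`, `apex_direction`, and `spike_transform` are hypothetical.

```python
import math

# Face normal -> Euler rotation (x, y, z), in radians, that turns a cone's
# default +Y apex toward that normal. Each entry rotates about a single axis.
FACE_ROTATIONS = {
    (1, 0, 0): (0.0, 0.0, -math.pi / 2),
    (-1, 0, 0): (0.0, 0.0, math.pi / 2),
    (0, 1, 0): (0.0, 0.0, 0.0),
    (0, -1, 0): (math.pi, 0.0, 0.0),
    (0, 0, 1): (math.pi / 2, 0.0, 0.0),
    (0, 0, -1): (-math.pi / 2, 0.0, 0.0),
}


def apex_direction(rotation):
    """Direction the +Y apex points after a single-axis X or Z rotation."""
    rx, _, rz = rotation
    if rx:
        return (0.0, math.cos(rx), math.sin(rx))
    return (-math.sin(rz), math.cos(rz), 0.0)


def spike_transform(normal, half_box, height):
    """Position and rotation for one pyramid of the given final height."""
    # The cone is centred on its midpoint, so push it out by half its height
    # beyond the face of the central box.
    offset = half_box + height / 2
    position = tuple(float(n) * offset for n in normal)
    return {"position": position, "rotation": FACE_ROTATIONS[normal]}
```

One detail to verify in the browser: a 4-segment cone's base may come out rotated 45 degrees relative to the box face, in which case an extra quarter-turn about the cone's own axis would be needed to line up the edges.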

### **Phase 4: Camera Animations & Polish (3-5 hours)**

* **Goal:** Make the experience feel cinematic.
* **Tasks:**
  1. When a user clicks a planet, use Drei's \<CameraControls\> to interpolate (fly) the camera from its current position to a close-up offset of the selected planet.
  2. Blur or dim the unselected planets.
  3. Ensure that clicking a perspective in the sidebar highlights the corresponding "spike" on the 3D model (e.g., changing its color to an emissive neon).