Merged
2 changes: 1 addition & 1 deletion .github/workflows/security-audit.yml
@@ -63,7 +63,7 @@ jobs:
run: |
echo "📋 Scanning Python code for security issues..."
source .venv/bin/activate
-bandit -r data_engineering/ -f json -o bandit-report.json || true
+bandit -r pipeline/ -f json -o bandit-report.json || true

- name: Dependency vulnerability scan (pip-audit)
continue-on-error: true
49 changes: 30 additions & 19 deletions FILES.md
@@ -10,30 +10,41 @@ This document provides a complete listing of all files in the Perspectiverse rep
| `FILES.md` | This file - complete file listing and organization guide |
| `LICENSE` | Project license |
| `pyproject.toml` | Python project configuration, dependencies, and build system |
| `.gitignore` | Git ignore rules for Python, data files, logs, and IDE files |
| `package.json` | Frontend dependencies and scripts |
| `.gitignore` | Git ignore rules |
| `uv.lock` | Lock file for uv package manager (generated) |

## Scripts
## Pipeline (`pipeline/`)

| File | Description |
|------|-------------|
| `scripts/security_check.sh` | Security scanning script for local development (Bandit + pip-audit) |
| File/Directory | Description |
|----------------|-------------|
| `pipeline/run_pipeline.py` | Main orchestrator script for data ingestion and processing |
| `pipeline/data/` | Local storage for SQLite databases and raw JSONs |
| `pipeline/config/` | Pipeline configuration files |
| `pipeline/data_sources/` | Source-specific extraction scripts (e.g., Bluesky) |

## GitHub Configuration (`.github/`)
| File | Description |
|------|-------------|
| `workflows/security-audit.yml` | Automated security scanning workflow (Bandit + pip-audit) |
## Frontend (`src/` & `public/`)

## Data Engineering Files (`data_engineering/`)
| File/Directory | Description |
|----------------|-------------|
| `src/` | React components and Three.js visualization code |
| `public/` | Static assets |
| `public/data.json` | The bridge: Output of pipeline, input for frontend |
| `index.html` | Entry point for the web application |
| `tailwind.config.js` | Tailwind CSS configuration |
| `postcss.config.js` | PostCSS configuration |

| Directory | Description |
|-----------|-------------|
| `config/` | Central configuration files |
| `scripts/` | Main pipeline orchestrators and scripts |
| `data_sources/` | Extraction and processing scripts for various data sources |
## Documentation (`project_documentation/`)

## Final Output (`data_final/`)
| File | Description |
|------|-------------|
| `Project Architecture_ Discourse Universe.md` | Conceptual overview and high-level architecture |
| `Technical Implementation Canvas.md` | Technical details and implementation stages |
| `UI & 3D Implementation Canvas_ Perspectiverse.md` | Frontend design and 3D visualization details |

## Scripts & CI/CD

| Directory | Description |
|-----------|-------------|
| `data_final/` | Final processed Markdown files ready for use |
| File | Description |
|------|-------------|
| `scripts/security_check.sh` | Security scanning script for local development |
| `.github/workflows/security-audit.yml` | Automated security scanning workflow |
28 changes: 14 additions & 14 deletions README.md
@@ -2,13 +2,15 @@
Live mapping the universe of human attention and perspectives

## 🎯 Project Overview
Perspectiverse aims to map the universe of human attention and perspectives through a reproducible data engineering pipeline.
Perspectiverse aims to map the universe of human attention and perspectives through a reproducible data engineering pipeline and an immersive 3D visualization.

## 🚀 Quick Start

### Prerequisites
- Python 3.10+
- Node.js & npm
- [uv](https://github.com/astral-sh/uv) (fast Python package manager)
- [Ollama](https://ollama.ai/) (for local LLM processing)

### Installation

@@ -18,28 +20,26 @@ Perspectiverse aims to map the universe of human attention and perspectives thro
cd perspectiverse
```

2. **Set up the virtual environment:**
2. **Set up the Python environment:**
```bash
uv venv
```

3. **Install dependencies:**
```bash
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```

4. **Activate the environment:**
3. **Set up the Frontend:**
```bash
source .venv/bin/activate # On Windows: .venv\Scripts\activate
npm install
```

## 📁 Project Structure
The project follows an intuitive structure inspired by [the_depositum](https://github.com/Data-Science-Link/the_depositum):

- `data_engineering/`: Contains all technical components for data extraction and transformation.
- `data_final/`: Contains the final output files (optimized for AI tools).
- `scripts/`: Useful utility scripts, including security checks.
- `.github/workflows/`: Automated CI/CD pipelines, including security audits.
- `pipeline/`: Python backend code for data ingestion and NLP processing.
- `data/`: Local storage for raw and intermediate data.
- `run_pipeline.py`: Main orchestrator script.
- `public/`: Static web assets, including the generated `data.json`.
- `src/`: React and React Three Fiber frontend source code.
- `project_documentation/`: High-level design and implementation documents.
- `scripts/`: Utility scripts for security and maintenance.

For a detailed file listing, see [FILES.md](FILES.md).

Empty file removed data_final/.gitkeep
Empty file.
8 changes: 0 additions & 8 deletions data_final/README.md

This file was deleted.

13 changes: 13 additions & 0 deletions index.html
@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/vite.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Perspectiverse | Discourse Universe</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.jsx"></script>
</body>
</html>
32 changes: 32 additions & 0 deletions package.json
@@ -0,0 +1,32 @@
{
"name": "perspectiverse-frontend",
"private": true,
"version": "0.1.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "vite build",
"lint": "eslint . --ext js,jsx --report-unused-disable-directives --max-warnings 0",
"preview": "vite preview"
},
"dependencies": {
"@react-three/drei": "^9.0.0",
"@react-three/fiber": "^8.0.0",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"three": "^0.160.0"
},
"devDependencies": {
"@types/react": "^18.2.0",
"@types/react-dom": "^18.2.0",
"@vitejs/plugin-react": "^4.2.0",
"autoprefixer": "^10.4.17",
"eslint": "^8.56.0",
"eslint-plugin-react": "^7.33.2",
"eslint-plugin-react-hooks": "^4.6.0",
"eslint-plugin-react-refresh": "^0.4.5",
"postcss": "^8.4.35",
"tailwindcss": "^3.4.1",
"vite": "^5.1.0"
}
}
File renamed without changes.
1 change: 1 addition & 0 deletions pipeline/__init__.py
@@ -0,0 +1 @@
# Pipeline package
File renamed without changes.
File renamed without changes.
20 changes: 20 additions & 0 deletions pipeline/run_pipeline.py
@@ -0,0 +1,20 @@
import os
import json
import random
import pandas as pd
from atproto import Client
# import ollama
# from bertopic import BERTopic

def main():
    print("🚀 Starting Perspectiverse Pipeline...")

    # 1. Data Ingestion (Bluesky)
    # 2. Traditional NLP (BERTopic)
    # 3. LLM Summarization (Ollama)
    # 4. Final Output (data.json)

    print("✅ Pipeline complete. data.json updated in public/")

if __name__ == "__main__":
    main()
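
The four stages in `main()` are still placeholders. As a rough sketch of how the final stage might assemble the structure found in `public/data.json`, the snippet below groups posts that already carry a topic label and computes each topic's share of the total. The helper names `build_payload` and `write_payload`, the `topic` key on each post, and the sample posts are all hypothetical; the ingestion, BERTopic, and Ollama stages are deliberately left out rather than guessed at.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_payload(posts, last_updated):
    """Group labelled posts into the topic layout used by data.json (hypothetical helper)."""
    total = len(posts)
    by_topic = defaultdict(list)
    for post in posts:
        by_topic[post["topic"]].append(post)

    topics = []
    # Sort by topic name so that ids are assigned deterministically.
    for topic_id, (name, members) in enumerate(sorted(by_topic.items()), start=1):
        topics.append({
            "id": topic_id,
            "name": name,
            "total_volume_percent": round(100 * len(members) / total, 1),
            # Perspectives would be filled in by the summarization stage (not sketched here).
            "perspectives": [],
        })
    return {"last_updated": last_updated, "total_posts": total, "topics": topics}


def write_payload(payload, path):
    """Write the payload as pretty-printed UTF-8 JSON, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")


# Invented sample input: three posts on one topic, one on another.
sample_posts = [
    {"topic": "Artificial Intelligence", "text": "post a"},
    {"topic": "Artificial Intelligence", "text": "post b"},
    {"topic": "Artificial Intelligence", "text": "post c"},
    {"topic": "Climate", "text": "post d"},
]
payload = build_payload(sample_posts, "2024-06-04")
```

Calling `write_payload(payload, "public/data.json")` would then refresh the file the frontend reads. Keeping the payload construction separate from the file write makes the shape easy to test without touching disk.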
File renamed without changes.
6 changes: 6 additions & 0 deletions postcss.config.js
@@ -0,0 +1,6 @@
export default {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}
23 changes: 23 additions & 0 deletions public/data.json
@@ -0,0 +1,23 @@
{
"last_updated": "2024-06-04",
"total_posts": 10000,
"topics": [
{
"id": 1,
"name": "Artificial Intelligence",
"total_volume_percent": 35.5,
"perspectives": [
{
"id": "1A",
"title": "Job Replacement Fear",
"summary": "Users are heavily anxious about recent layoffs attributed to automation.",
"volume_percent": 45.0,
"representative_posts": [
{"author": "user1.bsky", "text": "Just lost my copywriting gig to an LLM...", "likes": 402},
{"author": "user2.bsky", "text": "The tech bros don't care about the working class.", "likes": 150}
]
}
]
}
]
}
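
Since `public/data.json` is the contract between the pipeline and the frontend, a small sanity check on its shape may be useful before the file is published. The sketch below is one possible check, not part of this change: `validate_payload` is a hypothetical helper, and the rules it enforces (required top-level keys, percentages between 0 and 100, at least one representative post per perspective) are assumptions inferred from the sample file rather than a documented schema.

```python
import json

# The sample committed in this change, embedded so the sketch is self-contained.
SAMPLE = json.loads("""
{
  "last_updated": "2024-06-04",
  "total_posts": 10000,
  "topics": [
    {
      "id": 1,
      "name": "Artificial Intelligence",
      "total_volume_percent": 35.5,
      "perspectives": [
        {
          "id": "1A",
          "title": "Job Replacement Fear",
          "summary": "Users are heavily anxious about recent layoffs attributed to automation.",
          "volume_percent": 45.0,
          "representative_posts": [
            {"author": "user1.bsky", "text": "Just lost my copywriting gig to an LLM...", "likes": 402},
            {"author": "user2.bsky", "text": "The tech bros don't care about the working class.", "likes": 150}
          ]
        }
      ]
    }
  ]
}
""")


def validate_payload(data):
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for key in ("last_updated", "total_posts", "topics"):
        if key not in data:
            problems.append(f"missing top-level key: {key}")

    for topic in data.get("topics", []):
        label = f"topic {topic.get('id')}"
        # A missing percentage defaults to -1 so it is reported as out of range.
        if not 0 <= topic.get("total_volume_percent", -1) <= 100:
            problems.append(f"{label}: total_volume_percent out of range")
        for perspective in topic.get("perspectives", []):
            p_label = f"perspective {perspective.get('id')}"
            if not 0 <= perspective.get("volume_percent", -1) <= 100:
                problems.append(f"{p_label}: volume_percent out of range")
            if not perspective.get("representative_posts"):
                problems.append(f"{p_label}: no representative posts")
    return problems
```

Returning a list of problems instead of raising on the first one lets a pipeline run report every issue at once.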
19 changes: 5 additions & 14 deletions pyproject.toml
@@ -10,17 +10,11 @@ authors = [
]

dependencies = [
-"requests>=2.31.0",
-"ebooklib>=0.18",
-"beautifulsoup4>=4.12.0",
-"striprtf>=0.0.26",
-"pyyaml>=6.0",
-"pypdf>=3.17.0",
-"pdfplumber>=0.10.0",
-"pydub>=0.25.1",
-"pyloudnorm>=0.1.1",
+"atproto>=0.0.1",
+"pandas>=2.0.0",
+"bertopic>=0.16.0",
+"ollama>=0.1.0",
 "numpy>=1.24.0,<2.0",
-"openai-whisper==20231117",
]

[project.optional-dependencies]
@@ -35,8 +29,5 @@ dev = [
requires = ["hatchling"]
build-backend = "hatchling.build"

-[tool.uv.extra-build-dependencies]
-openai-whisper = ["standard-pkg-resources"]
-
 [tool.hatch.build.targets.wheel]
-packages = ["data_engineering"]
+packages = ["pipeline"]
9 changes: 6 additions & 3 deletions scripts/security_check.sh
@@ -24,7 +24,7 @@ fi

echo ""
echo "📋 Running Bandit security scan..."
-bandit -r data_engineering/ -f json -o bandit-report.json || true
+bandit -r pipeline/ -f json -o bandit-report.json || true

echo ""
echo "📦 Running pip-audit dependency scan..."
@@ -68,8 +68,11 @@ import os
 if os.path.exists('pip-audit-report.json'):
     with open('pip-audit-report.json') as f:
         data = json.load(f)
-        vulns = data.get('vulnerabilities', [])
-        print(len(vulns))
+        if 'dependencies' in data:
+            count = sum(len(d.get('vulns', [])) for d in data['dependencies'])
+        else:
+            count = len(data.get('vulnerabilities', []))
+        print(count)
 else:
     print(0)
" 2>/dev/null || echo "0")
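
The counting logic in this hunk accepts two report shapes: one with a top-level `dependencies` list whose entries each carry a `vulns` list, and an older one with a flat `vulnerabilities` list. A standalone sketch of the same logic, pulled out of the inline shell string so it can be exercised directly, might look like this. The function name `count_vulnerabilities` and the sample reports (package names and advisory ids) are invented for illustration.

```python
def count_vulnerabilities(report):
    """Count findings in a parsed pip-audit JSON report, accepting both shapes handled above."""
    if "dependencies" in report:
        # Per-dependency shape: sum the findings attached to each dependency.
        return sum(len(dep.get("vulns", [])) for dep in report["dependencies"])
    # Flat shape: a single top-level list of findings.
    return len(report.get("vulnerabilities", []))


# Invented sample reports, one per shape.
per_dependency_report = {
    "dependencies": [
        {"name": "examplepkg", "version": "1.0", "vulns": [{"id": "EXAMPLE-1"}, {"id": "EXAMPLE-2"}]},
        {"name": "otherpkg", "version": "2.0", "vulns": []},
    ]
}
flat_report = {"vulnerabilities": [{"id": "EXAMPLE-3"}]}

per_dependency_count = count_vulnerabilities(per_dependency_report)
flat_count = count_vulnerabilities(flat_report)
```

A report with neither key counts as zero findings, which matches the script's fallback of printing `0`.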
20 changes: 20 additions & 0 deletions src/App.jsx
@@ -0,0 +1,20 @@
import React from 'react'

function App() {
return (
<div className="flex h-screen w-screen bg-black text-white">
<div className="w-2/3 border-r border-gray-800">
{/* 3D Canvas will go here */}
<div className="flex h-full items-center justify-center">
<p className="text-xl italic">3D Observatory (React Three Fiber)</p>
</div>
</div>
<div className="w-1/3 p-8 overflow-y-auto">
<h1 className="text-3xl font-bold mb-4">Perspectiverse</h1>
<p className="text-gray-400">Select a planet to begin exploring the discourse.</p>
</div>
</div>
)
}

export default App
3 changes: 3 additions & 0 deletions src/index.css
@@ -0,0 +1,3 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
10 changes: 10 additions & 0 deletions src/main.jsx
@@ -0,0 +1,10 @@
import React from 'react'
import ReactDOM from 'react-dom/client'
import App from './App.jsx'
import './index.css'

ReactDOM.createRoot(document.getElementById('root')).render(
<React.StrictMode>
<App />
</React.StrictMode>,
)
11 changes: 11 additions & 0 deletions tailwind.config.js
@@ -0,0 +1,11 @@
/** @type {import('tailwindcss').Config} */
export default {
content: [
"./index.html",
"./src/**/*.{js,ts,jsx,tsx}",
],
theme: {
extend: {},
},
plugins: [],
}