GitHub - PeushYadav/adobe_2025

PDF Structure Extractor using Tiny LLM

Project Overview

This project extracts and restructures the content of a PDF file by identifying headings, subheadings, and plain text using a lightweight language model (Tiny LLM, ~60 MB). Because the model accepts at most 1024 input tokens, large PDFs are split into manageable chunks before processing. The tool outputs structured JSON and a final PDF.
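
The chunking step described above could look roughly like the following. This is a minimal Python sketch for illustration, not the project's processpdf.js code; the function name chunk_words and the max_words parameter are assumptions, and the 500-word default follows the chunk size quoted elsewhere in this README.

```python
def chunk_words(text: str, max_words: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: the real splitting logic lives in processpdf.js.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()  # split on any run of whitespace
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

A 1200-word input would come back as three chunks of 500, 500, and 200 words. Splitting on word boundaries alone can cut a heading away from its body text, so a real implementation may prefer to break at paragraph boundaries where possible.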

Project Structure


uploads/ – Stores uploaded PDF files from the frontend.
index.js – Main Node.js backend file; handles file upload and processing.
processpdf.js – Splits PDFs into chunks and manages the token limit for the LLM.
frontend/ – React + Vite app
- public/ – Static assets
- src/ – React components and frontend logic
- package.json – Frontend dependencies and scripts
llm-server/ – Python backend with FastAPI
- main.py – API endpoints for processing text using Tiny LLM
- download_model.py – Downloads and prepares the LLM model
- requirements.txt – Python dependencies for the server
.gitignore – Lists files and folders ignored by Git (such as node_modules and venv).
README.md – Project documentation.
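
To make the hand-off between the backend and llm-server/ concrete, here is a hedged sketch of how one chunk might be posted to the FastAPI server. The host and port come from the uvicorn command in this README; the /process route and the "text" field are assumptions, so check main.py for the real endpoint and payload. The actual caller is the Node backend, and Python is used here only for illustration.

```python
import json
import urllib.request

LLM_SERVER_URL = "http://127.0.0.1:8000"  # host and port from the uvicorn command
PROCESS_PATH = "/process"  # hypothetical route; see main.py for the real one


def build_chunk_request(chunk: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one text chunk."""
    # "text" is an assumed field name, not confirmed by the project.
    body = json.dumps({"text": chunk}).encode("utf-8")
    return urllib.request.Request(
        LLM_SERVER_URL + PROCESS_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the LLM server is running, the request can be sent with urllib.request.urlopen(build_chunk_request(chunk)) and the response body parsed with json.loads.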


Features

Upload a PDF via UI
Automatically splits the PDF text into ~500-word chunks
Sends each chunk to Tiny LLM and extracts its structure
Displays JSON result
Outputs a structured final PDF

Setup Instructions

Clone the Repository:
    git clone https://github.com/your-username/pdf-structure-extractor.git
    cd pdf-structure-extractor
Set Up Python Environment:
    cd llm-server
    python -m venv venv
    source venv/bin/activate    (Windows: venv\Scripts\activate)
    pip install -r requirements.txt
Install Node Dependencies: Run 'npm install' in the root and 'frontend/' folders.

Running the App

Start Node Backend:
    node index.js
Start LLM Server:
    cd llm-server
    uvicorn main:app --host 127.0.0.1 --port 8000
Start Frontend:
    cd frontend
    npm run dev

How to Use

Open the app at http://localhost:5173
Upload a PDF
Wait for JSON and PDF outputs
The JSON is displayed; the PDF is downloadable

Important Notes

Tiny LLM has a 1024-token input limit
We split large PDFs into ~500-word chunks
Python, pip, and virtualenv are required
Run 'npm install' in each folder with a package.json
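
To see why ~500 words is a comfortable chunk size under a 1024-token limit, here is a rough budget check. It assumes about 1.3 tokens per English word, a common rule of thumb that has not been measured against this model's tokenizer; the constant and function names are illustrative only.

```python
import math

TOKEN_LIMIT = 1024      # input limit stated in the notes above
TOKENS_PER_WORD = 1.3   # assumed rule of thumb, not measured for Tiny LLM


def estimated_tokens(word_count: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate the token count for a chunk of word_count words."""
    return math.ceil(word_count * tokens_per_word)


def headroom(word_count: int, limit: int = TOKEN_LIMIT) -> int:
    """Tokens left for the prompt after the chunk; negative means over the limit."""
    return limit - estimated_tokens(word_count)
```

Under this assumption a 500-word chunk is estimated at 650 tokens, which leaves 374 tokens for instructions and prompt text, while an 800-word chunk would already exceed the limit. For an exact figure, count tokens with the model's own tokenizer.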
