Commit 5317f4b

feat: Update project metadata, add documentation, and create testing scripts for Contextify
1 parent 6d992f5 commit 5317f4b

8 files changed

Lines changed: 824 additions & 4 deletions

BUILD_AND_PUBLISH.md

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
1+
# Building and Publishing Contextify
2+
3+
This guide explains how to build and publish Contextify to PyPI.
4+
5+
## Prerequisites
6+
7+
1. Install build tools:
8+
```bash
9+
pip install build twine
10+
# or
11+
uv pip install build twine
12+
```
13+
14+
2. Create a PyPI account:
15+
- Production: https://pypi.org/account/register/
16+
- Test: https://test.pypi.org/account/register/
17+
18+
3. Generate API token:
19+
- Go to Account Settings → API tokens
20+
- Create a token (account-scoped for the first upload; switch to a project-scoped token once the project exists)
21+
- Save it securely (you'll only see it once)
22+
23+
## Building the Package
24+
25+
1. Clean previous builds:
26+
```bash
27+
# Windows (PowerShell)
28+
Remove-Item -Recurse -Force dist, build, *.egg-info
29+
30+
# Linux/Mac
31+
rm -rf dist build *.egg-info
32+
```
33+
34+
2. Build the package:
35+
```bash
36+
python -m build
37+
```
38+
39+
This creates:
40+
- `dist/contextify-X.Y.Z-py3-none-any.whl` (wheel distribution)
41+
- `dist/contextify-X.Y.Z.tar.gz` (source distribution)
42+
43+
3. Verify the build:
44+
```bash
45+
twine check dist/*
46+
```
47+
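As a complement to `twine check`, the sketch below confirms that the build produced exactly one wheel and one source distribution. The helper names `find_artifacts` and `check_artifacts` are hypothetical and not part of Contextify; the default `dist` directory is taken from the build step above.

```python
from pathlib import Path


def find_artifacts(dist_dir: str = "dist") -> dict[str, list[str]]:
    """Group the files in dist_dir into wheels and source distributions."""
    root = Path(dist_dir)
    return {
        "wheel": sorted(p.name for p in root.glob("*.whl")),
        "sdist": sorted(p.name for p in root.glob("*.tar.gz")),
    }


def check_artifacts(dist_dir: str = "dist") -> bool:
    """Return True when exactly one wheel and one sdist are present."""
    found = find_artifacts(dist_dir)
    return len(found["wheel"]) == 1 and len(found["sdist"]) == 1
```

More than one wheel in `dist/` usually means an older build was not cleaned up, which is why the cleaning step comes first.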
48+
## Testing Locally
49+
50+
Install the built package locally:
51+
```bash
52+
pip install dist/contextify-X.Y.Z-py3-none-any.whl
53+
```
54+
55+
Test it:
56+
```python
57+
from libs.core.document_processor import DocumentProcessor
58+
59+
processor = DocumentProcessor()
60+
text = processor.extract_text("test.pdf")
61+
print(text)
62+
```
63+
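Before running the import test, it can help to confirm which version pip actually installed. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `contextify` is assumed from the wheel filename above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version, or None when the wheel is not installed.
    print(installed_version("contextify"))
```

Note that the lookup uses the distribution name (`contextify`), not the import name (`libs`).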
64+
## Publishing to Test PyPI (Recommended First)
65+
66+
1. Upload to Test PyPI:
67+
```bash
68+
twine upload --repository testpypi dist/*
69+
```
70+
71+
2. When prompted, authenticate with your Test PyPI API token:
72+
- Username: `__token__`
73+
- Password: Your Test PyPI API token
74+
75+
3. Install from Test PyPI:
76+
```bash
77+
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ contextify
78+
```
79+
80+
## Publishing to PyPI (Production)
81+
82+
1. Upload to PyPI:
83+
```bash
84+
twine upload dist/*
85+
```
86+
87+
2. When prompted, authenticate with your PyPI API token:
88+
- Username: `__token__`
89+
- Password: Your PyPI API token
90+
91+
3. Verify on PyPI:
92+
- Visit: https://pypi.org/project/contextify/
93+
94+
4. Install from PyPI:
95+
```bash
96+
pip install contextify
97+
```
98+
99+
## Using GitHub Actions (Automated Publishing)
100+
101+
Create `.github/workflows/publish.yml`:
102+
103+
```yaml
104+
name: Publish to PyPI
105+
106+
on:
107+
release:
108+
types: [published]
109+
110+
jobs:
111+
publish:
112+
runs-on: ubuntu-latest
113+
steps:
114+
- uses: actions/checkout@v4
115+
- uses: actions/setup-python@v5
116+
with:
117+
python-version: '3.11'
118+
- name: Install dependencies
119+
run: |
120+
python -m pip install --upgrade pip
121+
pip install build twine
122+
- name: Build package
123+
run: python -m build
124+
- name: Publish to PyPI
125+
env:
126+
TWINE_USERNAME: __token__
127+
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
128+
run: twine upload dist/*
129+
```
130+
131+
Add your PyPI API token to GitHub Secrets:
132+
- Go to repository Settings → Secrets and variables → Actions
133+
- Add secret: `PYPI_API_TOKEN`
134+
135+
## Version Bumping
136+
137+
Update version in `pyproject.toml`:
138+
```toml
139+
[project]
140+
version = "X.Y.Z"
141+
```
142+
143+
Follow [Semantic Versioning](https://semver.org/):
144+
- MAJOR: Breaking changes
145+
- MINOR: New features (backwards compatible)
146+
- PATCH: Bug fixes
147+
148+
## Checklist Before Publishing
149+
150+
- [ ] All tests pass
151+
- [ ] Version number updated in `pyproject.toml`
152+
- [ ] CHANGELOG.md updated
153+
- [ ] README.md is current
154+
- [ ] Build succeeds without warnings
155+
- [ ] Tested installation locally
156+
- [ ] Tested on Test PyPI
157+
- [ ] Git tag created: `git tag -a vX.Y.Z -m "Release X.Y.Z"`
158+
- [ ] Git tag pushed: `git push origin vX.Y.Z`
159+
160+
## Troubleshooting
161+
162+
### Import Error After Installation
163+
164+
The distribution is installed as `contextify`, but the importable top-level package is `libs`, so import from `libs`:
165+
```python
166+
# Correct
167+
from libs.core.document_processor import DocumentProcessor
168+
169+
# If you want simpler imports, add to libs/__init__.py:
170+
from libs.core.document_processor import DocumentProcessor
171+
__all__ = ['DocumentProcessor']
172+
173+
# Then you can use:
174+
from libs import DocumentProcessor
175+
```
176+
177+
### Build Fails
178+
179+
- Check all dependencies are in `pyproject.toml`
180+
- Ensure `__init__.py` files exist in all package directories
181+
- Verify no syntax errors
182+
183+
### Upload Fails
184+
185+
- Check credentials/token
186+
- Verify package name is available on PyPI
187+
- Ensure version number hasn't been used before
188+
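To check the last point before uploading, one option is to ask PyPI whether the release already exists. The sketch below assumes the per-release JSON endpoint `https://pypi.org/pypi/<project>/<version>/json`, which returns 404 for an unknown release; the helper names `release_url` and `version_is_published` are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def release_url(project: str, version: str) -> str:
    """Build the PyPI JSON URL for one release of a project."""
    return f"https://pypi.org/pypi/{project}/{version}/json"


def version_is_published(project: str, version: str) -> bool:
    """Return True if PyPI already has this release (requires network access)."""
    try:
        with urlopen(release_url(project, version), timeout=10):
            return True
    except HTTPError as error:
        if error.code == 404:
            # Unknown project or unknown version: nothing published yet.
            return False
        raise
```

Calling `version_is_published("contextify", "X.Y.Z")` with the version from `pyproject.toml` before `twine upload` avoids a rejected upload, since PyPI does not allow a version to be reused.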
189+
## References
190+
191+
- [Python Packaging Guide](https://packaging.python.org/)
192+
- [Twine Documentation](https://twine.readthedocs.io/)
193+
- [PyPI Help](https://pypi.org/help/)

CHANGELOG.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
# Changelog
2+
3+
All notable changes to this project will be documented in this file.
4+
5+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7+
8+
## [0.1.0] - 2026-01-19
9+
10+
### Added
11+
- Initial release of Contextify
12+
- Multi-format document support (PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, HWP, HWPX)
13+
- Intelligent text extraction with structure preservation
14+
- Table detection and extraction with HTML formatting
15+
- OCR integration (OpenAI, Anthropic, Google Gemini, vLLM)
16+
- Smart chunking with semantic awareness
17+
- Metadata extraction
18+
- Support for 20+ code file formats
19+
- Korean document support (HWP, HWPX)
20+
21+
### Features
22+
- `DocumentProcessor` class for easy document processing
23+
- Configurable chunk size and overlap
24+
- Protected regions for code blocks
25+
- Pluggable OCR engine architecture
26+
- Automatic encoding detection for text files
27+
- Chart and image extraction from Office documents
28+
29+
[0.1.0]: https://github.com/CocoRoF/Contextify/releases/tag/v0.1.0

CONTRIBUTING.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
1+
# Contributing to Contextify
2+
3+
Thank you for your interest in contributing to Contextify! This document provides guidelines and instructions for contributing.
4+
5+
## Development Setup
6+
7+
1. Clone the repository:
8+
```bash
9+
git clone https://github.com/CocoRoF/Contextify.git
10+
cd Contextify
11+
```
12+
13+
2. Create a virtual environment and install dependencies:
14+
```bash
15+
# Using uv (recommended)
16+
uv venv
17+
source .venv/bin/activate # On Windows: .venv\Scripts\activate
18+
uv pip install -e ".[dev]"
19+
20+
# Or using pip
21+
python -m venv .venv
22+
source .venv/bin/activate # On Windows: .venv\Scripts\activate
23+
pip install -e ".[dev]"
24+
```
25+
26+
3. Run tests:
27+
```bash
28+
python test_all_handlers.py
29+
```
30+
31+
## Code Style
32+
33+
- Follow PEP 8 guidelines
34+
- Use type hints where appropriate
35+
- Add docstrings for public functions and classes
36+
- Keep functions focused and modular
37+
38+
## Testing
39+
40+
- Add tests for new features
41+
- Ensure all tests pass before submitting a PR
42+
- Test with multiple document formats when applicable
43+
44+
## Pull Request Process
45+
46+
1. Fork the repository
47+
2. Create a new branch for your feature (`git checkout -b feature/amazing-feature`)
48+
3. Make your changes
49+
4. Run tests to ensure everything works
50+
5. Commit your changes (`git commit -m 'Add amazing feature'`)
51+
6. Push to your branch (`git push origin feature/amazing-feature`)
52+
7. Open a Pull Request
53+
54+
## Reporting Issues
55+
56+
When reporting issues, please include:
57+
- Python version
58+
- Operating system
59+
- Document format being processed
60+
- Minimal code to reproduce the issue
61+
- Error messages and stack traces
62+
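The first two items can be collected with the standard library. This is a minimal sketch; `environment_report` is a hypothetical helper, not a script in the repository.

```python
import platform


def environment_report() -> dict[str, str]:
    """Collect the Python and operating-system details for an issue report."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": f"{platform.system()} {platform.release()}",
    }


if __name__ == "__main__":
    # Print one "key: value" line per field, ready to paste into an issue.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```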
63+
## Feature Requests
64+
65+
We welcome feature requests! Please:
66+
- Check if the feature already exists or is planned
67+
- Provide a clear description of the feature
68+
- Explain the use case and benefits
69+
70+
## License
71+
72+
By contributing, you agree that your contributions will be licensed under the Apache License 2.0.
