Commit ba7b6dd

Merge pull request #3 from CocoRoF/main
upgrade
2 parents 5770f5a + 89d51c7

25 files changed, +2690 -456 lines changed
CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -5,6 +5,38 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.2] - 2026-01-20
+
+### Added
+- **BedrockOCR**: AWS Bedrock Vision model support for OCR processing
+  - Supports Claude 3.5 Sonnet and other Bedrock vision models
+  - Full AWS credential configuration (access key, secret key, session token, region)
+  - Configurable timeouts and retry settings
+- **ImageFileHandler**: New handler for standalone image files (jpg, png, gif, bmp, webp)
+  - Automatically uses OCR engine when available
+  - Returns image tag format when OCR is not configured for later processing
+- **PageTagProcessor**: Centralized page/slide/sheet tag processing system
+  - Unified tag generation across all document handlers
+  - Configurable tag prefixes and suffixes
+- **Image pattern support for OCR**: Custom image tag patterns now passed to OCR engine
+  - `ImageProcessor.get_pattern_string()` method for regex pattern generation
+  - `BaseOCR.set_image_pattern()` and `set_image_pattern_from_string()` methods
+  - OCR engines now recognize custom image tag formats
+
+### Changed
+- **DocumentProcessor**: OCR engine setter now invalidates handler registry for proper refresh
+- **Handler registry**: ImageFileHandler automatically registered with OCR engine support
+- **QUICKSTART.md**: Complete rewrite with comprehensive documentation
+  - 3-stage processing pipeline documentation (File → Text → OCR → Chunks)
+  - Detailed OCR configuration guide for all 5 engines
+  - Tag customization examples (image, page, slide, sheet)
+  - Complete API reference with all parameters
+
+### Improved
+- All Korean comments and docstrings in `img_processor.py` converted to English
+- Enhanced OCR integration with custom pattern matching support
+- Better separation of concerns with PageTagProcessor
+
 ## [0.1.0] - 2026-01-19
 
 ### Added
@@ -26,4 +58,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Automatic encoding detection for text files
 - Chart and image extraction from Office documents
 
+[0.1.2]: https://github.com/CocoRoF/Contextifier/compare/v0.1.0...v0.1.2
 [0.1.0]: https://github.com/CocoRoF/Contextifier/releases/tag/v0.1.0
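
The "Image pattern support for OCR" entry describes generating a regex from a custom image tag format so that an OCR engine can find those tags in extracted text. The diff does not show how `ImageProcessor.get_pattern_string()` or `BaseOCR.set_image_pattern()` are implemented, so the following is only a minimal sketch of the general idea. The function names `build_image_tag_pattern` and `find_image_paths`, and the default `[Image:` / `]` tag format, are hypothetical and are not taken from the Contextifier API.

```python
import re
from typing import List


def build_image_tag_pattern(prefix: str = "[Image:", suffix: str = "]") -> str:
    """Build a regex string matching <prefix><path><suffix>.

    Hypothetical helper: the prefix and suffix are escaped so that characters
    such as '[' or '.' are matched literally, and the image path is captured
    non-greedily in group 1.
    """
    return re.escape(prefix) + r"(.+?)" + re.escape(suffix)


def find_image_paths(text: str, pattern: str) -> List[str]:
    """Return every image path captured by the pattern, whitespace-trimmed."""
    return [match.strip() for match in re.findall(pattern, text)]


# A custom tag format, as a caller might configure it (assumed for illustration).
pattern = build_image_tag_pattern("<img src='", "'/>")
text = "Intro <img src='figures/a.png'/> middle <img src='figures/b.png'/> end"
paths = find_image_paths(text, pattern)
print(paths)
```

Escaping the prefix and suffix matters because tag delimiters often contain regex metacharacters; without `re.escape`, a prefix like `[Image:` would be read as the start of a character class.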
