3-Channel LLM Watermarking for Hangul Jamo Structure

This is a capstone project to suggest a Korean specific LLM watermarking strategy by @mirulili at Yonsei University (2025 Fall).

Structure

3Ch-Jamo-Watermark/
├─ src/
│  ├─ __init__.py
│  ├─ main.py                           # Execute watermark generation and detection pipeline
│  │
│  ├─ model/                            # Language model related modules
│  │  ├─ __init__.py
│  │  ├─ load_model.py                   # Load model and tokenizer
│  │  └─ generate.py                     # Text generation logic
│  │
│  ├─ watermark/                        # Core watermarking logic modules
│  │  ├─ __init__.py
│  │  ├─ jamo_utils.py                   # Hangul Jamo decomposition utility
│  │  ├─ payload_mgr.py                  # Manage message <-> bit sequence conversion
│  │  ├─ hash_policy.py                  # Jamo-based hash calculation policy
│  │  ├─ processor.py                    # JamoWatermarkProcessor (Watermark insertion)
│  │  └─ detector.py                     # JamoWatermarkDetector (Watermark detection)
│  │
│  └─ evaluation/                       # Performance evaluation related modules
│     ├─ __init__.py
│     ├─ eval_quality.py                 # Measure generation quality (PPL, etc.)
│     └─ eval_robustness.py              # Robustness testing
│
├─ .gitignore                          
├─ Makefile                            
├─ README.md                           
└─ requirements.txt

Execution Instructions

Install Dependencies:
```
make install
```
Run Program:
```
make run
```
This executes src/main.py, which performs the entire process of inserting the watermark to generate text and then restoring the message from the generated text.
Test Robustness:
```
make test_robustness
```

Core Operating Principle

Jamo Channel Separation: Decompose a Hangul syllable into three channels -- Choseong (initial consonant), Jungseong (medial vowel), and Jongseong (final consonant) -- and independently assign a watermark bit to each channel.
Parallel Channel Selection & Target Bit Matching: At each watermarking insertion step, one channel among three is randomly selected. Calculate a hash value from the Jamo indices of each token and check if this value matches the target bit for the current step.
Conditional Step Synchronization:
- Insertion (Processor): After applying a bias to the logits (probabilities), the watermark is considered inserted and moves to the next bit (step_t increments) only if the most likely candidate token matches the target bit and is actually selected.
- Detection (Detector): Read the tokens of the generated text sequentially, and extract the watermark and move to the next bit (step_t increments) only if the token's hash value matches the target bit to be found.
- This method allows the generator and detector to maintain synchronization by advancing the step according to the same rule, despite the uncertainty of sampling.

Preview of Detection Results

Watermarked Text

General Text

(Used MarkLLM Toolkit visualizer)

Full report can be found here.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
notes		notes
src		src
.gitignore		.gitignore
Hangul_Watermark_Poster.pdf		Hangul_Watermark_Poster.pdf
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
Report.pdf		Report.pdf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

3-Channel LLM Watermarking for Hangul Jamo Structure

Structure

Execution Instructions

Core Operating Principle

Preview of Detection Results

Watermarked Text

General Text

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

3-Channel LLM Watermarking for Hangul Jamo Structure

Structure

Execution Instructions

Core Operating Principle

Preview of Detection Results

Watermarked Text

General Text

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages