Note
These tools are based on the KanjiVG project.
This is an unofficial implementation for generating machine learning datasets from the KanjiVG project.
- Source Data: KanjiVG (Kanji Vector Graphics)
- Status: Unofficial, for research/personal use.
- Python >= 3.10
- Ubuntu
1. Clone the KanjiVG-ML repository:
git clone https://github.com/rishiyama/kanjivg-ML
cd kanjivg-ML
2. Clone the KanjiVG repository and initialize it:
git clone https://github.com/KanjiVG/kanjivg.git
# fix kanjivg/__init__.py to import kanjivg
bash scripts/init.sh
3. Install any required dependencies:
cairo:
pip install CairoSVG
apt install libcairo2
Optional:
If you get output like the following, you are ready to use the kanjivg and kanjivg-ML packages.
$ python example.py
Is 0x4E00 a kanji? True
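The check printed above can be approximated with a plain Unicode range test. The sketch below is not the code used by kanjivg or kanjivg-ML: `is_kanji` and `codepoint_from_filename` are hypothetical helpers, and it assumes that "kanji" means a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF) or Extension A (U+3400 to U+4DBF). It also relies on KanjiVG's file naming, where each SVG is named after its five-digit hexadecimal code point, optionally followed by a variant suffix.

```python
from pathlib import Path

# Assumed definition of "kanji": CJK Unified Ideographs plus Extension A.
# The real filter in kanjivg-ML may use different ranges.
KANJI_RANGES = ((0x4E00, 0x9FFF), (0x3400, 0x4DBF))


def is_kanji(codepoint: int) -> bool:
    """Return True if the code point falls in one of the assumed kanji ranges."""
    return any(lo <= codepoint <= hi for lo, hi in KANJI_RANGES)


def codepoint_from_filename(name: str) -> int:
    """Parse a KanjiVG-style file name such as '04e00.svg' or '04e00-Kaisho.svg'."""
    stem = Path(name).stem  # '04e00' or '04e00-Kaisho'
    return int(stem.split("-")[0], 16)  # drop any variant suffix, parse as hex


if __name__ == "__main__":
    cp = codepoint_from_filename("04e00.svg")
    print(f"Is 0x{cp:04X} a kanji? {is_kanji(cp)}")
```

A range test like this is cheap and needs no external data, but it would miss kanji in the rarer extension blocks, so treat the ranges as a starting point.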
Generate a dataset:
python run.py
You can also customize the PNG image parameters, such as width, height, and save directory, with the following command:
# same as default
python run.py --path ./kanjivg/kanji --width 256 --height 256 --save_dir ./output
Result
output
|-- kanji
|   |-- png
|   |-- png_white
|   `-- svg
`-- other
    |-- png
    |-- png_white
    `-- svg
- kanji: contains the images that the filter classifies as kanji.
- other: contains the images that the filter classifies as non-kanji.
- png: contains the images in PNG format with a transparent background.
- png_white: contains the images in PNG format with a white background.
- svg: contains the images in SVG format, using simplified SVG paths.
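To make the relationship between the command-line options and the output layout concrete, here is a minimal sketch. It is not the actual `run.py`: `build_parser` and `make_output_dirs` are hypothetical names, the flag names and default values are taken from the example command above, and the directory layout mirrors the tree shown in the result.

```python
import argparse
from pathlib import Path

# Layout taken from the result tree: two categories, three formats each.
CATEGORIES = ("kanji", "other")
FORMATS = ("png", "png_white", "svg")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags shown in the README (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Render KanjiVG SVGs to a dataset.")
    parser.add_argument("--path", type=Path, default=Path("./kanjivg/kanji"))
    parser.add_argument("--width", type=int, default=256)
    parser.add_argument("--height", type=int, default=256)
    parser.add_argument("--save_dir", type=Path, default=Path("./output"))
    return parser


def make_output_dirs(save_dir: Path) -> list[Path]:
    """Create <save_dir>/<category>/<format> for every combination."""
    dirs = [save_dir / c / f for c in CATEGORIES for f in FORMATS]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    args = build_parser().parse_args()
    for d in make_output_dirs(args.save_dir):
        print(d)
```

Running the sketch with no arguments creates the six leaf directories under `./output`. Passing `--save_dir` redirects the whole tree elsewhere.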
This project is heavily reliant on the fantastic work done by the KanjiVG project.