Segmenter for gettext (.po) translation files
Inserts a zero-width space (ZWSP, U+200B) between the segments of Chinese and Japanese text.
For Chinese, uses a high-quality zh_segmentation model from Google: https://tfhub.dev/google/zh_segmentation/1.
For Japanese, uses Sudachi.
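To illustrate what inserting ZWSP between segments means, here is a minimal sketch in Python. It assumes the text has already been split into segments by some segmenter; the helper names `join_segments` and `strip_separator` are hypothetical and are not taken from the tool's source.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break at that point


def join_segments(segments, separator=ZWSP):
    """Join already-segmented pieces of text with the separator.

    Empty segments are skipped so that no doubled separators appear.
    """
    return separator.join(s for s in segments if s)


def strip_separator(text, separator=ZWSP):
    """Remove the separator again, e.g. before re-segmenting a string."""
    return text.replace(separator, "")


# Hypothetical segmenter output for a short Chinese string.
segments = ["返回", "到", "层"]

segmented = join_segments(segments)
visible = join_segments(segments, separator="|")  # visible separator for debugging
print(visible)
```

Removing any existing separators before segmenting again keeps repeated runs from accumulating extra ZWSP characters.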
Pre-requisites
- Python. The easiest way to install Python on any Linux system is with asdf: https://github.com/asdf-vm/asdf.
- Packages:
  pip install -r tools/segmenter/requirements.txt
Usage
To re-segment all the translation files:
tools/segmenter/segment_all.py
To re-segment the Chinese translation files:
tools/segmenter/segment_zh.py --input_path Translations/zh_CN.po
tools/segmenter/segment_zh.py --input_path Translations/zh_TW.po
To re-segment the Japanese translation files:
tools/segmenter/segment_ja.py --input_path Translations/ja.po
Additionally, you can pass a different separator, such as --separator='|', for debugging.
The tool performs a number of replacements so that interpolations and similar constructs are not affected by segmentation.
You can also see the segmenter output for a given string like this:
tools/segmenter/segment_zh.py --debug '返回到 {:d} 层'
返回|到| |{|:d}| |层
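The replacements that protect interpolations are not spelled out in this README. As a rough sketch of one possible approach (not necessarily what the tool does), the code below segments only the text around `{...}` placeholders and passes the placeholders through untouched. The `PLACEHOLDER` pattern, the function `segment_preserving_placeholders`, and the per-character `toy_segment` stand-in are all assumptions made for illustration; a real run would call the Chinese or Japanese segmenter instead.

```python
import re

# Assumption: interpolations look like "{...}" format fields, e.g. "{:d}" or "{}".
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def segment_preserving_placeholders(text, segment, separator="\u200b"):
    """Segment the text around placeholders, keeping each placeholder whole.

    `segment` is any callable that maps a string to a list of segments.
    """
    pieces = []
    pos = 0
    for match in PLACEHOLDER.finditer(text):
        if match.start() > pos:
            # Ordinary text before the placeholder goes through the segmenter.
            pieces.extend(segment(text[pos:match.start()]))
        # The placeholder itself is appended as a single, unsplit piece.
        pieces.append(match.group(0))
        pos = match.end()
    if pos < len(text):
        pieces.extend(segment(text[pos:]))
    return separator.join(p for p in pieces if p)


def toy_segment(chunk):
    """Stand-in segmenter that treats every character as its own segment."""
    return list(chunk)


result = segment_preserving_placeholders("返回到 {:d} 层", toy_segment, separator="|")
print(result)
```

With this approach the placeholder is never handed to the segmenter, so it cannot be split in the middle.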