Python 3.x:
python --version
# Python 3.13.2
Edit the following files to specify the values you want to override:
kanji_parts.json
kanji_vocab.json
keywords.json
vocab_furigana.json
vocab_meaning.json
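The README does not describe the structure of these override files. Purely as an illustration, assuming each one is a single JSON object whose keys (for example, a kanji or a word) map to replacement values, overriding means letting the entry from the override file win over the generated one. The helper name `apply_overrides` and the assumed file layout are hypothetical, not the project's actual code.

```python
import json
from pathlib import Path


def apply_overrides(base: dict, override_path: Path) -> dict:
    """Return a copy of `base` in which entries from the override file take precedence.

    Hypothetical sketch: assumes the override file is one JSON object
    (key -> value). The project's real override format may differ.
    """
    with override_path.open(encoding="utf-8") as f:
        overrides = json.load(f)
    merged = dict(base)       # shallow copy, so `base` is left untouched
    merged.update(overrides)  # override values replace generated ones key by key
    return merged
```

This is a shallow merge: an override replaces the whole value for its key rather than merging nested fields.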
Download aggregated kanji information from Kanji Data Releases
curl --output-dir input -OL https://github.com/PikaPikaGems/kanji-data-releases/releases/latest/download/kanji-data.tar.gz
tar -xzf ./input/kanji-data.tar.gz -C ./input/
Download the map of vocabulary to its components from JMdict Furigana Map
curl --output-dir input -OL https://github.com/PikaPikaGems/jmdict-furigana-map/releases/latest/download/jmdict-furigana-map.json.tar.gz
tar -xzf ./input/jmdict-furigana-map.json.tar.gz -C ./input/
Download and prepare the Simplified JMdict JSON file from Jmdict Simplified
# If you want all words
curl --output-dir input -OL https://github.com/scriptin/jmdict-simplified/releases/download/3.6.1%2B20250324123350/jmdict-eng-3.6.1+20250324123350.json.tgz
tar -xzf ./input/jmdict-eng-3.6.1+20250324123350.json.tgz -C ./input/
mv input/jmdict-eng-3.6.1.json input/scriptin-jmdict-eng.json
# If you want common words only
curl --output-dir input -OL https://github.com/scriptin/jmdict-simplified/releases/download/3.6.1%2B20250324123350/jmdict-eng-common-3.6.1+20250324123350.json.tgz
tar -xzf ./input/jmdict-eng-common-3.6.1+20250324123350.json.tgz -C ./input/
mv input/jmdict-eng-common-3.6.1.json input/scriptin-jmdict-eng.json
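If `curl` or `tar` is unavailable, the same download-and-unpack step can be approximated with the Python standard library. This is a minimal sketch, not part of the repository; the function name `fetch_and_extract` is hypothetical. It keeps the remote file name (like `curl -O`), follows redirects (like `curl -L`, which the GitHub `latest/download` links need), and unpacks into the same directory (like `tar -xzf ... -C`).

```python
import shutil
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, dest: Path) -> Path:
    """Download `url` into `dest`, unpack it there, and return the archive path.

    Hypothetical helper mirroring `curl --output-dir dest -OL url`
    followed by `tar -xzf archive -C dest`.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # Keep the remote file name, as `curl -O` does.
    archive = dest / Path(urllib.parse.urlparse(url).path).name
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(url) as resp, archive.open("wb") as out:
        shutil.copyfileobj(resp, out)
    # Mode "r:*" detects gzip compression for both .tar.gz and .tgz files.
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # reject unsafe member paths
        else:
            tar.extractall(dest)  # older Pythons without extraction filters
    return archive
```

After fetching the JMdict archive this way, the rename step is `(Path("input") / "jmdict-eng-3.6.1.json").rename(Path("input") / "scriptin-jmdict-eng.json")` (or the `jmdict-eng-common-3.6.1.json` name for the common-words variant).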
Remove the downloaded archives, which are no longer needed, to reduce clutter:
rm ./input/kanji-data.tar.gz
rm ./input/jmdict-furigana-map.json.tar.gz
# Remove whichever of these you downloaded
rm ./input/jmdict-eng-common-3.6.1+20250324123350.json.tgz
rm ./input/jmdict-eng-3.6.1+20250324123350.json.tgz
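Because only one of the two JMdict archives is present, a pattern-based cleanup avoids having to pick the right `rm` line. This sketch assumes the only `.tar.gz` and `.tgz` files in `input` are the downloads listed above; the helper name `remove_archives` is hypothetical.

```python
from pathlib import Path


def remove_archives(input_dir: Path) -> list[Path]:
    """Delete every .tar.gz and .tgz archive in `input_dir` and return the removed paths.

    Assumes no archive in `input_dir` needs to be kept.
    """
    removed = []
    for pattern in ("*.tar.gz", "*.tgz"):
        for archive in sorted(input_dir.glob(pattern)):
            archive.unlink()
            removed.append(archive)
    return removed
```

Extracted `.json` files are untouched, so the directory ends up with only the files listed below.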
This leaves the input directory with the following files:
cum_use.json
jmdict-furigana-map.json # From: JMdict Furigana Map
kanji_vocab.json
merged_kanji.json
missing_components.json
phonetic_components.json
scriptin-jmdict-eng.json # From: Jmdict Simplified
vocab_furigana.json
vocab_meaning.json
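Before running the build script, it can help to confirm that every file in this list is actually present. This is a minimal sketch under the assumption that the build reads exactly these names from `input`; `EXPECTED_INPUTS` and `missing_inputs` are hypothetical names, and the build script may already perform its own checks.

```python
from pathlib import Path

# File names copied from the input directory listing above.
EXPECTED_INPUTS = [
    "cum_use.json",
    "jmdict-furigana-map.json",
    "kanji_vocab.json",
    "merged_kanji.json",
    "missing_components.json",
    "phonetic_components.json",
    "scriptin-jmdict-eng.json",
    "vocab_furigana.json",
    "vocab_meaning.json",
]


def missing_inputs(input_dir: Path) -> list[str]:
    """Return the expected input file names that are not regular files in `input_dir`."""
    return [name for name in EXPECTED_INPUTS if not (input_dir / name).is_file()]
```

A non-empty result (for example, `scriptin-jmdict-eng.json` when the `mv` step was skipped) points at the download step to repeat.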
./src/kanji_build_output_jsons.py
The following output files should be generated in the output directory:
- component_keyword.json
- cum_use.json
- kanji_extended.json
- kanji_main.json
- phonetic.json
- vocabulary_meaning.json
- vocabulary_furigana.json
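A quick way to check a build is to confirm that each listed output file exists and parses as JSON. This sketch only checks well-formedness, not the content; the helper name `check_outputs` is hypothetical, and the output directory path is passed in rather than assumed.

```python
import json
from pathlib import Path

# File names copied from the output list above.
EXPECTED_OUTPUTS = [
    "component_keyword.json",
    "cum_use.json",
    "kanji_extended.json",
    "kanji_main.json",
    "phonetic.json",
    "vocabulary_meaning.json",
    "vocabulary_furigana.json",
]


def check_outputs(output_dir: Path) -> dict[str, str]:
    """Map each expected output file name to "ok", "missing", or an "unreadable: ..." message."""
    report = {}
    for name in EXPECTED_OUTPUTS:
        path = output_dir / name
        if not path.is_file():
            report[name] = "missing"
            continue
        try:
            with path.open(encoding="utf-8") as f:
                json.load(f)
            report[name] = "ok"
        except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
            report[name] = f"unreadable: {exc}"
    return report
```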
Additionally, running the script above creates the following file in the input directory. It is not part of the release file:
jmdict-vocab-meaning.json
./src/kanji_inspect.py
See RELEASE.md.
The software is distributed under the MIT License.
The input data comes from:
- Dmitry Shpika's jmdict-simplified, a project that uses the JMdict/EDICT file, which is the property of the Electronic Dictionary Research and Development Group (https://www.edrdg.org/) and is used in conformance with the Group's license.
- Kanji Data Releases and JMdict Furigana Map, both under CC BY-SA 4.0.
The original XML files (JMdict.xml, JMdict_e.xml, JMdict_e_examp.xml, and JMnedict.xml) are the property of the Electronic Dictionary Research and Development Group and are used in conformance with the Group's license. All derived files are distributed under the same license, as the original license requires.