dictzip-decompile

This is an experimental tool that converts a dictzip into a dictfile. The output may not be perfect for complex dictionaries, but it should be perfect for dictionaries generated by Penelope.

Usage

Usage: dictzip-decompile [options] dictzip

Options:
  -o, --output string   The output filename (will be overwritten if it exists) (- is stdout) (default "./decompiled.df")
  -r, --resources       Also extract referenced resources to the current directory (warning: any existing files will be overwritten, so it is recommended to run in an empty directory if enabled)
  -h, --help            Show this help text

Arguments:
  dictzip is the path to the dictzip to decompile.

To convert the resulting dictfile into a dictzip, use dictgen.

Note: A dictzip regenerated from the dictfile may not match the original exactly, but it will look the same. The conversion process also implicitly fixes certain bugs with prefixes and variants (e.g. variants in the wrong file, incorrect prefixes, or words missing from the words index). All entry content is output as raw HTML, not Markdown.


Example uses

Fixing prefixes or missing variants in dictzips generated by other tools (recompiling the dictfile will automatically fix the prefixes and variants).
Upgrading a v1 dictzip to v2 (same as above).
Decompiling a dictzip to merge it with another.
Converting a previously-created dictzip to a dictfile to make it easier to improve.
Converting StarDict dictionaries: first convert them to a dictzip with Penelope, then to a dictfile with this tool.

Notes

The following dictzip generators have enhanced decompilation support:

Penelope: The output should be perfect.
Kobo (en, a few others): The output should be nearly perfect, although a few edge cases are not handled. Variants (&) and header info (:) are extracted in addition to the entry content.
Kobo (fr): The output should be nearly perfect, although a few edge cases are not handled. Variants (&) and header info (:) are extracted in addition to the entry content.
dictgen: The output should be very close to the original dictfile (this has been tested with the output of gotdict-convert and webster1913-convert). With gotdict-convert, recompiling the decompiled dictfile produced a dictzip that differed from the original only in the casing of a few entries in the words index. Even so, this should not be used unless the original dictfile has been lost. In addition, the original Markdown source and images are not recovered. Variants (&) and header info (: / ::) are extracted in addition to the entry content.

For other dictzips, only the headword (@) and variants (&) are extracted. The content is included as-is as raw HTML, without support for other dictfile features.