Add some sort of metadata to the extracted dumps. #1601

@raldone01

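# Source URL of the raw dump and a local filename that records the dump date.
url="https://kaikki.org/dictionary/raw-wiktextract-data.jsonl.gz"
output_file="20260201_raw-wiktextract-data.jsonl"

# Download (fail on HTTP errors, follow redirects) and decompress on the fly.
curl -fL "$url" \
  | gzip -dc \
  > "$output_file"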

This is how I currently download the compressed dump.
At the moment it is based on the 20260201 Wiktionary dump, but the upstream filename gives no indication of which dump you are getting.
I also couldn't find any embedded data that says which version of Wiktionary the dump is based on.
Could you add a first line of JSON with metadata, or simply offer an index of, say, the last 3 extracts (compressed would be enough) with proper filenames?
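
To illustrate the first option: if the dump started with a metadata line, consumers could split it off with standard tools. This is only a sketch under that assumption. No current dump has such a line, and the helper names `read_dump_metadata` and `read_dump_entries` are made up for this example.

```shell
# Hypothetical layout: line 1 is a JSON metadata object, every later line is
# a regular entry. The current dumps do NOT contain such a line.

# Print only the (hypothetical) metadata line of a dump file.
read_dump_metadata() {
  head -n 1 "$1"
}

# Print the regular entries, skipping the (hypothetical) metadata line.
read_dump_entries() {
  tail -n +2 "$1"
}
```

Existing consumers would have to skip the first line, which is the main cost of this option compared to a separate index with dated filenames.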

For now I verify the SHA-256 checksum to make sure I got the file I want.
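
Roughly, the check looks like the sketch below. It assumes GNU coreutils `sha256sum` is installed. The function name `verify_sha256` is just for illustration, and the expected digest has to come from an earlier download, since the site does not publish one that I know of.

```shell
# Illustrative helper: compare a file's SHA-256 digest with an expected hex digest.
# Assumes GNU coreutils `sha256sum` is available.
verify_sha256() {
  local file="$1" expected="$2" actual
  # Reading from stdin makes sha256sum print "<digest>  -"; keep only the digest.
  actual="$(sha256sum < "$file" | cut -d ' ' -f 1)"
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  echo "checksum mismatch for $file: got $actual" >&2
  return 1
}
```

This is called as `verify_sha256 "$output_file" "$expected_digest"`. It only tells me whether the file changed, not which Wiktionary dump it corresponds to, which is why some metadata would help.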

Related to: #1264
