Add some sort of metadata to the extracted dumps.

```bash
url="https://kaikki.org/dictionary/raw-wiktextract-data.jsonl.gz"
output_file="20260201_raw-wiktextract-data.jsonl"

curl -fL "$url" \
  | gzip -dc \
  > "$output_file"
```

This is how I currently download the compressed dump.
Right now it is based on the 20260201 dump of Wiktionary, but there is no way to tell what you get from the filename of the download.
I also didn't find any embedded data that says which version of Wiktionary the dump is based on.
Maybe add a first line of JSON with metadata, or simply offer an index with, say, the last 3 extracts (compressed would be enough) with proper filenames.

For now, I verify the SHA-256 checksum to check that I got the file I want.
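
For reference, the checksum check looks roughly like the sketch below. `verify_sha256` is just a hypothetical helper name for illustration, it assumes GNU coreutils `sha256sum` is available, and the expected digest has to come from a download I have already inspected by hand, since I did not find a published checksum tied to a dump date.

```bash
# Hypothetical helper: succeed only if the file's SHA-256 digest
# matches the expected hex digest passed as the second argument.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d' ' -f1)" || return 1
  [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder, not a real checksum):
# verify_sha256 "20260201_raw-wiktextract-data.jsonl" "<expected sha256 hex digest>" \
#   || echo "checksum mismatch: not the dump I expected" >&2
```

This only tells me that the file is identical to one I have seen before, not which Wiktionary dump it was built from, which is why some embedded metadata or dated filenames would be more useful.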

Related to: https://github.com/tatuylonen/wiktextract/issues/1264
