Converting string that contains Japanese text breaks on Windows  #77

@keychera

I'm using bootleg both via babashka v1.2.174 and by running the jar file (Java 11.0.13) on Windows 10 Home Single Language.

Here is a reproducible example via babashka:

(ns user)

(require '[babashka.pods :as pods])
(pods/load-pod 'retrogradeorbit/bootleg "0.1.9")
(require '[pod.retrogradeorbit.bootleg.utils :as utils])
(require '[pod.retrogradeorbit.hickory.select :as s])

(let [jp-html "<div>読</div>"]
  (spit "test-jp-str.txt" jp-html)
  (spit "test-jp-converted.txt"
        (utils/convert-to jp-html :hickory)))

I also tried running this with the jar via the java -jar command, redirecting stdout to converted.txt:

(let [jp-html "<div>読</div>"]
  (convert-to jp-html :hickory))

Both break the string 読, but with different results:

  • via babashka, it turns into {:type :element, :attrs nil, :tag :div, :content ["読"]}
  • via java -jar, it turns into {:type :element, :attrs nil, :tag :div, :content ["��"]}
    (not quite sure if the characters will be correctly shown here)

Note: I tried this on macOS and there is no conversion problem.
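
Since the problem shows up on Windows but not on macOS, one thing worth checking is the JVM's default charset on each machine. The snippet below is only a diagnostic sketch, and it assumes (this is not confirmed above) that a non-UTF-8 platform default is involved. It uses plain Java interop, no bootleg calls, and writes 読 as the escape \u8AAD so that the encoding of the script file itself cannot affect the result. The var names are illustrative.

```clojure
(ns user
  (:import (java.nio.charset Charset StandardCharsets)))

;; Assumption: the Windows/macOS difference comes from the JVM's default
;; charset. This only reports what the platform is doing; it is not a fix.
(println "default charset:" (str (Charset/defaultCharset)))
(println "file.encoding:  " (System/getProperty "file.encoding"))

;; \u8AAD is 読, written as an escape so source-file encoding does not matter.
(def jp-html "<div>\u8AAD</div>")

;; Round trip through explicit UTF-8 bytes: lossless on any platform.
(def utf8-round-trip
  (String. (.getBytes jp-html StandardCharsets/UTF_8) StandardCharsets/UTF_8))

;; Round trip through the platform default charset: lossy whenever that
;; charset cannot represent 読.
(def default-round-trip
  (String. (.getBytes jp-html (Charset/defaultCharset)) (Charset/defaultCharset)))

(println "UTF-8 round trip intact?  " (= jp-html utf8-round-trip))
(println "default round trip intact?" (= jp-html default-round-trip))
```

If the second check prints false on Windows, it may be worth re-running the jar with the standard JVM property set, for example java -Dfile.encoding=UTF-8 -jar followed by the same arguments as before, to see whether the output changes. Whether that also affects the babashka pod path is not something I can tell from here.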
