Commit fa122a3

Fixed error with non-exact unicode escapes and only whitespace files (#8)
* fixed errors with unicode escapes and only whitespace files
* updated README and rockspec
1 parent 4e0ac22 commit fa122a3

4 files changed

Lines changed: 35 additions & 28 deletions

README.md

Lines changed: 5 additions & 6 deletions
@@ -1,19 +1,18 @@
 # tinytoml
 [![Run Tests and Code Coverage](https://github.com/FourierTransformer/tinytoml/actions/workflows/test-and-coverage.yml/badge.svg)](https://github.com/FourierTransformer/tinytoml/actions/workflows/test-and-coverage.yml) [![Coverage Status](https://coveralls.io/repos/github/FourierTransformer/tinytoml/badge.svg?branch=refs/pull/1/merge)](https://coveralls.io/github/FourierTransformer/tinytoml?branch=main)

-tinytoml is a pure Lua [TOML](https://toml.io) parsing library. It's written in [Teal](https://github.com/teal-language/tl) and works with Lua 5.1-5.4 and LuaJIT 2.0/2.1. tinytoml parses a TOML document into a standard Lua table using default Lua types. Since TOML supports various datetime types, those are _defaultly_ represented by strings, but can be configured to use a custom type if desired.
+tinytoml is a pure Lua [TOML](https://toml.io) parsing library. It's written in [Teal](https://github.com/teal-language/tl) and works with Lua 5.1-5.4 and LuaJIT 2.0/2.1. tinytoml parses a TOML document into a standard Lua table using default Lua types. Since TOML supports various datetime types, those are by default represented by strings, but can be configured to use a custom type if desired.

-tinytoml passes all the [toml-test](https://github.com/toml-lang/toml-test) use cases that Lua can realistically pass (even the UTF-8 ones!). The few that fail are mostly representational:
+tinytoml passes all the [toml-test](https://github.com/toml-lang/toml-test) [use cases](https://toml-lang.github.io/toml-test-matrix/) that Lua can realistically pass (even the UTF-8 ones!). The few that fail are mostly representational:
 - Lua doesn't differentiate between an array or a dictionary, so tests involving _empty_ arrays fail.
-- Some Lua versions have differences in how numbers are represented
+- Some Lua versions have differences in how numbers are represented. Lua 5.3 introduced integers, so tests involving integer representation pass on newer versions.
 - tinytoml currently support trailing commas in arrays/inline-tables. This is coming in TOML 1.1.0.

 Current Supported TOML Version: 1.0.0

-## Implemented and Missing Features
-- TOML Parsing in Pure Lua, just grab the tinytoml.lua file and go!
-- Does not keep track of comments
+## Missing Features
 - Cannot encode a table to TOML
+- Does not keep track of comments

 ## Installing
 You can grab the `tinytoml.lua` file from this repo (or the `tinytoml.tl` file if using Teal) or install it via LuaRocks
Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 package = "tinytoml"
-version = "0.0.2-1"
+version = "0.0.3-1"

 source = {
     url = "git://github.com/FourierTransformer/tinytoml.git",
-    tag = "0.0.2"
+    tag = "0.0.3"
 }

 description = {

tinytoml.lua

Lines changed: 14 additions & 10 deletions
@@ -297,18 +297,18 @@ local function handle_backslash_escape(sm)
     end


-    sm._, sm.end_seq, sm.match, sm.ext = sm.input:find("^([uU])([0-9a-fA-F]+)", sm.i + 1)
-    if sm.match then

-        if (sm.match == "u" and #sm.ext == 4) or
-            (sm.match == "U" and #sm.ext == 8) then
-            local codepoint_to_insert = _utf8char(tonumber(sm.ext, 16))
-            if not validate_utf8(codepoint_to_insert) then
-                _error(sm, "Escaped UTF-8 sequence not valid UTF-8 character: \\" .. sm.match .. sm.ext, "string")
-            end
-            sm.i = sm.end_seq
-            return codepoint_to_insert, false
+    sm._, sm.end_seq, sm.match, sm.ext = sm.input:find("^(u)([0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])", sm.i + 1)
+    if not sm.match then
+        sm._, sm.end_seq, sm.match, sm.ext = sm.input:find("^(U)([0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])", sm.i + 1)
+    end
+    if sm.match then
+        local codepoint_to_insert = _utf8char(tonumber(sm.ext, 16))
+        if not validate_utf8(codepoint_to_insert) then
+            _error(sm, "Escaped UTF-8 sequence not valid UTF-8 character: \\" .. sm.match .. sm.ext, "string")
         end
+        sm.i = sm.end_seq
+        return codepoint_to_insert, false
     end

     return nil
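
To see why the pattern change in this hunk matters, here is a minimal standalone sketch that compares the old greedy match with the fixed-length one. It is not the library's code: `match_old`, `match_new`, and `HEX` are hypothetical names introduced only for illustration, and the sample input assumes an escape that is immediately followed by text that happens to look like hex digits.

```lua
-- Standalone sketch; match_old, match_new and HEX are illustrative names,
-- not tinytoml functions.
local HEX = "[0-9a-fA-F]"

-- Old approach: take every hex digit after u/U, then check the length.
local function match_old(input, i)
    local _, end_seq, kind, digits = input:find("^([uU])([0-9a-fA-F]+)", i)
    if kind and ((kind == "u" and #digits == 4) or (kind == "U" and #digits == 8)) then
        return kind, digits, end_seq
    end
    return nil
end

-- New approach: exactly 4 digits after "u", otherwise exactly 8 after "U".
local function match_new(input, i)
    local _, end_seq, kind, digits = input:find("^(u)(" .. HEX:rep(4) .. ")", i)
    if not kind then
        _, end_seq, kind, digits = input:find("^(U)(" .. HEX:rep(8) .. ")", i)
    end
    if kind then
        return kind, digits, end_seq
    end
    return nil
end

-- The escape \u00e9 followed by the literal text "cafe":
-- everything after the backslash is "u00e9cafe".
local after_backslash = "u00e9cafe"

-- Greedy match captures 8 digits for a lowercase "u", so the length check fails.
print(match_old(after_backslash, 1))   -- nil

-- Fixed-length match stops after 4 digits and leaves "cafe" for the caller.
print(match_new(after_backslash, 1))   -- u    00e9    5
```

The second return position (5 here) plays the role of `sm.end_seq` in the diff: parsing resumes right after the escape, so the trailing `cafe` is treated as ordinary string content.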
@@ -1084,6 +1084,10 @@ function tinytoml.parse(filename, options)
     local dynamic_next_mode = "start_of_line"
     local transition = nil
     sm._, sm.i = sm.input:find("[^ \t]", sm.i)
+
+
+    if not sm.i then return {} end
+
     while sm.i <= sm.input_length do
         sm.byte = sbyte(sm.input, sm.i)

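
The early return added in this hunk guards against `string.find` returning `nil`. The following is a small standalone sketch of that failure mode, assuming input made up only of spaces and tabs; `first_non_blank` is a hypothetical helper written for this example and does not exist in tinytoml.

```lua
-- Standalone sketch; first_non_blank is an illustrative helper, not part of tinytoml.
local function first_non_blank(input)
    -- string.find returns nil when nothing matches the pattern
    local _, i = input:find("[^ \t]", 1)
    return i
end

print(first_non_blank("  key = 1"))  -- 3
print(first_non_blank(" \t  "))      -- nil

-- Without a guard, comparing the nil index against a length raises an error,
-- which is what a loop condition like `i <= input_length` would do.
local ok = pcall(function()
    local i = first_non_blank(" \t  ")
    return i <= 4
end)
print(ok)  -- false
```

Returning an empty table before the loop, as the commit does, means such input parses to an empty document instead of raising a comparison error.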

tinytoml.tl

Lines changed: 14 additions & 10 deletions
@@ -297,18 +297,18 @@ local function handle_backslash_escape(sm: StateMachine): string, boolean
     end

     -- unicode escape sequences
-    sm._, sm.end_seq, sm.match, sm.ext = sm.input:find("^([uU])([0-9a-fA-F]+)", sm.i+1) as (integer, integer, string, string)
+    -- hex escapes coming in toml 1.1.0, will need to update
+    sm._, sm.end_seq, sm.match, sm.ext = sm.input:find("^(u)([0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])", sm.i+1) as (integer, integer, string)
+    if not sm.match then
+        sm._, sm.end_seq, sm.match, sm.ext = sm.input:find("^(U)([0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])", sm.i+1) as (integer, integer, string, string)
+    end
     if sm.match then
-        --if (sm.match == "x" and #sm.ext == 2) or -- hex escapes coming in toml 1.1.0, will need to update pattern in :find above as well
-        if (sm.match == "u" and #sm.ext == 4) or
-            (sm.match == "U" and #sm.ext == 8) then
-            local codepoint_to_insert = _utf8char(tonumber(sm.ext, 16))
-            if not validate_utf8(codepoint_to_insert) then
-                _error(sm, "Escaped UTF-8 sequence not valid UTF-8 character: \\" .. sm.match .. sm.ext, "string")
-            end
-            sm.i = sm.end_seq
-            return codepoint_to_insert, false
+        local codepoint_to_insert = _utf8char(tonumber(sm.ext, 16))
+        if not validate_utf8(codepoint_to_insert) then
+            _error(sm, "Escaped UTF-8 sequence not valid UTF-8 character: \\" .. sm.match .. sm.ext, "string")
         end
+        sm.i = sm.end_seq
+        return codepoint_to_insert, false
     end

     return nil
@@ -1084,6 +1084,10 @@ function tinytoml.parse(filename: string, options?: TinyTomlOptions): {string:an
     local dynamic_next_mode: states = "start_of_line"
     local transition: {function, string} = nil
     sm._, sm.i = sm.input:find("[^ \t]", sm.i)
+
+    -- just an file with whitespace and nothing else...
+    if not sm.i then return {} end
+
     while sm.i <= sm.input_length do
         sm.byte = sbyte(sm.input, sm.i)

