Parsek is a parser combinator library for, and written in, Kotlin. It is based on JParsec and Haskell's Parsec. It lets you build a text (or token) parser from easy-to-combine building blocks. It is designed to be:
- Declarative: Define parsers in a readable and composable way.
- Flexible: Parse both text and token streams.
- Lightweight: Minimal dependencies and easy to integrate into Kotlin projects.

Contents:
- What is Parsek?
- What can Parsek be used for?
- Installation
- Quick Start Guide
- Core Concepts
- Examples
- Error Handling
- Performance Notes
- Related Projects
- Contribution Guidelines
- License
- Links & Documentation
Parsek is versatile and can be used for a variety of parsing tasks, such as:
- Parsing Configuration Files: JSON, YAML, or custom formats.
- Building DSLs: Create interpreters or compilers for domain-specific languages.
- Processing Structured Data: Parse CSV, logs, or other structured text.
- Advent of Code Challenges: Quickly write parsers for puzzle inputs.
- Language Parsing: Build lexers and parsers for programming languages.
Add Parsek to your project using Gradle or Maven.

Gradle:

    dependencies {
        implementation 'nl.w8mr.parsek:core:<latest-version>'
    }

Maven:

    <dependency>
        <groupId>nl.w8mr.parsek</groupId>
        <artifactId>core</artifactId>
        <version><!-- latest-version --></version>
    </dependency>

Replace <latest-version> with the version shown in the badge above.
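If your build uses the Gradle Kotlin DSL (build.gradle.kts) rather than the Groovy DSL, the same coordinates would be declared as sketched below. The version is still a placeholder to fill in yourself.

```kotlin
// build.gradle.kts (Gradle Kotlin DSL); same coordinates as the Groovy snippet
dependencies {
    implementation("nl.w8mr.parsek:core:<latest-version>")
}
```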
Here's a minimal example to get you started:
    import nl.w8mr.parsek.text.*

    val parser = number // Parses a sequence of digits as an Int
    val result = parser("123abc") // result: 123

You can also combine parsers:

    val signed = signedNumber
    println(signed("-42")) // Output: -42
    println(signed("17")) // Output: 17

At its core, Parsek operates on the concept of a Parser. A Parser is a function that takes an input, consumes part of it, and returns a result along with the remaining input. The result is either a success or a failure.
    interface Parser<Token, R> {
        fun apply(context: Context<Token>): Pair<Result<R>, Context<Token>>

        sealed class Result<out R>(open val subResults: List<Result<*>> = emptyList())
        data class Success<R>(val value: R, override val subResults: List<Result<*>> = emptyList()) : Result<R>(subResults)
        data class Failure(val error: Any, override val subResults: List<Result<*>> = emptyList()) : Result<Nothing>(subResults)
    }

While the core library is generic, the nl.w8mr.parsek.text package provides utilities specifically for parsing CharSequence (e.g., String). These text-specific parsers simplify common tasks like matching characters, strings, or patterns.
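To make the generic idea concrete, here is a minimal stand-alone sketch of a parser as a function that returns an outcome plus the position of the remaining input. It does not use Parsek's classes: Outcome, Ok, Err, SketchParser and satisfy are hypothetical names chosen for illustration, and the input is simplified to a String with an index instead of a Context.

```kotlin
// Hypothetical names throughout; this is not Parsek's implementation.

// Outcome of one parse attempt: either a value or an error description.
sealed interface Outcome<out R>
data class Ok<R>(val value: R) : Outcome<R>
data class Err(val error: String) : Outcome<Nothing>

// A parser reads `input` starting at `pos` and returns the outcome together
// with the position where the remaining input begins.
typealias SketchParser<R> = (input: String, pos: Int) -> Pair<Outcome<R>, Int>

// Generic single-character parser: succeeds when the next character satisfies `test`.
fun satisfy(expected: String, test: (Char) -> Boolean): SketchParser<Char> = { input, pos ->
    val c = input.getOrNull(pos)
    if (c != null && test(c)) Ok(c) to pos + 1
    else Err("Expected $expected at position $pos") to pos
}

fun main() {
    val digit = satisfy("a digit") { it.isDigit() }
    println(digit("5abc", 0)) // (Ok(value=5), 1)
    println(digit("abc", 0))  // (Err(error=Expected a digit at position 0), 0)
}
```

On failure the sketch returns the original position, so a caller can try an alternative parser from the same place.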
For example, instead of writing a generic parser for a specific character, you can use the char function from the text package:
    val digit = char { it.isDigit() }

- String Output: All text parsers produce output as String. This ensures consistency when working with text-based data.
- Automatic Concatenation: When a parser produces a list of strings (e.g., from a repeat or sepBy combinator), the result is automatically concatenated into a single string.
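The concatenation behaviour can be pictured with a small stand-alone sketch. The names TextStep, charWhere and repeatConcat are hypothetical and only illustrate the idea; they are not Parsek's implementation.

```kotlin
// Hypothetical helper names; not Parsek's API.

// A text step returns the matched text and the next position, or null on failure.
typealias TextStep = (input: String, pos: Int) -> Pair<String, Int>?

// Matches one character satisfying `test` and yields it as a one-character String.
fun charWhere(test: (Char) -> Boolean): TextStep = { input, pos ->
    val c = input.getOrNull(pos)
    if (c != null && test(c)) c.toString() to pos + 1 else null
}

// Applies `step` as often as it succeeds and concatenates the pieces,
// so the caller receives one String instead of a List<String>.
fun repeatConcat(step: TextStep): TextStep = { input, start ->
    val out = StringBuilder()
    var pos = start
    while (true) {
        val (text, next) = step(input, pos) ?: break
        if (next == pos) break // guard against steps that consume nothing
        out.append(text)
        pos = next
    }
    out.toString() to pos
}

fun main() {
    val digits = repeatConcat(charWhere { it.isDigit() })
    println(digits("123abc", 0)) // (123, 3)
}
```

According to the description above, Parsek's repeat and sepBy do this for text parsers without any extra code.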
For example:
    val digit = char { it.isDigit() }
    val digits = repeat(digit)
    // Input: "123abc"
    // Output: "123" (list of digits concatenated into a single string)
    digits("123abc")

There are also predefined parsers for common patterns, such as digit, letter, number, and more.
    val parser = digit
    parser("5abc") shouldBe "5"

    val parser = number
    parser("123abc") shouldBe 123

    val identifier = letter and some(letter or digit)
    identifier("abc123") shouldBe "abc123"

    val parser = signedNumber
    parser("-42") shouldBe -42
    parser("17") shouldBe 17

    val digit = char { it.isDigit() }
    val number = repeat(digit, min = 1) map { it.joinToString("").toInt() }
    val comma = char(',')
    val numberList = number sepBy comma
    numberList("123,45,6") // Success: ([123, 45, 6], "")

    val openBracket = literal('[')
    val closeBracket = literal(']')
    val comma = char(',')
    val number = repeat(char { it.isDigit() }, min = 1) map { it.toInt() }
    val value = ref(::list) or number
    val list: Parser<Char, List<Any>> = openBracket and (value sepBy comma) and closeBracket
    list("[1,[2,3],4]") shouldBe
        listOf(1, listOf(2, 3), 4)

Parsek reports errors through its Result class. A parser can return either:
- Success: Contains the parsed value.
- Failure: Contains an error message describing what went wrong.
For example:
    val parser = char('a')
    parser("b") // Failure: Expected 'a', but found 'b'.

The combinator DSL lets you combine multiple parsers into a single unit that produces a structured output object. It also lets you handle failure responses explicitly, which enables custom error handling and recovery strategies.
- combi: A DSL block for combining multiple parsers, handling their results and errors in a structured way.
- bind: Used inside a combi block to run a parser and extract its result, or fail if the parser fails.
    val keyValueParser = combi<Char, Pair<String, String>> {
        val key = repeat(char { it.isLetterOrDigit() }).bind()
        -char('=')
        val value = repeat(char { it.isLetterOrDigit() || it == ' ' }, min = 1).bind()
        key to value
    }
    keyValueParser("username=John Doe") // Success: ("username" to "John Doe")

In this example:
- The key parser extracts the key (e.g., username).
- The value parser extracts the value (e.g., John Doe).
- The result is combined into a Pair object.
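One way to picture how a block like this can sequence parsers is a scope object that tracks the current position and a bind() that either advances it or aborts the block. The following stand-alone sketch rests on that assumption. ComboScope, combo, spanOf, exactly and SketchFailure are hypothetical names, and Parsek's real combi and bind may be implemented quite differently.

```kotlin
// Hypothetical sketch; Parsek's real combi and bind may work differently.

class SketchFailure(message: String) : Exception(message)

// A step consumes input from `pos` and returns (value, next position), or null on failure.
typealias Step<R> = (input: String, pos: Int) -> Pair<R, Int>?

// Scope of one block: tracks the current position so steps can be bound in sequence.
class ComboScope(private val input: String) {
    var pos = 0
        private set

    // Runs the step at the current position; advances on success, aborts the block on failure.
    fun <R> Step<R>.bind(): R {
        val (value, next) = this.invoke(input, pos)
            ?: throw SketchFailure("No match at position $pos")
        pos = next
        return value
    }
}

// Runs `block` against `input` and returns its result, or throws SketchFailure.
fun <R> combo(input: String, block: ComboScope.() -> R): R = ComboScope(input).block()

// Consumes characters while `test` holds; fails if fewer than `min` were matched.
fun spanOf(min: Int = 0, test: (Char) -> Boolean): Step<String> = { input, pos ->
    val end = (pos until input.length).firstOrNull { !test(input[it]) } ?: input.length
    if (end - pos >= min) input.substring(pos, end) to end else null
}

// Matches exactly one expected character.
fun exactly(c: Char): Step<Char> = { input, pos ->
    if (input.getOrNull(pos) == c) c to pos + 1 else null
}

fun parseKeyValue(text: String): Pair<String, String> = combo(text) {
    val key = spanOf(min = 1) { it.isLetterOrDigit() }.bind()
    exactly('=').bind() // result ignored, only the position advances
    val value = spanOf(min = 1) { it.isLetterOrDigit() || it == ' ' }.bind()
    key to value
}

fun main() {
    println(parseKeyValue("username=John Doe")) // (username, John Doe)
}
```

Because bind() throws on failure, the block body reads like straight-line code while still stopping at the first parser that does not match.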
The combinator DSL also allows you to handle failure responses explicitly. This is useful when you want to provide custom error messages or fallback behavior.
    val safeKeyValueParser = combi {
        val key = repeat(char { it.isLetterOrDigit() }).bind()
        if (key.isEmpty()) {
            fail("Key cannot be empty")
        }
        -char('=')
        val value = repeat(char { it.isLetterOrDigit() || it == ' ' }, min = 1).bind()
        if (value.isEmpty()) {
            fail("Value cannot be empty")
        }
        key to value
    }
    val result = safeKeyValueParser("=John Doe")
    if (result is Parser.Failure) {
        // Failure exposes its cause through the error property
        println("Parsing failed: ${result.error}")
    }

The bindAsResult and Result.bind methods allow you to customize how failures are handled within a parser. These methods are particularly useful when you want to propagate or transform failure results explicitly.
    val repeatedParser = combi {
        val list = mutableListOf<String>()
        val parser = char { it.isLetter() }
        while (list.size < 3) { // Ensure at least 3 elements are parsed
            when (val result = parser.bindAsResult()) {
                is Parser.Success -> list.add(result.bind())
                is Parser.Failure -> fail("Only ${list.size} elements found, needed at least 3")
            }
        }
        while (list.size < 5) { // Parse up to 5 elements
            when (val result = parser.bindAsResult()) {
                is Parser.Success -> list.add(result.bind())
                is Parser.Failure -> break
            }
        }
        list
    }
    repeatedParser("abcde") shouldBe listOf("a", "b", "c", "d", "e")
    shouldThrowMessage<ParseException>("Combinator failed, parser number 3 with error: Only 2 elements found, needed at least 3") {
        repeatedParser("ab") shouldBe listOf("a", "b")
    }

Parsek is designed to be lightweight and efficient for most parsing tasks. For very large inputs or performance-critical applications, consider benchmarking against alternatives. Contributions with benchmarks are welcome!
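For a first rough comparison, a small timing harness like the sketch below may be enough. Here parseAndSum is a hypothetical stand-in for whichever parser you want to measure, and bestNanos is an illustrative helper. For numbers you intend to publish, a dedicated harness such as JMH is likely the better choice, since hand-rolled timings on the JVM are sensitive to warm-up and garbage collection.

```kotlin
import kotlin.system.measureNanoTime

// Stand-in for the parser under test (hypothetical): sums comma-separated integers.
fun parseAndSum(input: String): Long = input.split(',').sumOf { it.toLong() }

// Runs `parse` a few times to warm up the JIT, then returns the best of `runs` timed passes.
fun bestNanos(input: String, runs: Int = 5, parse: (String) -> Long): Long {
    repeat(3) { parse(input) }
    return (1..runs).minOf { measureNanoTime { parse(input) } }
}

fun main() {
    val input = (1..100_000).joinToString(",")
    println("sum = ${parseAndSum(input)}") // sum = 5000050000
    println("best of 5: ${bestNanos(input, parse = ::parseAndSum) / 1_000} microseconds")
}
```

Swapping another function with the same signature into the parse parameter gives a like-for-like timing on the same input.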
- JParsec - Java parser combinator library
- Parsec (Haskell) - Haskell parser combinator library
- KotlinParsec - Another Kotlin parser combinator library
We welcome contributions! To get started:
- Fork the repository and clone it locally.
- Set up the development environment:
- Create a feature branch for your changes.
- Submit a pull request with a clear description.
Please follow the Kotlin coding conventions and write tests for new features.
Parsek is licensed under the MIT License. See LICENSE for details.