Skip to content

Allow steps to contain both DataTable and DocString#592

Closed
kruczekrobert wants to merge 1 commit into
cucumber:mainfrom
kruczekrobert:allow-step-datatable-docstring
Closed

Allow steps to contain both DataTable and DocString#592
kruczekrobert wants to merge 1 commit into
cucumber:mainfrom
kruczekrobert:allow-step-datatable-docstring

Conversation

@kruczekrobert
Copy link
Copy Markdown

🤔 What's changed?

Allow a Gherkin step to contain both a DataTable and a DocString.

Supported rules:

  • zero or one DataTable
  • zero or one DocString
  • both may appear on the same step
  • repeated DocStrings are rejected
  • repeated DataTables remain rejected by the grammar/table parsing model

This PR updates:

  • shared grammar
  • generated parsers
  • AST builders
  • pickle compilers
  • C, Dart, .NET and Elixir internals that previously stored only one step argument
  • acceptance testdata for mixed DataTable/DocString step arguments
  • bad testdata for repeated DocString
  • README AST documentation

⚡️ What's your motivation?

This adds support for step definitions that need both structured tabular data and a free-form payload on the same step.

The message model already has separate dataTable and docString fields on Step / PickleStepArgument, so this change makes the grammar and implementations preserve both values.

I really wanted to use e.g.

 Scenario: Create user
    When I send "POST" request to "/api/users"
      | kind   | name          | value                 |
      | query  | dryRun        | true                  |
      | query  | source        | godog                 |
      | header | Content-Type  | application/json      |
      | header | X-Request-ID  | test-request-123      |
      """
      {
        "name": "Bob",
        "email": "bob@example.com",
        "roles": ["admin", "user"]
      }
      """
    Then response status should be 201

🏷️ What kind of change is this?

  • ⚡ New feature (non-breaking change which adds new behaviour)

♻️ Anything particular you want feedback on?

I would appreciate feedback on supporting both argument orders (DataTable then DocString, and DocString then DataTable) versus limiting the grammar to one preferred order.

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

@davidjgoss
Copy link
Copy Markdown
Member

davidjgoss commented May 15, 2026

Thanks for putting this together @kruczekrobert.

I'll wait for other maintainers to weigh in, but I don't think we should add this. It's a fairly sizeable change just here, it'll necessitate more upfront and ongoing work in various Cucumber runners and formatters, and I don't see a major benefit that justifies it given one can (and always could have) have multiple steps and compose state under the hood:

  Scenario: Create user
    Given a request body
      """
      {
        "name": "Bob",
        "email": "bob@example.com",
        "roles": ["admin", "user"]
      }
      """
    When I send "POST" request to "/api/users"
      | kind   | name          | value                 |
      | query  | dryRun        | true                  |
      | query  | source        | godog                 |
      | header | Content-Type  | application/json      |
      | header | X-Request-ID  | test-request-123      |
    Then response status should be 201

(Also, there are other tools which are better suited to testing APIs than Cucumber is.)

@kruczekrobert
Copy link
Copy Markdown
Author

kruczekrobert commented May 16, 2026

Thanks for putting this together @kruczekrobert.

I'll wait for other maintainers to weigh in, but I don't think we should add this. It's a fairly sizeable change just here, it'll necessitate more upfront and ongoing work in various Cucumber runners and formatters, and I don't see a major benefit that justifies it given one can (and always could have) have multiple steps and compose state under the hood:

  Scenario: Create user
    Given a request body
      """
      {
        "name": "Bob",
        "email": "bob@example.com",
        "roles": ["admin", "user"]
      }
      """
    When I send "POST" request to "/api/users"
      | kind   | name          | value                 |
      | query  | dryRun        | true                  |
      | query  | source        | godog                 |
      | header | Content-Type  | application/json      |
      | header | X-Request-ID  | test-request-123      |
    Then response status should be 201

(Also, there are other tools which are better suited to testing APIs than Cucumber is.)

Yes, I know your way, but it's only a workaround, not a solution.

Using tables and docstrings together is convenient, simple, and reduces the complexity of steps for future developers (and current developers, as the changes are backward compatible and may or may not use them).

These changes should have been implemented years ago but were abandoned. I often talk to developers who base their software on Cucumber tests, build toolkits or integration testing frameworks within their own organizations, and they all complain about the same problem – the lack of tables and docstrings within a single step.

I believe these changes are important. It doesn't matter if there are other ways to test the API; this is an open-source library that should allow for building many other solutions with a focus on simplicity. The lack of this feature killed this approach, and many people unnecessarily built complex steps just to implement the workaround you showed.

Time to fix the mistake, luckily it's not painful :)

If we can close this pull request successfully, the next thing I will add is a pull request in the godog library (2.6k stars) to support this approach. Finally: this change is not big at all, the number of changed lines is just the result of re-generating files, the changes in logic themselves, despite how many languages ​​they affect, are very simple.

@davidjgoss
Copy link
Copy Markdown
Member

You’ve made a lot of assertions there, nothing that changes my position though. I suspect that debating the finer points won’t be time well spent for me so I’ll leave it there.

@kruczekrobert
Copy link
Copy Markdown
Author

kruczekrobert commented May 16, 2026

cucumber/website#264

I'm adding another related thread.

Could you jump in? mpkorstanje

@kczarnik
Copy link
Copy Markdown

I’ve run into this a lot at work. Honestly, lacking support for both just overcomplicates the steps and kills readability (you end up having to constantly look for how the previous step correlates with the next).

@meteran
Copy link
Copy Markdown

meteran commented May 16, 2026

It is implemented in python behave(https://github.com/behave/behave) and I used id widely. When I started developing in golang I wanted to do the same in godog lib, because it was really handy. I was really disappointed that it's not available there. I had to write some workarounds on my own, which are not readable at all.

@davidjgoss, your solution works just for one request in a scenario. One request can be checked in unit tests. We expect something more from scenario-based tests. It really makes tests more readable.

+1 for this change. Finally someone did it.

@mpkorstanje
Copy link
Copy Markdown
Member

mpkorstanje commented May 17, 2026

I'm not necessarily opposed to a change in this general direction, but I don't see a good reason to merge this specific pull request right now. Mostly for the reasons already stated by David.


Additionally, looking at the prior interest in the Cuke google group, Cucumber Attic and Cucumber GitHub orgs, there doesn't seem to be much.

It's also worth noting that the issues attached to this PR are not feature requests. The cucumber/godog#89 issue is about using DocString and Datatable with multistep definitions rather than using both DocStrings and Datatables in a single step. Likewise cucumber/website#264 asks for explicit clarification in the documentation.

So if there is an interest, none have thought to request this feature before this PR. So I don't see a reason to make any sort of decision on this right now.


From a technical perspective, the proposed change also has problems. While the Step has both a docString and dataTable fields they're mutually exclusive. Reusing these to provide both is clever, but then opens up several questions:

  1. What order were the docString and dataTable originally in? This is important when rendering the pretty formatting the GherkinDocument and when providing arguments to steps.
  2. If there is effectively a list of step arguments, why not provide multiple docstring and table arguments?
Feature: Example
  Scenario: Multiple doc strings
     """
     { a }
     """
     | key | value |
     | hello | world |
     """
     { b }
     """
     | key | value |
     | hello again | world |
     """
     { c }
     """
  1. If there is a list of step arguments, how are multiple data table arguments distinguished from each other?
Feature: Example
  Scenario: Multiple doc strings
     | key | value |
     | hello | world |
     // Table 2
     | key | value |
     | hello again | world |

And please note that these are not suggestions to improve this PR.


Given the rather widespread impact of the proposed change on the broader ecosystem, I'd like to have some consensus between the different Cucumber implementations that use the Gherkin parser. Not just Cucumber-JS,-JVM,-Ruby, but also Reqnroll and other users of the Gherkin library.

The most appropriate way to do this would be to create an issue, not a pull request, and build a consensus with these parties.

@kruczekrobert
Copy link
Copy Markdown
Author

I'm not necessarily opposed to a change in this general direction, but I don't see a good reason to merge this specific pull request right now. Mostly for the reasons already stated by David.

Additionally, looking at the prior interest in the Cuke google group, Cucumber Attic and Cucumber GitHub orgs, there doesn't seem to be much.

It's also worth noting that the issues attached to this PR are not feature requests. The cucumber/godog#89 issue is about using DocString and Datatable with multistep definitions rather than using both DocStrings and Datatables in a single step. Likewise cucumber/website#264 asks for explicit clarification in the documentation.

So if there is an interest, none have thought to request this feature before this PR. So I don't see a reason to make any sort of decision on this right now.

From a technical perspective, the proposed change also has problems. While the Step has both a docString and dataTable fields they're mutually exclusive. Reusing these to provide both is clever, but then opens up several questions:

  1. What order were the docString and dataTable originally in? This is important when rendering the pretty formatting the GherkinDocument and when providing arguments to steps.
  2. If there is effectively a list of step arguments, why not provide multiple docstring and table arguments?
Feature: Example
  Scenario: Multiple doc strings
     """
     { a }
     """
     | key | value |
     | hello | world |
     """
     { b }
     """
     | key | value |
     | hello again | world |
     """
     { c }
     """
  1. If there is a list of step arguments, how are multiple data table arguments distinguished from each other?
Feature: Example
  Scenario: Multiple doc strings
     | key | value |
     | hello | world |
     // Table 2
     | key | value |
     | hello again | world |

And please note that these are not suggestions to improve this PR.

Given the rather widespread impact of the proposed change on the broader ecosystem, I'd like to have some consensus between the different Cucumber implementations that use the Gherkin parser. Not just Cucumber-JS,-JVM,-Ruby, but also Reqnroll and other users of the Gherkin library.

The most appropriate way to do this would be to create an issue, not a pull request, and build a consensus with these parties.

I don't think this change should be interpreted as introducing a general list of step arguments.

The proposed grammar still keeps the model deliberately constrained:

  • at most one DocString
  • at most one DataTable
  • no repeated DocStrings
  • no repeated DataTables
  • no arbitrary ordered list of step arguments

That distinction matters because DocString and DataTable are not two instances of the same generic concept. They are two different, already named argument types with different shapes and different semantics.

A DocString is a single block scalar. Multiple DocStrings would not have obvious semantics without adding names, ordering rules, or a list model. If a step needs several independent text payloads, that is usually better expressed as separate steps or as one structured payload.

A DataTable is a single tabular argument. Multiple DataTables are even less clear: two adjacent tables have no name or type-level distinction, and two tables with the same shape are not self-describing. Supporting that would require additional syntax or a true ordered/named argument collection.

Mixing several DocStrings and several DataTables would therefore be a different feature. It would require changing the message model, pretty formatting rules, runner APIs, and migration expectations for consumers. I agree that would be a much broader design discussion.

The narrower case of one DocString plus one DataTable is different. It maps to the two fields that already exist today: docString and dataTable. Consumers can distinguish them by type/name, not by positional index.

On ordering: I agree this is the main semantic question.

If both syntactic orders are accepted, the AST still contains locations for both arguments, so a formatter that wants to preserve source order can derive it from locations. For execution, however, I do not think this needs to become a positional argument list. Since the two arguments have different named fields, runners can expose them by type/name rather than by order.

That said, if preserving order in Pickles is considered required, I would prefer constraining the grammar to one canonical order instead of introducing a general ordered argument list. For example, allow only DocString followed by DataTable, or only DataTable followed by DocString. That keeps the feature small and avoids implying support for repeated arguments.


Regarding the pull request itself: I’m happy to continue refining it through the issue/discussion we open. That said, I do not see this as a breaking change; it enables an additional opt-in use case without changing existing feature files or existing single-argument steps. Since it can also unblock richer APIs in downstream libraries, I would prefer to keep the PR open and continue moving it toward an acceptable implementation rather than defer the change indefinitely.

@mpkorstanje
Copy link
Copy Markdown
Member

@kruczekrobert I think that would all be better discussed outside of this PR and that would have been a better starting place. As I said before, I'd like to have some consensus between the different Cucumber implementations that use the Gherkin parser.

@kruczekrobert kruczekrobert closed this by deleting the head repository May 17, 2026
@kruczekrobert
Copy link
Copy Markdown
Author

kruczekrobert commented May 17, 2026

Pull request will be re-opened after discussion with issue: #593

There is still a branch on my fork: https://github.com/kruczekrobert/gherkin/tree/allow-step-datatable-docstring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants