Skip to content

supplemental modifications to PR 342#346

Open
btmlnsn wants to merge 11 commits intoumyelab:masterfrom
btmlnsn:testing
Open

supplemental modifications to PR 342#346
btmlnsn wants to merge 11 commits intoumyelab:masterfrom
btmlnsn:testing

Conversation

@btmlnsn
Copy link
Contributor

@btmlnsn btmlnsn commented Jan 30, 2026

@yujiahu415 @ruck94301 There was a fair amount touched upon in the comments from PR342. Some productive conversations to be had and modifications to be made. I'd rather err on the side of concision here, and group similar criticisms worth addressing and briefly explaining. That said, I'll acknowledge that certain points may read "terse" without meaning to be at all. Questions are welcome, though I'd like to note that this PR isn't a significant departure from LabGym Pre-342. Wanted to identify this observation from the top, in an effort to be conscientious of everyone's time (especially Henry's). I'll get started.

  • Selftest functionality: Somehow, I missed the selftest PR (Implement selftest #332 ) completely - didn't realize this had been added, which is my bad. There were quite a few PR's at the end of the year and somehow, this either slipped by without me realizing it, or I forgot about this feature. PR is well-written and I think this is a great feature. Definitely worth maintaining functionality for this. To that end, I've moved the unified "tests/" folder from testing: reorg of test infra for maintainability and scalability #342 from outside of the internal package, to inside of the package. I found it useful to unify the tests, and my immediate instinct was to place this outside of the rest of the LabGym package, keeping it structurally separate; this was an instinct for no other reason than erring on the side of reducing what I'll call "internal package surface area". Ultimately, this folder should be unified, and placing it either inside of or outside of the package doesn't make much a difference. Since selftest currently relies on the tests being packaged internally, I have no problem moving it inside. I'm more-so interested in unifying the tests under one dir, and not particularly opinionated about its placement. It has been placed here, and selftest works. I'm under the impression that a pyproject.toml and a pyinstaller ".spec" can be relatively easily configured to "wrap" tests alongside the LabGym package within a unified "LabGym.app". Implementing it this way and exploring that any deeper is obviously an un-necessary cost and not a point I'm interested in exploring or commenting on much more deeply, but I wanted to note its possibility in case a compelling reason to do so presents itself when a PyInstaller release is nearer. Honestly, 0 preference either way, so I've moved "tests/" back into the package.

  • Noxfile.py "pip installs": John is correct in identifying that the pytest installation line has recently become redundant; I've removed that line. However, in looking at the successive commits and tested Git actions builds in testing: reorg of test infra for maintainability and scalability #342 , the builds now fail without a "pip install" for Coverage inside of noxfile. This is because the pytest.ini now includes two new coverage filters related to detectron2's import of cv2. Not ultimately necessary, but advantageous in my opinion. It's been documented within the related notes that John wrote in-repo that this warning is non-urgent and non-actionable. As more unit tests are written, I thought it would be useful to deliberately suppress this well-documented warning in an effort to preserve signal in test output. This is a common practice, and requires coverage to be installed within noxfile. Worth the cost in my opinion.

  • John's Pylint Scripts: Per John's request in testing: reorg of test infra for maintainability and scalability #342 , I've removed the contents of the "scripts" folder. These aren't scripts that I used; probably clutter if John is the only person using them. I've got my own "devtools" scripts outside of the repo. Happy to share anything I've found useful, but everyone's got their own preferences, and so John's probably right that it's best to exclude these going forwards. My previous PR set aside a space for this in an effort to be courteous of these scripts, but I can remove these now.

  • PyTest markers: While I hear John's points about not including these in an anticipatory way, I have unit test branches that I'm actively working on that make use of these; I hope to propagate and merge these unit tests through successive PR's once they're ready (which will be very soon). For that reason, I'd politely prefer to on hold onto these for my PR - 342 was always meant to be the "groundwork" for unit tests that I'm writing that I thought I'd indicated in its description would be coming shortly after. Not a significant cost. In fact, in reviewing, I've used the gui marker to mark some of John's existing unit tests. My recommendation would be to hold onto these for now, as opposed to removing and then re-merging very shortly in a successive PR.

  • Conftest.py WX Typo: The LabGym.ui typo is my mistake completely on this fixture - earlier, I was experimenting with a different module structure (nothing worth diving into, just kind of a sandbox thing), and I mistakenly carried over this path simply because it was top-of-mind. I've corrected the typo and the wx_app fixture should now import correctly and work properly. More on this next.

  • Conftest fixtures: Of the 4 fixtures I added in 342, in my in-progress unit tests, I've actually modified my implementations for 2 of them. I'll prune these 2 for now (the "sampling greyscale" and "sampling video frames"), and likely merge those newly modified fixtures with their respective unit tests. Further, the mock_argv has utility as-is within one of the unit tests, which I have edited to include. It will be useful in future unit tests. Further, the wx_app fixture now works properly and is used in new unit tests to come, quite heavily. I'll ask that everyone "humors" me on this inclusion in this PR, and support these 2 fixtures as the basis for more fixtures to come as more unit tests become included.

That covers everything I felt was worth modifying or discussing code-wise. Now for structure, briefly.

  • Hierarchization: While I hear John's point, there will be more developers contributing to LabGym. I would like to invite some members of the lab to begin contributing, and I believe that the hierarchization I presented/proposed was not excessive (obviously a subjective argument). I created folders in anticipation of the development (and hopefully, merging) of unit test files that I'm actively working on, with plans to propose very shortly. That said, I feel that structure implies helpful, functional meaning to the files being added, which I feel is useful towards supporting our lab's goal of inviting more contributors to work on LabGym in an accessible way. I would argue that creating functional "bins" for new tests greatly reduces cognitive load for all contributors approaching the repo, especially those approaching the repo for the first time. There is a tentative plan for the structure, and it is well-supported. I acknowledge the drawbacks of creating excessive structure and hierarchical structures beyond their utility, but as it stands currently, I have created 16 new files for unit tests, and there are likely to be more, in support of a well-organized and fleshed-out unit test suite (with additional integration tests to come as well). Creating the structure in anticipation of these will allow these to be rolled-out in a more organized manner, in my opinion. In short, it doesn't currently "break" anything, and is already useful in successive PR's to come. I would rather focus those PR's on the unit tests themselves, rather than an on-the-fly structural change to the tests folder. Again, this was intended to be a "groundwork" PR for more to come. I respect John's opinion on the matter, full stop. This is mine.

  • "Confusing/Scattered" Test Config: What I had meant by noting that test structure was "confusing" or that test config is "scattered" is that, pre-342, the existence of redundant testing folders was both confusing and presented a scattered test config for contributors. I likely described this poorly, so I apologize. A lab member asked me about the reason for this redundancy, and I, myself, wasn't sure. I used two different terms (confusing and scattered) to call out the same redundancy, and I felt this redundancy was productive to eliminate via creation of a unified tests folder, and then have pytest.ini "point" at a consistent, unified folder. This felt less "scattered" - as opposed to having pytest parse through subfolders for "test_" files. Ultimately, more about reducing cognitive load for potential LabGym contributors than anything else. With this PR's in-package implementation, it operates functionally the same. Feels like a worthwhile change to me.

I think that wraps up everything I wanted to touch on. I'm open to incorporating any additional concrete changes anyone feels are critical to address within this PR, and I'm open to further discussion where it's helpful. With full respect and appreciation for the thoughtful review + intent behind the comments, as well as everyone's time, my only ask is that we keep scope and effort proportional for any further discussion or modifications, given that this PR is largely 'groundwork' for more tests "in the pipe", and not a major behavioral departure from LabGym pre-342 (everything should still "work").

@ruck94301
Copy link
Contributor

ruck94301 commented Feb 4, 2026

I'm finding it hard to review 342 + 346 together.
Probably there's a good way, but, it's beyond my GitHub & git abilities, sorry.
(OTOH, maybe it is genuinely hard, which would explain why you prefer not to revert 342 and produce a single unified PR.)

I think since you now know to check the selftest feature (which I failed to guard with tests),
and since the CI tests pass,
I have no reason to "hard"- object to PR 346.

I don't see yet how the filters in pytest.ini lead to requiring nox install of "coverage".
But like I admitted, I'm not confident I'm doing a good job reviewing.

I understand you want to retain the hierarchicalization with, under "tests", separate "integration", "notes", and "unit" subdirs.
I will again "soft-" object.
What are you going to do with "test_load.py". Technically it's more an integration test than a unit test isn't it, so now what? Park it by itself (for now) under "integration"?
Some of the existing test modules might have some integration tests and some unit tests, shall we break them up into multiple files to align with intended separation?
Some tests might be debatable -- is it a true unit or a true integration test or something in between?

The marker mechanism (which you're preparing to use, good!!) is actually the better way to create "sets" of tests that can be treated differently. Seems like by adding the novel extra hierarchicalization (extra subdirs), we are trading one mild confusion that could be handled with a convo with another mild confusion that could be handled with a convo. :)

My soft-objection doesn't mean I'm saying you're categorically wrong, of course. It means probably you and I are either optimizing for different value systems, or, we are estimating the "cost" differently based on our life experiences. Where confusion, maintainability, code-smells are a type of future cost.

Hope that makes sense.

@ruck94301
Copy link
Contributor

Interesting discussion with Gemini.
It confirms Bobby's position in that some BIG (emphasis mine) projects separate tests into tests/unit/ and tests/integration/.
So it's inaccurate for me to call that "novel".

I still think there's a lot of runway before we "Phase 3" below, but that's subjective.

John asks: How big is big?
Gemini responds:
If your project is small but growing, you don't have to jump straight to a complex directory structure. You can use a phased approach:

  • Phase 1: Just starting. Single file or small module. Keep everything in a flat tests/ folder. Distinguish them by filename: test_unit_core.py vs test_int_api.py.
  • Phase 2: Adding a DB/API. Multiple modules, one external dependency. Use Pytest Markers. Label slow tests with @pytest.mark.integration. This lets you run pytest -m "not integration" to stay fast while coding.
  • Phase 3: Professional/Shared. Multiple contributors or a CI pipeline. | Full Directory Separation. Move to tests/unit/ and tests/integration/. This is the clearest "self-documenting" structure for others.

So maybe the disconnect that I see LabGym more like Phase 2, and Bobby sees LabGym more like Phase 3.

John asks:
But some tests are somewhere in between true unit tests and true integration tests. What about them?

Gemini responds:
Those "in-between" tests are often the most valuable ones in a real-world project. In testing literature, they are usually called Component Tests or Sociable Unit Tests.

They feel like unit tests because they are relatively fast and test a single logic flow, but they feel like integration tests because they don't mock every single neighbor.

The "In-Between" Vocabulary

  • Sociable Unit Test A unit test that uses "real" helper classes or small local dependencies instead of mocks. Mocks can be "liars." Using the real helper ensures that if the helper breaks, the test fails, preventing a false green light.
  • Component Test Tests a whole module or "component" (e.g., an entire API endpoint or a service class + its database layer). It's easier to refactor internal code if your tests target the interface of a module rather than every tiny function inside it.
  • Subcutaneous Test A test that runs just "under the skin" of the UI (e.g., hitting your Python API directly without using a browser). It provides the confidence of an End-to-End (E2E) test but runs significantly faster.

Where do they go?
If you have moved to the subdirectory structure (tests/unit/ and tests/integration/), these "in-between" tests usually cause a bit of a dilemma. Here is how most teams handle them:

  1. The "Speed" Rule: If the test runs entirely in memory (no database, no network) and takes milliseconds, put it in tests/unit/. Even if it hits 3 or 4 real classes, it's still a "fast" unit of work.
  2. The "Dependency" Rule: If the test requires a database, a temporary file, or a network call, put it in tests/integration/. Even if the logic is simple, the environment is what makes it an integration test.
  3. The "Middle Child" Folder: If your project is large, you can literally create a third folder: tests/functional/ or tests/component/. This is common in microservices where you want to test the service in isolation but with a real (usually containerized) database.

Which "Shape" are you building?
Depending on how many "in-between" tests you have, you might be following a different testing philosophy:

  • The Testing Pyramid: Focuses heavily on "Solitary" unit tests (lots of mocks).
  • The Testing Trophy: Focuses heavily on Integration/Functional tests. This philosophy (popularized by Kent C. Dodds) argues that "in-between" tests give you the best "bang for your buck" because they actually test if the parts work together.
  • The Testing Honeycomb: Common in microservices; it ignores tiny unit tests and focuses almost entirely on Integration/Component tests to ensure the service's contract is solid.

@btmlnsn
Copy link
Contributor Author

btmlnsn commented Feb 6, 2026

@ruck94301 thanks again for the follow-up and for raising the initial concern about LabGym --selftest. Just to be sure, I've added dedicated unit tests in this PR via commits 44e7940, 9640522, and 4b72d76 that explicitly exercise the selftest code path, so that functionality is now protected going forward. CI is green with that in place.

On the broader structure question: I hear your soft objection. For now I'm going to keep the current tests/unit + tests/integration layout in this PR since it directly supports the next batch of tests I'm actively working on, with markers remaining the primary mechanism for selecting subsets.

I'd be happy to revisit structure once more tests are in place and we have a clearer sense of how to support the in-progress expanded test-suite. At that point, it feels like we'll have better signal to judge whether flattening or reshaping things would be beneficial (which would be quite simple to do if a need is established).

I appreciate the thoughtful perspective, and I'll continue to keep this top-of-mind. With the selftest now covered and CI passing, I think this PR is in a good place to land for the reasons I've described above, and I'm happy to revisit broader structural questions at a later point once more tests are in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants