An invisible dependency checker.
This project is still very much incomplete, not functional, and stuck in procrastination. I'll probably rewrite it in, wait for it, Rust, as an excuse to learn it and to correct some mistakes, like doing manual custom parsing instead of using JSON Schema, and also to use ripgrep. It is left here publicly as part of my portfolio.
doc-lock is a tool used to test if changes across multiple code repositories or external projects will remain valid.
Oftentimes it is impossible to make sure that values such as environment variable names and documentation match what they are supposed to represent.
Unless a developer happens to "think about it", these errors will ship into production.
One solution is to use template frameworks such as cookiecutter or copier, but this is not always possible, and even projects that use those could still be dependent on other invisible dependencies.
Thus a solution is needed to quickly assess that these invisible dependencies remain valid through the evolution of those code repositories.
Typical use cases:
- Keeping documentation up to date
- Detecting breaking changes across repositories
- Checking if API promises are broken
- Foreseeing unplanned work ahead of time
An invisible dependency is any implicit dependency between two or more software components, data, behaviors or pieces of knowledge whose inconsistency becomes evident only when it causes a malfunction.
Once one is made aware of this concept, it then becomes obvious that these are everywhere:
- Uncaught exceptions: a callee expects the exception it raises to be managed, but the caller has no explicit way to know about it until it crashes the program.
- Race conditions: a resource is expected to be guaranteed mutual exclusion, but which parts of the code will be called from multiple threads is not explicit in most programming languages.
- Typing is a way to manage hidden dependencies. For example, before Python had type annotations, projects used to have a bunch of tests dedicated only to catching errors caused by typing.
- Memory management systems such as smart pointers, garbage collectors and borrow checkers are a way not to forget to free memory, an omission that could otherwise only be detected when a memory leak is noticed.
- Automated tests are often used to make sure multiple components work harmoniously to produce the expected results, which would otherwise be impossible to verify without trying the software.
- CIs and other automation systems are often broken because an external source changes or ceases to exist.
- A library used to interface with an embedded device ceases to function when the firmware written in C changes one of its enums.
All of these have one point in common: things that change without the developer having the means to easily notice them before a malfunction occurs.
That is not to say that these can or should be solved with doc-lock!
doc-lock is not a good solution. It is merely a good compromise, sometimes.
While it is very tempting to replace unit tests and sound designs with a lock that only says "don't change that or else" because it is cheaper, it will only add viscosity to developers.
Don't be that guy who talks loud and uses this tool to police what your developers do because you do not trust them.
doc-lock should be seen only as a last resort safety net or a temporary scaffold.
Every lock you add should be seen as adding to your technical debt, but with less interest.
- DL should be seen as complementary to automated tests and therefore should be fast.
- Ease of use through progressive disclosure of complexity.
- Having two projects blocking each other could be a problem and therefore a mechanism to avoid that should be provided and obvious.
- For this reason having tags inside the files directly is probably a bad idea and would negatively impact performance.
- There should be a lock file in addition to the doc-lock file.
- There should be a way to update the lock file from the CLI
- Allow for both "gross" bulk verification (e.g. whole repository or file) and more sophisticated verification (e.g. regex)
- Support multiple sources (e.g. git, mercurial, http, ssh, filesystem)
- Extensible verifiers
- Extensible source fetchers
- Caching?
Locks should not be embedded in comments inside a project's source files, because searching through all the files would hurt performance. They could also be affected by unrelated symbols inside the source files, which could only be partially solved by a syntax-aware parser. Therefore, the program should read from files specified in its arguments.
There should be two different files: one configuration file written by and for humans, and another lock file written by the program.
Both should be in TOML since it has seen great success in Python and Rust and other major projects.
No archaic ini files.
No loose and over the top verbose JSON.
No "in which included file does that come from" YAML file with a 1000 pages language spec and 3000 pages errata.
The config file should have the following concepts:
- `sources`
- `records`
- `filters`
- `test-tags`
- `tests`
The purpose of the lock file is to keep track of the state of the strings extracted by the extractors from each source, when needed.
The typical usage is to check whether a line has changed since the last time the program ran and, if it comes from a VCS, to find when.
A test-tag is a way to logically group a set of tests.
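The change check described above could be sketched as follows. This is only an illustration under stated assumptions: the lock file format is not defined in this document, so the sketch models it as a plain mapping from a record name to a SHA-256 fingerprint, and the names `fingerprint` and `changed_records` are hypothetical.

```python
import hashlib


def fingerprint(lines: list[str]) -> str:
    """Hash a list of strings into one hex digest (assumed lock entry format)."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator so ["ab"] and ["a", "b"] differ
    return digest.hexdigest()


def changed_records(lock: dict[str, str], current: dict[str, list[str]]) -> list[str]:
    """Return the names of records whose strings differ from the locked state.

    A record that is missing from the lock is reported as changed.
    """
    return sorted(
        name for name, lines in current.items()
        if lock.get(name) != fingerprint(lines)
    )
```

Storing only a digest keeps the lock file small, at the cost of not being able to show what the previous value was. A real implementation might prefer to store the strings themselves.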
Tests should be declared at the top level to not only make clear what the config is about but also to avoid creating new ones by mistake.
A test can have zero or multiple test-tags.
A test-tag with no associated test is considered invalid while a test with no declared group belongs to test-groups.default.
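A minimal sketch of these validity rules, assuming the configuration has already been parsed into a set of declared tags and a mapping from test name to its tags. The function name `group_tests`, the `"default"` key, and the rejection of undeclared tags are assumptions made for illustration, not part of the specification above.

```python
def group_tests(declared_tags: set[str], tests: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group test names by tag and enforce the test-tag rules."""
    groups: dict[str, list[str]] = {tag: [] for tag in declared_tags}
    groups.setdefault("default", [])  # home of tests that declare no tag
    for name, tags in tests.items():
        for tag in tags or ["default"]:
            if tag not in groups:
                # Assumption: tags must be declared up front to catch typos.
                raise ValueError(f"test {name!r} uses undeclared tag {tag!r}")
            groups[tag].append(name)
    unused = sorted(tag for tag in declared_tags if not groups[tag])
    if unused:
        raise ValueError(f"test-tags with no associated test: {unused}")
    return groups
```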
A source is a way to access and reference a file tree through the specified fetcher (sources.<fetcher>).
If the source points to a specific file then it will be represented as that file at the root of the file tree.
A source should have the information necessary to fetch a specific or last version of a file tree.
A source that produces an empty file tree is considered invalid.
A record extracts a list of strings from a file from the file tree of a source or specifies them directly through its lines attribute.
For example, a record could simply split a file by new lines or take only a few specific lines.
If the file is binary, then the list of strings will contain a string of hexadecimal digits.
An extractor is specified by a string in the form of <extractor>[: <spec>].
For example:
- `whole-text-file`
- `whole-binary-file`
- `lines: 136:140`, or `lines: :10` to take lines up to line 10, or `lines: 10:` to take lines starting at 10. All lists of strings are 1-based.
- `bytes: 127:255`
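The extractor strings above could be interpreted along these lines. This is a sketch, not the project's parser: it assumes ranges are inclusive on both ends, that byte ranges are 1-based like line ranges, and that text files are UTF-8. The function names `parse_range` and `extract` are hypothetical.

```python
def parse_range(spec: str, length: int) -> tuple[int, int]:
    """Turn a 1-based 'start:end' spec into (start, end), filling open ends."""
    start_text, _, end_text = spec.partition(":")
    start = int(start_text) if start_text.strip() else 1
    end = int(end_text) if end_text.strip() else length
    return max(start, 1), min(end, length)


def extract(extractor: str, data: bytes) -> list[str]:
    """Apply an '<extractor>[: <spec>]' string to raw file content."""
    name, _, spec = extractor.partition(":")
    name, spec = name.strip(), spec.strip()
    if name == "whole-text-file":
        return data.decode("utf-8").splitlines()
    if name == "whole-binary-file":
        return [data.hex()]  # one string of hexadecimal digits
    if name == "lines":
        lines = data.decode("utf-8").splitlines()
        start, end = parse_range(spec, len(lines))
        return lines[start - 1:end]  # convert 1-based inclusive to a slice
    if name == "bytes":
        start, end = parse_range(spec, len(data))
        return [data[start - 1:end].hex()]
    raise ValueError(f"unknown extractor: {name!r}")
```

For instance, `extract("lines: :10", content)` would return at most the first ten lines of `content`.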
A fetcher is the way a source obtains its file tree:
- Filesystem location (`sources.fs`)
- Git (`sources.git`)
- SSH (`sources.ssh`)
- HTTP[S] (`sources.http`)
- etc.
The fetcher is specified by the sources.<fetcher name> field.
A filter takes the list of strings output by an extractor or another filter and transforms it into another list of strings.
All lists of strings are 1-based.
Filters can be defined at the top under filters.filter-name and be referenced as $filter-name.
The filters fields are composed of a list of strings specifying the filters in the form of `<type>: <spec>`.
For example:
- `range 2:15` will select lines 2 to 15.
- `trim` will remove all leading and trailing white space.
- `trim: "'` will remove all leading and trailing double and single quotes.
- `re: <spec>` will apply the regular expression `<spec>`, following Python's `re` library, to each individual line and filter out the lines that don't match.
- `inverse-re: ^bazooka gum$` will filter out matching lines.
- `remove-re: \w+\s*=\s*` will remove every match of the pattern from all strings.
- `only-re: \w+` will leave only the matches in all strings.
- `a-re:(5) <spec>` will let through all lines that have a match and the 5 following lines, if possible.
- `b-re:(5) <spec>` will let through all lines that have a match and the 5 preceding lines, if possible.
- `c-re:(5) <spec>` will let through all lines that have a match and the 5 preceding and following lines, if possible.
- `remove-empty` will remove all empty lines.
- `sh: <shell-command params>` will feed the lines as standard input to the shell command and output the lines it produces.
An empty list fed into a filter should result in a failed test-chain.
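A subset of the filters described above can be sketched as a small chain of functions. The sketch assumes `re.search` semantics for matching (the document only says the patterns follow Python's `re` library) and that white space around `<spec>` is not significant. The names `apply_filter`, `apply_chain` and `_context` are hypothetical, and `range`, `only-re` and `sh` are left out to keep the sketch short.

```python
import re

_CONTEXT_SPEC = re.compile(r"\((\d+)\)\s*(.*)")


def _context(kind: str, arg: str, lines: list[str]) -> list[str]:
    """Keep matching lines plus N lines after (a-re), before (b-re) or both (c-re)."""
    parsed = _CONTEXT_SPEC.fullmatch(arg)
    if parsed is None:
        raise ValueError(f"expected '(N) <pattern>', got {arg!r}")
    count, pattern = int(parsed.group(1)), parsed.group(2)
    before = count if kind in ("b-re", "c-re") else 0
    after = count if kind in ("a-re", "c-re") else 0
    keep: set[int] = set()
    for index, line in enumerate(lines):
        if re.search(pattern, line):
            keep.update(range(max(0, index - before), min(len(lines), index + after + 1)))
    return [lines[index] for index in sorted(keep)]


def apply_filter(spec: str, lines: list[str]) -> list[str]:
    """Apply one '<type>: <spec>' filter to a list of strings."""
    if not lines:
        raise ValueError("empty list fed into a filter: the test chain fails")
    kind, _, arg = spec.partition(":")
    kind, arg = kind.strip(), arg.strip()
    if kind == "trim":
        return [line.strip(arg or None) for line in lines]  # no arg: white space
    if kind == "remove-empty":
        return [line for line in lines if line]
    if kind == "re":
        return [line for line in lines if re.search(arg, line)]
    if kind == "inverse-re":
        return [line for line in lines if not re.search(arg, line)]
    if kind == "remove-re":
        return [re.sub(arg, "", line) for line in lines]
    if kind in ("a-re", "b-re", "c-re"):
        return _context(kind, arg, lines)
    raise ValueError(f"unknown filter: {kind!r}")


def apply_chain(specs: list[str], lines: list[str]) -> list[str]:
    """Feed the output of each filter into the next one."""
    for spec in specs:
        lines = apply_filter(spec, lines)
    return lines
```

A chain such as `["trim", "remove-empty", "inverse-re: ^#"]` would then strip white space, drop blank lines, and discard lines starting with `#`, in that order.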
A test specifies a set of records and a condition to succeed.
A condition can be one of:
- `equality`: succeeds if all the given records are equal line by line.
- `set-equality`: like `equality`, but the order of the lines and duplicates don't matter.
- `count: 5`: succeeds if all the given records have 5 lines.
[tests.test-name]
records = [
"record1",
"record2",
]
condition = "equality"
Typical conditions of success are:
- Number of strings
- String comparisons
- Regular expressions
- Line changes since a certain revision in the VCS
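The `equality`, `set-equality` and `count` conditions described earlier could be evaluated as in the sketch below. It assumes each record has already been resolved to a list of strings. The function name `check` is hypothetical.

```python
def check(condition: str, records: list[list[str]]) -> bool:
    """Evaluate a condition string against the resolved records of a test."""
    name, _, arg = condition.partition(":")
    name, arg = name.strip(), arg.strip()
    if name == "equality":
        # Every record must match the first one line by line.
        return all(record == records[0] for record in records)
    if name == "set-equality":
        # Order and duplicates are ignored by comparing sets.
        return all(set(record) == set(records[0]) for record in records)
    if name == "count":
        return all(len(record) == int(arg) for record in records)
    raise ValueError(f"unknown condition: {name!r}")
```

With this shape, the `tests.test-name` example would reduce to `check("equality", [record1_lines, record2_lines])`, where both arguments are the strings produced by the named records.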