Repository for .NET tests and demos.
This is a substring search tool that was written for a coding test. It's a CLI program that searches one or more input files for instances of a query string and then outputs the number of instances it found. The default query string is the name of the input file itself, without the extension (e.g. the input file "SomeFile.txt" will be searched for instances of the string "SomeFile").
The C# code is in the src/ subdirectory. The test/ subdirectory contains
input files for the built-in test suite. The file doc/NameCounter.pdf
explains some of the reasoning behind the search algorithm. There is an
explanatory diagram in doc/NameCounter.png and doc/NameCounter.svg that
might clarify how the algorithm uses its knowledge of repeated prefixes in
the query string to continue the search after finding a complete or partial
instance in the input stream. The doc/natural/ subdirectory contains code
documentation generated by NaturalDocs.
The program is invoked as follows:
NameCounter.exe [OPTION]... FILE...
Options may be combined in a single CLI argument (e.g. -ozq hello instead of
-o -z -q hello). Note that all options must be specified before the input files
in the command line. The following options are supported:
--      End option parsing; treat the remaining CLI arguments as input files.
-h      Print a help message and exit.
-o      Include overlapping instances of the query string (default: no).
-q STR  Use the specified query string (default: input filename without
        extension).
-z      Treat nonexistent input files as empty instead of having them trigger
        an error (default: no).
-T      Run the self-test suite and exit. Note that the test suite uses relative
        paths to the test input files and will probably fail if the program is
        not run from the src/ subdirectory.
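To illustrate how combined options such as -ozq hello can be unpacked, here is a minimal sketch in Python (not the project's C#, and not taken from the actual parser). The function name `parse_args`, the error handling, and two behaviors are assumptions: a lone "-" is treated as an input file, and -q consumes the next CLI argument wherever it sits in a cluster. The real parser may behave differently in those edge cases.

```python
from typing import Dict, List, Tuple

FLAGS = {"h", "o", "z", "T"}  # options that take no argument
VALUED = {"q"}                # options that consume the next CLI argument


def parse_args(argv: List[str]) -> Tuple[Dict[str, object], List[str]]:
    """Split argv into (options, input_files). Hypothetical helper."""
    options: Dict[str, object] = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            i += 1  # skip the terminator; everything after it is a file
            break
        if not arg.startswith("-") or arg == "-":
            break   # first input file: option parsing ends here
        i += 1
        for letter in arg[1:]:
            if letter in FLAGS:
                options[letter] = True
            elif letter in VALUED:
                if i >= len(argv):
                    raise ValueError(f"option -{letter} requires an argument")
                options[letter] = argv[i]
                i += 1
            else:
                raise ValueError(f"unknown option -{letter}")
    return options, argv[i:]
```

With this sketch, `parse_args(["-ozq", "hello", "a.txt"])` and `parse_args(["-o", "-z", "-q", "hello", "a.txt"])` give the same result, and `--` lets a file whose name starts with a dash be passed as input.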
-
The program counts instances of the query string, not lines containing that string.
- Processing is fully character-oriented. Line endings in the input are treated like any other characters.
-
The program operates by strictly sequential stream processing. It reads one character at a time from the input and doesn't do any seeking, pushback, buffering or indexing.
- This is a simple, space-efficient and effective solution.
-
By default, the program counts instances of the plain filename, excluding any extension or directory names that were in the given input paths.
- Only the last extension is removed from filenames with multiple extensions.
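The default-query rule can be sketched in a few lines of Python (an illustration only, not the project's C# code, which delegates path handling to System.IO.Path). The function name `default_query` is hypothetical, and Python's `pathlib` may differ from .NET in corner cases such as filenames that start with a dot.

```python
from pathlib import PurePath


def default_query(input_path: str) -> str:
    """Return the filename without directories and without its last extension."""
    return PurePath(input_path).stem


print(default_query("SomeFile.txt"))             # SomeFile
print(default_query("some/dir/archive.tar.gz"))  # archive.tar
```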
-
The program assumes that input files use ASCII/UTF-8 encoding and that operating on C# char values (i.e. UTF-16 code units) is sufficient.
- No specific requirements about input encoding support were stated and I didn't feel like getting fancy with it.
- The widespread use of UTF-8 and high-low surrogate encoding of UTF-16 make it unlikely that this approach will result in unintended behavior.
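As a small illustration of the surrogate argument (in Python, not the project's C#), the sketch below lists the UTF-16 code units of a string. A character outside the Basic Multilingual Plane becomes a high/low surrogate pair, and both halves lie in the reserved range 0xD800-0xDFFF. Half of such a character can therefore never be mistaken for an ordinary character in the query. The helper name `utf16_code_units` is hypothetical.

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the UTF-16 code units that a C# string would hold for this text."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_code_units("A")])           # ['0x41']
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```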
-
Non-overlapping search (where the string "ABBABBA" contains one instance of "ABBA", not two) is the default behavior.
- Overlapping search rarely makes a difference in natural-language text, can be non-intuitive and is often not intended by the user.
- Overlapping search can be enabled with the -o option if needed.
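The two search modes, and the idea of using repeated prefixes in the query to keep going without rereading input, can be sketched with the classic Knuth-Morris-Pratt prefix table. This is a Python illustration written under that assumption. It is not the project's C# code, and the actual algorithm described in doc/NameCounter.pdf may differ in detail. The names `build_failure_table` and `count_instances` are hypothetical.

```python
from typing import Iterable, List


def build_failure_table(query: str) -> List[int]:
    """failure[i] is the length of the longest proper prefix of query[:i+1]
    that is also a suffix of it (the KMP prefix function)."""
    failure = [0] * len(query)
    k = 0
    for i in range(1, len(query)):
        while k > 0 and query[i] != query[k]:
            k = failure[k - 1]
        if query[i] == query[k]:
            k += 1
        failure[i] = k
    return failure


def count_instances(stream: Iterable[str], query: str, overlapping: bool = False) -> int:
    """Count instances of query in a character stream, one character at a time,
    without seeking or pushback."""
    if not query:
        return 0
    failure = build_failure_table(query)
    matched = 0  # number of query characters currently matched
    count = 0
    for ch in stream:
        # On a mismatch, fall back to the longest prefix that still fits.
        while matched > 0 and ch != query[matched]:
            matched = failure[matched - 1]
        if ch == query[matched]:
            matched += 1
        if matched == len(query):
            count += 1
            # Overlapping: keep the reusable prefix. Otherwise start over.
            matched = failure[matched - 1] if overlapping else 0
    return count


print(count_instances("ABBABBA", "ABBA"))                    # 1
print(count_instances("ABBABBA", "ABBA", overlapping=True))  # 2
```

The only state carried between characters is the integer `matched`, which is why this kind of search needs no buffering of the input.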
-
The solution is meant to be self-contained and platform/toolchain agnostic.
- This choice was made for ease of development and evaluation. This is not a program that will be used practically.
- There are only two code files. The program can be built with a simple compiler invocation.
- Only basic platform libraries are used.
- The code is path format agnostic. Path construction is delegated to the platform (System.IO.Path).
- The test suite is built into the program and can be run via the -T option.
- The program was developed with Mono on Debian, but I think (hope) it will compile and run on any recent .NET platform.
-
The implementation is mostly procedural, using static classes.
- I didn't see any compelling reason for an object-oriented design in a small program like this one.
-
The doc comments are in the NaturalDocs format.
- There doesn't appear to be any widely adopted standard code documentation format and tool for the .NET ecosystem, and NaturalDocs is what I'm familiar with.
-
Full disclosure:
- The array comparison in the unit test code is the only idea I got from a 3rd party (an old question on Stack Overflow).
- The CLI argument parser is adapted C++ code from another project of mine.
- The rest of the code is my own ex nihilo creation, only guided by Microsoft's API documentation.