demodotnet

Repository for .NET tests and demos.

NameCounter

This is a substring search tool that was written for a coding test. It's a CLI program that searches one or more input files for instances of a query string and then outputs the number of instances it found. The default query string is the name of the input file itself, without the extension (e.g. the input file "SomeFile.txt" will be searched for instances of the string "SomeFile").

The C# code is in the src/ subdirectory. The test/ subdirectory contains input files for the built-in test suite. The file doc/NameCounter.pdf explains some of the reasoning behind the search algorithm. An explanatory diagram (doc/NameCounter.png and doc/NameCounter.svg) may clarify how the algorithm uses its knowledge of repeated prefixes in the query string to continue the search after finding a complete or partial instance in the input stream. The doc/natural/ subdirectory contains code documentation generated by NaturalDocs.
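As a rough illustration of what "knowledge of repeated prefixes" can mean, the sketch below builds the prefix table used by the classic Knuth-Morris-Pratt search. It is an assumption about the general technique, not a copy of the code in src/; doc/NameCounter.pdf remains the authoritative description. The names PrefixTableDemo and BuildPrefixTable are hypothetical.

```csharp
using System;

static class PrefixTableDemo
{
    // table[i] = length of the longest proper prefix of query[0..i]
    // that is also a suffix of query[0..i].
    public static int[] BuildPrefixTable(string query)
    {
        var table = new int[query.Length];
        int k = 0; // length of the prefix matched so far
        for (int i = 1; i < query.Length; i++)
        {
            // Fall back to shorter prefixes until one can be extended.
            while (k > 0 && query[i] != query[k])
                k = table[k - 1];
            if (query[i] == query[k])
                k++;
            table[i] = k;
        }
        return table;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", BuildPrefixTable("ABBA")));  // 0,0,0,1
        Console.WriteLine(string.Join(",", BuildPrefixTable("ABABC"))); // 0,0,1,2,0
    }
}
```

For "ABBA" the last entry is 1: once the whole string has matched, its final "A" can already count as the first character of a following instance, so the search never needs to re-read input.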

Usage Instructions

The program is invoked as follows:

NameCounter.exe [OPTION]... FILE...

Options may be combined in a single CLI argument (e.g. -ozq hello instead of -o -z -q hello). All options must be specified before the input files on the command line. The following options are supported:

-- - End option parsing, treat remaining CLI arguments as input files.

-h - Print a help message and exit.

-o - Include overlapping instances of the query string (default: no).

-q STR - Use the specified query string (default: input filename without extension).

-z - Treat nonexistent input files as empty instead of having them trigger an error (default: no).

-T - Run the self test suite and exit. Note that the test suite uses relative paths to the test input files and will probably fail if the program is not run from the src/ subdirectory.
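The option rules above can be sketched as a small parser. This is a hypothetical illustration, not the parser in src/: the names ParsedArgs and ArgSketch are made up, and several details are assumptions (that -q always consumes the next CLI argument, that a lone "-" counts as an input file, and the wording of the error messages).

```csharp
using System;
using System.Collections.Generic;

sealed class ParsedArgs
{
    public bool Help, Overlap, MissingAsEmpty, SelfTest;
    public string Query;                          // null = derive from the filename
    public List<string> Files = new List<string>();
}

static class ArgSketch
{
    public static ParsedArgs Parse(string[] args)
    {
        var result = new ParsedArgs();
        int i = 0;
        for (; i < args.Length; i++)
        {
            string arg = args[i];
            if (arg == "--") { i++; break; }             // explicit end of options
            if (arg.Length < 2 || arg[0] != '-') break;  // first input file ends option parsing
            foreach (char flag in arg.Substring(1))      // "-ozq" is handled as o, z, q
            {
                switch (flag)
                {
                    case 'h': result.Help = true; break;
                    case 'o': result.Overlap = true; break;
                    case 'z': result.MissingAsEmpty = true; break;
                    case 'T': result.SelfTest = true; break;
                    case 'q':
                        // Assumption: the query string is the next CLI argument.
                        if (++i >= args.Length)
                            throw new ArgumentException("-q requires a query string");
                        result.Query = args[i];
                        break;
                    default:
                        throw new ArgumentException("unknown option: -" + flag);
                }
            }
        }
        // Everything left over is an input file.
        for (; i < args.Length; i++)
            result.Files.Add(args[i]);
        return result;
    }

    static void Main()
    {
        ParsedArgs p = Parse(new[] { "-ozq", "hello", "a.txt", "b.txt" });
        Console.WriteLine(p.Overlap + " " + p.MissingAsEmpty + " " + p.Query + " " + p.Files.Count);
        // True True hello 2
    }
}
```

With this sketch, a file whose name starts with a dash can still be passed after "--", e.g. the arguments -o -- -weird.txt.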

Assumptions, Priorities and Design Choices

The program counts instances of the query string, not lines containing that string.
- Processing is fully character-oriented. Line endings in the input are treated like any other characters.
The program operates by strictly sequential stream processing. It reads one character at a time from the input and doesn't do any seeking, pushback, buffering or indexing.
- This is a simple, space-efficient and effective solution.
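A minimal sketch of strictly sequential counting, assuming a Knuth-Morris-Pratt style matcher (the actual code in src/ may be organized differently). The names StreamCountSketch and Count are hypothetical, and only the default non-overlapping mode is shown. Note that the reader classes used here may buffer internally; the point is that the matcher itself only ever asks for the next character.

```csharp
using System;
using System.IO;

static class StreamCountSketch
{
    // Classic KMP prefix table for the query string.
    static int[] BuildPrefixTable(string query)
    {
        var table = new int[query.Length];
        for (int i = 1, k = 0; i < query.Length; i++)
        {
            while (k > 0 && query[i] != query[k]) k = table[k - 1];
            if (query[i] == query[k]) k++;
            table[i] = k;
        }
        return table;
    }

    // Counts non-overlapping instances of query, reading forward only.
    public static long Count(TextReader input, string query)
    {
        if (query.Length == 0) return 0;   // assumption: an empty query matches nothing
        int[] table = BuildPrefixTable(query);
        long count = 0;
        int matched = 0;                   // query characters matched so far
        int next;
        while ((next = input.Read()) != -1)
        {
            char c = (char)next;
            // On a mismatch, reuse the longest prefix that still fits
            // instead of seeking backwards in the input.
            while (matched > 0 && c != query[matched])
                matched = table[matched - 1];
            if (c == query[matched])
                matched++;
            if (matched == query.Length)
            {
                count++;
                matched = 0;               // start fresh: non-overlapping search
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(Count(new StringReader("AAAB"), "AAB"));     // 1
        Console.WriteLine(Count(new StringReader("xABxABAB"), "AB"));  // 3
    }
}
```

In the "AAAB" case the third "A" breaks a partial match of "AAB", but the table says one "A" is still usable, so the instance starting at the second character is found without re-reading anything.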
By default, the program counts instances of the plain filename, excluding any extension or directory names that were in the given input paths.
- Only the last extension is removed from filenames with multiple extensions.
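The default query derivation matches the behavior of System.IO.Path.GetFileNameWithoutExtension, which is shown below. Whether the program calls exactly this method is an assumption; DefaultQuerySketch and DefaultQuery are hypothetical names.

```csharp
using System;
using System.IO;

static class DefaultQuerySketch
{
    // Strips directory names and the last extension from an input path.
    public static string DefaultQuery(string inputPath)
    {
        return Path.GetFileNameWithoutExtension(inputPath);
    }

    static void Main()
    {
        Console.WriteLine(DefaultQuery("SomeFile.txt"));         // SomeFile
        Console.WriteLine(DefaultQuery("data/archive.tar.gz"));  // archive.tar
        Console.WriteLine(DefaultQuery("README"));               // README
    }
}
```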
The program assumes that input files use ASCII/UTF-8 encoding and that operating on C# char values (i.e. UTF-16 code units) is sufficient.
- No specific requirements about input encoding support were stated and I didn't feel like getting fancy with it.
- The widespread use of UTF-8 and the high-low surrogate encoding of UTF-16 make it unlikely that this approach will result in unintended behavior.
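The following sketch shows what comparing UTF-16 code units means in practice. It is an illustration only; CodeUnitSketch and MatchesAt are hypothetical names. A character outside the Basic Multilingual Plane occupies two char values, and because the high and low surrogate ranges do not overlap with each other or with ordinary characters, a code-unit match cannot start in the middle of such a pair. The last line shows a known limit of the approach: differently normalized forms of the same visible text do not match.

```csharp
using System;

static class CodeUnitSketch
{
    // True if query occurs in text at pos, comparing code unit by code unit.
    public static bool MatchesAt(string text, int pos, string query)
    {
        if (pos < 0 || pos + query.Length > text.Length) return false;
        for (int i = 0; i < query.Length; i++)
            if (text[pos + i] != query[i]) return false;
        return true;
    }

    static void Main()
    {
        string grin = "\uD83D\uDE00";        // U+1F600 as a high/low surrogate pair
        string text = "a" + grin + "b";
        Console.WriteLine(text.Length);                    // 4
        Console.WriteLine(char.IsHighSurrogate(grin[0]));  // True
        Console.WriteLine(char.IsLowSurrogate(grin[1]));   // True
        Console.WriteLine(MatchesAt(text, 1, grin));       // True
        Console.WriteLine(MatchesAt(text, 2, grin));       // False
        // Precomposed U+00E9 versus "e" followed by combining U+0301:
        Console.WriteLine(MatchesAt("caf\u00E9!", 3, "e\u0301")); // False
    }
}
```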
Non-overlapping search (where the string "ABBABBA" contains one instance of "ABBA", not two) is the default behavior.
- Overlapping search rarely makes a difference in natural-language text, can be non-intuitive and is often not intended by the user.
- Overlapping search can be enabled with the -o option if needed.
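The difference between the two modes can be checked with a simple in-memory reference counter. This is deliberately not the program's streaming algorithm: it uses String.IndexOf on a fully loaded string, and the names OverlapSketch and CountInMemory are hypothetical.

```csharp
using System;

static class OverlapSketch
{
    public static int CountInMemory(string text, string query, bool overlapping)
    {
        if (query.Length == 0) return 0;   // assumption: an empty query matches nothing
        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf(query, pos, StringComparison.Ordinal)) != -1)
        {
            count++;
            // Overlapping: retry from the next character.
            // Non-overlapping: skip past the instance just found.
            pos += overlapping ? 1 : query.Length;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountInMemory("ABBABBA", "ABBA", false)); // 1
        Console.WriteLine(CountInMemory("ABBABBA", "ABBA", true));  // 2
    }
}
```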
The solution is meant to be self-contained and platform/toolchain agnostic.
- This choice was made for ease of development and evaluation. This is not a program that will be used practically.
- There are only two code files. The program can be built with a simple compiler invocation.
- Only basic platform libraries are used.
- The code is path format agnostic. Path construction is delegated to the platform (System.IO.Path).
- The test suite is built into the program and can be run via the -T option.
- The program was developed with Mono on Debian, but I think (hope) it will compile and run on any recent .NET platform.
The implementation is mostly procedural, using static classes.
- I didn't see any compelling reason for an object-oriented design in a small program like this one.
The doc comments are in the NaturalDocs format.
- There doesn't appear to be any widely adopted standard code documentation format and tool for the .NET ecosystem, and NaturalDocs is what I'm familiar with.
Full disclosure:
- The array comparison in the unit test code is the only idea I got from a 3rd party (an old question on Stack Overflow).
- The CLI argument parser is adapted C++ code from another project of mine.
- The rest of the code is my own ex nihilo creation, only guided by Microsoft's API documentation.
