Validating
It's easy to do validation in Data::Tubes. As a matter of fact, anything can be a validator: as long as you put the validation in a sub reference, it will do the job. Sometimes it's just easier to rely on a few canned tests when they have a certain shape... that's what the toolkit offers.
Well... validator might not be the best name in many cases; filter might have been more appropriate. Whatever: every function applies some validation rules to the input record, and rejects it (i.e. does not propagate it any further) if the rules are not complied with. So... they are validators indeed.
One of the available factories is special and closer to what you might expect from a real validator though: thoroughly. This will allow you to collect validation info on the way, so that you can eventually present the results to... a log, the end user, wherever you want.
Most validation rules are supposed to be applied as early as possible, i.e. before parsing. This allows you to quickly get useless inputs out of the way, e.g. empty or comment lines.
The admit factory simplifies writing a filter function to assess whether the input data is good enough or not. It's mostly useful when you can express the validation rules through regular expressions; to keep it simple, it will just silently ditch records that do not comply.
The factory interface is a bit different from most of the factories:
instead of key-value pairs, it mainly accepts a sequence of validators,
optionally followed by a hash reference with options. Think pipeline, the
principle is the same.
Time for an example. We will start again from the example in the Fallback article, but put a validation step before parsing so that we avoid exceptions:
#!/usr/bin/env perl
# vim: sts=3 ts=3 sw=3 et ai :
use strict;
use warnings;
use Data::Tubes qw< pipeline >;
my $input = <<'END';
1,Pearl Jam,Evenflow
2,Prince,Purple Rain
3;Take That;It Only Takes a Minute
4,Soundgarden,Black Hole Sun
END
pipeline(
   'Source::open_file',    # source, as before
   'Reader::by_line',      # reader, as before
   ['Validator::admit', qr{(?:.*,.*,)}],
   ['Parser::by_format' => 'id,artist,title'],
   [
      'Renderer::with_template_perlish',
      "[#[% id %]] [% artist %] - [% title %]\n",
   ],
   'Writer::to_files',
   {tap => 'sink'},
)->(\$input);
The output is as expected:
shell$ ./admit-00
[#1] Pearl Jam - Evenflow
[#2] Prince - Purple Rain
[#4] Soundgarden - Black Hole Sun
This approach simply (and silently) removes failing records. Whether this is actually what suits you, or just a quick shortcut before setting things up with better error management, is up to you. Also, note that it might be inefficient (you're somehow doing parsing twice, or multiple times), so you might want to avoid this approach or do some benchmarking. But you might also want to read on for a few more goodies!
It's worth noting that you can also pass sub references in addition to
regular expression references. In this case, the sub will be called with
the provided input and should return a true value if the validation is
passed.
Last, if you provide a list of validation regexes/subs, all of them have to match/return a true value for the record to go ahead. Anything failing interrupts the checks.
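To make the "all of them have to pass, the first failure stops the checks" rule more concrete, here is a minimal pure-Perl sketch. It does not use Data::Tubes and it is not how the library is implemented: make_admit_sketch is a hypothetical helper written only to illustrate the behaviour described above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Data::Tubes: builds a filter from a
# list of regexes and/or sub references. A record passes only when all
# checks succeed; the first failure stops the remaining checks.
sub make_admit_sketch {
   my @checks = @_;
   return sub {
      my ($line) = @_;
      for my $check (@checks) {
         my $ok = ref($check) eq 'CODE' ? $check->($line) : $line =~ $check;
         return unless $ok;    # rejected: nothing is propagated
      }
      return $line;            # accepted: pass the record along
   };
}

my $admit = make_admit_sketch(
   qr{(?:.*,.*,)},                 # regex check: at least two commas
   sub { $_[0] =~ m{\A\d+,} },     # sub check: starts with a numeric id
);

for my $line ('1,Pearl Jam,Evenflow', '3;Take That;It Only Takes a Minute') {
   my $status = defined $admit->($line) ? 'admitted' : 'rejected';
   print "$status: $line\n";
}
```

With the real factory you would just list the regexes and subs one after the other in the array reference for Validator::admit.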
It might well be that your selection rules are better expressed as
blacklists instead of whitelists. For example, to get rid of C++
comment-only lines you might want to say that whatever matches
qr{(?:\A\s*//)} should be ignored instead of allowed. This is why
admit supports boolean option refuse, as we can see in the following
example:
#!/usr/bin/env perl
# vim: sts=3 ts=3 sw=3 et ai :
use strict;
use warnings;
use Data::Tubes qw< pipeline >;
my $input = <<'END';
// this line is commented
1,Pearl Jam,Evenflow
2,Prince,Purple Rain
// this too
4,Soundgarden,Black Hole Sun
END
pipeline(
   'Source::open_file',    # source, as before
   'Reader::by_line',      # reader, as before
   ['Validator::admit', qr{(?:\A\s*//)}, {refuse => 1}],
   ['Parser::by_format' => 'id,artist,title'],
   [
      'Renderer::with_template_perlish',
      "[#[% id %]] [% artist %] - [% title %]\n",
   ],
   'Writer::to_files',
   {tap => 'sink'},
)->(\$input);
The output is the same as before. Using refuse might not be the best
move in readability (the main factory is still named admit after all),
which is why there is also refuse as a direct factory, like in the
following example:
#!/usr/bin/env perl
# vim: sts=3 ts=3 sw=3 et ai :
use strict;
use warnings;
use Data::Tubes qw< pipeline >;
my $input = <<'END';
// this line is commented
1,Pearl Jam,Evenflow
2,Prince,Purple Rain
// this too
4,Soundgarden,Black Hole Sun
END
pipeline(
   'Source::open_file',    # source, as before
   'Reader::by_line',      # reader, as before
   ['Validator::refuse', qr{(?:\A\s*//)}],
   ['Parser::by_format' => 'id,artist,title'],
   [
      'Renderer::with_template_perlish',
      "[#[% id %]] [% artist %] - [% title %]\n",
   ],
   'Writer::to_files',
   {tap => 'sink'},
)->(\$input);
The refuse function also has a few variations that come in handy in the following cases:
- if you want to get rid of empty lines
- if you want to get rid of comment lines (Perl/shell style, i.e. starting with a # character)
These functions are:
- refuse_empty allows you to reject empty lines (only)
- refuse_comment allows you to reject comment lines (only)
- refuse_comment_or_empty allows you to reject both empty and comment lines.
These functions do not take any additional arguments... they just do what they advertise. You can still add other validators after them, anyway.
One example should suffice:
#!/usr/bin/env perl
# vim: sts=3 ts=3 sw=3 et ai :
use strict;
use warnings;
use Data::Tubes qw< pipeline >;
my $input = <<'END';
# this line is commented
1,Pearl Jam,Evenflow

# an empty line before, another one after

2,Prince,Purple Rain
# this too
4,Soundgarden,Black Hole Sun
END
pipeline(
   'Source::open_file',    # source, as before
   'Reader::by_line',      # reader, as before
   'Validator::refuse_comment_or_empty',
   ['Parser::by_format' => 'id,artist,title'],
   [
      'Renderer::with_template_perlish',
      "[#[% id %]] [% artist %] - [% title %]\n",
   ],
   'Writer::to_files',
   {tap => 'sink'},
)->(\$input);
All functions so far assume it's OK to just silently ignore non-compliant lines. This is probably OK when you are ignoring empty or comment lines, or when you're just interested in a few lines out of many (like when you are analyzing some input logs).
Other times, a validator is expected to also tell you what's wrong with the input, so that you can record it somewhere. This is when you want to take a look at thoroughly.
The manual page is quite... thorough, but it's worth recapping a few things:
- validation takes place for all validators, i.e. there's no short-circuiting. This is by design, so that you can figure out in one sweep all issues that you might have with your inputs. You get separate feedback for each validator though, so you can later decide what to keep and what to ditch;
- you can wrap each test in a wrapper sub, e.g. if you want to trap exceptions. This is usually a sub reference, although you can also pass the string try, which will automatically use Try::Tiny under the hood (you will need to have it installed, though);
- you can optionally also get the feedback for successful validations, just in case you need it.
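The three points above can be sketched in plain Perl. Mind that this is only an illustration of the idea: collect_outcomes_sketch is a hypothetical helper, it traps exceptions with a plain eval instead of Try::Tiny, and the shape of the triples it returns is an assumption made for this sketch, not the format used by thoroughly (the manual page is the reference for that).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, NOT the Data::Tubes implementation: runs every
# check against the target, never stopping at the first failure, and
# returns one [name, passed, error] triple per check.
sub collect_outcomes_sketch {
   my ($target, @checks) = @_;
   my @outcomes;
   for my $check (@checks) {
      my ($name, $sub) = @$check;

      # Trap exceptions, so that a dying check counts as a failure
      my $passed = eval { $sub->($target) ? 1 : 0 };
      my $error = defined $passed ? undef : ($@ || 'unknown error');
      push @outcomes, [$name, $passed ? 1 : 0, $error];
   }
   return @outcomes;
}

my @outcomes = collect_outcomes_sketch(
   {number => 4},
   ['is-even'   => sub { $_[0]{number} % 2 == 0 }],
   ['in-bounds' => sub { $_[0]{number} >= 10 }],
   ['has-foo'   => sub { die "missing foo\n" unless exists $_[0]{foo}; 1 }],
);

# Keep failures only, or keep everything if successes matter too
my @failures = grep { !$_->[1] } @outcomes;
print "$_->[0] failed\n" for @failures;
```

Running all checks costs a bit more than stopping at the first failure, but you get the whole picture for each record in one sweep.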
It's worth taking a look at the shape of a validator subroutine. Here's the signature and how it is called:
my @outcome = $validator->(
   $target,     # what is pointed to by "input", or the whole record
   $record,     # the whole record, if necessary
   \%args,      # args passed to the factory
   @parameters, # anything after the sub in the array ref version
);
You should normally only need the first parameter $target, but:
- if you need to take a look at the whole record (e.g. to compare values around, etc.) you get it as the second argument. This also allows you to mangle it if you need... although it's probably not a good idea (we will not stop you from shooting yourself in the foot anyway);
- you can pass whatever additional arguments to thoroughly and get them through the hash reference passed as third argument;
- any additional parameters come from your validator definition, see below.
To provide a validator, you can either pass a sub reference, or an array reference shaped like this:
my $array_validator = [
   'Name of the validator, as a string',
   sub {
      # this is the validator!
   },
   @any_additional_parameter_even_none
];
This is useful if you want to associate a name to the tests, and also if you want to pass additional parameters. Do you need them? It's up to you to decide.
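If you are wondering how the two shapes and the additional parameters fit together, the following pure-Perl sketch might help. It does not use Data::Tubes: normalize_validator_sketch is a hypothetical helper, and using input => 'structured' to select the target is an assumption made for this sketch. The validators are called following the signature described above, and unnamed ones get a validator-N name like the one that shows up in the output of the example in this article.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Data::Tubes: turns a validator
# definition (sub reference or array reference) into a uniform
# (name, sub, parameters) list.
sub normalize_validator_sketch {
   my ($definition, $index) = @_;
   return ("validator-$index", $definition) if ref($definition) eq 'CODE';
   my ($name, $sub, @parameters) = @$definition;
   return ($name, $sub, @parameters);
}

my @definitions = (
   sub { $_[0]{foo} =~ /bar|baz/ },    # plain sub, gets a default name
   [
      'in-bounds',
      sub {
         my ($target, $record, $args, $min, $max) = @_;
         return $target->{number} >= $min && $target->{number} <= $max;
      },
      10, 21,    # additional parameters, handed over after \%args
   ],
);

my $record = {structured => {foo => 'baz', number => 3}};
my %args   = (input => 'structured');    # assumption for this sketch
for my $index (0 .. $#definitions) {
   my ($name, $sub, @parameters) =
     normalize_validator_sketch($definitions[$index], $index);
   my $target = $record->{$args{input}};
   my $ok = $sub->($target, $record, \%args, @parameters);
   printf "%s: %s\n", $name, $ok ? 'passed' : 'failed';
}
```

Passing the bounds as parameters lets you reuse the same sub with different limits, instead of hardcoding them in the test.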
After too much talking, let's take a look at an example:
#!/usr/bin/env perl
# vim: sts=3 ts=3 sw=3 et ai :
use strict;
use warnings;
use Data::Tubes qw< pipeline >;
my $input = <<'END';
# this succeeds
foo: bar, number: 12
# this fails due to number: odd and less than 10
foo: baz, number: 3
# this fails due to odd number and non-matching foo
number: 21, foo: galook!
END
pipeline(
   'Source::open_file',    # source, as before
   'Reader::by_line',      # reader, as before
   'Validator::refuse_comment_or_empty',
   'Parser::ghashy',
   [
      'Validator::thoroughly',
      sub { $_[0]{foo} =~ /bar|baz/ },
      ['is-even' => sub { $_[0]{number} % 2 == 0 }],
      ['in-bounds' => sub { $_[0]{number} >= 10 && $_[0]{number} <= 21 }]
   ],
   sub {
      use Data::Dumper;
      $_[0]{rendered} = Dumper [@{$_[0]}{qw< structured validation >}];
      return shift;
   },
   'Writer::to_files',
   {tap => 'sink'},
)->(\$input);
The output in this case allows you to see the validation process too:
shell$ ./thoroughly-00
$VAR1 = [
{
'foo' => 'bar',
'number' => 12
},
undef
];
$VAR1 = [
{
'number' => 3,
'foo' => 'baz'
},
[
[
'is-even',
''
],
[
'in-bounds',
''
]
]
];
$VAR1 = [
{
'number' => 21,
'foo' => 'galook!'
},
[
[
'validator-0',
0
],
[
'is-even',
''
]
]
];
Note: the above example works only as of release 0.734!
There's little to say to conclude this little article: like any other part of Data::Tubes, the validators belong to a toolkit that helps you accomplish your goals, hopefully without getting in the way. Do you need them? Probably not. Are they useful? Mostly yes... but it's up to you to decide!