Reading an untagged file <= 8 bytes in size causes output encoding differences #84

@chrishodgins

Description

With the following Perl program, the output appears corrupted unless the file is larger than 8 bytes. The file untagged-file-with-ebcdic.txt is untagged and contains only EBCDIC characters.

Perl test program:

open(my $fh, '<', 'untagged-file-with-ebcdic.txt') or die "Cannot open file: $!";
while (my $row = <$fh>) {
	chomp $row;
	print "$row\n";
}
close($fh);

Shell example:

$ chtag -r untagged-file-with-ebcdic.txt
$ od -Ax -xc untagged-file-with-ebcdic.txt
0000000000      F1F2    F3F4    F5F6    F715
               1   2   3   4   5   6   7  \n
0000000008
$ perl test.pl
�������

### Now try again with slightly bigger contents
$ od -Ax -xc untagged-file-with-ebcdic.txt
0000000000      F1F2    F3F4    F5F6    F7F8    1500
               1   2   3   4   5   6   7   8  \n
0000000009
$ perl test.pl 
12345678

Repeating the same sequence with the file tagged as IBM-1047:

$ chtag -tc IBM-1047 untagged-file-with-ebcdic.txt
$ od -Ax -xc untagged-file-with-ebcdic.txt
0000000000      F1F2    F3F4    F5F6    F715
               1   2   3   4   5   6   7  \n
0000000008
$ perl test.pl
1234567

### Now try again with slightly bigger contents
$ od -Ax -xc untagged-file-with-ebcdic.txt
0000000000      F1F2    F3F4    F5F6    F7F8    1500
               1   2   3   4   5   6   7   8  \n
0000000009
$ perl test.pl 
12345678
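
For anyone trying to reproduce this, the two file contents shown in the od dumps can probably be recreated with printf octal escapes. This is only a sketch: the file names ebcdic-8-bytes.txt and ebcdic-9-bytes.txt are made up for illustration, and it assumes the shell's printf writes the escaped bytes verbatim. Depending on the environment, the new files may need `chtag -r` afterwards to make sure they are untagged, as in the transcript above.

```shell
#!/bin/sh
# Recreate the byte sequences from the od dumps in this report.
# Octal \361..\370 = 0xF1..0xF8 (EBCDIC '1'..'8'); \025 = 0x15 (EBCDIC newline).
# File names are hypothetical, chosen only for this sketch.

# 8 bytes total: "1234567" plus newline (the case that prints garbage)
printf '\361\362\363\364\365\366\367\025' > ebcdic-8-bytes.txt

# 9 bytes total: "12345678" plus newline (the case that prints correctly)
printf '\361\362\363\364\365\366\367\370\025' > ebcdic-9-bytes.txt

# Print the byte counts; these should be 8 and 9 respectively
wc -c < ebcdic-8-bytes.txt
wc -c < ebcdic-9-bytes.txt
```

Pointing the Perl test program at each file in turn should then show whether the 8-byte boundary is what triggers the difference.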
