[Linux-disciples] Deleting all non-ASCII characters
Stephen R Laniel
steve at laniels.org
Thu Jul 17 10:37:40 EDT 2008
Apropos of Jamie's question, here's a
script to delete all non-ASCII
characters in a file. It uses the
[:ascii:] POSIX character class, which
seems to be documented only here:
http://perldoc.perl.org/perlreref.html
That's my only complaint about Perl's
documentation, which is otherwise
stellar: there's both a 'perlre' page:
http://perldoc.perl.org/perlre.html
and a perlreref page, linked further up.
Those should, ostensibly, be identical.
Yet they're not. And finding the
character-class documentation is harder
than it should be.
Anyway, here's the script:
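#!/usr/bin/perl
use strict;
use warnings;

# Read standard input, or the files named
# on the command line, one line at a time.
while (<>) {
    # Delete every character outside the ASCII range.
    s#[^[:ascii:]]##g;
    print;
}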
This has other advantages over the
script that I posted earlier. Among
other things, it only reads one line at
a time, so it doesn't use much memory
at all.
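
For anyone who wants to try the substitution without a test file, here
is a minimal sketch that applies the same regex to an in-memory string.
The helper name strip_non_ascii and the sample text are made up for
illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns a copy of
# its argument with every non-ASCII character removed.
sub strip_non_ascii {
    my ($text) = @_;
    # [:ascii:] matches code points 0 through 127, so the negated
    # bracketed class matches everything else.
    $text =~ s/[^[:ascii:]]//g;
    return $text;
}

# Sample input written with \x escapes so this file itself stays ASCII.
my $sample = "caf\xe9 na\xefve\n";
print strip_non_ascii($sample);    # prints "caf nave"
```

Note that the characters are deleted outright rather than replaced, so
the accented letters in the sample simply vanish from the output.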
--
Stephen R. Laniel
steve at laniels.org
Cell: +(617) 308-5571
http://laniels.org/
PGP key: http://laniels.org/slaniel.key