[Linux-disciples] Deleting all non-ASCII characters

Stephen R Laniel steve at laniels.org
Thu Jul 17 10:37:40 EDT 2008


Apropos of Jamie's question, here's a
script to delete all non-ASCII
characters in a file. It uses the
[:ascii:] POSIX character class, which
seems to be documented only here:
http://perldoc.perl.org/perlreref.html

That's my only complaint about Perl's
documentation, which is otherwise
stellar: there's both a 'perlre' page:
http://perldoc.perl.org/perlre.html

and a perlreref page, linked further up.
Those should, ostensibly, be identical.
Yet they're not. And finding the
character-class documentation is harder
than it should be.

Anyway, here's the script:

#!/usr/bin/perl
use strict;
use warnings;
while(<>) {
    s#[^[:ascii:]]##g;
    print;
}

This has other advantages over the
script that I posted earlier. Among
others, it reads only one line at a
time, so it uses very little memory.
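
To see what the substitution actually
removes, here is a minimal sketch that
applies the same regex to a single
string. The helper name strip_non_ascii
and the sample text are made up for
illustration; they are not part of the
script above. It assumes the input is
treated as raw bytes, so each byte of a
multi-byte UTF-8 character is dropped
separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return a copy of
# the string with every character outside the ASCII range removed.
sub strip_non_ascii {
    my ($text) = @_;
    $text =~ s/[^[:ascii:]]//g;
    return $text;
}

# "\xc3\xa9" is the UTF-8 byte sequence for e-acute. Both bytes are
# above 127, so both are deleted. The newline is ASCII and is kept.
my $sample = "caf\xc3\xa9 au lait\n";
print strip_non_ascii($sample);
```

Tabs, newlines and other control
characters are ASCII too, so they pass
through untouched.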

-- 
Stephen R. Laniel
steve at laniels.org
Cell: +(617) 308-5571
http://laniels.org/
PGP key: http://laniels.org/slaniel.key

