Perl UTF-8 and Regular Expressions

Today I wanted to explore Perl’s Unicode and regular expression capabilities, so I wrote this simple script. It is quite amazing how simply Perl handles strings and regular expressions! Doing the same in the shell would take several sed or egrep commands chained together with pipes.


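use strict;
use warnings;
use Encode;
# the source file itself is UTF-8, so literals such as 'β' are read as characters
use utf8;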

# mercy in Greek
my $bob  = "<b>Έλεος</b>";

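# take the first argument of the script (raw bytes) and decode it from UTF-8
# into a character string; fall back to an empty string if no argument is given
my $telis = decode('UTF-8', $ARGV[0] // '');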

# code point of the Greek letter beta 'β' (946, U+03B2)
my $ter = ord('β');

my $arithmouba = 2;
# shift left by 3 bits, i.e. multiply by 8: 2 << 3 == 16
$arithmouba <<= 3;

# convert the code point back to the character 'β'
$ter = chr($ter);

# regular expression substitution: insert a newline after every opening <b> tag
$bob =~ s/<b>/<b>\n/g;

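# encode every character string back to UTF-8 bytes before printing;
# $telis was decoded above, so it has to be encoded as well
$bob   = encode('UTF-8', $bob);
$telis = encode('UTF-8', $telis);
$ter   = encode('UTF-8', $ter);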

print "$bob\n$telis\n$ter\n$arithmouba\n";
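
The script above follows a decode, process, encode pattern: bytes come in, they are decoded into characters, and the characters are encoded again on the way out. Below is a small sketch of that pattern, not part of the original script. The input bytes, the variable names and the `\p{Greek}` substitution are my own assumptions chosen for illustration, and `binmode` on STDOUT is used as an alternative to calling `encode` on each variable.

```perl
use strict;
use warnings;
use utf8;                        # literals such as 'β' below are characters
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes of "Έλεος", as a shell would pass them.
my $bytes = "\xCE\x88\xCE\xBB\xCE\xB5\xCE\xBF\xCF\x82";
my $chars = decode('UTF-8', $bytes);

# Bytes versus characters: each Greek letter here takes two bytes in UTF-8.
my $byte_len = length $bytes;    # 10
my $char_len = length $chars;    # 5

# ord/chr round trip, as in the script: 'β' is code point 946 (U+03B2).
my $code   = ord('β');
my $letter = chr($code);

# Unicode-aware substitution on the decoded string:
# put brackets around each run of Greek letters.
(my $tagged = "<b>$chars</b> ok") =~ s/(\p{Greek}+)/[$1]/g;

# Let the output layer do the encoding instead of calling encode() by hand.
binmode(STDOUT, ':encoding(UTF-8)');
print "$byte_len bytes, $char_len characters\n";
print "$code -> $letter\n";
print "$tagged\n";
```

Working on the decoded string is what lets `length` count letters and `\p{Greek}` match whole characters; on the raw bytes the same calls would see ten unrelated octets.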

About cmanios

This entry was posted in Linux.

