Perl UTF-8 and Regular Expressions

Today I wanted to explore Perl’s Unicode and regular expression capabilities, so I wrote this short script. It is quite amazing how simply Perl handles strings and regular expressions! In the shell, the same work would take several sed or egrep commands chained together in pipelines.

#!/usr/bin/perl

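use strict;
use warnings;
use utf8;    # the source file itself contains UTF-8 string literals
use Encode;  # provides encode() and decode()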

# mercy in Greek
my $bob  = "<b>Έλεος</b>";

# take the script's first argument (raw bytes) and decode it from UTF-8
# into a character string; fall back to an empty string if none was given
my $telis = decode('UTF-8', $ARGV[0] // '');

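# code point of the Greek letter beta 'β' (U+03B2)
my $ter = ord('β');
# move four code points forward, to zeta 'ζ' (U+03B6)
$ter += 4;

# left-shift by three bits: 2 * 2**3 = 16
my $arithmouba = 2;
$arithmouba <<= 3;

# convert the code point back to a character
$ter = chr($ter);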

# regular expression substitution: insert a newline after every <b> tag
$bob =~ s/<b>/<b>\n/g;

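# encode the character strings back to UTF-8 bytes for output
# ($telis was decoded above, so it has to be encoded again as well)
$bob = encode('UTF-8', $bob);
$telis = encode('UTF-8', $telis);
$ter = encode('UTF-8', $ter);
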
print "$bob\n$telis\n$ter\n$arithmouba\n";
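
The script above encodes each variable by hand before printing. As a rough sketch of another way to do it, the encoding can be left to an output layer on STDOUT, so the strings stay as character strings until they are printed. The helper names shift_letter and break_after_bold are made up for this illustration and are not part of the original script; the `//` operator assumes Perl 5.10 or newer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                # string literals below are UTF-8 source text
use Encode qw(decode);

# Encode everything printed to STDOUT as UTF-8,
# so no per-variable encode() call is needed.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: move a character $offset code points forward.
sub shift_letter {
    my ($char, $offset) = @_;
    return chr(ord($char) + $offset);
}

# Hypothetical helper: return a copy with a newline after every <b> tag.
sub break_after_bold {
    my ($html) = @_;
    (my $copy = $html) =~ s/<b>/<b>\n/g;
    return $copy;
}

# Command-line arguments arrive as bytes, so they still need decoding.
my $telis      = decode('UTF-8', $ARGV[0] // '');
my $bob        = break_after_bold("<b>Έλεος</b>");
my $ter        = shift_letter('β', 4);   # U+03B2 + 4 = U+03B6
my $arithmouba = 2 << 3;                 # 2 * 2**3

print "$bob\n$telis\n$ter\n$arithmouba\n";
```

Only the input still needs an explicit decode here; the output layer takes care of every print, which makes it harder to forget one variable.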