So you REALLY don’t know regular expressions?

Ever since I started my new job, I’ve noticed a curious phenomenon. I work with two wonderfully gifted programmers who both know PHP much better than I do, and I learn something new from them all the time. However, neither one of them really knows or uses regular expressions.

Now, as I learned Perl before I learned PHP, naturally I learned regular expressions quite early on in that process. In Perl, regular expressions are a huge part of the language – you simply cannot get away without learning them to some extent as they are used extensively in so many parts of the language.

Apparently I’m not the only one to notice this. Here’s a quote I found on Stack Exchange:

In earlier phases of my career (ie. pre-PHP), I was a Perl guru, and one major aspect of Perl gurudom is mastery of regular expressions.

On my current team, I’m literally the only one of us who reaches for regex before other (usually nastier) tools. Seems like to the rest of the team they’re pure magic. They’ll wheel over to my desk and ask for a regex that takes me literally ten seconds to put together, and then be blown away when it works. I don’t know–I’ve worked with them so long, it’s just natural at this point.

In the absence of regex-fluency, you’re left with combinations of flow-control statements wrapping strstr and strpos statements, which gets ugly and hard to run in your head. I’d much rather craft one elegant regex than thirty lines of plodding string searching.

While I would hesitate to call myself a Perl guru (at best I would call myself intermediate with Perl), I would say I know enough about regular expressions that I can generally get useful work done with them.

Take the following example in Perl (edited somewhat as it didn’t play nice with TinyMCE):

$fruit = "apple,banana,cherry";
print $fruit."\n";
@fruit = split(/,/,$fruit);
foreach(@fruit){print $_."\n";}
apple,banana,cherry
apple
banana
cherry

Now, this code should be fairly easy to understand, even if you don’t really know Perl. $fruit is a string containing “apple,banana,cherry”. The split() function takes two arguments: a regular expression defining the character(s) that separate the parts of the string you want to put into an array, and the string you want to split. This returns the array @fruit, which consists of three strings, “apple”, “banana”, and “cherry”.

In PHP, you can do pretty much the same thing, using the explode() function:

$fruit = "apple,banana,cherry";
echo $fruit."\n";
$fruitArray = explode(",",$fruit);
foreach($fruitArray as $fruitArrayItem)
{
echo $fruitArrayItem."\n";
}
apple,banana,cherry
apple
banana
cherry

As you can see, they work in pretty much the same way here. Both return basically the same output, and the syntax for using the appropriate functions for splitting the strings is virtually identical.

However, it’s once things get a bit more difficult that it becomes obvious how much more powerful regular expressions are. Say you’re dealing with a string that’s similar to that above, but may use different characters to separate the elements. For instance, say you’ve obtained the data that you want to pass through into an array from a text file and it’s somewhat inconsistent – perhaps the information you want is separated by differing amounts and types of whitespace, or different characters. The explode() function simply won’t handle that (at least, not without a lot of pain). But with Perl’s split() function, that’s no problem. Here’s how you might deal with input that had different types and quantities of whitespace as a separator:
@fruit = split(/\s+/,$fruit);
Yes, it’s that simple! The \s metacharacter matches any type of whitespace, and the + quantifier means that it will match one or more times. Now you can very easily convert the contents of that string into an array.
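
To make that concrete, here is a minimal sketch using a made-up input string (the sample data is my own assumption, not taken from a real file). It also shows one wrinkle worth knowing about: if the string starts with whitespace, /\s+/ gives you an empty first element, whereas the special pattern ' ' skips leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample input: names separated by a mix of spaces, tabs and newlines
my $fruit = "apple  banana\tcherry\n damson";

# Split on any run of one or more whitespace characters
my @fruit = split(/\s+/, $fruit);
print "$_\n" foreach @fruit;

# Leading whitespace produces an empty first element with /\s+/ ...
my @padded = split(/\s+/, "  apple banana");   # ("", "apple", "banana")

# ... whereas the special pattern ' ' skips leading whitespace
my @trimmed = split(' ', "  apple banana");    # ("apple", "banana")
```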

Or say you want to convert an entire string of text, with all kinds of punctuation and whitespace, into an array, but only keep the actual words. This wouldn’t be practical with explode(), but with split() it’s easy:
@fruit = split(/\W+/,$fruit);
The \W metacharacter matches any non-word character (ie anything other than a-z, A-Z, 0-9 or the underscore), and again the + quantifier means that it will match one or more times.
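
As a rough illustration, here is that split applied to an invented sentence (the text is purely my own example). Bear in mind that an apostrophe counts as a non-word character, so “don’t” would come out as “don” and “t”. If the string might start with punctuation, matching the words directly with /(\w+)/g avoids an empty first element.

```perl
use strict;
use warnings;

# Hypothetical sample text with assorted punctuation
my $text = "Apples, pears; and (of course) cherries!";

# Split on runs of non-word characters, keeping only the words
my @words = split(/\W+/, $text);
print "$_\n" foreach @words;

# Alternative: collect the words themselves rather than splitting on the gaps
my @matched = "(Really?) Yes, really." =~ m/(\w+)/g;
```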

And of course, regular expressions are useful for many more tasks than this that, while possible with most languages’ existing string functions, can get very nasty quite quickly. Say you want to match a UK postcode to check that it’s valid (note that for the sake of simplicity, I’m going to ignore BFPO and GIR postcodes). These use a format of one or two letters, followed by one digit, then optionally an additional digit or letter, then a space, then a digit, then two letters. This would be a nightmare to check using most languages’ native string functions, but with a regex in Perl, it’s relatively simple:

my $postcode = "NR1 1NP";
if($postcode =~ m/^[a-zA-Z]{1,2}\d[a-zA-Z0-9]?\s*\d[a-zA-Z]{2}$/)
{
print "It matched!\n";
}

And if you wanted to keep just the first part of the postcode when it matches, that’s simple too, since a substitution can replace the whole string with the captured group:
my $postcode = "NR1 1NP";
if($postcode =~ s/^([a-zA-Z]{1,2}\d[a-zA-Z0-9]?)\s*\d[a-zA-Z]{2}$/$1/)
{
print "It matched! $postcode\n";
}
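
If you’d rather leave $postcode untouched and get both halves back, capture groups will do that as well. The following is only a sketch: split_postcode is a name I’ve made up for illustration, and the pattern is a variation on the one above that assumes the last two characters must be letters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two halves of a plausible postcode
# in upper case, or an empty list if the string doesn't match
sub split_postcode
{
    my $postcode = shift;
    if($postcode =~ m/^([a-zA-Z]{1,2}\d[a-zA-Z0-9]?)\s*(\d[a-zA-Z]{2})$/)
    {
        return (uc $1, uc $2);
    }
    return;
}

my ($first, $second) = split_postcode("NR1 1NP");
print "First part: $first, second part: $second\n";
```

Because the match is not a substitution, the original string is left exactly as it was.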

Now, you may say “But that’s in Perl! I’m using PHP!”. Well, regular expressions are an extremely powerful and useful part of PHP too; they’re just not as central to the language as they are in Perl. PHP actually has two distinct types of regular expressions – POSIX-extended regular expressions, and Perl-compatible regular expressions (or PCRE). However, POSIX-extended regular expressions were deprecated from PHP 5.3 onwards, so it’s not really worth taking the time to learn them when PCRE will do the same job and is going to be around for the future. Furthermore, most other programming languages also support Perl-style regular expressions, so they’re fairly portable between languages. In other words, if you learn how to work with regular expressions in Perl, you can very easily transfer that knowledge to most other programming languages that support them.

In the first example given above, we can replace explode() with preg_split(), and the syntax is virtually identical to split() in Perl, with the only differences being the name of the function and that the pattern to match is wrapped in double quotes:

$fruit = "apple,banana,cherry";
echo $fruit."\n";
$fruitArray = preg_split("/,/",$fruit);
foreach($fruitArray as $fruitArrayItem)
{
echo $fruitArrayItem."\n";
}
apple,banana,cherry
apple
banana
cherry

Along similar lines, if we want to check if a string matches a pattern, we can use preg_match(), and if we want to search and replace, we can use preg_replace(). PHP’s regular expression support is not appreciably poorer than Perl’s, even if it’s less central to the language as a whole.

But regular expressions are slower than PHP’s string functions!

Yes, that’s true. So it’s a mistake to use regular expressions for something that can be handled quickly and easily using string functions. For instance, if in the following string you wanted to replace the word “cow” with “sheep”:

The cow jumped over the moon

You could use something like this:

$text = "The cow jumped over the moon";
$text = preg_replace("/cow/","sheep",$text);

However, because here you are only looking to match literal characters, you don’t need to use a regular expression. Just use the following:
$text = str_replace("cow","sheep",$text);

But, if you have to do some more complex pattern matching, you have to start using strpos to get the location of specific characters and returning substrings between those characters, and it gets very messy, very quickly indeed. In those cases, while I haven’t done any kind of benchmarking on it, it stands to reason that quite quickly you’ll reach a point where a regex would be faster.
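
To give a flavour of the difference, here is a sketch in Perl, where index() and substr() play roughly the role that strpos() and substr() play in PHP. Both subroutines pull the text between the first pair of square brackets out of a line. The function names and the sample log line are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical example: extract the text between the first [ and ] by hand
sub tag_with_index
{
    my $line = shift;
    my $start = index($line, "[");
    return if $start == -1;
    my $end = index($line, "]", $start);
    return if $end == -1;
    return substr($line, $start + 1, $end - $start - 1);
}

# The same job done with a single regular expression
sub tag_with_regex
{
    my $line = shift;
    return unless $line =~ m/\[([^\]]*)\]/;
    return $1;
}

my $line = "2011-10-14 [ERROR] Disk full";
print tag_with_index($line), "\n";
print tag_with_regex($line), "\n";
```

Even for something this small the hand-rolled version needs two searches and some arithmetic, and every extra rule (optional whitespace, several tags per line) adds more of the same.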

However, for a number of common tasks, such as validating email addresses and URLs, there’s another way and you don’t need to resort to regular expressions, or faffing about with loads of string functions. The filter_var() function can be used for validating or sanitising email addresses and URLs, among other things, so this is worth using instead of writing a regex. If you’re using a framework such as CodeIgniter, you may have access to its native functions for validating this kind of thing, so you should use those instead.

But regular expressions are ugly and make for less readable code!

Not really. They seem intimidating to the newcomer, and very few people can just glance at a regex and instantly know what it does. But with regexes, you can often do complex things in far fewer lines of code than would be needed to accomplish the same thing using just PHP’s string functions. If you can do something in a line or two using string functions, it’s probably best to do that. But after that, things go downhill very quickly.

Once you learn them, regular expressions really are not that hard, and you’ll probably find enough things to use them for that you’ll get plenty of practice at them. They’re certainly more readable to anyone with even a modicum of experience using them than line after line of flow-control statements.
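
If readability is the worry, Perl’s /x modifier helps a lot: it lets you spread a pattern over several lines and comment each piece. Here is a sketch of a postcode check written that way. The pattern is my own variation on the earlier one and is an assumption rather than a complete validator.

```perl
use strict;
use warnings;

# With /x, whitespace in the pattern is ignored and # starts a comment
my $postcode_re = qr/
    ^
    [a-zA-Z]{1,2}     # one or two letters
    \d                # one digit
    [a-zA-Z0-9]?      # optionally one more digit or letter
    \s*               # optional whitespace
    \d                # one digit
    [a-zA-Z]{2}       # two letters
    $
/x;

foreach my $candidate ("NR1 1NP", "EC1A 1BB", "NR 1NP")
{
    my $verdict = $candidate =~ $postcode_re ? "looks valid" : "does not match";
    print "$candidate $verdict\n";
}
```

PCRE supports the same modifier, so the layout carries across to preg_match() in PHP.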

But you shouldn’t be using regular expressions for parsing HTML or XML!

Quite true. Regular expressions are the wrong tool for that. You should probably use an existing library of some kind for that.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Ah, yes, surely one of the most misused quotes on the web! Again, regular expressions are not the right tool for every job, and there are a lot of tasks they get used for that, quite frankly, they shouldn’t be. Most of us who know regular expressions have been known to use them for things we probably shouldn’t (I actually only just stumbled across filter_var, so I’ve done my share of validating email addresses using regexes, and I’m as guilty as anyone else of overusing them). But there’s still plenty of stuff you should use them for when what you need to do can’t be accomplished quickly and easily using string functions.

Regular expressions are not inherently evil. They’re a tool like any other. What is bad is using them for things where a simple alternative exists. However, they are still extremely useful, and there are plenty of valid use cases for them.

Github

To date, Subversion is the single versioning system I have the most experience with. I use it at work, and I was already somewhat familiar with it beforehand. However, with all the buzz over Git over the last few years, it’s always been tempting to explore that as an alternative.

I’ve had a Github account for over a year, but had as yet not added anything to it. However, today that changed. I’ve had a rather haphazard approach towards my .vimrc and other Vim configuration files for a while, with the result that they tend to be less than consistent across different machines. I’ve seen that a fair number of people put their Vim configuration files under version control, and that seemed like an effective solution, so I’ve gotten my .vimrc and .vim into a respectable state and added them to a new repository. Now I should have no excuse for letting them get out of sync.

I have to say, Github is a truly wonderful service. The tutorials for getting started with Git are really good, and make it easy to get going. It’s probably one of the main reasons why Git is becoming more and more popular – there isn’t really anything comparable for Subversion.

Linux in the workplace

At the start of September I left my customer services role and started a new position as a web developer. I won’t give the name of either my old or new employer, but I will say that the new role is with a much smaller company, and the part I work for now is an e-commerce store that enjoys a significant degree of independence from the parent company. There are only two developers including myself, and we are solely responsible for the company’s IT infrastructure, and we don’t have the hassle of dealing with legacy applications or infrastructure. We therefore have considerable freedom in terms of what we choose to use to get our work done.

When I first started, I used Windows XP Professional since that was what my work laptop came with, but it soon became obvious that there wasn’t actually anything I specifically needed to be using Windows for. I mostly work on the company’s intranet, which doesn’t really need to be tested in Internet Explorer as we use Firefox internally. For email and calendar, we use Google Apps, which works fine with virtually any email client that supports IMAP, so I was using Thunderbird with the Lightning plugin. When coding I used Netbeans with the jVi plugin for most of my work, with occasional usage of Vim for writing shorter scripts. I used AppServ to provide local versions of Apache, MySQL and PHP, and I used PHPMyAdmin to interact with the database. For version control, I used Subversion. From time to time I need to remote into another machine using VNC, SSH or RDP, for which I used mRemote, but I was confident I could find an equivalent application. Also, we use Ubuntu on most of our servers, so it made a lot of sense from a compatibility point of view to also use it on my own desktop. From time to time, I also found myself writing bash or Perl scripts for systems administration purposes, and since it wasn’t really very practical to do that in Windows when it was going to be running in Ubuntu, I’d used an Ubuntu Server install in Virtualbox to write it, but it was obvious that running Ubuntu as my desktop OS would make more sense.

As Ubuntu 11.10 was due a little over a month after I first started, I decided to hold off making the switch until then so I could start with the most recent version and not have the hassle of upgrading an existing install. I had already downloaded the 64-bit version of Ubuntu 11.10 for my home machines and burned it to a CD, so I brought the CD into work and set up a dual boot so I could revert back to XP if anything went wrong, and also so I could easily copy across any files I needed from the Windows partition.

It took a fair while to get everything I wanted installed, but a lot less time than it would have taken if I’d set up Windows XP from scratch. The hardware all worked fine out of the box, and most of the software I needed was in the repositories. The only thing that I really needed that wasn’t there was Netbeans (which has apparently now been removed from the repositories), but the version in the Ubuntu repositories has never been very up-to-date anyway. Instead I installed the version of Netbeans available on the website, and that has worked fine for me. While there wasn’t a version of mRemote available, I did discover Remmina, which has proven to be an excellent client for SSH, RDP and VNC, to the point that I’ve now stopped using the terminal to connect via SSH in favour of using Remmina instead. Thunderbird does just as good a job with my email and calendar as it does on Windows, and I also have Mutt available. Naturally, it couldn’t be simpler to install a full LAMP stack and PHPMyAdmin either. In fact, the only application that I use much that I couldn’t get a decent version of was MySQL Workbench, and that was only because Oracle haven’t yet released a version for Ubuntu 11.10 (tried the version for 11.04, but it doesn’t seem to work), but I can live without that.

What’s interesting is that despite all the scaremongering I’ve heard over the years about how Linux isn’t ready for the workplace, I’ve as yet had no problems whatsoever. For everything I used in Windows, it was either available on Ubuntu, or there was a viable equivalent, or I could get by fine without it. Granted, the nature of my work means I have little need for the small amount of functionality that Microsoft Office has and LibreOffice doesn’t, and I don’t need to use the kind of ghastly legacy apps written in Visual Basic that most large enterprises commonly use, but I haven’t noticed any significant barriers to my productivity.

In fact, if anything I’m considerably more productive. I know people like to rag on Unity, and I wasn’t happy with it in the netbook edition of Ubuntu 10.10 myself, but in 11.10 it’s really starting to show its promise, and I haven’t had any problems with it. The fact that I know Ubuntu a lot more thoroughly than I do Windows, purely from my own experience at home, means that I can get things done a lot quicker, but also the whole package management system means I’m largely free from the annoyances of opening an application in the morning to be confronted with an update dialogue, quite apart from the fact that very few updates require a restart. I’d go so far as to say that I’ve been more productive using Ubuntu at work than I would have been with either Windows 7 or OS X (and over the last few years I’ve used Windows Vista, Windows 7 and OS X fairly extensively).

I really don’t want this to turn into Yet Another Year of the Linux Desktop blog post, because that’s rather a tired old cliche, but I have absolutely no problems whatsoever getting my work done on Ubuntu. I’ll concede that as a developer I have significant freedom that isn’t often afforded to other people, and running some flavour of Unix makes a lot of sense if you’re a developer working with one of the open-source server-side languages such as PHP or Python (if I were a .NET developer, it would make rather less sense). I’m also lucky to be in a position where I don’t have to worry about legacy apps or IE compatibility too much. Nonetheless, it’s still remarkable how smoothly my migration across to Ubuntu on my work desktop has gone, and the extent to which I find it’s improved my workflow.

Hacked!

Had a rather unfortunate incident last month – someone hacked into my Pogoplug mail server, and managed to get their mitts on my .fetchmailrc, which had all the login details for several email accounts. They promptly began sending spam out using my Gmail account.

Naturally this meant I spent ages running round like a headless chicken trying to lock them out – when I first noticed that they’d been sending emails directly from my mail server, I logged into it via SSH and shut it down, then changed the passwords on all my email accounts.

Thinking logically, there were four services that I had forwarded ports to the server for – SSH, Apache, Postfix and Dovecot. Now, I was running SSH on a non-standard port, had disabled root access, and didn’t allow password authentication (SSH keys only). Also, I had enabled DenyHosts, so I’m fairly confident SSH was not the point of entry.

So that leaves either Apache, Postfix or Dovecot. I had noticed in the error logs a lot of characters prefixed with backslashes, and wondered if someone was trying some kind of shellcode injection, and to be safe I had added new iptables rules to blacklist the IP addresses responsible. I had done what I could to secure Apache, but I can’t rule it out as the application that was compromised. I went through the server logs, but without finding anything – I’m guessing whoever was responsible deleted the appropriate entries in the log files. I couldn’t be sure that the server could still be trusted, so I did a fresh install, and have disabled port forwarding on my router.

This has certainly made me much more cautious and suspicious about security, which I guess can’t be a bad thing. Even beforehand, I found it pretty scary to see the sheer number of script kiddies who will try to hack into any server on the Internet.

New phone

On Friday of last week I unexpectedly got a text from Vodafone saying I was able to upgrade my phone early. I was pretty pleased about this as having been something of an Android early adopter, I was still using an early Android phone, namely my HTC Magic. While a fine phone when it was released, it was only the second Android phone to become available in the UK and was therefore a bit dated compared to newer devices. It has been upgraded to Froyo (albeit a cut-down custom build) but that did slow the phone down somewhat.

So as soon as I had the opportunity I had a good look around for a new one to replace it. Right from the start I had my eye on the HTC Desire Z. Much as I love touchscreen phones, it’s very often extremely handy to have a physical keyboard, and as I’ve found myself using ConnectBot to connect to my home server via SSH a lot, the keyboard-toting Desire Z immediately had an advantage over the touchscreen-only models. Ideally I didn’t want to change my plan, so I checked out the deals for HTC phones on the same plan, and the Desire Z happened to be the only one on the same plan, so it was a no-brainer.

I got the phone on Monday, and it is amazing. The keyboard is easy to use and works well, the phone is lightning fast, and the UI is spot-on – it has everything I love about Android on the Magic (like the great notification system) and more. In particular I love the RSS reader – it syncs with Google Reader, so if I have to wait for a train, I can at least read some feeds while I’m waiting.

One thing I’m hoping to get more use out of is SL4A. I had this on my Magic, but coding on a touchscreen phone is not easy! I’m hoping that with the Desire Z’s keyboard, this will be a lot more useful.

More on my mini server

While I was very pleased to get a proper Linux distro working on my Pogoplug, the Arch-based Plugbox Linux was never really my cup of tea. While it’s a fine distro, I always felt that Debian would have been a much better fit. Partly this is because Debian has established a strong history of being a solid, stable distro that would carry on working no matter what, whereas Arch is more bleeding-edge. Also, Debian has a colossal repository that included a lot of software I wanted that wasn’t in the Arch repositories and I couldn’t get to install or compile from source, such as procmail and Squirrelmail. Debian also has strong support for many different processor architectures, including armel. Finally, being an Ubuntu user on the desktop, Debian is a distro that feels much more familiar to me.

So I eventually gave up on running Plugbox Linux and took the opportunity of the release of Debian Squeeze to install it on my Pogoplug, thanks to this tutorial. With that done, I set about adding my favourite applications. Byobu is a really handy tool that makes GNU screen significantly more intuitive and useful, so that’s always one of the first things to go on, and one that I’d really missed in Plugbox. I’ve now gotten my mail server working again, with the addition of procmail as my mail filter and Squirrelmail to give me a web interface. I’ve also set up Leafnode on there as I’d really like to learn more about Usenet, and I’m beginning to get the hang of using slrn to read it.

It’s amazing how much running my own server has taught me about security. I was staggered to see the sheer number of attempts by script kiddies to connect via SSH to my Pogoplug, and it really made me start thinking about security in a way I’d never bothered beforehand. I’ve installed denyhosts to block attempts to brute-force the password, and made sure I chose a good password. I’ve also set OpenSSH to listen on a different port, which should hopefully decrease the number of login attempts substantially (I presume most of these were just script kiddies scanning large blocks of IP addresses looking for hosts with port 22 open), and have disabled root login (right now my login is the only one that is allowed via SSH, so if anyone does bother to do a more thorough scan and try to connect to the port I’m running SSH on, they’ll need to guess my username AND password, and do so before denyhosts kicks them off – a pretty tall order).

The whole concept of “plug servers” is one I really like, and my experience with the Pogoplug has been extremely good – it’s an inexpensive and extremely hackable device that has been an absolute pleasure to use.

My new mini server

For a while now I’ve wanted a home server of some description, the idea being that it was something I could use to run a web server for development purposes, and a mail server so I could have an offline backup for my Gmail account (considering how much I rely on it, it’s only prudent to plan for what might happen if Gmail went down), and whatever else I need. Also, I only have laptops at present so I liked the idea of having something I could leave on all the time and connect to remotely via SSH.

Around Christmas, I read a forum post by someone who’d bought a PogoPlug cheap from PC World and had hacked it into a web server using Plugbox Linux, an Arch-based Linux distro. Shortly afterwards, I went into a branch of Currys in Norwich, and they had one on sale (£20 off the RRP of £70), so I shelled out for it. I already had a load of USB flash drives lying around, and an 8GB one is big enough for what I had in mind. After all, I wasn’t going to be serving anything that demanding over it, so something small and low-powered should be fine.

This weekend I finally got round to getting it set up. The PogoPlug service is actually pretty good – if you’re unfamiliar with it, it’s basically a self-hosted version of Dropbox, where you buy the device, connect it to your router, attach up to 4 flash drives or hard drives via USB, then share the files stored on them easily across your home network or over the Internet. However, this wasn’t really what I wanted.

Installing Plugbox Linux wasn’t hard – I merely had to activate SSH from the PogoPlug’s control panel, connect and kill the hbwd process, then install a new bootloader to enable it to boot the new OS. Once that was done, it was a case of attaching a flash drive, ensuring it was correctly mounted and the filesystem was set up properly, then downloading the Plugbox Linux tarball and unpacking it on the flash drive, before rebooting into the new OS.

Once it was installed, it wasn’t too hard to get the hang of pacman. I’d prefer it to have been Debian-based as that’s what I’m most familiar with, but that’s just personal preference. After a little tinkering I now have Postfix and Dovecot working on there, as well as Apache (although it might make sense to switch to something lighter, such as lighttpd or Cherokee). I’ve given it a fully qualified domain name via a free subdomain at dyndns.org, and I can now access emails on there via IMAP. Outgoing email works fine too, so I can always set up a Perl script or two to notify me if anything goes wrong by sending an email to my Gmail account. I’ve set up fetchmail to pull emails from my Gmail account via POP3, so all my email is in the process of being backed up on there, and I can use my phone to access it via IMAP, or SSH in and read it with Mutt. Going forwards, I may install Squirrelmail as well to give me more options.

One thing I’m not too sure about – I couldn’t get incoming mails to work, and I’m unsure whether this is because it’s using a subdomain (the email address is basically matthew@mydomainname.dyndns.org) or Postfix is merely misconfigured. Is it possible to receive emails to a subdomain in this fashion?

Anyway, this is a really great little machine and it’s been lots of fun getting it set up. I have to say, though, I’m really disappointed with the range of home server and NAS products currently on the market. Most of the NAS systems offer very little in the way of functionality or customisability, and most of the home servers are a bit too big, powerful and expensive, and usually run Windows Home Server, which isn’t really my cup of tea.

What I’d like to see is a small home server with a couple of hard drive bays at most, and a Debian or Ubuntu-based OS with access to apt-get and tasksel, so it’s easy to install whatever you want from the repositories. Also, give it a web interface that’s simpler than Webmin and makes it quick and easy to set up common software, but offer an advanced option for those that want it. That would be a fantastic device for end users – if it made it easy to set up a UPnP server, a Firefly server, or a BitTorrent client, that would be really useful.

A slight change…

Just to say I’ve changed the contact form I use on here. I always wanted to use one with a built-in CAPTCHA facility, as Disqus seems to have pretty much killed off the comment spam, but I was still getting it via the contact form. I’ve put off doing something about it till now, but it was getting out of hand so I’ve gone and found a new contact form. Let’s hope this kills the spam…

Deleting unwanted Vim swap files using Perl

Yesterday I realised that I had somehow managed to scatter Vim swap files all across the Dropbox folder I use to share Perl and Python scripts I’d written between several computers, and it would be a good idea to clear them up. I didn’t like the idea of using grep to search for them and manually deleting them, so I decided this was the ideal opportunity to write a Perl script to do it for me!

I came up with the following:


#!/usr/bin/perl -w

use strict;
use Cwd;

sub searchDir
{
    # Subroutine to scan a directory looking for Vim swap files
    # Get directory to read and current directory
    my $readdir = shift;
    my $startdir = cwd();

    # Change directory to the target one
    chdir($readdir) or die "Unable to open $readdir! $!\n";
    print "Scanning contents of directory " . cwd() . "\n";

    # Open the directory and grab the names of all the files and folders in it
    opendir(DIR, ".") or die "Unable to open current directory! $!\n";
    my @entries = readdir(DIR) or die "Unable to read directory! $!\n";
    closedir(DIR);

    # Loop through the files and folders in the directory
    foreach my $entry (@entries)
    {
        # Skip this one and the one above it in the filesystem hierarchy
        next if($entry eq ".");
        next if($entry eq "..");

        # If a file is a directory, call the searchDir subroutine recursively in order to scan it
        if(-d $entry)
        {
            searchDir($entry);
            next;
        }

        # Use a regular expression to check to see if the current file starts with a period, and ends with .swp - if it does, it's a Vim swap file
        if($entry =~ m/^\..*\.swp$/)
        {
            # Inform the user that a Vim swap file has been found and print out the path to it
            print "Found a Vim swap file!\n";
            my $swppath = cwd();
            print "It's the file $entry in $swppath.\n";
            my $fullpath = $swppath . "/" . $entry;
            print "The full path is $fullpath.\n";

            # Prompt the user to delete the file
            print "Do you wish to delete this file? (Y/N)\t";
            chomp(my $reply = <STDIN>);
            # Only delete if the reply starts with Y or y
            if($reply =~ m/^y/i)
            {
                print "Deleting $fullpath...\n";
                unlink($fullpath) or warn "Unable to delete $fullpath! $!\n";
            }
        }
    }

    chdir($startdir);
}

# Get directory to begin the search
print "Enter directory to start search: ";
chomp(my $beginSearch = <STDIN>);

# call searchDir to start the search
searchDir($beginSearch);

Thankfully, I’ve now discovered the Preserve Code Formatting plugin for WordPress, which seems to do a good job at making the code look presentable!

This isn’t perfect – it uses recursion to examine subdirectories, and when I ran it on my /home folder it somehow wound up in /sys on my Ubuntu machine and I ended up getting a deep recursion warning (a little research suggests this happens when it goes over 100 directories in). The most likely culprit is a symbolic link: the -d test is also true for a symlink that points to a directory, so the script will happily follow one out of the tree it was asked to scan. However, it seems to work fine for scanning individual folders in my /home directory, and that’s all I really wanted anyway.
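
One way round this would be to let the core File::Find module do the directory walking, since it does not follow symbolic links unless asked to. The following is a sketch rather than a drop-in replacement: find_swap_files is a name I’ve made up, and it only lists the swap files instead of prompting to delete them.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the paths of all Vim swap files below $dir
sub find_swap_files
{
    my $dir = shift;
    my @found;
    find(sub {
        # Inside the callback $_ is the bare file name and
        # $File::Find::name is the path including the directory
        push @found, $File::Find::name if -f $_ && m/^\..*\.swp$/;
    }, $dir);
    return @found;
}

# List the swap files under the directory given on the command line, if any
if(@ARGV)
{
    print "$_\n" foreach find_swap_files($ARGV[0]);
}
```

The delete prompt from the original script could then be run over the returned list.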

I love how Perl makes writing this kind of simple script so easy. It’s a great language for that kind of systems administration task.

A couple of things I love about Perl

In the time that I’ve been learning Perl, I’ve slowly grown to appreciate the strengths of the language more and more. There are two things in particular that I like about Perl. One that I really don’t think anyone is going to be surprised by is CPAN. It’s a fantastic resource: there is a huge number of Perl modules available for virtually any task under the sun, and they’re incredibly useful.

The other is just how good the documentation is – I’ve never considered myself to be someone who learns terribly well from Unix man pages, but perldoc seems to have very good documentation indeed, including that for CPAN modules. Also, it helps that if you don’t do well with the man page format, you have the option of running podwebserver and getting the documentation formatted as web pages.

To give an example, I’m particularly interested in all kinds of network programming, be it web development, IRC, Jabber or whatever, and I’d heard of the Net::IRC module so I decided to start using it to create a simple IRC bot (yes, I know I should really be using POE::Component::IRC instead!). Using the information gleaned from perldoc Net::IRC it was easy to get started writing a bot, and I’ve now come up with the following simple bot:

#!/usr/bin/perl -w

use strict;
use Net::IRC;

my $irc = Net::IRC->new;
my $nick = "mattsbot";
my $server = "irc.freenode.net";
my $channel = "#botpark";
my $port = 6667;
my $ircname = "My wonderful bot";
my $owner = "mattbd"; 

sub on_connect
{
   my $self = shift;

   print "Joining $channel\n";
   $self->join($channel);
   $self->privmsg($channel,"Ready to go!");
}

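sub on_disconnect
{
  my $self = shift;

  # The connection is gone, so joining or messaging would fail - reconnect instead.
  # Once the server finishes its welcome, on_connect rejoins the channel.
  print "Disconnected, attempting to reconnect...\n";
  $self->connect();
}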

sub on_join
{
  # Get the connection and event objects
  my ($conn, $event) = @_;

  # Get the nick that just joined
  my $newnick = $event->nick;

  # Don't greet the bot itself when it joins the channel
  return if($newnick eq $nick);

  # Greet the new nick
  $conn->privmsg($channel, "Hello, $newnick! I'm a greeting bot!");
}

sub on_msg
{
  # Get the connection and event objects
  my ($conn, $event) = @_;

  # Get nick of messaging user
  my $messager = $event->nick;

  # Respond negatively
  $conn->privmsg($messager, "Sorry, I'm just a bot. Please don't message me!");
}

sub on_public
{
  # Get the connection and event objects
  my ($conn, $event) = @_;

  # Get nick of messaging user
  my $messager = $event->nick;

  # Get text of message
  my $text = ($event->args)[0];

  # Check to see if text contains name of bot - if so message the user negatively
  # (\Q...\E quotes any regex metacharacters in the nick; /i ignores case)
  if($text =~ m/\Q$nick\E/i)
  {
    $conn->privmsg($channel, "Sorry, $messager, I'm just a simple bot!");
  }
}

my $conn = $irc->newconn(Nick => $nick, Server => $server, Port => $port, Ircname => $ircname);
# 376 is the numeric reply marking the end of the server's message of the day
$conn->add_global_handler('376', \&on_connect);
$conn->add_global_handler('disconnect', \&on_disconnect);
$conn->add_global_handler('msg', \&on_msg);
$conn->add_global_handler('join', \&on_join);
$conn->add_global_handler('public', \&on_public);
$irc->start();

Now, this bot isn’t exactly hugely capable: all it does is greet new joiners and tell you to leave it alone if you try to talk to it. Still, it was pretty easy to write thanks to the documentation, and it’s a good base to build on. From here, it’s easy to extend the on_public and on_msg subroutines to deal with other messages. For instance, I could use a regular expression to look for “!respond” in the text of the message and, if it’s found, respond with some appropriate text.
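As a rough illustration of that idea, the matching can live in a small subroutine of its own, which makes it easy to try out without connecting to a server. This is just a sketch: reply_for is a hypothetical helper, not something Net::IRC provides, and the wording of the replies is invented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::IRC): work out what, if anything,
# the bot should say in reply to a line of channel text
sub reply_for
{
    my ($text, $sender) = @_;

    # "!respond" must be the first word; anything after it is captured in $1
    if($text =~ m/^!respond\b\s*(.*)$/i)
    {
        my $rest = $1;
        return "$sender, you didn't give me anything to respond to." if($rest eq "");
        return "$sender, you said: $rest";
    }

    # No command found, so stay quiet
    return;
}

# Inside on_public this might be used along the lines of:
#   my $reply = reply_for($text, $messager);
#   $conn->privmsg($channel, $reply) if(defined $reply);
```

Anchoring the pattern with ^ means the bot only reacts when the message starts with the command, rather than whenever “!respond” turns up somewhere in the middle of a sentence.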

I’ve hard-coded the appropriate details into the script in this case to make it quicker and easier to test it, but it would be trivial to change it to either accept settings passed as arguments from the command line, or have it grab these from a separate text file.
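For the command-line route, the core Getopt::Long module does most of the work. The following is a minimal sketch, assuming option names (--nick, --server, --channel, --port) that I’ve picked purely for illustration; parse_settings is likewise a hypothetical helper, with the hard-coded values from the bot kept as defaults.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: start from the bot's hard-coded values and let any
# command-line options override them
sub parse_settings
{
    my @args = @_;

    my %settings = (
        nick    => "mattsbot",
        server  => "irc.freenode.net",
        channel => "#botpark",
        port    => 6667,
    );

    # "=s" means the option takes a string, "=i" an integer
    GetOptionsFromArray(
        \@args,
        "nick=s"    => \$settings{nick},
        "server=s"  => \$settings{server},
        "channel=s" => \$settings{channel},
        "port=i"    => \$settings{port},
    ) or die "Could not parse the command-line options\n";

    return %settings;
}

# In the bot this might be called as:
#   my %settings = parse_settings(@ARGV);
#   my $nick = $settings{nick};
```

Anything not given on the command line simply keeps its default, so the script can still be run with no arguments at all while testing.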

My initial doubts about Perl are really wearing off. It’s a powerful language and one that, now I’ve picked up the basic syntax, I’m having little trouble getting work done with.
