Globbing vs. Directory Handles


Upon undertaking the sitemap generation project, I started looking into the best and most efficient ways to extract a list of directory contents using Perl.

That brought me to two options: globbing or using directory handles. Asking which one is better assumes you can answer an even more fundamental question: what makes one option better than the other, or perhaps, what is the difference between these two options? But first, I should probably answer the question "What in the world are you talking about?"

Yet another opportunity to use my newly discovered definition list tag. I don't know why it's not more widely used.

Globbing
A method used to pull the names of the files in a given directory into an array
Directory Handles
An alternative to globbing that offers similar functionality

So you've been a Perl programmer for years and you haven't heard the term globbing? That's because the glob operator didn't always have a name. Here are two examples of the same thing using slightly different syntax: the first uses the angle-bracket syntax defined before glob became an official word, and the second uses the glob operator.

my @ListofFiles = <*>;

my @ListofFiles = glob "*";

Essentially, this returns a list (in the @ListofFiles variable) of the files and subdirectories in the current directory, much like the names you would see from ls (Linux/Unix) or dir (Windows). Of course, that's not all files: the * mask skips names that begin with a dot. To get those too you need a slightly different mask, such as .* *, or something even more complicated that I'm not going to get into right now.
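Here is a small sketch of the difference between the two masks. It builds a scratch directory first so the result doesn't depend on whatever happens to be in your current directory; the file names (index.html, about.html, .htaccess) are made up for illustration, and File::Temp and Cwd are core modules I'm assuming are available.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with two ordinary files and one dot file (hypothetical names).
my $dir = tempdir(CLEANUP => 1);
for my $name ('index.html', 'about.html', '.htaccess') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh or die "Cannot close $dir/$name: $!";
}

# glob works relative to the current directory, so step into the scratch one.
my $start = getcwd();
chdir $dir or die "Cannot chdir to $dir: $!";

# "*" skips names that begin with a dot.
my @visible = glob '*';

# ".* *" is two masks separated by a space. The ".*" mask may also return
# the special entries "." and "..", so those are filtered out here.
my @all = grep { $_ ne '.' && $_ ne '..' } glob '.* *';

# Step back out so the scratch directory can be cleaned up at exit.
chdir $start or die "Cannot chdir back to $start: $!";

print "visible: @visible\n";
print "all:     @all\n";
```

The first list should contain only the two .html files, while the second also picks up .htaccess.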

So now that we have a basic understanding of globbing, let's learn the basics about directory handles. The easiest way for me to describe directory handles in a way that I would understand is as follows:
A directory handle is similar to a file handle: once you have the handle, you can read the entries one at a time to see what's inside a given directory. Example:

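my $directory = "/home/bozo_the_clown";

# A lexical handle ($dh) avoids the global bareword DH
opendir my $dh, $directory or die "Cannot open $directory: $!";

# readdir in list context returns every entry, including "." and ".."
foreach my $filename (readdir $dh) {
    print "$filename\n";
}

closedir $dh;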

It seems much more complicated, but I have actually included more processing: the above code snippet outputs the filenames, whereas the previous snippet only showed how to put the filenames into a variable.

The entries read from a directory handle are not sorted, and they cannot be filtered with a mask, which means that all filtering and sorting has to happen within your processing routine. From my initial review, it does look like globbing is the preferred choice, since it offers a bit more in the way of capability and a bit less in the way of complexity. So far, the only resource I've used is a book called "Learning Perl, 4th Edition", but I know that I've seen references in the past to a method which provides more information, things like last modified date and file size, which I think will be helpful. There is also the issue of directory recursion; I'm sure that researching it will bring to light some additional modules that will make my life easier, either built-in Perl modules or something from CPAN.
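To make the "filter and sort it yourself" point concrete, here is a rough sketch using grep and sort on the output of readdir. It also pulls the file size and last modified time with Perl's built-in stat; I'm only assuming that stat is the method I half remember, so treat that part as a guess rather than a recommendation. The scratch directory and its file names are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with a few files of known size (hypothetical names).
my $dir = tempdir(CLEANUP => 1);
my %content = ('b.html' => 'bb', 'a.html' => 'a', 'notes.txt' => 'ccc');
for my $name (keys %content) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    print {$fh} $content{$name};
    close $fh or die "Cannot close $dir/$name: $!";
}

opendir my $dh, $dir or die "Cannot open $dir: $!";
# readdir returns bare names in no particular order, including "." and "..",
# so the filtering happens here with grep...
my @html = grep { /\.html\z/ } readdir $dh;
closedir $dh;

# ...and the ordering happens here with sort.
@html = sort @html;

for my $name (@html) {
    # readdir gives names without the directory, so prepend it for stat.
    my @info = stat "$dir/$name" or die "Cannot stat $dir/$name: $!";
    my ($size, $mtime) = @info[7, 9];    # element 7 is size, element 9 is mtime
    printf "%-10s %d bytes, modified %s\n", $name, $size, scalar localtime $mtime;
}
```

With glob, the mask does the filtering and the results come back already sorted, which is most of the "less complexity" argument above; the stat call would look the same either way.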

Further tales of my research will be linked here as I make progress.


