Perl Database Iteration Problem

Ancient History -- This bug plagued wiki for a year or so. None of the suggestions below worked. Nor did a half dozen others I tried. The problem went away when I upgraded to perl5 and tied in a different implementation of dbm. -- WardCunningham


One of my long standing WikiWikiBugs really has me stumped. I use Perl's associative array iteration construct,

while (($key, $value) = each %db) {...}

to search pages of this database. I report the number of pages visited which is normally in the hundreds. I say normally because sometimes it stops early. This messes up my search command and could be catastrophic in my compress-database script. (Don't worry, I check.)

Does anyone know what is going on? My only other hint is that the problem corrects itself with the addition of another page. -- WardCunningham


Do you lock the database? I don't think UnixDbm or the PerlDbmInterface does this automatically. Thus, you may be seeing problems only when the iterator runs simultaneously with some other access. -- MarkEichin

Yes, now I do. When I first saw this problem I interpreted it as a WikiWikiDatabaseSerializationProblem, but now I think otherwise. -- WardCunningham


I suspect that this is related to a problem I've seen with exactly this type of iteration. Try changing the loop to a foreach over keys:

foreach my $key (keys %hash) {
    my $value = $hash{$key};
    # do the thing with $key and $value ...
}

When I changed it to this, an unpredictable and undependably irreproducible error that seemed to be related to memory management quit happening, and we were able to handle a lot more entries in the hash without running out of memory.

-- JoeMcMahon


I agree with Joe; use keys rather than each.

According to PERLFUNC(1)'s definition of each:

There is a single iterator for each hash, shared by all each(), keys(), and values() function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH. If you add or delete elements of a hash while you're iterating over it, you may get entries skipped or duplicated, so don't.

To use the DesignPatterns terminology, Perl has only internal iterators, not external iterators.

Calling keys grabs all the keys at once, and stores them in an array; you then iterate over the array. (Storing the keys for a few thousand page names should be okay.) --PaulChisholm
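A minimal sketch of the shared-iterator behaviour quoted above, using an ordinary in-memory hash rather than a DBM file. The hash %pages and the helper count_with_each are made up for illustration. It shows only the documented each/keys interaction and is not a reproduction of the wiki bug itself, which may well have had a different cause.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page database: five entries.
my %pages = map { ("Page$_" => "text $_") } 1 .. 5;

# Hypothetical helper: count entries with each(), like the search loop.
sub count_with_each {
    my ($hash) = @_;
    my $n = 0;
    while (my ($key, $value) = each %$hash) { $n++ }
    return $n;
}

# Some other code starts an each() loop and leaves early.
# The hash's single iterator stays parked after the first entry.
while (my ($key, $value) = each %pages) { last }

# This count resumes where the abandoned loop stopped, so it comes up short.
my $short = count_with_each(\%pages);    # 4, not 5

# The previous loop ran to the end, which reset the iterator.
my $full = count_with_each(\%pages);     # 5

# Park the iterator again, then use keys: it resets the iterator and
# returns the whole key list up front, so the count is complete.
while (my ($key, $value) = each %pages) { last }
my $snapshot = 0;
foreach my $key (keys %pages) {
    my $value = $pages{$key};
    $snapshot++;
}

print "each after early exit: $short\n";
print "each after full pass:  $full\n";
print "foreach over keys:     $snapshot\n";
```

The short count appears only when something else has left the iterator part-way through the hash, which is one way an each loop can seem to stop early.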


Using keys instead of each is an essential first step. The next step requires locking.

The generally accepted way of locking (via flock) a DBM file is to use a separate lock file. The prior accepted scheme -- extracting a file handle from the DBM and flock'ing the file handle -- has been shown to be unsafe.

--DaveSmith
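A rough sketch of the separate-lock-file scheme described above. Everything named here is an assumption for illustration: the helper with_locked_db, the file names, and the choice of SDBM_File (picked only because it ships with Perl; the thread does not say which DBM implementation the wiki used). The lock is taken on its own file before the tie and released after the untie, and the DBM file itself is never flock'ed.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code->(\%db) while holding an exclusive
# flock on a separate lock file, then untie and release the lock.
sub with_locked_db {
    my ($db_path, $lock_path, $code) = @_;

    open(my $lock, '>>', $lock_path) or die "cannot open $lock_path: $!";
    flock($lock, LOCK_EX)            or die "cannot lock $lock_path: $!";

    tie(my %db, 'SDBM_File', $db_path, O_RDWR | O_CREAT, 0666)
        or die "cannot tie $db_path: $!";

    my $result = eval { $code->(\%db) };
    my $err    = $@;

    untie %db;       # flush and close the DBM before giving up the lock
    close($lock);    # closing the handle releases the flock

    die $err if $err;
    return $result;
}

# Usage with throwaway files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my ($db_path, $lock_path) = ("$dir/pages", "$dir/pages.lock");

# Writer: add ten hypothetical pages under the lock.
with_locked_db($db_path, $lock_path, sub {
    my ($db) = @_;
    $db->{"Page$_"} = "text $_" for 1 .. 10;
    return;
});

# Reader: iterate over a keys snapshot under the same lock.
my $visited = with_locked_db($db_path, $lock_path, sub {
    my ($db) = @_;
    my $n = 0;
    foreach my $key (keys %$db) {
        my $value = $db->{$key};
        $n++;
    }
    return $n;
});

print "visited $visited pages\n";
```

A read-only script could take LOCK_SH instead of LOCK_EX so that several searches can run at once while writers still wait their turn.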


CategoryWiki


Last edited June 7, 2001