Auto Vivification

So-called AutoVivification is a PerlLanguage feature allowing complex nested data structures to spring into existence based on the types implied in the syntax of the statement of code accessing through the data structure. That is, by accessing through the branches to store (or fetch) a value at a leaf, the branches themselves are created, if not yet in existence, in order to "make a path" to the leaf.

Contributors: BillKelly


Example: AutoVivification in Action

The following complex datastructure will appear automagically by just storing into its depths, writing the leaf values. The hierarchy of branch nodes, if not yet existing, will spring into existence by the type implied in the statement of code indexing "through" that branch to get to the leaf. We will show in more detail below how AutoVivification will create a datastructure like the following in two lines of code like these:

 $hitlist{QUERYTERMS}->{$term} = 1;
 $hitlist{HITDOCS}->{$docKey}->{TERMHITS}->{$term} = 1;
Here's an example of the structure of the data this code will create:

 # Please note the following is not code, but an example of what
 # the resulting data structure will look like.  (OK technically
 # it IS more or less code - if you wanted to create this particular
 # data structure statically.)
 #
 #   %hitlist == {
 #
 #       # QUERYTERMS will be a key of the %hitlist hash, whose value
 #       # will be a sub-hash containing terms and corresponding document counts
 #       'QUERYTERMS' => {
 #           'term'        => 1,  # the "1"'s will be the total number of documents
 #           'anotherterm' => 1   # found having this term
 #       },
 #
 #       # HITDOCS will be a key of the %hitlist hash, whose value
 #       # will be a sub-hash . . . of sub-hashes . . .  of sub-hashes.
 #       'HITDOCS' => {
 #           # At this level we'll be creating search result nodes for
 #           # each document "hit" by one or more query terms
 #           'WikiWikiWeb' => {
 #               # At this level we'll be creating a term hits results
 #               # sub-hash for term hits found in this particular
 #               # document (WikiWikiWeb).  It may seem redundant or
 #               # wasteful to be creating a sub-hash under WikiWikiWeb
 #               # with only the single key TERMHITS - but, another
 #               # routine (not include in this autovivification
 #               # example) would later rank the results and be
 #               # adding a HITRANK key at the same level as
 #               # TERMHITS - for example.
 #               'TERMHITS' => {
 #                   'term'        => 1,  # the "1"'s will be the total number of times
 #                   'anotherterm' => 1,  # this term occurred in this document
 #                   etc. . .
 #               }
 #           },
 #           'GroundBreakingLanguagesDiscussion' => {
 #               'TERMHITS' => {
 #                   'someotherterm' => 1,
 #                   'anotherterm' => 1,
 #                   etc. . .
 #               }
 #           },
 #           etc . . . {
 #           }
 #       }
 #   };
 #
Okay. Before presenting the "real" code to create the above data structure out of search results in a loop, let's look at a literal "unrolled" example that would create the exact data structure shown above (excepting the "etc. . ." parts)

 my %hitlist;  # declare hash, which will become complex data structure

# build the $hitlist{QUERYTERMS}->{$term}... branch $hitlist{QUERYTERMS}->{'term'} = 1; $hitlist{QUERYTERMS}->{'anotherterm'} = 1;

# build the $hitlist{HITDOCS}->{$docKey}->{TERMHITS}->{$term}... tree $hitlist{HITDOCS}->{'WikiWikiWeb'}->{TERMHITS}->{'term'} = 1; $hitlist{HITDOCS}->{'WikiWikiWeb'}->{TERMHITS}->{'anotherterm'} = 1; $hitlist{HITDOCS}->{'GroundBreakingLanguagesDiscussion'}->{TERMHITS}->{'someotherterm'} = 1; $hitlist{HITDOCS}->{'GroundBreakingLanguagesDiscussion'}->{TERMHITS}->{'anotherterm'} = 1;

That's it! All those nested sub-hashes along the branches just appear by implication, in order to allow us to store the values ("1") in the leaf nodes as we've requested. That would be AutoVivification in action. :-) But of course the real power shows up when those "static" (literal) lines above reduce to just two lines in the loop below.

Here's an example of code to perform a search and create the above %hitlist data structure - or rather, a dynamic version of it having that form - out of the search results.

The line below that performs the search

 my $termHits = $searchdb{$term};
is the only statement invoking code outside of this routine. The two lines doing all the work to build the %hitlist data structure are the analogues of our static examples above. Here, they're:

 $hitlist{QUERYTERMS}->{$term} = scalar(keys %$termHits);
and

 $hitlist{HITDOCS}->{$docKey}->{TERMHITS}->{$term} = $docTermHits;
Here they are in context:

 sub getHitlist {
     my @terms = @_;
     my %hitlist;  # declare hash, which will become complex data structure
     my $term;
     foreach $term (@terms) {
         my $termHits = $searchdb{$term};  # fetch search results through tied hash
         $hitlist{QUERYTERMS}->{$term} = scalar(keys %$termHits);
         my ($docKey, $docTermHits);
         while (($docKey, $docTermHits) = each %$termHits) {
             $hitlist{HITDOCS}->{$docKey}->{TERMHITS}->{$term} = $docTermHits;
         }
     }
     return \%hitlist;
 }
Note: Perl also supports AutoVivification of vectors in the same manner as hashes, e.g.

 $hitlist{WHATEVER}->[123]->{ETC} = 1;
which, in contrast with

 $hitlist{WHATEVER}->{123}->{ETC} = 1;
would have brought about a vector and stored the sub-hash {ETC} at index 123 of the vector, rather than creating a hash, and storing the sub-hash {ETC} as the value of the hash's key '123'.


Anyway, I hope all this was able to convey the idea of AutoVivification without programmers who aren't familiar with Perl having to get too tangled in the Perl syntax itself. I'd be very interested to know if there are other languages out there with built-in mechanisms analogous to Perl's AutoVivification. -- BillKelly


This resembles somewhat the mkdir -p command in unix. This can be used to create a whole directory structure at once, i.e.:

  mkdir -p foo/bar/baz
will create foo, foo/bar and foo/bar/baz if they don't yet exist. -- StephanHouben

Yeah, it does kindof doesn't it? :-) -- bk

No. It's different, because in 'mkdir' you actually say that you want to make a directory. The -p switch just says, that the whole hierarchy should be created if necessary. When you are using

  $hitlist{foo}->[5]->{bar} = 1;
in perl, you are just telling perl to make an assignment, not that it should create arrays and hashes. Nevertheless, it does it for you. In my opinion that is AutoVivification.

But Stephan's observation that AutoVivification "resembles somewhat" mkdir -p, is I think still valid. Sometimes in my use of AutoVivification the value assigned isn't even important: I only need the hash keys that are created along the way. E.g.

  $topics{$curTopic}->{FROM}->{$refTopic} = 1;
 # i.e...
  $topics{kennedy}->{FROM}->{johnson} = 1;
  $topics{kennedy}->{FROM}->{apollo_program} = 1;
Which seems somewhat similar to
  mkdir -p kennedy/FROM/johnson
  mkdir -p kennedy/FROM/apollo_program
Including accessing the resulting structure, since
  @refTopics = keys %{$topics{kennedy}->{FROM}};
seems in essence equivalent to (in a Unix shell)
  refTopics = `ls kennedy/FROM`
both of which result in refTopics being assigned the tokens 'johnson' and 'apollo_program'. So the usefulness of AutoVivification for me derives most from the ease with which it allows me to create complex sparse tree structures... kindof like subdirectories. :-) -- BillKelly

Upon further consideration, it really resembles something like

 vi some/nonexistent/directory/afile.txt
and having the O/S automagically create the directory structure in place as part of the call to open file. That could be handy, actually, but also quite confusing to some. -- StevenNewton


Thank you for describing this. Now I can use it like a "pattern" without the need to switch something to Perl. -- PhlIp


Microsoft BASIC has a form of auto-vivification which seems to me more of a curse than a blessing.

The first language I learned was Sinclair BASIC, followed by BBC BASIC (they are similar in many ways; I think BBC BASIC is probably nicer but then the whole machine was nicer). I could never understand why people slagged off BASIC, when they programmed in languages that required you to say what a variable was going to be called before you could even give it a value, or didn't trust a newline marker as the end of a line but required a semi-colon, and other things the stupid computer could and should have worked out for itself. Then I ended up in a job where I had to use VBDOS 1.0 and realised just why people don't like BASIC.

Under BBC BASIC the following was legal:

 GOTO (1000+100*A%)
assuming A% was within range to give an existing programme line. (In Sinclair BASIC a nonexistent line does not cause an error, execution proceeds from the next existent line; but the % notation for integers is not supported. Any numeric expression is tret as an integer unless it contains a fraction or strays outside the range -65535..65535.)

Not under Microsoft BASIC! You can't do calculated GOTOs ..... nor can you do calculated RESTOREs, which were a useful method of avoiding creating an array from your DATA statements and taking up twice the space (once in the programme text and once again in the array).

Also, where VBDOS has the VAL function for converting numberic literal stored in strings to numberic values, BBC BASIC had the excellent EVAL function for actually passing something in a string to the interpreter, so you could say

 A$="SIN(x)"
then EVAL(A$) would return whatever SIN(x) would return. You could even place a string expression in a string and assign EVAL of it to a string. The programme would stop with the dreaded "Syntax error" if the result did not make sense. Sinclair BASIC had VAL and VAL$ which returned numeric and string values respectively, since it was more fussy about what a function returned, and a different "Nonsense in BASIC" error which could only be invoked from a VAL or VAL$ expression, since syntax errors were trapped as programme lines were entered.

Now, Sinclair and BBC BASICs both used to complain if you tried to read a variable before assigning a value to it. (The exception being a statement such as

 count=count+1
which would never cause an undefined variable error even if count had not been initialised, since it would create a variable called count on seeing the beginning of the statement.)

Microsoft evidently don't think this way, they have abolished the "no such variable" error and any attempt to read the value of a variable which has never been assigned returns (without any error message) zero or a null string. And this even works for arrays, probably even multi-dimentional arrays.

Result: you mistype a variable name in one place, VBDOS assigns a null value to the variable and your program goes belly-up. And since you mistyped the variable name you can't even search for it, since you don't know what you are searching for.

-- submitted by Sparky.

Um, in perl(-w)'s defense it will warn you about variables which are only named once. These are likely the typos that Sparky complains of.


In PythonLanguage nothing like this is built in, but you could implement something like this using the __getattr__() (and __getitem__) magic methods:

 class AutoObject(object):
     def __init__(self):
         self.__store = {}

def __getattr__(self, name): # automatically create objects accessed via an_auto_object.name # Only get to this if normal lookup failed obj = AutoObject() setattr(self, name, obj) return obj

def __getitem__(self, name): # Automatically create objects accessed via [name] if they # don't exist already. return self.__store.setdefault(name, AutoObject())

>>> a = AutoObject() >>> a['test'].foo.bar[1].name = "blah" >>> a['test'].foo.bar[1].name 'blah'
All the intermediate objects would just be created. However, it can't infer anything about types, so all the intermediate objects are just AutoObjects.

However, most Pythonistas would probably be 'very' glad that Python doesn't do anything like this by default! Without AutoObject you would get an exception for each dot and [] apart from the last one. There is one library I know of, BeautifulSoup?, that does actually use dots like this to create a DSL to query an HTML document, something like "html.table.li.a" to find all anchors in list items in tables in the document, which is quite neat.


Actually I came here looking for information on pitfalls to avoid. At my job we have experienced problems when we do this:

if (defined .....) { }

I don't have a good example, which is what I'm looking for. The problem we encountered is that we were checking to see if something was defined, but because of auto-vivification it BECAME defined on the fly! Which was bad. We had to change those calls to if (exist... calls instead of if (defined... and that fixed the problem...

But can someone explain specifically when an if (defined... check will fail unexpectedly (if you don't know about auto-vivification)?? Thanks!

-

Autovivication will not become a value to be defined on the fly; the final value is still undefined. What can get you in trouble is the following sequence:

 my $x;
 if (defined($x->{a}->{b}))
 { ...code that doesn't run...}
 if (defined($x->{a})
 { ...uh-oh, code runs and you didn't think it would... }

Of course it's never as bald as that, except maybe the first time, but if you've got a data structure that gets passed every which way it's easy to end up with this situation.


CategoryPerl


EditText of this page (last edited November 11, 2006) or FindPage with title or text search