Leased String

(moved from LeasedStrings to avoid the WikiNamePluralProblem)

In CeeLanguage, strings are TerminatedStrings: a null character tells you where the end is.

In PascalLanguage and some BasicLanguage implementations, strings begin with a byte or word (integer) that tells you how many characters are in the string. This initial byte/word is called a "leasing" byte/word, hence LeasedStrings.
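To make the two layouts concrete, here is a minimal sketch in C, assuming a one-byte length prefix. The names LeasedString, leased_set, leased_length and terminated_length are made up for illustration and come from no particular Pascal or Basic implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: byte 0 holds the length, bytes 1..len hold the text. */
enum { LEASED_MAX = 255 };

typedef struct {
    unsigned char bytes[1 + LEASED_MAX];
} LeasedString;

/* Store n bytes from src. Returns 0 on success, or -1 if n does not fit
   in the one-byte length prefix. */
int leased_set(LeasedString *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    if (n > LEASED_MAX)
        return -1;
    s->bytes[0] = (unsigned char)n;
    memcpy(s->bytes + 1, src, n);
    return 0;
}

/* The length is read straight from the prefix; no scan is needed. */
size_t leased_length(const LeasedString *s)
{
    return s->bytes[0];
}

/* A terminated string has to be walked to the null to find its length. */
size_t terminated_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Note that leased_set has to refuse anything longer than 255 bytes, while terminated_length has no such limit but costs time proportional to the string.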

In case it's not clear, leased strings are generally considered inferior because the width of the length field limits the possible length of a string (to 255 bytes for a byte, or 32K for a signed 16-bit word).

In case it's not clear, leased strings are generally considered superior because they allow O(1) determination of length, thereby making things like automatic bounds checking practical, and they allow embedded null characters since they do not rely on in-band signals to describe meta-information.
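A rough sketch of both points in C, assuming the length is kept out-of-band as a size_t next to the data. StrView, strview_at and c_visible_length are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view: the length travels with the pointer, not in the data. */
typedef struct {
    const char *data;
    size_t len;
} StrView;

/* Bounds-checked read: stores the byte in *out and returns 0,
   or returns -1 if i is past the end. The check is O(1). */
int strview_at(StrView s, size_t i, char *out)
{
    assert(out != NULL);
    if (i >= s.len)
        return -1;
    *out = s.data[i];
    return 0;
}

/* How many bytes a reader that stops at the first null would see. */
size_t c_visible_length(StrView s)
{
    const char *nul = memchr(s.data, '\0', s.len);
    return nul ? (size_t)(nul - s.data) : s.len;
}
```

For the five bytes 'a', 'b', 0, 'c', 'd', the stored length is 5, but a null-terminated reader would stop after 2. The bytes after the embedded null are still reachable through strview_at.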

For in-memory representation, size_t will always be sufficient. For serialization, I suppose you might want some sort of bignums (but pointer-serialization only adds useless complexity; for machine-independent purposes your pseudopointer will need to be a bignum anyway). Be careful: a common bignum representation is limited to the range [0, 256^{2^32-1}). (Okay, okay, there aren't enough bits in the universe to hold a string that long...) Oh, and last time I checked, officially UTF-xx can't handle over {8:31, 16:20_and_a_bit, 32:32}[xx] bit numbers, but UTF-8 can be extended to arbitrary length by allowing bytes FE and FF (which UTF-8 avoids for easier detection of BOMs).


ForthLanguage calls these "counted strings". They are used for symbol storage, such as word names in the dictionary. However, most AnsForth string routines use the more flexible (address,length) tuple. The word COUNT converts from a counted string to an addr-len string. Most stack operators have two-cell equivalents (e.g. DUP -> 2DUP), which are handy for dealing with these tuples.
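For readers who don't speak Forth, here is an approximation of what COUNT does, written in C. This is only a sketch: AddrLen, forth_count and addrlen_skip are invented names, and a real Forth leaves the two values on the stack rather than in a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C stand-in for the two cells (address, length). */
typedef struct {
    const unsigned char *addr;
    size_t len;
} AddrLen;

/* Like COUNT: the first byte is the count, the text starts one byte later. */
AddrLen forth_count(const unsigned char *counted)
{
    assert(counted != NULL);
    AddrLen r = { counted + 1, counted[0] };
    return r;
}

/* Why the tuple is more flexible: dropping the first n characters is
   just arithmetic on the pair, with no copying and no new count byte.
   n is clamped so the result never runs past the end. */
AddrLen addrlen_skip(AddrLen s, size_t n)
{
    if (n > s.len)
        n = s.len;
    s.addr += n;
    s.len -= n;
    return s;
}
```

Given the counted string 3, 'D', 'U', 'P', forth_count yields an address pointing at 'D' and a length of 3. Skipping one character then yields a two-character string starting at 'U', still sharing the original storage.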


See StringWithoutLength, NonNullTerminatedString


(last edited March 14, 2012)