Since the page was accreting incorrect statements, I've tried to add a more detailed explanation:
Goedel invented GoedelNumbering as a way of assigning a unique positive integer to every possible formula in a mathematical system, so that mathematical statements could be translated into numbers, which could then appear within other statements, allowing mathematical statements to become FirstClass values, which he then used to prove his two famous theorems.
The clever method he used for this encoding was to take advantage of the Fundamental Theorem of Arithmetic: every positive integer has a unique factorization. In GoedelNumbering, the factorization of an integer yields the statement that the integer encoded, and does so uniquely.
To do so, every symbol in the alphabet to be used for statements is assigned a unique number, e.g. "a"=100, "b"=200, "c"=300 (large values picked for clarity of exposition) etc., and every possible position in the statement is assigned a PrimeNumber, for instance, position 1 = 2, position 2 = 3, position 3 = 5, position 4 = 7, position 5 = 11, position 6 = 13, etc. The encoding is performed by taking the position encoding and raising it to the power of the symbol encoding at that position, and cumulatively multiplying together each such encoded position/symbol.
For instance, the message "bc" has "b" in position 1; "b" is represented by 200, position 1 is represented by the prime 2, so the first position/value pair is uniquely encoded by 2^200. The second symbol is "c", which is represented by 300, while position 2 is represented by the prime 3, so the second position value pair is uniquely encoded by 3^300, and the whole message is equal to 2^200 * 3^300, multiplied out in full:
21997612563412852819736854443172936461863101231358274561146636833168 58315086690275754329270589426383652921493514482045416477153318639590 74292125411622867441287958061576051884137642597711028147158438117376The original message can be recovered (in principle) by finding the unique factorization of that number, which has only one unique answer, 2^200 * 3^300, which tells us which symbols were encoded at which positions.
This example uses 200 to encode "b" and 300 to encode "c" merely because use of 2 and 3 would be visually confusingly similar to the encodings of positions 1 and 2, but arithmetically would cause no problems and would yield the correct answers. However, even with minimum encodings, even small statements tend to have very large GoedelNumbers?, frequently too large to be practical to work with, although factoring them is comparatively easy in principle because the factors come from a known "pool", so the factorization is tedious but straightforward.
Regardless, the purpose of GoedelNumbering is to allow statements to be made about statements.
A simple GoedelNumbering for a program written in ASCII text is to treat the entire program as a big base-256 number
But "any mathematical transformations" would include factorization, which doesn't lead to "utter rubbish". When dealing with large numbers, be tolerant of large amounts of arithmetic.
Hmmm. This bears some evident relation to the computation of CRC values. In computing a CRC, the entire message is treated as one large number (string-o'-bits) which is then divided (using digital slight of hand) by an appropriate prime polynomial, yielding a (nearly unique) remainder. One is never really interested in the "value" of the message as a number, but only in whether the math (performed again later, on retrieval) produces the same remainder.
Not directly a numerical interpretation, but evidently the consequence of this property.
Do I get an "amen" on this?
It bears some relation, but CRC values are not unique - it's quite possible to alter a file and keep the same CRC as a similar file
While this is true, it can be pretty darn hard to keep the length and essential content the same (given adequate CRC size), and the "similar" file would have to be really contrived -- which explains why checksums aren't worthless to begin with. No, the CRCs are not unique, but for a given length of text with substantially the same content, they pretty much are.
We're not describing self-correcting encoding schemes and Hamming distances. The point was that the numeric nature of the message can be exploited this way. The fact that it submits to math in any sort of useful way seems to relate to the "essentially unique number" property of otherwise non-numeric data.
GoedelNumbering is explained among other things in the book GoedelEscherBach.