Edit Distance

The edit distance between two strings is the number of changes you would have to make to one to turn it into the other, where a change is either the insertion of a symbol, the removal of a symbol, or the replacement of one symbol with another. It is a widely-used metric of string similarity, used in spell-checkers, DNA sequence analysis tools, etc.

Also known as Levenshtein distance, after the first person to formally define it and give an algorithm for computing it.

See http://www.merriampark.com/ld.htm

Contrast with HammingDistance.


And I though it was the number of edits needed to make two wiki pages agree. But that would be WikiDistance?, or not?

Isn't WikiDistance? the number of clicks needed to go from page A to page B? [Here you are referring to the WikiBrowseGame.]

I thought it might be amusing and/or interesting to compute Levenshtein distances for each Wiki name to each other one. "It'll only take a few hours", says I. "What's the point having a CPU, if I don't give it some work?".

Code at http://www.t8o.org/~mca1001/cgi/viewcvs/doodles/c2-all-Levenshtein.pl for the benefit of fellow doodlers.

Sample output below. Format is "word1 => word2(distance=percent-of-length-of-word1) word3(....)".

  ConstructiveInterference => NonInterference(10=0.416) ContainerIndependence(12=0.5) ContextIndependence(12=0.5) InteractiveExcellence(12=0.5) JavaNativeInterface(12=0.5) AbstractionInversion(13=0.541) ClientPresence(13=0.541) ConstantineOnPeopleware(13=0.541) ConstructConvincingArguments(13=0.541) InteractiveScreens(13=0.541)
  ConsumerSoftwareAndEvo => ComponentSoftware(12=0.545) ComputersAsTheater?(12=0.545) ComputersAsTheatre(12=0.545) ExtremeSoftware(12=0.545) OakTreeSoftware(12=0.545) SoftwareHandbook(12=0.545)
  ContainerIndependence => ContextIndependence(5=0.238) ContainerManagedPersistence(10=0.476) ContainmentBuilding(11=0.523) NonInterference(11=0.523) OneStartlingSentence(11=0.523)
  ContainerManagedPersistence => ComponentManagedPersistence(6=0.222) InheritanceManagedPersistence(9=0.333) ContainerIndependence(10=0.37) PackagedPersistence(11=0.407) ContextIndependence(14=0.518) OrderManagementSystems(14=0.518)
  ContainmentBuilding => FlatironBuilding(8=0.421) ConTention(10=0.526) ContinuousTesting(10=0.526) SortAndBuild(10=0.526)
  ContextIndependence => ContainerIndependence(5=0.263) CommentingChallenge(10=0.526) NonInterference(10=0.526) OneStartlingSentence(10=0.526)
  ContextSensitiveSubtyping => ContinuousTesting(13=0.52)

Note that the WikiMines' data is from June 1999, so there will be dead links above. They should probably stay dead.


CategoryMath


EditText of this page (last edited November 20, 2004) or FindPage with title or text search