Wiki Word Statistics

A search for each of these words against the search database available on 7 March 2000 gave these results

The database contained 7546 pages.

It's interesting to compare those last two separately, rather than combine them.

Please append to these figures rather than modifying them, so that we can compare against them at some later date. My guess would be that in, say, a year's time, the XP pages will be a lower proportion of the total, since the WikiMind will have drifted elsewhere. --KeithBraithwaite

May 12th, 2001

The database contained 15,289 pages. Searching for word hits didn't work too well, as the size of the resulting pages caused network dropouts.

The database grew by 102%. ExtremeProgramming grew by 75%. XP grew by 104%, Wiki grew by 161%, and patterns by 27%.
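For reference, the overall growth figure can be re-derived from the two page counts given on this page. This is only a sketch of the arithmetic; the function name is made up for illustration, and the per-topic hit counts are not reproduced here, so only the database total is checked.

```python
def growth_percent(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


pages_2000 = 7546    # page count reported for 7 March 2000
pages_2001 = 15289   # page count reported for May 12th, 2001

# Prints 102.6%, in line with the 102% quoted above.
print(f"{growth_percent(pages_2000, pages_2001):.1f}%")
```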


It is a part of usual LanguageOrientedProgramming practice to look at the words that are used in a system. I had postponed this for a while (being new to the Wiki), but now I have done it.

The list at the end of this page is the first part of the output of processing wikiList.

If you do this on a software API you usually find something interesting. Special words, redundant words, wrong words ... but at first sight, I didn't find anything of significance.

Of course, if you look at the first few lines (strip some simple words) you find what this Wiki is about: Wiki Programming Patterns Extreme.
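A minimal sketch of how such a list could be produced, assuming the input is a list of WikiWord page titles. The actual tool and the format of wikiList are not described on this page, so the function names, the sample titles and the set of "simple words" below are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical set of "simple words"; the page does not say which were stripped.
SIMPLE_WORDS = frozenset({"The", "Of", "And", "To", "Is", "In", "For"})


def split_wiki_word(title: str) -> list[str]:
    """Split a CamelCase title into its capitalised parts."""
    return re.findall(r"[A-Z][a-z]*", title)


def count_words(titles, skip=frozenset()):
    """Count the component words of all titles, ignoring any in skip."""
    counts = Counter()
    for title in titles:
        counts.update(w for w in split_wiki_word(title) if w not in skip)
    return counts


def format_counts(counts, limit=None):
    """Sort by descending count, then alphabetically, one 'count word' line each."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{n:4d} {w}" for w, n in ordered[:limit]]


# Made-up sample titles, only to show the shape of the output.
sample = ["WikiWordStatistics", "ExtremeProgramming", "WikiWikiWeb", "DesignPatterns"]
for line in format_counts(count_words(sample, skip=SIMPLE_WORDS)):
    print(line)
```

Passing an empty skip set keeps words such as The and Of in the output, as in the list at the end of this page.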

But when I read on, I felt like a shaman priest having thrown a bag of bones to read the present and the future:

to name just a few. Just try it! Perhaps some expert can read and interpret this. I'm unable to. -- HelmutLeitner

If you play a little loose, the very first few words sum up Wiki pretty well:


See also WikiMines.


On a similar note, I am trying to devise a way to determine the "centres" of a wiki (or things similar to wikis). My best attempt so far has been http://usemod.com/cgi-bin/mb.pl?ShortestPathPages. -- SunirShah


As one might expect, the number of occurrences of a given WikiWord per WikiPage obeys a PowerLaw. This hypothesis was tested in March 2003 with 724 pages containing the WikiWord "UnitTest". A LogLog plot of the count of the pages with a given number of occurrences of "UnitTest" was created. The values are linear for the first two orders of magnitude, though they diverge from the ideal value as the number of occurrences of "UnitTest" per page increases:

Linear regression yields r-squared = 0.936.
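A rough sketch of this kind of fit, assuming an ordinary least-squares line through the points after taking base-10 logarithms of both axes. The page does not say which tool was used, and the data below are invented for illustration; they are not the measured UnitTest counts.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y).

    Returns (slope, intercept, r_squared).
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up data: occurrences per page, and number of pages with that many.
occurrences = [1, 2, 3, 4, 5, 6, 8, 10, 15, 20]
page_counts = [400, 110, 50, 30, 18, 12, 8, 4, 2, 1]

slope, intercept, r2 = loglog_fit(occurrences, page_counts)
print(f"slope={slope:.2f}  r-squared={r2:.3f}")
```

For an exact PowerLaw the points fall on a straight line and r-squared is 1; the slope is the exponent.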

A second test with "ExtremeProgramming" and 1,189 BackLinked pages gave a similar result, with r-squared = 0.950:

Binning the data increases r-squared to 0.99+.
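The page does not say how the data were binned. One common choice for PowerLaw data is logarithmic binning, sketched here under that assumption with power-of-two bins; the function name and the sample data are made up for illustration.

```python
import math
from collections import defaultdict


def log2_bin(occurrences, page_counts):
    """Group points into bins [2**k, 2**(k+1)).

    Returns a list of (geometric bin centre, pages per occurrence value) pairs,
    so that wide bins in the sparse tail are averaged rather than summed.
    """
    totals = defaultdict(int)
    for x, y in zip(occurrences, page_counts):
        k = x.bit_length() - 1          # bin index: 2**k <= x < 2**(k+1)
        totals[k] += y
    binned = []
    for k in sorted(totals):
        lo, hi = 2 ** k, 2 ** (k + 1)
        centre = math.sqrt(lo * hi)     # midpoint of the bin on a log axis
        density = totals[k] / (hi - lo) # the bin holds hi - lo integer values
        binned.append((centre, density))
    return binned


# Made-up data, not the measured counts.
occurrences = [1, 2, 3, 4, 5, 6, 8, 10, 15, 20]
page_counts = [400, 110, 50, 30, 18, 12, 8, 4, 2, 1]

for centre, density in log2_bin(occurrences, page_counts):
    print(f"{centre:7.2f} {density:8.2f}")
```

Binning smooths the noisy tail, where most occurrence values have zero or one page, which is why a fit to the binned points tends to give a higher r-squared.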


The original version of this list counted each entry twice. This has been corrected.

Files: 1 Found: 80843

Count Statistic:

 582 The
 541 Wiki
 463 Of
 293 And
 272 Programming
 247 Patterns
 234 To
 232 Extreme
 225 Is
 210 Xp
 208 In
 202 Pattern
 189 Software
 172 For
 166 Java
 163 Language
 161 Design
 146 Object
 126 Test
 111 Code
 110 Page
 108 On
  93 Web
  90 What
  85 As
  84 Category
  83 With
  82 John
  80 Are
  79 Not
  78 Smalltalk
  76 Unit
  75 It
  73 Discussion
  73 You
  72 Two
  70 Refactoring
  69 One
  68 Use
  67 Mc
  66 Do
  63 Group
  62 Project
  61 David
  60 How
  59 New
  58 From
  57 By
  56 About
  55 This
  54 Component
  54 Topic
  54 Work
  53 At
  52 Testing
  52 Vs
  50 Be
  50 Ejb
  50 Objects
  50 System
  50 Systems
  49 Development
  49 Dont
  49 Meeting
  48 User
  48 Visual
  47 Name
  47 Why
  47 Your
  46 Model
  46 Tcpg
  45 Management
  45 Michael
  45 Time
  44 First
  44 Good
  44 Process
  42 Class
  42 People
  41 Method
  40 Refactor
  39 All
  39 Com
  39 Data
  39 Free
  39 Just
  39 More
  39 Net
  39 Server
  39 That
  39 Three
  38 Architecture
  38 Link
  38 Mark
  38 Problem
  38 Value
  37 Big
  37 Book
  37 Interface
  36 Changes
  36 No
  36 Peter
  35 An
  35 Bill
  35 Cpp
  35 Jim
  35 Meta
  35 Source
  34 Challenge
  34 Programmer
  33 Books
  33 Case
  33 Dot
  33 Exceptions
  33 List
  33 Open
  33 Pages
  32 Change
  32 Engineering
  32 Mike
  32 My
  32 Robert
  31 Computer
  31 Dave
  31 Plus
  31 Principle
  30 Game
  30 Links
  30 Microsoft
  30 Oriented
  29 Pair
  29 Tom
  28 De
  28 Eric
  28 Go
  28 Methodology
  28 Story
  27 James
  27 Knowledge
  27 Mode
  27 Richard
  27 Steve
  27 Thing
  27 Way
  26 Bob
  26 Me
  26 Mind
  26 Space
  26 Up
  26 World
  25 Art
  25 Business
  25 Chris
  25 Example
  25 Form
  25 Function
  25 Law
  25 Real
  25 Stories
  25 Technology
  25 Vb
  25 Ytwok
  24 Information
  24 Martin
  24 Nine
  24 Or
  24 Paul
  24 Python
  24 Things
  24 Tim
  24 Too
  23 Alan
  23 Anti
  23 Ats
  23 Community
  23 Framework
  23 History
  23 Recent
  23 State
  23 Team
  23 Tests
  23 Text
  23 Thomas
  23 When
  23 Write
  22 Analysis
  22 Bad
  22 Delete
  22 Great
  22 Isa
  22 Life
  22 Make
  22 Metaphor
  22 Perl
  22 Thread
  22 Twenty
  22 Words
  22 Works
  21 Basic
  21 Beans
  21 Box
  21 Can
  21 Music
  21 Need
  21 Public
  21 Thousand
  21 Uml
  20 Based
  20 Exception
  20 Home
  20 Idea
  20 Languages
  20 Quality
  20 Science
  20 Talk
  20 Who
  20 Word
  19 Brian
  19 Coding
  19 Does
  19 Four
  19 Functional
  19 Green
  19 Jeff
  19 Once
  19 Review
  19 Rule
  19 Rules
  19 Self
  19 Should
  19 Smith
  19 Stone
  19 Users
  18 Abstract
  18 Before
  18 Common
  18 Interfaces
  18 Like
  18 Non
  18 Oo
  18 Out
  18 Scott
  18 Script
  18 Seven
  18 Together
  18 Tool
  18 We
  18 Writing
  17 Browser
  17 Classes
  17 Document
  17 Factory
  17 Implementation
  17 Little
  17 Ninety
  17 Plan
  17 Programmers
  17 Reuse
  17 Right
  17 Solution
  17 View
  16 Anonymous
  16 Bug
  16 Comments
  16 Components
  16 Considered
  16 Dead
  16 Distributed
  16 Hard
  16 Its
  16 Joe
  16 Know
  16 Leadership
  16 Mac
  16 Machine
  16 Multi
  16 Order
  16 Other
  16 Post
  16 Problems
  16 Program
  16 Question
  16 Questions
  16 Style
  16 Types
  16 Visitors
  15 Andrew
  15 Bean
  15 Card
  15 Content
  15 Could
  15 Dan
  15 Database
  15 Documentation
  15 Edit
  15 Enterprise
  15 Faq
  15 Fic
  15 Frank
  15 Games
  15 Gof
  15 Greg
  15 Grok
  15 Int
  15 Love
  15 Man
  15 Message
  15 Only
  15 Over
  15 Paper
  15 Please
  15 Point
  15 Power
  15 Reviews
  15 Side
  15 Simple
  15 Six
  15 Soft
  15 Solutions
  15 Stephen
  15 Success
  15 Think
  15 Tools
  15 Unix
  15 Using
  15 Ward
  15 Will
  14 Agent
  14 Application
  14 Bruce
  14 Computing
  14 Daniel
  14 Definition
  14 Effect
  14 Entity
  14 Flow
  14 Immersion
  14 Kent
  14 Kevin
  14 Line
  14 Methods
  14 Null
  14 Person
  14 Reading
  14 Requirements
  14 Roger
  14 Ron
  14 Search
  14 Star
  14 Thinking
  14 Tips
  14 Well
  14 Workshop
  13 Another
  13 Cant
  13 Cards
  13 Cee
  13 Clear
  13 Culture
  13 Developer
  13 Domain
  13 Don
  13 Doug
  13 Editing
  13 End
  13 Evil
  13 Examples
  13 Full
  13 Future
  13 Get
  13 Harmful
  13 Has
  13 Have
  13 Here
  13 Junit
  13 Lazy
  13 Learning
  13 Library
  13 Map
  13 Modeling
  13 Old
  13 Oopsla
  13 Planning
  13 Plop
  13 Principles
  13 Pro
  13 Resource
  13 Second
  13 Simplest
  13 So
  13 Task
  13 Type
  13 Van
  13 Wall
  13 Win
  12 Active
  12 Analogy
  12 Back
  12 Bell
  12 Best
  12 Binary
  12 But
  12 Client
  12 Control
  12 Corporation
  12 Cplus
  12 Editor
  12 Emacs
  12 Five
  12 Fix
  12 George
  12 God
  12 Human
  12 Ideal
  12 Inheritance
  12 Long
  12 Most
  12 News
  12 Quote
  12 Reference
  12 Research
  12 Sand
  12 Session
  12 Single
  12 Society
  12 Stuff
  12 Theory
  12 Tri
  12 Variables
  12 William
  12 Writers
  11 Age
  11 Better
  11 Between
  11 Blue
  11 Bugs
  11 Builder
  11 Charles
  11 Command
  11 Complex
  11 Context
  11 Continuous
  11 Cool
  11 Copy
  11 Cost
  11 Death
  11 Driven
  11 Ed
  11 Edward
  11 Factor
  11 File
  11 Frameworks
  11 Guide
  11 He
  11 Hot
  11 Hyper
  11 Integration
  11 Keith
  11 Ken
  11 Keyboard
  11 Lisp
  11 Memory
  11 Multiple
  11 Names
  11 Nature
  11 Org
  11 Play
  11 Plug
  11 Processing
  11 Small
  11 Spaces
  11 Standard
  11 Structure
  11 There
  11 Trial
  11 University
  11 Values
  11 Ware
  11 Where
  11 Zen
  10 Applications
  10 Architect
  10 Architectural
  10 Around
  10 Author
  10 Black
  10 Blocks
  10 Build
  10 Call
  10 Composite
  10 Crc
  10 Cultural
  10 Douglas
  10 Down
  10 Environment
  10 Evolutionary
  10 Evolving
  10 Experiment
  10 External
  10 Failure
  10 Fast
  10 Forth
  10 Groups
  10 Ian
  10 Institute
  10 Inter
  10 Issues
  10 Jean
  10 Larry
  10 Linux
  10 Load
  10 Never
  10 Nick
  10 Os
  10 Own
  10 Possibly
  10 Practice
  10 Product
  10 Projects
  10 Proof
  10 Quotes
  10 Ralph
  10 Read
  10 Really
  10 Replace
  10 Risk
  10 Rob
  10 Role
  10 Room
  10 Sam
  10 Servlet
  10 Short
  10 Silicon
  10 Study
  10 Thirty
  10 Threads
  10 Tree
  10 Very
  10 Visitor
  10 Without
  10 Xml


See also HowWeTalk, WikiStatistics


CategoryWikiStructure CategoryStatistics


Last edited January 9, 2007