Su Doku

A logic-based placement puzzle, also known as Number Place in the United States. from http://en.wikipedia.org/wiki/Sudoku

See also IsTheBlankSudokuDiabolical?


I successfully ignored this activity for a couple of years. Unfortunately on a plane ride the sardine in the seat next to mine explained it to me. I am deeply disappointed and ashamed of the human race for SuDoku. It is more pointless than crosswords and Rubik's cubes. That people in millions spend uncounted hours on this ZapGun ... I despair for us.

Worse, I am beginning to believe that programming and finance are not qualitatively different to SuDoku. -- PeterMerel

Try this: The first time you ever see it, a mildly retarded guy on the train is trying to solve one, across a table from you, and keeps flicking eraser dust in your general direction. Instant loathing for SuDoku, even before I knew the name. -- PhlIp


SuDoku interests programmers because its AI problem is well-formed.

There are many programs that solve sudoku, in many programming languages. One of them which is worth studying is RogerHui?'s in JayLanguage at http://www.jsoftware.com/jwiki/Essays/Sudoku.

In addition to that, there are many programs that help solving sudoku. One great example of them is WardCunningham's sudokant at http://c2.com/~ward/sudokant.cgi.

Ward did it again! I bet the total lines of code would be not longer than a screenful size. If we asked an average programmer to write a program that helps solving sudoku problems, the result code would be longer than a few hundred LOC, I bet, too. Also, I bet the features that the program provide would be inferior to Ward's in terms of the values that they create in users, for example sharing the state of the game with friends and undoing the move.

I am always amazed how much enlightening and inspiring all the Ward's programs are, without many words. I sometimes feel that his each and every program is itself a full paper and tells so much without speaking. As always, he does so much with so little.

Also recommended to study are his programs at WriteSmallButUsefulProgramsEveryDay.

-- JuneKim

June is very kind to me. In return, I must offer up the source and some examples (one that sudokant solves directly, and one that needs at least one non-trivial choice by the user). See below. I should also mention that this solution aid is specialized to my desire to experience difficult puzzles quickly. I was interested in the distribution of choices as one progressed through easy, medium, hard and diabolical puzzles. -- WardCunningham

Ward's code applies two basic rules:
  1. Each row, column and square must contain each digit at most once.
  2. Each row, column and square must contain each digit at least once.

If any position is shown empty, the puzzle obviously cannot be completed. If any row, column or square has no position available for some digit, the puzzle cannot be completed. A useful enhancement would be to highlight the latter situation, as it isn't immediately obvious. Tip: bookmark a game before you guess or open a guess in a new window so that you can easily return to try an alternative.


In the following, "diabolical" is used as a technical term for grids that appear not to fall to straight-line reasoning, but appear to need a "try this and derive a contradiction" approach. The question is how to devise such grids. Simple-minded techniques appear not to guarantee such complexity.

Now read on ...

Anyone know how diabolical grids are devised in the first place?

An obvious suggestion:

I think it's a bit more complex than this. The key distinction of a "diabolical" grid is that it does not yield to application of the normal set of rules. It is thus necessary to guess the value of at least one "pair" cell (a cell with only two possible digits). If the subsequent conflict is discovered, then the guess was wrong and the other number is known to be correct. The other aspect that the above approach evades is how to accomplish the symmetry that characterizes a well-formed Sudoko starting grid.

Here is a diabolical sudoku, according to the above definition (from "Sudoku For Dummies", by Heron James, ISBN 0-470-01892-5 ):

Specifically, I think a guess is required at or after the following completion state: I submit that it "does not fall to straight-line reasoning", no matter how deep. I don't see how the above process assures that the result requires at least one such "guess". Here's an exercise for the author of the above recipe: try your recipe, and post the allegedly-diabolical result. You can c&p the URLs from Ward's sudokant to show the steps you took. I'll then see if I can solve it without guessing. -- TomStambaugh

I have not yet found a grid that does not fall to straight-line reasoning. Your example proceeds with:

The above reasoning starts with the guess that (9,4) might be 6. Hence it is not straight-line reasoning in the sense intended in the definition of diabolical.

Perhaps not, but to me it is. There are two choices, and I'm trying to find a simple reason for it to be one or the other. (9,4) has only two possibilities, 6 and 8. I can see that it has to be 8, because if it's 6 I get a contradiction. That, to me, is simple reasoning. I agree that such reasoning may not be what was to be avoided in the "definition of diabolical".

You correctly say there are two choices for (9,4), but your words "I'm trying" show that you then apply guesswork rather than reasoning. Even when 6 has been eliminated, the problem isn't solved, so you have to make another guess.

Additionally:

This falls to the following reasoning. I can see why some people might regard this as trial-and-error, or back-tracking, but it's pretty shallow. Now we're into arguing about definitions. I would say that "straight-line-reasoning" of the form is a form of back-tracking, because it's the same as putting the 3 in some place, seeing that it violates a rule, then backing out.

I think I invoke LaynesLaw.



What follows needs tidying - I've hacked things around a lot, but need to stop - I have urgent work to do. Tom, Doug, please tidy/refactor at will.



I see no necessity to guess in a "pair" cell for a diabolical puzzle, since another guess may suffice to make the puzzle solvable without further guessing. In the diabolical puzzle given above, for example, trying (9,9)=9 leads directly to the solution without further guessing. I don't understand why Colin used "straight-line reasoning" to refer to a process that starts with a guess (and needs at least one further guess to obtain a full solution).

You haven't, thereby, shown the solution to be unique. I refer to my reasoning process as straight-line because I don't think of it as starting with a guess. Your mileage may vary, but I speak as a mathematician for whom proofs are part of a daily diet. Programmers may think differently.

There's no escaping the fact that your decision as to which value to investigate first is a guess. It's trivial that any sudoku problem (or any other problem with a finite search space) can be solved by a sufficiently lengthy guesswork process. Your method requires that you investigate every choice (except those determined by the basic rules) at every stage in order to prove uniqueness, and (if you are unlucky) you may have to do just that before you find the solution. Hence you are relying on trial and error; the straight-line reasoning you employ does reduce the number of trials you need, but the overall method is still one of trial and error. Whether the solver is a programmer or mathematician (or both) is irrelevant.


I don't know which voice is "Colin". By "pair" cell I mean a cell that can have one of two values. I use the term "twin" or "twinned pairs" to mean two pair cells in the same group, row, or column. I think you'll find the diabolical sudoku I gave above will not yield to (pure) "straight-line reasoning". If it does, please share the rule you used! I'm ready to go as deep as you want, subject to the constraint that at no time do I use trial-and-error to determine the digit of a pair cell. Proof-by-contradiction (guessing a digit in order to see if it leads to a contradiction) is what I mean by "guessing". I define "diabolical" to be a sudoku that requires at least one such guess for its solution.

I know of no combination of rules, no matter how deeply applied, that can avoid needing to guess that the pair cell at row 1, column 5 has a value of "3". If someone can tell me such a rule and how it should be applied to this example, I'd love to hear it; I hate having to guess.

-- TomStambaugh

Strictly speaking, use of pair cells is not critical, since a puzzle need not have any pair cells.

But the entire puzzle is proof-by-contradiction. If you can't use proof-by-contradiction, there is no way to know that the value at row 1 column 5 must be a 3 or a 9. Try to prove that it is a 3 or a 9 without proving, for example, that it is not a 5. -- DougKing

Of course it's proof-by-contradiction. It isn't that you "can't" use proof-by-contradiction. Instead, some puzzles solve by direct, though sometimes deep, application of relatively simple rules. There exists a path, from start to finish, along which there is at least one cell for which one or more correct choices can be derived from application of the rules based on the current state. In "diabolical" sudokus, that path comes to a point where there is no way, other than a coin toss, to choose between two digits (yes, of course, you can pick one from a cell with more than two, but picking one from two guarantees that a conflict in one proves the correctness of the other). In these what-if scenarios, there is simply no shortcut besides playing out the rules and either solving the puzzle or reaching a dead-end.

In the example above, please tell me the rules that you apply - other than applying brute-force - to deduce that the value at 1,5 is "3". This is in contrast to, for example, cell 2,2 at the beginning of the puzzle, in which one can say that:

  1. Cells (1,2), (1,3), (2,1) and (2,2) are occupied
  2. Cells (1,1), (2,3), (3,2) and (3,3) cannot be a "1" because of the "1"s in cells (1,6), (3,8) and (8,3).
  3. Therefore, Cell (2,2) is the only cell in the top-left group that can contain a "1".
  4. Therefore, the value of Cell(2,2) is "1"

The alleged recipe claims to produce Sudokus that are always diabolical. I don't see how - in fact, it seems to me that it produces sudokus that are NEVER diabolical, because of the third and fourth steps. I argue that the recipe, by construction, can be reversed and so the result is not diabolical. Don't forget the symmetry constraints as well, and don't forget that the result must be unique.

-- TomStambaugh

Ah, no, the recipe wasn't intended to produce only "diabolical" grids, a term I'd never heard used in such a formal manner. The method can be modified slightly to produce "diabolical" grids. My apologies for the misunderstanding - that terminology is new to me.

The Sudoko book I referenced (there are at least two volumes in the series) uses "diabolical" to label its hardest section of puzzles. As nearly as I can tell, having worked all of them (sigh), they each have the property I'm defining here. That's the origin of my attempted definition. -- TomStambaugh

The method can be used to create "diabolical" grids, but does require modification. Please edit this text at will to make your question clearer and demonstrate the inadequacy of the method outlined.

Colin, I've tried to create sudokus, and it's much harder than your method suggests. For one thing, although starting with a full grid seems appealing, such a start makes it quite hard to identify squares that you can both erase (for symmetry) and also prove uniqueness about. Thus, the recipes that work better (I think) start with an empty grid then simultaneously forward- and reverse-chain towards a solution. I think it's essentially impossible to do the chaining without a program. I'm still unclear about how you prove uniqueness. There's another related question, which is to prove that the starting point is not only unique and symmetrical, but also minimal (though that may not be a requirement). -- TomStambaugh

Tom: I'm sure you're right. Chaining fwd and bkwd is basic to most optimization problems, and this is strongly related to such. I have no arguments with what you say, but regard most of this as refinements (sometimes non-trivial) of the initial idea.

Hmmm. The initial idea starts with a full grid. Since every minimal sudoku is unique, it seems to me that your recipe assumes the solution to the problem it attempts to solve. That's why I think you need to start with a) an empty grid, b) a desired symmetry and c) some arbitrary initial digit choices. See http://www.pro.or.jp/~fuji/sudoku/makesudoku/sudoku01.html.en for lots more information on creating sudokus.

In reference to the example you gave, if 9-across 4-down is 6, then (4,4)=9 => (5,5)=3 => (5,1)=9 => (9,2)=9 which gives 8 no-where to go in column 9. That's very similar to one of the techniques my process can use to find a square to erase. Sometimes the hardest puzzles are simply those set by someone with a different toolbox. That's not to deny your points, which are probably right.

But isn't the choice of 6, as opposed to 8, in (4,9) another "guess"? If you choose 8 instead, I think you end up needing to make another guess - the puzzle is still "diabolical". Is there a rule that leads you to examine the 6 versus the 8?

Doug: When a sub-square is full but for one place, "proof by contradiction" is a correct description of the process, but there's a more obvious direct argument. Tom is clearly asking about grids where you come to a place where direct arguments appear to fail. His example is a good one.

-- Colin

I meant only to show that the algorithm and rules are the same, and that it is only the depth of application which is different. The difference between "guess" and "direct" seems to be determined by the depth of application that can be kept without resorting to memory outside your head. (Not that I'm claiming to be able to solve Tom's example in my head!). It seems to be an example of when a quantitative difference leads to an apparent qualitative difference.

Whether or not it can be done from memory, can you enumerate a sequence of rule applications that unambiguously lead this example to its conclusion without, at some point, making an arbitrary choice between N equally likely digits in some cell? I don't try to do these in my head, but I do have a suite of rules that I apply. I haven't been able to find any combination of those rules that solve diabolical puzzles like this, no matter how deep. If you can offer such a sequence for this example, I would greatly appreciate it. -- TomStambaugh

By the way, there is another "direct" technique that leads to some progress: By examination of column 8, it can be seen that either (8,1) or (8,2) is a 6. This means that (7,1), (7,3), and (9,2) cannot be a 6. -- DougKing

Yes, I use this rule. I think it offers only local and minor help, though. Specifically, it doesn't identify any new digits, at least that I can find. -- TomStambaugh

Tom, this is the best I can do. Using (column,row) notation, consider where '9's may occur in the top middle and top right boxes.

Hence, we must choose the first alternative above, and it turns out that the full solution follows by repeated application of the basic rules.

We seem to be reaching a consensus. It appears that in your most recent effort, you chose to consider the nines in top middle and top right boxes. Having made this intuitive choice (no rule!), you then considered two possibilities - a 9 in (7,1) and a nine in (9,2). You then iterated to derive a conflict in the latter. Meanwhile, if you use Ward's totally-cool sudakant, you'll also note that your choice of a 9 in (9,2) immediately forces (5,1) to also be a 9 - one of the two equally-likely choices I offered above in identifying (5,1) as a pair-cell to use in discriminating the outcome. The point I'm probably belaboring is that there seems to be no "rule" to get us, in one step, deterministically past this "sticking point" in the puzzle; instead, we must make an arbitrary or intuitive choice, assume a value, and then look for success or conflict. This need to make an essentially arbitrary choice, without a deterministic rule governing the outcome of that specific choice, is what I mean by "diabolical". Some sudoko's (like this one) have this property, some don't. The original question was "Anyone know how diabolical grids are devised in the first place?" I remain unconvinced that the putative sudoku generator produces any sudoku's that conform to the symmetry and uniqueness constraints, never mind diabolical ones. -- TomStambaugh

Tom, that wasn't Doug's solution, but mine (and I'm the one who originally asked how diabolical sudoku grids are devised). I was aware that choosing a 9 for (9,2) forces (5,1) to be 9. I was applying a slightly vague rule that says that choices that have plenty of consequences deserve investigation, especially if you spot that one alternative is easily ruled out by a short chain of deductions (I don't think "brute force" is a good term for using such a chain). A deterministic version of this rule exists (investigate every remaining choice by applying the basic rules, and eliminate the choice if it leads to a dead end), but is tedious to use manually (one doesn't know in advance how many times one will apply the basic rules) and doesn't have the simple form (find a pattern, make deduction associated with that pattern) of the rules normally used. It's a trial and error process, of course, but sufficiently limited to be of practical use to a manual solver for some puzzles, especially if one additionally puts a fixed limit on the depth to which each possibility is examined. The deterministic version is guaranteed to solve any puzzle if applied recursively, but that is usually too tedious for manual solving. My original choice of what to consider was not entirely intuitive (I looked for places where there were few ways (preferably just two) of placing a particular digit), but I agree that there was still an element of guesswork. For some sudoku grids, the non-recursive use of the deterministic version of my rule doesn't allow any cells to be determined. There may be some grids for which such use doesn't even allow any possibilities to be eliminated.

I use a similar heuristic to decide which cells to try when I reach a dead end. I prefer pairs to cells with more than two choices, because I know that I'm assured at least one correct result (if I reach a conflict, I know the other choice was correct). I then look for "twinned pairs", with the same pair of numbers in the same group, row or column. Even better are "chains" of twinned pairs that span multiple groups, columns and rows. In general, I try to maximize the number of digits I determine in any one choice. I've therefore learned to recognize patterns of multiple cells that have only one or two possible states, even though they are comprised of many cells. The advantage of these is that a twinned pair that intersects one of these "pathways" will thus solve a great number of cells (often the whole puzzle) with just one "what-if" (perhaps that's a nicer name than "guess"). I think your original question is fascinating, though - how does one assure that a given soduko is "diabolical", as opposed to simply hard. I should add, by the way, that I have the perhaps unfair advantage that I own a pretty good, pretty fast copier. So when I reach one of these barriers, I choose a pair-cell, note the cell index on the page (I use my own Sudoku paper for each puzzle), then make two copies of the partially-completed puzzle. I pick what I think is the more-likely digit in the pair, and work it to either a dead end or a solution. If I reach a dead end, I restart with the other piece of paper. If I reach a second dead-end, I work the alternate choice until I reach a dead end or solution. I haven't yet found a sudoku that hasn't yielded to this manual approach. The reason I could offer my example is that I keep my worked puzzles, so I just grabbed a convenient copy at the choice-point. -- TomStambaugh

Here's a particularly awkward diabolical sudoku for you (based on one found using Google).

Thanks, I'll try working it. I note that this sudoku is not symmetrical; in my view, a non-symmetrical Soduko is similar to poem without a rhyming scheme or meter. Still a poem, but certainly a different kind of poem. The constraints of symmetry, like the constraints of rhyme and rhythm, contribute some ineffable quality to the result. In any case, it seems to me that this discussion naturally leads to an area that is more on-topic for the wiki: a property that StephenWolfram, in NewKindOfScience, calls "computational irreducibility" (ComputationalIrreducibility). A "diabolical" sudoku is a puzzle for which no "shortcut" to iterating through the rules exists. In a mathematical system, a closed-form solution is a similar shortcut to iterating through the repeated application of rules (theorems) of the system. To paraphrase Wolfram, ComputationalIrreducibility is the claim that for some - Wolfram argues "most" - systems, the computational effort of the search is the same or greater than the original iteration. He thus makes the claim that there is a certain ComputationalIrreducibility of any system. Until recently (in the last few decades, perhaps), science has only had access to systems with such shortcuts; systems whose ComputationalIrreducibility is very much smaller than the originally-presented problem. We therefore tend to squeeze real-world problems into such well-behaved containers. As I understand the core of his argument, we now stand on the threshold of an ability to engage the very much larger set of "misbehaving" problems on their own terms. I think we engage this dilemma in a very direct way when we tackle problems like Sudoku; this explains, in part, its appeal to me. It also provides at least one reason why I think the extended discussion on this page actually is relevant to this Wiki. -- TomStambaugh


HasSudokantBeenWithdrawn? Yes, but it has now returned. -- WardCunningham


JanuaryZeroSix and repeated interest in MarchZeroSix


EditText of this page (last edited February 10, 2008) or FindPage with title or text search