The first is about the normal form of a sudoku. The rules of sudoku have a huge symmetry group of order (3!)^8 x 2 x 9! = 1.22E12. It is generated by permutations of rows in a group of three rows, permutations of these groups, same thing for columns, transposition of the grid and permutations of the numbers 1,...,9 which are just labels for nine different things. So for each grid, there are 1.22E12 grids which are basically the same. Is there an easy way to determine if two puzzles are related by the symmetry and even better, is there a normal form (a distinct element for each orbit)?
There is a trivial solution to this problem: You just write out all 10^12 puzzles you obtain by acting with the symmetry group and then sort them according to some lexicographic ordering. This gives a normal form.
What I am asking for is a more direct construction. One which sounds more like: Use the 9! permutations of the symbols to make the first row read 123456789. Then use row permutations to make the square below the 1 as small as possible. Then use row permutations to make the next squares under that square as small as possible... Unfortunately, starting to permute colums screws up what was achieved with this ordering prescription. So how would a better algorithm read?
The other problem is how to rate the difficulty of a puzzle? Question one really applied after the puzzle is solved, this question is about puzzles to be done. Newepaper which publish these puzzles often give ratings such as "simple", "intermediate", "hard". But I found these ratings differ significantly between papers (what Zeit online considers hard is much much easier compared to what The Guardian calls a hard sudoku) but are also not consistent amongst themselves.
Earlier, I have talked about the perl program I wrote to solve sudokus. It recursively figures out for all squares which numbers are still allowed and then takes the square with the least number and tries to put these numbers there. If there is an empty square with no allowed numbers remaining it backtracks.
Thus the search can be represented by a tree where each node represents a square to be filled out and there are as many branches from that node as there are numbers which are not yet rules out. What I am looking for is a numerical rating for a puzzle which is a predictor of how hard I find to do the puzzle and for example correlates with the time it takes me to solve it. Even if I use a different strategy when doing these puzzles by hand I would expect the information could be obtained form the tree. Do you have any good idea for such a function from trees to the reals, say? Obviously the trees all have a depth given by the number of empty squares in the puzzle and each node can have at most nine branches but typically has much less (even for "hard" puzzles most of the nodes have only one branch).
An easy guess is of course the number of nodes or the number of leaves but I found those at least not be proportional to my manual solution time. To give you an idea: Today's hard puzzle from Die Zeit
has 52 nodes (four times the program encounters situtations with two possibilities, all others are unique or dead ends, manually it took me exactly 6:30) while
has 2313 nodes and took me well over an hour some months ago.
Of course, if early on you have several possibilities and learn only much later which ones do not work this is much worse than having many possibilities which are ruled out immediately.
UPDATE: In case anybody is interested, I put up the decision tree for the difficult puzzle.