Coding Problem

Amusing coding problem. -- RonJeffries

You have an ordered collection, size N, of objects.

Given K, answer an ordered collection of K ordered collections of size approximately N / K, such that the first contains the first N / K objects, the second the second N / K, etc.

Example

  input = #(1 2 3 4 5 6 7 8 9 10)
  k = 3
  output = (for example) #(
#(1 2 3)
#(4 5 6)
#(7 8 9 10)
Note ... for some reason, dealing the objects won't do, we want the first N/K in the first collection, etc.


Solutions to date:

For those who are interested in the wide diversity of methods towards solving this problem, please see:


What is so hard about this? It can be done in C++ with an STL iterator and a bit of code. Procedurally. Has to be some trick, eh? To make the fact that they are ordered collections significant? -- MichaelFeathers

No, there's no trick ... but we had to do it today and found it hard to get a clean solution. RalphJohnson sent me a cute solution using a stream. I wrote a clean one using dealing, but haven't found something clear and clean that does it as described here. Use arrays, vectors, I don't care ... just make the solution really clear and clean. -- RonJeffries


Simplest thing I can think of is this:

Make vector Distrib with K elements. Elements of Distrib are either N/K or N/K + 1 st. sum(Distrib) == N. An expression using modulus can be used to calculate the actual distribution. The elements of vector Distrib now contain the number of elements that will be in each resultant collection. The input vector can be iterated so that each element is placed in the current collection. Use the elements of Distrib in order to decide when to make the next collection the current collection.

Are there clearer strategies, Ron? Actually, is this the right problem? If so, some STL algorithms can make this even clearer.

-- MichaelFeathers

I like the notion of making the Distrib vector: Kent's trick with the error value (which I have just noted not understood) fits in to that nicely. My real objective isn't met yet, which is in all the solutions the central loop isn't obvious. It's funnier than it looks, tho, isn't it?


See CodingProblemSmalltalkSolution


I see a way to make it dead simple, but it wastes a little space. No time to code it right now, but here is the idea:

  // K
  K = 3
  // The input vector
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  
  // make a distribution vector: size = K
  [ 3, 3, 4 ]
  // make an offset vector from the distribution vector (size = K)
  [ 0, 3, 6 ]
  // from the offset vector, make a map vector (size = N)
  [ 0, 0, 0, 1, 1, 1, 2, 2, 2, 2 ]
Values in the map vector are the number of the collection that corresponding elements in the input vector belong in.

  map= [ 0, 0, 0, 1, 1, 1, 2, 2, 2, 2 ]
  input = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
Each step is a simple loop. For all I know, there could be an operator in some language like APL that solves the whole problem :-)

BTW, I really like Kent's rounding approach. I hadn't thought of that.

-- MichaelFeathers

I don't see the "simple loop" for the distribution vector. Think then we can make the map vector from the distribution one directly ... -- R

What about something like:

for (int i = 0; i < K; i++) Distrib [i] = N/K + (i < N % K ? 1 : 0);

The expression can also be written so that the N/K elements come before the N/K + 1 elements. True enough, a lot can be folded. I was thinking about how to make each step as simple as possible. -- mf


OK, I decoded that C this way:

 k := 7.
 n := 20.
 baseSize := n // k.
 excess := n \\ k.
 distrib := Array new: k.
 (0 to: k -1) do: 
[ :each |
extraItem := each < excess
ifTrue: [1]
ifFalse: [0].
distrib
at: each + 1
put: baseSize + extraItem]
which actually gave me a clue what was going on ... then back (in Smalltalk) to:

  (0 to: k -1) collect: [ :each | n // k + (each < (n \\ k) ifTrue: [1] ifFalse: [0])]
Thanks! Don't think we have a clear clarity winner yet ;-> -- R


See CodingProblemFairShare

Now, which is simpler and easier to understand?


The mention of dealing suggests to me another approach to add to the stew. I will just sketch the behavior - my Smalltalk is pretty weak.

#(0 1 2 3 4 5 6 7 8 9 10) deal: 3 =>
#((0 3 6 9) (1 4 7 10) (2 5 8))

(#(0 1 2 3 4 5 6 7 8 9 10) deal: 3) transpose => #((0 1 2) (3 4 5) (6 7 8) (10))
That's pretty close to a solution, although the last array is lacking. Maybe I should deal 4 (that is, n/k rounded up) instead:

#(0 1 2 3 4 5 6 7 8 9 10) deal: 4 =>
#((0 4 8) (1 5 9) (2 6 10) (3 7))
(#(0 1 2 3 4 5 6 7 8 9 10) deal: 4) transpose =>
#((0 1 2 3) (4 5 6 7) (8 9 10))
Hmm, that seems to put all the "longer" arrays at the front. Is this the same as the RemaindersAtTheFrontSolution? to CodingProblemSmalltalkSolution?

-- BillTrost

Now that one is COOL! -- Ron


Rule: Never post code without compiling it. (broken here)

Design params.

 /////
 // Comment giving several examples that are tested by the unit test

void SplitIntoBlocks?( int *start, int N, int K, int **KBlocks) { int **InsertionPoint? = KBlocks; for(int i=1;i<=N;i++) { // for every item in collection

if ( (i%k) == 0 && // if Item is at start of a new k block i+2*k < N)// and there are > two k blocks left { KBlock++; // advance to next block InsertionPoint? = KBlocks; }
 }
PS I have NOT tested this. There may well be fence post issues that I have not quite dealt with yet. I have posted this because it seems so much like the SimplestThingThatCouldPossiblyWork. What's the problem? Some specification descriptions are intrinsically complex so is the code. Trying to break it into lots of loops makes it to me more complex as I have to work out and remember the state that one loop computes, and then work out what the next loop does with it.

This code addresses the problem directly.

Another way to do this that is simple is. Note the choice between these two will be affected by many factors including that the second will be faster because the loops have less code inside them. The difference is almost exactly like the difference between Bresnham and Run length line drawing.

Debating about the details of simplest beyond this point is IMHO a religious war akin to {} placement.

Now for the HARD question, what is a sufficient number of units tests to test this in the XP methodology. (This ones for you Ron :)

I have been having trouble coming to grips with what it is exactly that the self-proclaimed XP enthusiasts are saying. At times, I get the feeling that they have been infected with the latest silver bullet fever. At times, I think I missed the boat. How about we get out of the abstract and say something concrete?

What is the XP way? Show us an example, not a toy one chosen to make XP glow. Try this one. Then and this to me is the acid test what are the required unit tests for this code?


Someone got a bee in their bonnet - see CodingProblemSolutionByProof.


A (beginners) J solution is:

group =: 4 : 0
  s=. >. x.%~#y.
  isInt=. 0&= @ |
  (s isInt y.) <;.1 y.
  )  NB. A good J'er would probably dash off a one-liner tacit definition.

Here is an example run:

c=:i.15 c 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 n=:4

n group c +-------+-------+---------+--------+ |0 1 2 3|4 5 6 7|8 9 10 11|12 13 14| +-------+-------+---------+--------+
It seems that an obvious unit test, that n groups are returned, would not always pass. For example:
  1. group i.12
  +-----+-----+-----+-------+
  |0 1 2|3 4 5|6 7 8|9 10 11|
  +-----+-----+-----+-------+
  1. group i.12
  +-----+-----+-----+-------+
  |0 1 2|3 4 5|6 7 8|9 10 11|
  +-----+-----+-----+-------+


PythonLanguage:

 def split(list, num) :
for index in range(num) :
yield list[index * len(list) / num:(index + 1) * len(list) / num]

for subset in split([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 7) : print subset
Output:

 [1]
 [2]
 [3, 4]
 [5]
 [6, 7]
 [8]
 [9, 10]
Alternate solution (same result, doesn't require Python 2.2's generators):

 def split(list, num) :
return map(lambda x: list[x * len(list) / num:(x+1) * len(list) / num], range(num))

for subset in split([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 7) : print subset
The question is, why is it obvious from the code that this returns the right thing? If you have to explain it, or run through an example showing how it works, then I think Ron would say it's not clear enough. I don't just want to know how it works, but also why it works.

--- Same as above with list comprehension instead of lambda: def split(list, num) :

 [list[x*len(list)/num:(x+1)*len(list)/num] for x in range(num)]


The above solution is really great!

I've implemented my clunky solution in RubyLanguage. But check out the check_segments method of the unit test. It should accept all the possible ways to segment the array and reject all the wrong ones! I've included two different implementations which return 2 valid but different segments for an array. The first one is a copy of the python solution above. The second one is my original clumsy implementation.

-- AurelianoCalvo

 require 'test/unit'
 include Test::Unit

class TestSegmenter? < TestCase

def check_segments( array, number_of_segments ) segments = array.segment number_of_segments

# segments form the original array assert_equal array, segments.flatten

# correct number of segments assert_equal number_of_segments, segments.size

# array elements are evenly distributed across segments segment_sizes = segments.collect {|s| s.size} assert segment_sizes.max - segment_sizes.min <= 1 end

def test_segment check_segments( [1,2,3,4,5,6], 2 ) check_segments( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3 ) check_segments( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 3 ) end end
First solution. Clean, nice, elegant, plagiarized.

class Array

  def segment(segs)
(0..segs).collect do |x|
self[(x * size) / segs)...((x+1) * size / segs)]
end
  end
end Second solution. Dirty, awful, grotesque, original.

 class Array
  def segment( number_of_segments )
ideal_segment_size = self.size / number_of_segments
number_of_spare_values = self.size % number_of_segments
segment = []
segments = []
self.each_with_index do
|value,index|
segment << value
if segment.size == ideal_segment_size + (number_of_spare_values > 0 ? 1 : 0)  
number_of_spare_values = [0, number_of_spare_values - 1].max
segments << segment
segment = []
end
end
return segments
  end
 end

As I remember from university, we could use a bit of basic number theory to make the dealing solution work:

Assume initially that N and K are coprime. Let src be the offset into the source array, and dst be the destination, both initially zero.

Then as each element is dealt, we increment src by K and dst by 1, but then we only use src mod N and dst mod K. We stop dealing when src gets back to zero.

Each dst should now contain the appropriate chunk of src

If N and K are not coprime calculate their hcf (famous algorithm) and copy hcf(N,K) each time instead of one? or do the above hcf(N,K) times, offseting the starting src by one each time.

BillWeston


EditText of this page (last edited November 8, 2008) or FindPage with title or text search