Monday, October 27, 2014

“K”-value

"K"-Value (IEEE Electronic Devices Society problem)

 

Problem description

The K value of a sequence is defined as the Kth smallest value of that sequence. For example, the 1-Value is the smallest element while the M-Value, where M is the length of the sequence, is the biggest value.

Given a circular sequence S of N integers, and the parameters M < N and K <= N, we are asked to compute the smallest of the K values of each of the N subsequences of M successive elements of S.

Input is given in the following form:
The first line contains N M K (separated by spaces)
The second line contains the sequence S, elements separated by spaces.
 

Example

If S = [1,3,5,3,2,4,5] (N = 7), M = 4 and K = 3, then N subsequences of length M and their K-values are:
  • [1,3,5,3], 3
  • [3,5,3,2], 3
  • [5,3,2,4], 4
  • [3,2,4,5], 4
  • [2,4,5,1], 4
  • [4,5,1,3], 4
  • [5,1,3,5], 5
The minimum K-Value (and the solution of the problem) is 3.

The naive solution

We could take the N subsequences of length M, sort them, and record the Kth value. Then we take the minimum of the K values.

This solution has complexity O(N*M*log(M)). If we assume the sort has complexity O(M*log(M)), then we have to:
  • Copy M elements from the sequence S to a temporary buffer (O(M))
  • Sort this buffer (O(M*log(N))
  • Do this N times.
Also note that the constant factors are really bad, as we always perform all these steps.

The following Python code illustrates this solution:

S += S # paste S with itself so we can pretend it is a circular list

for i in xrange(N):
    S0 = S[i:i+M]   # O(M) copy
    S0.sort()       # O(M·log(M)) sort
    k0 = min(S0[K-1], k0) if i > 0 else S0[K-1] # O(1)

print k0

This solution timed out on most test cases.

A better solution

We can devise a better algorithm if we take into account the these facts:
  • Each successive sequence of M elements differs from the preceding one in one element only. We can see this in the example above: the first element is removed and a new one is inserted at the end.
  • When we sort the M-element subsequences, we don't really care about the elements from K+1 to M.
This solution can pass all test cases without getting a "Timeout". It is essentially the same as the naive solution, but with some optimizations:
  • When we sort the first M-element subsequence, we keep only the first K values. Let's call this sorted sequence S0
  • Instead of sorting every new subsequence, we modify S0, using in-order insertion and deletion.
  • As we sweep through S, se can ignore all values greater than the current k-value, as these cannot influence the result.
The core of the code is as follows:
# K-Value, by Juan I. Carrano | IEEEXtreme 8.0 (2014)
import bisect
S += S    # paste S with itself so we can pretend it is a circular list

s0 = S[0:M]    # Select the first M-element subsequence
s0.sort()
del s0[K:]     # keep only the first K elements

mink = s0[-1]  # the first K-value is our current minimum k value
mink_real = mink  # mink is not always valid. mink_real is the real minumum k-value

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect.bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError


for i in xrange(N):
    n_in = S[i+M]    # the number that is added to S0
    n_out = S[i]     # the number which leaves S0

    if n_in <= mink and n_out > mink:
        if len(s0) == K:
            del s0[-1]
        bisect.insort_left(s0, n_in)
        mink = s0[-1]
    elif n_in <= mink and n_out <= mink:
        del s0[index(s0, n_out)]
        bisect.insort_left(s0, n_in)
        mink = s0[-1]
    elif n_out < mink:
        del s0[index(s0, n_out)] 
     # mink is a k-value only when s0 has M elements
    if len(s0) == K:
        mink_real = min(mink, mink_real)

print mink_real

Let's go over it:
  • We start with the first M-element subsequece. We sort it, and keep only the first K elements.
  • The last element of S0 is the first K-value.
  • Each subsequence differs from the preceding one in exactly one element. A number is removed and replaced by another one. We call these numbers n_out and n_in.
  • Instead of taking the next subsequence (M elements), sorting it, and extracting the Kth element, we can use the current S0, which is sorted, and update it with n_in and n_out:
    • If n_in is greater than the greatest element in S0 (called mink), we can ignore it.
    • If n_in is less or equal, we must insert it in order. We use binary search (O(log(n)), so insertion time is dominated by the insertion itself (O(n)). We must also remove an element.
      • If n_out is greater than mink, it is ignored. The element of S0 that is removed is the greatest one (ie. the last one).
    • If n_out is least than or equal to mink, it means it is in S0, and must be searched and removed
  •  Note that it can happen that S0 is left with less than M elements: if the incoming number n_in is greater than the last element of S0 and n_out is smaller or equal. This only means that the missing elements at the end of the list would be greater than the current minimum k-value, so they can be ignored.
  • The last element of S0 is a K value only if S0 has M elements. mink_real keeps track of the minimum k-value, while mink tracks the last element of s0.
  • The solution is in mink_real.
The code is "Pure Python". Binary search is implemented by the bisect module (which is also pure python).
On hackerrank, test case 6 takes the longest to complete: 4.13 seconds, while the maximum allowable time is 10s for python submissions.

Code

The code is available here.
You can also download the code for  the naive solution.

Testcases

Easy test case and solution.
Difficult test case and solution.

Access all files in this Drive folder.