CSC 454/654 - Spring 2020 - Programming Challenge 1 (for Assignment 2)

CSC 454/654 - Spring 2020 - Programming Challenge 1

Introduction: This is your first “programming challenge” assignment. There will be three of these throughout the semester, putting what you learn about algorithms into practice — solving real problems is the whole point, after all! Your solutions will be submitted to an automated judging system that will test it against a range of test inputs. You will get fast feedback on whether your program passed all the tests; however, if it fails a test you’ll only get an indication that it failed, and you won’t get to see the input or any other details. Your job is to think through the problem and make sure your program works on all possible inputs! In general you will be given two challenges — one will be a thinly disguised version of an algorithm that is given in the book, and for the other you’ll have to be more creative.

Challenge Specifics: For this challenge your job is to work with data sequences, where each data point has a “reliability rating” between 0 and 100. There are \(n\) values in the sequence, and the reliability values are given as a sequence \(\langle x_1, x_2, \ldots, x_n\rangle\). On this scale, 50 is “neutral,” while 0 is completely unreliable, and 100 is perfectly reliable. Your task is to write programs that determine which part of this sequence maximizes the overall reliability as determined by a given metric.

For example, the first metric simply adds up how much better each sample’s reliability is than the “neutral” value of 50. In other words, for any subsequence starting at sample \(i\) and going to sample \(j\) the reliability of this subsequence is \[ R_1( \langle x_i, x_{i+1},\ldots, x_j\rangle ) = \sum_{k=i}^j (x_k - 50) \]

Your program then should find which subsequence has the largest \(R_1\) reliability rating. One obvious solution is to try all possible values of \(i\) and \(j\) with \(1\leq i\leq j \leq n\) and calculate \(R_1\) for each. Since there are \(\Theta(n^2)\) possible \(i\) and \(j\) values, if you’re careful you can write a program that does this in \(\Theta(n^2)\) time (if you’re not careful, it might take \(\Theta(n^3)\) time, so be careful!). The challenge here is to do better than \(\Theta(n^2)\), where \(O(n\lg n)\) time is possible using the techniques we are discussing in class.

Example: Consider reliability values \(\langle x_1, x_2, x_3, x_4, x_5, x_6, x_7\rangle=\langle 55, 43, 52, 60, 55, 42, 52\rangle\). For this input, the subsequence that maximizes the \(R_1\) reliability metric is \(\langle x_3, x_4, x_5\rangle=\langle 52, 60, 55\rangle\), which has \(R_1\) value \(2+10+5=17\).

To provide this input to your program, it is given in the following format: on the first line is the number \(n\), and then the \(n\) lines following that each have the values \(x_1\), \(x_2\), and so on. So for the example above, the input will given like this:

Input should be read from standard input, although I recommend that you save inputs to files and redirect into your program for testing. Some students prefer to use file I/O in the program while they are developing the program (especially since most IDEs have poor support for providing input through standard input), but if you do this make sure you switch the input source back to standard input before submitting — the judging system will only provide input via standard input!

Your goal is to program this so that it takes less than \(5\) seconds to process its input and find the maximum \(R_1\) value. A reasonably fast processor (like those on our judging system) can easily run the \(\Theta(n^2)\) time algorithm in less than \(5\) seconds on inputs of \(10,000\) of fewer values. However, much larger than this and the \(\Theta(n^2)\) algorithm won’t be good enough.

Fortunately, for \(R_1\) there is a divide-and-conquer algorithm that will find the maximum \(R_1\) value in time \(O(n\lg n)\), allowing you to solve problems up to around \(n=500,000\) in less than 5 seconds. [Hint: this algorithms is basically the same as an algorithm given in Chapter 4 of the textbook. Once you see the connection, it’s just a matter of turning the book’s pseudocode into real code and making some minor adjustments.]

There are four problems you can (and should!) submit to on the judging system:

Problem 1 (MaxR1Small): [5 points] Your program should read the input data, then compute and print the maximum possible \(R_1\) value for this input. In this, the “small” version of the problem, all inputs will have \(n\leq 10,000\) and so the \(\Theta(n^2)\) time algorithm should be able to solve the problem. However, if you implement the faster \(O(n\lg n)\) time algorithm, it will work for this problem too!

Problem 2 (MaxR1Large): [10 points] This is the same as Problem 1, but inputs can get as large as \(n=500,000\) and so the \(\Theta(n^2)\) algorithm will not be able to pass these tests within the required time bound (under 5 seconds).

Problem 3 (MaxR2Small): [5 points] For Problems 3 and 4, we use a different metric, based on the idea that “data is only as good as its least-reliable value.” To put that in practice, for any sequence of values the minimum reliability value is found and then it is multiplied by the number of values. In other words,

\[ R_2( \langle x_i, x_{i+1},\ldots, x_j\rangle ) = (j-i+1)\cdot\min_{k\in\{i,\ldots,j\}} (x_k - 50) . \]

Example: For the sample input above, the subsequence that maximizes the \(R_2\) metric is \(\langle x_4,x_5\rangle = \langle 60,55\rangle\), with \(R_2\) value \(10\). The values immediately before and after this subsequence have low reliability ratings, so including them would reduce \(R_2\) a lot, even through you are including more values. However, if the “42” were changed to “55”, then that value would be included, bringing the \(R_2\) value up to \(15\).

The input for this program is the same as all the others, but the program should print out the maximum possible \(R_2\) value. For Problem 3, inputs will be “small,” with \(n\leq 10,000\).

Problem 4 (MaxR2Large): [5 points] This is the same as Problem 3, but inputs can get as large as \(n=500,000\) and so the \(\Theta(n^2)\) algorithm will not be able to pass these tests within the required time bound (under 5 seconds). There is an \(O(n\lg n)\) time divide-and-conquer algorithm for maximizing \(R_2\), but it’s not in the book so you’ll have to be clever and figure it out!