Solo Assignment 5 -- Due Wednesday, December 4

Solo Assignment 5 – Due Wednesday, December 4 (before class)

Remember that these problems are for individual work only (no collaboration).

Textbook, Problem 17.2
These are meant to be quick probability derivations. Any explanation of your calculation should be at most one or two sentences.
Textbook, Problem 18.16
Textbook, Problem 19.7
Textbook, Problem 19.18

Extra Credit (Up to 20 points – you don’t need to answer all parts to get some extra credit!)

When user passwords are saved in a database, the typical practice is that a hash of the password is stored rather than the password itself. That is, there is a function H that maps passwords into some set S of size n. For example, if H outputs a 64-bit value than S is the set of all 64-bit values, and n = |S| = 2⁶⁴ (this value is too small to be secure, but this is a foundations class and not a security class…). We will assume that if an input x is chosen randomly, then H(x) is uniformly distributed over S. Our problem is this: Given a value h ∈ S, find an x such that H(x) = h (in other words, x has hash value h, so works as the password). In order to find such an x, we will pick a random value x and test whether H(x) = h, and repeat this until we find an x that works.
1. Let U be a random variable which denotes the number of distinct hash values we see in this process. For a really small example, if S = {0, 1, 2, 3}, and we are looking for something with hash value 2 and compute the sequence of hash values 1, 3, 3, 1, 2 then U = 3. For arbitrary S of size n, and some value k, what is Pr[U = k]?
2. What is E[U]?
3. Let X_i be a random variable that represents the number of queries that must be made to see a new value (one that hasn’t appeared before) after the ith distinct value has been seen. What is E[X_i]? (Hint: Make sure you understand the coupon collector problem!)
4. Let Q be a random variable which denotes the total number of queries made by this algorithm. In the small example given in part (a), Q=5. Given that the value you are looking for is the kth distinct value seen by the algorithm, what is the expected value of Q? In other words, what is E[Q|U = k]? You can leave this answer as a summation.
5. What is E[Q]? This answer should not contain a summation – it should be a simple (very simple!) formula.
6. Let’s say that we ran this algorithm once, and remembered all values that have been seen in a large table. We are now given a new random value h′, and asked to find a y such that H(y) = h′. If we saw such a y already in the first execution of the algorithm, then we find this in our lookup table, and can produce y without having to make any queries to H. If we haven’t seen it before, then we have to continue searching like above. If we saw exactly n/2 distinct values the first time we ran the algorithm, what expected number of queries required to find such a y?