A function can be expressed by a formula or by a table.
For example, a boolean function with 2 inputs:
x1 x2 | x1 XOR x2
 0  0 |    0
 0  1 |    1
 1  0 |    1
 1  1 |    0
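The table view can be written directly as a lookup table; a minimal Python sketch (the names `xor_table` and `xor` are ours):

```python
# XOR as an explicit truth table: each of the 2^2 = 4 inputs maps to exactly one output
xor_table = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}

def xor(a: int, b: int) -> int:
    return xor_table[(a, b)]

assert xor(0, 1) == 1 and xor(1, 1) == 0
```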
There are 2 * 2 = 4 possible inputs, and each input has exactly one output.
So, how many “boolean functions with 2 inputs” are there?
Each row has 2 possible outputs, so there are 2^4 = 16 such functions.
1. Fill out all 2^4 tables, then randomly choose one of them. We have chosen a random function.
2. Take a blank table with 4 rows. For each row, randomly choose an output, until all 4 rows are filled.
3. Take a blank table with 4 rows and leave it empty at first. When someone asks for a certain input, if its output is empty, randomly fill in an output; if it is not empty, return the previously filled output.
These three definitions are equivalent.
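Definition 3 (lazy sampling) is easy to sketch in code; here is a minimal Python version (the class name `LazyRandomFunction` is ours):

```python
import secrets

class LazyRandomFunction:
    """Definition 3: fill in the table only when an input is first queried."""
    def __init__(self, out_bits: int = 1):
        self.table = {}          # the partially filled table
        self.out_bits = out_bits

    def __call__(self, x):
        if x not in self.table:  # first query: fix a random output
            self.table[x] = secrets.randbits(self.out_bits)
        return self.table[x]

f = LazyRandomFunction()
assert f((0, 1)) == f((0, 1))  # repeated queries return the previously filled output
```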
When the input domain is very large, the table of a random function is exponentially large.
Definitions 1 and 2 both consume that much space up front. Even with definition 3, every answered query still has to be recorded.
In 1986, Goldreich, Goldwasser, and Micali proposed the concept of a "pseudorandom function" (PRF).
F(key, x)
Given a key, we get a function of a single argument, and this function behaves almost like a random function: it is computationally hard to tell the two apart.
F(1, x) is very similar to a random function. It’s a pseudorandom function.
F(2, x) is very similar to a random function. It’s a pseudorandom function.
…
(F is a pseudorandom function ensemble indexed by key)
We only need the space of one key, and it’s as if we’ve obtained an entire table filled with random values. It’s convenient for storage and transmission.
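In practice, HMAC is commonly modeled as a PRF; under that assumption, F(key, x) can be sketched as follows (the key value here is our own placeholder):

```python
import hmac
import hashlib

def F(key: bytes, x: bytes) -> bytes:
    # HMAC-SHA256 keyed by `key`, modeled as a PRF: the key fixes the whole "table"
    return hmac.new(key, x, hashlib.sha256).digest()

key = b"sixteen byte key"
assert F(key, b"a") == F(key, b"a")  # same key, same input: same output
assert F(key, b"a") != F(key, b"b")  # different inputs give (almost surely) different outputs
```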
Because each output is randomly selected, it’s possible that f(x) = f(y) but x != y . (for a fixed key)
However, if the output bits of f are long enough, the probability of this situation becomes negligible.
We can almost say that f(x) = f(y) => x = y .
Fix the key. Given a value v, the probability of finding an x such that f(x) = v is also negligible. (Assuming the output is long enough and the input domain is large enough.)
This enters the realm of MPC.
The Sender knows the key.
The Receiver knows x.
We want to compute F(key, x) without leaking either the key or x.
In the end, the Receiver gets F(key, x). The Sender doesn’t get anything.
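One classic way to realize this functionality is blinded exponentiation in a prime-order group (a "DH-OPRF"). A toy sketch with an insecurely small group (the parameters p = 1019, q = 509 are ours; real deployments use groups of 2048+ bits or elliptic curves):

```python
import hashlib
import secrets

p, q = 1019, 509  # toy safe prime p = 2q + 1; far too small for real use

def hash_to_group(x: bytes) -> int:
    # squaring maps the hash into the order-q subgroup of quadratic residues mod p
    return pow(int.from_bytes(hashlib.sha256(x).digest(), "big"), 2, p)

key = secrets.randbelow(q - 1) + 1  # Sender's PRF key

# Receiver: blind H(x) with a random exponent r, hiding x from the Sender
x = b"hello"
r = secrets.randbelow(q - 1) + 1
blinded = pow(hash_to_group(x), r, p)

# Sender: raise the blinded element to the key (learns nothing about x)
response = pow(blinded, key, p)

# Receiver: unblind with r^{-1} mod q, obtaining H(x)^key without learning the key
r_inv = pow(r, -1, q)
output = pow(response, r_inv, p)

assert output == pow(hash_to_group(x), key, p)  # equals F(key, x) computed directly
```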
This can be calculated using a generic MPC protocol. However, a custom protocol should be faster.
One party, the Sender, has set X = {x1, x2, … xn}
The other party, the Receiver, has set Y = {y1, y2, … yn}
The two want to know which elements X and Y have in common (X ∩ Y). Elements not in X ∩ Y will not be leaked.
Like OPRF, PSI has many implementation methods. Here we introduce the OPRF-based approach proposed by Hazay and Lindell in 2008:
Because it’s oblivious, the key and y won’t be leaked. Because it’s one-way and has a key, x won’t be leaked either.
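The resulting PSI flow can be sketched end to end. Here HMAC stands in for F, and the OPRF step is evaluated by a direct call purely for illustration; in the real protocol the Receiver obtains F(key, y) obliviously, without learning the key (the sets and key below are our own placeholders):

```python
import hmac
import hashlib

def F(key: bytes, v: bytes) -> bytes:
    # HMAC-SHA256 modeled as the PRF
    return hmac.new(key, v, hashlib.sha256).digest()

key = b"sender secret key"
X = [b"alice", b"bob", b"carol"]  # Sender's set
Y = [b"bob", b"dave", b"carol"]   # Receiver's set

# Sender sends the PRF values of its own set (one-way, so X itself is not revealed)
sender_values = {F(key, x) for x in X}

# Receiver gets F(key, y) for each y via the OPRF (simulated here by a direct call)
# and compares against the Sender's values
intersection = [y for y in Y if F(key, y) in sender_values]

assert intersection == [b"bob", b"carol"]
```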
What if the Sender simply sent hash(a) for each of its elements a to the Receiver for comparison? Although hash is one-way, it would still leak information: with no key, the Receiver can hash candidate values itself and test them offline.