# Ad Hoc Multi-Input Functional Encryption (aMIFE)

## Overview:

In this paper, authors proved two main results:

1. aMIFE for general functionality can be constructed by standard MIFE + general FE + function re-runnable two-round MPC.
2. As a special case, aMIFE for the inner product functionality can be constructed from standard assumption: Learning From Error (LWE)

To figure out how it is achieved, we need to start from the building blocks:

## Functional Encryption (FE):

Consider a cloud image storage service. Law enforcement may require the cloud to search for images containing a particular face. For protecting users’ privacy (if there is still any), the cloud needs a restricted secret key that only decrypts a target image, and only decrypts the target face in clear, while still keeps all other parts of the images blurred. In this scenario, the secret key only reveals a function of the plaintext image. We know traditional public key cryptography is an “all or nothing” scheme, which means either we can decrypt and read the entire plaintext, or we are not able to decrypt it and thus learn nothing about the plaintext. So here function encryption (FE) comes to rescue. Just as in traditional encryption, in FE, ciphertexts can be generated with an encryption key. However, each decryption key is associated with a function $f$, and decryption of an encryption of $m$ using this key results in not $m$ but $f(m)$. Informally, the security definition of FE requires nothing more than $f(m)$ can be learned from the encryption of $m$ and the decryption key for $f$.

## Multi-Input Functional Encryption (MIFE):

Note that FE only works for computing a function $f$ on a single ciphertext. Moving one step further, what if we want to compute a function defined over multiple plaintexts given their ciphertexts each encrypted under a different key? Now we need to introduce MIFE. In MIFE, decryption keys allow computing joint functions of different plaintexts underlying multiple ciphertexts. That is, decryption takes a key for a function $f$ and ciphertexts $c_1, ... , c_n$ encrypting $m_1, ... , m_n$, and outputs $f(m_1, ... , m_n)$. The interactive pattern of MIFE can be summarized into the following steps:

1. Suppose there are $l$ users involved. This number $l$ is counted and sent to a trusted global setup. The global setup outputs a master secret key $MSK$, and a set of $l$ encryption keys $\{EK_1, EK_2, ..., EK_l\}$, one for each user.
2. The global setup picks a function $f$, and generates a decryption key $DK_f$ using $MSK$ and $f$. The subscript indicates this decryption key is associated with $f$.
3. The global setup publishes the public parameters to all users, and distributes each encryption key $EK_i$ to user with index $i$. Note that this step can be done in parallel with step 2.
4. Each user encrypts their plaintext $m_i$ upon receiving encryption key $EK_i$ to get a ciphertext $c_i$.
5. All users send their ciphertexts to the aggregator, who now computes the result $y = f(m_1, m_2, ..., m_l)$ using the decryption key $DK_f$.

## Motivation:

The motivation of aMIFE comes from these two drawbacks of the standard MIFE presented above:

1. In MIFE, encryption and decryption keys are generated via a global setup procedure run by an external entity, called the “key authority”. This key authority is able to decrypt all the ciphertexts gathered from users. It also picks which function $f$ to be computed. Therefore, it must be trusted to maintain the security of this scheme.
2. In MIFE, the number of function arity must be declared at setup time. It does not support a dynamic setting in which users can join or leave.

## Challenges:

Where is challenging part to improve these two drawbacks?

MPC:

For eliminating the key authority, a natural idea is to use MPC to replace it. However, naively using MPC as setup would introduce interaction between users, which MIFE doesn’t allow. Hence we can try Two-round MPC.

Two-round MPC:

Here is how it works:

1. Round One: Assume the number of users $n$ and the function to be computed $f$ of arity $n$ is predetermined. Each user feeds their index $i$ and plaintext $m$ into the first round MPC, and get the first round message $\rho^{(1)}$ and a secret $s$.
2. Round Two: All users publish their first round message $\rho^{(1)}$. Using all these first round messages, together with each secret $s$, each user computes the second round message $\rho^{(2)}$
3. Compute Result: All users send their second round messages to an aggregator, and the aggregator computes the final result $y$.

However, naively using a two-round MPC does not suffice, as it introduces two issues:

1. This template publishes fresh public parameters $PP$ for each $DK_f$, but MIFE requires publishing $PP$ only once.
2. $EK_i$ is not known given just the first round MPC messages, so users are not able to encrypt.

Solution to 1: Function re-runnable two-round MPC:

We require that the first round MPC message to be sent only once, and reused for all subsequent second round messages. This property is called “function re-runnable”. Moreover, we require the number of users to be declared during the second round of MPC, instead of at the time that the first round is sent. This property is called “function delayed”.

Solution to 2: Delaying Encryption:

We allow the users to delay encryption until $EK_i$ are known. During the setup, each user runs a FE scheme to get the FE master key. This master key is used to compute the first round MPC message. Additionally, each user encrypts their input $m_i$ using FE.Encrypt.

## aMIFE Interaction Pattern:

We want to let each user run a local setup, which replaces the trusted global setup in standard MIFE. In addition, we want the number of users to be undetermined in the setup procedure, so that it allows users to join or leave in the computation process. For simplicity, we present the aMIFE in the private-key setting:

1. Each user $i$ runs a local setup to generate some public parameters and a private encryption key $EK_i$.
2. Each user publishes their public parameters, and encrypts their input $x$ using the private encryption key only, nothing else.
3. Using their private encryption keys, users can issue partial decryption keys to the aggregator. Each partial decryption key is generated by the public parameters of $l - 1$ other users.
4. If these $l - 1$ other users also issue the matching partial encryption keys for $f$ to the aggregator, then the aggregator can decrypt the ciphertexts $c_1, c_2, ..., c_l$ using all partial decryption keys $PDK_{1,f}, PDK_{2,f}, ..., PDK_{l,f}$. The final result is $y = f(m_1, m_2, ..., m_l)$

But how to replace the global setup using local setup? The idea is to nest FE, MIFE and Two Round MPC together.

## aMIFE for General Functionalities:

Now we are ready to present the complete aMIFE scheme for general functionalities:

1. Local Setup: Each user runs FE scheme to generate a FE master secret key $MSK_{FE}$. Feed this $MSK_{FE}$ into the first round MPC to get the first round protocol message $\rho^{(1)}$ and a secret $s$. Set public parameter $PP = \rho^{(1)}$ and master secret key $MSK = (MSK_{FE}, s)$. Note that the global setup in standard MIFE is bypassed using FE and first round MPC.
2. Key Generation: Suppose $l$ users are chosen to participate in the computation, then these $l$ users publish their public parameters and master secret keys. Each user runs second round MPC to generated a second round message $\rho^{(2)}$, which is the partial decryption key $PDK_i$.
3. Encryption:
4. Decryption:

## “Ad hoc Friendliness”:

We say a MIFE scheme is “ad hoc friendly” if it satisfies the following three conditions:

1. Decentralized Setup: The encryption keys $EK_i$ and the partial master secret key $MSK_i$ for each user can be generated locally.
2. Local Encryption: Each user is able to encrypt using their encryption key $EK_i$ and plaintext $x$, nothing else. The encryption process is fully independent from the number of users $n$.
3. Piecewise Master Secret Key: If we restrict the master secret key $MSK = \{MSK_1, MSK_2, ..., MSK_n\}$ to some subset S with $|S| = l$, the subset has the same distribution as a master secret key generated for functions of arity $l$.

# Yes, There is an Oblivious RAM Lower Bound!

Written by Chenyang Yu, for CS 282B final presentation.

## Readings:

1. Simons Institute Talk: “Oblivious RAM” by Boyle
2. Software protection and simulation on oblivious RAMs” by Goldreich and Ostrovsky, we will refer it as [GO96]
3. Is There an Oblivious RAM Lower Bound?” by Boyle and Naor
4. Simons Institute Talk: “Yes, There is an Oblivious RAM Lower Bound!” by Larsen
5. Yes, There is an Oblivious RAM Lower Bound!” by Larsen and Nielsen

In particular, number 5 is the paper that I am presenting (picked from CRYPTO 2018).

## Terminology:

Online: The content of the operation sequence is given to the ORAM one at a time.

Offline: The ORAM is given the entire operation sequence ahead of simulation time.

Bandwidth overhead: The multiplicative factor extra memory blocks that must be accessed to hide the operation sequence. The formula will be given later. When we say “upper bound” or “lower bound” of ORAM, we always mean the bound of amortized bandwidth overhead.

Balls in bins: The data items are modeled as “balls”, and CPU registers and server-side memory are modeled as “bins”, and the set of allowed data operations consists only of moving balls between bins, in other word, shuffling.

Active: Allow the server to perform untrusted computation on behalf of the user.

Passive: Doesn’t allow the server to perform untrusted computation on behalf of the user.

## Overview:

ORAM was first introduced for hiding memory access pattern in a software protection context. The main performance metric of an ORAM is the bandwidth overhead. In lectures, we studied square root solutions, hierarchy solution, and so on, in great details. Those models were built for pushing down the upper bound. Today, we are going to discuss the lower bound. In [GO96], we already had a $\Omega(log n)$ lower bound. Is there anything we can do to further reduce the lower bound?

## Directions:

There are three directions to think about for further exploring lower bound:

1. [GO96] lower bound only applies to “balls in bins” algorithms, so we may consider to remove this assumption. Boyle and Naor removed this assumption and showed it will result in super linear lower bounds for sorting circuits. To overcome this barrier, they proposed an “online” ORAM. They also argued that most implementations of ORAM fall into the “online” category.
2. [GO96] lower bound holds even for offline ORAM. Boyle and Naor pointed out that most practical constructions of ORAM are online, so we may hope to find a better lower bound for online ORAM only.
3. [GO96] lower bound crucially relies on the statistically closeness of access pattern distributions for any two inputs. In many cases, statistical security might be stronger than necessary, so we may consider to relax this assumption to computationally secure, which is sufficient in real world application. Note that in [GO96], authors pointed out that in practice, one can implement such ORAM using PRFs, so the result is computationally secure only. But the lower bound was proved assuming access to random oracle, so that the proof itself relies on statistical closeness. Hence by “relaxing this assumption”, I mean for proving lower bound, not for implementation.

All together, in this paper, we are going to prove a lower bound for non-“balls in bins” ORAM that is online, passive and computationally secure only. That is, the ORAM algorithm can be arbitrary, the adversary must be efficient, and it holds only for online ORAM.

## Definitions and Notations:

An online ORAM supports the following two operations:

1. $write(a, data)$: Store data in the block of address a, where $a \in [n]$ and $data \in \{0.1\}^{r}$.
2. $read(a)$: Return the contents of the block of address a.

We refer to blocks at the server as “memory cells“, where each cell contains w bits. We refer to blocks in the ORAM array as “entries“, where each block contains r bits. Note that this “r” is just the length of “data” as defined above. We also let the client memory to have m bits. Note that if after N accesses the ORAM makes M probes, then the bandwidth overhead is $Mw/Nr$.

Let y be an operation sequence: $y := (op_{1}, ... , op_{M})$, where $op_{i} \in \{write, read\}$. Let $A(y) := (A(op_{1}), A(op_{2}), ... , A(op_{M}))$ denote the memory access pattern to the server from the online ORAM, where each $A(op_{i})$ represents the list of addresses of the memory cells accessed at the server while processing operation $op_{i}$.

An online ORAM is secure in the following sense: for two distinct operation sequences y and z with length M, A(y) and A(z) are computationally indistinguishable.

## Main Theorem (informal):

Theorem: In such setting, any online ORAM must have an expected amortized bandwidth overhead of $\Omega(log(nr/m))$ on sequences of $\Theta(n)$ operations.

Comment: Note that this theorem holds in the random oracle model, requiring only computational indistinguishability, holds for any $w$ and allows for arbitrary representations of the data in memory. For the natural setting of parameters $r \leq m \leq n^{1-\epsilon}$ for arbitrary small constant $\epsilon > 0$, the lower bound simplifies to $\Omega(log n)$.

## Proof Strategy:

Note that above definition is very close to the definition of an oblivious data structure, which is defined as follows:

Definition: In the array maintenance problem, we must maintain any array $B$ of $n$ $r$-bit entries under the following two operations:

1. $write(a, data)$: Set the contents of B[a] to data, where $a \in [n]$ and $data \in \{0,1\}^{r}$.
2. $read(a)$: Return the contents of B[a].

Hence the idea is to prove a lower bound for oblivious data structures solving the array maintenance problem, and then use the same technique to prove the same lower bound for online ORAM.

## Models:

Data Structure lower bounds are typically proved in the cell probe model of Yao. The model is very similar to standard RAM, the difference is that the computation is free and we only pay for memory accesses. To meet our goal, we will define our model as “oblivious cell probe model“. Based on this model, we will use “information transfer” method. This method comes from the paper “Logarithmic Lower Bounds in the Cell-Probe Model“.

The first page of the handout contains a graph of such model. Each operation can be a read or a write. Each operation $op$ makes read and write probes to the server memory. The sequence of probes made during an operation $op$ is called the access pattern of $op$ and is written as $A(op)$. Words in the server memory are called cells. Each cell is $w$ bits. The ORAM is restricted to $m$ bits of storage between two operations. During an operation it can use unlimited storage.

# Yao’s Garbled Circuit: An Introduction

## Citation:

1. “A Gentle Introduction to Yao’s Garbled Circuits” by Sophia Yakoubov
2. UCLA CS 282A Notes by Rafail Ostrovsky, part 15

In this post, we will discuss some of the very basics of Yao’s Garbled Circuit, as it is indeed an important prerequisite that appears a lot in other protocols.

## Problem:

Consider two people Ginny and Evan. Ginny and Evan wish to figure out whether they should collaborate or not, but neither of them wants the other person to know his/her attitude.

## Setup:

In other word, let Ginny holds a bit $g$ and let Evan holds a bit $e$. The bit has value 1 if he/she agrees to collaborate, or has value 0 if he/she disagrees. They want to design a protocol such that both of them learn the value of $g \wedge e$ at the end of protocol, but neither Ginny or Evan learns the other person’s bit during the protocol. In this scenario, we call Ginny the “garbler”, who is in charge of “garbled gate generation”. We call Evan the “evaluator“, who is in charge of “garbled gate evalution“.

## Garbled Gate Generation (by Ginny):

The idea is to design a “garbled gate” and generalize it into garbled circuit. Garbling basically means to obfuscate boolean gate truth table. Ginny picks four random strings as “labels“, each corresponds to one of $g = 0, 1$ or $e = 0, 1$.

### step 1: encryption

We pick a label from the $g$ side and a label from the $e$ side to form a label pair. Now we have four labels pairs, and together they represent the boolean truth table. Each label pair is used as input for a key derivation function $H$, the output $H(label 1, lable 2)$ is a symmetric encryption key, and the key is used to encrypt the output of $g \wedge e$. Hence now the garbled outputs (ciphertexts) takes the form $Enc(H(label 1, label 2), g \wedge e)$ after obfuscation.

### step 2: permutation

Simply permute the garbled outputs, so that the garbled gate appears to have random order.

## Garbled Gate Evaluation (by Evan):

Evan now needs to decrypt the ciphertext corresponding to the real values of $g$ and $e$, so he needs to receive those two labels with the real values of $g$ and $e$ from Ginny. We call these two labels “label(g)” and “label(e)” respectively.

### step 1: sending label(g)

Since Ginny knows $g$, she knows which label is label(g). Since labels are just random strings picked by Ginny, Evan is not able to learn anything about $g$ from label(g) itself. So Ginny could just send her label(g) to Evan.

### step 2: sending label(e)

Since Ginny doesn’t know $e$ this time, she doesn’t know which label is label(g). She can’t send both labels corresponding to $e$ to Evan, because then Evan can decrypt two ciphertexts. Evan can’t just ask for the one he wants, since he doesn’t want Ginny to learn the value of $e$. Hence they choose to use “oblivious transfer (OT)” to pass label(e).

## From Gates to Circuits:

In reality, Ginny and Evan may wish to compute much more complicated functions using this protocol. In this case, Ginny will garble the entire function circuit, instead of a single gate. Now garbled gate becomes garbled circuit.

# Oblivious RAM Done Right (1): Overview

### Reading:

1. UCLA CS 282A Notes

In this section, we will discuss some preliminaries for the study of Oblivious RAM (ORAM). We will clarify the definition of security of remote data storage problem, and provide some basic solutions to the problem.

## 1. Definition of Security (of Remote Data Storage)

Consider two companies A and B. A runs a cloud storage service, and B is considering to use this service. B would like to do so because leasing remote storage from A is much cheaper than buying machines and store data locally. However, B doesn’t trust A completely. B is afraid of the loss of privacy after uploading these data completely to A. In this scenario, we call company A “cloud” and company B “user. Here is how we define what is meant to be secure:

1. Privacy: cloud is not able to read user’s data in clear.
2. Mutation of data: cloud is not able to change user’s data.
3. Loss of data: user is able to check if any data is lost or missing.
4. Re-introduction of old data: user is able to check if any old data is introduced as new data by cloud at some later date.
5. Invisible Access Pattern: If user performs a series of read/write operations, cloud is not able to the sequence of user’s operations.

If above criteria hold, we say this remote data stroage service is secure. Here is how we do to make them come true:

1. Privacy: Encryption
2. Mutation of data: Digital Signature Scheme
3. Loss of data or re-introduction of old data: Hash functions and Merkle trees
4. Invisible Access Pattern: ORAM

In the following blogs, we focus on 4, which is the study of ORAM. For a trivial solution, suppose user stores $N$ piceces of data in the cloud, and cloud could pass all data back to user one by one in $M$ separate times. User could uses local memory to hold them, perform read/write on relavant data, encrypt them again and pass them back to the cloud. This is the worst solution to solve this problem, and it has overhead $O(N)$. Next, we present a solution with overhead $O((logM)^4)$.

## 2. A Poly-log Solution:

### Data Structure:

We refer user’s data as “words”. A word is a 2-tuple $(i, w_{i})$, where $i$ is the location of data and $w_{i}$ is the actual data at location $i$. Cloud creates arrays $A_{i}$ with length $2^i$, where $2 \leq i \leq$, so the first array has length 4, second array has length 7, third array has length 16, and so on. Each array is called a “buffer“, and each entry of an array is called a “bucket“. Each bucket holds $logM$ words. For each buffer $A_{i}$, user will use a secret seed and a pseudo-random function (PRF) to generate a hash function. These hash functions obfuscate the location of each piece of data, so cloud is not able to learn anything from the storing patten. The last buffer (also the longest) is called the “Storage Array”. It holds all $N$ pieces of data that user uploaded at beginning.

# Oblivious RAM Done Right (2): [GO96]

## Reading:

1. Software Protection and Simulation on Oblivious RAMs” by Rafail Ostrovsky

## 1. Introduction:

This paper starts from the problem of software protection, here we mean “protection against illegitimate duplication”, i.e. ensuring there is no efficient method for creating executable copies of the software. Purely software-based solution doesn’t exist, so hardware must involve. Our task is to minimize the participation of hardware, so that the solution is less expensive.

An existing solution is “software-hardware-package (SH-package)“, that is, selling a physically shielded CPU together with an encrypted program. An instruction cycle of such CPU will be fetching -> decrypting -> executing an instruction. Every ATM machine has such a protected chip.

However, above solution is not good enough, since it doesn’t hide the “access pattern“, i.e. the addresses of the memory cells accessed during the execution are not kept secret. The lack of such property may create trouble, e.g. an adversary could learn the loop structure of the program, which is important in certain scenario. If we come up with a solution that hides the access pattern, we consider it as secure. Informally, “hiding access pattern” means no PPT adversary can distinguish these two cases:

Case 1: The adversary is experimenting with the real CPU that runs the encrypted program.

Case 2: The adversary is experimenting with a fake CPU that runs dummy program, such as “while True do skip”. After the same number of steps as in case 1, the CPU writes exactly the same outputs as in case 1 into memory.

To protect software to such level of security, the cost is “speed”, called software protection overhead. Informally, software protection overhead means the number of steps the protected program makes for each each step of the program. Our task is to minimize this overhead.

## 2. Model and Definitions:

A machine is oblivious if the sequence in which it accesses memory locations is equivalent for any two inputs with the same running time.

# Computational Complexity

## Readings：

1. Algorithm Design” by Kleinberg and Tardos
2. Introduction to the Theory of Computation” by Michael Sipser
3. UCLA CS 282A Notes by Rafail Ostrovsky

You don’t have to memorize the precise definition for these computational models, since most of the time we won’t use details in their defitions to construct proofs. However, a poor understanding on these topics creates barriers when reading paper, so it is neccessary to at least know what they are talking about.

### 1. Turing Machine：

Turing machines mimic the function of a computer, but it uses an infinitely-long tape as memory, so it is rather an ideal model. Turing machines can read and write, and use a “read-write head” to point to a specific location on the tape to access the content of that location. For a brief introduction, please refer to this page.

### 2. Polynomial Time (P):

P class contains all decision problems that a deterministic single-tape Turing machine can solve in polynomial time。Generally speaking, P contains problems of time complexity O(n^k), k is a natrual number. We can assume a modern computer is able to solve a problem of class P, but note that this “easy-to-compute” impression comes from the difficulty of NP class. A problem of complexity O(n^1000000) is definitely not easy to solve.

### 3. Non-Deterministic Polynomial Time (NP):

NP class contains all problems that are polynomially verifiable。For example, given integers p and q, it is easy to verify whether x = p * q is a composite or not, simply by doing multiplication. However, if we are given an integer x and we are asked to find a factorization of x into p and q, this task can be very difficult to do. Let’s call this problem the “composite problem”, and we say composite problem is polynomially verifiable but not polynomially decidable, so it is of class NP, not P. Similarly, we can assume NP problems are unsolvable on modern computer, although some of them are easy to solve in special cases.

### 4. P versus NP:

From 2 and 3, we can define P and NP in the following way:

1. P = the class of languages for which membership can be decided quickly (in polynomial time)
2. NP = the class of languages for which membership can be verified quickly (in polynomial time)

If P equals NP， then all polynomially verifiable problems are polynomially decidable，As a result, one-way function doesn’t exist，and we are unable to build most of the modern cryptographic theory without the existence of one-way function. Hence P does not equal NP is a foundamental assumption that we make all the time.

### 11. Oracle Machines:

An oracle machine can be viewed as a Turing machine + a black box structure, called an oracle. An oracle can answer any decision problem in a single operation. The decision problem can be of any complexity class, even undecidable problem, so oracle machine is a rather ideal model.

### 12. Random-Access Machine (RAM):

RAM is a class of register machines, can be viewed as a counting machine + the ability of indirect addressing. RAM has registers labelled R1，R2，R3，R4 and etc，Each register stores an integer。RAM has a control unit, which contains a program. RAM uses a program counter (PC) points to and execute certain statement in the program。Here is a graph of the RAM structure.

### 13. Some notes:

1. log function has a change of base formula, so any log function can be changed to another base by multiplying a constant。As a result，O(log n) can represent log function with any base, so we don’t need to specify what kind of base number we are working with, and replace everything with logn.