Oblivious RAM Done Right (1): Overview

1. UCLA CS 282A Notes

In this section, we will discuss some preliminaries for the study of Oblivious RAM (ORAM). We will clarify the definition of security of remote data storage problem, and provide some basic solutions to the problem.

1. Definition of Security (of Remote Data Storage)

Consider two companies A and B. A runs a cloud storage service, and B is considering to use this service. B would like to do so because leasing remote storage from A is much cheaper than buying machines and store data locally. However, B doesn’t trust A completely. B is afraid of the loss of privacy after uploading these data completely to A. In this scenario, we call company A “cloud” and company B “user. Here is how we define what is meant to be secure:

1. Privacy: cloud is not able to read user’s data in clear.
2. Mutation of data: cloud is not able to change user’s data.
3. Loss of data: user is able to check if any data is lost or missing.
4. Re-introduction of old data: user is able to check if any old data is introduced as new data by cloud at some later date.
5. Invisible Access Pattern: If user performs a series of read/write operations, cloud is not able to the sequence of user’s operations.

If above criteria hold, we say this remote data stroage service is secure. Here is how we do to make them come true:

1. Privacy: Encryption
2. Mutation of data: Digital Signature Scheme
3. Loss of data or re-introduction of old data: Hash functions and Merkle trees
4. Invisible Access Pattern: ORAM

In the following blogs, we focus on 4, which is the study of ORAM. For a trivial solution, suppose user stores $N$ piceces of data in the cloud, and cloud could pass all data back to user one by one in $M$ separate times. User could uses local memory to hold them, perform read/write on relavant data, encrypt them again and pass them back to the cloud. This is the worst solution to solve this problem, and it has overhead $O(N)$. Next, we present a solution with overhead $O((logM)^4)$.

2. A Poly-log Solution:

Data Structure:

We refer user’s data as “words”. A word is a 2-tuple $(i, w_{i})$, where $i$ is the location of data and $w_{i}$ is the actual data at location $i$. Cloud creates arrays $A_{i}$ with length $2^i$, where $2 \leq i \leq$, so the first array has length 4, second array has length 7, third array has length 16, and so on. Each array is called a “buffer“, and each entry of an array is called a “bucket“. Each bucket holds $logM$ words. For each buffer $A_{i}$, user will use a secret seed and a pseudo-random function (PRF) to generate a hash function. These hash functions obfuscate the location of each piece of data, so cloud is not able to learn anything from the storing patten. The last buffer (also the longest) is called the “Storage Array”. It holds all $N$ pieces of data that user uploaded at beginning.