Secure Function Evaluation Using an FPGA Overlay Architecture

of 10
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Secure Function Evaluation (SFE) has received considerable attention recently due to the massive collection and mining of personal data over the Internet, but large computational costs still render it impractical. In this paper, we leverage hardware
  Secure Function EvaluationUsing an FPGA Overlay Architecture Xin Fang Dept of Electrical andComputer EngineeringNortheastern UniversityBoston, MA, USA fang.xi@husky.neu.eduStratis Ioannidis Dept of Electrical andComputer EngineeringNortheastern UniversityBoston, MA, USA ioannidis@ece.neu.eduMiriam Leeser Dept of Electrical andComputer EngineeringNortheastern UniversityBoston, MA, USA ABSTRACT Secure Function Evaluation (SFE) has received considerableattention recently due to the massive collection and miningof personal data over the Internet, but large computationalcosts still render it impractical. In this paper, we lever-age hardware acceleration to tackle the scalability and effi-ciency challenges inherent in SFE. To that end, we propose ageneric, reconfigurable implementation of SFE as a coarse-grained FPGA overlay architecture. Contrary to tailoredapproaches that are tied to the execution of a specific SFEstructure, and require full reprogramming of an FPGA witheach new execution, our design allows repurposing an FPGAto evaluate different SFE tasks without the need for repro-gramming. Our implementation shows orders of magnitudeimprovement over a software package for evaluating garbledcircuits, and demonstrates that the circuit being evaluatedcan change with almost no overhead. Keywords FPGA; Secure Function Evaluation; Garbled Circuits 1. INTRODUCTION Mining behavioral data is a ubiquitous practice amongInternet companies, and is presently happening at an un-precedented scale. Google, Netflix, Amazon, and Facebookroutinely monitor and mine a broad array of behavioral sig-nals collected from their users, including ad clicks, pagesvisited, and products purchased. Such information is mon-etized through targeted advertising or personalized prod-uct recommendations. Behavioral data collection is there-fore of considerable business value to online companies [2];moreover, there are often benefits to society at large: min-ing such data can aid in the detection of medical emer-gencies or the spread of diseases [33], in polling to assess political opinions [1] or news adoption [21], in the assess- ment of terrorist threats [35], etc. On the other hand, this Permission to make digital or hard copies of part or all of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for third-party components of this work must be honored.For all other uses, contact the owner/author(s). FPGA ’17 February 22-24, 2017, Monterey, CA, USA c  2017 Copyright held by the owner/author(s).ACM ISBN 978-1-4503-4354-1/17/02.DOI: massive data collection and mining has given rise to sig-nificant privacy concerns, extensively documented by re-searchers [20, 24, 26, 30, 34, 36, 40] as well as the popular press [2,42]. Such concerns are only likely to further in- crease with the emergence of the “Internet of things”, aswearable devices and home automation sensors connectedto the Internet proliferate.This state of affairs gives rise to the following challenge:given the benefits of mining behavioral data to both on-line companies and the society at large, is it possible to enable data mining practices without jeopardizing user pri-vacy  ? A series of recent research efforts [4, 6, 11, 27–29] have attempted to address this issue through cryptographicmeans and, in particular, through secure function evalua-tion (SFE). SFE allows an interested party to evaluate anydesirable polynomial-time function over private data, whilerevealing only the answer and nothing else about the data.This offers a strong privacy guarantee: an entity executing asecure data-mining algorithm over user data learns only thefinal outcome of the computation, while the data is neverrevealed to the entity. SFE can thus enable, e.g., a dataanalyst, a medical professional, or a statistician, to conducta study of sensitive data, without jeopardizing the privacyof the participants (online users, patients, etc.).Any algorithm to be executed over amounts of data at thescale encountered in the above settings needs to be highlyefficient and scalable. SFE over private data therefore posesa significant challenge, as it comes at a considerable ad-ditional computational cost compared to execution in theclear. Prior work has made positive steps in this direc-tion, showing that a variety of important data mining al-gorithms [27–29] can be computed using Yao’s Garbled Cir- cuits (GCs) [43,44] in a parallel fashion. The function to be evaluated is converted to a binary circuit which is“garbled”in such a way that an evaluator of the circuit learns onlythe values of its output gates. Execution of this circuit issubsequently parallelized, e.g., over threads [28] or across a cluster of machines [27]. Nevertheless, this approach to parallelization leaves muchto be desired: for example, in [27], even under parallelization over 128 cores, executing a typical data-mining algorithmlike Matrix Factorization through SFE is of the order of 10 5 slower compared to (parallel) execution in the clear. Inpractice, this means that applying MF to a dataset of 1Mentries requires roughly 11 days under SFE, a time largelyprohibitive for practical purposes.In this paper, we advocate leveraging hardware acceler- 257  ation to tackle the scalability and efficiency challenges in-herent in SFE. FPGAs are by design an excellent hardwareplatform for the implementation of SFE primitives and, inparticular, garbled circuits. This is precisely because FP-GAs are tailored to executing nearly identical operationsin parallel. The types of operations encountered in garbledcircuits (namely, garbling and un-garbling gates) fit this pat-tern precisely: they involve, e.g., a series of symmetric keyencryptions,  xor s, and other well-defined primitive opera-tions (see also Section 3). In that sense, an FPGA imple-mentation of SFE benefits from both high speed evaluationand hardware-level parallelization.On the other hand, the amount of computation requiredto evaluate a garbled circuit for an application at the usualdata-mining scale cannot fit in a single FPGA. For this rea-son, evaluating a function securely entails partitioning com-putations into sub-tasks to be programmed and evaluatedover a single FPGA. A practical implementation thereforeneeds to allow repurposing an FPGA to quickly compute dif-ferent SFEs or different sub-tasks of a larger SFE. For thisreason, tailored approaches that are tied to the executionof a specific SFE structure, and require full reprogrammingof an FPGA with each new execution, cannot be appliedefficiently to the types of SFE problems we wish to address.To address these challenges, we propose a  generic, recon- figurable implementation of SFE as a coarse-grained FPGAoverlay architecture  . As FPGAs have become more denseand capable of holding a large number of gate equivalents,there has been an increased interest in FPGA overlay archi-tectures [5,13,14,16–18,41]. An FPGA overlay consists of  two parts: (1) a circuit design implemented on the FPGAfabric using the usual design flow, and (2) a user circuitmapped onto that overlay circuit. Garbled circuits are ex-cellent candidates for an FPGA overlay design. Preciselybecause components of a circuit follow a generic structure,an overlay approach that does not reprogram FPGAs fromscratch, but simply  reroutes   connections between elementarycomponents (in our case, garbled  and  and  xor  gates) leadsto important efficiency improvements.This paper makes the following contributions: •  We design and implement a generic FPGA overlay ar-chitecture for the execution of arbitrary garbled circuittopologies. In our design, FPGAs are programmedonce to contain implementations of garbled compo-nents ( and ,  xor  gates). Wiring and instantiation isdetermined at execution time through writing to regis-ters and memory. Thus, the overhead for repurposingthe FPGA for different circuit computations is keptvery low. •  We integrate our implementation with ObliVM [22], a framework mapping code written in a high-level lan-guage to a garbled circuit, allowing arbitrary programswritten in ObliVM to be mapped to our FPGA overlayarchitectures. •  We evaluate the performance of our GC overlay archi-tecture on several examples and demonstrate orders of magnitude speedup over ObliVM. We demonstrate theeffects of using the overlay architecture, which resultsin change time for different circuit computations thathave little effect on overall performance.The remainder of the paper is organized as follows. Wepresent related work in Section 2 and background on garbledcircuits in Section 3. Our implemented system and overlay architecture are presented in Section 4 and experimental re-sults in Section 5. Finally, we present our conclusions and future work in Section 6. 2. RELATED WORK Garbled Circuits.  Although garbled circuits were pro-posed by Andrew Yao nearly three decades ago [43,44], it is only in the last few years that the research communityhas made progress at improving their efficiency, bringingtheir application closer to practicality. Several improve-ments over the original protocol have been proposed, in-cluding the point-and-permute [3], row reduction [25], and the Free- xor  [19] optimizations; we use all of these in our implementation.Building on these optimizations, there has been a surgeof recent programming frameworks, such as TASTY [9],FastGC [10], Fairplay [23], and ObliVM [22], that provide software implementations of GCs. These frameworks, par-ticularly ObliVM, allow developers without any cryptogra-phy expertise to convert algorithms expressed in a high-levellanguage to GCs. None of these frameworks focus on hard-ware acceleration. We provide an interface to ObliVM in ourwork; this allows us to describe algorithms in a high levellanguage, map them to circuits through ObliVM, and thenuse our software to execute these circuits over our FPGAoverlay architecture. FPGA overlays.  Recently, as FPGAs have become moredense and capable of holding a large number of gate equiv-alents, there has been an increased interest in FPGA over-lay architectures. An FPGA overlay consists of two parts:(1) a circuit design implemented on the FPGA fabric usingthe usual design flow, and (2) a user circuit mapped ontothat overlay circuit. Overlays are in general used for twopurposes. The first is to create FPGA designs that are in-dependent of the specific structures on a particular FPGAand therefore to make designs portable, or, in other words,able to be mapped to FPGAs from different vendors and todifferent devices from the same vendor. This class of FPGAoverlay designs [5,41] creates basic FPGA structures, such as Look-Up Tables (LUTs) and routing, built on top of thoseprovided in silicon on the target FPGA chip. We are usingan FPGA overlay for the second purpose; namely, to reducethe amount of time to translate a design to an FPGA im-plementation. FPGAs offer a great deal of reconfigurabilityand flexibility; however, this comes at the cost of program-ming. Generating designs that run efficiently on FPGAs canbe challenging for the end user. In addition, the  compilation  process for a high end FPGA design can take several hours.Here  compilation   refers to the complete set of steps fromspecification (in a hardware description language (HDL) orhigh level language) to generating a bit stream to downloadto the FPGA. These steps include synthesis, place-and-routeand bit stream generation for the target FPGA. Examplesof this style of FPGA overlay architecture include Networkon a Chip (NoC) overlays [16,17] and instruction set exten- sion overlays [18]. In these cases, structures are built on the FPGA and the overlay architecture provides flexible routingamong them. Hardware Implementations of Garbled Circuits.  Re-cent studies have used FPGAs as well as GPUs [12, 31] for hardware implementations of garbled circuits. TinyGar- 258  ble [37] uses techniques from hardware design to implement GCs as sequential circuits and then optimize these designs,reporting on results using high level synthesis tools and sim-ulation, but do not report any actual hardware implemen-tations. TinyGarble produces more efficient solutions for asingle GC instance, but does not handle multiple pieces of a GC or different garbled circuits the way our overlays do.J¨arvinen et al. [15] target embedded system and describe the first FPGA implementation of GC. This implementa-tion is at a completely different design point than ours: whilegeneric and able to support a wide range of hardware im-plementations, the proposed FPGA design implements onlyone encryption core. In contrast, our overlay architectureimplements hundreds of encryption cores on a single FPGAand executes them in parallel.A recent FPGA implementation of garbled circuits bySonghori et al. [38] take a different approach. Rather than implement garbled circuits directly, they implement a  gar-bled   MIPS core. Problems to be evaluated securely are writ-ten in code, that is compiled to MIPS assembler and thenrun securely on their garbled MIPS processor. The goalof  [38] is to fabricate this MIPS core; FPGAs are used for prototyping the design. Using MIPS assembly code to rep-resent the problem being evaluated alleviates the problem of lengthy FPGA place and route cycles. However, the FPGAis not used as efficiently as in our implementation: Songhoriet al. use considerably fewer encryption cores, and runningcode on a MIPs processor creates an extra level of overhead.Our architecture uses much more parallelism than otherFPGA implementations of garbling. For starters, we imple-ment four SHA-1 cores in hardware for each  and  gate, whileothers use one encryption core serially [15]. In addition, we implement as many garbled  and  gates as we can keep busyat the same time, and implement garbled circuits directlyon top of an efficient overlay. 3. TECHNICAL BACKGROUND Yao’s protocol (a.k.a.  garbled circuits  ) [43] is a generic cryptographic protocol for secure function evaluation. Inshort, it allows the secure evaluation of an arbitrary functionover private data, provided this function can be representedas a circuit. We give a brief overview of the protocol below(see, e.g., [8], for a detailed treatment). 3.1 Garbled Circuits Overview In the variant we study here (adapted from [25,29]), Yao’s protocol runs between (a) a set of private input owners (e.g.,Google’s users), (b) an Evaluator, (e.g., a data analyst work-ing for Google), that wishes to evaluate a function over theprivate inputs, and (c) a third party called the Garbler, thatfacilities and enables the secure computation. Formally, let n  be the number of input owners, and let  x i  ∈ { 0 , 1 } ∗ denotethe private input of individual  i , 1  ≤  i  ≤  n , represented asa binary string. Finally, let  f   : ( { 0 , 1 } ∗ ) n → { 0 , 1 } ∗ be thefunction that the Evaluator wishes to compute over the pri-vate data. The protocol satisfies the following property: atthe conclusion of the protocol, the Evaluator learns  only   thevalue  f  ( x 1 ,x 2 ,...,x n ) and nothing else about  x 1 ,...,x n ,while the Garbler learns nothing.A critical assumption behind Yao’s protocol is that thefunction  f   can be expressed as a  Boolean circuit  , and, morespecifically, as a directed acyclic graph (DAG) of   and  and GARBLER EVALUATOR USERS TRANSMIT PROXY OBLIVIOUS TRANSFER 1 , x 2 , . . . , x n Private Inputs    G   A   R   B   L   E KeysGarbled CircuitEVALUATE f  f  ( x 1 , x 2 , . . . , x n )    P   H   A   S   E   I   P   H   A   S   E   I   I   P   H   A   S   E   I   I   I Figure 1:  Yao’s protocol. The Evaluator wishes to evaluate afunction  f  , represented as a binary circuit of   and  and  xor  gates,over private user inputs  x 1 ,x 2 ,...,x n . In Phase I, the Garbler“garbles” each gate of the circuit, outputting (a) a “garbled cir-cuit”, namely, the garbled representation of every gate in the cir-cuit representing  f  , and (b) a set of keys, each corresponding to apossible value of the inputs  x 1 ,...,x n . In Phase II, through proxyoblivious transfer, the Evaluator learns the keys corresponding tothe true user input values, while the Garbler learns nothing. Inthe final phase, the Evaluator uses the keys as input to the garbledcircuit to evaluate the circuit, ungarbling the gates in breadth-first order. At the conclusion of Phase III, the Evaluator learns f  ( x 1 ,...,x n ) . xor  gates. 1 The structure of the circuit – and, thus, thefunction to be computed – is known to all participants: e.g.the circuit could be computing the sum or the maximumamong all inputs  x i .Overall, Yao’s protocol consists of three phases:1.  Garbling Phase.  During the garbling phase, theGarbler prepares (a) a set of encrypted (i.e.,“garbled”)truth tables for each binary gate in the circuit, as wellas (b) a set of random strings, termed  keys  , one foreach possible binary value in the strings representingthe inputs. At the conclusion of this phase, the Gar-bler sends to the Evaluator the garbled truth tables;each such table is referred to as a “garbled gate”, andall gates together constitute the “garbled circuit”.2.  Oblivious Transfer Phase.  Subsequently, the Eval-uator, Garbler, and the input owners engage in a proxyoblivious transfer [7,25,32]. Through this, the Eval- uator retrieves the input keys from the Garbler thatcorrespond to true input binary values held by theowners. Oblivious transfer ensures that, although theEvaluator learns the correct keys, the cleartext inputvalues are never revealed to either the Garbler or theEvaluator.3.  Evaluation Phase.  Finally, the Evaluator uses theseinput keys to “evaluate” the gates of the circuit, ef-fectively decrypting the garbled gates. Each such de-cryption reveals a new key that allows the Evalua-tor to ungarble/decrypt subsequent gates connectedto it. Ungarbling the output gates reveals the value f  ( x 1 ,...,x n ). 1 Recall that any Boolean circuit can be represented usingonly  and s and  xor s. 259  The above three phases are illustrated in Figure 1. Theexecution flow (as well as the opportunity for parallelism) isdetermined by the Boolean circuit representing function  f  .Both the“garbling”of the gates, that occurs at the Garbler,and the“ungarbling/evaluation”, that occurs at the Evalua-tor, are computationally intensive tasks; these are preciselythe operations that we propose to accelerate using FPGAs.We describe these phases in more detail below. 3.2 Garbling Phase We now describe how gates are garbled in Yao’s protocol.As illustrated in Figure 2, each binary gate in the DAG rep- resenting the circuit is associated with three wires: two inputwires and one output wire. At the beginning of the garblingphase, the Garbler associates two random strings,  k 0 w i  and k 1 w i , with each wire  w i  in the circuit. Intuitively, each  k bw i  isan encoding of the bit-value  b  ∈ { 0 , 1 }  that the wire  w i  cantake. For each gate  g , with input wires ( w i ,w j ) and outputwire  w k , let  g ( b i ,b j )  ∈ { 0 , 1 }  be the binary output of thegate given inputs  b i ,b j  ∈ { 0 , 1 }  at wires  w i  and  w j , respec-tively. For each gate  g , the Garbler computes the followingfour ciphertexts, one for each pair of values  b i ,b j  ∈ { 0 , 1 } : Enc ( k biwi ,k bjwj ,g ) ( k g ( b i ,b j ) w k  ) =  SHA1 ( k b i w i  k b j w j  g )  ⊕  k g ( b i ,b j ) w k  ,  (1)where  SHA1  is the  SHA1  hash function,    indicates concate-nation,  g  is an identifier for the gate, and  ⊕  is the  xor operation.The “garbled” gate is then represented by a  random per-mutation   of these four ciphertexts. An example of a garbled and  gate is illustrated on Fig. 2. Observe that, given the pair of keys ( k 0 w i ,k 1 w j ) it is possible to successfully recoverthe key  k 1 w k  by decrypting  c  =  Enc ( k 0 wi ,k 1 wj ,g ) ( k 1 w k ) through 2 : Dec ( k 0 wi ,k 1 wj ,g ) ( c ) =  SHA1 ( k b i w i  k b j w j  g )  ⊕  c.  (2)On the other hand, the other output wire key, namely  k 0 w k ,cannot be recovered having access only to ( k 0 w i ,k 1 w j ). Moregenerally, it is worth noting that the knowledge of (a) theciphertexts, and (b) keys ( k b i w i ,k b j w j ) for some inputs  b i  and b j  yields  only   the value of key  k g ( b i ,b j ) w k  ; no other input oroutput keys of gate  g  can be recovered.At the conclusion of the garbling phase, the Garbler sendsthe garbled gates (each comprising a random permutationof four ciphertexts) to the Evaluator. It also provides thecorrespondence between the garbled value and the real bit-value for the circuit-output wires (the outcome of the com-putation): if   w k  is a circuit-output wire, the pairs ( k 0 w k , 0)and ( k 1 w k , 1) are given to the Evaluator. Finally, the Gar-bler keeps the random wire keys ( k 0 w i ,k 1 w i ) that correspondto circuit-input wires  w i , i.e., wires at the very first layerof the circuit, encoding user inputs; all other wire keys arediscarded. 3.3 Oblivious Transfer Phase To transfer the correct keys of the circuit-input wires tothe Evaluator, the Garbler engages in a proxy oblivioustransfer (OT) with the Evaluator and the users. Throughproxy OT, the Evaluator obliviously obtains the circuit-input value keys  k bw i  corresponding to the actual bit  b  of  2 Note that the above encryption scheme is  symmetric  , as Enc , Dec  are the same function. w i w j w k b i  b j  g ( b i ,b j ) Garbled value0 0 0  Enc ( k 0 wi ,k 0 wj ,g ) ( k 0 w k )0 1 1  Enc ( k 0 wi ,k 1 wj ,g ) ( k 1 w k )1 0 1  Enc ( k 1 wi ,k 0 wj ,g ) ( k 1 w k )1 1 1  Enc ( k 1 wi ,k 1 wj ,g ) ( k 1 w k ) Figure 2: A garbled  and  gate. Each of the three wires w i ,w j ,w k  is associated with two random keys  k 0 ·  ,k 1 ·  , repre-senting the value 0 and 1, respectively. The garbled and gateconsists of the four ciphertexts appearing on the rightmostcolumn of the table above: each possible output value key isencrypted using the corresponding input pair of keys, alongwith a string  g  representing the gate id. A random permu-tation of these four ciphertexts is revealed by the Garbler tothe Evaluator.circuit-input wire  w i . Proxy OT ensures that (a) the Garblerdoes not learn the user inputs, (b) the Evaluator can com-pute the function on these inputs alone, and (c) the Garblerlearns nothing. Note that this OT only pertains to circuit-inputs (the first layer of the circuit); as such, its communi-cation cost is typically several orders of magnitude smallerthan the Garbler to Evaluator transfer occurring at the endof the previous phase. As proxy OT is not as intensive incomputation or communication as the other two phases, anddoes involve hardware acceleration, we do not describe it indetail. We refer the interested reader to [7,25,32] for a for- mal description and implementation. 3.4 Evaluation Phase Having the keys corresponding to true user inputs, theEvaluator can “evaluate” each gate, by decrypting each ci-phertext of a gate in the first layer of the circuit throughEq. (2): only one of these decryptions will succeed 3 , re-vealing the key corresponding to the output of this gate.Each output key revealed can subsequently be used to un-garble/evaluate any gate that uses it as an input. The evalu-ator can thus proceed ungarbling gates in breadth first orderover the DAG blueprint of the Boolean circuit, until finallyobtaining the keys of gates at the last layer of the circuit.Using the table mapping these keys to bits, the Evaluatorlearns the final output. 3.5 Optimizations Several improvements over the srcinal Yao protocol havebeen proposed recently, that lead to both computational andcommunication cost reductions. These include the point-and-permute [3], row reduction [25], and Free- xor  [19] opti-mizations, all of which we implement in our design. Point-and-permute reduces the four decryptions at the evaluatorto one. Row-reduction reduces the number of ciphertextsthat need to be transmitted by the Garbler to the Evaluatorfrom four to three. Free- xor  significantly reduces the com-putational cost of garbled  xor  gates.  xor  gates do not needto be encrypted and decrypted, as the xor output wire key is 3 This can be detected, e.g., by appending a prefix of zerosto each key  k bw k , and checking if this prefix is present upondecryption. In practice, the point-and-permute optimizationof  [3] eliminates the need for attempting to decrypt all four ciphertexts. 260  !"#$"%& !"# $"%&%'()*+( ,-'*.'/0(1)*234%!   "(0)*+6%'/+   7+/*28/9':+( ,;/('</0(9':+(   =>?0@ AB'(+C 9':+(   =>?0@ %'()*+C D')*+ =>E./ F'()*+C -'*.+ -2' 1@D@ "!=, "!=, Figure 3: System Overview. An algorithm written inhigh-level language is translated to a Boolean circuit us-ing ObliVM. The resulting circuit is passed through a layerextractor, identifying layers through BFS. The Boolean cir-cuit DAG, annotated with layer information for each gate, ispassed to the Garbler and Evaluator CPUs, that use it as a“blueprint” for execution. The CPUs subsequently use thisblue print to garble gates and evaluate them at the FPGAsof the Garbler and Evaluator, respectively.computed through an  xor  of the corresponding input keys.In addition, the free- xor  optimization fully eliminates com-munication between the Garbler and the Evaluator for  xor s:no ciphertexts need to be communicated between them forthese gates. Our implementation takes advantage of all of these optimizations. We note that, as a result, the circuit forcomputing an  and  gate, illustrated in Fig. 5, differs slightly from the  and  gate garbling algorithm outlined above. 4. FPGAOVERLAYARCHITECTUREFORGARBLED CIRCUITS4.1 System Overview Our FPGA acceleration of GC works as follows. We startwith a function  f   we wish to evaluate securely and a set of user inputs and generate an output of the function withoutrevealing the data. This is done by (a) translating the func-tion to a Boolean circuit and providing commands to theFPGA to garble/evaluate the function, and (b) acceleratingthe garbling or evaluation making use of an FPGA overlayarchitecture. We first describe how the system is partitionedbetween CPU and FPGA processing, and then describe theFPGA implementation.Our system consists of two host PCs (instantiating theGarbler and Evaluator, respectively) equipped with FPGAcards for acceleration, as shown in Figure 3. On the Garbler side, the function  f   is first translated to a Boolean circuitbefore it can be garbled. Toward this end, we make use of ObliVM [22], a software framework that allows developers without any cryptography expertise to convert algorithmsexpressed in a high-level language to GCs. A user writestheir problem in Java, and ObliVM translates it to a Booleancircuit and handles the garbling and evaluation. On thegarbler, we take the Boolean circuit representation outputfrom ObliVM, disabling garbling and evaluation through theframework. The Boolean circuit is sent to our layer extrac-tor: this program extracts the Boolean circuit’s  layers   usingbreadth first search. The resulting layered Boolean circuit(i.e., the “blueprint” for the garbling and execution) is sentto the Evaluator. This “blueprint” is subsequently used bythe two hosts CPUs to dictate the garbling and evaluationto be performed by each FPGA.In more detail, on each CPU, the Boolean circuit is rep-resented as a DAG whose nodes are  and  and  xor  gates.Layers are defined recursively: gates whose input wires areglobal inputs are at layer 0, while a gate is at layer  k  if oneof its input wires connects to a gate at layer  k  −  1 and theother connects to a gate at layer  ≤  k − 1. Note that gates inthe same layer can be executed (i.e., garbled or evaluated)in parallel. Layer information is used to guide the CPU onthe order with which gates are to be loaded to the FPGA,to be gabled or evaluated.The FPGA implements a sea of garbled  and  gates and xor  gates as described in Section 4.2. Each wire of theBoolean circuit has a unique wire ID associated with it.These wire IDs are also used as the memory addresses on theFPGA: these memory locations store the keys correspond-ing to these wires, used to garble or evaluate a gate. TheCPU is responsible for mapping Boolean circuit gates to theFPGA hardware  and  and  xor  gates that realize them. For xor  gates this is trivial, since we implement one  xor  gatein hardware (see below).  and  gates are a different matter,as our FPGA architecture implements as many  and  gatesin parallel as can be kept busy. Suppose the FPGA realizes A  and  gates in hardware. If there are more than  A  and gates in a layer, our FPGA architecture is designed in sucha way that the first hardware  and  gate will complete pro-cessing before the information for the  A  + 1st garbled gateis received by the FPGA. Thus, the CPU can transmit allthe  and  gates in a layer, and assign them to gates mod-ulo  A . More details of the FPGA architecture are given inSection 4.2.When a layer is completed, the CPU then transmits thenext layer of the circuit, until the circuit has been fully gar-bled. The CPU determines the order to send  and  and  xor gates to the FPGA. Currently we send all the  and  gatesfollowed by all the  xor  gates in a layer. Since the latencyof an  and  gate is much longer (82 cycles) than the latencyof an  xor  (one cycle), this results in a relatively efficientordering.For each layer, the Garbler sends to the FPGA (a) thenumber of gates in the layer, (b) the input and output wireIDs for the layer, (c) labels indicating whether a gate is  and or  xor , and (c) for the  and  gates, which hardware gate the and  is mapped to (among the  A  available gates). Layer 0requires key values for the inputs, which are 80 bit randomvalues generated for each possible input value, i.e.,  k 0 w i  and k 1 w i . These strings are generated using a random numbergenerator on the host for each of the input wires  w i  andcommunicated to the FPGA. The output of the FPGA gar-bling includes the ciphertext values (i.e., the garbled gates),as well as keys for output wires. These are sent by the FPGAto the host CPU; the Garbler CPU sends garbled gates di-rectly to the Evaluator CPU. The Garbler CPU also providesinput keys to the Evaluator CPU via proxy oblivious transfer 261
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks