Open-Source Variable-Precision Floating-Point Library for Major Commercial FPGAs

XIN FANG and MIRIAM LEESER, Northeastern University

There is increased interest in implementing floating-point designs for different precisions that take advantage of the flexibility offered by Field-Programmable Gate Arrays (FPGAs). In this article, we present updates to the Variable-precision FLOATing Point Library (VFLOAT) developed at Northeastern University and highlight recent improvements in implementations of reciprocal, division, and square root components that scale to double precision for FPGAs from the two major vendors: Altera and Xilinx. Our library is open source and flexible and provides the user with many options. A designer has many tradeoffs to consider, including clock frequency, total latency, and resource usage, as well as target architecture. We compare the generated cores to those produced by each vendor and to another popular open-source tool: FloPoCo. VFLOAT has the advantage of not tying the user's design to a specific target architecture and of providing the maximum flexibility for all options, including clock frequency and latency, compared to other alternatives. Our results show that variable-precision as well as double-precision designs can easily be accommodated, and the resulting components are competitive with and in many cases superior to the alternatives.

CCS Concepts: • Hardware → Arithmetic and datapath circuits; Reconfigurable logic and FPGAs; Hardware accelerators; • Mathematics of computing → Arbitrary-precision arithmetic

Additional Key Words and Phrases: FPGA, floating point, variable precision, cross-platform

ACM Reference Format: Xin Fang and Miriam Leeser. 2016. Open-source variable-precision floating-point library for major commercial FPGAs. ACM Trans. Reconfigurable Technol. Syst. 9, 3, Article 20 (July 2016), 17 pages. DOI: http://dx.doi.org/10.1145/2851507
1. INTRODUCTION

Support for floating-point operations in Field-Programmable Gate Arrays (FPGAs) has become increasingly popular over the years. While many users are only interested in IEEE single- or double-precision floating-point formats, others use the flexibility available on FPGA fabric to customize their operations to specific data widths and intermediate sizes of floating-point numbers [Paschalakis and Lee 2003; Underwood 2004; Zhuo and Prasanna 2007; Wang and Leeser 2007, 2009]. FPGAs provide a flexibility not available on other platforms and allow designers to make tradeoff decisions between resources used and precision of the output, among others. Nonstandard precisions can potentially increase the stability of some numerical algorithms by computing with wider mantissa values. In applications where the data has a large dynamic range but precision is less important, large exponent ranges and narrower mantissa fields can be used.

This work was supported in part by the National Science Foundation grants CCF-1218075 and CNS-1337854, and by gifts from Altera and Xilinx. Authors' addresses: X. Fang and M. Leeser, Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115; emails: fang.xi@husky.neu.edu, mel@coe.neu.edu. 2016 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 1936-7406/2016/07-ART20 $15.00. DOI: http://dx.doi.org/10.1145/2851507

ACM Transactions on Reconfigurable Technology and Systems, Vol. 9, No. 3, Article 20, Publication date: July 2016.

At Northeastern University, we have developed the Variable Precision FLOATing Point Library (VFLOAT) [Wang and Leeser 2010; NEU Reconfigurable Computing Lab 2014]. VFLOAT supports basic floating-point operations such as addition and multiplication, more advanced components including accumulation and multiply accumulate, and format conversion operators to convert from fixed to floating point and back again. In this article, we present the library; highlight new implementations of division, reciprocal, and square root components [Fang and Leeser 2013; Fang 2013]; and present new results targeting both variable-precision and double-precision IEEE formats. Components in the VFLOAT library are extremely flexible. The number of bits of mantissa and exponent can be specified by the user on an individual component basis. Pipeline components can be combined in such a way that normalization and rounding are not implemented after every step. Thus, a user can potentially save area in applications that contain large adder trees, where extra bits can be added to the datapath instead of normalizing and rounding after each step, trading off one use of FPGA resources for another.

VFLOAT works with the development environments for the two major commercial FPGA vendors: Altera and Xilinx. All components are deeply pipelined, and the VFLOAT library gives the user many options for developing floating-point implementations for their designs. The level of pipelining is determined by the underlying components used and set by the user. A large range of latencies can be implemented for each of the components in the VFLOAT library.

The VFLOAT library is based on the binary IEEE 754-2008 standard floating-point format [Institute of Electrical and Electronics Engineers (IEEE) 2008].
Since VFLOAT is variable precision, many more mantissa and exponent sizes than those specified in the standard are supported. However, the basic format of sign bit, biased exponent, and mantissa is followed. VFLOAT does not comply with every detail of the standard. For example, subnormal numbers are not supported, and two rounding modes are supported: roundTowardZero and the roundTiesToEven directional mode for round to nearest. The latter mode is the default IEEE rounding mode and the most commonly used. Note that these simplifications are common to many embedded designs and to FPGA floating-point libraries. For example, both Xilinx and Altera only support the roundTiesToEven rounding mode, and neither supports subnormals.

Altera has recently introduced hard-core floating-point implementations for single-precision addition and multiplication [Altera.com 2014b]. This new hard-core implementation is testament to the fact that users are increasingly interested in floating-point applications being realized on FPGAs. Note that these cores are for single precision only, and it is difficult to implement single-precision division and square root with these components, as more bits of precision may be needed before rounding.

The contributions of this article are as follows:

—Open-source implementations of reciprocal, division, and square root based on Taylor series that scale well with a large number of mantissa bits and are available in the VFLOAT library
—Results for VFLOAT components targeting both Altera and Xilinx hardware, and a comparison with the vendors' built-in IP cores
—A comparison of VFLOAT components to the FloPoCo Floating Point Compiler [De Dinechin and Pasca 2011] and a demonstration that VFLOAT offers more flexibility and in some cases higher performance

The rest of the article is organized as follows. In the next section, we provide more details on the IEEE floating-point format and present related work.
In Section 3, we give an overview of the VFLOAT library, followed by more details of the division, reciprocal, and square root implementations in Section 4. Then we compare our components to other FPGA floating-point libraries in Section 5 and present conclusions and future work. All designs described in this article are available open source [NEU Reconfigurable Computing Lab 2014] and build on the OpenFabric platform [Chiou et al. 2014].

2. BACKGROUND

2.1. IEEE Floating-Point Representation

The VFLOAT library is based on the binary IEEE 754-2008 standard floating-point format [Institute of Electrical and Electronics Engineers (IEEE) 2008]. Since VFLOAT is variable precision, many more mantissa and exponent sizes than those specified in the standard are supported. However, the basic format of sign bit, biased exponent, and mantissa is followed, as shown in Figure 1. In VFLOAT, any number of exponent or mantissa bits can be specified, while in the IEEE format these are defined to be certain values for different fixed representations, including single and double precision.

Fig. 1. IEEE floating-point format.

The floating-point format has three parts to represent the numeric value: the sign bit, the exponent, and the mantissa. For base b, sign bit s, biased exponent e, and fractional part of the mantissa c, the value of a floating-point number is (−1)^s × 1.c × b^(e−bias). The IEEE specification describes base 2 and base 10 implementations; we support base 2. Since a normalized mantissa always has the value 1.c, the 1 is not explicitly represented; however, it must be included in any computation. The bias for an exponent represented with w bits is 2^(w−1) − 1.
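As an illustration of the value formula above, the following sketch (our own, in Python; not part of VFLOAT, whose components are hardware cores) decodes a normalized word with configurable exponent and mantissa widths, restoring the hidden leading 1 and applying bias = 2^(w−1) − 1:

```python
import struct

def decode_float(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a normalized (sign, exponent, mantissa) word using
    value = (-1)^s * 1.c * 2^(e - bias), with bias = 2^(w-1) - 1."""
    bias = (1 << (exp_bits - 1)) - 1
    s = (bits >> (exp_bits + man_bits)) & 1
    e = (bits >> man_bits) & ((1 << exp_bits) - 1)
    c = bits & ((1 << man_bits) - 1)
    mantissa = 1 + c / (1 << man_bits)   # restore the hidden leading 1
    return (-1) ** s * mantissa * 2.0 ** (e - bias)

# IEEE single precision is the (8, 23) instance of the variable format:
word = int.from_bytes(struct.pack(">f", -0.15625), "big")
print(decode_float(word, 8, 23))  # -0.15625
```

The same function handles any (exponent, mantissa) pair a VFLOAT user might pick, e.g. (11, 52) for double precision, since only the field widths change.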
The IEEE standard specifies formats with the number of exponent and mantissa bits equal to (8, 23) for single precision and (11, 52) for double. VFLOAT supports these as a subset, as well as any number of exponent and mantissa bits starting from 2 bits. We have tested up to double-precision values, meaning exponents up to 11 bits and mantissas up to 52 bits.

The IEEE standard specifies two directional attributes for rounding to nearest (roundTiesToEven, roundTiesToAway) as well as three directed rounding attributes (roundTowardPositive, roundTowardNegative, and roundTowardZero). An implementation that complies with the specification will implement all three directed rounding attributes as well as roundTiesToEven. VFLOAT supports roundTiesToEven and roundTowardZero. Note that the Altera and Xilinx floating-point libraries only support roundTiesToEven, which is the default rounding mode.

The IEEE standard also defines several exceptions and special values, including NaN (Not a Number) and positive and negative infinity. The VFLOAT implementation supports these special values by propagating them through the pipeline and generating new values where appropriate. For example, divide by zero and taking the square root of a negative number will generate the appropriate values to be propagated in the pipeline.

2.2. Related Work

There has been an increased interest in floating-point implementations in FPGAs, due to the increased amount of logic available on reconfigurable hardware, including logic blocks and embedded multipliers and adders, as well as the relatively low power consumption of FPGAs compared to other alternatives such as Graphics Processing Units (GPUs).

Both major FPGA vendors provide their own floating-point IP cores. These are called MegaCores for Altera [Altera.com 2014a] and IP Cores for Xilinx [Xilinx.com 2015].
Note that these are tied to the vendor's chips and lock a designer in to that design flow. Both support addition/subtraction, multiplication, division, and conversion between fixed- and floating-point representation. Altera has the richest set of operators, which includes transcendental and trigonometric functions. Some of these components have restrictions. For example, Xilinx's reciprocal only supports single and double precision. For Altera MegaCores, only a few latencies are available for the designs we are interested in for this article. We compare VFLOAT components to these vendor cores in the results sections.

The work most similar to VFLOAT is FloPoCo [De Dinechin and Pasca 2011], which is the more recent version of FPLibrary [Detrey and De Dinechin 2006]. FloPoCo is a generator of arithmetic cores for FPGAs, as opposed to a library of operators. It allows the user to target both Altera and Xilinx hardware. It is written in C++ and inputs operator specifications and outputs synthesizable VHDL code. Note that, while FloPoCo's representation of floating-point numbers is inspired by the IEEE standard, it differs in some key aspects. The main difference is that two leading bits are added to each word to signal special cases such as exceptions and NaNs. This requires that the user convert each floating-point value before using this tool. Under the FSF AGPL license, FloPoCo is provided open source to the public. We compare our results to the results obtained by using the FloPoCo compiler in Section 5.

There have been several papers in the past describing floating-point libraries and implementations of floating-point components [Govindu et al. 2005].
A group from the Universidad Politécnica de Madrid [Echeverría and López-Vallejo 2011] describes a library of components including addition, multiplication, division, and square root as well as exponential and logarithm. They do not fully implement the IEEE specification; they treat subnormals as zero and implement truncation but not the round-to-nearest rounding mode. They compare their results to the Xilinx cores and show that their components can support a high clock frequency with low hardware usage. Since they implement digit recurrence algorithms for division and square root, they require a large number of clock cycles to produce a result. Their library does not appear to be open source.

A group from Brazil has described a parameterizable floating-point library [Sánchez et al. 2009; Muñoz et al. 2010]. They support addition, multiplication, division, and square root and use Goldschmidt's algorithm for division and square root. Their library is parameterizable by bit width and by number of clock cycles of latency. Their multiplier has a one-cycle latency and, as a result, their clock frequency is low. Their library does not appear to be open source.

Several methods are available for implementing floating-point reciprocal and division [Ercegovac and Lang 2003]. The most common methods are digit recurrence [Ercegovac and Lang 1994; Lee and Burgess 2002; Wang and Nelson 2003], iterative or multiplicative methods, and table-based methods.

The digit recurrence method for division has two steps. First, a number is chosen based on the quotient digit selection function, which is decided before operation. The second step is to update the quotient remainder based on the selected number. Steps one and two are repeated until the required precision has been reached. For the most common digit selection method, 1 bit of result is generated each time these two steps are performed.
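With the simplest {0, 1} digit selection, the two-step recurrence above is the classic restoring divider. The following software sketch is ours, purely for illustration (VFLOAT itself uses a table-based method, as discussed later); it produces exactly one quotient bit per pass:

```python
def restoring_divide(a: int, b: int, frac_bits: int) -> int:
    """Shift-and-subtract (restoring) division: each pass selects a quotient
    digit (step 1) and updates the partial remainder (step 2), yielding one
    result bit per iteration."""
    assert 0 <= a < b               # normalized so the quotient is a pure fraction
    q, r = 0, a
    for _ in range(frac_bits):
        r <<= 1                     # step 1 setup: shift the partial remainder
        q <<= 1
        if r >= b:                  # digit selection from {0, 1}
            r -= b                  # step 2: subtract; nothing to restore on a 0 digit
            q |= 1
    return q                        # q / 2**frac_bits approximates a / b (truncated)

print(restoring_divide(1, 3, 8))  # 85, i.e. 85/256 ~ 0.332
```

In hardware, one iteration of this loop maps naturally onto one pipeline stage or one clock cycle, which is why digit recurrence dividers are small but slow in latency.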
This division method is similar to the paper-and-pencil shift-and-subtract algorithm and is the most common hardware implementation for division. More sophisticated digit selection functions can generate more than 1 bit of the solution at a time. One of the most popular approaches is the SRT algorithm [Robertson 1958; Freiman 1961].

Iterative or multiplicative methods [Roesler and Nelson 2002; Goldberg et al. 2007; Soderquist and Leeser 1996, 1997] are another popular way to implement division. These are based on multiplications that generate intermediate results iteratively and converge to the number of bits required after a fixed number of iterations. Newton-Raphson is one of the most popular iterative methods. While the digit recurrence method generates 1 bit per recurrence, iterative methods generate multiple bits per iteration, and the output converges to the required result. The number of iterations needed is determined by the desired precision. A drawback of iterative methods is that they are not easy to pipeline unless the iterations are unrolled. A recent paper [Pasca 2012a] presents division algorithms based on Newton-Raphson and piecewise polynomial approximations and compares them to digit recurrence for both single- and double-precision floating-point division. The division algorithms implemented are correctly rounded according to the IEEE FP specification. An alternative to all of these is table-based algorithms; this is the approach used in the VFLOAT library.

Square root implementations use the same methods: digit recurrence, iterative or multiplicative, or table based. Most libraries implement the digit recurrence method, which, like division, takes the smallest area but requires the largest number of clock cycles.

Fig. 2. VFLOAT library components.
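As a concrete instance of the iterative family discussed above, here is a Newton-Raphson reciprocal sketch (ours, for illustration only; again, VFLOAT's own components are table based). The linear seed and the [0.5, 1) operand range are the textbook choices; the error roughly squares on every pass, so the number of correct bits doubles per iteration:

```python
def nr_reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d by the Newton-Raphson iteration x <- x * (2 - d * x)."""
    assert 0.5 <= d < 1.0                 # normalized operand range
    x = 48.0 / 17.0 - 32.0 / 17.0 * d     # linear seed; max relative error 1/17
    for _ in range(iterations):
        x = x * (2.0 - d * x)             # each pass roughly squares the error
    return x

print(nr_reciprocal(0.75))  # ~ 1.3333333333333333
```

The fixed iteration count (rather than a data-dependent loop) is what lets such a method be unrolled into a pipeline, at the cost of one full-width multiplier pair per stage.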
A recent article presents multiplicative implementations of square root for FPGAs [De Dinechin et al. 2010] based on piecewise polynomial approximations and compares their performance to digit recurrence algorithms. We compare our implementation to this approach in the results section.

3. VFLOAT

In this section, we go into more detail about the structure and organization of the VFLOAT library. In this article, we focus on new algorithms for division, reciprocal, and square root implemented since our previous publication on the VFLOAT library [Wang and Leeser 2010]. Figure 2 lists all the components currently supported in VFLOAT.

3.1. Library Component Features

All of our components are designed to be combined in a pipelined manner. Figure 3 shows the signals provided in each component to support this. The input port(s) OP(s) are the input operand(s). These are provided from the previous component(s) in the pipeline, along with a ready signal to show that the inputs are available. When the