12 pages

The OpenTM Transactional Application Programming Interface

Please download to get full document.

View again

of 12
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Transactional Memory (TM) simplifies parallel programming by supporting atomic and isolated execution of user-identified tasks. To date, TM programming has re quired the use of libraries that make it difficult to achieve scalable performance with
  The OpenTM Transactional Application Programming Interface Woongki Baek, Chi Cao Minh, Martin Trautmann,Christos Kozyrakis and Kunle OlukotunComputer Systems LaboratoryStanford University { wkbaek, caominh, mat42, kozyraki, kunle } @stanford.edu Abstract Transactional Memory (TM) simplifies parallel pro-gramming by supporting atomic and isolated execution of user-identified tasks. To date, TM programming has re-quired the use of libraries that make it difficult to achievescalable performance with code that is easy to develop and maintain. For TM programming to become practical, it isimportant to integrate TM into familiar, high-level environ-ments for parallel programming.This paper presents OpenTM, an application program-ming interface (API) for parallel programming with trans-actions. OpenTM extends OpenMP, a widely used API for shared-memory parallel programming, with a set of com- pilerdirectivestoexpressnon-blockingsynchronizationand speculative parallelization based on memory transactions.We also present a portable OpenTM implementation that  produces code for hardware, software, and hybrid TM sys-tems. The implementation builds upon the OpenMP support in the GCC compiler and includes a runtime for the C pro-gramming language.We evaluate the performance and programmability fea-tures of OpenTM. We show that it delivers the performanceof fine-grain locks at the programming simplicity of coarse-grain locks. Compared to transactional programming withlower-level interfaces, it removes the burden of manual an-notations for accesses to shared variables and enables easychanges of the scheduling and contention management poli-cies. Overall, OpenTM provides a practical and efficient TM programming environment within the familiar scope of OpenMP. 1 Introduction Transactional Memory (TM)  [16] is emerging as apromising technology to address the difficulty of parallelprogramming for multi-core chips. With TM, programmerssimply declare that certain code sections should execute as atomic and isolated transactions  with respect to all othercode. Concurrency control as multiple transactions executein parallel is the responsibility of the system. Several TMsystems have been proposed, based on hardware [21, 24],software [26, 11], or hybrid techniques [18, 27, 6]. To achieve widespread use, TM must be integrated intopractical and familiar programming environments. To date,TM programming has primarily been based on libraries thatinclude special functions to define transaction boundaries,manipulate shared data, and control the runtime system.While the library-based approach is sufficient for initialexperimentation with small programs, it is inadequate forlarge software projects by non-expert developers. Library-based code is difficult to comprehend, maintain, port, andscale, since the programmer must manually annotate ac-cesses to shared data. Library annotations can also compli-cate static analysis and reduce the effectiveness of compileroptimizations [2].This paper presents  OpenTM  , an application program-ming interface (API) for parallel programming with trans-actions. OpenTM extends the popular OpenMP API forshared-memory systems [1] with the compiler directivesnecessary to express both  non-blocking synchronization and  speculative parallelization  using memory transactions.OpenTM provides a simple, high-level interface to expresstransactional parallelism, identify the role of key variables,and provide scheduling and contention management hints.OpenTM code can be efficiently mapped to a variety of hardware (HTM), software (STM), and hybrid TM imple-mentations. It can also be tuned by an optimizing compilerand dynamically scheduled by the runtime system.The specific contributions of this work are: •  We define the set of extensions to OpenMP that sup-port both non-blocking synchronization and specula-tive parallelization using transactional memory tech-niques. Apart from integrating memory transactionswith OpenMP constructs such as parallel loops andsections, the extensions address issues such as nestedtransactions, conditional synchronization, scheduling,and contention management.  •  We describe an OpenTM implementation for the Cprogramming language. The environment is based onthe OpenMP support in the GCC compiler [25] andproduces executable code for hardware, software, andhybrid TM systems. The implementation is easy toport to any TM system that provides a basic low-levelinterface for transactional functionality. •  We evaluate the performance and programmabilityfeatures of parallel programming with OpenTM. Us-ing a variety of programs, we show that OpenTM codeis simple, compact, and scales well.The rest of the paper is organized as follows. Section 2reviews the OpenMP API, while Section 3 introduces theOpenTM extensions. Section 4 describes our first OpenTMimplementation. Sections 5 and 6 present the quantitativeevaluation. Section 7 reviews related work, and Section 8concludes the paper. 2 OpenMP Overview OpenMP is a widely used API for shared-memory par-allel programming [1]. The specification includes a set of compiler directives, runtime library routines, and environ-ment variables. OpenMP follows the  fork-join  parallel ex-ecution model. The  master thread   begins a program ex-ecution sequentially. When the master thread encounters a  parallel construct, it creates a set of   worker threads . Allworkers execute the same code inside the  parallel  con-struct. When a work-sharing construct is encountered, thework is divided among the concurrent workers and executedcooperatively. After the end of a work-sharing construct,every thread in the team resumes execution of the code inthe  parallel  construct until the next work-sharing op-portunity. There is an implicit barrier at the end of the  parallel  construct, and only the master thread executesuser code beyond that point.The OpenMP memory model is shared memory withrelaxed consistency [3]. All threads can perform loadand store accesses to variables in shared memory. The  parallel directive supports two types of variables withinthe corresponding structured block:  shared   and  private .Each variable referenced within the block has an srci-nal variable with the same name that exists outside of the  parallel  construct. Accesses to shared variables fromany thread will access the srcinal variable. For private vari-ables, each thread will have its own private copy of the src-inal variable.OpenMP provides programmers with five classes of di-rectives and routines: parallel, work-sharing, synchroniza-tion, data environment, and runtime library routines. The  parallel  directive starts a parallel construct. The work-sharing directives can describe parallel work for iterative(loops) and non-iterative (parallel sections) code patterns.The following is a simple example of a parallel loop (forall)in OpenMP: #pragma omp parallel forfor (i=0;i<n;i++)b[i]=(a[i]*a[i])/2.0; Synchronization constructs and routines, such as critical  and  barrier , allow threads to coordinate onshared-memory accesses. Data environment primitives de-scribe the sharing attributes of variables referenced in par-allel regions. Finally, runtime routines specify a set of in-terfaces and variables used for runtime optimizations, syn-chronization, and scheduling. A detailed specification andseveral tutorials for OpenMP are available in [1]. 3 The OpenTM API OpenTM is designed as an extension of OpenMP to sup-port non-blocking synchronization and speculative paral-lelization using transactional techniques. Hence, OpenTMinherits the OpenMP execution model, memory seman-tics, language syntax, and runtime constructs. AnyOpenMP program is a legitimate OpenTM program. Non-transactional code, parallel or sequential, behaves exactly asdescribed in the OpenMP specification. We discuss the in-teractionofthenewOpenTMfeatureswithcertainOpenMPfeatures in Section 3.5. 3.1 OpenTM Transactional Model The transactional model for OpenTM is based on threekey concepts. Implicit Transactions : OpenTM supports implicittransactions. The programmer simply defines transactionboundaries withinparallelblockswithoutany additional an-notations for accesses to shared data. All memory opera-tions are executed implicitly on transactional state in orderto guarantee atomicity and isolation. Implicit transactionsminimize the burden on programmers but require compilerand/orhardware supportinordertoimplementtransactionalbookkeeping. For STM systems, the compiler inserts thenecessary barriers, while for HTM systems, the hardwareperforms bookkeeping in the background as the transactionexecutes regular loads and stores [2]. Even with hardwaresupport, compiler optimizations can be highly profitable.Another advantage of implicit transactions is that they sim-plify software reuse and composability. A programmer canreuse a function with implicit transactions without havingto reconsider the barriers necessary for transactional book-keeping under the new context. Strong Isolation : In OpenTM programs, memory trans-actions are atomic and isolated with respect to other trans-actions and non-transactional accesses. There also existsa consistent ordering between committed transactions andnon-transactional accesses for the whole system. This prop-erty is known as  strong isolation  and is necessary for correct  and predictable interactions between transactional and non-transactional code [19]. All hardware [21, 24] and somehybrid [6] TM systems guarantee strong isolation. For soft-ware TM systems, the compiler must insert additional readand write barriers outside of transactions in order to imple-ment strong isolation [29]. Virtualized Transactions : OpenTM requires that theunderlying TM system supports  virtualized transactions that are not bounded by execution time, memory footprint,and nesting depth. The TM system should guarantee cor-rect execution even when transactions exceed a schedulingquantum, exceed the capacity of hardware caches or phys-ical memory, or include a large number of nesting levels.While it is expected that the majority of transactions willbe short-lived [4, 10, 6], programmers will naturally expectthat any long-lived transactions, even if infrequent, will behandled correctly in a transparent manner. Virtualizationis a challenge for HTM systems that use hardware cachesand physical addresses for transactional bookkeeping. Ex-cluding the performance implications, OpenTM is agnos-tic to the exact method used to support virtualized transac-tions [4, 9, 8]. 3.2 Basic OpenTM Constructs The following are the basic OpenTM extensions for par-allel programming with transactions. Transaction Boundaries : OpenTM introduces the transaction  construct to specify the boundaries of strongly isolated transactions. The syntax is: #pragma omp transaction  [clause[[,] clause]...]structured-block  where  clause  is one of the following:  ordered  , nesting(open|closed) .  ordered   is used to spec-ify a sequential commit order between executions of thistransaction by different threads. This is useful for specu-lative parallelization of sequential code. If not specified,OpenTM will generate unordered yet serializable transac-tions. Unordered transactions are useful for non-blockingsynchronization in parallel code. During the execution of transactions, the underlying TM system detects conflictingaccesses to shared variables in order to guarantee atomic-ity and isolation. On a conflict, the system aborts and re-executes transactions based on the ordering scheme and acontention management policy. We discuss the  nesting clause along with the advanced OpenTM features in Sec-tion 3.3. Note that the definition of   structured-block   is thesame as in OpenMP. Transactional Loop : The  transfor  construct spec-ifies a loop with iterations executing in parallel as atomictransactions. The  transfor  syntax is: #pragma omp transfor  [clause[[,] clause]...] for-loop transfor  reuses most of the clauses of the OpenMP for construct such as  private and reduction to iden-tify private or reduction variables, respectively. Certainclauses are extended or added to specify the transactionalcharacteristics of the associated loop body. The  ordered  clause specifies that transactions will commit in sequen-tial order, implying a foreach loop with sequential seman-tics (speculative loop parallelization). If   ordered   is notspecified, transfor will generate unordered transactions,which implies an unordered foreach loop with potential de-pendencies. The OpenMP  for  construct specifies a forallloop.The  transfor  construct can have up to three param-eters in the  schedule  clause. Just as with the OpenMP for , the first parameter specifies how loop iterations arescheduled across worker threads (see discussion in Sec-tion 3.4). The second parameter identifies the number of iterations ( chunk size ), assigned to each thread on everyscheduling decision. The tradeoff for chunk size is betweenscheduling overhead and load balance across threads. Thethird parameter specifies how many loop iterations will beexecuted per transaction by each worker thread ( transac-tion size ). If it is not specified, each iteration executes as aseparate transaction. The tradeoff for transaction size is be-tween the overhead of starting/committing a transaction andthe higher probability of conflicts for long-running transac-tions.The following code example uses a  transfor  to per-form parallel histogram updates. In this case, iterations arestatically scheduled across threads with a chunk size of 42iterations. Transactions are unordered with 6 iterations pertransaction. void   histogram( int  *A, int  *bin){ #pragma omp transfor schedule ( static ,42,6) for (i=0;i<NUM_DATA;i++){bin[A[i]]++;}} A user can also define transactions using the transaction  construct within the loop body of anOpenMP  for  construct. This approach allows program-mers to write tuned code with transactions smaller than aloop body that may reduce the pressure on the underlyingTM system. On the other hand, it requires better under-standing of the dependencies within the loop body andslightly more coding effort. Transactional Sections : OpenTM supports transactionsin parallel sections (non-iterative parallel tasks) using the transsections  construct. Its syntax is: #pragma omp transsections  [clause[[,] clause]...][ #pragma omp transsection ]structured-block 1[ #pragma omp transsection ]structured-block 2 ...  Compared to OpenMP  sections ,  transsections uses an additional  ordered   clause to specify sequentialtransaction ordering. While  sections  implies that thestructured blocks are proven to be parallel and independent, transsections can express parallel tasks with potentialdependencies. Similar to the loop case, transactional sec-tions can also be specified using the  transaction  con-struct within OpenMP  sections .The following is a simple example of method-level spec-ulative parallelization using OpenTM  transsections construct: #pragma omp transsections ordered   { #pragma omp transsection WORK_A(); #pragma omp transsection WORK_B(); #pragma omp transsection WORK_C();} 3.3 Advanced OpenTM Constructs The basic OpenTM constructs discussed above are suf-ficient to express the parallelism in a wide range of appli-cations. Nevertheless, OpenTM also introduces a few ad-vanced constructs to support recently proposed techniquesfor TM programming. These constructs require advancedfeatures in the underlying TM system. Alternative execution paths:  The  orelse  constructsupports alternative execution paths for aborted transac-tions [15, 2]. The syntax is: #pragma omp transaction structured-block 1 #pragma omp orelse structured-block 2 ... When the transaction for block 1 successfully commits,the entire operation completes and block 2 never executes.If the transaction for block 1 aborts for any reason, thecode associated with the  orelse  construct is executed asan atomic transaction. A program can cascade an arbi-trary number of  orelse constructs as alternative executionpaths. Conditional synchronization:  OpenTM supports con-ditional synchronization in atomic transactions using the omp retry()  runtime routine.  omp retry()  indicatesthatthetransactionisblockedduetocertainconditions[15].The runtime system will decide whether the transaction willbe re-executed immediately or the corresponding threadwill be suspended for a while. The transaction can usethe  omp watch()  routine to notify the runtime systemthat it should monitor a set of addresses and re-execute theblocked transaction when one of them has been updated [7].The following is a simple example of the conditional syn-chronization within a transaction: #pragma omp transaction  { if  (queue.status == EMPTY) { omp_watch (addr); omp_retry ();}  else  {t = dequeue(queue);} Compared to conditional/guarded atomic blocks or con-dition variables [14, 2], conditional synchronization with retry  allows for complex blocking conditions that can occuranywhere within the transaction. Moreover, the directive-based OpenMP approach places additional restrictions inthe specification of the blocking condition. For example,a programmer cannot use array indices to specify the con-dition. Finally,  retry  is composable [15]. We should notethat when  omp retry()  is called in a transaction thatalso uses  orelse , the transaction is aborted and controlis transfered immediately to the alternative execution path. Nested Transactions : The  nesting  clause specifiesthe behavior of nested transactions. If   nesting  is notspecified, OpenTM uses closed-nested transactions by de-fault. Closed-nested transactions can abort due to depen-dencies without causing their parent to abort [22]. Thememory updates of closed-nested transactions become vis-ible to other threads only when the outermost transactioncommits. The  open  clause allows a program to start anopen-nested transaction that can abort independently fromits parent but makes its updates visible immediately upon itscommit, regardless of what happens to the parent transac-tion [22]. Open-nested transactions may require finalizingand compensating actions that execute when the outermosttransaction commits or aborts, respectively [13]. While wedo not expect that many programmers will use open-nestedtransactions directly, they can be helpful with addressingperformance issues and the implementation of additionalprogramming constructs [22]. Transaction Handlers : The  handler  construct spec-ifies software handlers that are invoked when a transactioncommits or aborts. OpenTM handlers follow the semanticspresented in [22]. The handler syntax is: #pragma omp transaction  [clause[[,] clause]...]structured-block 1 #pragma omp handler  clausestructured-block 2 where clause is one of the following:  commit ,  abort , violation . Violation refers to a rollback triggeredby a dependency, while abort is invoked by the transac-tion itself. The  handler  construct can be associatedwith  transaction ,  transfor ,  transsections ,and orelse constructs. Thecodebelowprovidesanexam-ple with an abort handler used to compensate for an open-nested transaction:  Routine Description omp in transaction()  Return  true  if executed within transaction. omp get nestinglevel()  Return the nesting-level of the currenttransaction. omp abort()  User-initiated abort of the current transac-tion. omp retry()  User-initiated retry of the current transac-tion. omp watch()  Add an address to a watch-set [7]. omp set cm()  Set the contention management scheme. omp get cm()  Return the current contention managementscheme. Table 1. The extra runtime routines in OpenTM. #pragma omp transaction  { #pragma omp transaction nesting ( open ){WORK_A();}  //end of open-nested transaction #pragma omp handler abort {COMPENSATE_WORK_A();}}  //end of parent transaction 3.4 Runtime System OpenTM also extends the runtime system of OpenMP tosupport transactional execution. Table 1 summarizes the ad-ditional runtime library routines available to programmers. Loop Scheduling : The  for  construct in OpenMP pro-vides four options for scheduling iterations across workerthreads:  static ,  dynamic ,  guided  , and  runtime .Static scheduling statically distributes work to threads,while dynamic scheduling assigns a chunk of iterations tothreads upon request during runtime. Guided schedulingis similar to dynamic scheduling, but the chunk size de-creases exponentially to avoid work imbalance. Runtimescheduling makes the decision dependent on the  run-sched-var   environment variable at runtime [1]. OpenTM reusesthese options but adds an additional task of grouping loopiterations into transactions. When grouping in a dynamicor guided manner, the system can use runtime feedback tominimize the overhead of small transactions without run-ning into the virtualization overhead or the higher probabil-ity of conflicts of larger transactions. Contention Management : OpenTM provides two run-time routines, presented in Table 1, to control the con-tention management scheme of the underlying TM sys-tem [12, 28]. The  omp get cm()  routine returns the typeof the currently used contention management scheme. The omp set cm()  routine allows the user to the change con-tention management scheme for the whole system in run-time. Programmers can use this interface to adapt con-tention management to improve performance robustness orprovide fairness guarantees [12]. The exact parameters for omp set cm()  depend on the available policies. For ex-ample, our current implementation supports a simple back-off scheme [28] that requires a parameter to specify themaximum number of retries before an aborted transactiongets priority to commit. As TM systems mature, the corre-sponding contention management techniques will be inte-grated into OpenTM. 3.5 Open Issues and Discussion There are certain subtle issues about OpenTM. Our ini-tial specification takes a conservative approach in severalcases. However, we expect that practical experience withtransactional applications, further developments in TM re-search, and advanced compiler support, will provide moresophisticated solutions for future versions of OpenTM. OpenMP Synchronization : OpenTM does not allowthe use of OpenMP synchronization constructs withintransactions (e.g.,  critical ,  atomic ,  mutex , and  barrier ). OpenMP synchronization constructs haveblocking semantics and can lead to deadlocks or viola-tions of strong isolation when used within transactions.We also disallow the use of transactions within OpenMPsynchronization constructs, as there can be deadlock sce-narios if   omp retry()  is used. In general, separatingtransactional code from blocking synchronization will helpprogrammers reason about the correctness of their codeat this point. The blocking semantics of   atomic  and critical  are also the primary reason we introduced thenon-blocking transaction construct. Reusing atomic or  critical  for TM programming could lead to dead-locks or incorrect results for existing OpenMP programs. I/O and System Calls : OpenMP requires that any li-braries called within parallel regions are  thread-safe . Sim-ilarly, OpenTM requires that any libraries called withintransactionsare transaction-safe . Thechallengefortransac-tional systems is routines with permanent side effects suchas I/O and system calls. Currently, OpenTM does not pro-pose any specification for such routines. Each OpenTMimplementation is responsible for specifying which librarycalls can be safely used within transactions as well as whatare the commit and abort semantics. There is ongoing re-search on how to integrate I/O and system calls with trans-actions using buffering and serialization techniques [22]. Nested Parallelism : For the moment, we do not allowuser code to spawn extra worker threads within a transac-tion. Relaxed Conflict Detection : Recent papers have pro-posed directives that exclude certain variables from conflictdetection (e.g.,  race  [31] or  exclude  [23]). The moti-vation is to reduce the TM bookkeeping overhead and anyunnecessary conflicts. At the moment, OpenTM does notinclude such directives for multiple reasons. First, with-out strong understanding of the dependencies in the algo-rithm, use of such directives can easily lead to incorrector unpredictable program behavior. Second, the directive-
Related Documents
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks

We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

More details...

Sign Now!

We are very appreciated for your Prompt Action!