1 Introduction
2 Software Restoration
main
function (Lines 1–12), a pipeline of three stages is created using three threads. The stages are connected by queues such that all stages have an output queue, and stages two and three have an input queue. After creation, the main
function waits for the threads to finish their work (Lines 8–9) before continuing. In Lines 14–34, we show the function for the middle stage of the pipeline, which reads an integer from the input queue, increments it by one, then puts it into the output queue. The first and third stages have a similar structure, where the first stage acts as a source of integers for the second stage, and the third stage doubles its inputs before adding them to the final output queue. All the relevant synchronisation code for the queues can be found in two functions: add_to_queue
and read_from_queue
. Only add_to_queue
(Lines 36–47) is shown here; read_from_queue
is defined similarly. Both functions use one mutex lock and two conditional variables. The latter are used for synchronisation when threads are waiting to insert an element into a full queue or for reading from an empty queue (e.g. at the start of the program). When a thread needs to add to the queue, it first acquires the queue lock and checks if the queue is full (Lines 38–41). When the queue is full, the thread releases the lock and waits for a signal that some other thread has consumed an element of this queue (queue->queue_cond_read
conditional variable, Line 41). After this conditional variable is signalled, the thread enqueues the element, updating the queue counter and pointer in the process (Lines 42–44). Finally, the thread signals that an element has been added to the queue (queue->queue_cond_write
conditional variable, Line 45) and releases the queue lock (Line 46) before returning.2.1 Parallelism Elimination
Stage1
, Stage2
, and Stage3
. This process could be achieved by using a technique similar to the one described in [10] Following pattern discovery, the first code transformation step is applied, where pthread operations and primitives are either removed or transformed, eliminating parallelism. In Listing 1, this impacts main
and add_to_queue
. Listing 2 shows the resulting code.
Stage1
contains a
-loop that enqueues elements in its output queue. Since the second stage, which reads from that queue, is no longer consuming those elements concurrently, and the queue is smaller than the total number of elements produced, the second stage will now consume and process only a subset of its inputs in the original pthreaded version after Stage1
returns. Because the semantics of the program have changed following Parallelism Elimination, the code must be repaired.2.2 Code Repair
Stage1
, Stage2
, and Stage3
, thereby resulting in a loop where the operations in stages two and three are applied to each integer produced by stage one in the same iteration that produces it. Listing 3 represents the result of this process, where Stage1
, Stage2
, and Stage3
are first lifted into a new function, Pipe
, and subsequently unfolded (i.e. unfolding in the transformational sense, à la Burstall and Darlington [11]). The
-loops exposed by this unfolding are then merged, allowing all three stages to be executed within a single iteration. This avoids the first stage overflowing its output queue, and consequently, results in a program that is sequential but semantically equivalent to the original pthreaded program.
2.3 Program Shaping
2.4 Pattern Introduction
3 Pipeline Assumptions
Stage1
4 Restoration Transformations
4.1 Parallelism Elimination
-
Removes when all pthread operations within the file are found within the functions identified as part of the valid pipeline.
-
Removes all pthread operations within the pipeline functions aside from calls to both and .
-
Removes all variable declarations whose types are defined as part of the pthread library, excepting . This includes global declarations when those variables occur solely within the identified pipeline functions.
-
Declarations in the form are transformed into . As above, in the case where such declarations are global, the variables may occur only in the pipeline functions.
-
Calls to of the form, are transformed into the form: Recall that Parallelism Elimination converts the type of variables to variables of the same name(s), and that requires that
f
returns a value of type .×× -
Calls to are transformed according to whether the second argument is
NULL
. When the second argument is notNULL
, e.g. , the join operation is transformed into the formx = t
. Otherwise, the call to is removed. -
In cases where a call to or forms the right-hand-side of an assignment statement, e.g. in addition to the transformation of the pthread operation, an assignment statement is inserted where the variable being assigned,
r
, is assigned the value of a successful call to the original pthread operation, here and0
. The assignments resulting from the transformation is:×× -
Any -loop whose body contains no statements following the removal of a pthread operation will itself be removed.
-
Any -statement with a branch whose body contains no statements following the removal of a pthread operation will be transformed to have only the other branch, or itself removed, if no such branch exists. For instance, given the -loop from the synthetic pipeline example in Listing 1, since the second argument to is
NULL
, the join operation result is itself a statement, and the body of the -loop contains no other operations, this -loop is removed.×
4.2 Code Repair
Stage1
, Stage2
, and Stage3
have been lifted into the function Pipe
using extract method and then unfolded. It is possible to merge all three loops since the statements in between the loops can be safely executed prior to the first and second loops.
i1 = MAXDATA
, i1>=0
, and i1–
, respectively. The bodies of the individual loops are included in the same order as the original loops themselves. Since the merged loop uses its initialisation statement, test expression, and iteration expression, the body of the first loop is included unchanged. Bodies of subsequent loops, however, are preceded by their update statement. For example, the body of the second loop is preceded by the assignment to my_input2
on Line 12 in Listing 6 which is taken from the update statement on Line 18 in Listing 5. This ensures that the input queue for each stage is read from only when a task has been added to that queue by the preceding stage. In addition to the inclusion of the update expression, the
statements from the original
-loops are removed, leaving the propagation of the
token in all but the final stage. In the final stage of our pipeline (Lines 19–22) we remove the entire
branch in Listing 6 since it contains only the
statement on Line 34 in Listing 5. Whilst the removal of these
statements is not strictly necessary, since they will only be evaluated when the first stage emits an
token once all other tasks have been processed, they are removed because they are redundant now that the termination of the merged loop is controlled by the first stage of the pipeline, which will terminate after generating the
token.4.3 Program Shaping
4.4 Remove Token from Merged Loops
i1>=0
(Line 4 Listing 7), has been replaced with the condition of the
-statement from the first stage of the pipeline, i1>0
(Line 5, Listing 7). That
-statement has itself been replaced by its true branch. The
-statements for the other two stages (Lines 13–17 and 20–22, Listing 7) have similarly been replaced by their true branches, since their false branches handle the
token.4.5 Remove Intermediate Queues
add_to_queue
and read_from_queue
operations. Note that the add_to_queue
operation in the third stage is not unfolded since this is the output of the pipeline itself.
capacity
on Line 7 above. Similarly, a variable undergoes a write when it is being assigned to and is not being updated; e.g. elements
in the first output queue is written to on Line 6. Finally, a variable is updated when it occurs in a statement that is both reading from and writing to that variable; e.g. addTo
in Line 7 above. Basic increment operators, e.g. nr_elements++
on Line 8, are similarly considered updates due to their semantics. In order to transform these read, write, and update operations, we pair the operations in the order that they appear in the code and according to the variables they read, write, or update, and transform those pairs according to their composition. If two queues are semantically the same but referred to by different variables then they themselves will be considered the same during pairing; e.g. myOutputQueue1
and myInputQueue2
refer to the same intermediate queue, thus myOutputQueue1->elements
and myInputQueue2->elements
are similarly considered to be the same variable for pairing. In the above example, two cases arise: elements
on Line 6 and the read from elements
on Line 10 can be paired (due in part to the behaviour of the queue reading the element that has just been added). Since this represents passing my_output1
on Line 6 to my_input2
on Line 10, it is possible to remove Line 10 and transform Line 6 into the form my_input_2 = my_output_1
.capacity
on Line 7, or a paired write, e.g. addTo
on Line 6, is removed or otherwise transformed along with the update or paired write statement. Similarly, an unpaired read that is part of a paired read statement, e.g. readFrom
on Line 10, is also transformed according to the paired read statement. When applied, the above transformations result in the removal of the two intermediate queues.
5 Evaluation
5.1 Image Convolution
tq1
and tq2
for stage1
and stage2
, respectively), computes the unit of work on the task item (Lines 15 and 25) and then places the result on an output queue (Lines 16 and 26). Functions add_to_queue
and read_from_queue
put a task in an output queue and read a task from an input queue, respectivelly, in a thread safe manner. The code for add_to_queue
was shown in Listing 1.gettask
and puttask
in the stages, merging the stages together, and finally removing the intermediate queue between the two stages (leaving the input and output queue; see Listing 10).
5.2 Discussion
5.3 Limitations
Benchmark | McCabe | Lines | Performance (std dev) | ||||
---|---|---|---|---|---|---|---|
Before | After | Before | After | Before | After | ||
Blackscholes | F | 29 | 29 | 366 | 393 | 38.5 (0.07) | 39 (0.42) |
MatMult | F | 9 | 15 | 176 | 146 | 918.7 (24.6) | 922.24 (30.42) |
Mandelbrot | F | 12 | 11 | 145 | 142 | 2.21 (0.01) | 2.27 (0.04) |
Cholesky | F | 31 | 19 | 321 | 226 | 16.97 (0.07) | 17.08 (0.02) |
NQueens | P (2) | 41 | 24 | 421 | 337 | 8.63 (0.04) | 8.622 (0.27) |
PGPry | P (2) | 23 | 19 | 210 | 243 | 138.1 (0.23) | 131 (0.10) |
ImageConv | P (1) | 71 | 29 | 714 | 280 | 12.85 (6.08) | 5.2 (0.02) |