Glossary
Atomic
Indivisible. An instruction sequence executed by a strand is atomic if it appears at any moment to any other strand as if either no instructions in the sequence have been executed or all instructions in the sequence have been executed.
Chip multiprocessor
A general-purpose multiprocessor implemented as a single multicore chip.
Cilk
An extension to the C and C++ programming languages that allows a programmer to express task parallelism. Important milestones in the history of Cilk include Cilk-1, Cilk-5, Cilk++, Cilk Plus, and OpenCilk.
Cilk Plus
An important milestone in the history of Cilk developed by Intel Corporation, Cilk Plus extended Cilk++ with transparent interoperability with legacy C/C++ binary executables.
Cilk++
An important milestone in the history of Cilk developed by CilkArts, Inc., Cilk++ extended Cilk-5 to C++ and introduced reducer hyperobjects as an efficient means for resolving races on nonlocal variables.
Cilk-1
The first important milestone in the history of Cilk, Cilk-1 was developed at MIT and provided provably efficient work-stealing runtime support but little linguistic support.
Cilk-5
The next important milestone after Cilk-1 in the history of Cilk, Cilk-5 was developed at MIT and provided simple linguistic extensions to ANSI C for multithreading.
Cilksan
The Cilksan race detector is a tool provided in OpenCilk for finding race condition defects in Cilk code.
cilk_for
A keyword in the Cilk language that indicates a for loop whose iterations can be executed independently in parallel.
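For example, a minimal sketch (assuming the OpenCilk header cilk/cilk.h, which declares the keyword):

```c
#include <cilk/cilk.h>

/* Scale each element of an array. The iterations are independent,
   so cilk_for lets them run in parallel. */
void scale(double *a, int n, double c) {
    cilk_for (int i = 0; i < n; ++i)
        a[i] *= c;
}
```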
cilk_scope
A keyword in the Cilk language that indicates that all spawned children within the scoped region must complete before proceeding.
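A sketch of its use (work_a, work_b, and work_c are hypothetical helper functions):

```c
#include <cilk/cilk.h>

void work_a(void);
void work_b(void);
void work_c(void);

void example(void) {
    cilk_scope {
        cilk_spawn work_a();  /* may run in parallel with work_b() */
        work_b();
    }                         /* all children spawned in this scope
                                 have completed here */
    work_c();                 /* runs only after work_a() and work_b() */
}
```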
cilk_spawn
A keyword in the Cilk language that indicates that the named subroutine can execute independently and in parallel with the caller.
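For example, a sketch of the classic recursive Fibonacci computation from the Cilk literature:

```c
#include <cilk/cilk.h>

long fib(long n) {
    if (n < 2) return n;
    long x, y;
    x = cilk_spawn fib(n - 1);  /* may execute in parallel with the caller */
    y = fib(n - 2);             /* the caller continues concurrently */
    cilk_sync;                  /* wait for the spawned call to finish */
    return x + y;
}
```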
cilk_sync
A keyword in the Cilk language that indicates that all functions spawned within the current function must complete before statements following the cilk_sync can be executed.
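For example (process is a hypothetical helper function):

```c
#include <cilk/cilk.h>

void process(int *chunk);  /* hypothetical helper */

void process_all(int *data, int nchunks, int chunk_size) {
    for (int i = 0; i < nchunks; ++i)
        cilk_spawn process(data + i * chunk_size);
    cilk_sync;  /* wait for every spawned process() call */
    /* the processed data can be used safely from here on */
}
```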
Commutative operation
An operation OP is commutative if a OP b = b OP a for all values a and b. Integer addition is commutative, for example, but string concatenation is not.
Concurrent agent
A processor, process, thread, strand, or other entity that executes a program instruction sequence in a computing environment containing other such entities.
Core
A single processor unit of a multicore chip. The terms "processor" and "CPU" are often used in place of "core," although industry usage varies. Archaic: A solid-state memory made of magnetized toroidal memory elements.
CPU
Central Processing Unit. We use this term as a synonym for "core," or a single processor of a multicore chip.
Critical section
The code executed by a strand while holding a lock.
Critical-path length
See span.
Data race
A race condition that occurs when two or more parallel strands, holding no lock in common, access the same memory location and at least one of the strands performs a write. Compare with determinacy race.
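A minimal sketch of a data race on a nonlocal variable; a mutex or, more efficiently, a reducer would resolve it:

```c
#include <cilk/cilk.h>

int sum = 0;  /* nonlocal variable shared by all iterations */

void racy_sum(int *a, int n) {
    cilk_for (int i = 0; i < n; ++i)
        sum += a[i];  /* DATA RACE: parallel strands read and write
                         sum while holding no lock in common */
}
```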
Deadlock
A situation in which two or more strands are each waiting for another to release a resource, and the "waiting-for" relation forms a cycle, so that none can ever proceed.
Determinacy race
A race condition that occurs when two logically parallel strands access the same memory location and at least one strand performs a write.
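The distinction from a data race matters: in the sketch below, both writers hold a common lock, so there is no data race, yet the final value still depends on scheduling order:

```c
#include <cilk/cilk.h>
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int x = 0;

void set(int v) {
    pthread_mutex_lock(&m);
    x = v;  /* no data race: every writer holds m */
    pthread_mutex_unlock(&m);
}

void example(void) {
    cilk_spawn set(1);
    set(2);
    cilk_sync;
    printf("%d\n", x);  /* determinacy race: prints 1 or 2 depending
                           on the order in which the strands ran */
}
```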
Determinism
The property of a program when it behaves identically from run to run when executed on the same inputs. Deterministic programs are usually easier to debug.
Distributed memory
Computer storage that is partitioned among several processors. A distributed-memory multiprocessor is a computer in which processors must send messages to remote processors to access data in remote processor memory. Contrast with shared memory.
Execution time
How long a program takes to execute on a given computer system. Also called running time.
Fake lock
A construct that Cilksan
treats as
a lock but which behaves like a no-op
during actual running of the program. A
fake lock can be used to suppress the
reporting of an intentional race condition.
False sharing
The situation that occurs when two strands access different memory locations residing on the same cache block, thereby contending for the cache block. For more information, see the Wikipedia entry https://en.wikipedia.org/wiki/False_sharing.
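A sketch of the effect: the strands below are race-free because they touch different locations, but those locations likely share a cache block:

```c
#include <cilk/cilk.h>

struct counters { long a; long b; };  /* adjacent fields likely
                                         share a cache block */

void bump_a(struct counters *c, long n) {
    for (long i = 0; i < n; ++i) c->a += 1;
}

void bump_b(struct counters *c, long n) {
    for (long i = 0; i < n; ++i) c->b += 1;
}

void demo(long n) {
    struct counters c = {0, 0};
    cilk_scope {
        cilk_spawn bump_a(&c, n);  /* strand 1 updates c.a */
        bump_b(&c, n);             /* strand 2 updates c.b */
    }
    /* Correct but potentially slow: the strands contend for the
       shared cache block. Padding or aligning the fields (e.g.,
       alignas(64)) places them on distinct blocks. */
}
```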
Global variable
A variable that is bound outside of all local scopes. See also nonlocal variable.
Hyperobject
A linguistic construct supported by the OpenCilk runtime system that allows many strands to coordinate in updating a shared variable or data structure independently by providing different views of the hyperobject to different strands at the same time. The reducer is the only hyperobject currently provided by OpenCilk.
Instruction
A single operation executed by a processor.
Knot
A point at which the end of one strand meets the end of another. If a knot has one incoming strand and one outgoing strand, it is a serial knot. If it has one incoming strand and two outgoing strands, it is a spawn knot. If it has multiple incoming strands and one outgoing strand, it is a sync knot. A Cilk program does not produce serial knots or knots with both multiple incoming and multiple outgoing strands.
Linear speedup
Speedup proportional to the processor count. See also perfect linear speedup.
Lock
A synchronization mechanism for providing atomic operation by limiting concurrent access to a resource. Important operations on locks include acquire (lock) and release (unlock). Many locks are implemented as a mutex, whereby only one strand can hold the lock at any time.
Lock contention
The situation wherein multiple strands vie for the same lock.
Multicore
A semiconductor chip containing more than one processor core.
Multiprocessor
A computer containing multiple general-purpose processors.
Mutex
A "mutually exclusive" lock that only one strand can acquire at a time, thereby ensuring that only one strand executes the critical section protected by the mutex at a time. For example, the Linux OS supports Pthreads pthread_mutex_t objects.
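For example, a minimal sketch using a Pthreads mutex to protect a critical section (correct, but the lock serializes the updates; a reducer would scale better):

```c
#include <cilk/cilk.h>
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
long total = 0;

void tally(long *a, long n) {
    cilk_for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&m);    /* acquire: at most one strand */
        total += a[i];             /* executes the critical section */
        pthread_mutex_unlock(&m);  /* release */
    }
}
```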
Nondeterminism
The property of a program when it behaves differently from run to run when executed on exactly the same inputs. Nondeterministic programs are usually hard to debug.
Nonlocal variable
A program variable that is bound outside
of the scope of the function, method, or
class in which it is used. In Cilk
programs, we also use this term to refer
to variables with a scope outside a
cilk_for
loop.
OpenCilk
A task-parallel programming platform for multicore computers based on Cilk technology.
Parallel loop
A for loop all of whose iterations can be run independently in parallel. The cilk_for keyword designates a parallel loop.
Parallelism
The ratio of work to span, which is the largest speedup an application could possibly attain when run on an infinite number of processors.
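For example (illustrative numbers), a computation with work T₁ = 10,000,000 instructions and span T∞ = 10,000 instructions has parallelism T₁/T∞ = 1,000, so no execution can achieve more than a 1,000-fold speedup, no matter how many processors are available.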
Perfect linear speedup
Speedup equal to the processor count. See also linear speedup.
Process
A self-contained concurrent agent that by default executes a serial chain of instructions. More than one thread may run within a process, but a process does not usually share memory with other processes. Scheduling of processes is typically managed by the operating system.
Processor
A processor implements the logic to execute program instructions sequentially; we use the term "core" as a synonym. This document does not use the term "processor" to refer to multiple processing units on the same or multiple chips, although other documents may use the term that way.
Race condition
A source of nondeterminism whereby the result of a concurrent computation depends on the timing or relative order of the execution of instructions in each individual strand.
Receiver
A variable that receives the result of a function call.
Reducer
A hyperobject with a defined (usually associative) reduce() binary operator, which the OpenCilk runtime system uses to combine the views of separate strands. A reducer must have two methods, as shown in the sketch below:
- A default constructor, which initializes the reducer to its identity value.
- A reduce() method, which merges the value of the right reducer view into the left (this) view.
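A minimal sketch, assuming OpenCilk's cilk_reducer syntax for C, in which the first callback plays the role of the default constructor and the second is the reduce() operation:

```c
#include <cilk/cilk.h>

/* Identity and reduce callbacks for a summing reducer. */
void zero_long(void *v) { *(long *)v = 0; }
void add_long(void *left, void *right) {
    *(long *)left += *(long *)right;
}

long sum_array(long *a, long n) {
    long sum cilk_reducer(zero_long, add_long) = 0;
    cilk_for (long i = 0; i < n; ++i)
        sum += a[i];  /* each strand updates its own view */
    return sum;       /* views have been combined with add_long */
}
```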
Response time
The time it takes to execute a computation from the time a human user provides an input to the time the user gets the result.
Running time
How long a program takes to execute on a given computer system. Also called execution time.
Scale down
The ability of a parallel application to run efficiently on one or a small number of processors.
Scale up
The ability of a parallel application to run efficiently on a large number of processors. See also linear speedup.
Sequential consistency
The memory model for concurrency wherein the effect of concurrent agents is as if their operations on shared memory were interleaved in a global order consistent with the orders in which each agent executed them. This model was advanced in 1979 by Leslie Lamport.
Serial execution
Execution of the serial projection of a Cilk program.
Serial projection
The C or C++ program that results from stubbing out the keywords of a Cilk program, where cilk_spawn, cilk_scope, and cilk_sync are elided and cilk_for is replaced with an ordinary for. The serial projection can be used for debugging and, in the case of a converted C/C++ program, will behave exactly as the original C/C++ program. The terms "serialization" and "serial elision" are used in some of the literature. See also serial semantics.
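For example (step is a hypothetical helper), the serial projection of run() simply replaces cilk_for with an ordinary for:

```c
#include <cilk/cilk.h>

void step(int *p);  /* hypothetical helper */

/* Cilk program: */
void run(int *data, int n) {
    cilk_for (int i = 0; i < n; ++i)
        step(&data[i]);
}

/* Its serial projection: cilk_for becomes an ordinary for, and any
   cilk_spawn, cilk_scope, or cilk_sync keywords are elided. */
void run_serial(int *data, int n) {
    for (int i = 0; i < n; ++i)
        step(&data[i]);
}
```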
Serial semantics
The behavior of a Cilk program when executed as the serial projection of the program. See the following article: Four Reasons Why Parallel Programs Should Have Serial Semantics.
Serialization
See serial projection.
Shared memory
Computer storage that is shared among several processors. A shared-memory multiprocessor is a computer in which each processor can directly address any memory location. Contrast with distributed memory.
Span
The theoretically fastest execution time for a parallel program when run on an infinite number of processors, discounting overheads for communication and scheduling. Often denoted by T∞.
Spawn
To call a function without waiting for it to return, as one would in a normal call. The caller can continue to execute in parallel with the called function. See also cilk_spawn.
Speedup
How many times faster a program is when run in parallel than when run on one processor. Speedup can be computed by dividing the running time of the program on one processor by its running time on P processors.
Strand
A serial chain of executed instructions without any parallel control (such as a spawn, sync, or return from a spawn).
Sync
To wait for a set of spawned functions to return before proceeding. The current function is dependent upon the spawned functions and cannot proceed in parallel with them. See also cilk_sync.
Task-parallel
Task-parallel platforms provide a layer of software on top of threads to coordinate, schedule, and manage the processors of a multicore. Some task-parallel platforms are built as runtime libraries, but others provide full-fledged parallel languages with compiler and runtime support.
Task-parallel programming allows parallelism to be specified in a "processor-oblivious" fashion, where the programmer identifies what computational tasks may run in parallel but does not indicate which thread or processor performs the task. Thus, the programmer is freed from worrying about communication protocols, load balancing, and other vagaries of thread programming. The task-parallel platform contains a scheduler, which automatically load-balances the tasks across the processors, thereby greatly simplifying the programmer’s chore. Task-parallel algorithms provide a natural extension to ordinary serial algorithms, allowing performance to be reasoned about mathematically using "work/span analysis."
Thread
A thread executes a serial instruction chain. Scheduling of threads is typically managed by the operating system.
Throughput
The number of operations performed per unit time.
View
The state of a hyperobject as seen by a given strand.
Work
The running time of a program when run on one processor, sometimes denoted by T₁.
Work stealing
A scheduling strategy where processors post parallel work locally and, when a processor runs out of local work, it steals work from another processor. Work-stealing schedulers are notable for their efficiency, because they incur no communication or synchronization overhead when there is ample parallelism. The OpenCilk runtime system employs a work-stealing scheduler.
Worker
A thread that, together with other workers, implements the OpenCilk runtime system's work-stealing scheduler.