Project author: taskflow

Project description:
A General-purpose Parallel and Heterogeneous Task Programming System
Primary language: C++
Project URL: git://github.com/taskflow/taskflow.git
Created: 2018-04-18T13:45:30Z
Project community: https://github.com/taskflow/taskflow

License: Other



Taskflow


Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++

Why Taskflow?

Taskflow is faster, more expressive, and easier to integrate as a drop-in
than many existing task programming frameworks
when handling complex parallel workloads.

Taskflow lets you quickly implement task decomposition strategies
that incorporate both regular and irregular compute patterns,
together with an efficient work-stealing scheduler to optimize your multithreaded performance.

Static Tasking | Subflow Tasking

Taskflow supports conditional tasking for you to make rapid control-flow decisions
across dependent tasks to implement cycles and conditions that were otherwise difficult to do
with existing tools.

Conditional Tasking

Taskflow is composable. You can create large parallel graphs through
composition of modular and reusable blocks that are easier to optimize
at an individual scope.

Taskflow Composition

Taskflow supports heterogeneous tasking for you to
accelerate a wide range of scientific computing applications
by harnessing the power of CPU-GPU collaborative computing.

Concurrent CPU-GPU Tasking

Taskflow provides visualization and tooling needed for profiling Taskflow programs.

Taskflow Profiler

We are committed to supporting trustworthy developments for both academic and industrial research projects
in parallel computing. Check out Who is Using Taskflow and what our users say:

See a quick presentation and
visit the documentation to learn more about Taskflow.
For technical details, refer to our IEEE TPDS paper.

Start Your First Taskflow Program

The following program (simple.cpp) creates four tasks
A, B, C, and D, where A runs before B and C, and D
runs after B and C.
When A finishes, B and C can run in parallel.
Try it live on Compiler Explorer (godbolt)!

```cpp
#include <taskflow/taskflow.hpp>  // Taskflow is header-only
#include <iostream>

int main(){
  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(  // create four tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; }
  );

  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after B and C

  executor.run(taskflow).wait();

  return 0;
}
```
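To make the meaning of `precede` and `succeed` concrete, here is a minimal sketch in plain C++17 (no Taskflow) of dependency counting: every task tracks how many of its predecessors are still unfinished and becomes ready when that count reaches zero. The function name `dependency_order` is hypothetical, and the sketch runs one task at a time, whereas Taskflow's work-stealing executor would run B and C concurrently.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Each edge (u, v) means "u must finish before v starts".
// Returns one valid execution order (Kahn's algorithm). This is an
// illustration only, not Taskflow's scheduler.
std::vector<std::string> dependency_order(
    const std::vector<std::string>& tasks,
    const std::vector<std::pair<std::string, std::string>>& edges) {
  std::map<std::string, int> pending;  // unfinished predecessors per task
  std::map<std::string, std::vector<std::string>> successors;
  for (const auto& t : tasks) pending[t] = 0;
  for (const auto& [from, to] : edges) {
    successors[from].push_back(to);
    ++pending[to];
  }

  std::queue<std::string> ready;  // tasks whose predecessors have all finished
  for (const auto& t : tasks)
    if (pending[t] == 0) ready.push(t);

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string t = ready.front();
    ready.pop();
    order.push_back(t);  // "run" the task
    for (const auto& s : successors[t])
      if (--pending[s] == 0) ready.push(s);  // its last predecessor just finished
  }
  return order;  // shorter than tasks.size() if the graph contains a cycle
}
```

For the graph above, `dependency_order({"A", "B", "C", "D"}, {{"A", "B"}, {"A", "C"}, {"B", "D"}, {"C", "D"}})` returns A, B, C, D. Any order with A first and D last is equally valid, which is why a real run may print TaskC before TaskB.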

Taskflow is header-only, so there is no installation to wrangle with.
To compile the program, clone the Taskflow project and
tell the compiler where to find the headers.

```shell
~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ cd taskflow
~/taskflow$ g++ -std=c++20 examples/simple.cpp -I. -O2 -pthread -o simple
~/taskflow$ ./simple
TaskA
TaskC
TaskB
TaskD
```

Visualize Your First Taskflow Program

Taskflow comes with a built-in profiler,
TFProf,
for you to profile and visualize taskflow programs
in an easy-to-use web-based interface.

```shell
# run the program with the environment variable TF_ENABLE_PROFILER enabled
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/
```

In addition to the execution diagram, you can dump the task graph in DOT format
and visualize it with any of a number of free GraphViz tools.

```cpp
// dump the taskflow graph in DOT format to std::cout
taskflow.dump(std::cout);
```
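A DOT file is essentially a list of directed edges inside a `digraph` block. As a rough illustration of the format only (the exact text `taskflow.dump` emits may differ, for example in node naming and attributes), the hypothetical helper `to_dot` below writes an edge list as a GraphViz digraph.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: write a task graph as GraphViz DOT text.
void write_dot(std::ostream& os, const std::string& graph_name,
               const std::vector<std::pair<std::string, std::string>>& edges) {
  os << "digraph " << graph_name << " {\n";
  for (const auto& [from, to] : edges)
    os << "  " << from << " -> " << to << ";\n";  // one directed edge per dependency
  os << "}\n";
}

// Convenience wrapper that returns the DOT text as a string.
std::string to_dot(const std::string& graph_name,
                   const std::vector<std::pair<std::string, std::string>>& edges) {
  std::ostringstream ss;
  write_dot(ss, graph_name, edges);
  return ss.str();
}
```

Saving the output to `graph.dot` and running `dot -Tpng graph.dot -o graph.png` renders it with GraphViz.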

Express Task Graph Parallelism

Taskflow empowers users with both static and dynamic task graph constructions
to express end-to-end parallelism in a task graph that
embeds in-graph control flow.

  1. Create a Subflow Graph
  2. Integrate Control Flow to a Task Graph
  3. Offload a Task to a GPU
  4. Compose Task Graphs
  5. Launch Asynchronous Tasks
  6. Execute a Taskflow
  7. Leverage Standard Parallel Algorithms

Create a Subflow Graph

Taskflow supports dynamic tasking for you to create a subflow
graph from the execution of a task to perform dynamic parallelism.
The following program spawns a task dependency graph parented at task B.

```cpp
tf::Task A = taskflow.emplace([](){}).name("A");
tf::Task C = taskflow.emplace([](){}).name("C");
tf::Task D = taskflow.emplace([](){}).name("D");

tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B3.succeed(B1, B2);  // B3 runs after B1 and B2
}).name("B");

A.precede(B, C);  // A runs before B and C
D.succeed(B, C);  // D runs after B and C
```
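The run-time behavior of this graph can be mimicked with `std::async`, assuming the subflow is joined, meaning B is not considered complete until B1, B2, and B3 have finished (Taskflow's default for subflows). This is a sketch only: `EventLog`, `task_B`, `index_of`, and `run_subflow_example` are hypothetical names, and `std::async` threads stand in for Taskflow's worker pool.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Thread-safe event log used to observe execution order.
struct EventLog {
  std::mutex m;
  std::vector<std::string> entries;
  void add(const std::string& s) {
    std::lock_guard<std::mutex> lock(m);
    entries.push_back(s);
  }
};

// Position of an event in the log (v.size() if absent).
std::size_t index_of(const std::vector<std::string>& v, const std::string& s) {
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] == s) return i;
  return v.size();
}

// Body of task B: spawns B1 and B2 at run time, runs B3 after both,
// and returns only afterwards, so the subflow is joined before B completes.
void task_B(EventLog& log) {
  auto b1 = std::async(std::launch::async, [&log] { log.add("B1"); });
  auto b2 = std::async(std::launch::async, [&log] { log.add("B2"); });
  b1.get();  // B3.succeed(B1, B2): wait for both predecessors
  b2.get();
  log.add("B3");
  log.add("B done");
}

// A -> {B, C} -> D, with B (and its subflow) running concurrently with C.
std::vector<std::string> run_subflow_example() {
  EventLog log;
  log.add("A");
  auto b = std::async(std::launch::async, [&log] { task_B(log); });
  auto c = std::async(std::launch::async, [&log] { log.add("C"); });
  b.get();
  c.get();
  log.add("D");
  return log.entries;
}
```

C may appear anywhere between A and D in the log, but B3 always follows B1 and B2, and D always comes last.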

Integrate Control Flow to a Task Graph

Taskflow supports conditional tasking for you to make rapid
control-flow decisions across dependent tasks to implement cycles
and conditions in an end-to-end task graph.

```cpp
tf::Task init = taskflow.emplace([](){}).name("init");
tf::Task stop = taskflow.emplace([](){}).name("stop");

// creates a condition task that returns a random binary value (0 or 1)
tf::Task cond = taskflow.emplace(
  [](){ return std::rand() % 2; }
).name("cond");

init.precede(cond);

// creates a feedback loop {0: cond, 1: stop}
cond.precede(cond, stop);
```
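The loop built by `cond.precede(cond, stop)` can be read as ordinary control flow: the integer a condition task returns selects which of its successors runs next, by the order they were passed to `precede`. The hypothetical `run_condition_loop` below simulates that sequentially and counts how many times the condition task runs; it is an illustration, not Taskflow code.

```cpp
#include <cstddef>
#include <functional>

// Sequential sketch of the control flow built by cond.precede(cond, stop):
// the value returned by the condition task is the index of the successor
// to run next (0 -> cond again, 1 -> stop).
// Returns how many times the condition task ran.
std::size_t run_condition_loop(const std::function<int()>& cond) {
  std::size_t runs = 0;
  while (true) {
    ++runs;
    int next = cond();        // run the condition task body
    if (next == 0) continue;  // successor 0: cond itself (the feedback loop)
    break;                    // successor 1: stop (any other value also ends this sketch)
  }
  return runs;
}
```

With `std::rand() % 2` as the condition, the number of iterations is random; the loop ends the first time the task returns 1.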

Offload a Task to a GPU

Taskflow supports GPU tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing using CUDA.

```cpp
// saxpy kernel: dy = alpha * dx + dy
__global__ void saxpy(size_t N, float alpha, float* dx, float* dy) {
  size_t i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < N) {
    dy[i] = alpha*dx[i] + dy[i];
  }
}

// hx, hy: host vectors of N floats; dx, dy: device buffers of N floats
tf::Task cudaflow = taskflow.emplace([&](tf::cudaFlow& cf) {
  // data copy tasks
  tf::cudaTask h2d_x = cf.copy(dx, hx.data(), N).name("h2d_x");
  tf::cudaTask h2d_y = cf.copy(dy, hy.data(), N).name("h2d_y");
  tf::cudaTask d2h_x = cf.copy(hx.data(), dx, N).name("d2h_x");
  tf::cudaTask d2h_y = cf.copy(hy.data(), dy, N).name("d2h_y");
  // kernel task with parameters to launch the saxpy kernel
  // (named "kernel" so it does not shadow the saxpy function)
  tf::cudaTask kernel = cf.kernel(
    (N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy
  ).name("saxpy");
  kernel.succeed(h2d_x, h2d_y)
        .precede(d2h_x, d2h_y);
}).name("cudaFlow");
```
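The launch configuration `(N+255)/256, 256` rounds the grid up so that at least N threads exist, and the `i < N` check in the kernel masks the extra threads in the last block. The hypothetical CPU function `saxpy_reference` below reproduces that indexing with plain loops; a sketch like this can serve as a reference result when checking the values copied back into `hy`.

```cpp
#include <cstddef>
#include <vector>

// CPU emulation of the saxpy launch: a grid of ceil(N / 256) blocks with
// 256 threads each. Every "thread" computes one global index, and the
// bounds check discards the surplus threads in the last block.
void saxpy_reference(std::size_t N, float alpha,
                     const std::vector<float>& x, std::vector<float>& y) {
  const std::size_t block_dim = 256;
  const std::size_t grid_dim = (N + block_dim - 1) / block_dim;  // (N+255)/256
  for (std::size_t block = 0; block < grid_dim; ++block) {
    for (std::size_t thread = 0; thread < block_dim; ++thread) {
      std::size_t i = block * block_dim + thread;  // blockIdx.x*blockDim.x + threadIdx.x
      if (i < N) {
        y[i] = alpha * x[i] + y[i];
      }
    }
  }
}
```

For N = 300 the grid has 2 blocks (512 threads), and the last 212 threads do nothing.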

Compose Task Graphs

Taskflow is composable.
You can create large parallel graphs through composition of modular
and reusable blocks that are easier to optimize at an individual scope.

```cpp
tf::Taskflow f1, f2;

// create taskflow f1 of two tasks
tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; })
                 .name("f1A");
tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; })
                 .name("f1B");

// create taskflow f2 with one module task composed of f1
tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; })
                 .name("f2A");
tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; })
                 .name("f2B");
tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; })
                 .name("f2C");

tf::Task f1_module_task = f2.composed_of(f1)
                            .name("module");

f1_module_task.succeed(f2A, f2B)
              .precede(f2C);
```
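To illustrate what a module task means, here is a sequential sketch in plain C++17 (no Taskflow) in which one node's body runs an entire other graph. `Graph`, `run_graph`, and `run_composed` are hypothetical names. Tasks in each list simply run in list order, which is one valid order for f2 (f2A and f2B, then the module, then f2C) but ignores the parallelism available between f2A and f2B or between f1A and f1B.

```cpp
#include <functional>
#include <string>
#include <vector>

// A "graph" here is just a list of callables run in a dependency-respecting order.
using Graph = std::vector<std::function<void()>>;

void run_graph(const Graph& g) {
  for (const auto& task : g) task();
}

// f2: {f2A, f2B} -> module(f1) -> f2C
std::vector<std::string> run_composed() {
  std::vector<std::string> log;
  Graph f1{[&] { log.push_back("f1A"); }, [&] { log.push_back("f1B"); }};
  Graph f2{
      [&] { log.push_back("f2A"); },
      [&] { log.push_back("f2B"); },
      [&] { run_graph(f1); },  // module task: refers to f1, so f1 must outlive this run
      [&] { log.push_back("f2C"); },
  };
  run_graph(f2);
  return log;
}
```

Because the module refers to f1 instead of copying it, f1 can be reused as a building block in several larger graphs.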

Launch Asynchronous Tasks

Taskflow supports asynchronous tasking.
You can launch tasks asynchronously to dynamically explore task graph parallelism.

```cpp
tf::Executor executor;

// create asynchronous tasks directly from an executor
std::future<int> future = executor.async([](){
  std::cout << "async task returns 1\n";
  return 1;
});
executor.silent_async([](){ std::cout << "async task does not return\n"; });

// create asynchronous tasks with dynamic dependencies
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);

executor.wait_for_all();
```
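For comparison, a rough standard-library analogue of the same pattern uses `std::async` with `std::shared_future` as dependency handles. This is a sketch only: `AsyncDemoResult` and `run_async_demo` are hypothetical names, each dependent task here blocks a thread while waiting on its predecessors, and `std::async(std::launch::async, ...)` may start a new thread per call, whereas Taskflow schedules dependent tasks on its own worker pool once their predecessors finish.

```cpp
#include <future>
#include <mutex>
#include <string>
#include <vector>

struct AsyncDemoResult {
  int value;                     // value returned by the first async task
  std::vector<std::string> log;  // order in which A, B, C, D ran
};

AsyncDemoResult run_async_demo() {
  std::mutex m;
  std::vector<std::string> log;
  auto record = [&](const std::string& s) {
    std::lock_guard<std::mutex> lock(m);
    log.push_back(s);
  };

  // like executor.async: run a callable and obtain its result through a future
  std::future<int> future = std::async(std::launch::async, [] { return 1; });

  // A -> {B, C} -> D: each task first waits on its predecessors
  std::shared_future<void> A =
      std::async(std::launch::async, [&] { record("A"); }).share();
  std::shared_future<void> B =
      std::async(std::launch::async, [&, A] { A.wait(); record("B"); }).share();
  std::shared_future<void> C =
      std::async(std::launch::async, [&, A] { A.wait(); record("C"); }).share();
  std::shared_future<void> D =
      std::async(std::launch::async, [&, B, C] { B.wait(); C.wait(); record("D"); }).share();

  D.wait();  // D finishes last, so every task above has completed
  return {future.get(), log};
}
```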

Execute a Taskflow

The executor provides several thread-safe methods to run a taskflow.
You can run a taskflow once, multiple times, or until a stopping criterion is met.
These methods are non-blocking and return a tf::Future<void>
that lets you query the execution status.

```cpp
// run the taskflow once
tf::Future<void> run_once = executor.run(taskflow);

// wait on this run to finish
run_once.get();

// run the taskflow four times
executor.run_n(taskflow, 4);

// run the taskflow repeatedly until the predicate returns true
// (mutable lets the lambda decrement its captured counter)
executor.run_until(taskflow, [counter=5]() mutable { return --counter == 0; });

// block until all submitted taskflows complete
executor.wait_for_all();
```
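A stopping predicate such as `[counter=5](){ return --counter == 0; }` modifies its own captured copy of `counter`. C++ only allows that in a `mutable` lambda, because by-copy captures are const inside the lambda's call operator by default. The hypothetical factory `make_countdown` below shows the pattern in isolation: each call decrements the counter, and the predicate first returns true on the fifth call.

```cpp
// Returns a stateful predicate that becomes true on its n-th call.
// `mutable` is required because `counter` is captured by copy.
inline auto make_countdown(int n) {
  return [counter = n]() mutable { return --counter == 0; };
}
```

Each copy of the lambda carries its own counter, so the state that matters is that of the copy the caller actually invokes. How many runs a given predicate yields depends on when `run_until` evaluates it; check the Executor documentation before relying on an exact count.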

Leverage Standard Parallel Algorithms

Taskflow defines algorithms for you to quickly express common parallel
patterns using standard C++ syntax,
such as parallel iteration, parallel reduction, and parallel sort.

```cpp
// standard parallel CPU algorithms
tf::Task task1 = taskflow.for_each(  // assign each element to 100 in parallel
  first, last, [] (auto& i) { i = 100; }
);
tf::Task task2 = taskflow.reduce(  // reduce a range of items in parallel
  first, last, init, [] (auto a, auto b) { return a + b; }
);
tf::Task task3 = taskflow.sort(  // sort a range of items in parallel
  first, last, [] (auto a, auto b) { return a < b; }
);

// standard parallel GPU algorithms
tf::cudaTask cuda1 = cudaflow.for_each(  // assign each element to 100 on GPU
  dfirst, dlast, [] __device__ (auto& i) { i = 100; }
);
tf::cudaTask cuda2 = cudaflow.reduce(  // reduce a range of items on GPU
  dfirst, dlast, init, [] __device__ (auto a, auto b) { return a + b; }
);
tf::cudaTask cuda3 = cudaflow.sort(  // sort a range of items on GPU
  dfirst, dlast, [] __device__ (auto a, auto b) { return a < b; }
);
```
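To show the idea behind a parallel reduction, here is a sketch in standard C++17 (not Taskflow's implementation): split the range into chunks, reduce each chunk on its own thread, then fold the partial results into `init`. The name `chunked_reduce` and its `num_chunks` parameter are hypothetical. Because the partial results are combined in a different grouping than a left-to-right loop, the binary operator must be associative for the result to match a sequential reduction.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <iterator>
#include <numeric>
#include <vector>

template <typename It, typename T, typename BinOp>
T chunked_reduce(It first, It last, T init, BinOp bop, std::size_t num_chunks = 4) {
  using Diff = typename std::iterator_traits<It>::difference_type;
  const auto n = static_cast<std::size_t>(std::distance(first, last));
  if (n == 0 || num_chunks == 0) return init;
  const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceil(n / num_chunks)

  // reduce each non-empty chunk concurrently, seeding it with its first element
  std::vector<std::future<T>> partials;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    It b = std::next(first, static_cast<Diff>(begin));
    It e = std::next(first, static_cast<Diff>(std::min(begin + chunk, n)));
    partials.push_back(std::async(std::launch::async, [b, e, bop] {
      return std::accumulate(std::next(b), e, T(*b), bop);
    }));
  }

  // fold the partial results into init, so init is applied exactly once
  T result = init;
  for (auto& p : partials) result = bop(result, p.get());
  return result;
}
```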

Additionally, Taskflow provides composable graph building blocks for you to
efficiently implement common parallel algorithms, such as parallel pipeline.

```cpp
// create a pipeline to propagate five tokens through three serial stages
// (num_parallel_lines and an int buffer with one slot per line are defined elsewhere)
tf::Pipeline pl(num_parallel_lines,
  tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
    if(pf.token() == 5) {
      pf.stop();
    }
    else {
      buffer[pf.line()] = pf.token();  // hand this token to the next stages
    }
  }},
  tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
    printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }},
  tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
    printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }}
);

taskflow.composed_of(pl);
executor.run(taskflow).wait();
```
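To follow how tokens move through the stages, here is a sequential simulation in plain C++17. It is an illustration only: the real tf::Pipeline runs stages concurrently across lines, and the sketch assumes that token t is placed on line `t % num_lines` and that the first stage writes each token into `buffer` before stopping at token 5. `simulate_pipeline` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the messages stages 2 and 3 would print, in token order.
std::vector<std::string> simulate_pipeline(std::size_t num_lines) {
  std::vector<std::string> trace;
  if (num_lines == 0) return trace;  // a pipeline needs at least one line
  std::vector<int> buffer(num_lines);
  for (std::size_t token = 0;; ++token) {
    std::size_t line = token % num_lines;  // assumed round-robin line assignment
    if (token == 5) break;                 // stage 1: pf.stop()
    buffer[line] = static_cast<int>(token);  // stage 1 writes its output
    for (int stage = 2; stage <= 3; ++stage)  // stages 2 and 3 read it
      trace.push_back("stage " + std::to_string(stage) + ": input buffer[" +
                      std::to_string(line) + "] = " + std::to_string(buffer[line]));
  }
  return trace;
}
```

With two lines, tokens 0 through 4 alternate between buffer slots 0 and 1, giving ten messages in total.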

Supported Compilers

To use Taskflow, you only need a compiler that supports C++17:

  • GNU C++ Compiler at least v8.4 with -std=c++17
  • Clang C++ Compiler at least v6.0 with -std=c++17
  • Microsoft Visual Studio at least v19.27 with /std:c++17
  • AppleClang Xcode Version at least v12.0 with -std=c++17
  • Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
  • Intel C++ Compiler at least v19.0.1 with -std=c++17
  • Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and Mac OS X.

Although Taskflow primarily supports C++17, you can enable C++20 compilation
through -std=c++20 to achieve better performance from new C++20 features.

Learn More about Taskflow

Visit our project website and documentation
to learn more about Taskflow. To get involved:

CppCon20 Tech Talk | MUC++ Tech Talk

We are committed to supporting trustworthy developments for
both academic and industrial research projects in parallel
and heterogeneous computing.
If you are using Taskflow, please cite the following paper we published in IEEE TPDS (2021):

More importantly, we appreciate all Taskflow contributors and
the following organizations for sponsoring the Taskflow project!

License

Taskflow is licensed under the MIT License.
You are completely free to redistribute your work derived from Taskflow.