Release 3.6.0 (2023/05/07)
Taskflow 3.6.0 is the 7th release in the 3.x line! This release includes several new changes, such as dynamic task graph parallelism, improved parallel algorithms, and a modified GPU tasking interface, as well as documentation, examples, and unit tests.
Download
Taskflow 3.6.0 can be downloaded from here.
System Requirements
To use Taskflow v3.6.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20
Taskflow works on Linux, Windows, and Mac OS X.
Release Summary
This release contains several changes that greatly enhance the programmability of GPU tasking and standard parallel algorithms. More importantly, we have introduced a new dependent asynchronous tasking model that offers great flexibility for expressing dynamic task graph parallelism.
New Features
Taskflow Core
- Added new async methods to support dynamic task graph creation
- Added new async and join methods to tf::Runtime 
- Added a new partitioner interface to optimize parallel algorithms
- Added parallel-scan algorithms to Taskflow
  - tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop)
  - tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop, T init)
  - tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop)
  - tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop, T init)
  - tf::Taskflow::exclusive_scan(B first, E last, D d_first, T init, BOP bop)
  - tf::Taskflow::transform_exclusive_scan(B first, E last, D d_first, T init, BOP bop, UOP uop)
- Added parallel-find algorithms to Taskflow
- Changed tf::Subflow to derive from tf::Runtime
- Extended parallel algorithms to support different partitioning algorithms
  - tf::Taskflow::for_each_index(B first, E last, S step, C callable, P&& part)
  - tf::Taskflow::for_each(B first, E last, C callable, P&& part)
  - tf::Taskflow::transform(B first1, E last1, O d_first, C c, P&& part)
  - tf::Taskflow::transform(B1 first1, E1 last1, B2 first2, O d_first, C c, P&& part)
  - tf::Taskflow::reduce(B first, E last, T& result, O bop, P&& part)
  - tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P&& part)
- Improved the performance of tf::Taskflow::sort for plain-old-data (POD) types
- Extended task-parallel pipeline to handle token dependencies
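The parallel-scan overloads listed above take their arguments in the same order as the sequential scans in the C++17 `<numeric>` header. As a rough reference for what each family computes, here is a minimal sketch that uses only the standard library. It assumes the Taskflow versions produce the same results as their `std` counterparts, which the matching signatures suggest but these notes do not state. The helper names (`inclusive_sum`, `exclusive_sum`, `inclusive_sum_of_squares`, `check_scan_sketch`) are made up for illustration.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] combines in[0..i], including in[i] itself.
inline std::vector<int> inclusive_sum(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<int>{});
  return out;
}

// Exclusive scan: out[i] combines init with in[0..i-1], excluding in[i].
inline std::vector<int> exclusive_sum(const std::vector<int>& in, int init) {
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init,
                      std::plus<int>{});
  return out;
}

// Transform-inclusive scan: the unary op (here, squaring) is applied to
// each element before the binary op accumulates it.
inline std::vector<int> inclusive_sum_of_squares(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  std::transform_inclusive_scan(in.begin(), in.end(), out.begin(),
                                std::plus<int>{},
                                [](int x) { return x * x; });
  return out;
}

// Small self-check of the three helpers on concrete inputs.
inline void check_scan_sketch() {
  assert((inclusive_sum({1, 2, 3, 4}) == std::vector<int>{1, 3, 6, 10}));
  assert((exclusive_sum({1, 2, 3, 4}, 0) == std::vector<int>{0, 1, 3, 6}));
  assert((inclusive_sum_of_squares({1, 2, 3}) == std::vector<int>{1, 5, 14}));
}
```

The sequential versions are handy as a correctness oracle when testing a parallel scan on the same input.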
cudaFlow
- Removed algorithms that require a buffer from tf::cudaFlow due to update limitations
- Removed support for a dedicated cudaFlow task in Taskflow
  - all usage of tf::cudaFlow and tf::cudaFlowCapturer is standalone now
Utilities
- Added all_same templates to check if a parameter pack has the same type
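To illustrate what a trait of this kind does, the following is a minimal sketch written against the standard library only. It is not Taskflow's definition of `all_same`. The names `all_same_sketch`, `all_same_sketch_v`, and `sum_same` are hypothetical and exist only for this example.

```cpp
#include <type_traits>

// Hypothetical stand-in (not Taskflow's own definition): true when every
// type in Ts is exactly the same type as T. An empty Ts yields true.
template <typename T, typename... Ts>
struct all_same_sketch : std::conjunction<std::is_same<T, Ts>...> {};

template <typename T, typename... Ts>
inline constexpr bool all_same_sketch_v = all_same_sketch<T, Ts...>::value;

// Example use: restrict a variadic function to homogeneous arguments so
// that no implicit conversions take place inside the fold expression.
template <typename T, typename... Ts>
constexpr T sum_same(T first, Ts... rest) {
  static_assert(all_same_sketch_v<T, Ts...>,
                "all arguments must share one type");
  return (first + ... + rest);
}

static_assert(all_same_sketch_v<int, int, int>);
static_assert(!all_same_sketch_v<int, long>);
```

A check like this is typically used to reject mixed-type parameter packs at compile time rather than letting them convert silently.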
Taskflow Profiler (TFProf)
- Removed cudaFlow and syclFlow tasks
Bug Fixes
- Fixed the compilation error caused by MAX_PRIORITY clashing with winspool.h (#459)
- Fixed the compilation error caused by tf::TaskView::for_each_successor and tf::TaskView::for_each_dependent
- Fixed the infinite-loop bug when corunning a module task from tf::Runtime
If you encounter any potential bugs, please submit an issue at the issue tracker.
Breaking Changes
- Dropped support for cancelling asynchronous tasks
    // previous - no longer supported
    tf::Future<int> fu = executor.async([](){ return 1; });
    fu.cancel();
    std::optional<int> res = fu.get();  // res may be std::nullopt or 1

    // now - use std::future instead
    std::future<int> fu = executor.async([](){ return 1; });
    int res = fu.get();
- Dropped in-place support for running tf::cudaFlow from a dedicated task 
    // previous - no longer supported
    taskflow.emplace([](tf::cudaFlow& cf){
      cf.offload();
    });

    // now - the user fully controls tf::cudaFlow for maximum flexibility
    taskflow.emplace([](){
      tf::cudaFlow cf;

      // offload the cudaflow asynchronously through a stream
      tf::cudaStream stream;
      cf.run(stream);

      // wait for the cudaflow to complete
      stream.synchronize();
    });
- Dropped in-place support for running tf::cudaFlowCapturer from a dedicated task 
    // previous - no longer supported
    taskflow.emplace([](tf::cudaFlowCapturer& cf){
      cf.offload();
    });

    // now - the user fully controls tf::cudaFlowCapturer for maximum flexibility
    taskflow.emplace([](){
      tf::cudaFlowCapturer cf;

      // offload the cudaflow asynchronously through a stream
      tf::cudaStream stream;
      cf.run(stream);

      // wait for the cudaflow to complete
      stream.synchronize();
    });
- Dropped in-place support for running tf::syclFlow from a dedicated task
  - SYCL can just be used out of the box together with Taskflow
- Moved all buffer query methods of CUDA standard algorithms inside the execution policy

    // previous - no longer supported
    tf::cuda_reduce_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);

    // now (and similarly for other parallel algorithms)
    tf::cudaDefaultExecutionPolicy policy(stream);
    policy.reduce_bufsz<int>(N);
- Renamed tf::Executor::run_and_wait to tf::Executor::corun for expressiveness
- Renamed tf::Executor::loop_until to tf::Executor::corun_until for expressiveness
- Renamed tf::Runtime::run_and_wait to tf::Runtime::corun for expressiveness
- Disabled argument support for all asynchronous tasking features
  - users are responsible for creating their own wrapper to make the callable

    // previous - async allows passing arguments to the callable
    executor.async([](int i){ std::cout << i << std::endl; }, 4);

    // now - users are responsible for wrapping the arguments into a callable
    executor.async([i=4](){ std::cout << i << std::endl; });
- Replaced named_async with an overload that takes the name string as the first argument

    // previous - explicitly calling named_async to assign a name to an async task
    executor.named_async("name", [](){});

    // now - overload
    executor.async("name", [](){});
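For the change above that disables argument support in asynchronous tasking, code that passed arguments in many places may prefer one generic wrapper over hand-written lambda captures. The following is a minimal sketch using only the standard library. `wrap_args` is a hypothetical helper, not part of Taskflow, and `call_nullary` is a stand-in for any API that accepts only callables taking no arguments. With Taskflow, the callable returned by `wrap_args` would presumably be passed to `executor.async` instead.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper (not part of Taskflow): packs a callable and its
// arguments into a nullary callable by capturing decayed copies of both.
template <typename F, typename... Args>
auto wrap_args(F&& f, Args&&... args) {
  return [fn = std::forward<F>(f),
          bound = std::make_tuple(std::forward<Args>(args)...)]() mutable {
    // Unpack the stored tuple and invoke the stored callable with it.
    return std::apply(fn, bound);
  };
}

// Stand-in for an interface that only accepts nullary callables.
template <typename C>
auto call_nullary(C&& c) {
  static_assert(std::is_invocable_v<C&>, "callable must take no arguments");
  return c();
}

// Usage: bind 4 and 5 up front, then submit the argument-free task.
inline int wrapped_sum() {
  auto task = wrap_args([](int a, int b) { return a + b; }, 4, 5);
  return call_nullary(task);
}

// Small self-check of the usage example.
inline void check_wrap_sketch() {
  assert(wrapped_sum() == 9);
}
```

Because the arguments are copied into the closure, they stay valid even if the task runs after the caller's scope has ended. Wrap an argument in `std::ref` only when the referenced object is guaranteed to outlive the task.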
Documentation
- Revised Request Cancellation to remove support of cancelling async tasks
- Revised Asynchronous Tasking to include asynchronous tasking from tf::Runtime 
- Revised Taskflow algorithms to include execution policy
- Revised CUDA standard algorithms to correct the use of buffer query methods
- Added Task-parallel Pipeline with Token Dependencies
- Added Parallel Scan
- Added Asynchronous Tasking with Dependencies
Miscellaneous Items
We have published Taskflow in the following venues:
- Dian-Lun Lin, Yanqing Zhang, Haoxing Ren, Shih-Hsin Wang, Brucek Khailany and Tsung-Wei Huang, "GenFuzz: GPU-accelerated Hardware Fuzzing using Genetic Algorithm with Multiple Inputs," ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, 2023
- Tsung-Wei Huang, "qTask: Task-parallel Quantum Circuit Simulation with Incrementality," IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, Florida, 2023
- Elmir Dzaka, Dian-Lun Lin, and Tsung-Wei Huang, "Parallel And-Inverter Graph Simulation Using a Task-graph Computing System," IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), St. Petersburg, Florida, 2023
Please do not hesitate to contact Dr. Tsung-Wei Huang if you intend to collaborate with us on using Taskflow in your scientific computing projects.