C++ Concurrency in Action - Chapters 10 and 11 - Study Notes 8
[TOC]
Study notes 8 for《C++ Concurrency in Action》, covering chapters 10 and 11: Parallel algorithms, and Testing and debugging multithreaded applications.
Contents:
Chapter 10
- Using the C++17 parallel algorithms
Chapter 11
- Concurrency-related bugs
- Locating bugs through testing and code review
- Designing multithreaded tests
- Testing
Chapter 10 Parallel algorithms
10.1 Parallelizing the standard library algorithms
An execution policy is a permission, not a requirement. Compared with the corresponding single-threaded algorithms:
- the requirements on the algorithm complexity have changed, and are usually slacker than the requirements for the normal serial algorithm.
This is because parallel algorithms often do more total work in order to take advantage of the parallelism of the system.
10.2 Execution policies
Three execution policies (declared in the <execution> header):
- std::execution::sequenced_policy
- std::execution::parallel_policy
- std::execution::parallel_unsequenced_policy
and three corresponding policy objects to pass to the algorithms:
- std::execution::seq
- std::execution::par
- std::execution::par_unseq
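For example, a policy object is passed as the first argument of the algorithm call. A minimal sketch (my own example, not from the book):

```cpp
#include <algorithm>
#include <execution>
#include <vector>

void sort_in_parallel(std::vector<int>& v) {
    // Passing std::execution::par permits, but does not require, parallel execution
    std::sort(std::execution::par, v.begin(), v.end());
}
```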
Notes:
- You cannot define your own execution policies.
- You cannot rely on being able to construct objects from these policy classes yourself.
10.2.1 General effects of specifying an execution policy
affects several aspects of the behavior:
- The algorithm’s complexity
- The behavior when an exception is thrown
- Where, how, and when the steps of the algorithm are executed
EFFECTS ON ALGORITHM COMPLEXITY
Complexity increases in two ways:
- scheduling overhead of managing the parallel execution.
- perform more of the core operations of the algorithm (whether swaps, comparisons, or applications of a supplied function object).
In general, the extent of the relaxation: if an algorithm specifies something will happen exactly some-expression times, or at most some-expression times, then the overload with an execution policy will slacken that requirement to O(some-expression).
EXCEPTIONAL BEHAVIOR
All the standard-supplied execution policies will call std::terminate if there are any uncaught exceptions. The only exception that may still propagate is std::bad_alloc, which is thrown if the library cannot obtain the memory it needs for its internal operation. This is one of the key differences between using std::execution::seq and not providing an execution policy.
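A minimal illustration of that difference (my own example, not from the book): without a policy the exception can be caught by the caller; with any standard policy, an exception escaping an element operation means std::terminate.

```cpp
#include <algorithm>
#include <execution>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v(10);
    auto thrower = [](int&) { throw std::runtime_error("boom"); };

    try {
        std::for_each(v.begin(), v.end(), thrower);  // no policy: exception propagates
    } catch (const std::runtime_error&) { /* caught here */ }

    // With a policy, even std::execution::seq, the escaping exception
    // would cause std::terminate rather than reach a catch block:
    // std::for_each(std::execution::seq, v.begin(), v.end(), thrower);
    return 0;
}
```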
WHERE AND WHEN ALGORITHM STEPS ARE EXECUTED
The execution policy also specifies whether there are any ordering constraints on how the algorithm steps are run:
- whether or not they are run in any particular order,
- whether or not parts of separate algorithm steps may be interleaved with each other, or run in parallel with each other,
- and so forth.
10.2.2 std::execution::sequenced_policy
All operations must be performed on the same thread and must be performed in some definite order, so they are not interleaved. (That order is not guaranteed to be the same as the order of the corresponding overload without an execution policy; see the example below.)
For example, the following call to std::for_each will populate the vector with the numbers 1-1000, but in an unspecified order. This is in contrast to the overload without an execution policy, which stores the numbers in order:

```cpp
std::vector<int> v(1000);
int count = 0;
std::for_each(std::execution::seq, v.begin(), v.end(),
              [&](int& x) { x = ++count; });
```
With this policy, you cannot rely on the order of these operations.
10.2.3 std::execution::parallel_policy
Behavior:
- Operations may be performed either on the thread that invoked the algorithm, or on threads created by the library.
- Operations performed on a given thread must be performed in a definite order, and not interleaved, but the precise order is unspecified, and may vary between invocations.
- A given operation will run on a fixed thread for its entire duration.
Requirements:
they must not cause data races if invoked in parallel,
and must not rely on being run on the same thread as any other operation, or indeed rely on not being run on the same thread as any other operation.
Where it cannot be used: it's only where specific ordering between elements is required, or where there is unsynchronized access to shared data, that problems arise. For example, populating a vector as below is undefined behavior, because the variable count is modified from every invocation of the lambda, even if the library doesn't use multiple threads for this call:

```cpp
std::for_each(std::execution::par, v.begin(), v.end(),
              [&](int& x) { x = ++count; });
```
The fix: make count an std::atomic<int> rather than a plain int, or use a mutex. In this example, though, that would likely defeat the point of using the parallel execution policy, because it would serialize all the calls. A sketch of the atomic variant follows.
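A sketch of that atomic variant (my own illustration): the data race is gone, but every element operation now contends on count, so the work is largely serialized anyway.

```cpp
#include <algorithm>
#include <atomic>
#include <execution>
#include <vector>

void fill_with_sequence(std::vector<int>& v) {
    std::atomic<int> count(0);
    // Safe with std::execution::par, but the shared atomic serializes the increments,
    // which largely defeats the point of the parallel policy.
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [&](int& x) { x = ++count; });
}
```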
10.2.4 std::execution::parallel_unsequenced_policy
It imposes the strictest requirements:
- perform the algorithm steps on unspecified threads of execution, unordered and unsequenced with respect to one another.
- may be interleaved with each other on a single thread.
- may be migrated between threads.
Consequently, the element-access functions must only operate on the relevant element, or on data that can be accessed based on that element, and must not modify any state shared between threads or between elements. A minimal example that satisfies this is sketched below.
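A minimal example that satisfies these requirements (my own illustration): each invocation touches only its own element and no shared state.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

void square_all(std::vector<int>& v) {
    // No locks, no atomics, no shared state: safe even under par_unseq,
    // where operations may be interleaved or migrated between threads.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](int& x) { x *= x; });
}
```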
10.3 The parallel algorithms from the C++ Standard Library
std::accumulate has no parallel overload; it is strictly a serial accumulation. Its parallel counterpart in the library is std::reduce, sketched below.
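A minimal sketch of the std::reduce counterpart (my own example):

```cpp
#include <execution>
#include <numeric>
#include <vector>

long long sum(const std::vector<int>& v) {
    // std::reduce may regroup and reorder the additions, which is what allows
    // it to be parallelized, unlike the strictly serial std::accumulate.
    return std::reduce(std::execution::par, v.begin(), v.end(), 0LL);
}
```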
The parallel overloads may also tighten the iterator requirements: if the "normal" algorithm allows Input Iterators or Output Iterators, the overloads with an execution policy require Forward Iterators instead.
The parallel versions also rely on iterator validity: Forward Iterators can be freely copied around and used equivalently, and incrementing a Forward Iterator does not invalidate other copies, which is important here.
10.3.1 Examples of using parallel algorithms
CHOICE OF EXECUTION POLICY
std::execution::par is the policy you will use most often. With std::execution::par_unseq you must rely entirely on the algorithm's parallel-safe design: you cannot use mutexes or atomic variables, or any of the other mechanisms, to ensure that accesses from multiple threads are safe; instead, you must rely on the operations themselves being safe.
Listing 10.1 Parallel algorithms on a class with internal synchronization
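The code of the original listing did not survive extraction. A minimal sketch of the idea, under my assumptions about its shape (each object carries its own mutex, so the per-element operation is internally synchronized and std::execution::par is safe):

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class X {
    mutable std::mutex m;  // internal synchronization: each object protects its own data
    int data = 0;
public:
    int get_value() const {
        std::lock_guard<std::mutex> guard(m);
        return data;
    }
    void increment() {
        std::lock_guard<std::mutex> guard(m);
        ++data;
    }
};

void increment_all(std::vector<X>& v) {
    // Safe with par: every element access synchronizes inside X.
    // Not allowed with par_unseq, because the operation takes a lock.
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](X& x) { x.increment(); });
}
```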
Listing 10.2 Parallel algorithms on a class without internal synchronization
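Again the original code was lost; a sketch under my assumptions (the element type has no synchronization of its own, so one mutex is taken once around the whole algorithm invocation, which permits std::execution::par_unseq; the wrapper name ProtectedY is mine):

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class Y {  // no internal synchronization
    int data = 0;
public:
    int get_value() const { return data; }
    void increment() { ++data; }
};

class ProtectedY {  // one mutex guards the whole container
    std::mutex m;
    std::vector<Y> v;
public:
    void lock() { m.lock(); }
    void unlock() { m.unlock(); }
    std::vector<Y>& get_vec() { return v; }
};

void increment_all(ProtectedY& data) {
    std::lock_guard<ProtectedY> guard(data);  // lock once, outside the parallel algorithm
    auto& v = data.get_vec();
    // The element operation itself uses no locks or atomics, so par_unseq is allowed.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](Y& y) { y.increment(); });
}
```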
This moves the synchronization outside the per-element operations; the downside is that concurrent accesses from other threads, outside the parallel algorithm invocation, must now wait for the entire operation to complete.
10.3.2 Counting visits
Listing 10.3 Using transform_reduce to count visits to pages of a website
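The listing itself is missing here. A much simplified sketch of the idea (my own, far shorter than the book's listing, which builds a per-page map from raw log lines): transform each visit record into a 0/1 count and reduce the counts in parallel.

```cpp
#include <cstddef>
#include <execution>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// visits: one entry per hit, holding the page that was visited
std::size_t count_visits_to(const std::vector<std::string>& visits,
                            const std::string& page)
{
    return std::transform_reduce(
        std::execution::par, visits.begin(), visits.end(),
        std::size_t(0),   // initial value of the reduction
        std::plus<>(),    // combine the per-element counts
        [&page](const std::string& p) {  // transform: 1 if this entry is a visit to page
            return p == page ? std::size_t(1) : std::size_t(0);
        });
}
```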
Chapter 11 Testing and debugging multithreaded applications
- Concurrency-related bugs
- Locating bugs through testing and code review
- Designing multithreaded tests
- Testing
11.1 Types of concurrency-related bugs
- Unwanted blocking
- Race conditions
11.1.1 Unwanted blocking
several variations on this theme:
- Deadlock
- Livelock: The key difference with deadlock here is that the wait is not a blocking wait but an active checking loop, such as a spin lock. In serious cases, the symptoms are the same as deadlock. In not-so-serious cases, the livelock will eventually resolve because of the random scheduling.
- Blocking on I/O or other external input: even if the waited-for input is never going to come.
11.1.2 Race conditions
Not all race conditions are problematic.
race conditions often cause the following types of problems:
- Data races: unsynchronized concurrent access to a shared memory location.
- Broken invariants: dangling pointers (because another thread deleted the data being accessed), random memory corruption (due to a thread reading inconsistent values resulting from partial updates), and double-free (such as when two threads pop the same value from a queue, and so both delete some associated data), among others. can be temporal- as well as value-based.
- Lifetime issues: can be viewed as a variant of the above, where the thread outlives the data that it accesses.
11.2 Techniques for locating concurrency-related bugs
11.2.1 Reviewing code to locate potential bugs
It’s important that the reviewer have the time to do the review properly.
One useful technique is to try to explain how the code works, in detail, to someone else; even writing detailed notes can be hugely beneficial. By asking yourself the questions below and thinking carefully about the answers, the problem often reveals itself.
QUESTIONS TO THINK ABOUT WHEN REVIEWING MULTITHREADED CODE
- Which data needs to be protected from concurrent access?
- How do you ensure that the data is protected?
- Where in the code could other threads be at this time?
- Which mutexes does this thread hold?
- Which mutexes might other threads hold?
- Are there any ordering requirements between the operations done in this thread and those done in another? How are those requirements enforced?
- Is the data loaded by this thread still valid? Could it have been modified by other threads?
- If you assume that another thread could be modifying the data, what would that mean and how could you ensure that this never happens?
By assuming the existence of a bug related to a particular line of code, you can then act as a detective and track down the cause.
11.2.2 Locating concurrency-related bugs by testing
The difficulty: having a potential race condition doesn't mean the code will always fail, just that it might fail sometimes.
Strategy: structure the tests so that, if a test fails, you can best isolate the code that's faulty.
Sometimes the cause isn't concurrency at all: just because a bug occurs in the multithreaded portion of your application doesn't mean it's automatically concurrency-related. One strategy is to modify the code to use a single thread for the test. A special case: if the problem goes away on a single-core system (even with multiple threads running) but is present on multicore or multiprocessor systems, you have a race condition and possibly a synchronization or memory-ordering issue.
Test cases to consider when testing a concurrent queue:
- One thread calling push() or pop() on its own, to verify that the queue works at a basic level.
- One thread calling push() on an empty queue while another thread calls pop().
- Multiple threads calling push() on an empty queue.
- Multiple threads calling push() on a full queue.
- Multiple threads calling pop() on an empty queue.
- Multiple threads calling pop() on a full queue.
- Multiple threads calling pop() on a partially full queue with insufficient items for all threads.
- Multiple threads calling push() while one thread calls pop() on an empty queue.
- Multiple threads calling push() while one thread calls pop() on a full queue.
- Multiple threads calling push() while multiple threads call pop() on an empty queue.
- Multiple threads calling push() while multiple threads call pop() on a full queue.
additional factors about the test environment:
- What you mean by “multiple threads” in each case (3, 4, 1,024?).
- Whether there are enough processing cores in the system for each thread to run on its own core.
- Which processor architectures the tests should be run on.
- How you ensure suitable scheduling for the “while” parts of your tests.
11.2.3 Designing for testability
code is easier to test if the following factors apply:
- The responsibilities of each function and class are clear.
- The functions are short and to the point.
- Your tests can take complete control of the environment surrounding the code being tested.
- The code that performs the particular operation being tested is close together rather than spread throughout the system.
- You thought about how to test the code before you wrote it.
A special situation to watch out for: library calls can use internal variables to store state, which then becomes shared if multiple threads use the same set of library calls. This can be a problem because it's not immediately apparent that the code accesses shared data.
11.2.4 Multithreaded testing techniques
BRUTE-FORCE TESTING
running the code many times, possibly with many threads running at once.
For larger code: if the code being tested is considerably larger, the number of possible scheduling permutations is so vast that even a billion test runs might yield only a low level of confidence.
Drawback: brute-force testing might give you false confidence.
For example, if the way you've written the test means that the problematic circumstances can't occur, you can run the test as many times as you like and it won't fail. The classic example is testing a multithreaded application on a single-processor system.
Brute-force testing therefore places demands on test design:
- choice of unit for the code being tested
- the design of the test harness
- the choice of testing environment
COMBINATION SIMULATION TESTING
Run your code with a special piece of software that simulates the real runtime environment of the code, where the characteristics of the virtual machine and its hardware are emulated by the supervisor software. By exhaustively exploring the possible thread interleavings, this approach is guaranteed to find the problems. The drawbacks:
- the number of combinations increases exponentially with the number of threads and the number of operations performed by each thread.
- relies on the availability of simulation software.
DETECTING PROBLEMS EXPOSED BY TESTS WITH A SPECIAL LIBRARY
Infer problems from the behavior of special implementations of the library's synchronization primitives. Some implementations even provide interfaces specifically for debugging: for example, mutexes and condition variables that give the test writer control over which thread gets the lock when multiple threads are waiting, or which thread is notified by a notify_one() call on a condition variable. A purely hypothetical sketch of the idea follows.
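The book names no concrete API for this, so the following is a hypothetical and much weaker cousin of the idea: a test-only mutex wrapper that records (rather than controls) which thread acquired the lock and in what order, so a test can inspect the history afterwards.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical test-only wrapper: behaves like std::mutex, but logs the
// order in which threads acquired it so a test can assert on that order.
class recording_mutex {
    std::mutex m;
    std::mutex log_m;
    std::vector<std::thread::id> acquisitions;
public:
    void lock() {
        m.lock();
        std::lock_guard<std::mutex> lg(log_m);
        acquisitions.push_back(std::this_thread::get_id());
    }
    void unlock() { m.unlock(); }
    std::vector<std::thread::id> history() {
        std::lock_guard<std::mutex> lg(log_m);
        return acquisitions;  // copy for the test to inspect
    }
};
```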
11.2.5 Structuring multithreaded test code
Split the test code up; identify the distinct parts of each test:
- The general setup code that must be executed before anything else.
- The thread-specific setup code that must run on each thread.
- The code for each thread that you want to run concurrently.
- The code to be run after the concurrent execution has finished, possibly including assertions on the state of the code.
Take one scenario as an example: one thread calling push() on an empty queue while another thread calls pop().
Listing 11.1 An example test for concurrent push() and pop() calls on a queue
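The listing's code was lost in extraction. A sketch that follows the structure described in the text, assuming the queue under test is the book's threadsafe_queue (or any queue with push() and a value-returning, blocking pop()):

```cpp
#include <cassert>
#include <future>

void test_concurrent_push_and_pop_on_empty_queue()
{
    threadsafe_queue<int> q;                       // the class under test (defined elsewhere)
    std::promise<void> go, push_ready, pop_ready;  // go is the shared "start" switch
    std::shared_future<void> ready(go.get_future());
    std::future<void> push_done;
    std::future<int> pop_done;
    try {
        push_done = std::async(std::launch::async, [&q, ready, &push_ready] {
            push_ready.set_value();  // thread-specific setup done
            ready.wait();            // wait for the start signal
            q.push(42);
        });
        pop_done = std::async(std::launch::async, [&q, ready, &pop_ready] {
            pop_ready.set_value();
            ready.wait();
            return q.pop();          // assumed to block until a value is available
        });
        push_ready.get_future().wait();  // make sure both threads are poised
        pop_ready.get_future().wait();
        go.set_value();                  // open the gate: the two calls now race
        push_done.get();                 // rethrows anything the push thread threw
        assert(pop_done.get() == 42);    // final assertion on the outcome
    } catch (...) {
        go.set_value();                  // don't leave the test threads blocked forever
        throw;
    }
}
```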
A go future is used as a switch to control when the two test threads start.
11.2.6 Testing the performance of multithreaded code
Before testing, be clear about the improvement you are hoping for: it's worth looking at the overall design of the code before you start testing, so you know whether you're hoping for a factor-of-24 improvement, or whether the serial portion of your code means you're limited to a maximum factor of 3.
It's best to build up a scalability graph: check the performance on systems with as many different configurations as possible, so you get a picture of how performance scales with the number of cores.