C++ Concurrency in Action, Chapters 10 and 11: Study Notes 8

[TOC]

Study notes 8 on *C++ Concurrency in Action*, covering Chapter 10 (Parallel algorithms) and Chapter 11 (Testing and debugging multithreaded applications).
Contents:

Chapter 10

  • Using the C++17 parallel algorithms

Chapter 11

  • Concurrency-related bugs
  • Locating bugs through testing and code review
  • Designing multithreaded tests
  • Testing

Chapter 10 Parallel algorithms

10.1 Parallelizing the standard library algorithms

An execution policy is a permission, not a requirement. Compared with the single-threaded algorithms:

  • The requirements on the algorithm's complexity have changed, and are usually slacker than those for the normal serial algorithm, because parallel algorithms often do more total work in order to take advantage of the parallelism of the system.

10.2 Execution policies

Three execution policy classes (in the <execution> header):

  • std::execution::sequenced_policy
  • std::execution::parallel_policy
  • std::execution::parallel_unsequenced_policy

Three corresponding policy objects to pass to the algorithms:

  • std::execution::seq
  • std::execution::par
  • std::execution::par_unseq

Notes:

  • You cannot define your own execution policies.
  • You cannot rely on being able to construct objects from these policy classes yourself.

10.2.1 General effects of specifying an execution policy

Specifying a policy affects several aspects of the algorithm's behavior:

  • The algorithm’s complexity
  • The behavior when an exception is thrown
  • Where, how, and when the steps of the algorithm are executed

EFFECTS ON ALGORITHM COMPLEXITY

Complexity increases for two reasons:

  1. The scheduling overhead of managing the parallel execution.
  2. The algorithm may perform more of its core operations (whether swaps, comparisons, or applications of a supplied function object).

In general, the degree of increase is bounded: if an algorithm specifies that something will happen exactly some-expression times, or at most some-expression times, then the overload with an execution policy slackens that requirement to O(some-expression).

EXCEPTIONAL BEHAVIOR

All the standard-supplied execution policies will call std::terminate if there are any uncaught exceptions. The only exception is std::bad_alloc, which is thrown if the library fails to acquire the memory it needs for its internal operation.

This is one of the key differences between using std::execution::seq and not providing an execution policy.

WHERE AND WHEN ALGORITHM STEPS ARE EXECUTED

The execution policy also specifies whether there are any ordering constraints on how the algorithm steps are run:

  • whether or not they are run in any particular order,
  • whether or not parts of separate algorithm steps may be interleaved with each other or run in parallel with each other,
  • and so forth.

10.2.2 std::execution::sequenced_policy

All operations are performed on the same thread and must be performed in some definite order, so they are not interleaved. (That order is not guaranteed to match the corresponding overload without an execution policy; see the example below.)

For example, the following call to std::for_each will populate the vector with the numbers 1-1,000, in an unspecified order. This is in contrast to the overload without an execution policy, which will store the numbers in order:

```cpp
std::vector<int> v(1000);
int count = 0;
std::for_each(std::execution::seq, v.begin(), v.end(),
              [&](int &x) { x = ++count; });
```

With this policy, you cannot rely on the order of these operations.

10.2.3 std::execution::parallel_policy

Behavior:

  • Operations may be performed either on the thread that invoked the algorithm, or on threads created by the library.
  • Operations performed on a given thread must be performed in a definite order, and not interleaved, but the precise order is unspecified, and may vary between invocations.
  • A given operation will run on a fixed thread for its entire duration.

Requirements on the element-access functions:

They must not cause data races if invoked in parallel, and must not rely on being run on the same thread as any other operation, nor rely on not being run on the same thread as any other operation.

When it is not suitable: only where there is a required ordering between elements, or unsynchronized access to shared data, is it problematic. For example, populating a vector as below is undefined behavior (the variable count is modified from every invocation of the lambda), even if the library doesn't use multiple threads for this call:

```cpp
std::for_each(std::execution::par, v.begin(), v.end(),
              [&](int &x) { x = ++count; });
```

Remedy: make count a std::atomic<int> rather than a plain int, or use a mutex. In this example, though, that would likely defeat the point of using the parallel execution policy, because it would serialize all the calls.

10.2.4 std::execution::parallel_unsequenced_policy

This policy grants the library the most freedom and imposes the strictest requirements on your code:

  • The algorithm steps may be performed on unspecified threads of execution, unordered and unsequenced with respect to one another.
    • They may be interleaved with each other on a single thread.
    • They may be migrated between threads.

It is therefore only suitable for operations that work solely on the relevant element (or on data that can be accessed based on that element) and that do not modify any state shared between threads or between elements.

10.3 The parallel algorithms from the C++ Standard Library

std::accumulate has no parallel overload: it is defined as a strict serial accumulation.

The parallel overloads may strengthen the iterator-category requirements: where the "normal" algorithm allows Input Iterators or Output Iterators, the overloads with an execution policy require Forward Iterators instead.

The parallel overloads also have validity requirements on iterators: Forward Iterators can be freely copied around and used equivalently, and incrementing a Forward Iterator does not invalidate other copies, which is important when different threads hold copies.

10.3.1 Examples of using parallel algorithms

CHOICE OF EXECUTION POLICY

std::execution::par is the policy you will use most often.
std::execution::par_unseq: you must rely entirely on the algorithm's design for safe parallelism; you cannot use mutexes, atomic variables, or any other mechanism to make accesses from multiple threads safe.

Listing 10.1 Parallel algorithms on a class with internal synchronization

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class X
{
    mutable std::mutex m;
    int data;

public:
    X() : data(0) {}
    int get_value() const
    {
        std::lock_guard guard(m);
        return data;
    }
    void increment()
    {
        std::lock_guard guard(m);
        ++data;
    }
};

void increment_all(std::vector<X> &v)
{
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](X &x)
                  {
                      x.increment();
                  });
}
```

Listing 10.2 Parallel algorithms on a class without internal synchronization

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class Y
{
    int data;

public:
    Y() : data(0) {}
    int get_value() const
    {
        return data;
    }
    void increment()
    {
        ++data;
    }
};

class ProtectedY
{
    std::mutex m;
    std::vector<Y> v;

public:
    void lock()
    {
        m.lock();
    }
    void unlock()
    {
        m.unlock();
    }
    std::vector<Y> &get_vec()
    {
        return v;
    }
};

void increment_all(ProtectedY &data)
{
    std::lock_guard guard(data);
    auto &v = data.get_vec();
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](Y &y)
                  {
                      y.increment();
                  });
}
```

The downside of this approach is that concurrent accesses from other threads outside the parallel algorithm invocation must now wait for the entire operation to complete, which loses much of the benefit of the parallelism.

10.3.2 Counting visits

Listing 10.3 Using transform_reduce to count visits to pages of a website

```cpp
#include <ctime>
#include <execution>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

struct log_info
{
    std::string page;
    time_t visit_time;
    std::string browser;
    // any other fields
};

extern log_info parse_log_line(std::string const &line);

using visit_map_type = std::unordered_map<std::string, unsigned long long>;

// This function is a wrapper around a call to std::transform_reduce.
visit_map_type count_visits_per_page(std::vector<std::string> const &log_lines)
{
    struct combine_visits
    {
        visit_map_type operator()(visit_map_type lhs, visit_map_type rhs) const
        {
            if (lhs.size() < rhs.size())
                std::swap(lhs, rhs);
            for (auto const &entry : rhs)
            {
                lhs[entry.first] += entry.second;
            }
            return lhs;
        }
        visit_map_type operator()(log_info log, visit_map_type map) const
        {
            ++map[log.page];
            return map;
        }
        visit_map_type operator()(visit_map_type map, log_info log) const
        {
            ++map[log.page];
            return map;
        }
        visit_map_type operator()(log_info log1, log_info log2) const
        {
            visit_map_type map;
            ++map[log1.page];
            ++map[log2.page];
            return map;
        }
    };
    return std::transform_reduce(
        std::execution::par, log_lines.begin(), log_lines.end(),
        visit_map_type(), combine_visits(), parse_log_line);
}
```

Chapter 11 Testing and debugging multithreaded applications

  • Concurrency-related bugs
  • Locating bugs through testing and code review
  • Designing multithreaded tests
  • Testing
  • Unwanted blocking
  • Race conditions

11.1 Types of concurrency-related bugs

11.1.1 Unwanted blocking

several variations on this theme:

  • Deadlock
  • Livelock: the key difference from deadlock is that the wait is not a blocking wait but an active checking loop, such as a spin lock. In serious cases the symptoms are the same as deadlock. In less serious cases the livelock eventually resolves itself because of the random scheduling.
  • Blocking on I/O or other external input: even if the waited-for input is never going to come.

11.1.2 Race conditions

Not all race conditions are problematic.

race conditions often cause the following types of problems:

  • Data races: unsynchronized concurrent access to a shared memory location.
  • Broken invariants: these can manifest as dangling pointers (because another thread deleted the data being accessed), random memory corruption (due to a thread reading inconsistent values resulting from partial updates), and double frees (such as when two threads pop the same value from a queue and so both delete some associated data), among others. The invariants broken can be temporal as well as value-based.
  • Lifetime issues: can be seen as a special case of the above, where the thread outlives the data it accesses.

11.2 Techniques for locating concurrency-related bugs

11.2.1 Reviewing code to locate potential bugs

It’s important that the reviewer have the time to do the review properly.

One useful technique is to try to explain how the code works in detail to someone else; even just writing detailed notes can be hugely beneficial. By asking yourself questions like the ones below and thinking carefully about the answers, the problem often reveals itself.

QUESTIONS TO THINK ABOUT WHEN REVIEWING MULTITHREADED CODE

  • Which data needs to be protected from concurrent access?
  • How do you ensure that the data is protected?
  • Where in the code could other threads be at this time?
  • Which mutexes does this thread hold?
  • Which mutexes might other threads hold?
  • Are there any ordering requirements between the operations done in this thread and those done in another? How are those requirements enforced?
  • Is the data loaded by this thread still valid? Could it have been modified by other threads?
  • If you assume that another thread could be modifying the data, what would that mean and how could you ensure that this never happens?

By assuming the existence of a bug related to a particular line of code, you can then act as a detective and track down the cause.

11.2.2 Locating concurrency-related bugs by testing

The difficulty is that having a potential race condition doesn't mean the code will always fail, just that it might fail sometimes.

Strategy: structure your tests so that, if a test fails, you can best isolate the code that's faulty.

Sometimes the cause isn't concurrency at all: just because a bug occurs in the multithreaded portion of your application doesn't mean it's automatically concurrency-related. One strategy is to modify the code so the test runs on a single thread. A telling special case: if the problem goes away on a single-core system (even with multiple threads running) but is present on multicore or multiprocessor systems, you have a race condition and possibly a synchronization or memory-ordering issue.

Test cases to consider when testing a concurrent queue:

  • One thread calling push() or pop() on its own to verify that the queue works at a basic level.
  • One thread calling push() on an empty queue while another thread calls pop().
  • Multiple threads calling push() on an empty queue
  • Multiple threads calling push() on a full queue
  • Multiple threads calling pop() on an empty queue
  • Multiple threads calling pop() on a full queue
  • Multiple threads calling pop() on a partially full queue with insufficient items for all threads
  • Multiple threads calling push() while one thread calls pop() on an empty queue
  • Multiple threads calling push() while one thread calls pop() on a full queue
  • Multiple threads calling push() while multiple threads call pop() on an empty queue
  • Multiple threads calling push() while multiple threads call pop() on a full queue

Additional factors to consider about the test environment:

  • What you mean by “multiple threads” in each case (3, 4, 1,024?).
  • Whether there are enough processing cores in the system for each thread to run on its own core.
  • Which processor architectures the tests should be run on.
  • How you ensure suitable scheduling for the “while” parts of your tests.

11.2.3 Designing for testability

code is easier to test if the following factors apply:

  • The responsibilities of each function and class are clear.
  • The functions are short and to the point.
  • Your tests can take complete control of the environment surrounding the code being tested.
  • The code that performs the particular operation being tested is close together rather than spread throughout the system.
  • You thought about how to test the code before you wrote it.

One special case to watch out for: library calls can use internal variables to store state, which then becomes shared if multiple threads use the same set of library calls. This is a problem because it's not immediately apparent that the code accesses shared data.

11.2.4 Multithreaded testing techniques

BRUTE-FORCE TESTING

Brute-force testing means running the code many times, possibly with many threads running at once.

For larger code under test, the number of possible scheduling permutations is so vast that even a billion test runs might yield only a low level of confidence.

Drawback: brute force might give you false confidence. If the way you've written the test means the problematic circumstances can't occur, you can run the test as many times as you like and it won't fail. The classic example is testing a multithreaded application on a single-processor system.

Requirements on test design:

  • choice of unit for the code being tested
  • the design of the test harness
  • the choice of testing environment

COMBINATION SIMULATION TESTING

Run your code with a special piece of software that simulates the real runtime environment of the code, where the characteristics of the virtual machine and its hardware are emulated by the supervisor software. Because the simulator can record and exhaustively explore the possible thread interleavings, this approach is in principle guaranteed to find the problems. But the drawbacks:

  • The number of combinations increases exponentially with the number of threads and the number of operations performed by each thread.
  • It relies on the availability of suitable simulation software.

DETECTING PROBLEMS EXPOSED BY TESTS WITH A SPECIAL LIBRARY

Infer problems from the behavior of components in a specially instrumented library. Some implementations even provide interfaces designed for debugging: synchronization primitives such as mutexes and condition variables can give the test writer control over which thread gets the lock when multiple threads are waiting, or which thread is notified by a notify_one() call on a condition variable.

11.2.5 Structuring multithreaded test code

Identify the distinct parts of each test:

  • The general setup code that must be executed before anything else.
  • The thread-specific setup code that must run on each thread.
  • The code for each thread that you want to run concurrently.
  • The code to be run after the concurrent execution has finished, possibly including assertions on the state of the code.

As an example scenario: one thread calling push() on an empty queue while another thread calls pop().

Listing 11.1 An example test for concurrent push() and pop() calls on a queue

```cpp
#include <cassert>
#include <future>
// threadsafe_queue as developed in chapter 6 of the book

void test_concurrent_push_and_pop_on_empty_queue()
{
    threadsafe_queue<int> q; // create the empty queue as part of the general setup
    std::promise<void> go, push_ready, pop_ready;
    std::shared_future<void> ready(go.get_future());
    std::future<void> push_done; // indicates that the push thread has finished
    std::future<int> pop_done;   // indicates that the pop thread has finished
    try
    {
        push_done = std::async(std::launch::async,
                               [&q, ready, &push_ready]()
                               {
                                   push_ready.set_value();
                                   ready.wait(); // each task sets its own ready signal, then waits for the go signal before running the test code
                                   q.push(42);
                               });
        pop_done = std::async(std::launch::async,
                              [&q, ready, &pop_ready]()
                              {
                                  pop_ready.set_value();
                                  ready.wait(); // each task sets its own ready signal, then waits for the go signal before running the test code
                                  return q.pop();
                              });
        push_ready.get_future().wait(); // the main thread waits for the signals from both threads
        pop_ready.get_future().wait();
        go.set_value(); // start the real test
        push_done.get();
        assert(pop_done.get() == 42);
        assert(q.empty());
    }
    catch (...)
    {
        go.set_value(); // if an exception is thrown, set the go signal to avoid any chance of a dangling thread, then rethrow
        throw;
    }
}
```

A single go future acts as a gate that controls when the two test threads start.

11.2.6 Testing the performance of multithreaded code

Before testing, be clear about what result you are hoping for: it's worth looking at the overall design of the code before you start testing, so you know whether you're hoping for a factor-of-24 improvement, or whether the serial portion of your code limits you to a maximum factor of 3.

It's best to build a picture of the scalability graph: check the performance on systems with as many different configurations as possible.

Source: https://www.chuxin911.com/Notes_On_C++_Concurrency_in_Action_8_20221023/

Author: cx

Published: 2022-10-23

Updated: 2022-10-25