C++ Concurrency in Action, Chapters 10 and 11: Study Notes 8

[TOC]

Study notes 8 on *C++ Concurrency in Action*, covering Chapter 10 (Parallel algorithms) and Chapter 11 (Testing and debugging multithreaded applications).
Contents:

Chapter 10

  • Using the C++17 parallel algorithms

Chapter 11

  • Concurrency-related bugs
  • Locating bugs through testing and code review
  • Designing multithreaded tests
  • Testing

Chapter 10 Parallel algorithms

10.1 Parallelizing the standard library algorithms

An execution policy is a permission, not a requirement. Compared with the single-threaded algorithms:

  • The requirements on the algorithm's complexity have changed, and are usually slacker than those for the normal serial algorithm, because parallel algorithms often do more total work in order to take advantage of the parallelism of the system.

10.2 Execution policies

Three execution policy classes (in the <execution> header):

  • std::execution::sequenced_policy
  • std::execution::parallel_policy
  • std::execution::parallel_unsequenced_policy

Three corresponding policy objects to pass to the algorithms:

  • std::execution::seq
  • std::execution::par
  • std::execution::par_unseq

Notes:

  • You cannot define your own execution policies.
  • You cannot rely on being able to construct objects from these policy classes yourself.

10.2.1 General effects of specifying an execution policy

Specifying a policy affects several aspects of the algorithm's behavior:

  • The algorithm’s complexity
  • The behavior when an exception is thrown
  • Where, how, and when the steps of the algorithm are executed

EFFECTS ON ALGORITHM COMPLEXITY

Complexity increases for two reasons:

  1. The scheduling overhead of managing the parallel execution.
  2. The algorithm may perform more of its core operations (whether swaps, comparisons, or applications of a supplied function object).

In general, the degree of increase is bounded: if an algorithm specifies that something will happen exactly some-expression times, or at most some-expression times, then the overload with an execution policy slackens that requirement to O(some-expression).

EXCEPTIONAL BEHAVIOR

All the standard-supplied execution policies will call std::terminate if there are any uncaught exceptions. The only exception is std::bad_alloc, which is thrown if the library fails to acquire the memory it needs for its internal operation.

This is one of the key differences between using std::execution::seq and not providing an execution policy.

WHERE AND WHEN ALGORITHM STEPS ARE EXECUTED

The execution policy also specifies whether there are any ordering constraints on how the algorithm steps are run:

  • whether or not they are run in any particular order,
  • whether or not parts of separate algorithm steps may be interleaved with each other or run in parallel with each other,
  • and so forth.

10.2.2 std::execution::sequenced_policy

All operations are performed on the same thread and must be performed in some definite order, so they are not interleaved. (That order is not guaranteed to match the corresponding overload without an execution policy; see the example below.)

For example, the following call to std::for_each will populate the vector with the numbers 1-1,000, in an unspecified order. This is in contrast to the overload without an execution policy, which will store the numbers in order:

```cpp
std::vector<int> v(1000);
int count = 0;
std::for_each(std::execution::seq, v.begin(), v.end(),
              [&](int &x) { x = ++count; });
```

With this policy, you cannot rely on the order of these operations.

10.2.3 std::execution::parallel_policy

Behavior:

  • Operations may be performed either on the thread that invoked the algorithm, or on threads created by the library.
  • Operations performed on a given thread must be performed in a definite order, and not interleaved, but the precise order is unspecified, and may vary between invocations.
  • A given operation will run on a fixed thread for its entire duration.

Requirements on the element-access functions:

They must not cause data races if invoked in parallel, and must not rely on being run on the same thread as any other operation, nor rely on not being run on the same thread as any other operation.

When it is not suitable: only where there is a required ordering between elements, or unsynchronized access to shared data, is it problematic. For example, populating a vector as below is undefined behavior (the variable count is modified from every invocation of the lambda), even if the library doesn't use multiple threads for this call:

```cpp
std::for_each(std::execution::par, v.begin(), v.end(),
              [&](int &x) { x = ++count; });
```

Remedy: make count a std::atomic<int> rather than a plain int, or use a mutex. In this example, though, that would likely defeat the point of using the parallel execution policy, because it would serialize all the calls.

10.2.4 std::execution::parallel_unsequenced_policy

This policy grants the library the most freedom and imposes the strictest requirements on your code:

  • The algorithm steps may be performed on unspecified threads of execution, unordered and unsequenced with respect to one another.
    • They may be interleaved with each other on a single thread.
    • They may be migrated between threads.

It is therefore only suitable for operations that work solely on the relevant element (or on data that can be accessed based on that element) and that do not modify any state shared between threads or between elements.

10.3 The parallel algorithms from the C++ Standard Library

std::accumulate has no parallel overload: it is defined as a strict serial accumulation.

The parallel overloads may strengthen the iterator-category requirements: where the "normal" algorithm allows Input Iterators or Output Iterators, the overloads with an execution policy require Forward Iterators instead.

The parallel overloads also have validity requirements on iterators: Forward Iterators can be freely copied around and used equivalently, and incrementing a Forward Iterator does not invalidate other copies, which is important when different threads hold copies.

10.3.1 Examples of using parallel algorithms

CHOICE OF EXECUTION POLICY

std::execution::par is the policy you will use most often.
std::execution::par_unseq: you must rely entirely on the algorithm's design for safe parallelism; you cannot use mutexes, atomic variables, or any other mechanism to make accesses from multiple threads safe.

Listing 10.1 Parallel algorithms on a class with internal synchronization

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class X
{
    mutable std::mutex m;
    int data;

public:
    X() : data(0) {}
    int get_value() const
    {
        std::lock_guard guard(m);
        return data;
    }
    void increment()
    {
        std::lock_guard guard(m);
        ++data;
    }
};

void increment_all(std::vector<X> &v)
{
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](X &x)
                  {
                      x.increment();
                  });
}
```

Listing 10.2 Parallel algorithms on a class without internal synchronization

```cpp
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class Y
{
    int data;

public:
    Y() : data(0) {}
    int get_value() const
    {
        return data;
    }
    void increment()
    {
        ++data;
    }
};

class ProtectedY
{
    std::mutex m;
    std::vector<Y> v;

public:
    void lock()
    {
        m.lock();
    }
    void unlock()
    {
        m.unlock();
    }
    std::vector<Y> &get_vec()
    {
        return v;
    }
};

void increment_all(ProtectedY &data)
{
    std::lock_guard guard(data);
    auto &v = data.get_vec();
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](Y &y)
                  {
                      y.increment();
                  });
}
```

The downside of this approach is that concurrent accesses from other threads outside the parallel algorithm invocation must now wait for the entire operation to complete, which loses much of the benefit of the parallelism.

10.3.2 Counting visits

Listing 10.3 Using transform_reduce to count visits to pages of a website

```cpp
#include <ctime>
#include <execution>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

struct log_info
{
    std::string page;
    time_t visit_time;
    std::string browser;
    // any other fields
};

extern log_info parse_log_line(std::string const &line);

using visit_map_type = std::unordered_map<std::string, unsigned long long>;

// This function is a wrapper around a call to std::transform_reduce.
visit_map_type count_visits_per_page(std::vector<std::string> const &log_lines)
{
    struct combine_visits
    {
        visit_map_type operator()(visit_map_type lhs, visit_map_type rhs) const
        {
            if (lhs.size() < rhs.size())
                std::swap(lhs, rhs);
            for (auto const &entry : rhs)
            {
                lhs[entry.first] += entry.second;
            }
            return lhs;
        }
        visit_map_type operator()(log_info log, visit_map_type map) const
        {
            ++map[log.page];
            return map;
        }
        visit_map_type operator()(visit_map_type map, log_info log) const
        {
            ++map[log.page];
            return map;
        }
        visit_map_type operator()(log_info log1, log_info log2) const
        {
            visit_map_type map;
            ++map[log1.page];
            ++map[log2.page];
            return map;
        }
    };
    return std::transform_reduce(
        std::execution::par, log_lines.begin(), log_lines.end(),
        visit_map_type(), combine_visits(), parse_log_line);
}
```

Chapter 11 Testing and debugging multithreaded applications

  • Concurrency-related bugs
  • Locating bugs through testing and code review
  • Designing multithreaded tests
  • Testing
  • Unwanted blocking
  • Race conditions

11.1 Types of concurrency-related bugs

11.1.1 Unwanted blocking

several variations on this theme:

  • Deadlock
  • Livelock: the key difference from deadlock is that the wait is not a blocking wait but an active checking loop, such as a spin lock. In serious cases the symptoms are the same as deadlock. In less serious cases the livelock eventually resolves itself because of the random scheduling.
  • Blocking on I/O or other external input: even if the waited-for input is never going to come.

11.1.2 Race conditions

Not all race conditions are problematic.

race conditions often cause the following types of problems:

  • Data races: unsynchronized concurrent access to a shared memory location.
  • Broken invariants: these can manifest as dangling pointers (because another thread deleted the data being accessed), random memory corruption (due to a thread reading inconsistent values resulting from partial updates), and double frees (such as when two threads pop the same value from a queue and so both delete some associated data), among others. The invariants broken can be temporal as well as value-based.
  • Lifetime issues: can be seen as a special case of the above, where the thread outlives the data it accesses.

11.2 Techniques for locating concurrency-related bugs

11.2.1 Reviewing code to locate potential bugs

It’s important that the reviewer have the time to do the review properly.

One useful technique is to try to explain how the code works in detail to someone else; even just writing detailed notes can be hugely beneficial. By asking yourself questions like the ones below and thinking carefully about the answers, the problem often reveals itself.

QUESTIONS TO THINK ABOUT WHEN REVIEWING MULTITHREADED CODE

  • Which data needs to be protected from concurrent access?
  • How do you ensure that the data is protected?
  • Where in the code could other threads be at this time?
  • Which mutexes does this thread hold?
  • Which mutexes might other threads hold?
  • Are there any ordering requirements between the operations done in this thread and those done in another? How are those requirements enforced?
  • Is the data loaded by this thread still valid? Could it have been modified by other threads?
  • If you assume that another thread could be modifying the data, what would that mean and how could you ensure that this never happens?

By assuming the existence of a bug related to a particular line of code, you can then act as a detective and track down the cause.

11.2.2 Locating concurrency-related bugs by testing

The difficulty is that having a potential race condition doesn't mean the code will always fail, just that it might fail sometimes.

Strategy: structure your tests so that, if a test fails, you can best isolate the code that's faulty.

Sometimes the cause isn't concurrency at all: just because a bug occurs in the multithreaded portion of your application doesn't mean it's automatically concurrency-related. One strategy is to modify the code so the test runs on a single thread. A telling special case: if the problem goes away on a single-core system (even with multiple threads running) but is present on multicore or multiprocessor systems, you have a race condition and possibly a synchronization or memory-ordering issue.

Test cases to consider when testing a concurrent queue:

  • One thread calling push() or pop() on its own to verify that the queue works at a basic level.
  • One thread calling push() on an empty queue while another thread calls pop().
  • Multiple threads calling push() on an empty queue
  • Multiple threads calling push() on a full queue
  • Multiple threads calling pop() on an empty queue
  • Multiple threads calling pop() on a full queue
  • Multiple threads calling pop() on a partially full queue with insufficient items for all threads
  • Multiple threads calling push() while one thread calls pop() on an empty queue
  • Multiple threads calling push() while one thread calls pop() on a full queue
  • Multiple threads calling push() while multiple threads call pop() on an empty queue
  • Multiple threads calling push() while multiple threads call pop() on a full queue

Additional factors to consider about the test environment:

  • What you mean by “multiple threads” in each case (3, 4, 1,024?).
  • Whether there are enough processing cores in the system for each thread to run on its own core.
  • Which processor architectures the tests should be run on.
  • How you ensure suitable scheduling for the “while” parts of your tests.

11.2.3 Designing for testability

code is easier to test if the following factors apply:

  • The responsibilities of each function and class are clear.
  • The functions are short and to the point.
  • Your tests can take complete control of the environment surrounding the code being tested.
  • The code that performs the particular operation being tested is close together rather than spread throughout the system.
  • You thought about how to test the code before you wrote it.

One special case to watch out for: library calls can use internal variables to store state, which then becomes shared if multiple threads use the same set of library calls. This is a problem because it's not immediately apparent that the code accesses shared data.

11.2.4 Multithreaded testing techniques

BRUTE-FORCE TESTING

Brute-force testing means running the code many times, possibly with many threads running at once.

For larger code under test, the number of possible scheduling permutations is so vast that even a billion test runs might yield only a low level of confidence.

Drawback: brute force might give you false confidence. If the way you've written the test means the problematic circumstances can't occur, you can run the test as many times as you like and it won't fail. The classic example is testing a multithreaded application on a single-processor system.

Requirements on test design:

  • choice of unit for the code being tested
  • the design of the test harness
  • the choice of testing environment

COMBINATION SIMULATION TESTING

Run your code with a special piece of software that simulates the real runtime environment of the code, where the characteristics of the virtual machine and its hardware are emulated by the supervisor software. Because the simulator can record and exhaustively explore the possible thread interleavings, this approach is in principle guaranteed to find the problems. But the drawbacks:

  • The number of combinations increases exponentially with the number of threads and the number of operations performed by each thread.
  • It relies on the availability of suitable simulation software.

DETECTING PROBLEMS EXPOSED BY TESTS WITH A SPECIAL LIBRARY

Infer problems from the behavior of components in a specially instrumented library. Some implementations even provide interfaces designed for debugging: synchronization primitives such as mutexes and condition variables can give the test writer control over which thread gets the lock when multiple threads are waiting, or which thread is notified by a notify_one() call on a condition variable.

11.2.5 Structuring multithreaded test code

Identify the distinct parts of each test:

  • The general setup code that must be executed before anything else.
  • The thread-specific setup code that must run on each thread.
  • The code for each thread that you want to run concurrently.
  • The code to be run after the concurrent execution has finished, possibly including assertions on the state of the code.

As an example scenario: one thread calling push() on an empty queue while another thread calls pop().

Listing 11.1 An example test for concurrent push() and pop() calls on a queue

```cpp
#include <cassert>
#include <future>
// threadsafe_queue as developed in chapter 6 of the book

void test_concurrent_push_and_pop_on_empty_queue()
{
    threadsafe_queue<int> q; // create the empty queue as part of the general setup
    std::promise<void> go, push_ready, pop_ready;
    std::shared_future<void> ready(go.get_future());
    std::future<void> push_done; // indicates that the push thread has finished
    std::future<int> pop_done;   // indicates that the pop thread has finished
    try
    {
        push_done = std::async(std::launch::async,
                               [&q, ready, &push_ready]()
                               {
                                   push_ready.set_value();
                                   ready.wait(); // each task sets its own ready signal, then waits for the go signal before running the test code
                                   q.push(42);
                               });
        pop_done = std::async(std::launch::async,
                              [&q, ready, &pop_ready]()
                              {
                                  pop_ready.set_value();
                                  ready.wait(); // each task sets its own ready signal, then waits for the go signal before running the test code
                                  return q.pop();
                              });
        push_ready.get_future().wait(); // the main thread waits for the signals from both threads
        pop_ready.get_future().wait();
        go.set_value(); // start the real test
        push_done.get();
        assert(pop_done.get() == 42);
        assert(q.empty());
    }
    catch (...)
    {
        go.set_value(); // if an exception is thrown, set the go signal to avoid any chance of a dangling thread, then rethrow
        throw;
    }
}
```

A single go future acts as a gate that controls when the two test threads start.

11.2.6 Testing the performance of multithreaded code

Before testing, be clear about what result you are hoping for: it's worth looking at the overall design of the code before you start testing, so you know whether you're hoping for a factor-of-24 improvement, or whether the serial portion of your code limits you to a maximum factor of 3.

It's best to build a picture of the scalability graph: check the performance on systems with as many different configurations as possible.

Source: https://www.chuxin911.com/Notes_On_C++_Concurrency_in_Action_8_20221023/

Author: cx

Published: 2022-10-23

Updated: 2022-10-25