16.4. Python程序中concurrent模块 ¶

16.4.1. concurrent模块的介绍 ¶

concurrent.futures模块提供了高度封装的异步调用接口

ThreadPoolExecutor：线程池，提供异步调用

ProcessPoolExecutor：进程池，提供异步调用

ProcessPoolExecutor 和 ThreadPoolExecutor：两者都实现相同的接口，该接口由抽象Executor类定义。

16.4.2. 基本方法 ¶

submit(fn, *args, **kwargs) :异步提交任务

map(func, *iterables, timeout=None, chunksize=1) ：取代for循环submit的操作

shutdown(wait=True) ：相当于进程池的pool.close()+pool.join()操作

wait=True，等待池内所有任务执行完毕回收完资源后才继续
wait=False，立即返回，并不会等待池内的任务执行完毕
但不管wait参数为何值，整个程序都会等到所有任务执行完毕
submit和map必须在shutdown之前

result(timeout=None) ：取得结果

add_done_callback(fn)：回调函数

16.4.3. 进程池和线程池 ¶

池的功能：限制进程数或线程数.

什么时候限制：当并发的任务数量远远大于计算机所能承受的范围,即无法一次性开启过多的任务数量我就应该考虑去限制我进程数或线程数,从保证服务器不崩.

进程池 ¶

#!/usr/bin/env python
# -*- coding:utf8 -*-
# auther; 18793
# Date：2019/12/18 9:55
# filename: task0001.py

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Process, current_process
from time import time


def task(i):
    print("{} 在执行任务{}".format(current_process().name, i))
    time.sleep(1)


if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)        # 进程池里有4个进程
    for i in range(20):                 # 20个任务
        pool.submit(task, i)             # 进程池里当前执行的任务i，池子里的4个进程一次一次执行任务

线程池 ¶

#!/usr/bin/env python
# -*- coding:utf8 -*-
# auther; 18793
# Date：2019/12/18 9:59
# filename: task0002.py

from concurrent.futures import ThreadPoolExecutor
from threading import Thread, currentThread
from time import time


def task(i):
    print("{} 在执行任务{}".format(currentThread().name, i))
    time.sleep(1)


if __name__ == '__main__':
    pool = ThreadPoolExecutor(4)        # 进程池里有4个进程
    for i in range(20):                 # 20个任务
        pool.submit(task, i)            # 进程池里当前执行的任务i，池子里的4个进程一次一次执行任务

Map的用法 ¶

#!/usr/bin/env python
# -*- coding:utf8 -*-
# auther; 18793
# Date：2019/12/18 10:02
# filename: map的用法.py
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import os, time, random


def task(n):
    print('%s is runing' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2


if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=3)
    # for i in range(20):
    #   future=executor.submit(task,i)
    executor.map(task, range(1, 21))  # map取代了for+submit

同步和异步 ¶

理解为提交任务的两种方式

同步: 提交了一个任务,必须等任务执行完了(拿到返回值),才能执行下一行代码

异步: 提交了一个任务,不要等执行完了,可以直接执行下一行代码.

同步：相当于执行任务的串行执行

异步

#!/usr/bin/env python
# -*- coding:utf8 -*-
# auther; 18793
# Date：2019/12/18 10:04
# filename: 异步.py

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Process, current_process
import time

n = 1


def task(i):
    global n
    print("{} 在执行任务{}".format(current_process().name, i))
    time.sleep(1)
    n += i
    return n


if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)  # 进程池里有4个线程
    pool_lis = []
    for i in range(20):  # 20个任务
        future = pool.submit(task, i)  # 进程池里当前执行的任务i，池子里的4个线程一次一次执行任务
        # print(future.result()) # 这是在等待我执行任务得到的结果，如果一直没有结果，这里会导致我们所有任务编程了串行
        # 在这里就引出了下面的pool.shutdown()方法
        pool_lis.append(future)
    pool.shutdown(wait=True)  # 关闭了池的入口，不允许在往里面添加任务了，会等带所有的任务执行完，结束阻塞
    for p in pool_lis:
        print(p.result())
    print(n)  # 这里一开始肯定是拿到0的，因为我只是去告诉操作系统执行子进程的任务，代码依然会继续往下执行
    # 可以用join去解决，等待每一个进程结束后，拿到他的结果

回调函数 ¶

#!/usr/bin/env python
# -*- coding:utf8 -*-
# auther; 18793
# Date：2019/12/18 10:05
# filename: 回调函数.py

import time
from threading import Thread, currentThread
from concurrent.futures import ThreadPoolExecutor


def task(i):
    print("{} 在执行任务{}".format(currentThread().name, i))
    time.sleep(1)
    return i ** 2


# parse 就是一个回调函数
def parse(future):
    # 处理拿到的结果
    print("{} 结束了当前任务".format(currentThread().name))
    print(future.result())


if __name__ == '__main__':
    pool = ThreadPoolExecutor(4)
    for i in range(20):
        future = pool.submit(task, i)
        '''
        给当前执行的任务绑定了一个函数，在当前任务结束的时候就会触发这个函数（称之为回调函数）
        会把future对象作为参数传给函数
        注：这个称为回调函数，当前任务处理结束了，就回来调parse这个函数
        '''
        future.add_done_callback(parse)
        # add_done_callback (parse) parse是一个回调函数
        # add_done_callback () 是对象的一个绑定方法，他的参数就是一个函数

例子 ¶

#!/usr/bin/env python
# -*- coding:utf8 -*-
# auther; 18793
# Date：2020/2/11 12:08
# filename: ThreadPoolExecutor_example01.py
import concurrent.futures
import urllib.request

URLS = ['http://www.baidu.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']


def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = {executor.submit(load_url, url, 2): url for url in URLS}

for future in concurrent.futures.as_completed(future_to_url):
    url = future_to_url[future]
    try:
        data = future.result()
    except Exception as exc:
        print('%r generated an exception: %s' % (url, exc))
    else:
        print('%r page is %d bytes' % (url, len(data)))

"""
'http://www.baidu.com/' page is 169884 bytes
'http://www.cnn.com/' generated an exception: <urlopen error timed out>
'http://www.bbc.co.uk/' generated an exception: <urlopen error timed out>
'http://europe.wsj.com/' generated an exception: <urlopen error timed out>
'http://some-made-up-domain.com/' generated an exception: <urlopen error [Errno 11001] getaddrinfo failed>
"""