Multi-Threading in Python
May 2018
A reference implementation of multi-threaded network requests for better performance with Python 3.
TODO: Parallelize GET requests to fetch data over the network.
- Creating a ThreadWorker class that extends Thread and has a task_queue
  - Set an action_type so that the same worker and queue can be reused for different types of actions.
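from threading import Thread

from util.data_util import getData


class ThreadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.task_queue = queue

    def run(self):
        # Loop forever; the worker is started as a daemon thread, so it
        # ends when the main program exits.
        while True:
            # Each task is an (action_type, args) tuple.
            action_type, args = self.task_queue.get()
            try:
                if action_type == 'getData':
                    getData(args=args)
            finally:
                # Mark the task done even if the action raised, so that
                # task_queue.join() cannot hang.
                self.task_queue.task_done()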
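One way to keep `run` from growing an `if`/`elif` branch per action is a dispatch table that maps each `action_type` to a callable. The sketch below is an assumption about how that could look, not the code from the source repository: `ACTIONS`, `DispatchWorker`, `square`, and `shout` are hypothetical names, and the two toy actions stand in for real ones such as `getData`.

```python
from queue import Queue
from threading import Thread


def square(args):
    # Hypothetical action: store n squared in the caller's output dict.
    args['output']['result'] = args['n'] ** 2


def shout(args):
    # Hypothetical action: store an upper-cased copy of the text.
    args['output']['result'] = args['text'].upper()


# Maps each action_type string to the callable that handles it.
ACTIONS = {'square': square, 'shout': shout}


class DispatchWorker(Thread):
    def __init__(self, queue):
        super().__init__(daemon=True)
        self.task_queue = queue

    def run(self):
        while True:
            action_type, args = self.task_queue.get()
            try:
                handler = ACTIONS.get(action_type)
                if handler is not None:
                    handler(args)
            except Exception as exc:
                # Record the failure instead of letting the thread die.
                args['output']['error'] = exc
            finally:
                # Always mark the task done so join() cannot hang.
                self.task_queue.task_done()


task_queue = Queue()
DispatchWorker(task_queue).start()

out_a, out_b = {}, {}
task_queue.put(('square', {'n': 7, 'output': out_a}))
task_queue.put(('shout', {'text': 'done', 'output': out_b}))
task_queue.join()
print(out_a['result'], out_b['result'])  # 49 DONE
```

Adding a new kind of task then only means registering another entry in `ACTIONS`; the worker class itself stays unchanged.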
- Writing the Data Util for GET requests to fetch data
  - Using **kwargs for generic named param args as a dict.
  - Using an output container to return content.
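import urllib.request
from urllib.error import URLError


def getData(**kwargs):
    # The task arguments arrive as a dict under the 'args' keyword.
    args = kwargs['args']
    try:
        content = urllib.request.urlopen(args['get_url']).read()
    except URLError:
        # URLError also covers HTTPError, which is its subclass.
        content = None
    # Hand the result back through the caller-supplied output container.
    args['output']['content'] = content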
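The calling convention is easiest to see by invoking `getData` directly, outside any thread. In the sketch below the function is restated so the block runs on its own; catching `URLError` (the parent class of `HTTPError`) is a variation assumed here, and the `data:` URL is only a stand-in that `urllib` decodes locally, so no network access is needed.

```python
import urllib.request
from urllib.error import URLError


def getData(**kwargs):
    # Restated from the post so this block runs on its own.
    args = kwargs['args']
    try:
        content = urllib.request.urlopen(args['get_url']).read()
    except URLError:
        # Assumed variation: URLError is the parent class of HTTPError.
        content = None
    args['output']['content'] = content


# The caller owns the output dict; getData fills it in place.
output = {}
getData(args={'get_url': 'data:text/plain,hello', 'output': output})
print(output['content'])  # b'hello'
```

The output container matters because a worker thread has no way to return a value to the code that queued the task; writing into a dict the caller already holds sidesteps that.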
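- Creating and starting a multi-threaded task queue
import os
from queue import Queue

# Assumption: CPU_COUNT is not defined in the post; use the core count.
CPU_COUNT = os.cpu_count() or 1

task_queue = Queue()
for i in range(CPU_COUNT):
    worker = ThreadWorker(task_queue)
    worker.daemon = True  # do not keep the program alive after main exits
    worker.start()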
- Putting tasks in task queue
output1 = {}
task_queue.put(('getData', {'get_url': 'http://google.com', 'output': output1}))
output2 = {}
task_queue.put(('getData', {'get_url': 'http://example.com', 'output': output2}))
- Waiting for tasks to finish
task_queue.join()
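Once `task_queue.join()` returns, `task_done()` has been called for every queued task, so the output containers can be read from the main thread. A minimal end-to-end sketch, assuming a hypothetical `fake_get` that sleeps in place of `getData` so that it runs without network access (the function-based `worker` is likewise a simplification of the `ThreadWorker` class):

```python
import time
from queue import Queue
from threading import Thread


def fake_get(args):
    # Hypothetical stand-in for getData: sleeps instead of doing a GET.
    time.sleep(0.05)
    args['output']['content'] = ('body of ' + args['get_url']).encode()


def worker(task_queue):
    while True:
        action_type, args = task_queue.get()
        try:
            if action_type == 'getData':
                fake_get(args)
        finally:
            task_queue.task_done()


task_queue = Queue()
for _ in range(2):
    Thread(target=worker, args=(task_queue,), daemon=True).start()

# One output container per URL, kept by the caller.
outputs = {}
for url in ('http://google.com', 'http://example.com'):
    outputs[url] = {}
    task_queue.put(('getData', {'get_url': url, 'output': outputs[url]}))

task_queue.join()  # blocks until task_done() was called for every task

for url, output in outputs.items():
    print(url, output['content'])
```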
- Output
getData for="http://example.com" takes time=0.04711198806762695s
getData for="http://example.com" takes time=0.04711198806762695s
getData for="http://example.com" takes time=0.04514908790588379s
getData for="http://google.com" takes time=0.1752634048461914s
getData for="http://google.com" takes time=0.17326903343200684s
getData for="http://google.com" takes time=0.1590442657470703s
main takes time=0.2101454734802246s
serial exec time would be time=0.6469497680664062s
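The log lines above suggest that each call and the main routine were timed, but the timing code itself is not shown in the post. The sketch below shows one way such numbers could be produced. It assumes a hypothetical `timed` decorator and a `slow_get` stub that sleeps in place of a real request; the serial estimate is simply the sum of the per-task times.

```python
import time
from functools import wraps
from queue import Queue
from threading import Lock, Thread

durations = []           # per-task times, summed for the serial estimate
durations_lock = Lock()  # guards durations across worker threads


def timed(func):
    # Hypothetical decorator: prints and records how long each call took.
    @wraps(func)
    def wrapper(args):
        start = time.perf_counter()
        result = func(args)
        elapsed = time.perf_counter() - start
        with durations_lock:
            durations.append(elapsed)
        print('%s for="%s" takes time=%ss'
              % (func.__name__, args['get_url'], elapsed))
        return result
    return wrapper


@timed
def slow_get(args):
    # Stub: sleeps instead of issuing a GET request.
    time.sleep(args['delay'])


def worker(task_queue):
    while True:
        args = task_queue.get()
        try:
            slow_get(args)
        finally:
            task_queue.task_done()


task_queue = Queue()
for _ in range(4):
    Thread(target=worker, args=(task_queue,), daemon=True).start()

main_start = time.perf_counter()
for delay in (0.05, 0.05, 0.1, 0.1):
    task_queue.put({'get_url': 'http://example.com', 'delay': delay})
task_queue.join()
main_elapsed = time.perf_counter() - main_start
serial_estimate = sum(durations)

print('main takes time=%ss' % main_elapsed)
print('serial exec time would be time=%ss' % serial_estimate)
```

With four workers and four tasks, the wall-clock time should be close to the slowest single task, while the serial estimate is the sum of all four.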
Source code: PyUtils/MultiThread