Timing Results
Time in seconds to perform various image-fetching tasks:
Task | Multi-thread multi-search (concurrent_image_search) | Multi-thread single-search (concurrent_images_download) | Single-thread single-search (download_images) | google-images-download by hardikvasa |
---|---|---|---|---|
Download 200 cat pictures | 23.6 | 22.4 | 92.7 | 148.4 |
Download 200 cat & dog pictures | 28.7 | 47.7 | 254.2 | 330.4 |
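The table is easier to compare as relative speedups. The sketch below divides the single-thread download_images time by each method's time. The numbers are copied from the table, and speedups is a hypothetical helper written only for this comparison.

```python
# Timings in seconds, copied from the table above.
timings = {
    "Download 200 cat pictures": {
        "concurrent_image_search": 23.6,
        "concurrent_images_download": 22.4,
        "download_images": 92.7,
        "google-images-download": 148.4,
    },
    "Download 200 cat & dog pictures": {
        "concurrent_image_search": 28.7,
        "concurrent_images_download": 47.7,
        "download_images": 254.2,
        "google-images-download": 330.4,
    },
}

def speedups(row, baseline="download_images"):
    """Return how many times faster each method is than the baseline (values > 1 are faster)."""
    base = row[baseline]
    return {name: round(base / seconds, 2) for name, seconds in row.items()}

for task, row in timings.items():
    print(task, speedups(row))
```

By this measure concurrent_images_download is about 4.14 times faster than download_images for cat pictures, and concurrent_image_search is about 8.86 times faster for cat & dog pictures.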
All tests were run with the following config:
- total_images=200
- headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
- progress_bar=False
- verbose=True
Both concurrent_image_search and concurrent_images_download were run with:
- max_image_fetching_threads=20
- image_download_timeout=3
concurrent_image_search was also run with max_similtanous_threads=2.
google-images-download was run with the following config: arguments = {"keywords":"cat", "limit":200, "chromedriver": "chromedriver.exe", "format": "jpg", "print_urls":False}
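The section does not say how the times were measured. One minimal way to take this kind of measurement, assuming each fetcher can be called as an ordinary Python function, is a wall-clock wrapper around time.perf_counter. Both time_task and fake_fetch are hypothetical names; fake_fetch only sleeps and stands in for whichever fetcher is being benchmarked.

```python
import time
from typing import Any, Callable

def time_task(fetch: Callable[..., Any], *args: Any, **kwargs: Any) -> float:
    """Call fetch(*args, **kwargs) once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    fetch(*args, **kwargs)
    return time.perf_counter() - start

# Hypothetical stand-in for a real fetcher; replace it with the function being benchmarked.
def fake_fetch(search_term: str, total_images: int = 200) -> None:
    time.sleep(0.01)  # simulate work

elapsed = time_task(fake_fetch, "cat", total_images=200)
print(f"{elapsed:.2f} s")
```

Because network latency varies between runs, repeating each measurement a few times and reporting the median gives steadier numbers than a single run.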
Explanation
Understandably, concurrent processing beat single-threaded downloading in every case, because it can download multiple images simultaneously. With multiple search terms, concurrent_image_search goes one step further by running the searches simultaneously, whereas the other approaches must process them one after the other. Interestingly, with a single search term concurrent_image_search is slightly slower than concurrent_images_download (23.6 s vs 22.4 s), even though the former uses the latter internally. This overhead is likely because concurrent_image_search must first hand the call off to a thread handler, whereas concurrent_images_download starts immediately.
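The difference between the three strategies can be illustrated with nested thread pools from concurrent.futures. This is a sketch under stated assumptions, not the library's implementation: fetch_image, download_term and search_terms are hypothetical, time.sleep stands in for network latency, and fetch_threads and term_threads only loosely mirror max_image_fetching_threads and max_similtanous_threads, with small values so the demo runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for fetching one image: I/O-bound, so threads can overlap the waits.
def fetch_image(url: str) -> str:
    time.sleep(0.05)  # simulated network latency
    return url

def download_term(term: str, n_images: int, fetch_threads: int) -> list:
    """Multi-thread single-search: fetch one term's images with a pool of worker threads."""
    urls = [f"{term}/{i}" for i in range(n_images)]
    with ThreadPoolExecutor(max_workers=fetch_threads) as pool:
        return list(pool.map(fetch_image, urls))  # map keeps the input order

def search_terms(terms, n_images, fetch_threads, term_threads) -> dict:
    """Multi-thread multi-search: run several download_term calls at once in an outer pool."""
    with ThreadPoolExecutor(max_workers=term_threads) as pool:
        futures = {t: pool.submit(download_term, t, n_images, fetch_threads) for t in terms}
        return {t: f.result() for t, f in futures.items()}

def timed(fn, *args):
    """Return (result, elapsed seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

terms = ["cat", "dog"]
# Single thread: every image of every term is fetched one after the other (about 1.0 s).
_, t_single = timed(lambda: [[fetch_image(f"{t}/{i}") for i in range(10)] for t in terms])
# Multi-thread single-search: images overlap, but terms still run in sequence (about 0.2 s).
_, t_multi_thread = timed(lambda: [download_term(t, 10, 5) for t in terms])
# Multi-thread multi-search: both terms also overlap (about 0.1 s).
_, t_multi_search = timed(search_terms, terms, 10, 5, 2)
print(f"single: {t_single:.2f}s  multi-thread: {t_multi_thread:.2f}s  multi-search: {t_multi_search:.2f}s")
```

With only one search term the outer pool has nothing to overlap, so it can only add a little scheduling overhead, which fits the single-term result in the table.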