New scrape process

February 22, 2020

The script below reads URLs from an input file, substituting each URL into $line one at a time. The first instance of $line names the destination folder where downloads are saved, while the second queries the Wayback Machine for data.

URLs are read in a loop from an input file (input.file). The loop uses the shell built-in while read -r line, which assigns each URL from the input file to $line in turn. The loop body then runs the wayback_machine_downloader … "$line" command against that URL. The final line, done < input.file, feeds the file to the loop. The tee command appends stdout to a text file for later review while simultaneously displaying it on screen.

while read -r line
do
    wayback_machine_downloader -s -d "$line"_IA -t2001 -c3 --only "/\.(hqx|sit|sitx|dd|pkg|bin|sea|cpt)$/i" "$line"
done < input.file | tee -a log.txt
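As a quick sanity check before kicking off real downloads, the loop can be dry-run by substituting echo for the downloader, which prints each command that would be executed. The URLs in the sample input.file below are placeholders, not from the original list:

```shell
# Create a sample input file (these URLs are hypothetical placeholders).
cat > input.file <<'EOF'
example.com/downloads
example.org/software
EOF

# Dry run: echo stands in for wayback_machine_downloader, so each
# iteration prints the command it would run instead of downloading.
while read -r line
do
    echo wayback_machine_downloader -s -d "$line"_IA -t2001 -c3 \
        --only "/\.(hqx|sit|sitx|dd|pkg|bin|sea|cpt)$/i" "$line"
done < input.file | tee -a log.txt
```

Once the printed commands look right, remove the echo to run the loop for real.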