Quick tip: Download all PDF files from a website

After Red Hat Enterprise Linux 7 was released this week (with a bunch of very cool features, by the way), I wanted to download all the new documentation as PDFs to put on my iPad.

But right-clicking each of the 30 links and choosing “Save as” definitely wasn’t the way to go. Administrators are lazy people…

URL of the documentation page: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/

NOTE: A small disadvantage of this particular page:
It contains the links to the documentation for all RHEL versions, from 2.1 to 7, but only displays the version selected on the left. Luckily, wget downloads the files from top to bottom, and the RHEL 7 links are at the top. I just watched the downloaded files and hit Ctrl+C as soon as all the documents I needed had arrived.
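If you want to see up front which PDF links the page actually exposes, you can list them before downloading anything. This is just a sketch: the grep pattern assumes the page source contains absolute links ending in .pdf, so adjust it to the real markup if needed:

wget -qO- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/ | grep -oE 'https?://[^"]+\.pdf' | sort -u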

With “wget”, this is a simple task:

wget -e robots=off -np -nd -A ".pdf" -r -l1 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/

Options used:

  • -e robots=off (Ignore robots.txt, because some sites use it to deny fetching the whole site)
  • -np (Don’t ascend to the parent directory when recursing)
  • -nd (Save all files in the current directory instead of recreating the site’s directory hierarchy)
  • -A ".pdf" (Only accept files with the .pdf extension)
  • -r (Recursive download)
  • -l1 (Follow links only one level deep)
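To avoid the Ctrl+C dance from the note above, newer wget versions (1.14 and later, if I recall correctly) also offer --accept-regex, which filters on the complete URL instead of just the file name. The following is only a sketch and assumes the RHEL 7 documents carry "/7/" somewhere in their URL path, so verify that against the real links first:

wget -e robots=off -np -nd -A ".pdf" --accept-regex '.*/7/.*\.pdf' -r -l1 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/

Because --accept-regex is matched against the whole URL, it can filter on the version directory, which -A alone cannot do, since -A only looks at the file name.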