Quick tip: Download all PDF files from a website

After Red Hat Enterprise Linux 7 was released this week (with a bunch of very cool features, by the way), I wanted to download all the new documentation as PDFs to put on my iPad.

But right-clicking each of the 30 links and choosing “Save as” definitely wasn’t the way to go. Administrators are lazy people…

URL of the documentation page: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/

NOTE: A small disadvantage of this particular page:
It contains the links to the documentation for all RHEL versions, from 2.1 to 7, but only displays the version selected on the left. Luckily, wget downloads the files from top to bottom, and the RHEL 7 links are at the top. I just watched the downloaded files and hit Ctrl+C as soon as all the documents I needed had arrived.
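If you want to see up front which PDF links the page actually exposes, you can list them before downloading anything. This is just a sketch: the grep pattern assumes the page source contains absolute links ending in .pdf, so adjust it to the real markup if needed:

wget -qO- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/ | grep -oE 'https?://[^"]+\.pdf' | sort -u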

With “wget”, this is a simple task:

wget -e robots=off -np -nd -A ".pdf" -r -l1 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/

Options used:

  • -e robots=off (Ignore robots.txt, because some sites use it to deny fetching the whole site)
  • -np (Don’t ascend to the parent directory when recursing)
  • -nd (Save all files in the current directory instead of recreating the site’s directory hierarchy)
  • -A ".pdf" (Only accept files with the .pdf extension)
  • -r (Recursive download)
  • -l1 (Follow links only one level deep)
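To avoid the Ctrl+C dance from the note above, newer wget versions (1.14 and later, if I recall correctly) also offer --accept-regex, which filters on the complete URL instead of just the file name. The following is only a sketch and assumes the RHEL 7 documents carry "/7/" somewhere in their URL path, so verify that against the real links first:

wget -e robots=off -np -nd -A ".pdf" --accept-regex '.*/7/.*\.pdf' -r -l1 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/

Because --accept-regex is matched against the whole URL, it can filter on the version directory, which -A alone cannot do, since -A only looks at the file name.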