How can I alter my code so it also downloads PDFs linked from inside other PDFs?

Keywords: python pdf beautifulsoup


I need to write something that takes in a URL or a PDF and then downloads all the PDFs linked on that page. So far it works when I enter a web page, but it doesn't work when the input is a PDF. I have very little Python experience, and I realize this is because BeautifulSoup only parses HTML and XML. Is there something that does the same job for PDFs?
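To illustrate the kind of thing being asked for, here is a minimal standard-library sketch, separate from the script below. It assumes the PDF stores its link annotations as uncompressed `/URI (...)` entries; PDFs that pack their objects into compressed streams will return nothing, and a dedicated PDF-parsing library would be needed for those. The names `is_pdf` and `find_pdf_links` are hypothetical helpers invented for this example.

```python
import re

# Assumption: link targets appear in the raw file as "/URI (http://...)".
# This holds for uncompressed annotation dictionaries only.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def is_pdf(data):
    """Return True if the bytes look like a PDF (header near the start)."""
    return b"%PDF-" in data[:1024]


def find_pdf_links(data):
    """Return the unique link targets in the PDF bytes that end in .pdf."""
    links = []
    for match in URI_PATTERN.finditer(data):
        # Undo backslash escapes such as \( and \) inside the PDF string.
        uri = re.sub(rb"\\(.)", rb"\1", match.group(1)).decode("latin-1")
        # Ignore any query string when checking the extension.
        path = uri.split("?")[0].lower()
        if path.endswith(".pdf") and uri not in links:
            links.append(uri)
    return links
```

With `requests`, the bytes would come from `requests.get(url).content`; if `is_pdf` returns true for them, the links from `find_pdf_links` could be downloaded the same way as the HTML links, otherwise the BeautifulSoup path applies.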

import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = input("Please enter the URL ")
folder_location = input("Please enter the folder location (e.g. C:\\ExampleFolder) ")

# If there is no such folder, the script will create one automatically
if not os.path.exists(folder_location):
    os.mkdir(folder_location)

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.select("a[href$='.pdf']"):
    # Name each PDF file after the last portion of its link, which is unique in this case
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f: