How to filter tags using classes in Python and BeautifulSoup?

m4rk_Henry_ftw

I'm trying to scrape images from a website using BeautifulSoup with its HTML parser.

Every image on this site has two kinds of image tags: one for the thumbnail and one for the larger image, which only shows up when the thumbnail is clicked and expanded. The larger image tags carry a class="expanded-thumb" attribute.

I'm trying to parse the HTML and get the "src" attribute of each expanded image tag, which holds the URL of the image.

When I run the code, nothing happens. The process just finishes without downloading any images. However, when I don't filter by class and just pass the tag name as a parameter, it downloads all the thumbnails.

Here is my code:

import webbrowser, requests, os
from bs4 import BeautifulSoup

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata('https://boards.4chan.org/a/thread/30814')
soup = BeautifulSoup(htmldata, 'html.parser')

list = []

for i in soup.find_all("img",{"class":"expanded-thumb"}):
    list.append(i['src'].replace("//","https://"))

def download(url, pathname):
    if not os.path.isdir(pathname):
        os.makedirs(pathname)

    filename = os.path.join(pathname, url.split("/")[-1])
    response = requests.get(url, stream=True)

    with open(filename, "wb") as f:
        f.write(response.content)

for a in list:
    download(a,"file")

ludwig vespers

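Using "list" as a variable name can cause problems, because it shadows Python's built-in list type. Also note that requests only downloads the page's static HTML and does not run JavaScript, so tags that appear only after a thumbnail is clicked may not be in the parsed document at all. Start with this (replace TEST_4CHAN_URL with whatever thread you want), which reads the full-size link from the href of each "a" tag with class "fileThumb", and combine it with the suggestions in the comments above.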

import requests
from bs4 import BeautifulSoup

TEST_4CHAN_URL = "https://boards.4chan.org/a/thread/<INSERT_THREAD_ID_HERE>"

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata(TEST_4CHAN_URL)
soup = BeautifulSoup(htmldata, "html.parser")

src_list = []

for i in soup.find_all("a", {"class":"fileThumb"}):
    src_list.append(i['href'].replace("//", "https://"))

print(src_list)
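
To try the same idea without touching the network, here is a minimal sketch that runs against an inline HTML sample. The markup, the i.example.org host, the base_url parameter and the helper name extract_full_size_urls are all assumptions made up for illustration, not taken from the real site. It also swaps the string replace for urllib.parse.urljoin, which resolves protocol-relative links ("//host/path") without changing URLs that already carry a scheme.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration): each thumbnail
# <img> sits inside an <a class="fileThumb"> whose href is the full-size file.
SAMPLE_HTML = """
<div class="file">
  <a class="fileThumb" href="//i.example.org/a/1001.jpg">
    <img src="//i.example.org/a/1001s.jpg" alt="thumb">
  </a>
</div>
<div class="file">
  <a class="fileThumb" href="//i.example.org/a/1002.png">
    <img src="//i.example.org/a/1002s.jpg" alt="thumb">
  </a>
</div>
"""


def extract_full_size_urls(html, base_url="https://example.org/"):
    """Return absolute URLs taken from the href of every <a class="fileThumb">."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", class_="fileThumb"):
        href = link.get("href")
        if href:
            # urljoin borrows the scheme of base_url for "//host/path" links.
            urls.append(urljoin(base_url, href))
    return urls


src_list = extract_full_size_urls(SAMPLE_HTML)

# The class filter from the question finds nothing in static markup like this,
# because no <img class="expanded-thumb"> exists until a thumbnail is clicked.
expanded = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all(
    "img", class_="expanded-thumb"
)

print(src_list)
print(len(expanded))
```

On a real thread you would pass the downloaded HTML and the thread URL as base_url. Whether the live markup still matches this structure is something to check in the page source first.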

How to get JavaScript variables from script tags using Python and Beautifulsoup

Frank Diga Como Gnar I want to return the "id" value from the variable meta using BeautifulSoup and Python. Is it possible? Also, I don't know how to find the "script" tag that contains the meta variable, since it doesn't have a unique identifier, as well as many oth

How to parse text from multiple body tags using BeautifulSoup in Python?

Edison Chan I want to parse this website: https://www.flyingv.cc/project/3724 I want to get information from the HTML source code, such as the value 2830 in: <span class="sharenumber" id="fb_share_span">2830</span> However, when I use BeautifulSoup to extract t

How to extract parent element's tags in xml using python and BeautifulSoup

Akash Rathor For example I have xml like this <managedObject class="New" distName="MB-85404/TB-85404/ST-4/a" version="xL20A_1911_002" operation="open"> <p name="a">320ms</p> <p name="b">enabled</p> <p name="c">640ms</p>

How to find all anchor tags in a div using Beautifulsoup in Python

Deschmitz This is how my parsed HTML looks. It's all in one table and it's repeated multiple times. I just want the href attribute value inside the div with class="Special_Div_Name". All these divs are inside table rows, and there are many

How to get the text of nested tags using Beautifulsoup in Python?

Jatin Serra After running this code: section = soup.find_all('section', class_='b-branches') I get <div class="b-branches__item"><i class="icon fa"><b>Firm</b> </i>RJT Roadlines</div> Now I just want to extract RJT Roadlines, not Firm, so I tried for

Extract text from multiple tags such as h1 and p tags with classes using BeautifulSoup and Python

Kamikaze_goldfish I've figured out how to extract text from the itemprop, but I can't get the text out of <div class="someclass">Extract This Text Here!</div> I've only pasted the part of my code that doesn't work, but will paste the whole thing if needed. I've set

How to find all tags matching two values using BeautifulSoup in Python

Giangio I want to get the values of all the span class="last value" sections. However, sometimes the sections vary slightly, e.g. span class="last value empty", and my code skips those. I want to get all the sections that start with "last value", with span class=

How to remove html tags from string in Python using BeautifulSoup

username New to programming here :) I want to print prices from a website using BeautifulSoup. Here is my code: #!/usr/bin/env python # -*- coding: utf-8 -*- from bs4 import BeautifulSoup, SoupStrainer from urllib2 import urlopen url = "Some retailer's url"
