Category: web-scraping

Here is my code. Basically, what I want to do is take the HTML and parse it to get the content.

    async function main() {
      const browser = await puppeteer.launch({
        headless: false,
        executablePath: EXECUTABLE_PATH,
        devtools: true,
        timeout: 50000,
      });
      const page = await browser.newPage();
      await page.goto(URL);
      // await page.screenshot({ path: "screenshot.png", fullPage: true });
      ..

I’ve seen examples of selectors like the one below:

    await page.waitForXPath("(//span[@class='CountTitle-number'])[1]");
    let elHandle = await page.$x("(//span[@class='CountTitle-number'])[1]");
    let lamudiNewPropertyCount = await page.evaluate(el => el.textContent, elHandle[0]);

But it doesn’t seem to work on long paths. Taking the same example as above:

    await page.waitForXPath("//div[1]/div/div[1]/div/div[3]");
    let elHandle = await page.$x("//div[1]/div/div[1]/div/div[3]");
    let lamudiNewPropertyCount = await page.evaluate(el => el.textContent, elHandle[0]);

Appreciate ..
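One thing that may be relevant here (an assumption, since the page markup is not shown): in XPath, `//div[1]` does not mean "the first div in the document". The `[1]` is applied per parent, so it matches every div that is the first div child of its parent. The short Python sketch below illustrates this with the standard library's ElementTree, which supports the same positional predicate; the sample document and its ids are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented only to show how [1] is evaluated.
doc = ET.fromstring(
    """<body>
  <div id="a">
    <div id="a1"/>
    <div id="a2"/>
  </div>
  <div id="b">
    <div id="b1"/>
  </div>
</body>"""
)

# Every div that is the FIRST div child of its own parent,
# at any depth: not just the first div in the document.
first_divs = [el.get("id") for el in doc.findall(".//div[1]")]

# Each extra step fans out from all of those matches.
children_of_first_divs = [el.get("id") for el in doc.findall(".//div[1]/div")]

print(first_divs)
print(children_of_first_divs)
```

Because each positional step can match in several places, a long path such as the one above may select a different node than expected, or none at all, depending on the real markup.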

I am making test requests to a website through proxies to determine which one performs best. The problem is that a request sometimes hangs or takes a very long time to respond, even though I have set a timeout on the request. notPingedProxies is an array of proxies.

    let request = require('request-promise')

    function goodProxies (notPingedProxies) {
      ..

Traditionally, I use BeautifulSoup to parse line by line. That doesn't seem to work in this case and just prints blanks for me. I want the link and the title of each job posting.

    from bs4 import BeautifulSoup
    import requests
    import time

    url = 'https://oysterpointrx.com/careers/'
    r = requests.get(url)
    time.sleep(4)
    soup = BeautifulSoup(r.content, 'html.parser')
    content = soup.find_all('div', class_='opening')
    for item in content:
        print(item.text)

Source: Ask ..
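A minimal sketch of pulling the link and title out of each posting, assuming (hypothetically) that every div with class `opening` wraps an `<a>` element. The sample HTML, the job titles, and the names `OpeningParser` and `extract_postings` are invented for illustration. It uses the standard library's `html.parser` instead of BeautifulSoup only to stay self-contained.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page may be structured differently.
SAMPLE_HTML = """
<div class="opening"><a href="/careers/clinical-scientist">Clinical Scientist</a></div>
<div class="opening"><a href="/careers/sales-rep">Sales Representative</a></div>
<div class="other"><a href="/about">About</a></div>
"""


class OpeningParser(HTMLParser):
    """Collects (title, href) pairs for links inside <div class="opening">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside an "opening" div (0 = outside)
        self.href = None    # href of the <a> currently being read, if any
        self.text = []      # text fragments of that <a>
        self.postings = []  # collected (title, href) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth or "opening" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.href = attrs["href"]
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            title = " ".join("".join(self.text).split())  # collapse whitespace
            self.postings.append((title, self.href))
            self.href = None
        elif tag == "div" and self.depth:
            self.depth -= 1


def extract_postings(html):
    parser = OpeningParser()
    parser.feed(html)
    return parser.postings


postings = extract_postings(SAMPLE_HTML)
for title, href in postings:
    print(title, href)
```

If the real page fills in the postings with JavaScript after load, the HTML returned by `requests.get` would not contain them, and any parser would come back empty no matter how long the script sleeps.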

The link below points to the injury history of a specific player. My problem is that, from the last cell of each row in the JavaScript table, I want to get the clubs the player played for via the "img" elements.

https://www.transfermarkt.ch/emre-can/verletzungen/spieler/119296

    team_list = []
    url = 'https://www.transfermarkt.ch/emre-can/verletzungen/spieler/119296/ajax/yw1/page1'
    response = requests.get(url, headers={'User-Agent': 'Custom5'})
    injury_data = response.text
    soup = ..
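A rough sketch of reading names off the `img` elements in the last cell of each table row. It assumes, without having seen the real response, that each row is a `<tr>` of `<td>` cells and that every image carries the club name in its `alt` (or `title`) attribute. The sample rows, the club names, and the class `LastCellImageParser` are hypothetical, and the standard library's `html.parser` is used only to keep the example self-contained.

```python
from html.parser import HTMLParser

# Hypothetical rows; the real table may differ.
SAMPLE_ROW_HTML = """
<table>
  <tr><td>19/20</td><td>Muscle injury</td><td><img src="a.png" alt="Club A"></td></tr>
  <tr><td>18/19</td><td>Ankle injury</td><td><img src="b.png" alt="Club B"><img src="c.png" alt="Club C"></td></tr>
</table>
"""


class LastCellImageParser(HTMLParser):
    """For each table row, keeps the image names found in its last <td>."""

    def __init__(self):
        super().__init__()
        self.cell_images = None  # image names in the current <td>, None outside a cell
        self.row_cells = []      # one list of image names per <td> of the current row
        self.team_list = []      # result: one list of names per row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row_cells = []
        elif tag == "td":
            self.cell_images = []
        elif tag == "img" and self.cell_images is not None:
            attrs = dict(attrs)
            name = attrs.get("alt") or attrs.get("title")  # assumed location of the name
            if name:
                self.cell_images.append(name)

    def handle_endtag(self, tag):
        if tag == "td" and self.cell_images is not None:
            self.row_cells.append(self.cell_images)
            self.cell_images = None
        elif tag == "tr" and self.row_cells:
            self.team_list.append(self.row_cells[-1])  # last cell of the row
            self.row_cells = []


parser = LastCellImageParser()
parser.feed(SAMPLE_ROW_HTML)
team_list = parser.team_list
print(team_list)
```

Keeping one list per row preserves the case where a single injury row shows more than one club image.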
