Getting all of Alibaba's categories and sub-categories with Puppeteer

  javascript, node.js, puppeteer, web, web-scraping

I am trying to get all the categories and sub-categories from alibaba.com using Puppeteer; the site even has a section listing all the main categories. I have managed to scrape the first two levels, but I am still struggling with the third one, because I need the result to look like this:

Agriculture & Food
  |
  |---Agriculture
       | 
       |-blabla   
       |-lbalbal
       |-balblab
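
For illustration, a layout like the one above could be produced from nested data with a small formatter. This is only a sketch: the object shape (`title`, `subs`, `leaves`) and the function name `renderTree` are hypothetical names chosen for the example, not anything the site or Puppeteer provides.

```javascript
// Hypothetical input shape: [{ title, subs: [{ title, leaves: [string] }] }]
function renderTree(categories) {
  const lines = [];
  for (const cat of categories) {
    // Top-level category, followed by the vertical connector
    lines.push(cat.title, '  |');
    for (const sub of cat.subs) {
      // Second-level heading and its connector
      lines.push('  |---' + sub.title, '       |');
      // Third-level entries, one per line
      for (const leaf of sub.leaves) {
        lines.push('       |-' + leaf);
      }
    }
    // Blank line between top-level categories
    lines.push('');
  }
  return lines.join('\n');
}
```

The returned string can then be written out in one go, for example with `fs.writeFileSync('data.txt', renderTree(tree))`, where `tree` is whatever nested array the scraping step ends up producing.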

So my plan was to get the first-level headers with Puppeteer, then the second-level ones, and then the "blabla" items. Here is the code I used for that:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()

    await page.tracing.start({
      path: 'trace.json',
      categories: ['devtools.timeline']
    })

    await page.goto('https://www.alibaba.com/Products?spm=a2700.8293689-tr_TR.0.0.50b8201b0yMd8O&cd=buyhome&tracelog=footer_categories')

    const data1 = await page.$$eval('.item.util-clearfix h3', anchors => {return anchors.map(anchor => anchor.textContent)}); //header1

    const data2 = await page.$$eval('.item.util-clearfix .sub-item .sub-title a', anchors => {return anchors.map(anchor => anchor.textContent)});//header2

    const data3 = await page.$$eval('.item.util-clearfix .sub-item .sub-item-cont.util-clearfix', anchors => {
      return anchors.map(anchor=>{
        for(var i = 0;i<=anchor.length;i++){
          for(var ii= 0;ii<=anchor[i].children.length;ii++){return anchor[i].children[ii].children[0].textContent};
        }
      });
    });


    await page.tracing.stop();
    await browser.close();

    // `datas` is meant to hold the merged output of data1, data2 and data3; it is not built yet
    fs.appendFile('data.txt', datas, (err) => {
      if (err) throw err;
      console.log('file saved');
    });
})();

By the way, my final plan was to merge them, taking one part from each with some algorithm, and write the result to a file with the fs module. But the data3 part is just not working and I don't know why 🙁. If you know a better way to solve this, or a better way to get all the categories, please help.
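
One likely reason data3 comes back empty: inside the `$$eval` callback, each `anchor` is a single element, not a list, so `anchor.length` is `undefined`, the comparison `0 <= undefined` is false, and the outer loop never runs. Even if it did, the `return` inside the inner loop would exit on the first iteration, and `<=` would step one past the last child. A possible alternative, sketched below, is to select each top-level `.item.util-clearfix` block once and walk down from there, so the three levels stay attached to each other and nothing has to be merged afterwards. The selectors are taken from the question; the assumption that every third-level entry is an `a` inside `.sub-item-cont`, and the names `extractCategories` and `scrapeCategories`, are mine and have not been checked against the live page.

```javascript
// Runs in the page context via page.$$eval; must not use outer variables.
// Returns one object per top-level category block.
function extractCategories(items) {
  return items.map(item => {
    const heading = item.querySelector('h3');
    return {
      title: heading ? heading.textContent.trim() : '',
      subs: Array.from(item.querySelectorAll('.sub-item')).map(sub => {
        const subTitle = sub.querySelector('.sub-title a');
        return {
          title: subTitle ? subTitle.textContent.trim() : '',
          // Assumption: third-level entries are links inside .sub-item-cont
          leaves: Array.from(sub.querySelectorAll('.sub-item-cont a'))
            .map(a => a.textContent.trim()),
        };
      }),
    };
  });
}

// Opens the page and returns the nested category data.
async function scrapeCategories(url) {
  const puppeteer = require('puppeteer'); // loaded only when scraping
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.$$eval('.item.util-clearfix', extractCategories);
  } finally {
    // Close the browser even if navigation or extraction throws
    await browser.close();
  }
}
```

Calling `scrapeCategories` with the URL from the question should resolve to an array of `{ title, subs }` objects, which can be saved with `JSON.stringify` or formatted into the tree layout before writing it to a file.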

Source: Ask Javascript Questions
