
How To Webscrape All Shoes On Nike Page Using Python

I am trying to webscrape all the shoes on https://www.nike.com/w/mens-shoes-nik1zy7ok. How do I scrape every shoe, including the ones that load lazily as you scroll down the page?

Solution 1:

By examining the API calls made by the website, you can find a cryptic URL starting with https://api.nike.com/. The same URL stub is also stored in the INITIAL_REDUX_STATE that you already used to get the first couple of products, so I simply extend your approach:

import requests
import json
import re

# your product page
uri = 'https://www.nike.com/w/mens-shoes-nik1zy7ok'

base_url = 'https://api.nike.com'
session = requests.Session()

def get_lazy_products(stub, products):
    """Recursively collect the lazily loaded products."""
    response = session.get(base_url + stub).json()
    # the API returns a stub URL for the next page, or an empty value on the last page
    next_page = response['pages']['next']
    products += response['objects']
    if next_page:
        get_lazy_products(next_page, products)
    return products

# find INITIAL_REDUX_STATE
html_data = session.get(uri).text
redux = json.loads(re.search(r'window\.INITIAL_REDUX_STATE=(\{.*?\});', html_data).group(1))

# find the API entry point for the recursive loading of products,
# resetting the anchor so we start from the very first page
wall = redux['Wall']
initial_products = re.sub('anchor=[0-9]+', 'anchor=0', wall['pageData']['next'])

# find all the products
products = get_lazy_products(initial_products, [])

# Optional: filter by id to get a list of unique products
cloudProductIds = set()
unique_products = []
for product in products:
    try:
        if product['id'] not in cloudProductIds:
            cloudProductIds.add(product['id'])
            unique_products.append(product)
    except KeyError:
        print(product)
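If you want to turn the deduplicated list into something tabular, you can pull a few fields per product. A minimal sketch; note that apart from `id`, the field names (e.g. `title`) are assumptions about the API payload, not confirmed by the answer above:

```python
def summarize(products, fields=('id', 'title')):
    """Extract the selected fields from each product dict, skipping missing keys."""
    return [{f: p[f] for f in fields if f in p} for p in products]

# sample payload shaped like the API objects (field names assumed)
sample = [{'id': 'a1', 'title': 'Air Max'}, {'id': 'b2'}]
print(summarize(sample))
# → [{'id': 'a1', 'title': 'Air Max'}, {'id': 'b2'}]
```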

The API also returns the total number of products, though this number seems to vary and depend on the count parameter in the API's URL.
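If you want to experiment with that count parameter, you can rewrite it in the stub URL before passing it to get_lazy_products. A sketch using only the standard library; the example path is a hypothetical stand-in for the real stub, and the valid range of count values is an assumption:

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def set_count(stub, count):
    """Rewrite the count query parameter in an API stub URL."""
    parts = urlparse(stub)
    query = parse_qs(parts.query)
    query['count'] = [str(count)]
    # re-encode the query string, preserving the other parameters
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# hypothetical stub path for illustration
print(set_count('/product_feed/rollup_threads/v2?anchor=0&count=24', 60))
# → /product_feed/rollup_threads/v2?anchor=0&count=60
```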

Do you need help parsing or aggregating the results?

