Learn how to extract data from a Shopify ecommerce website using Python, requests, and the products.json file, or opt for an easier method and retrieve a store’s entire product inventory with just a couple of lines of code.

Below are a few suggestions for conducting web scraping responsibly:

  • Obtain authorization before scraping a website.
  • Carefully read and comprehend the website’s terms of service and robots.txt file.
  • Restrict the frequency of your scraping.
  • Employ web scraping tools that comply with the terms of service of website owners.

With the rise of JavaScript and JSON in modern website development, it’s possible to locate public APIs hidden within the page code. These APIs provide structured data that can be easily scraped without the need for custom HTML scrapers for each website.

Shopify ecommerce sites are a prime example of this technique. They use JavaScript to construct their front-end pages and rely on a public-facing API to access the required data, which is served in JSON format. One of these APIs, known as products.json, can be found at the root of every Shopify site.

First, identify a website that uses Shopify as its ecommerce platform.
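One quick, informal way to check is to request /products.json directly and see whether the store answers with JSON. The sketch below is only a heuristic and assumes the requests library; the helper name looks_like_shopify is purely illustrative.

import requests

def looks_like_shopify(store_url):
    # Rough heuristic: Shopify stores serve their catalogue at /products.json
    try:
        resp = requests.get(store_url.rstrip('/') + '/products.json?limit=1', timeout=10)
        return resp.status_code == 200 and 'products' in resp.json()
    except (requests.RequestException, ValueError):
        return False

print(looks_like_shopify('https://novis.com.au'))  # the store scraped later in this post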

The products.json file holds information about the entire catalogue of a Shopify site, including product names, IDs, SKUs, URLs, images, prices, descriptions, and many other values. This project aims to demonstrate how to develop a web scraper that accesses products.json and exports the entire product catalogue to a Pandas dataframe.
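As a preview of that end result, here is a minimal sketch that loads products.json straight into a DataFrame, assuming the requests and pandas libraries are installed (the store URL is the one scraped later in this post):

import requests
import pandas as pd

# Fetch the catalogue and flatten the list of product dicts into a DataFrame
data = requests.get('https://novis.com.au/products.json?limit=1000', timeout=10).json()
df = pd.json_normalize(data['products'])
print(df[['id', 'title', 'vendor', 'product_type']].head())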

How to extract the product data (products.json) from a Shopify website?

[website]/products.json?limit=1000

 

To get the product data from the store, we simply append /products.json?limit={the number you want} to the store URL. The product data will be returned in JSON format.

If we want to view the data in a more readable format in a browser, we can open the URL in Firefox and then select ‘Pretty Print’ to read the formatted JSON easily.

Click on ‘Raw Data’ and then click on ‘Pretty Print’
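If you prefer to stay in Python, the same pretty-printed view can be produced with the standard json module (a small illustrative snippet, again assuming requests):

import json
import requests

# Fetch a few products and pretty-print the raw JSON
data = requests.get('https://novis.com.au/products.json?limit=5', timeout=10).json()
print(json.dumps(data, indent=2)[:1000])  # only the first part, to keep the output short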

I used Python to scrape the data.

Import the packages

I imported the json, pandas, and requests libraries, together with asyncio, aiohttp, and openpyxl, which the scraper below uses for the asynchronous request and the spreadsheet export.

import pandas as pd   # handy if you prefer exporting to a DataFrame/CSV
import json           # standard-library JSON handling
import requests       # simple synchronous alternative to aiohttp
import asyncio        # runs the async scraper below
import aiohttp        # asynchronous HTTP client used for the request
import openpyxl       # writes the results to an .xlsx spreadsheet

async def request():
    # Fetch the full product catalogue as JSON
    async with aiohttp.ClientSession() as session:
        async with session.get(url='https://novis.com.au/products.json?limit=1900') as resp:
            html = await resp.json()

    # Prepare the spreadsheet and its header row
    f = openpyxl.Workbook()
    sheet = f.active
    sheet.append(['Name', 'Barcode', 'Product Category', 'Image', 'Internal Reference',
                  'Sales Price', 'Product Tags', 'Option', 'Vendor', 'Description'])
    products = []

  • ‘id’ is used here as the product barcode.
  • ‘title’ is the product name.
  • ‘body_html’ is the product description.
  • ‘vendor’ is the company the product comes from.
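To confirm which fields a given store actually exposes, you can peek at the keys of the first product (an illustrative check, independent of the scraper above):

import requests

# Inspect the first product to see which of the fields above are present
first = requests.get('https://novis.com.au/products.json?limit=1', timeout=10).json()['products'][0]
print(sorted(first.keys()))
# typically includes: body_html, handle, id, images, options, product_type, tags, title, variants, vendor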

We save the results to a spreadsheet file.


    # (continuing inside request())
    print("Saving to excel ...")
    for i in html['products']:
        title = i.get('title')
        id1 = i.get('id')                              # product id, used here as the barcode
        product_type = i.get('product_type')
        vendor = i.get('vendor')
        description = i.get('body_html')
        images = ','.join(img.get('src', '') for img in i.get('images', []))
        variant = (i.get('variants') or [{}])[0]       # the first variant carries the SKU and price
        sku = variant.get('sku')
        price = variant.get('price')
        options = ','.join(opt.get('name', '') for opt in i.get('options', []))
        refine = ','.join(map(str, i.get('tags', []))) # tags come back as a list of strings
        products.append((title, id1, product_type, images, vendor, description))
        sheet.append([title, id1, product_type, images, sku, price, refine, options, vendor, description])
    f.save('products.xlsx')
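Because request() is a coroutine, nothing actually runs until it is handed to an event loop. A minimal way to invoke it (assuming Python 3.7+ and the fragments above assembled into a single script) is:

# Drive the coroutine: this performs the request and writes products.xlsx
asyncio.run(request())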

Here is the result, exported to a spreadsheet file!
