Datasets API
Shopee

Introduction

Data API for Shopee retrieves publicly available data from multiple Shopee domains. It can return pre-scraped data or scrape data on demand via the /scrape endpoint. Pre-scraped data is updated on a regular basis. The API returns clean, structured JSON. Pre-scraped data can be retrieved using pagination, by downloading the whole dataset via provided links, or by having it uploaded to your cloud storage. On-demand data can be uploaded to your cloud storage in JSONL format or downloaded via provided links.

On-demand data jobs can take up to 24 hours to complete, depending on the dataset size and scraping complexity.

On-Demand Data

/scrape

curl -X POST "https://api.datasets.oxylabs.io/v1/shopee/scrape?api_key=<API_KEY>&domain=mx" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset": "searches",
    "keywords": ["shoes", "sneakers"],
    "destination": {
      "target": "GCP",
      "bucket": "bucket_name",
      "prefix": "folder_1/data"
    }
  }'

Sample response:

{
  "job_id": "cc7db3d134b6-d1bdf0c7-d038-470e-aa3b",
  "status": "Accepted"
}

If pre-scraped data is not available or is not up to date for your use case, you can request up-to-date data using the /scrape endpoint. We will scrape the data and upload it to your cloud storage in JSONL format.

Note: The data uploaded to your bucket will be gzipped.
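
Because the uploaded files are gzipped JSONL, each one has to be decompressed before it can be read line by line. The following is a minimal sketch, assuming a file has already been copied from your bucket to local disk. The helper names, the file name part-0000.jsonl.gz, and the record fields are hypothetical placeholders, not the API's actual output layout or schema.

```shell
# Hypothetical helper: count the JSON records in a gzipped JSONL file.
# Each line of the decompressed stream is one JSON object.
count_records() {
  gunzip -c "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: print the first N records (default 5)
# without unpacking the file to disk.
peek_records() {
  gunzip -c "$1" | head -n "${2:-5}"
}

# Demo with a stand-in file; the "id" field is made up for illustration.
sample_file="$(mktemp -d)/part-0000.jsonl.gz"
printf '%s\n' '{"id": 1}' '{"id": 2}' '{"id": 3}' | gzip -c > "$sample_file"

count_records "$sample_file"
peek_records "$sample_file" 1
```

Streaming through gunzip -c avoids writing a decompressed copy, which can matter for large datasets.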

Query Parameters

| Parameter | Description | Required |
|---|---|---|
| domain | Shopee domain. Available values: mx, sg, br, cl, th, tw, ph, my, vn, co, id. IMPORTANT: This parameter goes in the URL, while other parameters go in the body. | Yes |
| dataset | Dataset type. Available values: searches, category_products (for category view), products, seller_products. | Yes |
| keywords | Keywords to search for (for example: shoes, sneakers). Type: array. | If dataset is searches |
| categories | Category IDs as a full path (for example: 834756.8349548.837124). Type: array. | If dataset is category_products |
| products | Shop ID and product ID combined as shop_id.product_id (for example: 123456.789012). Type: array. | If dataset is products |
| sellers | Seller identifier (for example: myshop1). Type: array. | If dataset is seller_products |
| destination | Your S3 or GCP bucket details where the dataset will be uploaded. | No |
| destination.target | Target cloud storage. Available values: GCP, AWS. | No |
| destination.bucket | Bucket name. | No |
| destination.prefix | Folder prefix. The dataset in JSON will be put in the prefix/json/ folder. | No |
| data_schema | Instead of returning data in our default schema, we can adapt the output to match your needs. Contact your account manager to learn more. | No |
| html | true or false. If true, the HTML files are returned alongside the dataset in JSONL format. | No |
| discovery | true or false. If true, the API performs deep product discovery to find more products than the ones visible in the default pagination. | No |
| max_pages | Maximum number of pages to scrape. By default, all visible pages are scraped. | No |
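
Each dataset type expects its input list under a different body field. The sketch below shows one way to build the request body for any of the four dataset types. The helper names input_field_for and build_body are hypothetical, and the sketch performs no JSON escaping, so it assumes the values contain no quotes or backslashes.

```shell
# Map a dataset type to the body field that carries its input list.
input_field_for() {
  case "$1" in
    searches)          echo "keywords" ;;
    category_products) echo "categories" ;;
    products)          echo "products" ;;
    seller_products)   echo "sellers" ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Build a JSON request body: build_body <dataset> <value> [<value> ...]
# Assumption: values contain no quotes or backslashes (nothing is escaped).
build_body() {
  dataset="$1"; shift
  field="$(input_field_for "$dataset")" || return 1
  items=""
  for value in "$@"; do
    items="${items:+$items, }\"$value\""
  done
  printf '{"dataset": "%s", "%s": [%s]}\n' "$dataset" "$field" "$items"
}

build_body products "123456.789012"
build_body seller_products "myshop1"

# The result can be passed to curl as the request body, for example:
#   curl -X POST "https://api.datasets.oxylabs.io/v1/shopee/scrape?api_key=<API_KEY>&domain=mx" \
#     -H "Content-Type: application/json" \
#     -d "$(build_body searches shoes sneakers)"
```

Optional fields such as destination or max_pages are left out here to keep the sketch short. A JSON-aware tool is a safer choice once values may contain special characters.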

/jobs

curl "https://api.datasets.oxylabs.io/v1/jobs/<JOB_ID>?api_key=<API_KEY>"

Sample response:

{
  "id": "2ad7e459-a989-4ed1-80cc-123",
  "job_type": "scrape",
  "stage": "FINISHED",
  "status": "Done",
  "request": {
    "dataset": "searches",
    "keywords": [
      "lego"
    ],
    "destination": {
      "target": "GCP",
      "bucket": "bucket_name",
      "prefix": "folder_1/data"
    },
    "target": "shopee",
    "domain": "th"
  },
  "created_at": "2024-01-30T14:34:37.319812",
  "updated_at": "2024-01-30T15:01:32.681752",
  "results": "gs://database-api/data/shopee/searches/dt=2024-10-30/run_id=5d6950a20f"
}

The /jobs endpoint lets you find all jobs created with your API key. You can also query a single job by its ID to check whether it has completed.
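
Since a job can run for up to 24 hours, checking it individually usually means polling. The sketch below is one possible approach, not an official client. It assumes that "FINISHED" is the terminal stage, as in the sample response; other stage values are not documented here, so the loop is capped rather than waiting forever. The helper names job_field and wait_for_job are hypothetical, and the field extraction is a grep shortcut that returns the first matching key, so a real JSON parser is more robust.

```shell
# Extract the first string value stored under a given key in a JSON response.
# $1 = JSON text, $2 = key name
job_field() {
  printf '%s\n' "$1" | grep -o "\"$2\": *\"[^\"]*\"" | head -n 1 | cut -d'"' -f4
}

# Poll a job until its stage is FINISHED (assumed terminal stage).
# $1 = job ID, $2 = API key, $3 = seconds between checks (default 1800),
# $4 = maximum number of checks (default 48, about 24 hours in total)
wait_for_job() {
  attempts=0
  while [ "$attempts" -lt "${4:-48}" ]; do
    response="$(curl -s "https://api.datasets.oxylabs.io/v1/jobs/$1?api_key=$2")"
    stage="$(job_field "$response" stage)"
    echo "stage: ${stage:-unknown}"
    [ "$stage" = "FINISHED" ] && return 0
    attempts=$((attempts + 1))
    sleep "${3:-1800}"
  done
  return 1
}

# Offline demo of the parsing step, using values from the sample response.
sample='{"id": "2ad7e459-a989-4ed1-80cc-123", "stage": "FINISHED", "status": "Done"}'
job_field "$sample" stage
job_field "$sample" status
```

Calling wait_for_job "<JOB_ID>" "<API_KEY>" starts polling with the defaults. The function is defined but not invoked here, so the block makes no network requests.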

Usage Stats

/users/stats

curl "https://api.datasets.oxylabs.io/v1/users/stats?api_key=<API_KEY>"

Sample response:

{
  "data": [
    {
      "date": "2024-01-01",
      "credits_used": 3600
    },
    {
      "date": "2024-01-02",
      "credits_used": 7600
    },
    {
      "date": "2024-01-03",
      "credits_used": 720
    }
  ]
}

Query Parameters

| Parameter | Description | Required |
|---|---|---|
| api_key | Your API key | Yes |
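
The daily figures in the response can be added up to get the total for the returned period. The sketch below is a minimal example that assumes credits_used is always a whole number, as in the sample response. The helper name total_credits is hypothetical.

```shell
# Read a /users/stats response on stdin and print the sum of credits_used.
# Assumption: credits_used values are non-negative integers.
total_credits() {
  grep -o '"credits_used": *[0-9]*' | awk -F': *' '{ sum += $2 } END { print sum + 0 }'
}

# Offline demo using the values from the sample response.
sample='{"data": [{"date": "2024-01-01", "credits_used": 3600}, {"date": "2024-01-02", "credits_used": 7600}, {"date": "2024-01-03", "credits_used": 720}]}'
printf '%s\n' "$sample" | total_credits   # prints 11920

# Against the live endpoint, the same helper can be fed from curl:
#   curl -s "https://api.datasets.oxylabs.io/v1/users/stats?api_key=<API_KEY>" | total_credits
```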