
Scrape.js — Web Scraping Library for Node.js

Scrape.js


Scrape.js is an easy-to-use web scraping library for Node.js.

const data = await scrape("https://example.com");
// { url, html }

Features

Install

Install Scrape.js from NPM:

npm install @themaximalist/scrape.js

Config

Scrape.js uses ZenRows for proxy rotation. To use it, acquire a ZenRows API key and set it in the ZENROWS_API_KEY environment variable.

ZENROWS_API_KEY=abcxyz123

Scrape.js can be used without proxies, but it is less effective.
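As a rough sketch of how the key and the proxy option might fit together, the snippet below only enables proxy rotation when ZENROWS_API_KEY is present. The helper name buildScrapeOptions and the main wrapper are hypothetical and not part of Scrape.js; only the headless and proxy options come from the documentation on this page, and how the library behaves when proxy is true without a key is not stated here.

```javascript
// Hypothetical helper (not part of Scrape.js): choose scrape() options
// depending on whether a ZenRows API key is configured.
function buildScrapeOptions(env = process.env) {
    const hasProxyKey = Boolean(env.ZENROWS_API_KEY);
    return {
        headless: true,     // option documented on this page
        proxy: hasProxyKey, // only request proxy rotation when a key exists
    };
}

// Example wiring; the library is required lazily so the helper above
// can be used and tested without it installed.
async function main() {
    const scrape = require("@themaximalist/scrape.js");
    const { url, html } = await scrape("https://example.com", buildScrapeOptions());
    console.log(url, html.length);
}
```

Call main() from your own entry point once the package is installed.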

Usage

Using Scrape.js is as simple as calling a function with a website URL.

const scrape = require("@themaximalist/scrape.js");

(async () => {
    const data = await scrape("http://example.com"); // { url, html }
})();

You can specify additional options to scrape() for more control:

const data = await scrape("https://example.com", {
    headless: true,
    proxy: true
});
// { url, html }

API

The Scrape.js API is a single function that takes a URL and an optional config object.

await scrape(
    url, // URL to scrape
    {
        headless: true, // Use JavaScript headless scraping
        proxy: true, // Use proxy rotation
        method: "GET", // HTTP Request method
        timeout: 3000, // Scrape timeout in ms
        userAgent: "Mozilla/5.0...", // User Agent
    }
);
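If you call scrape() in several places, one possible pattern is to keep a shared base config and override individual options per call. The sketch below assumes the option names from the signature above; baseOptions and optionsFor are hypothetical names, and the values are illustrative rather than confirmed library defaults.

```javascript
// Shared settings; option names come from the signature above.
// The values are illustrative, not confirmed defaults of Scrape.js.
const baseOptions = {
    headless: true,
    proxy: true,
    method: "GET",
    timeout: 3000, // ms
};

// Hypothetical helper: merge per-call overrides over the shared settings.
function optionsFor(overrides = {}) {
    return { ...baseOptions, ...overrides };
}

// Usage (assuming scrape has been required as shown in Usage):
// const data = await scrape("https://example.com", optionsFor({ proxy: false }));
```

Because the spread creates a new object on every call, overrides for one request do not leak into the next.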

URL (required)

Options

Response

Scrape.js returns an object containing the final url and html content.

const { url, html } = await scrape("https://example.com");
console.log(url); // https://example.com/
console.log(html); // <html...

The Scrape.js API is a simple way to fetch the HTML of a website.
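Once you have the html string, you can process it however you like. As a minimal illustration, the hypothetical helper below pulls the page title out with a regular expression. It is not part of Scrape.js, and a regex is only a rough approach; a real HTML parser is the safer choice for anything beyond a quick check.

```javascript
// Hypothetical helper (not part of Scrape.js): extract the <title> text
// from an HTML string, or return null when no title element is found.
function extractTitle(html) {
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    return match ? match[1].trim() : null;
}

// Usage (assuming scrape has been required as shown in Usage):
// const { html } = await scrape("https://example.com");
// console.log(extractTitle(html));
```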

Debug

Scrape.js uses the debug npm module with the scrape.js namespace.

View debug logs by setting the DEBUG environment variable.

> DEBUG=scrape.js* node src/get_website_html.js
# debug logs

Examples

View the tests for examples of how to use Scrape.js.

Projects

Scrape.js is currently used in the following projects:

License

MIT

Author

Created by The Maximalist, see our open-source projects.