mangascraper

Mangascraper is a package for scraping manga. It is a way to retrieve manga from sources that do not offer an API. Mangascraper methods can either return a Promise or, if a callback function is provided, deliver their result to that callback.

Installation
- npm
Sources
Usage
Configuring puppeteer
Examples
API Reference
License

Installation

npm

npm install @specify_/mangascraper

Sources

Currently, mangascraper supports 7 sources, and more will be supported in the future.

Source	Supported?	Uses puppeteer?	Uses axios?
MangaBox	✔️	✔️	✔️
Mangafreak	❌	n/a	n/a
Mangakakalot	✔️	❌	✔️
Manganato	✔️	❌	✔️
Mangahasu	✔️	❌	✔️
Mangaparkv2	✔️	✔️	❌
Mangasee	✔️	✔️	❌
Readmng	✔️	✔️	✔️
Kissmanga	❌	n/a	n/a

If a supported source uses axios, mangascraper will try to use axios as much as possible to save computer resources. If the network request is blocked by Cloudflare, mangascraper will resort to using puppeteer.

If a supported source uses both axios and puppeteer, it means that some methods of that source use axios and others use puppeteer. For example, Readmng uses puppeteer for search(), but uses axios for getMangaMeta() and getPages().
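
The fallback described above can be pictured with a small sketch. This is not mangascraper’s internal code: fetchWithFallback, lightFetch, and heavyFetch are hypothetical names, standing in for an axios-style request and a puppeteer-style page load.

```javascript
// Hypothetical helper, NOT part of mangascraper.
// Tries the cheap fetcher first and only uses the expensive one on failure.
async function fetchWithFallback(url, lightFetch, heavyFetch) {
  try {
    return { html: await lightFetch(url), via: 'light' };
  } catch (err) {
    // A real implementation would likely inspect err (for example, a blocked
    // request) before retrying; this sketch falls back on any failure.
    return { html: await heavyFetch(url), via: 'heavy' };
  }
}

// Demo with stand-in fetchers so the sketch runs without any network access.
const blockedRequest = async () => {
  throw new Error('403 Forbidden');
};
const browserRequest = async (url) => `<html>rendered ${url}</html>`;

fetchWithFallback('https://example.com/manga', blockedRequest, browserRequest)
  .then((result) => console.log(result.via)); // heavy
```

The cheap path is tried first, so the browser is only started when the plain request fails.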

Usage

To start using the package, import a class such as Mangakakalot from the package and use the methods to get mangas from that source.

Here’s an example:

import { Manganato } from '@specify_/mangascraper';
const manganato = new Manganato();
(async () => {
  const mangas = await manganato.search('One Piece');
  const meta = await manganato.getMangaMeta(mangas[0].url);
  console.log(meta.chapters);
})();

which outputs…

[
  {
    name: 'Chapter 1007',
    url: 'https://readmanganato.com/manga-aa951409/chapter-1007',
    views: '730,899',
    uploadDate: 2021-03-12T07:00:00.000Z
  },
  {
    name: 'Chapter 1006',
    url: 'https://readmanganato.com/manga-aa951409/chapter-1006',
    views: '364,964',
    uploadDate: 2021-03-05T07:00:00.000Z
  },
  ... and more items
]
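
In this output, views is a formatted string and uploadDate appears to be a Date object. Here is a minimal sketch of post-processing chapters of that shape, using only the two sample entries above. parseViews is a hypothetical helper, not part of mangascraper.

```javascript
// Sample data copied from the output above (url omitted for brevity).
const chapters = [
  { name: 'Chapter 1007', views: '730,899', uploadDate: new Date('2021-03-12T07:00:00.000Z') },
  { name: 'Chapter 1006', views: '364,964', uploadDate: new Date('2021-03-05T07:00:00.000Z') },
];

// Hypothetical helper: strip thousands separators and convert to a number.
function parseViews(views) {
  return Number(views.replace(/,/g, ''));
}

// Sum the view counts of all chapters.
const totalViews = chapters.reduce((sum, chapter) => sum + parseViews(chapter.views), 0);

// Sort a copy oldest-first; subtracting two Dates compares their timestamps.
const oldestFirst = [...chapters].sort((a, b) => a.uploadDate - b.uploadDate);

console.log(totalViews); // 1095863
console.log(oldestFirst[0].name); // Chapter 1006
```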

Configuring puppeteer

Connecting to an endpoint

If you already have an existing puppeteer endpoint, mangascraper can connect to that endpoint instead and perform faster concurrent operations.

Mangascraper also includes its own puppeteer launch arguments, and it is recommended to use them for scraping to go smoothly.

import puppeteer from 'puppeteer';
import { initPuppeteer, MangaSee } from '@specify_/mangascraper';
(async () => {
  const browser = await puppeteer.launch({ ...initPuppeteer });
  const endpoint = browser.wsEndpoint();
  browser.disconnect();
  const mangasee = new MangaSee({ puppeteerInstance: { instance: 'endpoint', wsEndpoint: endpoint } });
  const mangas = await mangasee.search('Haikyu!');
})();

Since you are using your own puppeteer package, mangascraper cannot make any modifications to the browser, such as including a proxy.

// ❌ Mangascraper cannot include a proxy in a browser it did not launch
const browser = await puppeteer.launch();
const mangapark = new MangaPark({
  proxy: { host: '127.0.0.1', port: 8080 },
  puppeteerInstance: { instance: 'custom', browser },
});

// ✔️ Our own browser instance will launch with a proxy
const proxiedBrowser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
const proxiedMangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser: proxiedBrowser } });

Because mangascraper is connecting to an existing endpoint, you must set all browser arguments outside of mangascraper. See Using an existing puppeteer package for more on this.
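
When you build the launch arguments yourself, the proxy flag can be derived from the same { host, port } object used in the examples above. This is a minimal sketch. proxyArg is a hypothetical helper that belongs to neither mangascraper nor puppeteer.

```javascript
// Hypothetical helper: builds Chromium's --proxy-server flag
// from a { host, port } object.
function proxyArg({ host, port }) {
  return `--proxy-server=${host}:${port}`;
}

// Options object in the shape accepted by puppeteer.launch().
const launchOptions = { args: [proxyArg({ host: '127.0.0.1', port: 8080 })] };

console.log(launchOptions.args); // [ '--proxy-server=127.0.0.1:8080' ]
```

launchOptions can then be passed to puppeteer.launch() before handing the browser (or its endpoint) to mangascraper.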

Overriding mangascraper’s puppeteer launch arguments

If you want to override the launch arguments mangascraper uses, you can pass your own to any manga class, such as MangaSee, as long as you are using the default instance. Any other instance type requires you to supply your own launch options or reuse mangascraper’s by spreading initPuppeteer.

const mangasee = new MangaSee({ puppeteerInstance: { instance: 'default', launch: { ...myCustomLaunchOptions } } });

If you want to include a proxy, mangascraper will automatically put it into the launch arguments.

const mangahasu = new Mangahasu({
  proxy: { host: 'proxy_host', port: 8080 },
  puppeteerInstance: { instance: 'default' },
});

Using an existing puppeteer package

Using an existing puppeteer package in your app lets mangascraper reuse one browser instead of opening a new browser per operation. It also lets mangascraper scrape manga concurrently. This approach puts less load on Chromium and can save a lot of time if you run many scraping operations. It is the best approach if you do not want to connect to an existing endpoint.

However, you must have puppeteer already installed.

This is the most basic setup:

import puppeteer from 'puppeteer';
import { MangaPark, initPuppeteer } from '@specify_/mangascraper';
(async () => {
  const browser = await puppeteer.launch(initPuppeteer);
  const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
})();
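
When many operations share one browser, it can help to cap how many run at once. Here is a minimal sketch of such a cap that assumes nothing about mangascraper itself. mapWithLimit is a hypothetical helper, and the worker passed to it could be something like (url) => mangapark.getMangaMeta(url).

```javascript
// Hypothetical helper: runs worker over items with at most `limit`
// calls in flight, and returns results in the original order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner repeatedly claims the next unprocessed index.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, run));
  return results;
}

// Demo with a stand-in worker instead of a real scraping call.
const fakeScrape = (title) =>
  new Promise((resolve) => setTimeout(() => resolve(title.toUpperCase()), 10));

mapWithLimit(['naruto', 'bleach', 'berserk'], 2, fakeScrape).then((results) => {
  console.log(results); // [ 'NARUTO', 'BLEACH', 'BERSERK' ]
});
```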

Since you are using your own puppeteer package, mangascraper cannot add any modifications to the browser such as including a proxy.

// ❌ Mangascraper cannot include a proxy in a browser it did not launch
const browser = await puppeteer.launch();
const mangapark = new MangaPark({
  proxy: { host: '127.0.0.1', port: 8080 },
  puppeteerInstance: { instance: 'custom', browser },
});

// ✔️ Our own browser instance will launch with a proxy
const proxiedBrowser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
const proxiedMangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser: proxiedBrowser } });

By default, mangascraper does not close the browser when an operation ends. If you want the browser to close after an operation has finished, add the following to puppeteerInstance:

puppeteerInstance: {
  instance: 'custom',
  browser: browser,
  options: {
    closeAfterOperation: true // After an operation is finished, close the browser
  }
}

However, this will prevent mangascraper from proceeding to another operation after one has finished, as in this example:

const mangapark = new MangaPark({
  puppeteerInstance: { instance: 'custom', browser, options: { closeAfterOperation: true } },
});
await mangapark
  .search('Naruto', { orderBy: 'latest_updates' })
  .then(async (mangas) => await Promise.all(mangas.map((manga) => mangapark.getMangaMeta(manga.url)))); // ❌ Browser will close after gathering results of mangas that match the title Naruto and will not gather metadata from each source.
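
One way to still release the browser is to leave closeAfterOperation off and close the browser yourself once every operation has finished. The sketch below assumes that setup. runThenClose is a hypothetical helper, and the only browser method it relies on is close(), which puppeteer’s Browser provides.

```javascript
// Hypothetical helper: runs a chain of operations, then closes the browser
// whether the chain succeeded or threw.
async function runThenClose(browser, task) {
  try {
    return await task();
  } finally {
    await browser.close();
  }
}

// Demo with a stand-in browser object so the sketch runs without puppeteer.
const fakeBrowser = {
  closed: false,
  async close() {
    this.closed = true;
  },
};

runThenClose(fakeBrowser, async () => ['search results', 'metadata'])
  .then((result) => console.log(result.length, fakeBrowser.closed)); // 2 true
```

With a real browser, task would contain the chained calls, for example a search() followed by getMangaMeta() for each result.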

Examples

Running asynchronously

const mangas = await mangahasu.search('Fairytail');
console.log(mangas);

Running with a callback

mangahasu.search('Fairytail', null, (err, mangas) => {
  if (err) return console.error(err);
  console.log(mangas);
});
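
Both calling styles can be offered by a single function. The sketch below shows one common way a library might do this. It is an illustration, not mangascraper’s actual implementation, and withOptionalCallback and fakeSearch are hypothetical names.

```javascript
// Hypothetical helper: returns the promise when no callback is given,
// otherwise routes the outcome to a Node-style (err, value) callback.
function withOptionalCallback(promise, callback) {
  if (typeof callback !== 'function') return promise;
  promise.then(
    (value) => callback(null, value),
    (err) => callback(err),
  );
  return undefined;
}

// Stand-in for a search method; resolves with a one-item result list.
function fakeSearch(title, callback) {
  const work = Promise.resolve([{ title }]);
  return withOptionalCallback(work, callback);
}

// Promise style
fakeSearch('Fairytail').then((mangas) => console.log('promise:', mangas[0].title));

// Callback style
fakeSearch('Fairytail', (err, mangas) => {
  if (err) return console.error(err);
  console.log('callback:', mangas[0].title);
});
```

In this sketch the callback still fires asynchronously, after the underlying promise settles.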

Mangakakalot

Get a list of manga that match the title Black Clover

import { Mangakakalot } from '@specify_/mangascraper';

const mangakakalot = new Mangakakalot();

mangakakalot.search('Black Clover', function (err, mangas) {
  console.log(mangas);
});

Get a list of manga from the Isekai genre

import { Mangakakalot } from '@specify_/mangascraper';

const mangakakalot = new Mangakakalot();

mangakakalot.getMangas({ genre: 'Isekai' }, function (err, mangas) {
  console.log(mangas);
});

Get the metadata of the Jaryuu Tensei manga

import { Mangakakalot } from '@specify_/mangascraper';

const mangakakalot = new Mangakakalot();

mangakakalot.getMangaMeta('https://mangakakalot.com/read-qt9nz158504844280', function (err, meta) {
  console.log(meta);
});

Manganato

Get a list of manga that match the title Naruto

import { Manganato } from '@specify_/mangascraper';

const manganato = new Manganato();

manganato.search('Naruto', null, function (err, mangas) {
  console.log(mangas);
});

Get a list of manga from the Romance genre that do not have the Drama genre

import { Manganato } from '@specify_/mangascraper';

const manganato = new Manganato();

manganato.search(null, { genre: { include: ['Romance'], exclude: ['Drama'] } }, function (err, mangas) {
  console.log(mangas);
});

Get the metadata of the Solo Leveling manhwa

import { Manganato } from '@specify_/mangascraper';

const manganato = new Manganato();

manganato.getMangaMeta('https://readmanganato.com/manga-dr980474', function (err, meta) {
  console.log(meta);
});

Simple search for manga that match the genre, which uses less compute power compared to getMangas()

import { Manganato } from '@specify_/mangascraper';

const manganato = new Manganato();

manganato.getMangasFromGenre('Comedy', {}, (err, mangas) => {
  console.log(mangas);
});

Mangahasu

Get a list of manga

import { Mangahasu } from '@specify_/mangascraper';

const mangahasu = new Mangahasu();

mangahasu.search(null, null, (err, mangas) => {
  console.log(mangas);
});

Get the metadata of Attack on Titan manga

import { Mangahasu } from '@specify_/mangascraper';

const mangahasu = new Mangahasu();

mangahasu.getMangaMeta('https://mangahasu.se/shingeki-no-kyojin-v6-p27286.html', (err, meta) => {
  console.log(meta);
});

Get the pages of the first chapter (index 0) in the Attack on Titan chapters array

import { Mangahasu } from '@specify_/mangascraper';

const mangahasu = new Mangahasu();

(async () => {
  const mangas = await mangahasu.search('Attack on Titan');
  const meta = await mangahasu.getMangaMeta(mangas[0].url);
  const pages = await mangahasu.getPages(meta.chapters[0].url);

  console.log(pages);
})();

MangaSee

Get a list of manga that match the title the melancholy of haruhi suzumiya, and open puppeteer in headful mode (useful for debugging)

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee({ debug: true }); // Opens puppeteer in headful mode

(async () => {
  const mangas = await mangasee.search('the melancholy of haruhi suzumiya');
  console.log(mangas);
})();

Get all mangas from the MangaSee directory.

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee();

(async () => {
  const mangas = await mangasee.directory();
  console.log(mangas);
})();

Get the metadata of the Berserk manga

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee();

(async () => {
  const berserk = await mangasee.getMangaMeta('https://mangasee123.com/manga/Berserk');
  console.log(berserk);
})();

Get the Chapter 363 pages of the Berserk manga

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee();

(async () => {
  const chapter363 = await mangasee.getPages('https://mangasee123.com/read-online/Berserk-chapter-363-index-2.html');
  console.log(chapter363);
})();

MangaPark v2

Search for a manga that matches the title noragami, get the metadata of the first result, then get the pages of the latest chapter

import puppeteer from 'puppeteer';
import { MangaPark, initPuppeteer } from '@specify_/mangascraper';

(async () => {
  const browser = await puppeteer.launch(initPuppeteer);
  const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });

  const mangas = await mangapark.search('noragami');
  const meta = await mangapark.getMangaMeta(mangas[0].url);
  const pages = await mangapark.getPages(meta.chapters[meta.chapters.recentlyUpdated][0].pages);

  console.log(pages);
})();

Readmng

Get 50 of the most viewed mangas

import { ReadMng } from '@specify_/mangascraper';

(async () => {
  const readmng = new ReadMng();

  const mangas = await readmng.search();

  console.log(mangas);
})();

MangaBox

For React JS

Get pages and display them on a webpage. Note that the getMangaMeta method of this class requires puppeteer, so if you want the manga metadata, consider fetching it from a custom API that uses the mangascraper package.

import React from 'react';
import { MangaBox } from '@specify_/mangascraper';

const mangabox = new MangaBox();

const App: React.FC = () => {
  const [pages, setPages] = React.useState<string[]>([]);

  React.useEffect(() => {
    mangabox
      .getPages('https://mangabox.org/manga/solo-leveling-manhua-manga/chapter-159/')
      .then((pages) => setPages(pages))
      .catch((e) => console.error(e));
  }, []);

  return (
    <div>
      {pages.map((page) => (
        <img key={page} src={page} alt="" />
      ))}
    </div>
  );
};

export default App;
