Author: EGGaming

Description:
A package for scraping manga from various manga websites
Language: TypeScript
Repository: git://github.com/EGGaming/mangascraper.git
Created: 2021-06-26T09:02:34Z
Project page: https://github.com/EGGaming/mangascraper

License: MIT License


Mangascraper is a package for scraping manga. It is a way to retrieve manga from sites that do not offer an API. Mangascraper can run either asynchronously, returning a Promise, or synchronously if a callback function is provided.
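The Promise-or-callback convention can be pictured with a small stand-alone sketch. The `searchTitles` function and its in-memory `catalogue` are hypothetical and are not part of mangascraper; they only illustrate how one function can serve both calling styles.

```typescript
// Hypothetical illustration of a "Promise or callback" function.
// Nothing here is mangascraper's actual implementation.
type Callback<T> = (err: Error | undefined, result?: T) => void;

const catalogue = ['One Piece', 'One Punch Man', 'Naruto'];

function searchTitles(query: string): Promise<string[]>;
function searchTitles(query: string, callback: Callback<string[]>): void;
function searchTitles(query: string, callback?: Callback<string[]>): Promise<string[]> | void {
  const work = new Promise<string[]>((resolve, reject) => {
    if (query.trim() === '') {
      reject(new Error('query must not be empty'));
      return;
    }
    const needle = query.toLowerCase();
    resolve(catalogue.filter((title) => title.toLowerCase().includes(needle)));
  });

  // No callback: hand the Promise back to the caller.
  if (!callback) return work;

  // Callback given: report the outcome through it instead.
  work.then(
    (titles) => callback(undefined, titles),
    (err: Error) => callback(err),
  );
}

// Promise style
searchTitles('one').then((titles) => console.log(titles)); // [ 'One Piece', 'One Punch Man' ]

// Callback style
searchTitles('naruto', (err, titles) => {
  if (err) return console.error(err);
  console.log(titles); // [ 'Naruto' ]
});
```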


Table of Contents

  1. Installation
  2. Sources
  3. Usage
  4. Configuring puppeteer
  5. Examples
  6. API Reference
  7. License

Installation

npm

    npm install @specify_/mangascraper

Sources

Currently, mangascraper supports the 7 sources marked as supported in the table below, and will support more in the future.

Source Supported? Uses puppeteer? Uses axios?
MangaBox ✔️ ✔️ ✔️
Mangafreak —- —-
Mangakakalot ✔️ ✔️
Manganato ✔️ ✔️
Mangahasu ✔️ ✔️
Mangaparkv2 ✔️ ✔️
Mangasee ✔️ ✔️
Readmng ✔️ ✔️ ✔️
Kissmanga —- —-

If a supported source uses axios, mangascraper will try to use axios as much as possible to save computer resources. If the network request is blocked by Cloudflare, mangascraper will resort to using puppeteer.

If a supported source uses both axios and puppeteer, it means that some methods of that source use axios while others use puppeteer. For example, Readmng uses puppeteer for search(), but uses axios for getMangaMeta() and getPages().
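The axios-first behaviour with a puppeteer fallback can be sketched as a small helper that tries a lightweight fetcher and only falls back to a heavier one when the first fails. The `Fetcher` type, `fetchWithFallback`, and the stub fetchers are hypothetical names chosen for illustration; in a real setup the first fetcher would wrap an axios request and the second a puppeteer page load. This is not mangascraper's internal code.

```typescript
// Hypothetical sketch: try a cheap HTTP fetcher first, fall back to a browser-based one.
type Fetcher = (url: string) => Promise<string>;

interface FetchResult {
  html: string;
  via: 'light' | 'heavy';
}

async function fetchWithFallback(url: string, light: Fetcher, heavy: Fetcher): Promise<FetchResult> {
  try {
    return { html: await light(url), via: 'light' };
  } catch {
    // For example, the plain request was blocked; retry with the heavier fetcher.
    return { html: await heavy(url), via: 'heavy' };
  }
}

// Stub fetchers standing in for axios and puppeteer.
const blockedRequest: Fetcher = async () => {
  throw new Error('403 Forbidden');
};
const browserFetch: Fetcher = async (url) => `<html><!-- rendered ${url} --></html>`;

fetchWithFallback('https://example.com', blockedRequest, browserFetch).then((result) => {
  console.log(result.via); // heavy
});
```

Passing the fetchers in as parameters keeps the fallback logic testable without a network connection or a browser.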


Usage

To start using the package, import a class such as Mangakakalot from the package and use the methods to get mangas from that source.

Here’s an example:

    import { Manganato } from '@specify_/mangascraper';

    const manganato = new Manganato();

    (async () => {
      const mangas = await manganato.search('One Piece');
      const meta = await manganato.getMangaMeta(mangas[0].url);
      console.log(meta.chapters);
    })();

which outputs…

    [
      {
        name: 'Chapter 1007',
        url: 'https://readmanganato.com/manga-aa951409/chapter-1007',
        views: '730,899',
        uploadDate: 2021-03-12T07:00:00.000Z
      },
      {
        name: 'Chapter 1006',
        url: 'https://readmanganato.com/manga-aa951409/chapter-1006',
        views: '364,964',
        uploadDate: 2021-03-05T07:00:00.000Z
      },
      ... and more items
    ]
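Note that in this output `views` is a formatted string while `uploadDate` is a `Date`. If you need numeric view counts, a small helper along these lines may help. The `Chapter` interface is inferred from the sample output above and is an assumption, not the package's published type; `parseViews` and `mostViewed` are hypothetical helpers.

```typescript
// Shape inferred from the sample output; treat it as an assumption.
interface Chapter {
  name: string;
  url: string;
  views: string; // formatted count, e.g. '730,899'
  uploadDate: Date;
}

// Convert a formatted count such as '730,899' into a number.
function parseViews(views: string): number {
  return Number(views.replace(/,/g, ''));
}

// Return the chapter with the highest view count, or undefined for an empty list.
function mostViewed(chapters: Chapter[]): Chapter | undefined {
  return chapters.reduce<Chapter | undefined>(
    (best, chapter) =>
      best === undefined || parseViews(chapter.views) > parseViews(best.views) ? chapter : best,
    undefined,
  );
}

const sample: Chapter[] = [
  {
    name: 'Chapter 1007',
    url: 'https://readmanganato.com/manga-aa951409/chapter-1007',
    views: '730,899',
    uploadDate: new Date('2021-03-12T07:00:00.000Z'),
  },
  {
    name: 'Chapter 1006',
    url: 'https://readmanganato.com/manga-aa951409/chapter-1006',
    views: '364,964',
    uploadDate: new Date('2021-03-05T07:00:00.000Z'),
  },
];

console.log(parseViews('730,899')); // 730899
console.log(mostViewed(sample)?.name); // Chapter 1007
```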

Configuring puppeteer

Connecting to an endpoint

If you already have an existing puppeteer endpoint, mangascraper can connect to that endpoint instead and perform faster concurrent operations.

Mangascraper also includes its own puppeteer launch arguments, and it is recommended to use them for scraping to go smoothly.

    import puppeteer from 'puppeteer';
    import { initPuppeteer, MangaSee } from '@specify_/mangascraper';

    (async () => {
      const browser = await puppeteer.launch({ ...initPuppeteer });
      const endpoint = browser.wsEndpoint();
      browser.disconnect();

      const mangasee = new MangaSee({ puppeteerInstance: { instance: 'endpoint', wsEndpoint: endpoint } });
      const mangas = await mangasee.search('Haikyu!');
    })();

Since you are using your own puppeteer package, mangascraper cannot make any modifications to the browser, such as including a proxy.

    // ❌ Mangascraper cannot include a proxy in a browser it did not launch
    const browser = await puppeteer.launch();
    const mangapark = new MangaPark({
      proxy: { host: '127.0.0.1', port: 8080 },
      puppeteerInstance: { instance: 'custom', browser },
    });

    // ✔️ Our own browser instance will launch with a proxy
    const proxiedBrowser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
    const proxiedMangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser: proxiedBrowser } });

Because mangascraper is connecting to an existing endpoint, you must set all browser arguments yourself when launching the browser, outside of mangascraper.

Overriding mangascraper’s puppeteer launch arguments

If you want to override the launch arguments mangascraper uses, you can pass your own to any manga class, such as MangaSee, as long as you are using the default instance. With any other instance, you must supply your own launch options or reuse mangascraper’s puppeteer options via initPuppeteer.

    const mangasee = new MangaSee({ puppeteerInstance: { instance: 'default', launch: { ...myCustomLaunchOptions } } });

If you want to include a proxy, mangascraper will automatically put it into the launch arguments.

    const mangahasu = new Mangahasu({
      proxy: { host: 'proxy_host', port: 8080 },
      puppeteerInstance: { instance: 'default' },
    });
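For illustration, a proxy setting could be folded into Chromium launch arguments roughly as follows. The `--proxy-server` flag is the one used in the examples in this README; the `ProxySetting` and `LaunchSketch` interfaces and the `withProxy` helper are hypothetical and do not describe mangascraper's internals.

```typescript
// Hypothetical shapes used only for this sketch.
interface ProxySetting {
  host: string;
  port: number;
}

interface LaunchSketch {
  headless?: boolean;
  args?: string[];
}

// Return a copy of the launch options with a --proxy-server argument appended.
function withProxy(options: LaunchSketch, proxy?: ProxySetting): LaunchSketch {
  if (!proxy) return { ...options };
  const proxyArg = `--proxy-server=${proxy.host}:${proxy.port}`;
  // Drop any earlier proxy flag so that the new one takes effect.
  const otherArgs = (options.args ?? []).filter((arg) => !arg.startsWith('--proxy-server='));
  return { ...options, args: [...otherArgs, proxyArg] };
}

const launch = withProxy({ headless: true, args: ['--no-sandbox'] }, { host: '127.0.0.1', port: 8080 });
console.log(launch.args); // [ '--no-sandbox', '--proxy-server=127.0.0.1:8080' ]
```

The helper returns a new object instead of mutating its input, so the same base options can be reused for browsers with different proxies.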

Using an existing puppeteer package

Passing an existing puppeteer browser from your app lets mangascraper reuse one browser instead of opening a new browser for each operation. In addition, mangascraper will be able to scrape manga concurrently. This approach is less resource-intensive for Chromium, and it can save you a lot of time if you are handling many scraping operations. It is the best approach if you do not want to connect to an existing endpoint.

However, you must have puppeteer already installed.

This is the most basic setup:

    import puppeteer from 'puppeteer';
    import { MangaPark, initPuppeteer } from '@specify_/mangascraper';

    (async () => {
      const browser = await puppeteer.launch(initPuppeteer);
      const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
    })();
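When many operations share one browser, it can help to cap how many run at the same time. The helper below is a generic sketch of that idea; `mapWithLimit` is a hypothetical name and is not provided by mangascraper. The `task` you pass in would be something like a call to `getMangaMeta()`.

```typescript
// Run `task` over `items` with at most `limit` tasks in flight; results keep input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error('limit must be at least 1');
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical usage with a mangascraper instance that shares one browser:
// const metas = await mapWithLimit(mangas, 3, (manga) => mangapark.getMangaMeta(manga.url));
```

If any task rejects, the returned Promise rejects as well; wrap the task in your own try/catch if you would rather collect partial results.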

Since you are using your own puppeteer package, mangascraper cannot add any modifications to the browser such as including a proxy.

    // ❌ Mangascraper cannot include a proxy in a browser it did not launch
    const browser = await puppeteer.launch();
    const mangapark = new MangaPark({
      proxy: { host: '127.0.0.1', port: 8080 },
      puppeteerInstance: { instance: 'custom', browser },
    });

    // ✔️ Our own browser instance will launch with a proxy
    const proxiedBrowser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
    const proxiedMangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser: proxiedBrowser } });

By default, mangascraper does not close the browser at the end of an operation. If you want the browser to close after an operation has finished, you can add the following to puppeteerInstance:

    puppeteerInstance: {
      instance: 'custom',
      browser: browser,
      options: {
        closeAfterOperation: true // After an operation is finished, close the browser
      }
    }

However, this will prevent mangascraper from proceeding to another operation after one is finished such as this example:

    const mangapark = new MangaPark({
      puppeteerInstance: { instance: 'custom', browser, options: { closeAfterOperation: true } },
    });
    // ❌ The browser closes once the Naruto search finishes, so no metadata is gathered for the results.
    await mangapark
      .search('Naruto', { orderBy: 'latest_updates' })
      .then((mangas) => Promise.all(mangas.map((manga) => mangapark.getMangaMeta(manga.url))));
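One way around this is to leave closeAfterOperation off and close the browser yourself once the whole batch is done. The sketch below uses a minimal `Closable` interface so that it stands alone; `withBrowser` is a hypothetical helper, not part of mangascraper. A puppeteer `Browser` fits the interface because it exposes `close()`.

```typescript
// Minimal shape needed by the helper; a puppeteer Browser has a compatible close().
interface Closable {
  close(): Promise<void>;
}

// Run `work` with the browser and always close it afterwards, even if `work` throws.
async function withBrowser<B extends Closable, R>(
  browser: B,
  work: (browser: B) => Promise<R>,
): Promise<R> {
  try {
    return await work(browser);
  } finally {
    await browser.close();
  }
}

// Hypothetical usage (assumes puppeteer, MangaPark and initPuppeteer are imported):
// const metas = await withBrowser(await puppeteer.launch(initPuppeteer), async (browser) => {
//   const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
//   const mangas = await mangapark.search('Naruto', { orderBy: 'latest_updates' });
//   return Promise.all(mangas.map((manga) => mangapark.getMangaMeta(manga.url)));
// });
```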

Examples



Running asynchronously

    const mangas = await mangahasu.search('Fairytail');
    console.log(mangas);



Running synchronously

    mangahasu.search('Fairytail', null, (err, mangas) => {
      if (err) return console.error(err);
      console.log(mangas);
    });



Mangakakalot

Get a list of manga that match the title Black Clover

    import { Mangakakalot } from '@specify_/mangascraper';

    const mangakakalot = new Mangakakalot();

    mangakakalot.search('Black Clover', function (err, mangas) {
      console.log(mangas);
    });

Get a list of manga from the Isekai genre

    import { Mangakakalot } from '@specify_/mangascraper';

    const mangakakalot = new Mangakakalot();

    mangakakalot.getMangas({ genre: 'Isekai' }, function (err, mangas) {
      console.log(mangas);
    });

Get the metadata of the Jaryuu Tensei manga

    import { Mangakakalot } from '@specify_/mangascraper';

    const mangakakalot = new Mangakakalot();

    mangakakalot.getMangaMeta('https://mangakakalot.com/read-qt9nz158504844280', function (err, meta) {
      console.log(meta);
    });



Manganato

Get a list of manga that match the title Naruto

    import { Manganato } from '@specify_/mangascraper';

    const manganato = new Manganato();

    manganato.search('Naruto', null, function (err, mangas) {
      console.log(mangas);
    });

Get a list of manga from the Romance genre that do not have the Drama genre

    import { Manganato } from '@specify_/mangascraper';

    const manganato = new Manganato();

    manganato.search(null, { genre: { include: ['Romance'], exclude: ['Drama'] } }, function (err, mangas) {
      console.log(mangas);
    });

Get the metadata of the Solo Leveling manhwa

    import { Manganato } from '@specify_/mangascraper';

    const manganato = new Manganato();

    manganato.getMangaMeta('https://readmanganato.com/manga-dr980474', function (err, meta) {
      console.log(meta);
    });

Simple search for manga that match the genre, which uses less compute power compared to getMangas()

    import { Manganato } from '@specify_/mangascraper';

    const manganato = new Manganato();

    manganato.getMangasFromGenre('Comedy', {}, (err, mangas) => {
      console.log(mangas);
    });



Mangahasu

Get a list of manga

    import { Mangahasu } from '@specify_/mangascraper';

    const mangahasu = new Mangahasu();

    mangahasu.search(null, null, (err, mangas) => {
      console.log(mangas);
    });

Get the metadata of the Attack on Titan manga

    import { Mangahasu } from '@specify_/mangascraper';

    const mangahasu = new Mangahasu();

    mangahasu.getMangaMeta('https://mangahasu.se/shingeki-no-kyojin-v6-p27286.html', (err, meta) => {
      console.log(meta);
    });

Get the pages of the first chapter (index 0) in the Attack on Titan chapters array.

    import { Mangahasu } from '@specify_/mangascraper';

    const mangahasu = new Mangahasu();

    (async () => {
      const mangas = await mangahasu.search('Attack on Titan');
      const meta = await mangahasu.getMangaMeta(mangas[0].url);
      const pages = await mangahasu.getPages(meta.chapters[0].url);
      console.log(pages);
    })();


MangaSee

Get a list of manga that match the title the melancholy of haruhi suzumiya, and open puppeteer in headful mode (useful for debugging).

    import { MangaSee } from '@specify_/mangascraper';

    const mangasee = new MangaSee({ debug: true }); // Opens puppeteer in headful mode

    (async () => {
      const mangas = await mangasee.search('the melancholy of haruhi suzumiya');
      console.log(mangas);
    })();

Get all mangas from the MangaSee directory.

    import { MangaSee } from '@specify_/mangascraper';

    const mangasee = new MangaSee();

    (async () => {
      const mangas = await mangasee.directory();
      console.log(mangas);
    })();

Get the metadata of the Berserk manga

    import { MangaSee } from '@specify_/mangascraper';

    const mangasee = new MangaSee();

    (async () => {
      const berserk = await mangasee.getMangaMeta('https://mangasee123.com/manga/Berserk');
      console.log(berserk);
    })();

Get the pages of Chapter 363 of the Berserk manga

    import { MangaSee } from '@specify_/mangascraper';

    const mangasee = new MangaSee();

    (async () => {
      const chapter363 = await mangasee.getPages('https://mangasee123.com/read-online/Berserk-chapter-363-index-2.html');
      console.log(chapter363);
    })();



MangaPark v2

Search for a manga that matches the title noragami.

Get the first result and get the meta

Then get the pages of the latest chapter

    import puppeteer from 'puppeteer'; // needed for puppeteer.launch() below
    import { MangaPark, initPuppeteer } from '@specify_/mangascraper';

    (async () => {
      const browser = await puppeteer.launch(initPuppeteer);
      const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });

      const mangas = await mangapark.search('noragami');
      const meta = await mangapark.getMangaMeta(mangas[0].url);
      const pages = await mangapark.getPages(meta.chapters[meta.chapters.recentlyUpdated][0].pages);
      console.log(pages);
    })();



Readmng

Get 50 of the most viewed mangas

    import { ReadMng } from '@specify_/mangascraper';

    (async () => {
      const readmng = new ReadMng();
      const mangas = await readmng.search();
      console.log(mangas);
    })();



MangaBox

For React JS

Get pages and display them on a webpage. Note that the getMangaMeta method of this class requires puppeteer, so if you want to get the manga meta, consider fetching it from a custom API that uses the mangascraper package.

    import React from 'react';
    import { MangaBox } from '@specify_/mangascraper';

    const mangabox = new MangaBox();

    const App: React.FC = () => {
      const [pages, setPages] = React.useState<string[]>([]);

      // Fetch the page image URLs once, when the component mounts
      React.useEffect(() => {
        mangabox
          .getPages('https://mangabox.org/manga/solo-leveling-manhua-manga/chapter-159/')
          .then((pages) => setPages(pages))
          .catch((e) => console.error(e));
      }, []);

      return (
        <div>
          {pages.map((page) => (
            <img key={page} src={page} />
          ))}
        </div>
      );
    };

    export default App;


API Reference


License

Distributed under MIT © Joseph Marbella