Project author: WileyLabs

Project description:
Extract content from HTML data block script elements
Language: JavaScript
Repository: git://github.com/WileyLabs/data-block-extract.git
Created: 2018-10-17T20:49:23Z
Project page: https://github.com/WileyLabs/data-block-extract

License: MIT License


HTML5 data block extractor

Did you know you can embed “raw” data into an HTML <script> tag? The HTML5
spec calls them data blocks.

They look like this…

    <script type="application/ld+json">
    {
      "@context": "http://schema.org/",
      "type": "SoftwareApplication",
      "name": "data block extractor",
      "alternateName": "data-block-extractor"
    }
    </script>

This little command-line script takes a URL or a file, and extracts the content
from any data blocks found in that file.

Currently, it just dumps them to standard out and only looks for
application/ld+json, but it’s a start anyway!
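
As a rough illustration of the extraction step described above, here is a minimal sketch in plain JavaScript. It is not the tool's actual implementation: the function name `extractDataBlocks` is hypothetical, and it uses a regular expression rather than a real HTML parser, so it is only suitable for simple, well-formed markup.

```javascript
// Hypothetical helper for illustration; not part of data-block-extract itself.
// Returns the raw text of every <script> element whose type attribute
// matches the given MIME type (application/ld+json by default).
function extractDataBlocks(html, type = 'application/ld+json') {
  // Escape regex metacharacters in the MIME type (for example "+" and ".").
  const escaped = type.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // Match an opening <script> tag with the wanted type, capture everything
  // up to the closing tag. Assumption: the type value is quoted.
  const pattern = new RegExp(
    '<script\\b[^>]*\\btype\\s*=\\s*["\']' + escaped + '["\'][^>]*>' +
      '([\\s\\S]*?)</script\\s*>',
    'gi'
  );

  const blocks = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    blocks.push(match[1].trim());
  }
  return blocks;
}
```

Calling `extractDataBlocks(html)` on a page's source returns an array of strings, one per matching data block; scripts with other types (or no type) are ignored. A production version would more likely rely on an HTML parser to cope with unquoted attributes and unusual markup.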

Install

For now, you have to run npm i -g from inside a clone of this repository. At
some point, we’ll get this up on npm for easier reuse.

However, once you’ve done that, you can run data-block-extractor anywhere to
use this awesomeness!

Usage

    $ data-block-extract http://bestbuy.com/

Three results are (currently) found at BestBuy; they look like this in the
output:

    {"@context" : "http://schema.org","@type" : "WebSite","name" : "Best Buy","url" : "http://www.bestbuy.com/"}
    {"@context": "http://schema.org","@type": "Organization","name": "Best Buy","url": "http://www.bestbuy.com/","sameAs": ["http://www.facebook.com/bestbuy","https://twitter.com/BestBuy","https://plus.google.com/+BestBuy","https://www.instagram.com/bestbuy/","https://www.youtube.com/user/bestbuy","https://www.linkedin.com/company/best-buy","https://pinterest.com/BestBuy"],"contactPoint": [{"@type": "ContactPoint","telephone": "+1-888-237-8289","contactType": "customer service","contactOption": "TollFree","availableLanguage": ["English","Spanish"]}, {"@type": "ContactPoint","telephone": "+1-888-574-1301","contactType": "credit card support","contactOption": "TollFree","availableLanguage": ["English","Spanish"]}]}
    {"@context":"http://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"https://www-ssl.bestbuy.com/","name":"Best Buy"}}]}

At the moment there is no additional parsing, but that’s coming!
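
Until that lands, the raw output can be parsed by the caller. The sketch below is one possible approach, not the project's planned design: `parseDataBlocks` is a hypothetical helper that assumes each extracted block is a single JSON document, as in the BestBuy output above, and silently skips anything that fails to parse.

```javascript
// Hypothetical helper for illustration; not part of data-block-extract itself.
// Turns raw data block strings into JavaScript objects, skipping any block
// that is not valid JSON instead of aborting the whole run.
function parseDataBlocks(rawBlocks) {
  const parsed = [];
  for (const raw of rawBlocks) {
    try {
      parsed.push(JSON.parse(raw));
    } catch (err) {
      // Invalid JSON in one block should not hide the others.
    }
  }
  return parsed;
}

// Example with the first BestBuy result shown above plus one broken block.
const exampleBlocks = [
  '{"@context" : "http://schema.org","@type" : "WebSite","name" : "Best Buy","url" : "http://www.bestbuy.com/"}',
  '{not valid json'
];
const exampleTypes = parseDataBlocks(exampleBlocks).map((block) => block['@type']);
console.log(exampleTypes); // prints [ 'WebSite' ]
```

Skipping invalid blocks is a design choice made here for brevity; a stricter variant could collect the parse errors and report them alongside the results.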

License

MIT