Web Content Extracting Projects

rg3 / youtube-dl

Command-line program to download videos from YouTube.com and other video sites

Python     26959   today

soimort / you-get

⏬ Dumb downloader that scrapes the web

Python     13463   yesterday

codelucas / newspaper

News, full-text, and article metadata extraction in Python 3

Python     4748   8 days ago

grangier / python-goose

HTML Content / Article Extractor, web scraping lib in Python

Python     2578   1 month ago

scrapy / scrapely

A pure-python HTML screen-scraping library

Python     1247   1 month ago

buriy / python-readability

fast python port of arc90's readability tool, updated to match latest readability.js!

Python     1213   5 months ago

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

Python     992   3 months ago

cantino / ruby-readability

Port of arc90's readability project to Ruby

Ruby     811   years ago

documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs

Ruby     736   4 months ago

jaimeiniesta / metainspector

Ruby gem for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, links, images...

Ruby     734   26 days ago

essence / essence

A library for extracting web media.

PHP     591   24 days ago

bndr / node-read

Get Readable Content from any page. Based on Arc90's readability project using cheerio engine.

JavaScript     589   2 months ago

jeckman / youtube-downloader

PHP script for downloading videos from YouTube; also parses YouTube feeds into RSS enclosures for podcatchers

PHP     459   2 days ago

datalib / libextract

Extract data from websites using basic statistical magic

Python     430   years ago

fent / node-ytdl-core

YouTube downloader in JavaScript.

JavaScript     428   today

alir3z4 / html2text

Convert HTML to Markdown-formatted text.

Python     380   2 days ago

gottfrois / link_thumbnailer

Ruby gem that generates thumbnail images from a given URL, much like the link previews on popular social websites.

Ruby     361   23 days ago

coleifer / micawber

a small library for extracting rich content from urls

Python     350   21 days ago

michaelhelmick / lassie

Web Content Retrieval for Humans™

Python     350   25 days ago

wikiteam / wikiteam

Tools for downloading and preserving wikis

Python     162   3 months ago

mpratt / embera

An oEmbed consumer library that gives you information about URLs. It helps you replace URLs to YouTube or Vimeo, for example, with their HTML embed code.

PHP     145   3 months ago

mauricesvay / imageresolver

ImageResolver.js does its best to determine the main image on a URL without loading all images.

JavaScript     121   4 months ago

vinta / haul

An Extensible Image Crawler

Python     96   6 months ago