Web Content Extraction Projects

rg3 / youtube-dl

Command-line program to download videos from YouTube.com and other video sites

Python     28372   today

soimort / you-get

⏬ Dumb downloader that scrapes the web

Python     14238   3 days ago

codelucas / newspaper

News, full-text, and article metadata extraction in Python 3

Python     5003   4 days ago

grangier / python-goose

HTML Content / Article Extractor, web scraping library in Python

Python     2653   3 months ago

scrapy / scrapely

A pure-python HTML screen-scraping library

Python     1274   3 months ago

buriy / python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Python     1242   6 months ago

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

Python     1059   29 days ago

cantino / ruby-readability

Port of arc90's readability project to Ruby

Ruby     810

jaimeiniesta / metainspector

Ruby gem for web scraping purposes. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...

Ruby     744   3 months ago

documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs

Ruby     740   5 months ago

essence / essence

A library for extracting web media.

PHP     599   3 months ago

bndr / node-read

Get readable content from any page. Based on Arc90's readability project, using the cheerio engine.

JavaScript     589   4 months ago

jeckman / youtube-downloader

PHP script for downloading videos from YouTube; also parses YouTube feeds into RSS enclosures for podcatchers

PHP     481   23 days ago

fent / node-ytdl-core

YouTube downloader in JavaScript.

JavaScript     460   today

datalib / libextract

Extract data from websites using basic statistical magic

Python     435

alir3z4 / html2text

Convert HTML to Markdown-formatted text.

Python     393   3 days ago

gottfrois / link_thumbnailer

Ruby gem that generates thumbnail images from a given URL, much like the link previews on popular social websites.

Ruby     368   1 month ago

coleifer / micawber

A small library for extracting rich content from URLs

Python     359   27 days ago

michaelhelmick / lassie

Web Content Retrieval for Humans™

Python     356   9 days ago

wikiteam / wikiteam

Tools for downloading and preserving wikis

Python     176   2 months ago

mpratt / embera

An oEmbed consumer library that gives you information about URLs. It helps you replace URLs to YouTube or Vimeo, for example, with their HTML embed code.

PHP     148   5 months ago

mauricesvay / imageresolver

ImageResolver.js does its best to determine the main image on a URL without loading all images.

JavaScript     122   5 months ago

vinta / haul

An Extensible Image Crawler

Python     101   7 months ago
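Several of the projects above (metainspector, lassie, link_thumbnailer, micawber) start from the same basic step: take a page's HTML and pull out its title, meta description, meta keywords, links and images. The following is a minimal sketch of that step using only Python's standard-library `html.parser` and `urllib.parse.urljoin`. It is not the API of any listed project; the names `PageMetadataParser` and `extract_metadata` are hypothetical, and the sketch assumes the HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageMetadataParser(HTMLParser):
    """Hypothetical helper: collects title, meta tags, links and image URLs.

    Illustration only; not the API of any project listed above.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords") and attrs.get("content"):
                self.meta[name] = attrs["content"].strip()
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract_metadata(html, base_url=""):
    """Parse an HTML string and return a dict of basic page metadata."""
    parser = PageMetadataParser(base_url)
    parser.feed(html)
    parser.close()
    keywords = parser.meta.get("keywords", "")
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
        "links": parser.links,
        "images": parser.images,
    }


if __name__ == "__main__":
    sample = """<html><head><title>Example Page</title>
    <meta name="description" content="A short summary.">
    <meta name="keywords" content="scraping, metadata">
    </head><body><a href="/about">About</a><img src="logo.png"></body></html>"""
    print(extract_metadata(sample, "https://example.com/"))
```

A real tool would also have to fetch the page, detect its encoding, and weigh other sources such as Open Graph tags; the sketch deliberately leaves those out so the parsing step stays visible.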