
Web Content Extraction Projects


rg3 / youtube-dl

Command-line program to download videos from YouTube.com and other video sites

Python     26959   today


soimort / you-get

⏬ Dumb downloader that scrapes the web

Python     13463   yesterday


codelucas / newspaper

News, full-text, and article metadata extraction in Python 3

Python     4748   8 days ago


grangier / python-goose

HTML Content / Article Extractor, web scraping library in Python

Python     2578   1 month ago


scrapy / scrapely

A pure-python HTML screen-scraping library

Python     1247   1 month ago


buriy / python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Python     1213   5 months ago


miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

Python     992   3 months ago
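
sumy implements several summarization methods. As a rough illustration of the extractive idea only, loosely in the spirit of Luhn's frequency-based method and not sumy's API or exact algorithms, the sketch below scores each sentence by the average frequency of its non-stop-words and keeps the top-scoring sentences in their original order. The tiny stop-word list, the naive sentence splitter and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; real summarizers use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it", "on", "for"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase words with stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, sentence_count: int = 2) -> list[str]:
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not automatically favored.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable even with reverse=True, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])
    return [sentences[i] for i in keep]
```

Sentences that share frequent words with the rest of the document rise to the top; an off-topic sentence scores low and is dropped.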


cantino / ruby-readability

Port of arc90's readability project to Ruby

Ruby     811   %d years ago


documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs

Ruby     736   4 months ago


jaimeiniesta / metainspector

Ruby gem for web scraping purposes. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...

Ruby     734   26 days ago
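
MetaInspector itself is a Ruby gem. The same idea can be sketched with only Python's standard library: parse the HTML, read the title and the description/keywords meta tags, and resolve every link and image against the page URL. The names PageInspector and inspect_page are hypothetical, and fetching the page is left out, so the function takes HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageInspector(HTMLParser):
    """Collects the title, named meta tags, links and images of a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta: dict[str, str] = {}
        self.links: list[str] = []
        self.images: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_page(html: str, base_url: str) -> dict:
    parser = PageInspector(base_url)
    parser.feed(html)
    parser.close()
    keywords = parser.meta.get("keywords", "")
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description"),
        "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
        "links": parser.links,
        "images": parser.images,
    }
```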


essence / essence

A library for extracting web media.

PHP     591   24 days ago


bndr / node-read

Get readable content from any page. Based on Arc90's readability project, using the cheerio engine.

JavaScript     589   2 months ago


jeckman / youtube-downloader

PHP script for downloading videos from YouTube; it also parses YouTube feeds into RSS enclosures for podcatchers.

PHP     459   2 days ago


datalib / libextract

Extract data from websites using basic statistical magic

Python     430   %d years ago
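
libextract describes its approach only as "basic statistical magic". The sketch below shows one simple statistical heuristic of that general kind, an assumption rather than libextract's actual algorithm: credit each run of text to the parent of the element that contains it, so a container holding many paragraphs accumulates their combined length, then treat the element with the most credited characters as the main content. TextDensity and main_content are hypothetical names.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
SKIP = {"script", "style"}  # text inside these is not page content


class TextDensity(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags: list[str] = []      # tag name of every element seen, indexed by id
        self.stack: list[int] = []     # ids of the currently open elements
        self.score = defaultdict(int)  # element id -> characters of text credited to it
        self.text = defaultdict(list)  # element id -> text chunks credited to it

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.tags.append(tag)
        self.stack.append(len(self.tags) - 1)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for depth in range(len(self.stack) - 1, -1, -1):
            if self.tags[self.stack[depth]] == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        if not self.stack or self.tags[self.stack[-1]] in SKIP:
            return
        chunk = " ".join(data.split())
        if not chunk:
            return
        # Credit the text to the parent of its enclosing element.
        owner = self.stack[-2] if len(self.stack) > 1 else self.stack[-1]
        self.score[owner] += len(chunk)
        self.text[owner].append(chunk)


def main_content(html: str) -> tuple[str, str]:
    """Return (tag name, text) of the element with the most credited text."""
    parser = TextDensity()
    parser.feed(html)
    parser.close()
    if not parser.score:
        return "", ""
    best = max(parser.score, key=parser.score.get)
    return parser.tags[best], " ".join(parser.text[best])
```

Navigation blocks consist of short link labels, so they rarely outscore a container of full paragraphs under this heuristic.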


fent / node-ytdl-core

YouTube downloader in JavaScript.

JavaScript     428   today
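
Before a YouTube downloader such as node-ytdl-core can fetch a video, it has to recover the video ID from whatever URL form the user supplied. The Python sketch below covers the common watch, youtu.be, embed and shorts forms; the 11-character ID pattern reflects IDs as seen in practice and is an assumption, not a documented guarantee. Downloading the stream itself requires parsing YouTube's player data, which changes over time and is not shown. video_id is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-], as seen in practice.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def video_id(url: str) -> Optional[str]:
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate: Optional[str] = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Regular watch pages carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("embed", "v", "shorts"):
                candidate = parts[1]
    return candidate if candidate and ID_PATTERN.fullmatch(candidate) else None
```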


alir3z4 / html2text

Convert HTML to Markdown-formatted text.

Python     380   2 days ago
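
As a rough illustration of what an HTML-to-Markdown converter does, here is a standard-library sketch that handles only headings, paragraphs, bold/italic and links. It is not html2text's implementation: the real library covers much more (lists, tables, images, Markdown escaping), whereas this sketch does no escaping and would also emit the contents of script and style elements. MiniMarkdown and to_markdown are hypothetical names.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Converts a small subset of HTML (headings, paragraphs, emphasis, links) to Markdown."""

    def __init__(self):
        super().__init__()
        self.out: list[str] = []
        self.hrefs: list[str] = []  # stack of open link targets

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if re.fullmatch(r"h[1-6]", tag) or tag == "p":
            self.out.append("\n\n")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        # Collapse HTML whitespace runs into single spaces, as browsers do.
        self.out.append(re.sub(r"\s+", " ", data))


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    # Trim whitespace around line breaks, then allow at most one blank line in a row.
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```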


gottfrois / link_thumbnailer

Ruby gem that generates thumbnail images from a given URL, much like the link previews on popular social websites.

Ruby     361   23 days ago


coleifer / micawber

A small library for extracting rich content from URLs

Python     350   21 days ago


michaelhelmick / lassie

Web Content Retrieval for Humans™

Python     350   25 days ago


wikiteam / wikiteam

Tools for downloading and preserving wikis

Python     162   3 months ago
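
A first step for any wiki-preservation tool is enumerating every page. On MediaWiki sites this can be done through api.php with list=allpages, passing back the continue block of each response until a response arrives without one. This is a sketch of that API usage, not WikiTeam's code; all_page_titles is a hypothetical name, and its fetch parameter is a hypothetical hook so the network call can be replaced (for throttling, retries, or testing).

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    # Plain network call; a real archiving tool would add throttling and retries.
    with urlopen(url) as resp:
        return json.load(resp)


def all_page_titles(api_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[str]:
    """Yield every page title via list=allpages, following API continuation."""
    params = {"action": "query", "list": "allpages", "aplimit": "500", "format": "json"}
    cont: dict = {}
    while True:
        data = fetch(api_url + "?" + urlencode({**params, **cont}))
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        if "continue" not in data:
            break
        # Send the continuation values (e.g. "apcontinue") back unchanged.
        cont = data["continue"]
```

Usage would look like `for title in all_page_titles("https://wiki.example/w/api.php"): ...`, where the URL is a placeholder for a real wiki's api.php.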


mpratt / embera

An oEmbed consumer library that gives you information about URLs. It helps you replace URLs to YouTube or Vimeo, for example, with their HTML embed code.

PHP     145   3 months ago
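
Embera and micawber both build on oEmbed: the consumer sends the content URL to the provider's oEmbed endpoint and receives JSON whose type field says how to embed it ("video" and "rich" responses carry an html snippet, "photo" responses carry an image url). The sketch below shows that flow with a deliberately tiny provider table holding the YouTube and Vimeo endpoints; a real consumer needs a much larger provider list. The function names are hypothetical, and fetch is injected so the HTTP call can be supplied separately.

```python
import re
from html import escape
from typing import Callable, Optional
from urllib.parse import urlencode

# Provider URL patterns -> oEmbed endpoints (intentionally minimal).
PROVIDERS = [
    (re.compile(r"https?://(www\.)?youtube\.com/watch\S*|https?://youtu\.be/\S+"),
     "https://www.youtube.com/oembed"),
    (re.compile(r"https?://(www\.)?vimeo\.com/\d+"),
     "https://vimeo.com/api/oembed.json"),
]


def oembed_request_url(url: str) -> Optional[str]:
    for pattern, endpoint in PROVIDERS:
        if pattern.fullmatch(url):
            return endpoint + "?" + urlencode({"url": url, "format": "json"})
    return None


def embed_html(response: dict) -> Optional[str]:
    kind = response.get("type")
    if kind in ("video", "rich"):
        return response.get("html")  # provider-supplied embed markup
    if kind == "photo" and "url" in response:
        return '<img src="{}" alt="{}">'.format(
            escape(response["url"]), escape(response.get("title", "")))
    return None


def replace_urls(text: str, fetch: Callable[[str], dict]) -> str:
    """Replace bare provider URLs in text with their embed HTML."""
    def swap(match: re.Match) -> str:
        request = oembed_request_url(match.group(0))
        snippet = embed_html(fetch(request)) if request else None
        return snippet or match.group(0)  # leave unknown URLs untouched

    return re.sub(r"https?://\S+", swap, text)
```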


mauricesvay / imageresolver

ImageResolver.js does its best to determine the main image on a URL without loading all images.

JavaScript     121   4 months ago
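
ImageResolver.js is JavaScript, and its actual rules are not described in this listing. A common approach in the same spirit, assumed for this sketch, is to trust an og:image or twitter:image meta tag when present and otherwise pick the img element with the largest declared width times height, so no image has to be downloaded. ImageCandidates and main_image are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


def _int(value: Optional[str]) -> int:
    # Missing or non-numeric dimensions (e.g. "100%") count as 0.
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


class ImageCandidates(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta_image: Optional[str] = None
        self.images: list[tuple[str, int]] = []  # (src, declared area)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and self.meta_image is None:
            key = a.get("property") or a.get("name")
            if key in ("og:image", "twitter:image") and a.get("content"):
                self.meta_image = a["content"]
        elif tag == "img" and a.get("src"):
            area = _int(a.get("width")) * _int(a.get("height"))
            self.images.append((a["src"], area))


def main_image(html: str, base_url: str) -> Optional[str]:
    parser = ImageCandidates()
    parser.feed(html)
    parser.close()
    if parser.meta_image:
        return urljoin(base_url, parser.meta_image)
    if parser.images:
        # max() returns the first of equal areas, so undeclared sizes fall back to page order.
        src, _ = max(parser.images, key=lambda item: item[1])
        return urljoin(base_url, src)
    return None
```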


vinta / haul

An Extensible Image Crawler

Python     96   6 months ago