Web Content Extracting Projects

rg3 / youtube-dl

Command-line program to download videos from YouTube.com and other video sites

Python     26221   today

soimort / you-get

⏬ Dumb downloader that scrapes the web

Python     13079   today

codelucas / newspaper

News, full-text, and article metadata extraction in Python 3

Python     4632   8 days ago

grangier / python-goose

HTML Content / Article Extractor, web scraping lib in Python

Python     2511   7 days ago

scrapy / scrapely

A pure-python HTML screen-scraping library

Python     1224   2 months ago

buriy / python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Python     1191   3 months ago

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

Python     957   1 month ago

cantino / ruby-readability

Port of arc90's readability project to Ruby

Ruby     809   years ago

documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs

Ruby     734   3 months ago

jaimeiniesta / metainspector

Ruby gem for web scraping. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...

Ruby     730   3 months ago

bndr / node-read

Get readable content from any page. Based on Arc90's readability project, using the cheerio engine.

JavaScript     586   1 month ago

essence / essence

A library for extracting web media.

PHP     585   8 months ago

jeckman / youtube-downloader

PHP script for downloading videos from YouTube; also parses a YouTube feed into RSS enclosures for podcatchers

PHP     448   today

datalib / libextract

Extract data from websites using basic statistical magic

Python     420   years ago

fent / node-ytdl-core

YouTube downloader in JavaScript.

JavaScript     398   today

alir3z4 / html2text

Convert HTML to Markdown-formatted text.

Python     369   today

gottfrois / link_thumbnailer

Ruby gem that generates thumbnail images from a given URL, much like the link previews on popular social websites.

Ruby     359   2 months ago

michaelhelmick / lassie

Web Content Retrieval for Humans™

Python     347   4 months ago

coleifer / micawber

A small library for extracting rich content from URLs

Python     343   13 days ago

wikiteam / wikiteam

Tools for downloading and preserving wikis

Python     152   2 months ago

mpratt / embera

An oEmbed consumer library that gives you information about URLs. It helps you replace URLs to YouTube or Vimeo, for example, with their HTML embed code.

PHP     143   2 months ago

mauricesvay / imageresolver

ImageResolver.js does its best to determine the main image on a URL without loading all images.

JavaScript     117   3 months ago

vinta / haul

An Extensible Image Crawler

Python     90   4 months ago