HTML/XML Processing Projects


cheeriojs / cheerio

Fast, flexible, and lean implementation of core jQuery designed specifically for the server.

JavaScript     13099   20 days ago


sparklemotion / nokogiri

Nokogiri (鋸) is a Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.

Ruby     4510   10 days ago


martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON

Python     2561   4 months ago


jch / html-pipeline

HTML processing filters and utilities

Ruby     1809   4 days ago


leizongmin / js-xss

Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a whitelist

JavaScript     1579   today


inikulin / parse5

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.

JavaScript     1458   24 days ago


fb55 / htmlparser2

Forgiving HTML and XML parser

JavaScript     1376   7 months ago


mozilla / bleach

An easy, HTML5, whitelisting HTML sanitizer.

Python     1351   16 days ago


xhtml2pdf / xhtml2pdf

HTML/CSS to PDF converter.

Python     1273   7 days ago


gawel / pyquery

A jQuery-like library for Python

Python     1262   9 months ago


mathiasbynens / he

A robust HTML entity encoder/decoder written in JavaScript.

JavaScript     1154   3 months ago


yorickpeterse / oga

Oga is an XML/HTML parser written in Ruby.

Ruby     1134   1 month ago


lxml / lxml

The lxml XML toolkit for Python

Python     1037   yesterday


technosophos / querypath

QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources.

PHP     738   2 months ago


isaacs / sax-js

A SAX-style parser for JS

JavaScript     723   23 days ago


flavorjones / loofah

HTML/XML manipulation and sanitization based on Nokogiri

Ruby     627   4 months ago


html5lib / html5lib-python

Standards-compliant library for parsing and serializing HTML documents and fragments in Python

Python     615   3 months ago


ohler55 / ox

Ruby Optimized XML Parser

Ruby     558   11 days ago


kurtmckee / feedparser

Parse feeds in Python

Python     521   1 month ago


masterminds / html5-php

An HTML5 parser and serializer for PHP.

PHP     436   25 days ago


stchris / untangle

Converts XML to Python objects

Python     259   20 days ago


empact / roxml

ROXML is a module for binding Ruby classes to XML. It supports custom mapping and bidirectional marshalling between Ruby and XML using annotation-style class methods, via Nokogiri or LibXML.

Ruby     178   13 days ago


pallets / markupsafe

Implements an XML/HTML/XHTML markup-safe string for Python.

Python     176   12 days ago


scrapy / cssselect

Work with a DOM tree using CSS selectors

Python     158   14 days ago


dam5s / happymapper

Object-to-XML mapping library using Nokogiri (fork of John Nunemaker's HappyMapper)

Ruby     104   1 month ago


mbklein / equivalent-xml

Easy equivalency tests for Nokogiri and Oga XML

Ruby     85   3 months ago


matiasb / demiurge

PyQuery-based scraping micro-framework.

Python     56   7 months ago


alir3z4 / python-sanitize

Bringing sanity to the world of messed-up data.

Python     35   years ago


compileinc / hodor

Simple lxml wrapper that groups results from structured pages, with support for pagination 🕷

Python     16   5 days ago


jurismarches / chopper

A tool we use every day at work. It extracts part of a web page together with the CSS rules that apply to it, keeping the HTML valid.

Python     13   5 months ago