HTML/XML Processing Projects

cheeriojs / cheerio

Fast, flexible, and lean implementation of core jQuery designed specifically for the server.

JavaScript     11952   yesterday

sparklemotion / nokogiri

Nokogiri (鋸) is a Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.

Ruby     4368   22 days ago

martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON

Python     2401   2 months ago
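To make the "XML as JSON" idea concrete, here is a minimal sketch of that kind of mapping. It uses only Python's standard library (`xml.etree.ElementTree`) and is not xmltodict's own API. The conventions are assumptions made for this sketch: attribute keys get an `@` prefix, mixed text goes under `#text`, and repeated tags collapse into lists. `element_to_dict` and `xml_to_dict` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Any


def element_to_dict(elem: ET.Element) -> Any:
    """Recursively convert an Element into plain dicts, lists and strings."""
    node: dict[str, Any] = {}
    # Attributes are stored under "@"-prefixed keys (a convention of this sketch).
    for key, value in elem.attrib.items():
        node["@" + key] = value
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag collapses into a list of values.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element: just its text (or None when empty).
        return text or None
    if text:
        node["#text"] = text
    return node


def xml_to_dict(xml: str) -> dict[str, Any]:
    root = ET.fromstring(xml)
    return {root.tag: element_to_dict(root)}
```

For example, `xml_to_dict('<feed lang="en"><item>a</item><item>b</item></feed>')` yields `{'feed': {'@lang': 'en', 'item': ['a', 'b']}}`. Collapsing repeated tags has a trade-off: a single `<item>` comes back as a string rather than a one-element list, so callers must handle both shapes.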

jch / html-pipeline

HTML processing filters and utilities

Ruby     1776   15 days ago

inikulin / parse5

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.

JavaScript     1374   1 month ago

leizongmin / js-xss

Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a whitelist

JavaScript     1366   2 days ago

mozilla / bleach

An easy, HTML5, whitelisting HTML sanitizer.

Python     1265   today
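js-xss and bleach both sanitize by allowlist: anything not explicitly permitted is removed. The sketch below illustrates that idea with Python's standard `html.parser`, but it is not how either library is implemented. The allowed tags, attributes and URL schemes (`ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_SCHEMES`) are hypothetical choices. The sketch also does not rebalance unclosed tags, which a production sanitizer must do.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for this sketch (not the defaults of bleach or js-xss).
ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # tags whose text is discarded, not just unwrapped


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS or self.skip_depth:
            return  # disallowed tags are dropped, but their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript:, data:, and (conservatively) relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-escape the decoded text


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For instance, `sanitize('<p onclick="x()">Hi <script>alert(1)</script></p>')` keeps the paragraph and its text but drops the event handler and the script body.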

fb55 / htmlparser2

Forgiving HTML and XML parser

JavaScript     1256   3 months ago

xhtml2pdf / xhtml2pdf

HTML/CSS to PDF converter.

Python     1227   1 month ago

gawel / pyquery

A jQuery-like library for Python

Python     1181   6 months ago

yorickpeterse / oga

Oga is an XML/HTML parser written in Ruby.

Ruby     1109   8 days ago

mathiasbynens / he

A robust HTML entity encoder/decoder written in JavaScript.

JavaScript     984   3 months ago
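For comparison, Python's standard library covers the same ground as he's encoder and decoder with `html.escape` and `html.unescape`. The sketch below goes one step further and converts every non-ASCII code point into a hexadecimal numeric character reference, so the output is pure ASCII. `encode_entities` and `decode_entities` are hypothetical names, and neither the API nor the exact output format is he's.

```python
import html


def encode_entities(text: str) -> str:
    """Escape markup characters, then emit non-ASCII code points as &#x...; references."""
    escaped = html.escape(text, quote=True)  # handles & < > " '
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in escaped)


def decode_entities(text: str) -> str:
    """Decode named (&copy;), decimal (&#169;) and hex (&#xA9;) references."""
    return html.unescape(text)
```

For example, `encode_entities("a < b © 🕷")` returns `'a &lt; b &#xA9; &#x1F577;'`. Python strings are sequences of code points, so astral symbols like 🕷 need no surrogate-pair handling here.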

lxml / lxml

The lxml XML toolkit for Python

Python     970   2 days ago

technosophos / querypath

QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources.

PHP     709   1 month ago

isaacs / sax-js

A SAX-style parser for JS

JavaScript     694   26 days ago
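SAX-style parsers such as sax-js emit events (start tag, text, end tag) as they read the input instead of building a tree. The sketch below shows the same pattern with Python's standard `xml.sax` module by collecting the text of every `<title>` element. `TitleCollector` and `collect_titles` are hypothetical names, and sax-js's own JavaScript API differs.

```python
import xml.sax
from typing import Optional


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of <title> elements without building a document tree."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: list[str] = []
        self._buffer: Optional[list[str]] = None  # text chunks while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may split one text node across several calls, so accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def collect_titles(document: bytes) -> list[str]:
    handler = TitleCollector()
    xml.sax.parseString(document, handler)
    return handler.titles
```

Because only the current title's text is buffered, memory use stays small even for very large inputs. This is the usual reason to choose an event-based parser over a tree-building one.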

flavorjones / loofah

HTML/XML manipulation and sanitization based on Nokogiri

Ruby     612   22 days ago

html5lib / html5lib-python

Standards-compliant library for parsing and serializing HTML documents and fragments in Python

Python     587   5 days ago

ohler55 / ox

Ruby Optimized XML Parser

Ruby     536   2 days ago

kurtmckee / feedparser

Parse feeds in Python

Python     446   9 days ago

masterminds / html5-php

An HTML5 parser and serializer for PHP.

PHP     412   19 days ago

stchris / untangle

Converts XML to Python objects

Python     227   2 months ago
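Converting XML into Python objects usually means exposing child elements as attributes. The sketch below illustrates that idea on top of `xml.etree.ElementTree`. The `Node` class and `to_objects` function are hypothetical and are not untangle's classes or API. In this sketch, XML attributes use item access (`node["host"]`) so they cannot clash with child element names.

```python
import xml.etree.ElementTree as ET


class Node:
    """Attribute-style view of an XML element (a hypothetical sketch)."""

    def __init__(self, elem: ET.Element) -> None:
        self._attrs = dict(elem.attrib)
        self.cdata = (elem.text or "").strip()
        self._children: dict[str, list["Node"]] = {}
        for child in elem:
            self._children.setdefault(child.tag, []).append(Node(child))

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for child element names.
        children = self.__dict__.get("_children", {})
        if name in children:
            found = children[name]
            # One child -> the Node itself; repeated children -> a list of Nodes.
            return found[0] if len(found) == 1 else found
        raise AttributeError(name)

    def __getitem__(self, attr):
        return self._attrs[attr]


def to_objects(xml: str) -> Node:
    return Node(ET.fromstring(xml))
```

Tags that are not valid Python identifiers (for example `server-list`) can still be reached with `getattr(node, "server-list")`.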

empact / roxml

ROXML is a module for binding Ruby classes to XML. It supports custom mapping and bidirectional marshalling between Ruby and XML using annotation-style class methods, via Nokogiri or LibXML.

Ruby     179   a year or more ago
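ROXML binds Ruby classes to XML in both directions. A rough Python analogue uses a standard-library dataclass and `xml.etree.ElementTree`. The `Book` class, its fields, and the one-child-element-per-field mapping are hypothetical, and the mapping is much simpler than ROXML's annotation-style declarations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Book:
    # Hypothetical mapping: each field becomes a child element of <book> with the same name.
    title: str
    author: str
    pages: int

    def to_xml(self) -> str:
        root = ET.Element("book")
        for f in fields(self):
            ET.SubElement(root, f.name).text = str(getattr(self, f.name))
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, xml: str) -> "Book":
        root = ET.fromstring(xml)
        # f.type is the annotated class (str or int), used here as a converter.
        values = {f.name: f.type(root.findtext(f.name)) for f in fields(cls)}
        return cls(**values)
```

Driving both directions from the same field list keeps serialization and parsing in sync. A round trip such as `Book.from_xml(book.to_xml())` gives back an equal object.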

pallets / markupsafe

Implements an XML/HTML/XHTML markup-safe string for Python.

Python     156   11 days ago
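A markup-safe string marks text already known to be safe HTML with its own type, so that anything combined with it is escaped automatically. A minimal Python sketch of that idea follows. `Safe` and `to_safe` are hypothetical stand-ins, not MarkupSafe's `Markup` class or its `escape` function. Only the `+` operator is covered here; f-strings and `str.format` would bypass it.

```python
import html


class Safe(str):
    """A str subclass whose content is trusted markup and is never re-escaped."""

    def __add__(self, other):
        return Safe(str.__add__(self, to_safe(other)))

    def __radd__(self, other):
        # Python tries this before str.__add__ because Safe subclasses str.
        return Safe(str.__add__(to_safe(other), self))


def to_safe(value) -> Safe:
    """Pass Safe values through unchanged; escape everything else."""
    if isinstance(value, Safe):
        return value
    return Safe(html.escape(str(value), quote=True))
```

Because `to_safe` leaves `Safe` values alone, escaping is idempotent: `to_safe(to_safe("<"))` is still `"&lt;"`, never `"&amp;lt;"`.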

scrapy / cssselect

Working with DOM trees using CSS selectors

Python     148   2 months ago
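cssselect itself works by translating CSS selectors into XPath expressions. As a simpler illustration of applying selectors to a DOM tree, the sketch below matches a small subset of selectors directly against an `xml.etree.ElementTree` tree: type, `.class`, `#id` and the descendant combinator. `select` and its helpers are hypothetical names, not cssselect's API.

```python
import re
import xml.etree.ElementTree as ET

# One compound selector such as "div", ".item", "div.item#main" or "*".
COMPOUND = re.compile(r"^(?P<tag>[\w-]+|\*)?(?P<rest>(?:[.#][\w-]+)*)$")


def parse_compound(text: str):
    m = COMPOUND.match(text)
    if not m or not text:
        raise ValueError(f"unsupported selector: {text!r}")
    rest = m.group("rest")
    return m.group("tag") or "*", re.findall(r"\.([\w-]+)", rest), re.findall(r"#([\w-]+)", rest)


def matches(elem: ET.Element, tag: str, classes: list, ids: list) -> bool:
    if tag != "*" and elem.tag != tag:
        return False
    if any(elem.get("id") != i for i in ids):
        return False
    present = (elem.get("class") or "").split()
    return all(c in present for c in classes)


def select(root: ET.Element, selector: str) -> list:
    """Return elements matching `selector` (whitespace = descendant), in document order."""
    steps = [parse_compound(part) for part in selector.split()]
    contexts, found = [root], []
    for depth, step in enumerate(steps):
        found, seen = [], set()
        for ctx in contexts:
            for el in ctx.iter():  # iter() yields ctx itself first
                if el is ctx and depth > 0:
                    continue  # later steps only look at strict descendants
                if id(el) not in seen and matches(el, *step):
                    seen.add(id(el))
                    found.append(el)
        contexts = found
    return found
```

Matching directly against the tree keeps the sketch short. Translating to XPath, as cssselect does, lets any XPath engine (such as lxml's) do the actual matching.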

dam5s / happymapper

Object-to-XML mapping library using Nokogiri (a fork of John Nunemaker's HappyMapper)

Ruby     98   2 months ago

mbklein / equivalent-xml

Easy equivalency tests for Nokogiri and Oga XML

Ruby     83   3 months ago
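equivalent-xml compares documents structurally rather than byte for byte. A rough Python approximation is to compare the C14N 2.0 canonical forms produced by the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+). Canonicalization normalizes attribute order, quoting and empty-element syntax. The `equivalent` helper below is hypothetical. Unlike equivalent-xml, it is sensitive to element order and namespace prefixes, and `strip_text=True` also trims leading and trailing whitespace inside text content.

```python
import xml.etree.ElementTree as ET


def equivalent(a: str, b: str, ignore_whitespace: bool = True) -> bool:
    """True if both XML strings have the same C14N 2.0 canonical form."""
    # strip_text drops whitespace-only text between elements (and trims other text).
    ca = ET.canonicalize(a, strip_text=ignore_whitespace)
    cb = ET.canonicalize(b, strip_text=ignore_whitespace)
    return ca == cb
```

Comparing canonical strings gives a simple yes/no answer. A structural comparison like equivalent-xml's can also support options such as ignoring element order.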

matiasb / demiurge

PyQuery-based scraping micro-framework.

Python     53   3 months ago

alir3z4 / python-sanitize

Bringing sanity to the world of messed-up data.

Python     30   a year or more ago

compileinc / hodor

Simple lxml wrapper that collects results from structured pages, with pagination and grouping support 🕷

Python     14   2 months ago