Text Data Processing Projects


chriso / validator.js

String validation

JavaScript     8004   today


openexchangerates / accounting.js

A lightweight JavaScript library for number, money and currency formatting - fully localisable, zero dependencies.

JavaScript     3398   9 days ago


seatgeek / fuzzywuzzy

Fuzzy String Matching in Python

Python     3357   2 months ago


danielstjules / stringy

A PHP string manipulation library with multibyte support

PHP     1873   16 days ago


luminosoinsight / python-ftfy

Given Unicode text, make its representation consistent and possibly less broken.

Python     1727   1 month ago


jprichardson / string.js

Extra JavaScript string methods.

JavaScript     1475   7 months ago


cocur / slugify

Converts a string to a slug. Includes integrations for Symfony, Silex, Laravel, Zend Framework 2, Twig, Nette and Latte.

PHP     1052   6 days ago


samg / diffy

Easy Diffing in Ruby

Ruby     853   yesterday


dabeaz / ply

Python Lex-Yacc

Python     815   2 months ago


chardet / chardet

Python 2/3 compatible character encoding detector.

Python     753   19 days ago


kschiess / parslet

A small PEG-based parser library. See the Hacking page in the Wiki as well.

Ruby     572   9 days ago


jbroadway / urlify

PHP port of URLify.js from the Django project. Transliterates non-ASCII characters for use in URLs.

PHP     513   6 months ago


seamusabshere / fuzzy_match

Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.

Ruby     481   5 months ago


lxneng / xpinyin

Translate Chinese hanzi to pinyin in Python

Python     445   4 months ago


jdataview / jbinary

High-level API for working with binary data.

JavaScript     394   11 months ago


un33k / python-slugify

Returns unicode slugs

Python     371   2 months ago


dimka665 / awesome-slugify

Python flexible slugify function

Python     342   4 months ago


ztane / python-levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Python     331   8 months ago


selvinortiz / flux

Fluent regular expressions in PHP

PHP     316   12 months ago


j2a / pytils

Russian-specific string utils

Python     249   4 months ago


mozilla / unicode-slugify

A slugifier that works in unicode

Python     244   3 months ago


mikeemoo / colorjizz-php

ColorJizz is a PHP library for manipulating and converting colors.

PHP     225   5 months ago


kiyoka / fuzzy-string-match

Fuzzy string matching library for Ruby

Ruby     203   3 days ago


jpmckinney / tf-idf-similarity

Ruby gem to calculate the similarity between texts using tf*idf

Ruby     140   4 months ago


dbalatero / levenshtein-ffi

Fast string edit distance computation, using the Damerau-Levenshtein algorithm.

Ruby     130   years ago


moskytw / uniout

Never see escaped bytes in output.

Python     128   4 months ago


sensiolabs / ansi-to-html

An ANSI to HTML5 converter

PHP     105   2 months ago


hoaproject / ustring

The Hoa\Ustring library.

PHP     97   2 months ago


cjheath / treetop

A Ruby-based parsing DSL based on parsing expression grammars.

Ruby     89   11 months ago


colinsurprenant / hotwater

Fast Ruby FFI string edit distance algorithms

Ruby     76   years ago


postmodern / raingrams

A flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.

Ruby     66   years ago


wharris / esmre

Python extension module for accelerating regular expressions using libesm

Python     60   years ago


vinta / pangu.py

Paranoid text spacing in Python

Python     59   years ago


nicolas-grekas / patchwork-utf8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

PHP     57   years ago


schneems / going_the_distance

Distance Measurements are Awesome!

Ruby     55   10 months ago


kzykhys / text

Text - Simple 1 Class Text Manipulation Library

PHP     46   years ago


talyssonoc / commonregexruby

Find a lot of kinds of common information in a string. CommonRegex port for Ruby

Ruby     42   years ago


avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror

Python     38   4 days ago


reddavis / n-gram

N-Gram generator in Ruby - http://en.wikipedia.org/wiki/N-gram

Ruby     31   years ago


reddavis / tf-idf

Term Frequency - Inverse Document Frequency in Ruby

Ruby     31   years ago


brianhempel / fuzzy_tools

Fuzzy document finding in Ruby

Ruby     15   years ago


famished-tiger / rley

An Earley parser written in Ruby

Ruby     14   1 month ago


tkellen / ruby-ngram

Break words and phrases into ngrams.

Ruby     7   years ago