Text Data Processing Projects

chriso / validator.js

String validation

JavaScript     7590   20 days ago

openexchangerates / accounting.js

A lightweight JavaScript library for number, money and currency formatting - fully localisable, zero dependencies.

JavaScript     3325   2 days ago

seatgeek / fuzzywuzzy

Fuzzy String Matching in Python

Python     3208   1 month ago

danielstjules / stringy

A PHP string manipulation library with multibyte support

PHP     1807   5 days ago

luminosoinsight / python-ftfy

Given Unicode text, make its representation consistent and possibly less broken.

Python     1682   2 days ago

jprichardson / string.js

Extra JavaScript string methods.

JavaScript     1446   5 months ago

cocur / slugify

Converts a string to a slug. Includes integrations for Symfony, Silex, Laravel, Zend Framework 2, Twig, Nette and Latte.

PHP     1012   yesterday

samg / diffy

Easy Diffing in Ruby

Ruby     839   10 days ago

dabeaz / ply

Python Lex-Yacc

Python     779   13 days ago

chardet / chardet

Python 2/3 compatible character encoding detector.

Python     718   yesterday

kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.

Ruby     568   23 days ago

jbroadway / urlify

PHP port of URLify.js from the Django project. Transliterates non-ASCII characters for use in URLs.

PHP     507   4 months ago

seamusabshere / fuzzy_match

Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.

Ruby     468   3 months ago

lxneng / xpinyin

Translate Chinese hanzi to pinyin in Python

Python     431   2 months ago

jdataview / jbinary

High-level API for working with binary data.

JavaScript     383   9 months ago

un33k / python-slugify

Returns Unicode slugs

Python     352   today

dimka665 / awesome-slugify

Python flexible slugify function

Python     337   2 months ago

selvinortiz / flux

Fluent regular expressions in PHP

PHP     315   10 months ago

ztane / python-levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Python     310   6 months ago

mozilla / unicode-slugify

A slugifier that works in Unicode

Python     245   27 days ago

j2a / pytils

Russian-specific string utils

Python     245   2 months ago

mikeemoo / colorjizz-php

ColorJizz is a PHP library for manipulating and converting colors.

PHP     224   3 months ago

kiyoka / fuzzy-string-match

Fuzzy string matching library for Ruby

Ruby     199   1 month ago

jpmckinney / tf-idf-similarity

Ruby gem to calculate the similarity between texts using tf*idf

Ruby     131   2 months ago

dbalatero / levenshtein-ffi

Fast string edit distance computation, using the Damerau-Levenshtein algorithm.

Ruby     128   years ago

moskytw / uniout

Never see escaped bytes in output.

Python     125   2 months ago

sensiolabs / ansi-to-html

An ANSI to HTML5 converter

PHP     106   years ago

hoaproject / ustring

The Hoa\Ustring library.

PHP     96   today

cjheath / treetop

A Ruby-based parsing DSL based on parsing expression grammars.

Ruby     89   9 months ago

colinsurprenant / hotwater

Fast Ruby FFI string edit distance algorithms

Ruby     75   years ago

postmodern / raingrams

A flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.

Ruby     66   years ago

vinta /

Paranoid text spacing in Python

Python     57   years ago

wharris / esmre

Python extension module for accelerating regular expressions using libesm

Python     56   years ago

nicolas-grekas / patchwork-utf8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

PHP     56   11 months ago

schneems / going_the_distance

Distance Measurements are Awesome!

Ruby     54   8 months ago

kzykhys / text

Text - Simple 1 Class Text Manipulation Library

PHP     45   years ago

talyssonoc / commonregexruby

Find a lot of kinds of common information in a string. CommonRegex port for Ruby

Ruby     40   years ago

reddavis / tf-idf

Term Frequency - Inverse Document Frequency in Ruby

Ruby     31   years ago

reddavis / n-gram

N-Gram generator in Ruby

Ruby     30   years ago

avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror

Python     29   1 month ago

brianhempel / fuzzy_tools

Fuzzy document finding in Ruby

Ruby     15   years ago

tkellen / ruby-ngram

Break words and phrases into ngrams.

Ruby     7   years ago

famished-tiger / rley

An Earley parser written in Ruby

Ruby     5   9 days ago