Text Data Processing Projects

chriso / validator.js

String validation

JavaScript     8394   yesterday

seatgeek / fuzzywuzzy

Fuzzy String Matching in Python

Python     3518   10 days ago
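FuzzyWuzzy reports similarity as a score from 0 to 100. The sketch below shows the idea using only Python's `difflib.SequenceMatcher`, plus a variant that ignores word order. `similarity_ratio` and `sorted_token_ratio` are hypothetical names, not fuzzywuzzy's API, and the library's own scorers may preprocess strings differently.

```python
from difflib import SequenceMatcher


def similarity_ratio(a: str, b: str) -> int:
    """0-100 score: 100 * 2 * matching characters / total characters."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sorted_token_ratio(a: str, b: str) -> int:
    """Same score after lowercasing and sorting whitespace-separated tokens,
    so that word order does not affect the result."""
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))

    return similarity_ratio(normalize(a), normalize(b))
```

For example, `similarity_ratio("New York Mets", "New York Meats")` gives 96: 13 characters match out of 27 in total, and 2 × 13 / 27 ≈ 0.96.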

openexchangerates / accounting.js

A lightweight JavaScript library for number, money and currency formatting - fully localisable, zero dependencies.

JavaScript     3476   today
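The kind of formatting described here (currency symbol, precision, thousands and decimal separators) can be sketched in a few lines. `format_money` and its parameter names are hypothetical and are not accounting.js's API.

```python
def format_money(amount: float, symbol: str = "$", precision: int = 2,
                 thousand: str = ",", decimal: str = ".") -> str:
    """Format a number as money with a configurable symbol and separators.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    rounded = round(amount, precision)
    sign = "-" if rounded < 0 else ""
    # Python's ",.Nf" spec inserts "," every three digits and uses "." for decimals;
    # those placeholders are then swapped for the requested separators.
    text = f"{abs(rounded):,.{precision}f}"
    integer_part, _, fraction = text.partition(".")
    integer_part = integer_part.replace(",", thousand)
    number = integer_part + (decimal + fraction if fraction else "")
    return f"{sign}{symbol}{number}"
```

`format_money(-4999.5, symbol="€", thousand=".", decimal=",")` returns `"-€4.999,50"`. Because this uses binary floats and Python's round-half-to-even, code handling real money would usually work with `decimal.Decimal` instead.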

danielstjules / stringy

A PHP string manipulation library with multibyte support

PHP     1930   2 months ago

luminosoinsight / python-ftfy

Given Unicode text, make its representation consistent and possibly less broken.

Python     1758   3 months ago
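One common kind of broken Unicode is mojibake: UTF-8 bytes decoded with a single-byte encoding, so "café" shows up as "cafÃ©". A minimal sketch of undoing one such layer, assuming the wrong decoding was Latin-1. The function name is hypothetical and this is not ftfy's algorithm.

```python
def undo_latin1_mojibake(text: str) -> str:
    """Reverse one layer of mojibake produced by decoding UTF-8 bytes as Latin-1.

    Returns the text unchanged when it cannot be re-encoded as Latin-1 or the
    resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

Real-world mojibake often comes from Windows-1252 rather than Latin-1, and ftfy applies heuristics to decide whether a string is broken at all; this sketch does neither.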

jprichardson / string.js

Extra JavaScript string methods.

JavaScript     1500   3 days ago

cocur / slugify

Converts a string to a slug. Includes integrations for Symfony, Silex, Laravel, Zend Framework 2, Twig, Nette and Latte.

PHP     1086   6 days ago
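A slug is a lowercase, URL-safe version of a string. A minimal Python sketch of the usual approach (not cocur/slugify's PHP implementation): decompose accented characters, drop whatever is not ASCII, and join the remaining words with a separator.

```python
import re
import unicodedata


def slugify(text: str, separator: str = "-") -> str:
    """Lowercase ASCII slug built from the letters and digits of text."""
    # Decompose accented characters (e.g. "é" -> "e" + combining acute accent).
    normalized = unicodedata.normalize("NFKD", text)
    # Drop anything that is not ASCII, which removes the combining marks.
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters other than a-z / 0-9 with the separator.
    slug = re.sub(r"[^a-z0-9]+", separator, ascii_text.lower())
    return slug.strip(separator)
```

`slugify("Hello, Wörld! Café au lait")` returns `"hello-world-cafe-au-lait"`. Characters with no ASCII decomposition, such as Cyrillic or CJK text, are simply dropped here; transliterating libraries use lookup tables instead.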

samg / diffy

Easy Diffing in Ruby

Ruby     866   1 month ago
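The entry describes diffing text in Ruby. As an illustration of the same task, Python's standard library produces a unified diff with `difflib.unified_diff`; the `unified` wrapper name and the `a/` and `b/` labels are illustrative choices, not Diffy's API.

```python
import difflib


def unified(old: str, new: str, name: str = "text") -> str:
    """Unified diff between two multi-line strings ("" when they are equal)."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )
```

Both inputs are assumed to end with a newline; otherwise the last line of each side runs into the next line of the diff output.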

dabeaz / ply

Python Lex-Yacc

Python     848   18 days ago
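Lex/Yacc splits the work in two: a lexer turns text into tokens described by regular expressions, and a parser consumes those tokens. A minimal sketch of the lexing half using only Python's `re` module; the token names are assumptions, and this is not PLY's rule-declaration interface.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str
    pos: int


# Each token type is a named group; at each position the first alternative
# that matches wins, so more specific patterns should come first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r} at {match.start()}")
        yield Token(kind, match.group(), match.start())
```

The catch-all MISMATCH group turns unexpected characters into a `SyntaxError` instead of letting the scanner skip them silently.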

chardet / chardet

Python 2/3-compatible character encoding detector.

Python     784   2 months ago

kschiess / parslet

A small PEG-based parser library. See the Hacking page in the Wiki as well.

Ruby     590   17 days ago

jbroadway / urlify

PHP port of URLify.js from the Django project. Transliterates non-ascii characters for use in URLs.

PHP     512   8 months ago

seamusabshere / fuzzy_match

Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.

Ruby     489   7 months ago
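Dice's coefficient compares two strings by the character pairs (bigrams) they share. A minimal sketch of that score and of picking the best candidate from a haystack; how fuzzy_match tokenizes strings and combines this with Levenshtein distance and regex rules is not shown, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs: "night" -> ni, ig, gh, ht."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|), between 0 and 1."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both strings shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = sum((ba & bb).values())  # & keeps the minimum count of each bigram
    return 2 * shared / total


def best_match(needle: str, haystack: Iterable[str]) -> str:
    """Return the haystack entry with the highest Dice score against needle."""
    return max(haystack, key=lambda candidate: dice_coefficient(needle, candidate))
```

`dice_coefficient("night", "nacht")` is 0.25: each word has four bigrams and only "ht" is shared, so 2 × 1 / 8.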

lxneng / xpinyin

Translates Chinese hanzi to pinyin in Python

Python     460   6 months ago

jdataview / jbinary

High-level API for working with binary data.

JavaScript     400   a year or more ago

un33k / python-slugify

Returns Unicode slugs

Python     383   3 months ago

dimka665 / awesome-slugify

Flexible slugify function for Python

Python     354   6 months ago

ztane / python-levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Python     349   10 months ago
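Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. A pure-Python sketch of the standard dynamic-programming computation, keeping only two rows of the table, plus one simple way to map the distance to a 0-to-1 similarity. The C extension is far faster, and its own similarity ratio may be normalized differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep each row as short as possible: len(b) + 1 entries
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string (1.0 for two empty strings)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

`levenshtein("kitten", "sitting")` is 3: two substitutions (k to s, e to i) and one insertion (g).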

selvinortiz / flux

Fluent regular expressions in PHP

PHP     320   a year or more ago

j2a / pytils

Russian-specific string utils

Python     251   6 months ago

mozilla / unicode-slugify

A slugifier that works in unicode

Python     248   5 months ago

mikeemoo / colorjizz-php

ColorJizz is a PHP library for manipulating and converting colors.

PHP     227   7 months ago
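Converting colours between formats is mostly arithmetic. A minimal Python sketch (not ColorJizz's PHP API) that parses hex notation to RGB and converts RGB to HSL with the standard `colorsys` module; the function names are hypothetical.

```python
import colorsys
from typing import Tuple


def hex_to_rgb(hex_color: str) -> Tuple[int, int, int]:
    """Parse "#rrggbb" or shorthand "#rgb" into 0-255 integer channels."""
    value = hex_color.lstrip("#")
    if len(value) == 3:  # "f80" -> "ff8800"
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError(f"not a hex colour: {hex_color!r}")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_to_hsl(rgb: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Convert 0-255 RGB to (hue in degrees, saturation %, lightness %)."""
    r, g, b = (c / 255 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys returns H, L, S in that order
    return round(h * 360), round(s * 100), round(l * 100)
```

`rgb_to_hsl(hex_to_rgb("#0000ff"))` returns `(240, 100, 50)`. Note that `colorsys.rgb_to_hls` returns hue, lightness, saturation in that order, which is easy to mix up with HSL.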

kiyoka / fuzzy-string-match

Fuzzy string matching library for Ruby

Ruby     206   2 months ago

jpmckinney / tf-idf-similarity

Ruby gem to calculate the similarity between texts using tf*idf

Ruby     142   5 months ago
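tf*idf weights a term by how often it appears in a document (tf) and how rare it is across the collection (idf); two texts are then compared by the cosine of the angle between their weight vectors. A minimal sketch with whitespace tokenization and the unsmoothed idf log(N / df); the gem may tokenize, normalize, or smooth differently, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(documents: List[str]) -> List[Dict[str, float]]:
    """One sparse tf*idf vector per document (whitespace tokens, lowercased).

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: (count / len(tokens)) * math.log(n / df[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)
```

A term that occurs in every document gets idf log(1) = 0 and so contributes nothing; many implementations add smoothing to avoid that.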

dbalatero / levenshtein-ffi

Fast string edit distance computation, using the Damerau-Levenshtein algorithm.

Ruby     133   a year or more ago
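Damerau-Levenshtein distance adds one operation to Levenshtein: swapping two adjacent characters costs 1 instead of 2. The sketch below computes the common restricted variant, optimal string alignment (OSA), which never edits a substring more than once; the entry does not say which variant the gem implements, so treat this as an illustration of the idea.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

`osa_distance("abcdef", "abcdfe")` is 1. On `"ca"` and `"abc"` OSA gives 3, whereas the unrestricted Damerau-Levenshtein distance is 2 (ca to ac to abc).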

moskytw / uniout

Never see escaped bytes in output.

Python     129   6 months ago

sensiolabs / ansi-to-html

An ANSI to HTML5 converter

PHP     105   4 months ago
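ANSI terminal colours are written as SGR escape sequences of the form `ESC [ codes m`, for example `\x1b[31m` for red text and `\x1b[0m` to reset. A minimal Python sketch (not the sensiolabs PHP API) that turns bold, the eight basic foreground colours, and resets into nested HTML `<span>` elements; the colour palette is an assumption and all other codes are ignored.

```python
import html
import re

# SGR foreground codes 30-37 mapped to CSS colour names (an assumed palette).
COLORS = {30: "black", 31: "red", 32: "green", 33: "yellow",
          34: "blue", 35: "magenta", 36: "cyan", 37: "white"}
SGR = re.compile(r"\x1b\[([0-9;]*)m")  # ESC [ params m


def ansi_to_html(text: str) -> str:
    out = []
    open_spans = 0
    pos = 0
    for match in SGR.finditer(text):
        # Escape the plain text between escape sequences.
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        # "ESC[m" with no parameters means reset, like "ESC[0m".
        codes = [int(c) for c in match.group(1).split(";") if c] or [0]
        for code in codes:
            if code == 0:  # reset: close every span opened so far
                out.append("</span>" * open_spans)
                open_spans = 0
            elif code == 1:
                out.append('<span style="font-weight:bold">')
                open_spans += 1
            elif code in COLORS:
                out.append(f'<span style="color:{COLORS[code]}">')
                open_spans += 1
            # Any other SGR code is ignored in this sketch.
    out.append(html.escape(text[pos:]))
    out.append("</span>" * open_spans)  # close spans left open at the end
    return "".join(out)
```

`ansi_to_html("\x1b[31mred\x1b[0m plain")` returns `'<span style="color:red">red</span> plain'`. Plain text is HTML-escaped so terminal output cannot inject markup.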

hoaproject / ustring

The Hoa\Ustring library.

PHP     101   2 days ago

cjheath / treetop

A Ruby-based parsing DSL based on parsing expression grammars.

Ruby     92   a year or more ago
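A parsing expression grammar (PEG) differs from a context-free grammar mainly in its ordered choice: alternatives are tried in order and the first success wins, and repetition is greedy. A minimal sketch of PEG combinators in Python for a tiny sum grammar; this is not Treetop's grammar syntax or generated-parser API, and all names are hypothetical.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def regex(pattern: str) -> Parser:
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match every parser in order; fail if any one fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def star(p: Parser) -> Parser:
    """Greedy zero-or-more repetition; it never gives matches back."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # stop on failure or empty match
                return values, pos
            value, pos = result
            values.append(value)
    return parse


# Grammar:  sum <- number (("+" / "-") number)*    number <- [0-9]+
number = regex(r"[0-9]+")
expr = seq(number, star(seq(choice(lit("+"), lit("-")), number)))


def evaluate(text: str) -> int:
    result = expr(text, 0)
    if result is None or result[1] != len(text):  # the whole input must be consumed
        raise SyntaxError(f"could not parse {text!r}")
    (first, rest), _ = result
    total = int(first)
    for op, operand in rest:
        total = total + int(operand) if op == "+" else total - int(operand)
    return total
```

`evaluate("12+30-2")` returns 40, while `evaluate("1+")` raises `SyntaxError`. Practical PEG tools often add memoization (packrat parsing) so that backtracking stays linear-time.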

colinsurprenant / hotwater

Fast Ruby FFI string edit distance algorithms

Ruby     76   a year or more ago

postmodern / raingrams

A flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.

Ruby     67   a year or more ago

wharris / esmre

Python extension module for accelerating regular expressions using libesm

Python     66   a year or more ago

vinta /

Paranoid text spacing in Python

Python     60   a year or more ago

nicolas-grekas / patchwork-utf8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

PHP     59   a year or more ago

schneems / going_the_distance

Distance Measurements are Awesome!

Ruby     55   11 months ago

avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror

Python     52   2 months ago

kzykhys / text

Text: a simple, single-class text manipulation library

PHP     45   a year or more ago

talyssonoc / commonregexruby

Find a lot of kinds of common information in a string. CommonRegex port for Ruby

Ruby     44   a year or more ago

reddavis / tf-idf

Term Frequency - Inverse Document Frequency in Ruby

Ruby     32   a year or more ago

reddavis / n-gram

N-gram generator in Ruby

Ruby     31   a year or more ago

famished-tiger / rley

An Earley parser written in Ruby

Ruby     16   today
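An Earley parser handles any context-free grammar, including left-recursive ones, by filling a chart of partially matched rules with three operations: predict, scan, and complete. A minimal recognizer sketch in Python (it answers yes or no and builds no parse tree); it is not Rley's API, and it assumes the grammar has no empty rules, which need extra handling.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


class Item(NamedTuple):
    head: str               # nonterminal on the left of the rule
    body: Tuple[str, ...]   # symbols on the right of the rule
    dot: int                # how many body symbols have been matched
    origin: int             # chart position where this rule started

    def next_symbol(self) -> Optional[str]:
        return self.body[self.dot] if self.dot < len(self.body) else None


def earley_recognize(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    """Return True if tokens derive from start.

    Assumes no empty (epsilon) rules. Symbols that are keys of grammar are
    nonterminals; every other symbol is a terminal compared to tokens by equality.
    """
    chart: List[List[Item]] = [[] for _ in range(len(tokens) + 1)]

    def add(i: int, item: Item) -> None:
        if item not in chart[i]:
            chart[i].append(item)

    for body in grammar[start]:
        add(0, Item(start, tuple(body), 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] keeps growing while it is processed
            item = chart[i][j]
            symbol = item.next_symbol()
            if symbol is None:
                # Completion: advance every item that was waiting for item.head.
                for waiting in chart[item.origin]:
                    if waiting.next_symbol() == item.head:
                        add(i, waiting._replace(dot=waiting.dot + 1))
            elif symbol in grammar:
                # Prediction: start every rule for the expected nonterminal here.
                for body in grammar[symbol]:
                    add(i, Item(symbol, tuple(body), 0, i))
            elif i < len(tokens) and tokens[i] == symbol:
                # Scanning: the expected terminal matches the next token.
                add(i + 1, item._replace(dot=item.dot + 1))
            j += 1

    return any(item.head == start and item.origin == 0 and item.next_symbol() is None
               for item in chart[-1])
```

With the grammar `S -> S + M | M`, `M -> M * T | T`, `T -> 1 | 2 | 3`, the token list `["2", "+", "3", "*", "1"]` is accepted and `["2", "+"]` is rejected.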

brianhempel / fuzzy_tools

Fuzzy document finding in Ruby

Ruby     15   a year or more ago

tkellen / ruby-ngram

Break words and phrases into ngrams.

Ruby     8   a year or more ago
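An n-gram is a run of n consecutive items, whether words or characters. A minimal Python sketch that works on any sequence; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def ngrams(items: Sequence, n: int) -> List[Tuple]:
    """All contiguous runs of n items, e.g. word bigrams when n == 2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
```

`ngrams("the quick brown fox".split(), 2)` returns `[("the", "quick"), ("quick", "brown"), ("brown", "fox")]`, and `ngrams("text", 3)` returns the character trigrams `("t", "e", "x")` and `("e", "x", "t")`.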