Text Data Processing Projects

chriso / validator.js

String validation

JavaScript     7778   7 days ago

openexchangerates / accounting.js

A lightweight JavaScript library for number, money and currency formatting - fully localisable, zero dependencies.

JavaScript     3358   8 days ago

seatgeek / fuzzywuzzy

Fuzzy String Matching in Python

Python     3283   24 days ago

danielstjules / stringy

A PHP string manipulation library with multibyte support

PHP     1827   14 days ago

luminosoinsight / python-ftfy

Given Unicode text, make its representation consistent and possibly less broken.

Python     1695   5 days ago

jprichardson / string.js

Extra JavaScript string methods.

JavaScript     1457   6 months ago

cocur / slugify

Converts a string to a slug. Includes integrations for Symfony, Silex, Laravel, Zend Framework 2, Twig, Nette and Latte.

PHP     1031   29 days ago

samg / diffy

Easy Diffing in Ruby

Ruby     845   1 month ago

dabeaz / ply

Python Lex-Yacc

Python     795   13 days ago

chardet / chardet

Python 2/3 compatible character encoding detector.

Python     734   8 days ago

kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.

Ruby     571   22 days ago

jbroadway / urlify

PHP port of URLify.js from the Django project. Transliterates non-ASCII characters for use in URLs.

PHP     510   5 months ago

seamusabshere / fuzzy_match

Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.

Ruby     472   4 months ago

lxneng / xpinyin

Translate Chinese hanzi to pinyin in Python

Python     440   3 months ago

jdataview / jbinary

High-level API for working with binary data.

JavaScript     386   10 months ago

un33k / python-slugify

Returns unicode slugs

Python     359   16 days ago

dimka665 / awesome-slugify

Python flexible slugify function

Python     343   3 months ago

ztane / python-levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Python     318   7 months ago

selvinortiz / flux

Fluent regular expressions in PHP

PHP     314   10 months ago

j2a / pytils

Russian-specific string utils

Python     248   3 months ago

mozilla / unicode-slugify

A slugifier that works in unicode

Python     244   2 months ago

mikeemoo / colorjizz-php

ColorJizz is a PHP library for manipulating and converting colors.

PHP     223   4 months ago

kiyoka / fuzzy-string-match

Fuzzy string matching library for Ruby

Ruby     201   2 months ago

jpmckinney / tf-idf-similarity

Ruby gem to calculate the similarity between texts using tf*idf

Ruby     136   3 months ago

dbalatero / levenshtein-ffi

Fast string edit distance computation, using the Damerau-Levenshtein algorithm.

Ruby     129   %d years ago

moskytw / uniout

Never see escaped bytes in output.

Python     126   3 months ago

sensiolabs / ansi-to-html

An ANSI to HTML5 converter

PHP     106   19 days ago

hoaproject / ustring

The Hoa\Ustring library.

PHP     97   12 days ago

cjheath / treetop

A Ruby-based parsing DSL based on parsing expression grammars.

Ruby     89   10 months ago

colinsurprenant / hotwater

Fast Ruby FFI string edit distance algorithms

Ruby     75   %d years ago

postmodern / raingrams

A flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.

Ruby     66   %d years ago

wharris / esmre

Python extension module for accelerating regular expressions using libesm

Python     59   %d years ago

vinta /

Paranoid text spacing in Python

Python     58   %d years ago

nicolas-grekas / patchwork-utf8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

PHP     56   %d years ago

schneems / going_the_distance

Distance Measurements are Awesome!

Ruby     55   8 months ago

kzykhys / text

Text - Simple 1 Class Text Manipulation Library

PHP     46   %d years ago

talyssonoc / commonregexruby

Find a lot of kinds of common information in a string. CommonRegex port for Ruby

Ruby     40   %d years ago

avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror

Python     35   2 months ago

reddavis / tf-idf

Term Frequency - Inverse Document Frequency in Ruby

Ruby     31   %d years ago

reddavis / n-gram

N-Gram generator in Ruby

Ruby     31   %d years ago

brianhempel / fuzzy_tools

Fuzzy document finding in Ruby

Ruby     15   %d years ago

famished-tiger / rley

An Earley parser written in Ruby

Ruby     13   18 days ago

tkellen / ruby-ngram

Break words and phrases into ngrams.

Ruby     7   %d years ago