DonationCoder.com Software > Post New Requests Here

IDEA: Calculate string-similarity

HelmutWe:
Not a new idea, certainly, and I doubt whether this can be coded in a few hours. The input would be any two strings; the output a number, e.g. between 0 (zero) and 1 (one). There are lots of possibilities for commercial use. My idea on how to do this is very old, more than 20 years, but my coding abilities are limited, and the power of computers was very limited too when I was trying to code my method in VB 6.0. There are n! × m! calculations to be done, where n is the length of string 1 and m is the length of string 2.
I started this when I was studying phonetics and came across neurolinguistics. The method could be expanded into many more areas than just alphabetic languages, but that would be a start. Anyone interested?
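To make the requested interface concrete (two strings in, one number between 0 and 1 out), here is a minimal Python sketch. It is not HelmutWe's method, which has not been described yet; it only wraps `difflib.SequenceMatcher` from the Python standard library as a stand-in. The function name `similarity` is a placeholder chosen for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the strings are identical.

    Stand-in only: uses difflib's matching-blocks ratio, not the
    n! x m! method mentioned in the post.
    """
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print(similarity("abcd", "bcde"))  # 0.75
    print(similarity("same", "same"))  # 1.0
```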

Ath:
I'm intrigued, but I don't quite understand yet how or what to calculate to give a result between 0 and 1 :huh:

For string similarity, a technique called 'Soundex' was popular a few decades ago, but I'm not sure if that's what you're looking for?

Ath:
I've been reading up on the subject of string similarity, and it seems that the Jaro-Winkler distance/proximity algorithm is something that would fit here.
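For readers who want to see what Jaro-Winkler computes, below is a rough Python sketch of the textbook formulation. It is not the code of the tool announced in this thread. The names `jaro`, `jaro_winkler`, `prefix_scale` and `ignore_case` are chosen for this example, and the prefix weight of 0.1 with a maximum prefix of 4 characters is the commonly cited default, assumed here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro proximity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order; each swapped
    # pair counts as one transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 ignore_case: bool = False) -> float:
    """Jaro proximity, boosted for a shared prefix of up to 4 characters."""
    if ignore_case:
        s1, s2 = s1.lower(), s2.lower()
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

The result is a proximity (1 means identical); the corresponding distance is simply 1 minus that value.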

I'll post a tool in a short while.

HelmutWe:
@Both of you: The only method I have tested so far is the Levenshtein distance. Yet my method seemed much better to me, though only in principle. The disadvantage is the extraordinary number of calculations necessary. It may be a primitive way of pattern recognition combined with very limited abilities in coding. Nevertheless, I'll try to sum up my thoughts and post later what it is all about. It may take quite some time, as I am neither a native speaker nor a native writer of English.

Ath:
Well, here's version 1.0.0 of StringSimilarity. It supports both the Levenshtein algorithm (distance only, integer value; Damerau–Levenshtein as it's called in the interwebs) and the Jaro-Winkler algorithm (distance and proximity, between 0 and 1), with optional non-case-sensitive comparison (works for Jaro-Winkler only at the moment).
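As an illustration of the integer distance mentioned above, here is a minimal Python sketch of plain Levenshtein distance (insertions, deletions and substitutions only, without the Damerau transposition step). It is not the tool's source code. The helper `levenshtein_similarity`, which rescales the distance to a value between 0 and 1 by dividing by the longer string's length, is one common convention and an assumption of this example, not something the tool is stated to do. The dynamic-programming table needs roughly n × m steps for strings of length n and m.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical helper: rescale the distance to [0, 1], 1.0 = identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```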

I'll be posting this sooner or later, in a somewhat more polished form, as my NANY 2019 entry; for now it's still a bit rough around the edges.

Feedback highly appreciated.

Requirements:
- .NET runtime 4.5.2 or newer (Windows 7 with SP1, Windows 8.1 and Windows 10 should all provide that)

Installation:
- Unpack the zip file in a folder that is not protected by the OS (that is, not in Program Files or the Windows directories)
- Run the .exe

Uninstallation:
- Close the application
- Remove all files

/Edit:
Disclaimer:
- Algorithms sourced on the interwebs