Author Topic: IDEA: Calculate string-similarity  (Read 6682 times)

HelmutWe

  • Supporting Member
  • Joined in 2018
  • Posts: 62
IDEA: Calculate string-similarity
« on: October 24, 2018, 03:41 AM »
Not a new idea, certainly, and I doubt that it can be coded in a few hours. The input would be any two strings; the output would be a number, e.g. between 0 (zero) and 1 (one). There are lots of possibilities for commercial use. My idea on how to do this is very old, more than 20 years, but my coding abilities are limited. The power of computers was very limited too when I was trying to code my method in VB 6.0: there are n! × m! calculations to be done, where n is the length of string 1 and m is the length of string 2.
I started this when I was studying phonetics and came across neurolinguistics. It could be expanded into many more areas than just alphabetical languages, but that would be a start. Anyone interested?
And the best that you can hope for is to die in your sleep (Schlitz/Rogers)

Ath

  • Supporting Member
  • Joined in 2006
  • Posts: 3,612
Re: IDEA: Calculate string-similarity
« Reply #1 on: October 24, 2018, 06:18 AM »
I'm intrigued, but I don't quite understand yet how or what to calculate to get a result between 0 and 1 :huh:

For string similarity, a technique called 'Soundex' was popular a few decades ago, but I'm not sure if that's what you're looking for?
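
For readers who have not met Soundex: a minimal Python sketch of the classic American variant follows. It is not taken from any tool in this thread, and the function name `soundex` is chosen only for illustration. Note that Soundex produces a four-character code rather than a number between 0 and 1, so two names either share a code or they don't.

```python
def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'R163'.

    Illustrative sketch only; assumes plain ASCII letters.
    """
    # Map consonants to their Soundex digit; vowels, H, W and Y get no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""

    first = letters_only[0]
    result = first
    prev = codes.get(first, "")
    for ch in letters_only[1:]:
        digit = codes.get(ch, "")
        # Adjacent letters with the same digit are coded only once.
        if digit and digit != prev:
            result += digit
        # H and W do not separate two letters with the same digit; vowels do.
        if ch not in "HW":
            prev = digit
    # Pad with zeros and cut to the usual length of four.
    return (result + "000")[:4]


if __name__ == "__main__":
    for word in ("Robert", "Rupert", "Ashcraft"):
        print(word, soundex(word))
```

With this sketch, "Robert" and "Rupert" both come out as R163, which is the kind of phonetic match Soundex was designed for.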

Ath

Re: IDEA: Calculate string-similarity
« Reply #2 on: October 24, 2018, 07:36 AM »
I've been reading up on the subject of string similarity, and it seems that the Jaro-Winkler distance/proximity algorithm is something that would fit here.

I'll post a tool in a short while.
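
As a rough illustration of what Jaro-Winkler computes, here is a minimal Python sketch. It is not the code of the tool announced in this thread; the names `jaro`, `jaro_winkler`, `prefix_scale` and `ignore_case` are assumptions made for the example, and the prefix weight of 0.1 with a maximum prefix of four characters is the commonly quoted default.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between 0.0 (nothing in common) and 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are this close in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk both strings' matched characters in order; pairs that differ
    # are half-transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 ignore_case: bool = False) -> float:
    """Jaro similarity, boosted for a common prefix of up to 4 characters."""
    if ignore_case:
        s1, s2 = s1.casefold(), s2.casefold()
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
```

For the textbook pair "MARTHA" / "MARHTA" this gives a proximity of about 0.961; the corresponding distance is simply 1 minus that value.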

HelmutWe

Re: IDEA: Calculate string-similarity
« Reply #3 on: October 24, 2018, 09:24 AM »
@Both of you: The only method I have tested so far was the Levenshtein distance. Yet my method seemed much better to me, though only in principle. The disadvantage is the extraordinary number of calculations necessary. It may be a primitive way of pattern recognition combined with very limited abilities in coding. Nevertheless, I'll try to sum up my thoughts and post later what it is all about. It may take quite some time, as I am neither a native speaker nor a native writer of English.
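
For comparison with the n! × m! effort described at the start of the thread, here is a minimal Python sketch of the Levenshtein distance, scaled to a value between 0 and 1. It is not the method proposed in this thread; the names `levenshtein` and `similarity`, and the choice to divide by the longer string's length, are assumptions made for the example. The standard dynamic-programming approach needs roughly n × m steps.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0.0 (completely different) .. 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"), round(similarity("kitten", "sitting"), 3))
```

For "kitten" and "sitting" the distance is 3, so this scaling gives a similarity of about 0.571.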

Ath

Re: IDEA: Calculate string-similarity
« Reply #4 on: October 24, 2018, 01:48 PM »
Well, here's version 1.0.0 of StringSimilarity. It supports both the Damerau–Levenshtein algorithm (the 'Lowenstein' distance mentioned above, as it's called on the interwebs; distance only, integer value) and the Jaro-Winkler algorithm (distance and proximity, between 0 and 1), with optional case-insensitive comparison (which works for Jaro-Winkler only at the moment).
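
To show how Damerau–Levenshtein differs from plain Levenshtein, here is a minimal Python sketch of the restricted variant (often called optimal string alignment), which additionally counts a swap of two adjacent characters as one edit. It is not the tool's C# source, and whether the tool uses this variant is not stated in the thread; the name `osa_distance` and the `ignore_case` switch are assumptions made for the example.

```python
def osa_distance(a: str, b: str, ignore_case: bool = False) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and swaps of two
    adjacent characters, each as one edit.
    """
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Adjacent transposition, e.g. "ab" -> "ba"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("ab", "ba"))
    print(osa_distance("Hello", "hello", ignore_case=True))
```

Here "ab" versus "ba" costs 1 edit, where plain Levenshtein would count 2, and the case-insensitive call treats "Hello" and "hello" as identical.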

I'll be posting this sooner or later, in a somewhat more polished form, as my NANY 2019 entry. For now it's still a bit rough around the edges.

Feedback highly appreciated.

Requirements:
- .NET runtime 4.5.2 or newer (Windows 7 with SP1, Windows 8.1 and Windows 10 should all provide that)

Installation:
- Unpack the zip file in a non-OS protected folder (not in Program Files or Windows directories that is)
- Run the .exe

Uninstallation:
- Close the application
- Remove all files

/Edit:
Disclaimer:
- Algorithms sourced on the interwebs
« Last Edit: October 25, 2018, 03:04 AM by Ath »

HelmutWe

Re: IDEA: Calculate string-similarity
« Reply #5 on: October 24, 2018, 02:43 PM »
Sorry. None of my three protection systems lets that pass.

KodeZwerg

  • Honorary Member
  • Joined in 2018
  • Posts: 718
Re: IDEA: Calculate string-similarity
« Reply #6 on: October 24, 2018, 03:55 PM »

KodeZwerg

Re: IDEA: Calculate string-similarity
« Reply #7 on: October 24, 2018, 03:58 PM »
Quote from Ath: "Feedback highly appreciated."
From a programmer's point of view:  :Thmbsup: :Thmbsup: :Thmbsup:
My own attempts with hash values all went wrong.

If it's open source, I'd like to have a look  :D

HelmutWe

Re: IDEA: Calculate string-similarity
« Reply #8 on: October 24, 2018, 07:17 PM »
My tools may be wrong. In fact, only two out of the three blocked it: Windows Defender and Malwarebytes. Kaspersky did not intervene. But in Malwarebytes I found no option to let it pass, at least not immediately. Thank you for your interest. :)

Ath

Re: IDEA: Calculate string-similarity
« Reply #9 on: October 25, 2018, 03:12 AM »
Just added a disclaimer I should have included earlier :-[
Searched and found the algorithms and included them in a C#/WinForms app (would have liked to do it in Java, but creating a GUI in/with Java is such a PITA).

Doesn't do anything other than calculate the values, no internet related stuff, no file-I/O (yet, have to add saving/loading settings & position on screen).

KodeZwerg

Re: IDEA: Calculate string-similarity
« Reply #10 on: October 25, 2018, 04:18 AM »
If it isn't open source, could you please release another version with two bigger text boxes where I can enter/paste multiple lines of text to check them?
(like two small Notepad views that can be compared)
Plus, maybe add an option to check two files?

Ath

Re: IDEA: Calculate string-similarity
« Reply #11 on: October 25, 2018, 05:43 AM »
Quote from KodeZwerg: "two bigger text boxes where I can enter/paste multiple lines of text"
Quote from KodeZwerg: "add option to check two files"
I'll put that on the TODO list for the NANY release.

Also I'll check the source licenses, and at least link to them, later.

Ath

Re: IDEA: Calculate string-similarity
« Reply #12 on: November 11, 2018, 05:24 AM »
I have now released this for NANY 2019, over here