Post New Requests Here / comparing two big different lists of strings/filenames
« Last post by compn on Today at 12:26 AM »
i have a list of filenames, and my friend has a list of filenames. we are trying to organize and sort and compare our lists of filenames. but the two lists are too different for most string comparing tools.
they have different filenames even if the files themselves are the same. some use spaces ("the quick brown fox") while others use dots ("the.quick.brown.fox"). some have extra words that don't need to be compared and should be ignored, such as "the" "webrip" "dvdrip" ".avi" ".mkv" ".mp4" "x264" "xvid". said another way, there should be an exclusion list of words to ignore.
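to make that concrete, here is a minimal python sketch of the normalizing step, assuming that splitting on anything that isn't a letter or digit is good enough. the names `IGNORE` and `tokens` are made up for illustration, and the extensions are listed without the leading dot because the dot is already treated as a separator.

```python
import re

# hypothetical ignore list, copied from the words mentioned above
IGNORE = {"the", "webrip", "dvdrip", "avi", "mkv", "mp4", "x264", "xvid"}

def tokens(name, ignore=IGNORE):
    """Lowercase the name, split on non-alphanumerics, drop ignored words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [w for w in words if w not in ignore]

print(tokens("the quick brown fox"))                  # ['quick', 'brown', 'fox']
print(tokens("The.Quick.Brown.Fox.DVDRip.XviD.avi"))  # ['quick', 'brown', 'fox']
```

with this, spaces vs dots and upper vs lower case stop mattering, and editing the ignore list is just adding or removing words in the set.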
these lists are just examples of the kinds of filenames we want to compare. i just grabbed these lists at random from the internet.
list3 has the year at the start of the string, making it more complicated to compare it to strings where the year is later in the string.
e.g.
airplane - (1977)
should match
1977.airplane
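one possible way to handle the year, sketched in python: pull anything that looks like a year out of the name and compare it separately from the title words. `split_year` is a hypothetical helper, and the "19xx or 20xx" rule is my assumption. it would wrongly treat a title that is itself a year-like number as a year.

```python
import re

def split_year(name):
    """Return (title_words, years) so the year's position no longer matters."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    # assumption: a year is a standalone 4-digit number starting with 19 or 20
    years = [w for w in words if re.fullmatch(r"(19|20)\d\d", w)]
    title = [w for w in words if w not in years]
    return title, years

print(split_year("airplane - (1977)"))  # (['airplane'], ['1977'])
print(split_year("1977.airplane"))      # (['airplane'], ['1977'])
```

both names give the same title words and the same year, so they can be treated as a match.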
on the plus side, since the strings are titles, the words will generally be in the same order. so in my case, it will always be comparing the same words in the same order.
example:
big trouble in little china - trailer.mp4
would 90% match to
big.trouble.in.little.china - review.mp4
1984 - big trouble in little china - interview.mp4
(best movie ever) dir john carpenter - big trouble in little china- subscribe-to-my-youtube-channel (xxexamplereviewsxx)
and it would match maybe 30% to
big trouble 2002 tim allen.mp4
and it would also match, at maybe 10% (just "trouble"), to
The Trouble with Time Travel
an option to ignore any partial matches under a 10% score or with missing words? or maybe bonus percentage points for words in the right order ("big trouble in little china") but a lower score (or exclusion of the match) for "china has little trouble in big birthrates" because the words aren't in the same order?
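a rough python sketch of that kind of scoring, using `difflib.SequenceMatcher` from the standard library on word lists instead of characters. this is only one possible approach, not a finished design: the function names, the ignore list and the 10% cutoff are assumptions, and the numbers it produces will not be exactly the 90/30/10 guessed above, only in the same ranking.

```python
import re
from difflib import SequenceMatcher

# hypothetical ignore list, based on the words mentioned in the post
IGNORE = {"the", "webrip", "dvdrip", "avi", "mkv", "mp4", "x264", "xvid"}

def tokens(name):
    """Lowercase words and numbers, minus the ignored ones."""
    return [w for w in re.findall(r"[a-z0-9]+", name.lower()) if w not in IGNORE]

def ordered_score(a, b):
    """0..1 score that only credits words found in the same order."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return SequenceMatcher(None, ta, tb, autojunk=False).ratio()

def unordered_score(a, b):
    """0..1 score that ignores word order (Dice coefficient on word sets)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def matches(master, candidates, min_score=0.10):
    """Candidates scoring at least min_score against master, best first."""
    scored = [(ordered_score(master, c), c) for c in candidates]
    return sorted(((s, c) for s, c in scored if s >= min_score), reverse=True)

master = "big trouble in little china - trailer.mp4"
for s, name in matches(master, [
    "The Trouble with Time Travel",
    "big trouble 2002 tim allen.mp4",
    "big.trouble.in.little.china - review.mp4",
]):
    print(f"{s:.0%}  {name}")
```

the "china has little trouble in big birthrates" case gets a high `unordered_score` but a low `ordered_score` against "big trouble in little china", so the gap between the two could be used as the bonus or penalty for word order.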
for how the tool should look, i assume the master list on the left and the compare list on the right, maybe with color codes on matched strings, and with real-time editing and applying of the words to ignore. i'll try to come up with a mock screenshot, but if you have a better idea i'm all for it.
option to copy and paste the lists in, or select .txt files.
as for output, ability to clear matched strings from both sides or only one side. ability to save non-matched strings on left/right or both lists, or only matched strings left/right or both.
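the output options could be sketched like this in python: split both lists into matched pairs, left-only and right-only, and then save whichever of those is wanted. `partition` and the 0.6 cutoff are hypothetical, and for brevity this version compares plain word lists with no ignore list.

```python
import re
from difflib import SequenceMatcher

def words(name):
    """Lowercase words and numbers of a filename."""
    return re.findall(r"[a-z0-9]+", name.lower())

def partition(left, right, min_score=0.6):
    """Return (matched pairs, unmatched left, unmatched right)."""
    matched, used_left, used_right = [], set(), set()
    for i, l in enumerate(left):
        best, best_score = None, 0.0
        for j, r in enumerate(right):
            s = SequenceMatcher(None, words(l), words(r), autojunk=False).ratio()
            if s > best_score:
                best, best_score = j, s
        if best is not None and best_score >= min_score:
            matched.append((l, right[best], best_score))
            used_left.add(i)
            used_right.add(best)
    left_only = [l for i, l in enumerate(left) if i not in used_left]
    right_only = [r for j, r in enumerate(right) if j not in used_right]
    return matched, left_only, right_only

left = ["big trouble in little china - trailer.mp4", "airplane - (1977)"]
right = ["big.trouble.in.little.china - review.mp4", "The Trouble with Time Travel"]
matched, left_only, right_only = partition(left, right)
print(left_only)   # ['airplane - (1977)']
print(right_only)  # ['The Trouble with Time Travel']
```

clearing matched strings from one side or both is then just a matter of which of the three results gets shown or written back to a .txt file.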