Post New Requests Here / Re: comparing two big different lists of strings/filenames
« Last post by compn on May 17, 2024, 02:23 AM »
super crash!
works ok with list1 / list2 / list3
but trying my own lists, i get a crash. my lists may have unicode (or just non-english codepage) characters in the filenames. i don't need to compare these filenames, and would rather have them ignored.
attached is a sample filename that triggers the crash
running sed "s/[^\x00-\x7F]//g" list.txt
seems to fix the issue and allows me to use the program on my list.
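a rough python sketch of the "ignore them" option, for comparison: the sed line keeps every filename but deletes the non-ASCII characters from it, while this skips the whole entry. the function name split_ascii and the sample names are made up for illustration (the "Amélie" entry is hypothetical, not from my attached list), and str.isascii() needs python 3.7 or later.

```python
def split_ascii(lines):
    """Split entries into pure-ASCII names and names to skip."""
    kept, skipped = [], []
    for line in lines:
        name = line.rstrip("\r\n")  # drop the line ending, keep the name itself
        if name.isascii():
            kept.append(name)
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical sample; the second entry contains a non-ASCII character.
names = ["Alice (1988).mkv", "Am\u00e9lie (2001).mkv", "Phantom Creeps.mp4"]
kept, skipped = split_ascii(names)
print(kept)
print(skipped)
```

one thing to watch: a curly apostrophe as in "Child’s Play" is non-ASCII too, so this filter would skip that entry, whereas the sed version turns it into "Childs Play" and keeps it in the list.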
as for matching... well, there are problems. the year has to be matched. but i do like that titles are sometimes matched even when the years differ, because movie years are sometimes inconsistent from filename to filename: a release date vs a production date or theatrical release date. so i don't know what i want here. i've seen years be off by 3 or even 8 years from what imdb claims. not sure this can be fixed, it's an imdb issue. some titles that got matched across different years:
Alice
First list:
Alice [1982]
Alice [1991]
Second list:
Alice (1988).mkv
Alice in Wonderland
First list:
Alice in Wonderland [1999]
Alice in Wonderland [2010]
Second list:
Alice in Wonderland (1903).mkv
All Quiet on the Western Front
First list:
All Quiet on the Western Front [1979]
Second list:
All Quiet on the Western Front (1930).mkv
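one possible way to handle the year question, sketched in python: pull the first 4-digit year out of each name and only accept a match if the years are within some tolerance. this is not how the program does it, just an idea. extract_year, years_compatible and the tolerance parameter are names i made up, and treating a missing year as "compatible" is an assumption.

```python
import re

# A 19xx or 20xx number that is not part of a longer digit run.
YEAR_RE = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def extract_year(name):
    """Return the first plausible year in the name, or None."""
    m = YEAR_RE.search(name)
    return int(m.group(1)) if m else None


def years_compatible(a, b, tolerance=1):
    """True if both years are within `tolerance`, or if either is missing."""
    ya, yb = extract_year(a), extract_year(b)
    if ya is None or yb is None:
        return True  # assumption: no year means nothing to contradict
    return abs(ya - yb) <= tolerance


print(years_compatible("Alice [1982]", "Alice (1988).mkv"))               # False
print(years_compatible("Alice [1982]", "Alice (1988).mkv", tolerance=8))  # True
```

with a tolerance of 1 or 2 the Alice and All Quiet on the Western Front pairs above would be rejected, while small release-date vs production-date differences would still pass. a title that itself starts with a year-like number would confuse the "first year wins" rule, so that part would need more care.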
but these didn't match. the following might be useful for fuzzy match testing?
list1
Child's Play [1988]
Child's Play 2 [1990]
Child's Play 3 [1991]
Child's Play Sidney Lumet, 1972
Necronomicon
Necronomicon: Book of Dead [1994]
The Necronomicon [2009]
RiffTrax - The Psychotronic Man (1979) mp4
The Psychotronic Man - The Psychotronic Man 1980 Movie FULL HD (480p_25fps_H264-128kbit_AAC) srt
The Phantom Creeps [1939]
ROTOR_divx avi
list2
01 Child’s Play (1988) 3 Commentaries.mkv
03 Childs Play 3 (1991).mkv
Necronomicon (1993).mkv
Psychotronic Man.mp4
R.O.T.O.R..mkv
Phantom Creeps.mp4
(it's why i wanted to ignore words like "the", because sometimes they use the "the" and sometimes not.)
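to show what i mean by ignoring "the" and friends, here is a minimal python sketch that reduces each name to a comparison key. it is only an illustration, not the program's actual matching: title_key, IGNORED_WORDS and the extension list are made-up names, and the rules (drop apostrophes, drop the extension, cut at the first year, drop zero-padded leading numbers like "01") are guesses based on the samples above.

```python
import re

IGNORED_WORDS = {"the"}  # add more words here if wanted
EXT_RE = re.compile(r"[. ](mkv|mp4|avi|srt)$")
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def title_key(name):
    """Reduce a filename or list entry to a normalized title key."""
    text = name.lower().strip()
    text = re.sub(r"['\u2019]", "", text)  # straight and curly apostrophes
    text = EXT_RE.sub("", text)            # trailing ".mkv", " avi", etc.
    m = YEAR_RE.search(text)
    if m:
        text = text[:m.start()]            # keep only what precedes the year
    words = re.findall(r"[a-z0-9]+", text)
    # Drop zero-padded leading numbers such as "01" or "03".
    while len(words) > 1 and words[0].isdigit() and words[0].startswith("0"):
        words.pop(0)
    return " ".join(w for w in words if w not in IGNORED_WORDS)


print(title_key("The Phantom Creeps [1939]"))    # phantom creeps
print(title_key("Phantom Creeps.mp4"))           # phantom creeps
print(title_key("03 Childs Play 3 (1991).mkv"))  # childs play 3
```

this pairs up the Child's Play, Necronomicon and Phantom Creeps samples, but not everything: "R.O.T.O.R..mkv" comes out as "r o t o r" vs "rotor divx", and the RiffTrax entry keeps its "rifftrax" prefix, so those would still need a real fuzzy comparison on top of the keys.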