
Messages - vevola

26
Post New Requests Here / [Request] Tell me who said what first!
« on: July 26, 2011, 06:59 AM »
After a great experience with DonationCoder, I'm posting another request.

I have a series of transcribed conversations. Each text file has a series of lines that begin with an initial and a colon, indicating who says what. I would like to see which words one speaker uses before the other speaker uses them, as well as other things like frequency and collocation.

So here's an example:

A: So, I really like all those dresses, especially this red and that green thing there.
B: Yeah, the red one is nice.
A: Which one are you gonna buy?
B: I'll get the red one.

Here's what I want to be able to get.

For A:
- [Words said first by A:]
   "red" was said first by A:
- [Collocation first occurrence]
   the first time A: said "red" was in line 1
- [Frequency for A:]
   A: said "red" a total of 1 time
- [Collocation for A:]
   A: said "red" in line 1
- [Frequency for B:]
   B: repeated "red" a total of 2 times
- [Collocation for B:]
   B: said "red" in lines 2, 4

For B:
- [Words said first by B:]
   "one" was said first by B:
- [Collocation first occurrence]
   the first time B: said "one" was in line 2
- [Frequency for B:]
   B: said "one" a total of 2 times
- [Collocation for B:]
   B: said "one" in lines 2, 4
- [Frequency for A:]
   A: repeated "one" a total of 1 time
- [Collocation for A:]
   A: said "one" in line 3

My conversations have 3 speakers though, which might make it trickier.

How I see this happening: if it's possible to isolate all lines that begin with A: or B:, I imagine it's relatively easy to make a word list for each speaker that includes word frequency and collocation (the lines where each word occurs). Then you'd take two of these lists at a time (A+B, B+C, A+C) and, for each word they share, check which speaker's first occurrence has the smaller line number (e.g. first occurrence of "red": A: line 1, B: line 2; 1 is less than 2, hence A: said "red" before B:).
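A minimal Python sketch of the approach described above. It assumes every speaker line starts with a single-letter initial followed by a colon (e.g. `A:`) and treats a "word" as a lowercase run of letters and apostrophes; `index_transcript`, `said_first` and `describe` are hypothetical names. Like the example, it only compares words both speakers actually use, and it loops over every ordered pair of speakers, so three speakers work the same way as two. Line numbers are file line numbers, starting at 1.

```python
import itertools
import re
from collections import defaultdict

# Assumed format: "A: some text", one utterance per line.
SPEAKER_LINE = re.compile(r"^\s*([A-Za-z]):\s*(.*)$")
# Assumed tokenizer: lowercase letters plus apostrophes ("i'll" stays one word).
WORD = re.compile(r"[a-z']+")


def index_transcript(lines):
    """Map speaker -> word -> line numbers (1-based, one entry per occurrence)."""
    index = defaultdict(lambda: defaultdict(list))
    for lineno, raw in enumerate(lines, start=1):
        match = SPEAKER_LINE.match(raw)
        if not match:
            continue  # skip blank lines and lines without a speaker prefix
        speaker, text = match.group(1).upper(), match.group(2)
        for word in WORD.findall(text.lower()):
            index[speaker][word].append(lineno)
    return index


def said_first(index, a, b):
    """Words used by both a and b where a's first use has the smaller line number."""
    shared = index[a].keys() & index[b].keys()
    return sorted(w for w in shared if index[a][w][0] < index[b][w][0])


def describe(index, speaker, word):
    """(frequency, distinct line numbers) for `word` as used by `speaker`."""
    lines = index[speaker].get(word, [])
    return len(lines), sorted(set(lines))


SAMPLE = [
    "A: So, I really like all those dresses, especially this red and that green thing there.",
    "B: Yeah, the red one is nice.",
    "A: Which one are you gonna buy?",
    "B: I'll get the red one.",
]

idx = index_transcript(SAMPLE)
for a, b in itertools.permutations(sorted(idx), 2):
    print(f"{a}: before {b}:", ", ".join(said_first(idx, a, b)))
# A: before B: red
# B: before A: one
print(describe(idx, "B", "red"))  # (2, [2, 4])
```

For a real file, `index_transcript(open(path, encoding="utf-8").read().splitlines())` would replace `SAMPLE`; with three speakers the same loop prints all six ordered pairs. Frequency counts every occurrence, while the line list drops repeats within one line.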

Any suggestions? Volunteers? :)


27
@IainB
Yikes! I've been playing around with Qiqqa, but there seem to be a lot of glitches! It's uploading papers even when I asked it not to, there's no way to stop any type of operation, and well... I think I'm sticking to Mendeley and skwire's app!

BTW, @skwire I donated some $$ to you. It's not a lot, just a symbolic gesture. I encourage others to donate to coders as well! Thanks!

28
@vevola: You might be interested in this.
I see you use Foxit and the "reference management" programme Mendeley Desktop.
I have recently started using another reference management programme called Qiqqa, having tried Zotero and Mendeley - I found the latter two did not meet my requirements.

Thanks! I'll try it out!

29
You say it's not a polished app, I say it does what I asked for! :)

It would be nice to be able to choose where to save the final results, for example, and maybe even keep a record of which files have already been scanned (maybe by having the app look only at files modified after the most recently modified file from the previous scan - dunno if that makes sense) and just update the results file. It would also be nice to have some way to exclude those false positives from the scan.
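One way to read the "only look at files modified after the previous scan" idea, as a hedged Python sketch: the tool would save the newest modification time it has seen and, on the next run, rescan only PDFs with a later mtime. `pdfs_modified_since`, `incremental_scan` and the saved `last_seen` value are hypothetical. A file copied in with an older modification date would be missed, so an occasional full rescan would still be wise.

```python
import os


def pdfs_modified_since(folder, since):
    """Paths of .pdf files under `folder` (recursively) with mtime newer than `since`."""
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".pdf"):
                path = os.path.join(root, name)
                if os.path.getmtime(path) > since:
                    found.append(path)
    return sorted(found)


def incremental_scan(folder, last_seen=0.0):
    """Return (PDFs to scan this run, updated last_seen timestamp to save)."""
    new_files = pdfs_modified_since(folder, last_seen)
    # Keep the old timestamp if nothing changed; otherwise advance to the newest file.
    newest = max((os.path.getmtime(p) for p in new_files), default=last_seen)
    return new_files, newest
```

The first run would pass `last_seen=0.0` (a full scan); the returned timestamp could be stored in a small text file and passed back next time, with the new results appended to the existing results file.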

But like I said, it's usable this way, at least for me and Suntsu!

If you ever come to Germany, look me up!


30
Would it be possible to choose where to save the final txt files? Even better, would it be possible to open Explorer with those files highlighted? tia!

PS
Would it help make the app better if I sent you some examples of false negatives?

31
@skwire btw, are you going to post it on your software website?

32
kudos to skwire! I'm happy it wasn't just for me!

33
you may be better off using something like File Hound to search through your PDFs.
I'm pretty sure Hound uses pdftotext too.  That probably means it ignores image files as being empty of text, which should at least make the search hands-off.

I'm not sure exactly how it works, but here's what I've noticed in some of my PDFs: the library scans the pages (images) and then adds some kind of "copyright" stamp, which is text. So some PDFs are mostly images, but with a little text.
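Because of such stamps, "no text at all" is too strict a test; a text-density threshold per page copes better. A hedged Python sketch: it assumes the text comes from `pdftotext`, which ends each page with a form feed, and the thresholds (200 non-whitespace characters per page, 80% sparse pages) are arbitrary guesses to tune, not values from the thread. `extract_text` and `looks_like_image_pdf` are hypothetical helpers.

```python
import subprocess


def extract_text(pdf_path):
    """Run pdftotext (must be installed and on PATH); '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def looks_like_image_pdf(extracted_text, min_chars_per_page=200, sparse_share=0.8):
    """Guess whether a PDF is mostly scanned images from its extracted text.

    Pages are split on form feeds. A page with fewer than `min_chars_per_page`
    non-whitespace characters counts as sparse, so a scanned page carrying only
    a short copyright stamp is still treated as an image page.
    """
    if extracted_text.endswith("\f"):
        extracted_text = extracted_text[:-1]  # drop the final page terminator
    pages = extracted_text.split("\f")
    sparse = sum(1 for page in pages if len("".join(page.split())) < min_chars_per_page)
    return sparse / len(pages) >= sparse_share
```

Usage would be `looks_like_image_pdf(extract_text("some.pdf"))`; the result is a guess "with a certain degree of certainty", so borderline files (e.g. books with many short pages) may still need a manual look.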

34

EDIT: Actually it is much better than what I wrote. There are false positives, but probably not 1 in 3.

I think that unless you have any other tweaks, this may do! Thanks!

35
Thanks again!

Better: 225 vs. 1205

But I also see many false positives (about 1/3).

36
So I temporarily moved the larger PDFs, searched only the ones under 60MB, and it worked (I didn't try it with the very latest version).

However, the lists produced don't seem reliable: non-searchable 817; searchable 616.

(The proportion of non-searchable ones should be lower.)

37
Nope. I still get errors, and indeed eventually it crashes.

The largest of my PDFs are 340MB, 210MB and 140MB, and all the others are under 100MB. Maybe exclude all PDFs over 100MB?
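If the memory errors come from the few very large files, the scan could skip anything over a size limit and list those files separately for a manual check. A hedged Python sketch; `split_pdfs_by_size` is a hypothetical helper, and the cutoff is just the 100MB figure suggested in the post (taken here as 100 MiB).

```python
import os

LIMIT_BYTES = 100 * 1024 * 1024  # the 100MB cutoff suggested above (assumed MiB)


def split_pdfs_by_size(folder, limit_bytes=LIMIT_BYTES):
    """Return (PDFs to scan, PDFs skipped because they exceed limit_bytes)."""
    to_scan, skipped = [], []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.lower().endswith(".pdf"):
                continue  # ignore non-PDF files
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit_bytes:
                skipped.append(path)
            else:
                to_scan.append(path)
    return sorted(to_scan), sorted(skipped)
```

With the sizes listed above, the 340MB, 210MB and 140MB files would land in the skipped list and everything else would be scanned.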

38
Thanks for the update!

I tried running it and after a couple of minutes I get the following error:
Error: Memory limit reached (see #MaxMem in the help file). The current thread will exit.
Line#
---> 083: Return,str

BTW, the folder I'm searching has about 4GB of PDFs, dunno if that means anything.

EDIT: I thought the program had stopped, but it's still running in the background. I imagine the error was for a particular file, and not for the entire program. Any ideas what it is? Will it show up in the final log? We'll see!  ;)
Thanks again!

39
I know how it feels, no worries, and thanks!

40
I use two programs for viewing and handling PDFs. One is a simple viewer and annotator (Foxit), where you can search for a word or a phrase in a single PDF or in all PDFs in a folder. The other is called Mendeley Desktop, which is more of an "iTunes for PDFs" (made for academia).

But if a PDF is an image, obviously it won't show up in the results. If I have a list, I could OCR and convert them.

So, no, I don't have to open each file one at a time to perform a search. Both programs do an indexing of the texts of the collections.

41
not sure, but something like this might be what you're looking for (?)
http://www.wolosoft.com/en/copypath/

42
Well, I'm a GUI kinda-guy! It would be great to choose which folders I want to search in, and then have a list that I could order in terms of filename or folder so it would be easier to work with...

43
Here's an example of a PDF image (non-searchable text)

Other examples are all the downloadable ebooks and pages from Google Books (either the free versions, or using Google Book Downloader for Greasemonkey from http://book.huhiho.com/).

Thanks!

44
I have a huge number of PDFs, and often I perform text searches across my collection. But I just realized: some of these PDFs are images!

I'd like to be able to know which PDFs are images so that I can convert them to searchable text. But there are so many of them in my folder, I'd have to open them one by one and check manually.

Would it be possible to have a little program that would check in certain folders and tell me which PDFs are images (even with a certain degree of certainty)?

TIA!!

45
Hmmm...!

I guess the specific reason why I was looking for something like this was because with all these extensions and addons in Firefox, I was trying to figure out what hotkeys were used (not just by FF but also by the addons), for what, and which ones were not.

thanks for the info guys!

46
I dunno if there's already something like this out there (I wasn't able to find anything), but I was looking for a simple program that would find and list all hotkeys used by programs (Windows, Office, Firefox, etc.) and be able to print them (not necessarily modify).

Anyone?  :-[

47
Thanks! I'd love to see what you have!

I remember there being a really simple tool like this many years ago, but unfortunately I can't find it anymore...

48
Image Manager Shootout / Re: Xnview vs. Irfan View
« on: February 17, 2009, 05:49 PM »
I'd be interested in hearing your thoughts regarding this topic now, since these posts date back to 2005. Both apps have changed. What do you think now?

49
Mmmm... not really...

So this is the deal (and I'll try to explain myself better): pretend you have two or three windows open on the taskbar, next to the START button. If you repeatedly click on one of them, the window pops up and minimizes again and again.
So now I want to be able to do the same with my tabs in Firefox (I hope that sort of explained it)...

Any ideas?

50
General Software Discussion / Tagging within a single file
« on: January 27, 2009, 10:58 PM »
I was wondering if there's a program that would allow me to tag chunks of text within a single file, for example. I know there are a lot of programs out there that let you tag files, but I haven't been able to find one that does this. The main idea would be to allow multiple "highlighting" of text chunks through tag words.

Say I have a text file of some sort and I want to highlight and tag all the verbs, nouns and adjectives; then I want to tag all phrases referring to men, all of them referring to women, and so on. At the end I could just look up the tagged chunks for whatever I need.
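One common way to model this is stand-off annotation: the text itself is never changed, and each tag stores start/end character offsets, so tags can overlap freely (the same words can be a noun phrase and a reference to a man). A minimal Python sketch of that data structure; `TaggedText` and its methods are hypothetical, and the tagging itself would still be done by hand or by some other tool.

```python
import re
from collections import defaultdict


class TaggedText:
    """Stand-off tags: the text is untouched; each tag keeps (start, end) offsets."""

    def __init__(self, text):
        self.text = text
        self._spans = defaultdict(list)  # tag -> list of (start, end), end exclusive

    def tag(self, start, end, *tags):
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span out of range")
        for t in tags:
            self._spans[t].append((start, end))

    def tag_all(self, phrase, *tags):
        """Tag every whole-word occurrence of `phrase`; returns how many were tagged."""
        count = 0
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", self.text):
            self.tag(m.start(), m.end(), *tags)
            count += 1
        return count

    def chunks(self, tag):
        """All text chunks carrying `tag`, in document order."""
        return [self.text[s:e] for s, e in sorted(self._spans.get(tag, []))]

    def tags_at(self, offset):
        """All tags whose spans cover character `offset` (spans may overlap)."""
        return sorted(t for t, spans in self._spans.items()
                      if any(s <= offset < e for s, e in spans))


doc = TaggedText("The old man gave the young woman a red book.")
doc.tag_all("old man", "men")
doc.tag_all("young woman", "women")
for noun in ("man", "woman", "book"):
    doc.tag_all(noun, "noun")
print(doc.chunks("noun"))  # ['man', 'woman', 'book']
```

The whole-word match matters here: a plain substring search for "man" would also tag the end of "woman".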

I hope that made sense! :)
