Author Topic: [Request] Tell me who said what first!  (Read 2960 times)
vevola
Charter Member
***
Posts: 88


VeVoLa

« on: July 26, 2011, 06:59:58 AM »

After a great experience with DonationCoder, I'm posting another request.

I have a series of transcribed conversations. Each text file consists of lines that begin with a speaker's initial and a colon, showing who says what. I would like to see which words one speaker uses before the other speaker does, as well as other things like frequency and collocation.

So here's an example:

Quote
A: So, I really like all those dresses, especially this red and that green thing there.
B: Yeah, the red one is nice.
A: Which one are you gonna buy?
B: I'll get the red one.

Here's what I want to be able to get.

For A:
- [What words said first by A:]
   "red" was said first by A:
- [Collocation first occurrence]
   the first time A: said "red" was in line 1
- [Frequency for A:]
   A: said "red" a total of 1 time
- [Collocation for A:]
   A: said "red" in line 1
- [Frequency for B:]
   B: repeated "red" a total of 2 times
- [Collocation for B:]
   B: said "red" in lines 2, 4

For B:
- [What words said first by B:]
   "one" was said first by B:
- [Collocation first occurrence]
   the first time B: said "one" was in line 2
- [Frequency for B:]
   B: said "one" a total of 2 times
- [Collocation for B:]
   B: said "one" in lines 2, 4
- [Frequency for A:]
   A: repeated "one" a total of 1 time
- [Collocation for A:]
   A: said "one" in line 3

My conversations have 3 speakers though, which might make it trickier.

How I see this happening: If it's possible to isolate all lines which begin with A: or B:, I imagine it's relatively easy to make a word list for each speaker which includes word frequency and collocation. Then you'd compare these lists in pairs (A+B, B+C, A+C), checking for each word which speaker's first occurrence has the smaller line number (e.g. First occurrence "red": A: line 1; B: line 2 --> 1 is less than 2, hence A: said "red" before B:).
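
For illustration, here is a minimal Python sketch of the approach outlined above. It is only one possible way to do it: the names `index_words` and `first_speaker` are made up for this example, and it assumes every utterance sits on a single line that starts with a speaker label and a colon. Rather than comparing lists in pairs, it builds one index of word -> speaker -> line numbers, so a third speaker needs no extra comparisons.

```python
import re
from collections import defaultdict

# Assumption: each utterance is one line of the form "A: some text".
SPEAKER_LINE = re.compile(r"^\s*([A-Za-z]+):\s*(.*)$")
# Assumption: a "word" is a run of letters and apostrophes, case-insensitive.
WORD = re.compile(r"[a-z']+")


def index_words(lines):
    """Map word -> speaker -> list of 1-based line numbers where it occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for line_no, line in enumerate(lines, start=1):
        match = SPEAKER_LINE.match(line)
        if not match:
            continue  # skip lines without a speaker label
        speaker, text = match.groups()
        for word in WORD.findall(text.lower()):
            index[word][speaker].append(line_no)
    return index


def first_speaker(index, word):
    """Return (speaker, line_number) for the earliest use of word, or None."""
    by_speaker = index.get(word)
    if not by_speaker:
        return None
    # Line numbers are appended in order, so nums[0] is each speaker's first use.
    return min(((s, nums[0]) for s, nums in by_speaker.items()),
               key=lambda pair: pair[1])
```

With the four-line example above, `first_speaker(index, "red")` gives `("A", 1)` and `index["red"]["B"]` is `[2, 4]`. The frequency for a speaker is simply the length of that speaker's line-number list.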

Any suggestions? Volunteers? :)

skwire
Moderator
*****
Posts: 4,036



Another Coding Snack request? Om nom nom...

« Reply #1 on: July 26, 2011, 08:03:07 AM »

Are the match words ("red" and "one" in your examples) provided by the user? 

vevola
Charter Member
***
Posts: 88


VeVoLa

« Reply #2 on: July 26, 2011, 08:11:10 AM »

Quote
Are the match words ("red" and "one" in your examples) provided by the user?

No, that was just an example! :)

The text files are a lot longer (about 2000 lines).
skwire
Moderator
*****
Posts: 4,036



« Reply #3 on: July 26, 2011, 08:21:04 AM »

So you want a report detailing EVERY word in your conversation file?

vevola
Charter Member
***
Posts: 88


VeVoLa

« Reply #4 on: July 26, 2011, 10:17:17 AM »

Every word would be ok too. I'm not sure which words to exclude as of yet, so all words might be easier.
« Last Edit: July 26, 2011, 10:21:20 AM by vevola »
skwire
Moderator
*****
Posts: 4,036



« Reply #5 on: July 26, 2011, 10:35:32 AM »

First you say that the words you want stats for (the match words) are not user provided but now you say that all words would be okay, too.  I'm confused.  You sound as if you're not certain you want all words but you've also said you're not going to provide which words to gather data on.  So, how is the program supposed to determine which words to gather data on?

Edvard
Coding Snacks Author
Charter Honorary Member
***
Posts: 2,535



« Reply #6 on: July 27, 2011, 11:28:58 PM »

Most deposition and courtroom transcription software does this, and can provide an optional alphabetically sorted word list with page references as an appendix.
They typically exclude conjunctions and the like, e.g. "and", "from", "to", etc.
RealLegal.com's products do this, and produce .PTX files that can be read and printed with the E-Transcript Viewer.

I think that's something like what OP wants, but instead of a static document generator or reader, it should be an interface that allows the user to extract word lists dynamically sortable by first occurrence, speaker, and frequency; perhaps even generate a report based on the sorting criteria.
I'm sure some of the heavyweight legal software like IproTech, CTSummation, Concordance, or CaseMap have methods for this, but they also tend to be huge, expensive, and only useful to lawyers and paralegals.
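
To make the idea of a dynamically sortable word list a bit more concrete, here is a small Python sketch. Everything in it is hypothetical: the `WordStat` record, the `build_report` function, and the short `STOP_WORDS` set are invented for illustration and are not taken from any of the products named above. It drops excluded words and then sorts the remaining records by first occurrence, speaker, or frequency.

```python
from dataclasses import dataclass

# Illustrative only; a real exclusion list would be much longer.
STOP_WORDS = frozenset({"and", "from", "to", "the", "a", "is"})


@dataclass(frozen=True)
class WordStat:
    word: str        # the word itself, lower-cased
    speaker: str     # speaker label, e.g. "A"
    first_line: int  # line of this speaker's first use of the word
    count: int       # how many times this speaker used the word


def build_report(stats, sort_by="first_line", stop_words=STOP_WORDS):
    """Return stats without stop words, ordered by the chosen criterion."""
    sort_keys = {
        "first_line": lambda s: (s.first_line, s.word, s.speaker),
        "speaker": lambda s: (s.speaker, s.first_line, s.word),
        # Negative count puts the most frequent words first.
        "frequency": lambda s: (-s.count, s.word, s.speaker),
    }
    if sort_by not in sort_keys:
        raise ValueError(f"unknown sort criterion: {sort_by!r}")
    kept = [s for s in stats if s.word not in stop_words]
    return sorted(kept, key=sort_keys[sort_by])
```

Keeping the records separate from the sort step means the same data can be re-sorted on demand, which is what an interactive front end would need.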

Skwire, if you can pull this off, I believe you'll be tapping a market bigger than you know...
Ath
Supporting Member
**
Posts: 2,201



« Reply #7 on: July 28, 2011, 03:14:12 AM »

This would be a job that should IMHO be done in a full OO language like Java/JRE or C#/.NET, and it sure would take some time, but it's quite doable.

I'd be curious what the OP is going to be using it for ;) since that might influence the implementation.

kyrathaba
N.A.N.Y. Organizer
Honorary Member
**
Posts: 3,010



while(! dead_horse){beat}

« Reply #8 on: July 28, 2011, 07:18:11 AM »

It's basically a string-manipulation/regular-expressions task.  C# is quite capable in this area, but as Ath said, it would take some time to code it.

skwire
Moderator
*****
Posts: 4,036



« Reply #9 on: July 28, 2011, 07:34:45 AM »

Quote
It's basically a string-manipulation/regular-expressions task.

Yep, that's all it really is.  I would daresay that almost any language could handle this.
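
As a rough illustration of how little string handling is involved, here is a short Python sketch that counts words per speaker for any number of speakers. It is not the prototype discussed in this thread; the `frequencies` helper is hypothetical, and it assumes each line looks like `Label: text` with a one-word label.

```python
import re
from collections import Counter

# Assumption: a line starts with a one-word speaker label followed by a colon.
LINE = re.compile(r"^(?P<speaker>\w+):\s*(?P<text>.*)$")


def frequencies(lines):
    """Return {speaker: Counter of lower-cased words} for labelled lines."""
    counts = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that carry no speaker label
        words = re.findall(r"[a-z']+", match.group("text").lower())
        counts.setdefault(match.group("speaker"), Counter()).update(words)
    return counts
```

Calling `frequencies(lines)["B"]["red"]` then gives the number of times B said "red", and `Counter.most_common()` lists each speaker's words by frequency.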

skwire
Moderator
*****
Posts: 4,036



« Reply #10 on: July 28, 2011, 03:31:02 PM »

FWIW, I've been working offline with vevola regarding this and do have a working prototype.

« Last Edit: July 28, 2011, 03:38:35 PM by skwire »
