Author Topic: [Request] Tell me who said what first!  (Read 9395 times)

vevola

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 104
  • VeVoLa
[Request] Tell me who said what first!
« on: July 26, 2011, 06:59 AM »
After a great experience with DonationCoder, I'm posting another request.

I have a series of transcribed conversations. Each text file consists of lines that begin with an initial and a colon identifying who is speaking. I would like to see which words are used by one speaker before the other speaker uses them, as well as other things like frequency and collocation.

So here's an example:

A: So, I really like all those dresses, especially this red and that green thing there.
B: Yeah, the red one is nice.
A: Which one are you gonna buy?
B: I'll get the red one.

Here's what I want to be able to get.

For A:
- [What words said first by A:]
   "red" was said first by A:
- [Collocation first occurrence]
   the first time A: said "red" was in line 1
- [Frequency for A:]
   A: said "red" a total of 1 time
- [Collocation for A:]
   A: said "red" in line 1
- [Frequency for B:]
   B: repeated "red" a total of 2 times
- [Collocation for B:]
   B: said "red" in lines 2, 4

For B:
- [What words said first by B:]
   "one" was said first by B:
- [Collocation first occurrence]
   the first time B: said "one" was in line 2
- [Frequency for B:]
   B: said "one" a total of 2 times
- [Collocation for B:]
   B: said "one" in lines 2, 4
- [Frequency for A:]
   A: repeated "one" a total of 1 time
- [Collocation for A:]
   A: said "one" in line 3

My conversations have 3 speakers though, which might make it trickier.

How I see this happening: If it's possible to isolate all lines which begin with A: or B:, I imagine it's relatively easy to make a word list which includes word frequency and collocation. Then you'd have to compare two of these lists (like A+B, B+C, A+C) and compare the line numbers of the first occurrence in each speaker by seeing which number is smaller (e.g. First occurrence "red": A: line 1; B: line 2 --> 1 is less than 2, hence A: said "red" before B).
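A minimal Python sketch of the approach described above, run on the four-line example. It assumes every utterance sits on one line that starts with a speaker label followed by a colon; the names `build_index` and `said_first` are made up for illustration and are not from any existing tool.

```python
import re
from collections import defaultdict
from itertools import combinations

def build_index(lines):
    """Map speaker -> word -> list of 1-based line numbers where the word occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for line_no, line in enumerate(lines, start=1):
        speaker, sep, text = line.partition(":")
        if not sep:
            continue  # assumption: lines without a colon carry no speaker label
        for word in re.findall(r"[a-z']+", text.lower()):
            # A word repeated within one line is recorded once per occurrence,
            # so the list length doubles as the frequency count.
            index[speaker.strip()][word].append(line_no)
    return index

def said_first(index, first, second):
    """Words both speakers used, where `first` used them on an earlier line."""
    shared = index[first].keys() & index[second].keys()
    return sorted(w for w in shared if index[first][w][0] < index[second][w][0])

conversation = [
    "A: So, I really like all those dresses, especially this red and that green thing there.",
    "B: Yeah, the red one is nice.",
    "A: Which one are you gonna buy?",
    "B: I'll get the red one.",
]

index = build_index(conversation)
# With three speakers this loop covers every pair (A+B, A+C, B+C).
for a, b in combinations(sorted(index), 2):
    print(f"{a} before {b}:", said_first(index, a, b))
    print(f"{b} before {a}:", said_first(index, b, a))
print("B said 'red' in lines", index["B"]["red"])
```

Because each word keeps its full list of line numbers, the first occurrence is the first list element and the frequency is the list length, so one pass over the file is enough for all three reports.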

Any suggestions? Volunteers? :)


skwire

  • Global Moderator
  • Joined in 2005
  • *****
  • Posts: 5,286
Re: [Request] Tell me who said what first!
« Reply #1 on: July 26, 2011, 08:03 AM »
Are the match words ("red" and "one" in your examples) provided by the user? 

vevola

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 104
  • VeVoLa
Re: [Request] Tell me who said what first!
« Reply #2 on: July 26, 2011, 08:11 AM »
Are the match words ("red" and "one" in your examples) provided by the user? 

No, that was just an example! :)

The text files are a lot longer (about 2000 lines).

skwire

  • Global Moderator
  • Joined in 2005
  • *****
  • Posts: 5,286
Re: [Request] Tell me who said what first!
« Reply #3 on: July 26, 2011, 08:21 AM »
So you want a report detailing EVERY word in your conversation file?   :huh:

vevola

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 104
  • VeVoLa
Re: [Request] Tell me who said what first!
« Reply #4 on: July 26, 2011, 10:17 AM »
Every word would be ok too. I'm not sure which words to exclude as of yet, so all words might be easier.
« Last Edit: July 26, 2011, 10:21 AM by vevola »

skwire

  • Global Moderator
  • Joined in 2005
  • *****
  • Posts: 5,286
Re: [Request] Tell me who said what first!
« Reply #5 on: July 26, 2011, 10:35 AM »
First you say that the words you want stats for (the match words) are not user provided but now you say that all words would be okay, too.  I'm confused.  You sound as if you're not certain you want all words but you've also said you're not going to provide which words to gather data on.  So, how is the program supposed to determine which words to gather data on?

Edvard

  • Coding Snacks Author
  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 3,017
Re: [Request] Tell me who said what first!
« Reply #6 on: July 27, 2011, 11:28 PM »
Most deposition and courtroom transcription software does this, and can provide an optional alphabetically sorted word list with page references as an appendix.
These packages typically exclude conjunctions, prepositions, and the like, e.g. "and", "from", "to", etc.
RealLegal.com's products do this, and produce .PTX files that can be read and printed with the E-Transcript Viewer.
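For illustration only, excluding such words could look like the Python sketch below. The stop-word list and the function name `content_words` are hypothetical and deliberately tiny; this is not how any of the products mentioned actually work.

```python
import re

# Hypothetical, deliberately short stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "from", "to", "of", "or", "but", "is"}

def content_words(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stop_words]

print(content_words("Yeah, the red one is nice."))
```

Keeping the list as a parameter lets the user decide later which words to exclude without touching the rest of the analysis.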

I think that's something like what OP wants, but instead of a static document generator or reader, it should be an interface that allows the user to extract word lists dynamically sortable by first occurrence, speaker, and frequency; perhaps even generate a report based on the sorting criteria.
I'm sure some of the heavyweight legal software like IproTech, CTSummation, Concordance, or CaseMap have methods for this, but they also tend to be huge, expensive, and only useful to lawyers and paralegals.

Skwire, if you can pull this off, I believe you'll be tapping a market bigger than you know...

Ath

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 3,612
Re: [Request] Tell me who said what first!
« Reply #7 on: July 28, 2011, 03:14 AM »
This would be a job that should IMHO be done in a full OO language like Java/JRE or C#/.NET, and it sure would take some time, but it's quite doable.

I'd be curious what the OP is going to be using it for ;) that might influence the implementation.

kyrathaba

  • N.A.N.Y. Organizer
  • Honorary Member
  • Joined in 2006
  • **
  • Posts: 3,200
Re: [Request] Tell me who said what first!
« Reply #8 on: July 28, 2011, 07:18 AM »
It's basically a string-manipulation/regular-expressions task.  C# is quite capable in this area, but as Ath said, it would take some time to code it.

skwire

  • Global Moderator
  • Joined in 2005
  • *****
  • Posts: 5,286
Re: [Request] Tell me who said what first!
« Reply #9 on: July 28, 2011, 07:34 AM »
It's basically a string-manipulation/regular-expressions task.

Yep, that's all it really is.  I would daresay that almost any language could handle this.
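As a rough idea of the regular-expression part, here is a Python sketch. It assumes a speaker label is one or more capital letters followed by a colon; the pattern names and `parse_line` are invented for this example and have nothing to do with the actual prototype.

```python
import re

# Assumed line format: optional whitespace, capital-letter label, colon, utterance.
SPEAKER_LINE = re.compile(r"^\s*(?P<speaker>[A-Z]+):\s*(?P<text>.*)$")
# A word is a run of letters, optionally with internal apostrophes (e.g. "I'll").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def parse_line(line):
    """Return (speaker, [lower-cased words]), or None if there is no speaker label."""
    match = SPEAKER_LINE.match(line)
    if match is None:
        return None
    words = [w.lower() for w in WORD.findall(match.group("text"))]
    return match.group("speaker"), words

print(parse_line("B: I'll get the red one."))
print(parse_line("(pause)"))
```

Lines that do not match the label pattern come back as `None`, so the caller can decide whether to skip them or attach them to the previous speaker.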

skwire

  • Global Moderator
  • Joined in 2005
  • *****
  • Posts: 5,286
Re: [Request] Tell me who said what first!
« Reply #10 on: July 28, 2011, 03:31 PM »
FWIW, I've been working offline with vevola regarding this and do have a working prototype.

[Attached screenshot: 2011-07-28_153647.png]
« Last Edit: July 28, 2011, 03:38 PM by skwire »