Topic: IDEA: get info from webpage into a text file - I'll give 10 dollars for this

DRCross

  • Participant
  • Joined in 2006
Hey all,

I've got an easy project I would like to get done but I'm having trouble implementing it. I'll donate $10 (negotiable) to whoever helps me.

There is a webpage (www.bebo.com, if you know it) and I want to get the friends' names from my friends' profiles on the site and store them in a text file. (I'm going to use this data to create graphs showing the connections between friends. I have the graphing tools ready to go.) The webpage lists 10 friends per page and each friend's name is in the format <friendsname>, so all the script has to do is start at the top of the page and work its way down, copying the data between the < >'s and appending it to a file.

The text file should be something like this:

<UserName>          // The profile owner's name. His friends are on the following lines.
<friendname1>
<friendname2>
<friendname3>      //   ...and so on....
         


easy huh?!

The way I was thinking of implementing it is to use Find in Firefox to search for "<". Once it finds the first "<", it presses Ctrl+Right Arrow until it reaches the ">" and appends the selection to a file. It repeats this process 9 times and then waits for the user to change to the next friends page.

If you are interested, there is a tiny bit more to it that you could help me out with (very, very basic formatting), and the money's yours.

DC

« Last Edit: April 07, 2006, 08:51:27 AM by brotherS »

noth(a)nk.you

  • Charter Member
  • Joined in 2005
Here's some stupid C++ that should get it done:

Source
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

string parseline( const string& line );

int main()
{
    ifstream source( "source.txt" );             //source file
    ofstream parsed( "parsed.txt" );             //output file (overwritten)

    if( !parsed || !source )                     //checks both files opened
    {
         cerr << "File wasn't opened\n";
         return EXIT_FAILURE;
    }

    cout << "Processing..." << endl;

    string line;                                 //current input line

    //testing getline itself (rather than eof()) avoids processing
    //a phantom empty line after the last real one
    while( getline( source, line ) )             //gets a line at a time
        parsed << parseline( line ) << '\n';     //parses it, writes to parsed.txt

    source.close();                              //wrap it up!
    parsed.close();

    cout << "done!" << endl;
    //system("Pause");
}

string parseline( const string& line )           //returns the text inside the first <...>
{
       string lineout;                           //will be the output
       bool writeflag = false;                   //true once a '<' has been seen
       for( string::size_type i = 0; i < line.length(); ++i )
       {
              if ( writeflag )                   //if we should be writing
              {
                   if ( line[i] == '>' )         //closing bracket:
                      return lineout;            //stop writing
                   lineout += line[i];           //append each character
              }
              else if( line[i] == '<' )          //opening bracket:
                   writeflag = true;             //start writing
       }
       return lineout;                           //empty if the line had no '<'
}


Try it on a copy of your data--this is my first time writing a program like this  ;)

Edit: I should clarify, all the language in that code gets confusing.  Put the "tagged" info into a file source.txt and create an empty file parsed.txt -- both in the same directory as the program.  Then run it, and check parsed.txt to see if that's what you wanted!
« Last Edit: April 07, 2006, 09:41:13 AM by noth(a)nk.you »

noth(a)nk.you

More clarification.

I'm thinking that you'd copy the info into a text file, source.txt, like this example:
source.txt
<UserName1>
<friendname1>
<friendname2>
<friendname3>

Hello world!

<UserName2>
<friendname1>
<friendname2> anything else on the line
<friendname3> <even in tags>

----

<UserName3>
<friendname1>
<friendname2>
<friendname3>


Running this program will then replace whatever parsed.txt you have in the same directory with
parsed.txt
UserName1
friendname1
friendname2
friendname3



UserName2
friendname1
friendname2
friendname3



UserName3
friendname1
friendname2
friendname3


How's that sound?

DRCross

Good stuff mate, thanks a lot. I hadn't figured you'd do it in C++. It was then I realised I had posted my request on the wrong site; it was meant to go to a forum on autohotkey.com. But howandever, I like your solution and I know a tiny bit about C++. There are some small issues that remain. Please excuse my stupidity, but how would I set the output in parsed.txt to look like this:

edge:{source: "UserName" target: "FriendName1"}


This probably means storing the username in a variable for as long as it takes to process one page. The username is always the first name on the page. Below is a copy of one page, in the format that source.txt will have, which will probably help you understand what I'm talking about.

Spoiler
bebo   
          bobbydays    Search    My Account    Sign Out         
   Homepage   My Friends   Add Friends   Mail   School   College   
   All   White Board   Photos   Blog      Friends      Quiz   Comments

Dylan Cross <dylan.cross>
Page 1 of 4     1  2  3  4  Next >>>

   
Geraldine O'Connell <gerbeenie>
Female, 21
Hometown Rathfarnham
 
Send a Message »
   
At th mo im in college, 3rd yr Clinical Measurement, DIT Kevin St. On placement now in gross hospitals! saggy boobs. Fellow Clins will know what I'm talkin bout!

   
Laura Boland <laura-boland>
Female, 21
 
 
Send a Message »
   

   
Will Conlon <willconlon>
Male
Hometown Kildare (in general) and occasionally Galway
 
Send a Message »
   
Jesus died and rose from the dead in 3 days. It took Jack Bauer less than an hour. And he's done it twice.

   
Dave . <davkavo2>
Male, 21
Hometown Clonee, Meath
 
Send a Message »
   
Here i am

   
Sarah Dalton <sadge1>
Female, 20
Hometown kildare
 
Send a Message »
   
.....yes sir i can hardcore!....

   
Greaney Clare <greaneyc>
Female, 21
Hometown Ballygreaney ( and yes it is a place)
 
Send a Message »
   
Only Known as Greaney to most, infact to most i dont even have a first name ( have yet to decide if that is a good thing, most famous people are only known by one name so i think ill go with its a...

   
Danilo Vukanic <mr-sassy-pants>
Male
 
 
Send a Message »
   
Jesus titty fuckin christ!

   
Eanna O'Shea <eannaoshea>
Male, 22
Hometown Baronstown West
 
Send a Message »
   
Here's a kiss! be thankful your gettin one, ya ug-o! Well, i'm fourth year of graphic design in Mountjoy square. Just finished thesis....woohoo. I love to travel, drink, listen to music, blah, sp...

   
Michele Macari <micstallion>
Male, 23
Hometown dunlaoghaire
 
Send a Message »
   

   
Niamh O d <niamhie21>
Female, 21
 
 
Send a Message »
   
im so not interesting, you should just leave my page right now. i said go. now. anyway you all know me, piss head student, bar worker, rent a secretary, student neurophysiologial technolgest (for...
Page 1 of 4     1  2  3  4  Next >>>

NOTE: Some of Dylan friends may not be listed above. If a members is not part of a School or College and has not selected to have an accessible homepage then they can be viewed by direct friends only.
            

HelpTermsPrivacySafety TipsTestimonialsContactAboutOur Blog   
©2006 Bebo.com, LLC
 

bebo10:152:1144437081149



You'll notice that the extra angle brackets around the "next page" link can screw up the program. If you could sort this out it would be great, because this thing will be copying a few thousand pages and it would save me a lot of labour! If not, it's OK.
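
One possible way to guard against stray brackets like the ">>>" in the "Next" link is to accept a name only when a "<" is followed by a ">" and the text between them looks like a username. The sketch below is only an illustration: the function name extract_name is made up, and it assumes usernames never contain spaces, which holds for the sample page above but has not been checked against the whole site.

```cpp
#include <string>

// Hypothetical helper (not part of the programs posted in this thread).
// Returns the first <name> on the line whose contents are non-empty and
// contain no whitespace or further '<'; returns "" if there is none.
std::string extract_name(const std::string& line)
{
    std::string::size_type open = line.find('<');
    while (open != std::string::npos)
    {
        const std::string::size_type close = line.find('>', open + 1);
        if (close == std::string::npos)
            break;                               // no closing bracket left

        const std::string name = line.substr(open + 1, close - open - 1);

        // Assumption: usernames never contain spaces, tabs, or '<'.
        if (!name.empty() && name.find_first_of(" \t<") == std::string::npos)
            return name;

        open = line.find('<', open + 1);         // try the next '<'
    }
    return "";
}
```

Called on a line such as "Laura Boland <laura-boland>" this should give laura-boland, while navigation lines such as "Page 1 of 4     1  2  3  4  Next >>>" should give an empty string and can be skipped.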

BTW, my mapping software (aiSee.com) needs two things to be defined for it to work. One is the nodes: each node, in this case, represents one person, and the code is as follows:

node: { title: "Username"}
I have to have a line like the one above defining every username. But only once! Duplicates throw up errors!

The second thing the mapping software needs is the source and target of each link, which in this case means that the two people are friends. This is the code format of the graph, and at this stage you know everything about my project!


graph:{

node: {title: "UserName1"} //Every UserName must be defined only once
node: {title: "UserName2"}
node: {title: "UserName3"}
node: {title: "UserName4"}

edge: {source: "UserName1" target: "UserName3"}  //Each link from one person to another
edge: {source: "UserName2" target: "UserName4"}  // (Overlapping in this case doesn't matter)
edge: {source: "UserName2" target: "UserName1"}

}



I don't mean to annoy you with extra work. If you like doing this sort of stuff for kicks and are interested in what I am doing, that's great. You will save me months of research! I'll PayPal the money over as soon as I figure out how.


noth(a)nk.you

Just made some time today to work on your project again.  I think it's near completion, I'm just having trouble on one final detail.

Right now, it takes source.txt and creates two text files: nodes.txt and edges.txt.  The only feature I wanted to code in (but couldn't figure out) was to take those two files and automatically put them together inside the graph:{} syntax as file graph.txt.  As it is now, this is something you'd have to do manually (but it should not be difficult).

Here's an example from your submission in the post above:

nodes.txt
node: {title: "dylan.cross"}


edges.txt
edge: {source: "dylan.cross" target: "gerbeenie"}
edge: {source: "dylan.cross" target: "laura-boland"}
edge: {source: "dylan.cross" target: "willconlon"}
edge: {source: "dylan.cross" target: "davkavo2"}
edge: {source: "dylan.cross" target: "sadge1"}
edge: {source: "dylan.cross" target: "greaneyc"}
edge: {source: "dylan.cross" target: "mr-sassy-pants"}
edge: {source: "dylan.cross" target: "eannaoshea"}
edge: {source: "dylan.cross" target: "micstallion"}
edge: {source: "dylan.cross" target: "niamhie21"}


I think that the comments in the code should be fairly clear on each individual piece, but here's a note on the usage.

At the top of main, you'll see the line: const string breaktxt = "bebo";.  This is the text that indicates a page break (it can easily be changed to anything else you find that works better), and a page break usually creates a new node.  The exception is when the new node would be identical to the last one, in which case the program simply continues making edges.

Give it a try on the data you have now, and let me know how it works.  If you have a specific problem, it might be helpful to have a larger example of what you'd be sending the program.

So, without further ado, here's the source:

Source
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

string getname( const string& line );

//©noth(a)nk.you, 2006

int main()
{
    const string breaktxt = "bebo";              //text that marks a page break

    const clock_t timer = clock();

    ifstream source( "source.txt" );             //source file
    ofstream nodes( "nodes.txt" ),               //node temp. file
             edges( "edges.txt" );               //edge temp. file

    if( !source || !nodes || !edges )            //checks all files opened
    {
         cerr << "File wasn't opened\n";
         return EXIT_FAILURE;
    }

    cout << "Processing..." << endl;

    string line, username, nodename;             //temporary variables

    bool findnode = true;                        //true: next name is a page owner

    while( getline( source, line ) )             //gets a line at a time
    {
        //look for the page break first: break lines usually carry no
        //<name>, so this check must come before the empty-name skip
        if ( line.find(breaktxt) != string::npos )
           findnode = true;

        username = getname( line );              //parses it

        if ( username.empty() )
           continue;

        if ( findnode )                          //first name on a page
        {
           findnode = false;                     //following names are friends
           if ( nodename == username )           //no node for same user,
              continue;                          //just keep making edges
           nodename = username;
           nodes << "node: {title: \"" << nodename << "\"}\n";
        }

        else                                     //write edge to file
           edges << "edge: {source: \"" << nodename << "\" target: \""
                 << username << "\"}\n";
    }

    //wrap up the files
    source.close(), nodes.close(), edges.close();

    cout << "Done!\nIt took "
         << double(clock() - timer)/CLOCKS_PER_SEC << "s" << endl;

    system("PAUSE");                             //Windows only
}

string getname( const string& line )             //returns the text inside the first <...>
{
       string temp;                              //will be the output
       bool writeflag = false;                   //true once a '<' has been seen
       for( string::size_type i = 0; i < line.length(); ++i )
       {
              if ( writeflag )                   //if we should be writing
              {
                   if ( line[i] == '>' )         //closing bracket:
                      return temp;               //stop writing
                   temp += line[i];              //append each character
              }
              else if( line[i] == '<' )          //opening bracket:
                   writeflag = true;             //start writing
       }
       return temp;                              //empty if the line had no '<'
}


Cheers!