Author Topic: MwImporter php script - Batch import html site into MediaWiki - v2.0 - 5/26/10  (Read 28878 times)

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 40,896
    • View Profile
    • Mouser's Software Zone on DonationCoder.com
    • Read more about this member.
    • Donate to Member
Official web page here: https://www.donation...wimporter/index.html

JavaJones and I have been working on an Open Source php script that aids in batch converting and importing an entire directory of html files and images into a MediaWiki site.

Useful if you want to convert your static page site into MediaWiki.

This builds on a number of existing tools, including HTML WikiConverter perl scripts, and the php importing tools that come with MediaWiki.

What it adds is a bunch of nice helper functions that facilitate massaging the html prior to conversion, and wiki text post conversion, handling filename clashes, relative links between pages, and the handling of recursive directories of both static pages and images, using php classes that are easy to extend.

We will be posting a release soon for anyone who might find this useful, though i have my doubts as to whether that set of people isn't empty (let me know if not!).



NOTE: I should add that really this is a generic set of php classes for "converting/importing" a recursive directory of files from one format into another, which includes derived classes specifically for converting and importing from html files into a MediaWiki site; but the base classes could serve as a useful starting point for anyone who wants php code to recursively discover and batch process/convert a directory of files from any format to any other format, with helper functions for handling commandline options, temp files, file matching patterns, etc.
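To make the "generic base class plus derived converter" idea concrete, here is a minimal sketch. It is written in Python purely for illustration (MwImporter itself is php), and the names `BatchConverter`, `discover`, `convert`, `run`, and `UppercaseHtmlConverter` are hypothetical, not taken from the MwImporter source.

```python
import fnmatch
import os


class BatchConverter:
    """Hypothetical base class: walks a directory tree and converts matching files."""

    patterns = ["*"]  # filename patterns a subclass wants to handle

    def discover(self, root):
        """Yield paths (relative to root) of every matching file, recursively."""
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # in-place sort makes the traversal order deterministic
            for name in sorted(filenames):
                if any(fnmatch.fnmatch(name, p) for p in self.patterns):
                    yield os.path.relpath(os.path.join(dirpath, name), root)

    def convert(self, text):
        """Override in a subclass to turn source text into the target format."""
        raise NotImplementedError

    def run(self, root):
        """Convert every discovered file; return {relative path: converted text}."""
        results = {}
        for rel in self.discover(root):
            with open(os.path.join(root, rel), encoding="utf-8") as handle:
                results[rel] = self.convert(handle.read())
        return results


class UppercaseHtmlConverter(BatchConverter):
    """Toy subclass standing in for a real html-to-wikitext converter."""

    patterns = ["*.html", "*.htm"]

    def convert(self, text):
        return text.upper()
```

Calling `UppercaseHtmlConverter().run("/path/to/site")` would return the converted text keyed by relative path; a real subclass would replace `convert` with the html-to-wikitext step and add the import into the wiki.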



DOWNLOAD v2.0 (5/26/10):
https://www.donation...orter/MwImporter.zip

LICENSE: Open Source

AUDIENCE: This code is intended for experienced users who aren't afraid of getting their hands dirty; if you are expecting a super friendly idiot-proof tool you need to look elsewhere (bearing in mind there is nothing else at the current time that will do this stuff).
« Last Edit: June 02, 2010, 05:23 AM by mouser »

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
    • View Profile
    • Donate to Member
Re: MwImporter scripts - Batch import an html site into MediaWiki
« Reply #1 on: May 12, 2010, 07:17 PM »
Don't forget it can import image maps, too (targeting the MW imagemap plugin). Of course not many people use image maps these days I suppose... :D

- Oshyan

mouser

Ok finished up adding the features i wanted to add -- you can now import a large deep directory of pages and images, and it will import all of them properly, creating unique names when the file and page titles are not globally unique (necessary due to the flatness of a wiki), and fixing up all links between pages.
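For anyone wondering what "creating unique names" can look like when a deep directory tree is flattened into a single wiki namespace, here is one possible approach, sketched in Python for illustration. It is not MwImporter's actual algorithm: the function name `assign_unique_titles` and the "prepend parent directories, then fall back to a number" strategy are assumptions.

```python
import os


def assign_unique_titles(relative_paths):
    """Map each relative file path to a page title that is unique across the tree."""
    used = set()
    titles = {}
    # Shallow files are handled first so they get the shortest titles.
    normalized = sorted(
        (p.replace("\\", "/") for p in relative_paths),
        key=lambda p: (p.count("/"), p),
    )
    for rel in normalized:
        parts = rel.split("/")
        stem = os.path.splitext(parts[-1])[0]
        candidate = stem
        depth = 1
        # On a clash, prepend parent directory names one level at a time.
        while candidate.lower() in used and depth < len(parts):
            depth += 1
            candidate = " ".join(parts[-depth:-1] + [stem])
        # Last resort: append a counter.
        base, counter = candidate, 2
        while candidate.lower() in used:
            candidate = f"{base} ({counter})"
            counter += 1
        # Conservative choice: titles differing only by case count as clashing.
        used.add(candidate.lower())
        titles[rel] = candidate
    return titles
```

For the paths `index.html`, `docs/index.html`, `docs/setup.html` and `docs/api/index.html` this gives the titles `index`, `docs index`, `setup` and `api index`. The same mapping could then drive the link fix-up pass, since every old relative path now has exactly one new title.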

Hopefully this will be useful for someone who is interested in migrating a site from static to wiki format.  I will upload soon.

mouser

i have uploaded a first version.. i don't anticipate much use for this, but it's there for those who want to try it.

RayOfLight

  • Participant
  • Joined in 2010
  • *
  • Posts: 3
    • View Profile
    • Donate to Member
This will be very useful for us. We will be adding .chm files to our wiki after converting them to html.
I'll try to figure out how it works and then test it...crossing my fingers :D

I'm reading up on it and already have some questions.
First one: what does --mw_dircat_sepdir do?
It's not (yet) in the help file.

Also, because we're not running the wiki on our own server, I'm searching for a way to set everything up in a way that I can ask our webhost to run the necessary command to start it...if that is at all possible...
« Last Edit: August 06, 2010, 06:07 AM by RayOfLight »

mouser

I think i need to upload the latest version with some recent changes.

Here are the new readme lines for mw_dircat_sepdir and others:

--mw_category CATEGORYSTR : adds [[CATEGORY CATEGORYSTR]] tag to EVERY page
--mw_dircat_top STR: if specified then this STR is used as the first part of any directory-based category below
--mw_dircat_subdir: if specified, adds a [[CATEGORY something]] where something is the most recent subdirectory as the category (or $mw_dircat_top if at top)
--mw_dircat_fullpath: if specified, adds the full path like [[CATEGORY a.b.c]] for subdirectory depth
--mw_dircat_sepdir : if specified then adds a separate [[CATEGORY a]] [[CATEGORY b]] like string for each subdirectory as a separate category
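Here is one reading of how the three directory-based switches differ, sketched in Python for illustration. The function `directory_categories` and its `mode` argument are hypothetical. Two assumptions are baked in: the readme's `[[CATEGORY x]]` notation is taken as shorthand for the standard MediaWiki tag `[[Category:x]]`, and the `--mw_dircat_top` string is treated as a leading path component in every mode.

```python
def directory_categories(rel_dir, mode, top=""):
    """Return category tags for a page found in rel_dir (e.g. "a/b/c").

    mode is "subdir", "fullpath" or "sepdir", mirroring the three
    --mw_dircat_* switches; top mirrors --mw_dircat_top.
    """
    parts = [p for p in rel_dir.replace("\\", "/").split("/") if p and p != "."]
    if top:
        # Assumption: the top string acts as the first path component.
        parts = [top] + parts
    if not parts:
        return []
    if mode == "subdir":
        names = [parts[-1]]        # only the innermost directory
    elif mode == "fullpath":
        names = [".".join(parts)]  # one category for the whole path
    elif mode == "sepdir":
        names = parts              # one category per directory level
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [f"[[Category:{name}]]" for name in names]
```

Under these assumptions, a page in `a/b/c` gets `[[Category:c]]` with "subdir", `[[Category:a.b.c]]` with "fullpath", and three separate tags (`a`, `b`, `c`) with "sepdir".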

i will upload latest version over the weekend.

JavaJones

Very glad to see someone has a use for this! Hopefully once we publicize it more widely (and document it more fully), it will be of even broader interest.

RayOfLight, your use sounds very similar to what this was originally developed for. I was converting from Dr. Explain based help files to a wiki to allow for collaborative editing. And for that it worked pretty well, though I ultimately had to trim down the import quite a lot for other reasons. But the system is capable of quite complex (and customizable) import if desired, including image maps.

- Oshyan

RayOfLight

I'm still working on a way to set this up so my webhost can do it with only one command.
It's one of many things on the todo-list for our freshly started wiki...it's more work than we anticipated...

mouser

i suggest that you set up a version on your local pc and test it there.

doesn't matter if it's windows or linux or whatever, setup mediawiki and php, etc on your local machine, and then you can experiment with importing on your local machine at your leisure.

in fact, if you do that, then once you get the pages imported successfully you might be able to export the sql database from your local pc and import it into the server version of your wiki without running the script on your server, maybe.

RayOfLight

That might be an idea too...maybe it's best if I look into that...it's been a while since I set up local php and so on.

steppin

  • Participant
  • Joined in 2010
  • *
  • default avatar
  • Posts: 2
    • View Profile
    • Donate to Member
Hi mouser - I'm interested in running your "batch html site into MediaWiki" script on my web server, but I am running into difficulties. In August you suggested to RayOfLight that he'd be better off running it on his home computer. Is there a technical reason why this script would not run on a commercially hosted web server (Yahoo, in my case)? When I attempt to run it, it seems unable to find the necessary MW files, specifically commandLine.inc in the /maintenance directory, even though the mwdir variable is pointed directly to the right place. Is there something obvious I'm missing about why precisely doing this on a web server is not going to be possible? Thanks in advance for your thoughts; would love to get this working for converting a bunch of my genealogy web pages.

mouser

if your web server lets you run the php files from a commandline then it should work.
you say it can't "find" the necessary MW files.. can you clarify if you are sure it can't find them.. or if maybe the host is not letting them run?

steppin

Hi Mouser - thanks for the reply. Let me rephrase my initial question. Is it possible to modify this script so that I do not need to use the commandline to do so? My web service provider does not (as far as I can tell) allow commandline access. So I want to run it just from clicking on the script or pointing to the url where the script is found. I've been trying to modify where the script gets the information it needs, so that it does not need any input from the commandline's initial command. But maybe there's some obvious reason I'm missing that just makes this impossible. The exact error message I received was: FATAL ERROR: Could not find mediawiki file (please specify -mwdir= option): http://www.MYHOST.ne...ance/commandLine.inc. But that URL is accurately where that file is stored. Thanks for any thoughts.

mouser

Is it possible to modify this script so that I do not need to use the commandline to do so? My web service provider does not (as far as I can tell) allow commandline access. So I want to run it just from clicking on the script or pointing to the url where the script is found.


i think you are onto the problem.. and your question is the right one..

it's been a while since i worked on mwimporter but my memory is that 1) i always intended to make it so you could do this 2) at some point i realized it wasn't going to be so simple to do.  but i can't remember if #2 is really true or why i have the vague memory of thinking it was going to be tricky.  it seems like it should be doable.

JavaJones

I'd like to have a non-commandline version of this available at some point. Perhaps after UQ mouser? :D

- Oshyan

nminh09

  • Participant
  • Joined in 2011
  • *
  • default avatar
  • Posts: 3
    • View Profile
    • Donate to Member
Hi all,
Please help me, step by step, to use the MWImporter tool. I have a directory of html files that link to each other, and I want to convert and import them into my MediaWiki.
Thanks so much
NGUYEN MINH

nminh09

P/S: mail E-mail: xxxxxxxxxxx

[email removed because we don't want you to end up getting spammed by automated email scrapers that scour the internet just looking for emails -- mouser]
« Last Edit: August 21, 2011, 11:17 PM by mouser »

mouser

Hi nguyen,

Sorry for the delay in responding.  It's been a while since I've used mwimporter, so I'm not going to be able to help too much.  Have you managed to get it working a little bit? Where are you stuck?

nminh09

Thanks Mouser for responding. I tried to run the MWImporter tool after installing the perl cpan modules (I've read README.txt), but it doesn't respond at all (it just pauses), and I don't know why.
I have one big directory of html files, and it is very hard for me to convert it into my wiki. I need some help. Can you show me clearly how to use MWImporter? Thanks a lot.
Have a nice day.

mouser
