Author Topic: IDEA: DFT to GEDCOM converter  (Read 22758 times)

agentsteal

  • Honorary Member
  • Joined in 2007
  • **
  • Posts: 75
IDEA: DFT to GEDCOM converter
« on: December 16, 2007, 02:08 AM »
A GEDCOM is a file that contains family tree information. GEDCOM files can be loaded into most genealogy software.
Here is the GEDCOM format:
http://en.wikipedia..../wiki/GEDCOM#Example

DFTCOM2 is a Java program that converts a GEDCOM into a data file and shows the family tree in an applet.
http://www.dftcom2.co.uk/
Here is the DFT data file format:
http://www.dftcom2.co.uk/web/dftdbs.zip

Between 2003 and just last month, there have been many posts on the DFTCOM2 forum from people who have lost their original GEDCOM files and need a program to convert their DFTs to GEDCOM. However, the DFTCOM2 project was discontinued and no one ever made the converter.

This is a fairly simple project; it is mostly parsing information from one format to another and adding relationships between individuals. I was almost able to make the converter myself, but the numbers that determine the relationships are confusing. However, completing this project wouldn't take more than a day and would help a lot of people recover their family trees.

ChalkTrauma

  • Honorary Member
  • Joined in 2007
  • **
  • Posts: 116
  • ::41554D::
    • DreamCycle Studios
Re: IDEA: DFT to GEDCOM converter
« Reply #1 on: December 16, 2007, 08:11 PM »
Hrmm sounds like an interesting little project. I've spent a good deal of my career writing parsers and hacking data formats.. Can you give me info on what you learned? I have some familiarity with the GEDCOM format already..
'Behold! It is not over unknown seas but back over well-known years that your quest must go; back to the bright strange things of infancy and the quick sun-drenched glimpses of magic that old scenes brought to wide young eyes.'

agentsteal

« Reply #2 on: December 16, 2007, 09:37 PM »
Wow that would be great if we could finally get this done  ;D

I have the Java source code for the DFTCOM2 applet viewer. It loads the information from the DFT data file into arrays. I have figured out how to read most of the information from the arrays, although some of it isn't split the way GEDCOM needs (e.g., the date of birth is mixed with the place of birth), and the variables that link family members are pretty confusing.

If you know Java, we could probably edit the DFTCOM2 program to parse the information from the loaded data file and save each individual and family in the GEDCOM format. I can send you a copy of the source code if you want to take a look at it.

Thanks,
agentsteal

ChalkTrauma

« Reply #3 on: December 16, 2007, 10:11 PM »
I already started to look at the applet source by using jad and I'm starting to mess around with reading the example file you posted although I'm doing it in C++. I do a lot of work in Java, but I'm happier in C++ because I find Java way too limiting, maybe it is because I'm forced to use it at work  :D.. If I write something I'd prefer to do a drag and drop WTL app in C++ so people can just drag a bunch of DFT files and drop them on the GUI and have it just spit out the same file names with GEDCOM extensions.. Who knows, let me play around a bit and get familiar with the data structures and then get back to you. I'd like to get my feet wet a bit before I jump in  ;)

ChalkTrauma

« Reply #4 on: December 18, 2007, 10:15 AM »
Well.. I have the framework coded up and in a command line app, and so far I have it reading all the data in the file and storing it in internal data containers, so the next step will be to generate the GEDCOM file from the data, but I think I should have something "testable" soon..

agentsteal

« Reply #5 on: December 18, 2007, 03:53 PM »
That's great thanks!  ;D

agentsteal

« Reply #6 on: December 24, 2007, 07:14 AM »
Hey, how's it going? Is the program almost ready?

ChalkTrauma

« Reply #7 on: December 26, 2007, 03:12 PM »
Things are going pretty well. I ran into a little snag with high ASCII characters in text strings, but I have seen this kind of thing before: when I used to work a lot with JSP, people would cut and paste from MS Word and the curly apostrophe char (0x92) would get used instead of the plain ASCII apostrophe (0x27), so I have to work around that issue. I have not had time to work on it lately with the holiday, and I've had an intense cold kicking my butt for the last week, but I will be back to it soon and I'm committed to finishing and having a working prototype for you to test with at least by the first week of 2008 :)
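
For readers hitting the same snag, here is a minimal sketch of one possible workaround. It is not the converter's actual code. It assumes the text is stored in a single-byte Windows-1252 style encoding, and the class name `HighAsciiCleaner` and the exact set of replaced bytes are made up for illustration.

```java
class HighAsciiCleaner {
    // Replaces Windows-1252 "smart" quotes with their plain ASCII counterparts.
    // Assumption: the input is single-byte text, not UTF-8 (where 0x92 can be
    // part of a multi-byte sequence and must not be touched).
    static byte[] toPlainAscii(byte[] raw) {
        byte[] out = raw.clone();
        for (int i = 0; i < out.length; i++) {
            switch (out[i] & 0xFF) {
                case 0x91: // left curly single quote
                case 0x92: // right curly single quote (the Word apostrophe)
                    out[i] = 0x27; // plain apostrophe
                    break;
                case 0x93: // left curly double quote
                case 0x94: // right curly double quote
                    out[i] = 0x22; // plain double quote
                    break;
                default:
                    break; // every other byte is left as it is
            }
        }
        return out;
    }
}
```

Working on bytes keeps the sketch independent of which character sets a given Java runtime ships with.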

ChalkTrauma

« Reply #8 on: January 02, 2008, 11:49 AM »
Things are going pretty well now, except with the cold ( it is still here ), but I'm not sure how all of the data is linked. Did you learn anything about that from the java code?

agentsteal

« Reply #9 on: January 03, 2008, 02:32 PM »
As I said in my first post, the variables that link the individuals are the part I couldn't figure out.

I'm looking at the Java source code... maybe this could help?
Code: Java
    public void loadAncestors(int i, int j)
    {
        if (j >= 1)
        {
            nodeList[treeNode].prefix = 0;
            nodeList[treeNode].indiRef = i;   // individual shown at this tree node
            nodeList[treeNode].repaint();
            treeNode++;
            int k = j / 2;
            // famc is used as an index into mData; husb and wife index back into pData
            loadAncestors(mData[pData[i].famc].husb, k);
            loadAncestors(mData[pData[i].famc].wife, k);
        }
    }

Thanks again for working on this project

ChalkTrauma

« Reply #10 on: January 03, 2008, 04:17 PM »
Thanks.. I didn't know how much you had figured out, figured I would pick your brain. That little section is pretty helpful. I thought that is how it was being done, where the famc is an index into the family array, but I wasn't getting it to work. I must be off somewhere, I'll have to go back and check how I am setting up the indexes. I just ported over the function to translate the birth/death info string, which had some tricky bits to it. The part where he is representing the month in base 16 was interesting.. I'm getting there.. :)

ChalkTrauma

« Reply #11 on: January 08, 2008, 04:25 PM »
The DFT to GEDCOM converter is pretty much done, the command line version anyway. It supports conversion from dta files or dftdbs.zip files, so there is no need to extract the dta files from the zip archives if you don't want to. I did find some problems. Like most compilers, the DFT compiler loses some data in the compile process: it only supports a limited set of GEDCOM tags and ignores the rest, so that data is gone. It also combines some data, like death/birth/baptized/married info, into one string. I have done my best to parse it apart by matching the date string with a regular expression that you can override from the command line. I get it right most of the time, but I'm sure a few oddball entries will get past. It also looks like the DFT compiler ignores multiple NOTE entries in individuals and just drops the data.

An example of a string with birth/death/baptism/marriage data in it looks like this:

b: 11 JUL 1866|c: 5 AUG 1866, Gringley on the Hill, Notts|d: 1 APR 1956, Sherwood Hospital, Nottingham - i: Beckingham, Notts

and in GEDCOM it needs to look like this:

1 BIRT
2 DATE 11 JUL 1866
1 CHR
2 DATE 5 AUG 1866
2 PLAC Gringley on the Hill, Notts
1 DEAT
2 DATE 1 APR 1956
2 PLAC Sherwood Hospital, Nottingham
1 BURI
2 PLAC Beckingham, Notts

This is a well behaved string, but not all strings conform, so some hand editing of the GEDCOM file might have to occur in the situations where the app can't figure it out.
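
A rough sketch of how the well-behaved case above could be taken apart. This is not the converter's actual code. The class `EventStringParser`, the default date pattern, and the rule that segments are separated by `|` or by ` - ` in front of the next prefix are assumptions read off the one sample string. Only the four prefixes seen in that sample (b, c, d, i) are mapped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EventStringParser {
    // Prefix letters taken from the sample string; other prefixes are not handled here.
    private static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("b", "BIRT");
        TAGS.put("c", "CHR");
        TAGS.put("d", "DEAT");
        TAGS.put("i", "BURI");
    }

    // Assumed default date shape: optional day, 3-letter month, 4-digit year.
    static final Pattern DEFAULT_DATE = Pattern.compile("(?:\\d{1,2} )?[A-Z]{3} \\d{4}");
    private static final Pattern SEGMENT = Pattern.compile("([a-z]): (.*)");

    static List<String> toGedcom(String events, Pattern datePattern) {
        List<String> lines = new ArrayList<>();
        // Split on '|' or on " - " when it is followed by another "x: " prefix.
        for (String segment : events.split("\\|| - (?=[a-z]: )")) {
            Matcher seg = SEGMENT.matcher(segment.trim());
            if (!seg.matches() || !TAGS.containsKey(seg.group(1))) {
                continue; // unknown segment, left for hand editing
            }
            lines.add("1 " + TAGS.get(seg.group(1)));
            String rest = seg.group(2).trim();
            Matcher date = datePattern.matcher(rest);
            if (date.lookingAt()) { // a date, if present, comes first
                lines.add("2 DATE " + date.group());
                rest = rest.substring(date.end()).replaceFirst("^,\\s*", "");
            }
            if (!rest.isEmpty()) { // anything that is not a date is a place
                lines.add("2 PLAC " + rest);
            }
        }
        return lines;
    }
}
```

Calling `EventStringParser.toGedcom(sample, EventStringParser.DEFAULT_DATE)` on the sample string yields the ten GEDCOM lines listed above. Passing a different `Pattern` plays the role of the command line override.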

You can also try to help it by modifying how dates are matched by specifying a date format string on the command line in regular expression format, because anything that is not a date is a place. The problem really happens with strings like this:

@info Housewife|b: FEBRUARY 22, 1922, Cork, Ireland|d: MAY 1967, New York USA

The first date is a problem because of the comma inside the date. Everywhere else, the comma separates date and place.
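
To illustrate the kind of date pattern that copes with the comma, here is a hedged sketch. The class `DatePlaceSplitter` and the pattern `US_AWARE_DATE` are invented for this example and are not the tool's real default. The alternatives are ordered from most to least specific so that "FEBRUARY 22, 1922" is consumed whole before the place is split off.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatePlaceSplitter {
    // Hypothetical pattern; month names are assumed to be upper case as in the samples.
    static final Pattern US_AWARE_DATE = Pattern.compile(
        "\\d{1,2} [A-Z]{3,9} \\d{4}"        // 11 JUL 1866
        + "|[A-Z]{3,9} \\d{1,2}, \\d{4}"    // FEBRUARY 22, 1922
        + "|[A-Z]{3,9} \\d{4}"              // MAY 1967
        + "|\\d{4}");                       // 1922

    // Returns {date, place}; either part may be empty.
    static String[] split(String text, Pattern datePattern) {
        Matcher m = datePattern.matcher(text);
        if (!m.lookingAt()) {
            return new String[] {"", text}; // no leading date: everything is place
        }
        String place = text.substring(m.end()).replaceFirst("^,\\s*", "");
        return new String[] {m.group(), place};
    }
}
```

A place that happens to start with an upper case word followed by four digits would still be misread as a date, so some hand editing remains likely.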

My sample set is pretty small too.. I have the file you pointed me to, and a GEDCOM file I compiled and then de-compiled to test, so once people start using the program chances are there will be things I will need to fix here and there.

I need to test a few more things, but it looks pretty good for posting a link for you tomorrow so you can download the command line tool and test it out.

agentsteal

« Reply #12 on: January 08, 2008, 07:54 PM »
That's great! Thank you so much for making this program.

Quote: I thought that is how it was being done, where the famc is an index into the family array, but I wasn't getting it to work. I must be off somewhere, I'll have to go back and check how I am setting up the indexes.
So did you figure out how to do this?

Quote: The part where he is representing the month in base 16 was interesting..
What do you mean base 16?

Quote: The DFT compiler only supports a limited set of GEDCOM tags and the rest of the data it just ignores, so that data is gone.
Really? Like which tags?

Quote: The DFT compiler also combines some data like death/birth/baptized/married info into one string and I have done my best to parse it apart, by trying to match the date string with a regular expression that you can override from the command line, I get it right most of the time, but I'm sure a few oddball entries will get past.
Yea when I was looking at the source code I noticed that some of the information gets combined. A regex match would probably work without too many errors unless someone was deliberately trying to mess with the program.

Quote: My sample set is pretty small too.. I have the file you pointed me to, and a gedcom file I compiled and then de-compiled to test, so once people start using the program chances are there will be things I will need to fix here and there.
If you want I could send you the 19 DFTs I'm trying to convert. I don't have the original GEDCOM files though...

ChalkTrauma

« Reply #13 on: January 08, 2008, 09:07 PM »
Quote: That's great! Thank you so much for making this program.
No problem.. I learned a lot doing it, that was the idea :)

Quote: So did you figure out how to do this?
Yes, basically the dta file data is stored in order of how it is indexed, so as long as you read it in and store the individual and family position, the references between individual and family work out correctly.
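
A small sketch of that idea, assuming nothing about the real dta layout beyond "position in the file equals index". The record classes, the `@I…@`/`@F…@` id scheme, and the use of -1 for "no parent family" are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class PositionalLinks {
    // Hypothetical minimal records; the real DFT structures hold more fields.
    static class Person {
        final String name;
        final int famc; // index of the family this person is a child of, -1 if none
        Person(String name, int famc) { this.name = name; this.famc = famc; }
    }

    static class Family {
        final int husb; // index into the person list
        final int wife; // index into the person list
        Family(int husb, int wife) { this.husb = husb; this.wife = wife; }
    }

    // Cross-reference ids are derived from the position each record was read at.
    static String indiXref(int index) { return "@I" + index + "@"; }
    static String famXref(int index) { return "@F" + index + "@"; }

    static List<String> individualRecord(List<Person> people, int i) {
        Person p = people.get(i);
        List<String> lines = new ArrayList<>();
        lines.add("0 " + indiXref(i) + " INDI");
        lines.add("1 NAME " + p.name);
        if (p.famc >= 0) {
            lines.add("1 FAMC " + famXref(p.famc));
        }
        return lines;
    }

    static List<String> familyRecord(List<Family> families, int f) {
        Family fam = families.get(f);
        List<String> lines = new ArrayList<>();
        lines.add("0 " + famXref(f) + " FAM");
        lines.add("1 HUSB " + indiXref(fam.husb));
        lines.add("1 WIFE " + indiXref(fam.wife));
        return lines;
    }
}
```

Because the ids come straight from list positions, records have to be appended in exactly the order they appear in the file, with no skipping.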

Quote: What do you mean base 16?
Decimal is base 10 (our natural number system), so it has 10 unique digit characters. Computers are binary (base 2, on or off), and base 16 (hexadecimal) is binary shorthand with 16 digit characters: 0-9 and A-F. He wanted to represent a month using only one character, so he stored it in base 16, which gives 16 single-character values, more than enough for 12 months. To convert back, you could use something like the strtol C runtime function with a radix of 16 to get the original number:

http://en.wikipedia.org/wiki/Hexadecimal
http://www.cplusplus.../cstdlib/strtol.html
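
In Java the counterpart of strtol with radix 16 is `Integer.parseInt(s, 16)`. The sketch below assumes the stored digit runs from 1 (January) to C (December). That numbering is a guess for illustration, since the thread does not say whether the format counts from 0 or 1, and the class name `HexMonth` is made up.

```java
class HexMonth {
    private static final String[] MONTHS = {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    };

    // Turns a single base-16 digit into a GEDCOM month abbreviation.
    // Assumption: 1 = JAN ... C = DEC.
    static String monthName(char digit) {
        int month = Integer.parseInt(String.valueOf(digit), 16);
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("not a month digit: " + digit);
        }
        return MONTHS[month - 1];
    }
}
```

For example, `HexMonth.monthName('7')` gives JUL and `HexMonth.monthName('B')` gives NOV under that assumption.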

Quote: Really? Like which tags?

This is from the help file that comes with the compiler:
The Dynamic Family Tree Compiler uses the following subset of the records defined by The GEDCOM Standard - Release 5.5 (Electronic Version).

Unless explicitly listed below, all other records are ignored.

INDIVIDUAL RECORD

0   @XREF:INDI@ INDI

1   NAME <NAME PERSONAL>
      2   NPFX <NAME_PIECE_PREFIX>
      2   GIVN <NAME_PIECE_GIVEN>
      2   NICK <NAME_PIECE_NICKNAME>
      2   SPFX <NAME_PIECE_SURNAME_PREFIX>
      2   SURN <NAME_PIECE_SURNAME>
      2   NSFX <NAME_PIECE_SUFFIX>

1   SEX <SEX_VALUE>

1   BIRT
      2   DATE <DATE_VALUE>
      2   PLAC <PLAC_VALUE>

1   BAPM
      2   DATE <DATE_VALUE>
      2   PLAC <PLAC_VALUE>

1   CHR
      2   DATE <DATE_VALUE>
      2   PLAC <PLAC_VALUE>

1   DEAT
      2   DATE <DATE_VALUE>
      2   PLAC <PLAC_VALUE>

1   BURI
      2   DATE <DATE_VALUE>
      2   PLAC <PLAC_VALUE>

1   OCCU <OCCUPATION>

1   TITL <NOBILITY_TYPE_TITLE>

1   FAMC @<XREF:FAM>@
1   FAMS @<XREF:FAM>@

1   NOTE @<XREF:NOTE>@

1   NOTE [<SUBMITTER_TEXT> | <NULL>]
      2   [CONC | CONT] <SUBMITTER_TEXT>

1   OBJE
      2   FORM <MULTIMEDIA FORMAT>
      2   TITL <DESCRIPTIVE TITLE>
      2   FILE <FILE REFERENCE>


FAMILY RECORD

0   @XREF:FAM@ FAM

1   MARR [Y | <NULL>]
      2   DATE <DATE_VALUE>
      2   PLAC <PLAC_VALUE>

1   HUSB @<XREF:INDI>@
1   WIFE @<XREF:INDI>@
1   CHIL @<XREF:INDI>@


NOTE RECORD

0   @XREF:NOTE@ NOTE <SUBMITTER_TEXT>
1   [CONC | CONT] <SUBMITTER_TEXT>



Quote: A regex match would probably work without too many errors unless someone was deliberately trying to mess with the program.

The problem is there can be either a date or a place, or both. Trying to figure out which is which is the tough part. Date seems to always come before place, never the other way around, and they are always separated by commas; but commas can be in dates. It is definitely not an exact science :)

Quote: If you want I could send you the 19 DFTs I'm trying to convert. I don't have the original GEDCOM files though...

Since the app is pretty much ready, I'll just PM you a link to the download and you can process the files and let me know what does and does not work.. Probably send you the link before I knock off for the night.

agentsteal

« Reply #14 on: January 09, 2008, 09:52 AM »
Wow, amazing job, this program looks so cool! I haven't really gotten a chance to test it yet... tonight I will try converting some files and comparing them to the original GEDCOMs. I tried converting a couple of my DFT files and it seems to have worked... except why is Found Families less than Families? Anyway, thanks again, the program looks great!

ChalkTrauma

« Reply #15 on: January 09, 2008, 09:59 AM »
Quote: why is Found Families less than Families?

I noticed this also, every file seems to have a blank family which needs to be read so the index is correct, but it has no values. I don't completely understand why it is there..
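
One way to picture the difference between the two counts, as a sketch only. The `Family` class, the use of -1 for an unset spouse, and the notion that "found" means "has at least one member" are assumptions for illustration, not the tool's real logic.

```java
import java.util.ArrayList;
import java.util.List;

class FamilyEmitter {
    // Hypothetical family record; -1 marks an unset husband or wife.
    static class Family {
        final int husb;
        final int wife;
        final List<Integer> children;
        Family(int husb, int wife, List<Integer> children) {
            this.husb = husb;
            this.wife = wife;
            this.children = children;
        }
        boolean isBlank() {
            return husb < 0 && wife < 0 && children.isEmpty();
        }
    }

    // Every family is read and keeps its slot so later indexes stay correct;
    // only the non-blank ones are reported as worth writing out.
    static List<Integer> indexesToWrite(List<Family> families) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < families.size(); i++) {
            if (!families.get(i).isBlank()) {
                keep.add(i); // original index is preserved, not renumbered
            }
        }
        return keep;
    }
}
```

With one blank family in the list, `families.size()` is one larger than `indexesToWrite(families).size()`, which would match a "Families" count that exceeds "Found Families" by one.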

agentsteal

« Reply #16 on: January 09, 2008, 10:30 AM »
I think I found a problem... notes aren't working with any of my DFTs.

--edit: Actually a lot of stuff isn't converting... I don't think names are converting either.
« Last Edit: January 09, 2008, 10:33 AM by agentsteal »

ChalkTrauma

« Reply #17 on: January 09, 2008, 10:41 AM »
Can you send me the problem DFT file?

agentsteal

« Reply #18 on: January 09, 2008, 09:17 PM »
Okay, I just loaded a converted GEDCOM into Family Tree Maker and it worked, but it had thousands of errors, mostly these three:

Tag: DATE, subordinated to wrong item, line ignored.

Tag: DATE, invalid date: (±) 1786, line ignored.

Tag: CHIL, adding this individual as a child will exceed the maximum of 99 parents for an individual, parents are ignored.

ChalkTrauma

« Reply #19 on: June 08, 2009, 09:35 AM »
Thanks to a gentle nudge from Mouser  :D I'm posting the link to where I have this app posted:

http://www.dreamcycle.net/dft2ged/

Here is a little info on what it does:

DFT2GED is a simple command line utility that takes a DFT database and converts it back to GEDCOM format. It will read both DTA files and the zip file archives the DftCom2 compiler creates. The DFT files do not contain all of the original GEDCOM information: like many other compilers, DftCom2 loses data in the process. It only supports a limited set of GEDCOM tags and ignores the rest. It also combines some date and place information, making it very hard to programmatically parse apart. I have done my best to have the program do most of the work, but some hand editing of the GEDCOM output may be necessary.


If you find yourself in the situation where you have a bunch of DFT data and need to get it back in GEDCOM format, this little app may do the trick..

Dellji

  • Participant
  • Joined in 2015
  • *
  • Posts: 1
« Reply #20 on: January 19, 2015, 06:55 AM »
Could the person that had the source code for DFT recompile or modify it so it works with the current Java & browsers?