Advice on manipulating a flat file

dluby:
Hi,

I have been given the 'pleasure' of masking data in a text file that is 160MB in size and holds 1.25M records. The format is (yes, the delimiter is ¦¦):

SRCE_CUST_NO¦¦FIRST_NAME¦¦LAST_NAME¦¦ADDR_1¦¦ADDR_2¦¦ADDR_3¦¦ADDR_4¦¦ADDR_5¦¦ADDR_6¦¦ADDR_7¦¦POST_CDE¦¦DOB¦¦MARITAL_STA¦¦STAFF¦¦EMPLR_STA

Can anybody recommend the easiest way to mask or amend certain columns for all the records? For example, I'd like to replace the FIRST_NAME, LAST_NAME and address columns with dummy data (preferably with sequential numbering, but that's not essential).

So it would end up like this:

SRCE_CUST_NO¦¦FIRST_NAME¦¦LAST_NAME¦¦ADDR_1¦¦ADDR_2¦¦ADDR_3¦¦ADDR_4¦¦ADDR_5¦¦ADDR_6¦¦ADDR_7¦¦POST_CDE¦¦DOB¦¦MARITAL_STA¦¦STAFF¦¦EMPLR_STA
12343¦¦F_NAME1¦¦LAST_NAME1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦P_CODE1¦¦25/05/1967¦¦MARRIED¦¦NULL
12343¦¦F_NAME2¦¦LAST_NAME2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦P_CODE2¦¦02/08/1998¦¦SINGLE¦¦NULL

I tried using Excel 2010, but it can only load 1,048,576 rows.

Thanks

tomos:
CS-Calc claims to be able to work with 12 million rows, so it might be worth a try (if that's a possibility).

mouser:
A simple regex script (python, perl, etc.) would make quick work of it.
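
As a rough sketch of the scripted approach, something like the Python below might do. It uses plain string splitting on the ¦¦ delimiter rather than a regex, reads the file one line at a time so the 160MB size shouldn't matter, and numbers the dummy values sequentially. The column positions are taken from the header in the question; the function names, the file names in the usage note, and the UTF-8 encoding are assumptions and would need adjusting to the real file.

```python
DELIM = "¦¦"

# Zero-based column positions, read off the header line in the question.
FIRST_NAME = 1
LAST_NAME = 2
ADDR_COLS = range(3, 10)  # ADDR_1 .. ADDR_7


def mask_line(line, n):
    """Return one record with name and address fields replaced by dummy values."""
    fields = line.rstrip("\r\n").split(DELIM)
    if len(fields) <= max(ADDR_COLS):
        # Row is too short to contain all the columns; leave it untouched.
        return DELIM.join(fields)
    fields[FIRST_NAME] = f"F_NAME{n}"
    fields[LAST_NAME] = f"LAST_NAME{n}"
    for i in ADDR_COLS:
        fields[i] = f"Address {n}"
    return DELIM.join(fields)


def mask_file(src, dst, encoding="utf-8"):
    """Stream src to dst, copying the header and masking every later row.

    The encoding is an assumption; an export using a Windows code page
    would need e.g. encoding="cp1252" instead.
    """
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        for n, line in enumerate(fin):
            # n == 0 is the header row, which is copied unchanged.
            fout.write(line if n == 0 else mask_line(line, n) + "\n")
```

Calling `mask_file("customers.txt", "customers_masked.txt")` (both file names are placeholders) would write the masked copy next to the original. Other columns, such as POST_CDE, could be masked the same way by assigning to their positions in `mask_line`.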

dluby:
A colleague put together a SQL SSIS package to do this for me, so the problem is resolved. Thanks
