Advice on manipulating a flat file
dluby:
Hi,
I have been given the 'pleasure' of masking data in a text file that is 160MB in size with 1.25M records. The format is (yes the delimiter is ¦¦):
SRCE_CUST_NO¦¦FIRST_NAME¦¦LAST_NAME¦¦ADDR_1¦¦ADDR_2¦¦ADDR_3¦¦ADDR_4¦¦ADDR_5¦¦ADDR_6¦¦ADDR_7¦¦POST_CDE¦¦DOB¦¦MARITAL_STA¦¦STAFF¦¦EMPLR_STA
Can anybody recommend the easiest way to mask/amend certain columns for all the records? For example, I'd like to replace the FIRST_NAME, LAST_NAME and address columns with dummy data (preferably with sequential numbering, but that's not essential).
So it would end up like this:
SRCE_CUST_NO¦¦FIRST_NAME¦¦LAST_NAME¦¦ADDR_1¦¦ADDR_2¦¦ADDR_3¦¦ADDR_4¦¦ADDR_5¦¦ADDR_6¦¦ADDR_7¦¦POST_CDE¦¦DOB¦¦MARITAL_STA¦¦STAFF¦¦EMPLR_STA
12343¦¦F_NAME1¦¦LAST_NAME1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦Address 1¦¦P_CODE1¦¦25/05/1967¦¦MARRIED¦¦NULL
12343¦¦F_NAME2¦¦LAST_NAME2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦Address 2¦¦P_CODE2¦¦02/08/1998¦¦SINGLE¦¦NULL
I tried using Excel 2010, but it can only load 1,048,576 rows.
Thanks
tomos:
CS-Calc claims to be able to work with 12 million rows, so it might be worth a try (if that's a possibility).
mouser:
A simple regex script (python, perl, etc.) would make quick work of it.
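As a rough illustration of the scripted approach, here is a minimal Python sketch. It uses plain string splitting rather than a regex, since the ¦¦ delimiter is fixed. It reads the file line by line, so the 160MB size should not be a problem. The function names, the dummy-value prefixes, the masking of POST_CDE and the default encoding are all assumptions for illustration. Check the real file's encoding first (a Windows export may well be cp1252 rather than UTF-8).

```python
DELIM = "¦¦"

# Assumed dummy-value prefixes per column name; the row number is appended.
MASKS = {
    "FIRST_NAME": "F_NAME",
    "LAST_NAME": "LAST_NAME",
    "POST_CDE": "P_CODE",
}
MASKS.update({f"ADDR_{i}": "Address " for i in range(1, 8)})


def mask_lines(lines):
    """Yield masked lines; the first line is treated as the header."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input
        return
    header = header.rstrip("\r\n")
    yield header
    # Map column position -> dummy prefix, looked up by header name.
    positions = {
        i: MASKS[name]
        for i, name in enumerate(header.split(DELIM))
        if name in MASKS
    }
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(DELIM)
        for i, prefix in positions.items():
            if i < len(fields):  # tolerate short rows
                fields[i] = f"{prefix}{n}"
        yield DELIM.join(fields)


def mask_file(src, dst, encoding="utf-8"):
    """Stream src to dst one line at a time, masking as we go."""
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        for out in mask_lines(fin):
            fout.write(out + "\n")
```

It could be called as mask_file("customers.txt", "customers_masked.txt", encoding="cp1252"), where both file names are hypothetical. Columns not listed in MASKS, such as SRCE_CUST_NO and DOB, pass through unchanged.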
dluby:
A colleague put together a SQL SSIS package to do this for me, so problem resolved. Thanks