topbanner_forum
  *

avatar image

Welcome, Guest. Please login or register.
Did you miss your activation email?

Login with username, password and session length
  • Thursday December 5, 2024, 1:50 pm
  • Proudly celebrating 15+ years online.
  • Donate now to become a lifetime supporting member of the site and get a non-expiring license key for all of our programs.
  • donate

Author Topic: delete rows with duplicate cells  (Read 5011 times)

kalos

  • Member
  • Joined in 2006
  • **
  • default avatar
  • Posts: 1,824
    • View Profile
    • Donate to Member
delete rows with duplicate cells
« on: November 04, 2013, 04:48 PM »
hello

in an excel file, I want to delete the rows that contain duplicate cells in the column H

is there any solution for this?

thanks!

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,964
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #1 on: November 04, 2013, 05:40 PM »
There seems to be a couple of options here:
http://duckduckgo.co...+cells+in+one+column
Tom

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 6,649
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #2 on: November 05, 2013, 06:59 AM »
Shorter answer (less reading)

Highlight that data you want to work with, and click the Remove Duplicates option on the Data tab of the ribbon (07 and up). Then select the column(s) you want to search for duplicates.

...I just discovered this option a while back ... Right after spending an entire day manually sorting down a 7,000 item inventory dump. :wallbash:

Target

  • Honorary Member
  • Joined in 2006
  • **
  • Posts: 1,832
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #3 on: November 05, 2013, 03:41 PM »
lots of answers to this one (use the inbuil functionality, or formulas and filtering, or a macro, or...) but it depends on what you want to do

the version of excel you're using is relevant as well

Highlight that data you want to work with, and click the Remove Duplicates option on the Data tab of the ribbon (07 and up). Then select the column(s) you want to search for duplicates.

worth noting that this only works in excel 2007 and above


kalos

  • Member
  • Joined in 2006
  • **
  • default avatar
  • Posts: 1,824
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #4 on: November 06, 2013, 04:39 AM »
unfortunately I don't have excel, I use libreoffice latest version

is there a solution for that software?

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,964
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #5 on: November 06, 2013, 04:49 AM »
unfortunately I don't have excel, I use libreoffice latest version

my reaction to this veered between want to cry and wanting to laugh, and other emotions I wont mention :-\

First rule of asking for help (I should know - I'm no expert, so I need to ask for help often), anyways, the first rule is:
give as much information as possible - make it easy for those trying to help.

BTW did you try a search for this new combination?
Tom

kalos

  • Member
  • Joined in 2006
  • **
  • default avatar
  • Posts: 1,824
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #6 on: November 06, 2013, 04:53 AM »
I tried a search without luck

maybe I should obtain an evaluation version of excel to do it?

Target

  • Honorary Member
  • Joined in 2006
  • **
  • Posts: 1,832
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #7 on: November 06, 2013, 05:46 PM »
I'm not familiar with the functionality in Libre office but it appears the following formula will work (in both Libre office AND excel)

    =countif($a$2:a3,a3)

enter this formula wherever you like, though I would suggest next to the range in question would be most appropriate.

You will need to modify the range '$a$2:A3' to cover the columns you want to test (note that the absolute reference must to point to the first cell in the range), and of course the reference value as well.

the formula will count the number of occurrences of the reference cell, then you can use the filter function to filter out all the single occurrences and delete the rest

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • **
  • Posts: 4,642
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #8 on: November 06, 2013, 07:34 PM »
Shorter answer (less reading)

Highlight that data you want to work with, and click the Remove Duplicates option on the Data tab of the ribbon (07 and up). Then select the column(s) you want to search for duplicates.

...I just discovered this option a while back ... Right after spending an entire day manually sorting down a 7,000 item inventory dump. :wallbash:

Yikes!

Well this prob and many not dis-similar ones can be tackled with "intermediate skillz"! "Not all these steps all apply, but in holistic theme they do".

1. So the offending Duplicate Column is Column H. So during all this, save version controlled copies so if you mess it up learning you can go back! Copy 1a sort by col H.
2. Create a new column, whether it is Col I or Col AA or whatever, and do something like "if (Cell to my left) = (cell to my left and up 1), then "dup", else "clear".

-- You can't quote literary and quote an excel command but that's close enough to give the gist.
3. So then hopefully the first of anything always goes through, and all the dups get tagged as Dup. So then you save 1a at that point. Then in a fresh new copy 1b or 2a or whatever, you SORT BY DUP. So ALL the dupes float to the top!
4. Delete the dupes. Even on my aging comp or the one at old work, fifteen seconds because after about ten scrolls it starts flying by! Save that copy.
5. Create a fresh new copy. 3a, or whatever. But Bump the Version Number. Because by now that should have totally nuked all the dupes from one column. You can repeat as needed for other columns. But that's the method.
6. Save a final "Production Copy" after you've fixed the usual junk that just happened to catch your eye, irrelevant to this process.

Hope that helps!

I'm not a total "expert", I don't script Excel much, but I came to believe in it after my last job and now I am suspicious of anything "manual"!

Bonus - Rules of thumb

A. The first three letters of RIS(C) (No Chip). Reduced Instruction Set Winz! Fatality! (Mortak Kombat I miss you 1995!) So even if you can't totally kill the problem, there's a darn good chance you can take the manual part down from say 23 actions X 1000 lines down to say 7 actions X 1000 lines.

B. An unbelievable incredible number of things can be done with Chained Logic. So (If you've earned the right!) tell your boss you need two hours to think, because if the manual version of something takes 4-7 days but a five column logic chain can do it in twelve hours, think about it! The secret is heuristic rules. "Precisely why am I doing this?" If it's just a total crap data set you're a bit stuck, but if the action is suspiciously familiar, then do junk like "Length-mid-characters 5-8;;if they are such and such, flag to just kill".

(Totally not valid Excel Syntax, but 15 min will give you the right one.) Point is, For ex, if your correct data HAS to be 1112223333 for say a phone number but it all comes in 111-222-3333, characters 4 and 8 "cannot be dashes", so set up your five-seven column logic chain to kill them.

Whew! Enough for now! Just be Werry Werry Leery of manual junk in Excel/clones! :)


Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 6,649
    • View Profile
    • Donate to Member
Re: delete rows with duplicate cells
« Reply #9 on: November 07, 2013, 06:49 AM »
If it's just a total crap data set you're a bit stuck

It was. I had to go through and normalize the data to clean up 20 years of previous sins, because we were/are migrating to a new BI suite that is much less tolerant of user input shenanigans. I streamlined as much as I could but there is not much point in spending the time to automate something that will only be done once, as any logic bugg will be missed until it's (in production) to late.

However, as a rule, I too hate anything manual as it flies in the face of the whole point of a computer ... Which is to eliminate mundane and repetitive tasks. :)