Excel - Delete matching cells

Asked By gary on 14-Jan-12 12:26 PM
Col A has 360,000 cells.
Col B has 240,000 cells.

A                                    B
0000000021957	0000000022002
0000000022002	0000000032002
0000000031957	0000000032003
0000000032002	0000000042002
0000000032003	0000000052002
0000000042002	0000000052003
0000000052002	0000000062002
0000000052003	0000000102002
0000000061967	0000000121996
0000000061968	0000000142002
0000000062002	0000000152002
0000000081963	0000000162002
0000000102002	0000000481994
0000000121996	0000000481995
0000000142002	0000000481996
0000000152002	0000000481997
0000000162002	0000000481998
0000000341991	0000000481999
0000000401961	0000000482000

How can I delete the cells in Col A whose contents match cells in Col
B?




Gord Dibben replied to gary on 14-Jan-12 02:03 PM
One method.

Insert a column left of Column A
Insert a row at top
Add titles in A1:C1..........will need for filtering


In A2 enter   =COUNTIF($C$2:$C$20,B2)<>0

D-click on fill handle to copy down to bottom of Column B

Select Columns A and B only

Data>Filter>Autofilter.

Filter for True on Column A

Select from A2 to bottom of Column B.

F5>Special>Visible Cells>OK

Edit>Clear>Contents

Remove Filter.

Select A and B then F5>Special>Blanks>OK

Edit>Delete>Shift Cells Up

Done


Gord
Gord Dibben replied to Gord Dibben on 14-Jan-12 02:19 PM
You posted in the Programming group so I guess VBA is in order.

Sub test()
For Each cell In Range("A1:A360000")
If WorksheetFunction.CountIf(Range("B1:B240000"), _
cell.Value) <> 0  Then
cell.ClearContents
End If
Next
End Sub

You can get rid of the blanks after running.


Gord
gary replied to Gord Dibben on 14-Jan-12 05:19 PM
=A0 =A0 =A0 =A0 =A0 =A0 =A0B

Hi Gord,

How long should your macro run?  (it is been running for more than 2
hours).  Is there any way to determine its progress?
Ron Rosenfeld replied to gary on 14-Jan-12 05:31 PM
Here is another macro, using the AdvancedFilter.
Please do this on a copy of your data.

You will need to set ws to the proper worksheet.  I used Sheet2.

=============================
Option Explicit
Sub PruneColA()
Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim c As Range
Dim rCrit As Range
Dim i As Long
Dim v As Variant
Set ws = Worksheets("Sheet2")
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rCrit = .UsedRange.Resize(rowsize:=2, columnsize:=1).Offset _
(columnoffset:=.UsedRange.Columns.Count + 2)
End With

Application.ScreenUpdating = False

rCrit(1).ClearContents
rCrit(2) = "=countif(" & rColB.Address & "," & rColA(2).Address(False, False) & ")>0"

With rColA
.AdvancedFilter Action:=xlFilterInPlace, criteriarange:=rCrit
End With

rCrit.EntireColumn.Delete

On Error Resume Next
rColA.Offset(rowoffset:=1).Resize(rowsize:=rColA.Rows.Count - 1) _
.SpecialCells(xlCellTypeVisible).ClearContents
On Error GoTo 0

i = 0
ReDim v(1 To WorksheetFunction.CountA(rColA))
For Each c In rColA
c.EntireRow.RowHeight = 15
If c.Value <> "" Then
i = i + 1
v(i) = c.Text
End If
Next c

rColA.ClearContents
Set rColA = rColA.Resize(rowsize:=UBound(v))
rColA = WorksheetFunction.Transpose(v)

Application.ScreenUpdating = True

End Sub
===================================
Gord Dibben replied to gary on 14-Jan-12 06:01 PM
I have no idea how long it should take but 2 hours+ sounds a little
much.

I ran it on about 100 rows which is somewhat smaller than the range
you have.  Took a second.

See Ron's macro..........proably much faster than a loop.

Did you try the manual method?


Gord
gary replied to Ron Rosenfeld on 14-Jan-12 06:15 PM
te:
=A0 =A0 =A0 =A0 =A0 =A0 =A0B
=3D=3D=3D=3D=3D
xlUp))
xlUp))
fset _
, False) & ")>0"
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D- Hide quoted text -

How long should your macro run?
Is there any way to determine its progress?
gary replied to Gord Dibben on 14-Jan-12 07:12 PM
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0B
l
-

In your =3DCount formula, I changed $C$20 to $C$239820  (which is the
number of cells in my spreadsheet)

When copying that formula down to the bottom of Col B, it is already
taken 30 minutes and "Calculating" is at 4%..






.
gary replied to Ron Rosenfeld on 14-Jan-12 10:33 PM
te:
=A0 =A0 =A0 =A0 =A0 =A0 =A0B
=3D=3D=3D=3D=3D
xlUp))
xlUp))
fset _
, False) & ")>0"
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D- Hide quoted text -


Your macro (using the Advanced Filter) is getting Run-time Error
'1004'  of "AdvancedFilter method of Range clsss failed".

Note: I have set ws =3D Worksheets("Sheet1")
GS replied to gary on 14-Jan-12 11:19 PM
One way...

Option Explicit

Sub StripDupes()
Dim i&, j&, lRows1&, lRows2&, lLastRow& 'as long
Dim vRng As Variant
lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
If lRows1 > lRows2 Then lLastRow = lRows1 Else lLastRow = lRows2
vRng = Range("A1:B" & lLastRow)
For i = UBound(vRng) To 1 Step -1
If Not vRng(i, 1) = "" Then
For j = 1 To UBound(vRng)
If vRng(i, 1) = vRng(j, 2) Then
Cells(i, 1).Delete shift:=xlUp: Exit For
End If
Next 'j
End If
Next 'i
End Sub

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
GS replied to GS on 14-Jan-12 11:22 PM
You can speed it up by turning ScreenUpdating off...

Application.ScreenUpdating = False
Application.ScreenUpdating = True

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
Ron Rosenfeld replied to gary on 14-Jan-12 11:52 PM
It is not possible to say how long the macro would take to run as that is dependent not only on the code, but also on the characteristics of your particular machine and environment.

I rewrote the macro in a manner which should significantly improve the speed.  The re-written macro, run against the data that you provided which I copied and repeated down to 600,000 rows, took about five minutes to run.

However, I only disabled screen updating as there is nothing else on that worksheet of mine.  If you have calculations which refer to the cells that are being altered, event triggered macros, and other calls on the resources, that can slow things down.  All of those things can be disabled if they are an issue, but for now let us see if we cannot get something running.

The way it has been rewritten, there is no way to determine its progress.  The implementation of AdvancedFilter is an Excel feature, not VBA.  Although it should execute much more quickly than looping through the rows of cells, I do not know of any way to monitor its progress.

However, there are issues with speed, function, and the AdvancedFilter in Excel especially when dealing with large data bases.  I have developed a method which I think should run MUCH more quickly, but it is hard to test on a small database.  It does assume that there are no duplicates in Column A, or if there are that you only want to display unique values.  Is that a valid assumption?

The method needs a little refinement but with your database duplicated down to about 500,000 rows, it runs in less than five seconds.  Of course, there are only seven entries in ColA that do not appear in Colb.  I have not idea how it would run with a different data set.  But try it and let me know.

Also, with this method, it would be possible to keep track of where it is.

Note the comment at the beginning about setting a reference.  It will NOT run if that reference is not set.  If this will be distributed, we can use late binding, but not tonight.

===================================
Option Explicit
Sub PruneColA()
'Requires setting reference (tools/references) to
'  Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim d As Variant

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = Worksheets("Sheet2")
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) + 1 To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) + 1 To UBound(vColA, 1)
If Not dColB.Exists(Key:=vColA(i, 1)) Then
With dColA
If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1)
End With
End If
Next i

ReDim vColA(1 To dColA.Count)
i = 0
For Each d In dColA
i = i + 1
vColA(i) = dColA(d)
Next d

rColA.Offset(rowoffset:=1).ClearContents
Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffset:=1)
rColA = WorksheetFunction.Transpose(vColA)

End Sub
+++++++++++++++++++++++++++++++
gary replied to Ron Rosenfeld on 15-Jan-12 12:32 AM
te:
dependent not only on the code, but also on the characteristics of your pa=
rticular machine and environment.
eed. =A0The re-written macro, run against the data that you provided which =
I copied and repeated down to 600,000 rows, took about five minutes to run.
worksheet of mine. =A0If you have calculations which refer to the cells th=
at are being altered, event triggered macros, and other calls on the resour=
ces, that can slow things down. =A0All of those things can be disabled if t=
hey are an issue, but for now let us see if we cannot get something running.
=A0The implementation of AdvancedFilter is an Excel feature, not VBA. =A0A=
lthough it should execute much more quickly than looping through the rows o=
f cells, I do not know of any way to monitor its progress.
Excel especially when dealing with large data bases. =A0I have developed a=
method which I think should run MUCH more quickly, but it is hard to test =
on a small database. =A0It does assume that there are no duplicates in Colu=
mn A, or if there are that you only want to display unique values. =A0Is th=
at a valid assumption?
wn to about 500,000 rows, it runs in less than five seconds. =A0Of course, =
there are only seven entries in ColA that do not appear in Colb. =A0I have =
not idea how it would run with a different data set. =A0But try it and let =
me know.
.
OT run if that reference is not set. =A0If this will be distributed, we can =
use late binding, but not tonight.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
xlUp))
xlUp))
(i, 1), Item:=3DvColB(i, 1)
=3DvColA(i, 1), Item:=3DvColA(i, 1)


Run-time Error '13'
Type Mismatch in

rColA =3D WorksheetFunction.Transpose(vColA)
gary replied to Ron Rosenfeld on 15-Jan-12 01:08 AM
te:
dependent not only on the code, but also on the characteristics of your pa=
rticular machine and environment.
eed. =A0The re-written macro, run against the data that you provided which =
I copied and repeated down to 600,000 rows, took about five minutes to run.
worksheet of mine. =A0If you have calculations which refer to the cells th=
at are being altered, event triggered macros, and other calls on the resour=
ces, that can slow things down. =A0All of those things can be disabled if t=
hey are an issue, but for now let us see if we cannot get something running.
=A0The implementation of AdvancedFilter is an Excel feature, not VBA. =A0A=
lthough it should execute much more quickly than looping through the rows o=
f cells, I do not know of any way to monitor its progress.
Excel especially when dealing with large data bases. =A0I have developed a=
method which I think should run MUCH more quickly, but it is hard to test =
on a small database. =A0It does assume that there are no duplicates in Colu=
mn A, or if there are that you only want to display unique values. =A0Is th=
at a valid assumption?
wn to about 500,000 rows, it runs in less than five seconds. =A0Of course, =
there are only seven entries in ColA that do not appear in Colb. =A0I have =
not idea how it would run with a different data set. =A0But try it and let =
me know.
.
OT run if that reference is not set. =A0If this will be distributed, we can =
use late binding, but not tonight.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
xlUp))
xlUp))
(i, 1), Item:=3DvColB(i, 1)
=3DvColA(i, 1), Item:=3DvColA(i, 1)

I have deleted the duplicates in col A and in Col B.

When I re-ran your macro, I still got the Run-time Error '13'
Type Mismatch in

rColA =3D WorksheetFunction.Transpose(vColA)
Ron Rosenfeld replied to gary on 15-Jan-12 07:39 AM
The presence or absence of duplicates is irrelevant to this error.  I cannot be sure, but in older versions of Excel (prior to 2003) there were limits in the size of an array that you could use worksheetfunction.transpose.  I thought it had been removed, but perhaps not.

How long did the macro run before hitting that error?  Because that is at the end and the next step would be quick.

Try this variation where transposing is not required:

==================================
Option Explicit
Sub PruneColA()
'Requires setting reference (tools/references) to
'  Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim d As Variant

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = Worksheets("Sheet2")
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) + 1 To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) + 1 To UBound(vColA, 1)
If Not dColB.Exists(Key:=vColA(i, 1)) Then
With dColA
If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1)
End With
End If
Next i

ReDim vColA(1 To dColA.Count, 1 To 1)
i = 0
For Each d In dColA
i = i + 1
vColA(i, 1) = dColA(d)
Next d

rColA.Offset(rowoffset:=1).ClearContents
Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffset:=1)

rColA = vColA

End Sub
===================================
Ron Rosenfeld replied to gary on 15-Jan-12 07:42 AM
As I wrote, previously, there are problems with AdvancedFilter in very large databases in certain environments.  This may or may not have been resolved in Excel 2010

Also see my latest posting from today.
Ron Rosenfeld replied to gary on 15-Jan-12 07:54 AM
Gary,

If the last version I provided, which does NOT use TRANSPOSE, does not work, we will have to transfer the data to ColA using a loop.  It may be that we are running into one of the limitations of VBA and Excel.
Ron Rosenfeld replied to Ron Rosenfeld on 15-Jan-12 08:19 AM
Well, in Excel 2007 I just made up a test case of 500,000 entries in col a and col b.  There were about 238,000 in col a that were not in col b and the last version worked fine and executed in about 1 minute.
Ron Rosenfeld replied to Ron Rosenfeld on 15-Jan-12 08:43 AM
Empirically testing this theory in my Excel 2007 reveals that worksheetfunction.transpose works OK with 2^16 elements, but returns the type mismatch error with 2^17 elements, so that is probably why you ran into that error.  Avoiding transpose, as I did, should fix it (unless we run into a different limit).
gary replied to Ron Rosenfeld on 15-Jan-12 09:34 AM
were limits in the size of an array that you could use worksheetfunction.tr=
anspose. =A0I thought it had been removed, but perhaps not.
nction.transpose works OK with 2^16 elements, but returns the type mismatch=
error with 2^17 elements, so that is probably why you ran into that error.=
=A0Avoiding transpose, as I did, should fix it (unless we run into a diffe=
rent limit).

Hi Ron,

I ran without transpose and it finished in about 1 minute!

In Col A, a cell contains 0507811951990
That value is not in Col B.
Your macro is displaying it as 5.07811E+11
How can I get it to be displayed as 0507811951990    ?
gary replied to Ron Rosenfeld on 15-Jan-12 10:44 AM
Hi Ron,

My spreadsheet has:

A                                B
0000000021957	0000000022002
0000000022002	0000000032002
0000000031957	0000000032003
0000000032002	0000000042002
0000000032003	0000000052002
0000000042002	0000000052003
0000000052002	0000000062002
0000000052003	0000000102002
0000000061967	0000000121996
0000000061968	0000000142002
0000000062002	0000000152002
0000000081963	0000000162002

Your macro (without Transpose) returns this:

0000000021957
22002
31957
61967
61968
81963
341991
401961
431978
482010
482011

In my spreadsheet:
A2 contains 0000000022002
B1 contains 0000000022002

But your macro results contains 22002
Why are the leading zeroes being dropped?
Ron Rosenfeld replied to gary on 15-Jan-12 02:03 PM
I am glad to hear that the macro is working and not taking hours :-)
The leading zero's are being dropped because Excel is trying to be helpful and interpreting the data as numeric.  We have two choices to change this and retain the speed:
We can format the column as text.
We can custom format the column to "0000000000000"  (thirteen zero's)
The latter retains the numeric characteristics; the former does not, but some Excel functions will still interpret this as a number.  The choice is yours.

Here is how to modify the code to provide for that.  Note the lines near the bottom.

===================================
Option Explicit
Sub PruneColA()
'Requires setting reference (tools/references) to
'  Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim d As Variant

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) + 1 To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) + 1 To UBound(vColA, 1)
If Not dColB.Exists(Key:=vColA(i, 1)) Then
With dColA
If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1)
End With
End If
Next i

ReDim vColA(1 To dColA.Count, 1 To 1)
i = 0
For Each d In dColA
i = i + 1
vColA(i, 1) = dColA(d)
Next d

rColA.Offset(rowoffset:=1).ClearContents
Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffset:=1)

'UNcomment one or the other of the next two lines depending on your preference
'rColA.EntireColumn.NumberFormat = "0000000000000"
rColA.EntireColumn.NumberFormat = "@"

rColA = vColA

End Sub
=======================================
Ron Rosenfeld replied to gary on 15-Jan-12 02:06 PM
Same answer as previous:  format column as text or custom format "0000000000000"
GS replied to GS on 15-Jan-12 02:20 PM
You can further optimize performance as follows...

Option Explicit

Sub StripDupes()
Dim i&, j&, lRows1&, lRows2&, lLastRow&, lCalcMode& 'as long
Dim vRng As Variant, bEventsEnabled As Boolean

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
If lRows1 > lRows2 Then lLastRow = lRows1 Else lLastRow = lRows2
vRng = Range("A1:B" & lLastRow)

With Application
lCalcMode = .Calculation: .Calculation = xlCalculationManual
bEventsEnabled = .EnableEvents: .EnableEvents = False
.ScreenUpdating = False
End With

For i = UBound(vRng) To 1 Step -1
If Not vRng(i, 1) = "" Then
For j = 1 To UBound(vRng)
If vRng(i, 1) = vRng(j, 2) Then
Cells(i, 1).Delete shift:=xlUp: Exit For
End If
Next 'j
End If
Next 'i

'Cleanup
With Application
.Calculation = lCalcMode
.EnableEvents = bEventsEnabled
.ScreenUpdating = True
End With
End Sub

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
GS replied to Ron Rosenfeld on 15-Jan-12 02:39 PM
on 1/15/2012, Ron Rosenfeld supposed :

Ron,
I'd appreciate feedback on using my (3rd) posted code on your test
data! I extracted the concept of using the array approach from an app I
have for filtering out rows of data from a data logging output file.
This requires at least xl12 to work due to the amount of data being
just under 1GB. I believe the limit on array size is 2GB but since my
app uses its own instance of Excel there is nothing else running in its
memory space.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
Ron Rosenfeld replied to GS on 15-Jan-12 03:30 PM
Gary,

When I interrupted it it had been running for 219 seconds.  At that point in time it had eliminated 617 entries from the column A list.

I then started up my "dictionary" routine.  It ran for about 58.6 seconds and eliminated the remaining 260,493 duplicated entries.

To set up the sample data, I enter a formula like:

A1 & B1:  =text(randbetween(1,10^6),"0000000000000")
Fill down to row 500,000.
Then copy/Paste Values

For timing I use the HiRes timer.

I initially tried an approach like yours:
Examine each cell
If the data is invalid, delete the cell and rearrange the rest (delete xlshiftup)

After some thought, I decided it should be faster to
Get all the good data into a sequential array.
Delete ALL the original data
Write back the good data array.

The approach I used, using the dictionary, works pretty fast.  it is disadvantage is that if duplicates in the original data should be retained, it would have to be modified.  (i.e. if there are multiple 0000000123456's in column A, and none of that value in Column B, and the multiple values all need to be retained in column A; and they need to be retained in their original order).  Fortunately, that is not the case.

And if I had Excel 2010, the Advanced Filter might work.  I would filter/copy; then delete the original and write back the copy.  That would work even with duplicates.  But it will not work in Excel 2007 with this data base (and seems to run  slower even with smaller databases).
gary replied to Ron Rosenfeld on 15-Jan-12 03:28 PM
te:
=A0 =A0 =A0 =A0B
l and interpreting the data as numeric. =A0We have two choices to change th=
is and retain the speed:
s)
some Excel functions will still interpret this as a number. =A0The choice i=
s yours.
the bottom.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
xlUp))
xlUp))
(i, 1), Item:=3DvColB(i, 1)
=3DvColA(i, 1), Item:=3DvColA(i, 1)
rence
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D- Hide quoted text -

I am using:

rColA.EntireColumn.NumberFormat =3D "0000000000000"
'rColA.EntireColumn.NumberFormat =3D "@"

Because the result still contains 0000000022002 (which is in Col B)
and this makes the result suspect.
gary replied to Ron Rosenfeld on 15-Jan-12 03:41 PM
nt in time it had eliminated 617 entries from the column A list.
ds and eliminated the remaining 260,493 duplicated entries.
lete xlshiftup)
advantage is that if duplicates in the original data should be retained, it=
would have to be modified. =A0(i.e. if there are multiple 0000000123456's =
in column A, and none of that value in Column B, and the multiple values al=
l need to be retained in column A; and they need to be retained in their or=
iginal order). =A0Fortunately, that is not the case.
r/copy; then delete the original and write back the copy. =A0That would wor=
k even with duplicates. =A0But it will not work in Excel 2007 with this data b=
ase (and seems to run =A0slower even with smaller databases).

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D

I am using:

rColA.EntireColumn.NumberFormat =3D "0000000000000"
'rColA.EntireColumn.NumberFormat =3D "@"

But the results have:

0000000021957 in A1 (but that value is NOT in Col B)
0000000022002 in A2 (but that value IS in Col B).

and this makes the results suspect.
Jim Cone replied to Ron Rosenfeld on 15-Jan-12 04:31 PM
Ron,
For what it is worth...
Since, a variant filled with an array of Range.Values is one based and a Dictionary object is one
based,
the two code lines below should probably omit the "+ 1".

For i = LBound(vColB, 1) + 1 To UBound(vColB, 1)
For i = LBound(vColA, 1) + 1 To UBound(vColA, 1)
'---
Jim Cone
Portland, Oregon USA
http://www.mediafire.com/PrimitiveSoftware
(XL Companion add-in:  compares, matches, counts, lists, finds, deletes...)
gary replied to Jim Cone on 15-Jan-12 05:28 PM
From the responses and their results, I think it'd be best to re-state
my OP:

I need a list of the values in Col A that are NOT found in Col B.

My spreadsheet contains:

A                              B
0000000021957	0000000022002
0000000022002	0000000032002
0000000031957	0000000032003
0000000032002	0000000042002
0000000032003	0000000052002
0000000042002	0000000052003
0000000052002	0000000062002
0000000052003	0000000102002
0000000061967	0000000121996
0000000061968	0000000142002
0000000062002	0000000152002
0000000081963	0000000162002
0000000102002	0000000481994
0000000121996	0000000481995
0000000142002	0000000481996
0000000152002	0000000481997
0000000162002	0000000481998
0000000341991	0000000481999
0000000401961	0000000482000
0000000431978	0000000482002
0000000481994	0000000482008
0000000481995	0000000482009
0000000481996	0000000631995
0000000481997	0000000631996
0000000481998	0000000631997
0000000481999	0000000631998
0000000482000	0000000631999
0000000482002	0000000632000
0000000482008	0000000632001
0000000482009	0000000642000
0000000482010	0000000681994
0000000482011	0000000681995
0000000491959	0000000681996
0000000511958	0000000681997
0000000591982	0000000681998
0000000591983	0000000911997
0000000591984	0000000962001
0000000591990	0000000962003
0000000591991	0000001001997
0000000611962	0000001082006
0000000631993	0000001381994
0000000631995	0000001381995
0000000631996	0000002001994
0000000631997	0000002122007
0000000631998	0000002291995
0000000631999	0000002291996
0000000632000	0000002291997
0000000632001	0000002601999
0000000641984	0000002602000
0000000642000	0000002641998
0000000661957	0000002731994
0000000681994	0000003031994
0000000681995	0000003161994
0000000681996	0000003161995
0000000681997	0000003161996
0000000681998	0000003161997
0000000691959	0000003392009
0000000751990	0000003901998
0000000811961	0000004062006
0000000811991	0000004091994
0000000811992	0000004091995
0000000811993	0000004131998
0000000821959	0000004231998
0000000851958	0000004371995
0000000881990	0000004521995
0000000911997	0000004522000
0000000951959	0000004541997
0000000962001	0000004542000
0000000962003	0000004542001
0000001001997	0000005001998
0000001031957	0000005002002
0000001082006	0000005121997
0000001121970	0000005181994
0000001121973	0000005181998
0000001121974	0000005381995
0000001121975	0000005381996
0000001121976	0000005471994
0000001181960	0000005471995
0000001191952	0000005471996
0000001311961	0000005581996
0000001341959	0000005622000
0000001381994	0000005622001
0000001381995	0000005622002
0000001411981	0000005971994
0000001411982	0000006202004
0000001411983	0000006491995
0000001411991	0000006511994
0000001411992	0000006511996
0000001421956	0000006571994
0000001451982	0000006571995
0000001471982	0000006571996
0000001541990	0000007291994
0000001561957	0000007291996
0000001631971	0000007291998
0000001631972	0000007321994
0000001681959	0000007341997
0000001711991	0000007341998
0000001781953	0000007432003
0000001871955	0000007751995
0000001881952	0000007931996
0000001881955	0000008071994
0000001881959	0000008071995
0000001921986	0000008071997
0000001951958	0000008432000
0000001981958	0000008432001
0000002001954	0000008432002
0000002001955	0000008631995
0000002001994	0000008631996
0000002011957	0000008662000
0000002031958	0000008681998
0000002101957	0000008702003
0000002122007	0000008702004
0000002191957	0000008702005
0000002191963	0000008702006
0000002211979	0000008771994
0000002211980	0000008771995
0000002241960	0000008771996
0000002251958	0000008771997
0000002271976	0000008771998
0000002281981	0000008772000
0000002281982	0000008772003
0000002291982	0000008961994
0000002291983	0000008991996
0000002291984	0000008992005
0000002291985	0000008992006
0000002291986	0000009061997
0000002291987	0000009061998
0000002291993	0000009062006
0000002291995	0000009062007
0000002291996	0000009062009
0000002291997	0000009171995
0000002331955	0000009171997
0000002331961	0000009172003
0000002381990	0000009221994
0000002391960	0000009221996
0000002411958	0000009361995
0000002411967	0000009362003
0000002421958	0000009362004
0000002461981	0000009401994
0000002461982	0000009581998
0000002461983	0000009691994
0000002531986	0000009691995
0000002571990	0000009691996
0000002571991	0000009821996
0000002591977	0000009831996
0000002601963	0000009831997
0000002601999	0000009971994
0000002602000	0000009971995
0000002641998	0000009971996
0000002691990	0000009971997
0000002691991	0000009971998
0000002711961	0000009971999
0000002731990	0000009972000
0000002731992	0000009972001
0000002731993	0000009972002
0000002731994	0000009972003
0000002741956	0000009972004
0000002741990	0000009972005
0000002871977	0000009972006
0000002891956	0000010001998
0000002921961	0000010011994
0000002971956	0000010011995
0000002971991	0000010321994
0000002981959	0000010361997
0000003001989	0000010411996
0000003001990	0000010411997
0000003021990	0000010411998
0000003031994	0000010411999
GS replied to Ron Rosenfeld on 15-Jan-12 05:32 PM
Ron Rosenfeld wrote on 1/15/2012 :

Thanks, Ron.
I suspected it would take a long time since it writes the worksheet in
the loop. My source code does everything in memory using the arrays and
sett matches to an empty string. Note that the source data is read n
from a data logger output file, NOT from a worksheet. I just dump the
result back into a blank sheet. I cannot use the dictionary because I
need to preserve duplicates. The match criteria is an ambient
temperature value in a line of text at a specific position and so if
it is not '>=' then I reset the array element to an empty string, then
use the Filter() function to dump the resulting data into a worksheet.
Since there is only 1 array to loop once only the process is really
fast.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
GS replied to Ron Rosenfeld on 15-Jan-12 07:11 PM
Your code took 46 secs to run your sample data on my machine. That's
amazing! I am curious, now, how it performs if we did similar using a
Collection so we can have duplicates in ColA!

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
Jim Cone replied to Jim Cone on 15-Jan-12 07:47 PM
Ron,
Actually,  I misspoke, the Dictionary is zero based, but the recommendation stands.
Also, the two lines near the end should read...
rColA.ClearContents
Set rColA = rColA.Resize(dColA.Count, 1)
'---
Jim Cone
GS replied to GS on 15-Jan-12 07:58 PM
GS expressed precisely :

I modified my sub to use your idea to use Dictionary, but NOT put colA
in a dictionary and it shaved 11 secs off the ET...

Sub StripDupes()
Dim i&, j&, lRows1&, lRows2& lMatchesFound& 'as long
Dim vRngA, vRngB, vRngOut() 'as variant
Dim dRngB As Dictionary

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)

Set dRngB = New Dictionary
Debug.Print Now()
For j = LBound(vRngB) To UBound(vRngB)
With dRngB
If Not .Exists(Key:=vRngB(j, 1)) Then _
.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
End With
Next 'j
sTemp = Mid$(sTemp, 2)
'  Debug.Print Now()

For i = LBound(vRngA) To UBound(vRngA)
If dRngB.Exists(Key:=vRngA(i, 1)) Then _
vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
Next 'i
'  Debug.Print Now()

j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 1 To 1)
For i = LBound(vRngA) To UBound(vRngA)
If Not vRngA(i, 1) = "" Then
vRngOut(j, 1) = vRngA(i, 1): j = j + 1
End If
Next 'i
'  Debug.Print Now()

Range("A1:A" & lRows1).ClearContents
Range("A1").Resize(UBound(vRngOut), 1) = vRngOut

Debug.Print Now()
End Sub

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
GS replied to GS on 15-Jan-12 08:32 PM
ET on my machine was 35 secs as per timing method used as shown. I
did not think this task deserved the trouble to setup and use
cHiResTimer class.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc