* ------------------------------------------------------------------- 
File:		find_duplicate_records.sps
Author:   	Bruce Weaver, weaverb@mcmaster.ca
Date:		20-Dec-2001
Notes:	Use MATCH FILES to check for double entry .
* ------------------------------------------------------------------- .

* Here's an example of how to find duplicate records using a method
* suggested by Johannes Hartig, one of the regular posters to the
* newsgroup comp.soft-sys.stat.spss.  You can find his post by searching
* for "SPSS query regarding duplicate cases" at http://groups.google.com.
* Here a record counts as a duplicate only if it matches an earlier
* record on every variable (ID, a, b, c and Y).

DATA LIST LIST /ID (f2.0) a (f2.0) b (f2.0) c (f2.0) Y (f2.0) .
BEGIN DATA.
1 1 1 1 14
1 1 1 1 14
1 1 1 2 12
1 1 2 1 17
1 1 2 2 20
1 1 3 1 12
1 1 3 2 10
2 1 1 1 11
2 1 1 2 13
2 1 2 1 12
2 1 2 2 16
2 1 3 1 11
2 1 3 2 13
2 1 3 2 13
3 1 1 1 12
3 1 1 2 13
3 1 2 1 15
3 1 2 2 18
3 1 3 1 15
3 1 3 2 13
4 2 1 1 11
4 2 1 2 16
4 2 2 1 13
4 2 2 2 14
4 2 3 1 15
4 2 2 1 13
4 2 3 2 17
5 2 1 1  9
5 2 1 2 11
5 2 2 1 13
5 2 2 2 14
5 2 3 1 12
5 2 3 2  8
6 2 1 1 13
6 2 1 2 12
6 2 2 2 11
6 2 2 1  8
6 2 2 2 11
6 2 3 1  9
6 2 3 2  8
END DATA.

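* Check for duplicate records by using MATCH FILES .
* MATCH FILES with a BY subcommand requires the active file to be
* sorted on the BY variables, so first sort by all five variables.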

sort cases by id a b c y .

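* With every variable in the BY list, FIRST= sets firstrec to 1 for the
* first case in each group of identical records and to 0 for every
* later copy of that record.
MATCH FILES file=*
	/BY id a b c y / FIRST = firstrec .
EXEC .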
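
* Optional cross-check, not part of the original method: AGGREGATE can
* attach to every case the number of copies of its record.  The name
* ncopies is an arbitrary choice, and MODE=ADDVARIABLES is not
* available in some older SPSS releases.  Records entered twice get
* ncopies = 2.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id a b c y
  /ncopies=N.
freq ncopies.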

val lab firstrec
 0 '0- Duplicate record'
 1 '1- First record'.
freq firstrec.

* The table shows 4 duplicate records (firstrec = 0) among the 40 cases .
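
* Optional: before deleting, list the duplicates for inspection.
* TEMPORARY limits the SELECT IF to the next procedure (LIST), so no
* cases are dropped here.
TEMPORARY.
SELECT IF (firstrec = 0).
LIST id a b c y.
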
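* Delete the duplicates.  USE ALL and FILTER OFF clear any case
* restrictions left over from earlier syntax; SELECT IF then keeps only
* the first copy of each record (firstrec = 1) and permanently drops the rest.

use all.
filter off.
select if firstrec.
exe.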

freq firstrec.

* Only firstrec = 1 remains: all 4 duplicate records have been removed,
* leaving 36 records .

* ------------------------------------------------------------------- .