* ------------------------------------------------------------------- 
  File:   find_duplicate_records.sps
  Author: Bruce Weaver, weaverb@mcmaster.ca
  Date:   20-Dec-2001
  Notes:  Use MATCH FILES to check for double entry .
* ------------------------------------------------------------------- .

* Here's an example of how to find duplicate records using a method
* suggested by Johannes Hartig, one of the regular posters to the
* newsgroup comp.soft-sys.stat.spss.  You can find his post by searching
* for "SPSS query regarding duplicate cases" at http://groups.google.com.

DATA LIST LIST /ID (f2.0) a (f2.0) b(f2.0) c (f2.0) Y (f2.0) .
BEGIN DATA.
1 1 1 1 14
1 1 1 1 14
1 1 1 2 12
1 1 2 1 17
1 1 2 2 20
1 1 3 1 12
1 1 3 2 10
2 1 1 1 11
2 1 1 2 13
2 1 2 1 12
2 1 2 2 16
2 1 3 1 11
2 1 3 2 13
2 1 3 2 13
3 1 1 1 12
3 1 1 2 13
3 1 2 1 15
3 1 2 2 18
3 1 3 1 15
3 1 3 2 13
4 2 1 1 11
4 2 1 2 16
4 2 2 1 13
4 2 2 2 14
4 2 3 1 15
4 2 2 1 13
4 2 3 2 17
5 2 1 1 9
5 2 1 2 11
5 2 2 1 13
5 2 2 2 14
5 2 3 1 12
5 2 3 2 8
6 2 1 1 13
6 2 1 2 12
6 2 2 2 11
6 2 2 1 8
6 2 2 2 11
6 2 3 1 9
6 2 3 2 8
END DATA.

* Check for duplicate records by using MATCH FILES .
* First need to sort by ID etc.

sort cases by id a b c y .

MATCH FILES file=* /BY all / FIRST = firstrec .
EXEC .

val lab firstrec
  0 '0- Duplicate record'
  1 '1- First record'.
freq firstrec.

* There are 4 duplicate records .

* Delete the duplicates.

use all.
filter off.
select if firstrec.
exe.

freq firstrec.

* All duplicate records have been removed .

* ------------------------------------------------------------------- .
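
* ------------------------------------------------------------------- .
* Optional check, not part of the original method: a minimal sketch
* that counts how many times each combination of id a b c y occurs .
* Assumes SPSS 13 or later, where AGGREGATE accepts MODE=ADDVARIABLES;
* the variable name ncopies is made up for this illustration .
* Run after the deletion step, every case should show ncopies = 1;
* run before it, any value above 1 marks a duplicated record .

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id a b c y
  /ncopies = N .
freq ncopies.

* ------------------------------------------------------------------- .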