File Handling

Site hosted by Angelfire.com: Build your free website today!

Data Structures And Number Systems
© Copyright Brian Brown, 1984-2000. All rights reserved.

Part 4

FILE HANDLING
Often, data which is processed by a program must remain stored after the program terminates. This allows later retrieval, as well as providing a permanent record. Data cannot be stored in the memory of the computer system, as this is volatile and is lost when the computer is turned off.

Permanent storage is implemented using magnetic media such as floppy and hard disk systems. The binary data is encoded into magnetic patterns which alter the magnetic state of ferrite particles on the disk surface. A special drive and controller unit provides the interface mechanism which allows the transfer of data between the disk and the computer systems.

Associated data items stored on a disk system is known as a file. A file consists of many data items, normally grouped into blocks called records. The main components of a file are,

 
                          +-----------+ 
                          |   FILE    | 
                          +-----+-----+ 
                                | 
                          +-----+-----+ 
                          | RECORDS  *| 
                          +-----+-----+ 
                                | 
                          +-----+-----+ 
                          |  ITEMS   *| 
                          +-----------+

A data item is an individual field of a record, ie, an integer, character, floating point number etc. Data items are collected together to form records. We have already introduced records in the previous section. All the records used by the program are collected together to form a file.

There are several different methods used to format the data stored on magnetic media.

one item after the other with no gaps,

compact, little wasted space
hard to tell where next item starts
slow access times, must read all previous items in order


	+--+-+----+-+--+----+-+--+-+------+-+     +--+-+----+-+ 
	|  | |    | |  |    | |  | |      | |     |  | |    | | 
	+--+-+----+-+--+----+-+--+-+------+-+     +--+-+----+-+ 
	 Record 1   | Record 2| Record 3    |  .. | Record n  | 
 
	 Note that records can be different lengths, as some 	
	items within records may not contain any valid 						
	information, thus are not stored. This storage format 
	is known as sequential.

format data into common length blocks with gaps as required.

more space used
easy to calculate start of next item
fast access time for any item


	+--+-+------+-+--+-+------+-+--+-+------+-+     +--+-+------+-+ 
	|  | |      | |  | |      | |  | |      | |     |  | |      | | 
	+--+-+------+-+--+-+------+-+--+-+------+-+     +--+-+------+-+ 
	| Record 1    | Record 2    | Record 3    |  .. | Record n    | 

	 Note that all records are the same length. Data items 
	within records are padded out till they are all the 
	same length. This storage format is known as direct 			
	(also called random) access.

PROCESSING SERIAL FILES
Sequential files pose a real problem in updating. Because each record is not the same length, the new record information cannot be written back to disk on the same file (as this might be longer and thus overwrite part of the next record).

To overcome this problem, two files are used, a master file and a new file. Each record is read in, modified as necessary, then written to a new file.

The new file is then renamed as the master file, and the old master becomes part of the backup process.

The main areas associated with processing serial files are,

processing single input files
updating master files

Information in sequential files is normally stored according to a key; in ascending or descending order (ie; sorted).

Processing Input Files
A sequential file normally has the following composition.

 
                          +-----------+ 
                          |   FILE    | 
                          +-----+-----+ 
                                | 
            +-------------------+--------------------+ 
      +-----+-----+       +-----+-----+        +-----+-----+ 
      |  HEADER   |       |   BODY    |        |  TRAILER  | 
      +-----------+       +-----+-----+        +-----------+ 
                                | 
                          +-----+-----+ 
                          |  RECORDS *| 
                          +-----------+

Processing the records in the file will produce a program structure which looks like,

 
                          +-----------+ 
                          |    FILE   | 
                          |  PROGRAM  | 
                          +-----+-----+ 
                                | 
            +-------------------+--------------------+ 
      +-----+-----+       +-----+-----+        +-----+-----+ 
      |  PROCESS  |       |  PROGRAM  |        |  PROCESS  | 
      |  HEADER   |       |   BODY    |        |  TRAILER  | 
      +-----------+       +-----+-----+        +-----------+ 
                                | 
                          +-----+-----+ 
                          |  PROCESS *| 
                          |  RECORDS  | 
                          +-----------+

The pseudo-code developed from the above structure chart is,

 
	File program Begin 
		process header 
		read record 
		Process record Begin 
			while record != trailer record 
				process record 
				read record 
			endwhile 
		Process record end 
		process trailer record 
	File program end

Updating master files
The most common form of updating is known as the father/son process. The process of updating is done as follows,

 
	1) the master file is sorted by a particular key 
 
	2) the updates are stored in a file using the same key 
	   as the master 

	3) update transactions fall into three categories 
		- amendments 
		- deletions 
		- new additions 
 
	4) the transaction file is processed in conjunction 	
	   with the master file by attempting to match 						
	   respective records having the same keys. 
 
	5) the transactions are written to a new master file

PROCESSING DIRECT FILES
Records in direct files are all the same length, thus the offset into the file for a given record can be calculated if you know how long each record is.

Each record is numbered, the number being the key (and can be generated by a variety of means, typically hashing).

Direct files are easier to update and modify as all work can be carried out on the master (or a copy of the master!). Records marked for deletion can be removed by packing the file (moving all the records up into the space used by non-records). Record insertions can be appended to the file.

The sorting of information stored in files is essential for fast accessing and updating. All operations on files are improved if the information is sorted according to a key.

Sequential Sort
If the data in a file is unsorted, it will need to be sorted before any processing can continue. The sorting of records involves comparing the records in a file using one (or more) of the items of the record (called keys). The desired order may be ascending or descending.

The algorithm used is described below,

 
	position = last record - 1 
	while position greater than or equal to first record 
	begin 
		count = 1 
		while count less than or equal to position 
		begin 
			if record at count > record at count + 1 
				then swap records 
			count = count + 1 
		end 
		position = position - 1 
	end

Example The following numeric information is stored in a sequential file. Using a sequential sort, sort the information in ascending order, writing the sorted data to a new output file.


			3,7,6,8,2,9,1,5,10,4

The developed program in Pascal, using a sequential file called NUMBERS, is,

 
	program sort( numbers, newnumb ); 
	var 
		stop_counter, count, temp : integer; 
		numbers, newnumb : file of integer; 
		storage : array[1..10] of integer; 
 
	begin 
		assign( numbers, 'numbers.org' ); 
		assign( newnumb, 'numbers.srt' ); 
		reset( numbers ); 
		rewrite( newnumb ); 
 
		for count:= 1 to 10 do 
			read( numbers, storage[ count ] ); 
 
		stop_counter := 10 - 1; 
     		while stop_counter >= 1 do 
		begin 
			count := 1; 
			while count <= stop_counter do 
			begin 
				if storage[ count ] > storage[ count + 1 ] then 
				begin 
					temp := storage[count]; 
					storage[count] := storage[count + 1]; 
					storage[count + 1] := temp; 
				end; 
				count := count + 1; 
			end; 
			stop_counter := stop_counter - 1; 
		end; 
 
		for count:=1 to 10 do 
			write( newnumb, storage[ count ] ); 
		close( numbers ); 
		close( newnumb ); 
	end.

If there is not enough internal storage available to hold all the records in the file, temporary output files must be used.

A number of records (which depends on the memory available) are read from the input file, sorted accordingly, then written to an output file. The process continues, using several alternate output files, till the input file has been processed.

Records are then read alternatively from each of the output files used, internally sorted, then written to other output files. Once all the records are sorted, the remaining files are merged to produce a single output file.

Bubble Sort
This is an improved sequential sort by reducing the number of record movements.

It works similar to the sequential sort above, but when a pair is found out of order,

the current place is marked
we backtrack comparing the smaller number with the previous one
if its smaller, we exchange and keep backtracking till its correct place is found.
once stored, we continue on from where we left off

The following program illustrates the modifications necessary to the previous sort program for implementing a bubble sort.

 
program bubsort ( numbers, newnumb ); 
var 
	marker, count, temp : integer; 
	numbers, newnumb : file of integer; 
	store : array[1..10] of integer; 
 
begin 
	assign( numbers, 'numbers.org' ); 
	assign( newnumb, 'numbers.srt' ); 
	reset( numbers ); 
	rewrite( newnumb ); 
 
	for count:= 1 to 10 do 
		read( numbers, store[ count ] ); 
 
	marker := 1; 
	while marker < 10 do 
	begin 
		if store[ marker ] > store[ marker + 1 ] then 
		begin 
			temp := store[marker]; 
			store[marker] := store[marker + 1]; 
			store[marker + 1] := temp; 
			count := marker - 1; 
			while (count >= 1) and (store[count] > store[count+1]) do 
			begin 
				temp := store[count]; 
				store[count] := store[count+1]; 
				store[count+1] := temp; 
				count := count - 1; 
			end; 
		end; 
		marker := marker + 1; 
	end; 
 
	for count:=1 to 10 do 
		write( newnumb, store[ count ] ); 
 
	close( numbers ); 
	close( newnumb ); 
end.

Key (tag) Sorting
This is a means of obtaining a sorted file without the repetitive and time-consuming input/output operations required by the sequential sort method. It is only applicable to direct access files.

In a key sort, the input file is scanned, record by record, and a table is built of the relative position of each record within the file. The following diagram illustrates the table format.

 
	+-------+----------+ 
	|  Key  |  Record  | 
	| Value | Position | 
	+-------+----------+ 
	|  10   |     1    | 
	|   2   |     2    | 
	|  18   |     3    | 
	|   6   |     4    | 
	|  27   |     5    | 
	|   8   |     6    | 
	+-------+----------+

The table is then sorted into the desired key sequence. The table now looks like,

 
	+-------+----------+ 
	|  Key  |  Record  | 
	| Value | Position | 
	+-------+----------+ 
	|   2   |     2    | 
	|   6   |     4    | 
	|   8   |     6    | 
	|  10   |     1    | 
	|  18   |     3    | 
	|  27   |     5    | 
	+-------+----------+

The records are retrieved in the order indicated by the table and written to a new output file.

Home | Other Courses | Feedback | Notes | Tests