Data Structures And Number Systems
© Copyright Brian Brown, 1984-2000. All rights
reserved.
FILE HANDLING
Often, data which is processed by a program must remain stored
after the program terminates. This allows later retrieval, as
well as providing a permanent record. Data cannot be stored in
the memory of the computer system, as this is volatile and is
lost when the computer is turned off.
Permanent storage is implemented using magnetic media such as floppy and hard disk systems. The binary data is encoded into magnetic patterns which alter the magnetic state of ferrite particles on the disk surface. A special drive and controller unit provides the interface mechanism which allows the transfer of data between the disk and the computer systems.
Associated data items stored on a disk system is known as a file. A file consists of many data items, normally grouped into blocks called records. The main components of a file are,
+-----------+ | FILE | +-----+-----+ | +-----+-----+ | RECORDS *| +-----+-----+ | +-----+-----+ | ITEMS *| +-----------+
A data item is an individual field of a record, ie, an integer, character, floating point number etc. Data items are collected together to form records. We have already introduced records in the previous section. All the records used by the program are collected together to form a file.
There are several different methods used to format the data stored on magnetic media.
+--+-+----+-+--+----+-+--+-+------+-+ +--+-+----+-+ | | | | | | | | | | | | | | | | | +--+-+----+-+--+----+-+--+-+------+-+ +--+-+----+-+ Record 1 | Record 2| Record 3 | .. | Record n | Note that records can be different lengths, as some items within records may not contain any valid information, thus are not stored. This storage format is known as sequential.
+--+-+------+-+--+-+------+-+--+-+------+-+ +--+-+------+-+ | | | | | | | | | | | | | | | | | | +--+-+------+-+--+-+------+-+--+-+------+-+ +--+-+------+-+ | Record 1 | Record 2 | Record 3 | .. | Record n | Note that all records are the same length. Data items within records are padded out till they are all the same length. This storage format is known as direct (also called random) access.
PROCESSING SERIAL FILES
Sequential files pose a real problem in updating. Because each
record is not the same length, the new record information cannot
be written back to disk on the same file (as this might be longer
and thus overwrite part of the next record).
To overcome this problem, two files are used, a master file and a new file. Each record is read in, modified as necessary, then written to a new file.
The new file is then renamed as the master file, and the old master becomes part of the backup process.
The main areas associated with processing serial files are,
Information in sequential files is normally stored according to a key; in ascending or descending order (ie; sorted).
Processing Input Files
A sequential file normally has the following composition.
+-----------+ | FILE | +-----+-----+ | +-------------------+--------------------+ +-----+-----+ +-----+-----+ +-----+-----+ | HEADER | | BODY | | TRAILER | +-----------+ +-----+-----+ +-----------+ | +-----+-----+ | RECORDS *| +-----------+
Processing the records in the file will produce a program structure which looks like,
+-----------+ | FILE | | PROGRAM | +-----+-----+ | +-------------------+--------------------+ +-----+-----+ +-----+-----+ +-----+-----+ | PROCESS | | PROGRAM | | PROCESS | | HEADER | | BODY | | TRAILER | +-----------+ +-----+-----+ +-----------+ | +-----+-----+ | PROCESS *| | RECORDS | +-----------+
The pseudo-code developed from the above structure chart is,
File program Begin process header read record Process record Begin while record != trailer record process record read record endwhile Process record end process trailer record File program end
Updating master files
The most common form of updating is known as the father/son
process. The process of updating is done as follows,
1) the master file is sorted by a particular key 2) the updates are stored in a file using the same key as the master 3) update transactions fall into three categories - amendments - deletions - new additions 4) the transaction file is processed in conjunction with the master file by attempting to match respective records having the same keys. 5) the transactions are written to a new master file
PROCESSING DIRECT FILES
Records in direct files are all the same length, thus the offset
into the file for a given record can be calculated if you know
how long each record is.
Each record is numbered, the number being the key (and can be generated by a variety of means, typically hashing).
Direct files are easier to update and modify as all work can be carried out on the master (or a copy of the master!). Records marked for deletion can be removed by packing the file (moving all the records up into the space used by non-records). Record insertions can be appended to the file.
The sorting of information stored in files is essential for fast accessing and updating. All operations on files are improved if the information is sorted according to a key.
Sequential Sort
If the data in a file is unsorted, it will need to be sorted
before any processing can continue. The sorting of records
involves comparing the records in a file using one (or more) of
the items of the record (called keys). The desired order may be
ascending or descending.
The algorithm used is described below,
position = last record - 1 while position greater than or equal to first record begin count = 1 while count less than or equal to position begin if record at count > record at count + 1 then swap records count = count + 1 end position = position - 1 end
Example The following numeric information is stored in a sequential file. Using a sequential sort, sort the information in ascending order, writing the sorted data to a new output file.
3,7,6,8,2,9,1,5,10,4
The developed program in Pascal, using a sequential file called NUMBERS, is,
program sort( numbers, newnumb ); var stop_counter, count, temp : integer; numbers, newnumb : file of integer; storage : array[1..10] of integer; begin assign( numbers, 'numbers.org' ); assign( newnumb, 'numbers.srt' ); reset( numbers ); rewrite( newnumb ); for count:= 1 to 10 do read( numbers, storage[ count ] ); stop_counter := 10 - 1; while stop_counter >= 1 do begin count := 1; while count <= stop_counter do begin if storage[ count ] > storage[ count + 1 ] then begin temp := storage[count]; storage[count] := storage[count + 1]; storage[count + 1] := temp; end; count := count + 1; end; stop_counter := stop_counter - 1; end; for count:=1 to 10 do write( newnumb, storage[ count ] ); close( numbers ); close( newnumb ); end.
If there is not enough internal storage available to hold all the records in the file, temporary output files must be used.
A number of records (which depends on the memory available) are read from the input file, sorted accordingly, then written to an output file. The process continues, using several alternate output files, till the input file has been processed.
Records are then read alternatively from each of the output files used, internally sorted, then written to other output files. Once all the records are sorted, the remaining files are merged to produce a single output file.
Bubble Sort
This is an improved sequential sort by reducing the number of
record movements.
It works similar to the sequential sort above, but when a pair is found out of order,
The following program illustrates the modifications necessary to the previous sort program for implementing a bubble sort.
program bubsort ( numbers, newnumb ); var marker, count, temp : integer; numbers, newnumb : file of integer; store : array[1..10] of integer; begin assign( numbers, 'numbers.org' ); assign( newnumb, 'numbers.srt' ); reset( numbers ); rewrite( newnumb ); for count:= 1 to 10 do read( numbers, store[ count ] ); marker := 1; while marker < 10 do begin if store[ marker ] > store[ marker + 1 ] then begin temp := store[marker]; store[marker] := store[marker + 1]; store[marker + 1] := temp; count := marker - 1; while (count >= 1) and (store[count] > store[count+1]) do begin temp := store[count]; store[count] := store[count+1]; store[count+1] := temp; count := count - 1; end; end; marker := marker + 1; end; for count:=1 to 10 do write( newnumb, store[ count ] ); close( numbers ); close( newnumb ); end.
Key (tag) Sorting
This is a means of obtaining a sorted file without the repetitive
and time-consuming input/output operations required by the
sequential sort method. It is only applicable to direct access
files.
In a key sort, the input file is scanned, record by record, and a table is built of the relative position of each record within the file. The following diagram illustrates the table format.
+-------+----------+ | Key | Record | | Value | Position | +-------+----------+ | 10 | 1 | | 2 | 2 | | 18 | 3 | | 6 | 4 | | 27 | 5 | | 8 | 6 | +-------+----------+
The table is then sorted into the desired key sequence. The table now looks like,
+-------+----------+ | Key | Record | | Value | Position | +-------+----------+ | 2 | 2 | | 6 | 4 | | 8 | 6 | | 10 | 1 | | 18 | 3 | | 27 | 5 | +-------+----------+
The records are retrieved in the order indicated by the table and written to a new output file.
Home | Other Courses | Feedback | Notes | Tests
© Copyright Brian Brown, 1984-2000. All rights
reserved.