Scor­pio News

  

July–September 1987 – Volume 1. Issue 3.

Page 16 of 67

consequently doesn’t require buffers where the physical sector size is 128 bytes (Blocking and Deblocking are discussed later). When used, CP/M 3 discards these buffers and overwrites the date in them on a least recently used (LRU) basis so that when access is required to a recently used sector, CP/M 3 reads the data from memory rather than from the disk giving an appropriate increase in performance. CP/M 3 doesn’t use the buffers at all if it knows that one or more complete physical sectors are to be read and hence deblocking of the data will not be required. In this case, the BIOS is instructed to read the disk data directly to the TPA at the DMA address.

CP/M 3 directory hashing

The HASH parameter contains the address of a table used for directory hashing. Hashing is a term used to refer to the process of converting a string of characters into s single number that is unique to that string. There are a vast number of methods of doing this, all varying in complexity. The problems encountered with hashing are guaranteeing that the number produced is in fact unique to the particular string involved (and no other) while keeping the number small enough to be manageable.

The problem of uniqueness can be explained with the use of an example. A simple way of converting a character string into a number is to add the ASCII values of the characters together. Taking combinations of the characters A-D for simplicity, there are 16 permutations of those characters that will give the same result. If we then permit characters to repeat, the number of combinations giving identical results increases still further. A few examples are given below, values are in decimal:

“ABCD” = 65 + 66 + 67 + 68 = 266
“BDAC” = 66 + 68 + 65 + 67 = 266
“AADD” = 65 + 65 + 68 + 68 = 266
“ACCC” = 65 + 67 + 67 + 67 = 266

One solution to this problem is to weight the character values differently depending upon which position they occupy in the string. If we were to assign values to the characters such that A=1, B=2 etc., and multiply the character values by multiples of 27 depending upon their position, the same strings as we used above would give different results as shown below:

“ABCD” = 1 + (2 * 27) + (3 * 54) + (4 * 81) = 541
“BDAC” = 2 + (4 * 27) + (1 * 54) + (3 * 81) = 407
“AADD” = 1 + (1 * 27) + (4 * 54) + (4 * 81) = 568
“ACCC” = 1 + (3 * 27) + (3 * 54) + (3 * 81) = 487

This example is fine as far as it goes but we want to hash a directory entry of 11 characters. Using the method described above, the lowest number produced will be 1486 for as file “AAAAAAA.AAA” and the highest number will be 28636 for file “ZZZZZZZZ.ZZZ”. However, filenames can also contain figures, spaces and some other characters in addition to upper case letters. CP/M 3 needs to include the user area byte and both extent bytes (EXT and S2) so that it can differentiate between files with the same name in different user areas and also between different extents of the same file. All of this serves to increase the range of numbers generated as a result of hashing.

In CP/M 3, Digital Research hash the file name, user area and extent bytes into a four byte number which constitutes the hash table entry for the directory entry. The user area number is stored as the 4 least significant bits (lsbs) of the first byte. The file name is reduced to an 18 bit number of which the lower 16 bits are stored in the two middle bytes of the hash table entry and the 2 most significant bits (msbs) are stored in the 2 msbs of the first byte. The extent and S2 bytes are combined to form a directory entry number for the file such that the first entry number is 0 and directory entry number ‘n’ is numbered a-1. This entry number is truncated to form a 9 bit value and is stored so that the msb is held as the 6th bit (bit 5) of the first byte of the hash table entry and the lower 8 bits are stored in the last byte. The method of hashing used on

Page 16 of 67