File Management

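File management can be adapted to the resources available in the operating environment. To manage files efficiently, it is important to know how to manage the number of open files, how to estimate file sizes, and how to control the growth characteristics of database files.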

Managing the number of open files

In many of the environments in which Zim runs, there is an operating system limit on the total number of files that can be open at any one time (per task or for the entire operating system). In some cases, the definition of a file includes each use of a device, such as a terminal or a printer, for reading and writing. The operating system limit affects Zim's use of its own directories, entity sets, relationships, documents, and compiled programs, and possibly its use of terminals, printers, and so on.

Zim manages its use of files so that it can function within the limits imposed by the operating system. The management of line-oriented files is reasonably straightforward. For example, when you invoke a non-compiled application program, the current program file is closed before the new program file is opened. As a result, only one file is being used for reading commands at any one time. Output is directed to two files (by the SET OUTPUT and SET TRACE OUTPUT commands). When a SET OUTPUT or a SET TRACE OUTPUT command is executed, the current output or trace output file is closed before the new file is opened. Each error causes the error file (containing templates for the Zim error messages) to be opened. The error file is closed after the error message is produced.

However, the main use of files involves block-oriented files such as directories, entity sets, relationships with fields, and compiled programs. Each of these objects has a corresponding operating system file. For these files, Zim maintains a pool of file control blocks. Entity set and relationship files are logically opened and closed around each command. At the end of a command, all these files are marked as no longer being in use; they are, however, left open as far as the operating system is concerned. During the execution of the next command, if a required file is already open, it is marked to show that it is now in use. If the file is not open, an unused slot in the pool is sought. If no unused slot is found, a file that is open but not actually in use is closed to free a file control block. The required file is then opened. In this way, any number of files can be used during a session while only a fixed number are actually open at one time.
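The pooling behavior described above can be sketched as a small model. This is only an illustration, not Zim's implementation: the class name FilePool, its method names, and the rule for choosing which idle file to close (the first idle file found) are assumptions made for the example.

```python
class FilePool:
    """Toy model of a fixed-size pool of file control blocks (hypothetical)."""

    def __init__(self, size):
        self.size = size        # plays the role of the files setting
        self.open_files = {}    # file name -> True while in use by the current command
        self.closed_log = []    # files closed to free a slot, kept for illustration

    def use(self, name):
        """Mark a file as in use, opening it first if it is not already open."""
        if name in self.open_files:
            self.open_files[name] = True
            return
        if len(self.open_files) >= self.size:
            # No unused slot: close a file that is open but not in use.
            # Picking the first idle file is an assumption of this sketch.
            idle = next((f for f, busy in self.open_files.items() if not busy), None)
            if idle is None:
                raise RuntimeError("every slot is in use by the current command")
            del self.open_files[idle]
            self.closed_log.append(idle)
        self.open_files[name] = True

    def end_command(self):
        """At the end of a command, files are marked idle but stay open."""
        for name in self.open_files:
            self.open_files[name] = False


pool = FilePool(2)
pool.use("Customers")
pool.use("Orders")
pool.end_command()        # both files remain open, but idle
pool.use("Products")      # pool is full, so an idle file is closed first
print(pool.closed_log)
```

With a pool of two slots, the third file can only be opened after one of the idle files is closed, which is how a session can touch many files while keeping a fixed number open.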

The files configuration option determines the size of the file pool (i.e. the maximum number of block-oriented files that can be open at any one time). Within operating environments that place a limit on this number, the files setting must be lower than the operating environment's upper limit. The operating environment limit minus the files setting is the number of slots that are available for line-oriented files such as documents, terminals, and work files. For more information about the performance implications of the files configuration option, see Increasing Speed: Maximizing Memory Use.
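As a quick check of the arithmetic above, the following sketch computes the number of slots left for line-oriented files. The function name and the example limit of 64 open files are hypothetical; the real limit depends on the operating environment.

```python
def line_oriented_slots(os_limit, files_setting):
    """Slots left for line-oriented files: OS limit minus the files setting."""
    if files_setting >= os_limit:
        # The files setting must be lower than the operating environment's limit.
        raise ValueError("files setting must be lower than the operating system limit")
    return os_limit - files_setting


# Hypothetical environment that allows 64 open files, with files set to 50.
print(line_oriented_slots(64, 50))
```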

Estimating File Sizes

When managing files, it is valuable to know the amount of space that an existing file currently occupies, and how much space a new file will occupy in the future. The method of estimating a file's size depends on the use of the file. This section describes the techniques for estimating the size of the following types of files: entity sets and relationships, directories, and compiled programs.

Entity Sets and Relationships

In Zim, an entity set and its related indices are stored in a single operating system file. Relationships with fields are stored in the same way. Every file in a database is organized into fixed-size pages; each page contains 1024 bytes.

Pages are the unit of transfer between disk and memory. Every page belongs either to the entity set (or relationship) or to some specific index on that entity set or relationship. Zim manages the data within each page. In particular, it tracks the free space available within partially filled pages and also tracks completely empty pages. The empty pages remain part of the file because the file systems that Zim uses do not permit files to shrink, only to grow. Thus, if you create a large entity set and then delete all of its members, the file contains a great deal of free space, but the file size is still large.

If a new page is needed, one of the empty pages is used, if available; otherwise, the size of the file is extended.

Entity set records are packed into pages. A record can be split between pages, but excessive splitting is avoided. The size of a record in an entity set is determined by the formula

L + N + 5

where

L is the total length of the non-virtual fields in the record (virtual fields are not included in the calculation because their values are not stored in an entity set record)

N is the number of non-virtual fields in the record.

For char, alpha, and numeric fields, length is the length specified in the field definition. The size of int, longint, vastint, and date fields depends on the underlying machine. Usually, the sizes are 2, 4, 8, and 8 respectively. Some machines force the alignment of certain kinds of data. As a result, records can be somewhat longer than the above calculation indicates. Some RISC machines, for example, force all data types except char, alpha, and numeric to start on an even address or on a multiple of 4 or 8.

Each page contains a header of approximately 22 bytes, meaning that the number of records per page, ignoring splitting, is the greatest integer in

(1024 - 22) / (RL)

where

RL is the calculated length of a record in the entity set

These calculations are approximate and are further complicated by varchar or varalpha (variable length character) fields, which occupy an amount of space equal only to their actual length, plus two bytes to store the length itself.

Using your knowledge of the average size of each variable length field, you can reasonably estimate the number of pages to be occupied by the data in an entity set or relationship. For example, consider an entity set composed of fields shown in the following table:

Field    Type       Actual Length (bytes)
Fld1     Char       12
Fld2     Int        2
Fld3     Longint    4
Fld4     Vastint    8
Fld5     Numeric    6
Fld6     Date       8

In this case, L is the sum of the six field lengths and N is 6, so the length of each record in the entity set is

(12 + 2 + 4 + 8 + 6 + 8) + 6 + 5 = 51

The most records that can be stored in one 1024-byte page is

(1024 - 22) / 51 = 19

For an entity set containing one thousand records, the data would require 53 pages (i.e. 53 KB).
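The worked example above can be reproduced with a short calculation. This is a minimal sketch that uses the page size, the approximate 22-byte header, and the record formula given in this section; the function and constant names are made up for the example, and the result ignores record splitting, alignment padding, and variable-length fields.

```python
PAGE_SIZE = 1024       # bytes per page
PAGE_HEADER = 22       # approximate page header size in bytes
RECORD_OVERHEAD = 5    # the constant term in L + N + 5


def record_length(field_lengths):
    """L + N + 5, where field_lengths lists the non-virtual field lengths."""
    return sum(field_lengths) + len(field_lengths) + RECORD_OVERHEAD


def records_per_page(rec_len):
    """Greatest integer in (1024 - 22) / RL, ignoring record splitting."""
    return (PAGE_SIZE - PAGE_HEADER) // rec_len


def data_pages(record_count, rec_len):
    """Pages needed for the data, rounded up to a whole page."""
    per_page = records_per_page(rec_len)
    return -(-record_count // per_page)   # ceiling division


fields = [12, 2, 4, 8, 6, 8]              # Fld1 through Fld6
rl = record_length(fields)
print(rl, records_per_page(rl), data_pages(1000, rl))   # 51 19 53
```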

Index space is somewhat harder to estimate accurately. Accuracy is difficult because Zim uses a sophisticated BXtree algorithm that tries to keep the BX tree as balanced as possible and to keep pages as full as possible. This strategy optimizes performance. The actual result is heavily dependent on the data and its physical order.

For an indexed field, including virtual fields, each non-null field value is stored as a key in the index. The maximum number of keys that can be stored in a single page is approximately

(1024 - 12) / (LIF + 10)

where

LIF is the length of the indexed field

Note: The length of a key for a variable length field is always its maximum length.

Assuming that all pages are completely full, you can calculate that the minimum number of pages used by an index is the smallest integer greater than or equal to

TR / MK

where

TR is the total number of records in the entity set

MK is the calculated maximum number of keys per page

As previously noted, the index algorithms in Zim merge partially-filled pages in order to keep pages as full as possible. The actual utilization of blocks depends on the distribution of key values and on the pattern of adding and deleting. Typical utilizations range from 50 percent to 80 percent.

If there is an index on field Fld1 from the preceding example, then the maximum number of keys per page is

(1024-12) / (12+10) = 46

For an entity set containing one thousand records, at 70 percent utilization, this index requires approximately

(1000 / 46) * (100 / 70) = 31.06, or 32 pages after rounding up
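The index estimate can be checked the same way. The sketch below applies the formulas from this section; the function names are hypothetical, and the utilization figure is something you supply, since actual utilization depends on the data.

```python
import math

PAGE_SIZE = 1024
INDEX_PAGE_OVERHEAD = 12   # the 12 bytes subtracted in the keys-per-page formula
KEY_OVERHEAD = 10          # the 10 bytes added to each key length


def max_keys_per_page(field_length):
    """Approximate maximum keys per page: (1024 - 12) / (LIF + 10), rounded down."""
    return (PAGE_SIZE - INDEX_PAGE_OVERHEAD) // (field_length + KEY_OVERHEAD)


def min_index_pages(total_records, field_length):
    """Minimum pages, assuming every page is completely full."""
    return math.ceil(total_records / max_keys_per_page(field_length))


def estimated_index_pages(total_records, field_length, utilization):
    """Pages at a given utilization, e.g. 0.70 for 70 percent."""
    full_pages = total_records / max_keys_per_page(field_length)
    return math.ceil(full_pages / utilization)


print(max_keys_per_page(12))                   # 46
print(min_index_pages(1000, 12))               # 22
print(estimated_index_pages(1000, 12, 0.70))   # 32
```

Trying the same call with utilizations of 0.50 and 0.80 gives a rough range for the index size rather than a single figure.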

The total size of a file includes the space used for entity set records, the space used for each index, any completely empty pages created by deletions, and one control page.

Directories

Directory files are also block-oriented. Zim directories contain information about every object defined in your application, including entity sets, relationships, fields, roles, variables, virtual fields, directories, named sets, constants, windows, menus, menu items, form fields, and displays. The amount of information maintained for each object varies. For example, an entity set is described by its name, the file number, and links to the information about its fields. Relationships require more space, primarily to store the encoded relationship condition. Information about an object is separated into basic information and descriptor information.

The number of pages occupied by a directory file can be estimated by the following formula:

3 + N/B + N/D

where

N is the number of objects in the directory

B is the number of objects whose basic information can be packed into a single page

D is the number of objects whose descriptor information can be packed into a single page

B is approximately 20 and D is approximately 10.
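The directory formula can be evaluated as follows. This is a rough sketch: the function name is hypothetical, and rounding each term up to a whole page is an assumption of the example, since the formula itself is only an estimate. Cross-reference information is not included.

```python
import math


def directory_pages(object_count, basic_per_page=20, descriptor_per_page=10):
    """Estimate 3 + N/B + N/D, rounding each term up to whole pages (assumed)."""
    basic = math.ceil(object_count / basic_per_page)
    descriptor = math.ceil(object_count / descriptor_per_page)
    return 3 + basic + descriptor


# A directory describing 500 objects: 3 + 25 + 50 pages.
print(directory_pages(500))   # 78
```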

If you have chosen to store cross-reference information, that information is also kept in the directory file. Additional space is required for cross-reference information.

Compiled Programs

Files containing compiled application programs are also block-oriented. The amount of compiled code varies enormously from one source command to another. For example, the command

let V = 1

compiles to a rather small amount of code that assigns the value 1 to the variable V. On the other hand, consider

add ent1 from form1

The compiled code for this instruction must assign values from the fields in form1 to the fields with the same name in ent1.

If these objects had twenty-five fields in common, the compiled code for this ADD command would be at least twenty-five times the size of that produced for the preceding LET command. This comparison is indicative of the expressive power of Zim. Unfortunately, the variability in compiled output makes it virtually impossible to estimate the size of compiled programs.

Controlling Growth Characteristics of Database Files

In most operating systems, files that grow frequently by small increments can become fragmented; each file is stored in many pieces throughout the disk. As a file becomes more heavily fragmented, access to the file becomes very inefficient, as increased disk head movement is required to locate all the pieces. Fragmentation can be reduced by forcing files to grow less frequently, but in greater increments. Zim provides several methods for controlling the growth characteristics of database files.

Overcoming Fragmentation

When all pages of a file have been filled with data, Zim Server normally extends the length of the file by 10% of the current size. This type of growth somewhat minimizes the size of the database file, but it can also reduce system performance if the file becomes extremely fragmented.
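To see why the size of the growth increment matters, the following sketch counts how many times a file must be extended to reach a target size. It is a simplified model, not a description of Zim Server internals: the function name is hypothetical, sizes are counted in pages, and rounding each increment down (with a minimum of one page) is an assumption.

```python
def extensions_to_reach(start_pages, target_pages, growth_percent=10):
    """Count the extensions needed to grow from start_pages to target_pages."""
    pages = start_pages
    extensions = 0
    while pages < target_pages:
        # Each extension adds a percentage of the current size (at least 1 page).
        pages += max(1, pages * growth_percent // 100)
        extensions += 1
    return extensions


print(extensions_to_reach(100, 200))                      # 8 extensions at 10%
print(extensions_to_reach(100, 200, growth_percent=50))   # 2 extensions at 50%
```

Each extension is an opportunity for the operating system to place the new space elsewhere on the disk, so fewer, larger extensions generally mean fewer fragments.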

Many operating systems store files in numerous separate fragments. These fragments are often called extents. As a file grows, new extents can be created, increasing the fragmentation of the file. The average time that it takes to access the file can increase as the number of extents increases. Under some operating systems, this problem can be remedied by copying the file to another location; copying alone can reduce the number of fragments.

If the file is known to be growing, the ZIMXTEND utility or the file extend and data extend configuration options can be used to pre-allocate file space in the database, thereby controlling file growth and reducing fragmentation.

 
