Intro

This article attempts to answer the most common questions from those new to the InterBase and Firebird database servers. Please send suggestions, corrections and complaints about this FAQ to the maintainer, Rado Benc bencrado@angelfire.com.

Credits

List of contributors in no particular order: Helen Borrie, Diane Brown, Ann Harrison, Brett Bandy, Paul Beach, Bill Karwin, Kevin Lawrence, Sean Leyne, Mark O'Donohue, Ivan Prenosil, Paul Reeves, Jim Starkey, Claudio Valderrama, Chris Vallis, David Zverkic

Arrays

Q: How are InterBase arrays stored internally?

A: Arrays are layered on the internal blob mechanism. The data is preceded by some binary gook that describes the structure of the array: dimensions, bounds, data type, etc.
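
For reference, this is how an array column is declared in DDL; the descriptor that InterBase stores in front of the data is generated from the declaration (the table and column names here are just illustrative):

  CREATE TABLE MEASUREMENT (
    SENSOR_ID  INTEGER NOT NULL,
    READINGS   FLOAT [12]   /* a one-dimensional array of 12 floats, layered on a blob */
  );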

Backup / Restore & Garbage Collection

Q1: I have a 530 MByte database that I backed up to do garbage collection. This resulted in a 160 MByte file. This operation took about an hour, which is not acceptable.

A1: I suggest doing the backup with "disable garbage collection". If you're just going to replace the database with the restored version, it's a waste of time to garbage collect the original. Disabling garbage collection will make the backup run faster.
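
For example (a sketch only - the paths and credentials are placeholders), a backup that skips garbage collection looks like:

  gbak -b -g -v d:\data\mydb.gdb d:\backup\mydb.gbk -user SYSDBA -password masterkey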

Q2: Does the restore build indexes while restoring?

A2: Yes, unless you check "deactivate indexes" when restoring (gbak -i).

Q3: Is this why it takes so long?

A3: Most probably.

Q4: If I "deactivate indexes" while restoring, will I have to manually activate all indexes after the restore?

A4: Yes. Here's a tip: create a SQL script with all the ALTER INDEX statements necessary to activate your indexes, and keep that around. Use it like a batch file with isql -i script.sql to help automate this procedure. You can create this script with this query:

 
  SELECT 'ALTER INDEX ' || RDB$INDEX_NAME || ' ACTIVE;'
  FROM RDB$INDICES 
  WHERE RDB$SYSTEM_FLAG = 0 OR RDB$SYSTEM_FLAG IS NULL;

This may well be worthwhile in your situation, because you can get the database restored and back up more quickly, then activate the indexes afterwards. The data is still accessible even while the indexes are inactive; it will just be slower to query the tables.

Another suggestion: set the TEMP directory (using the system environment variable or entry in isc_config) to a directory on a separate physical drive from both the drive with your database and the drive with NT's pagefile.sys. Don't point the TEMP directory at a mapped drive, though.
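
If memory serves, a temporary-directory entry in the configuration file takes a size in bytes and a quoted path - check the comments in your own isc_config/ibconfig for the exact form before relying on this sketch:

  TMP_DIRECTORY  1000000000  "E:\ibtemp"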

Backup / Restore Large Databases

Q: I had a database greater than 2 GBytes, so I decided to split the database into multiple files. I ran the database backup procedure and checked the size of the backup file. It is 1.06 GBytes, a lot less than the original database file (2 GBytes). Do you think that if I restore the database into a single file the size will be less than 2 GBytes? Or do you think it is still necessary to restore the database into multiple files?

A1: Irrelevant. Your data is bound to grow past 2 GB anyway, and you should have defined at least one extra secondary file from the beginning. I'd define 5 files, give them page counts large enough to make them just under 2 GB each, and that way you won't need to worry about it for a few months (or hopefully even a few years) at least. You can use syntax like:


  gbak d:\backupfile d:\database.gdb 245000 d:\database.gd02 245000
  d:\database.gd03 245000 d:\database.gd04 245000 d:\database.gd05 245000
  -c -v -y r.out -user "%1" -password "%2" -p 8192 -bu 75

This would restore your database into 5 files, each allowed to grow to a maximum size just under 2 GB (but initially all about 16 KB), with an 8 KB page size and 75 buffers. %1 is your login name (probably SYSDBA) and %2 is your password. I chose 75 buffers for the restore because my experience is that this gives the best performance on our hardware/OS/file system.

After we restore we use gfix to change the buffer size to 1000 pages.
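
That step is a one-liner (path and credentials are placeholders):

  gfix -user SYSDBA -password masterkey -buffers 1000 d:\database.gdb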

InterBase has a very problematic bug in that you cannot add secondary files to a database without corrupting it (except perhaps for small databases, or perhaps for your first secondary file only). The only safe way to increase the number of database files is to back up and restore with gbak. This is a problem because it requires you to take your database offline, or at least suddenly make your database appear to its users to be hours or days out of date. Better to create lots of extra files so that you won't have to do this often, and perhaps ISC will fix this bug by the next time you need to do it.

A2: GBAK files do not contain all the entries for an index data structure. Indexes in a GDB are stored as a kind of B-tree, which takes up some space proportional to the size of the table it indexes. GBAK records the definition of the index, but not the B-tree. When you restore, it rebuilds the tree based on the values in the table. So if you have a lot of indexes, it would follow that the gbak output file would be smaller than the GDB file. When you restore, it will reallocate that space to rebuild the indexes.

I always recommend defining a secondary file if you think there is a chance that the database will grow. The secondary file is very small until it is needed - unlike some database systems that require you to pre-allocate the entire 2 GB even for an empty file...
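
As a sketch, secondary files can also be declared up front when the database is created; each one stays tiny until the file before it fills up (the file names and page counts below are only illustrative):

  CREATE DATABASE 'd:\data\mydb.gdb' PAGE_SIZE 8192
    FILE 'd:\data\mydb.gd02' STARTING AT PAGE 245001
    FILE 'd:\data\mydb.gd03' STARTING AT PAGE 490001;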

Bitwise Storage

Q1: Can anyone explain to me what bitwise storage means?

A1: It's a method that works very badly in most relational databases, including InterBase, but which is common among older programmers who suffered from machines with almost no memory.

Suppose you have 200 items that a customer might buy. You number them from one to 200 and include an array of 25 unsigned bytes in the customer record. For each item the customer orders, you set the bit in the array that corresponds to the item bought. It's very dense and allows you to look for combinations very effectively - as long as you're prepared to read every customer record.

Ordinarily, I'd use a two-dimensional array with customer id as one dimension and product id as the other. It's a nice (sort of) in-memory structure as long as your identifier space is dense, but it translates very, very badly to relational databases.

One problem, perhaps the major one, is that the database wants to understand the data so it can convert it. When you construct one of these arrays out of a character string, you've lied to the database about the contents of the column. Databases don't like being lied to and have wondrous and strange ways of getting their revenge.

The other problem is that (at least in InterBase) there's no way to create an index based on which bits are set - if you've set bit eight, it wants that to be 256 - not item eight.

Anyway, that's what I think we've been talking about.

Q2: I do use a traditional relational design to store sale transaction details (potentially 200,000 new records a day). The bitwise storage solution is for rolled-up tables.

A2: There's an InterBase subtype of text that treats a character array as an array of unsigned bytes. I would worry about passing that through middle-ware layers, and I would worry that future versions of the middle-ware might decide to do more favors for me. I'd also worry about overlaying a bit stream on a different datatype (e.g. date) because at some point InterBase might choose to change the internal representation. You'd also run a risk (I think) of confusion if you ever switched between big-endian and little-endian representation.

Then too, there's the problem that no bitwise representation can be indexed effectively. If you ever want to find all the customers who bought a product, you'll be reading all the customer rows.

How many different products is each customer likely to order? Is there some other data structure that will do? If not, I'd use longwords and hope nobody ever switches architectures.
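
For comparison, the conventional relational alternative - which can be indexed, so finding every buyer of a product does not mean reading every customer row - is a plain junction table (all names here are illustrative):

  CREATE TABLE CUSTOMER_PRODUCT (
    CUSTOMER_ID  INTEGER NOT NULL,
    PRODUCT_ID   INTEGER NOT NULL,
    PRIMARY KEY (CUSTOMER_ID, PRODUCT_ID)
  );
  CREATE INDEX IDX_CUSTPROD_PRODUCT ON CUSTOMER_PRODUCT (PRODUCT_ID);

  /* all customers who bought item 8 - resolved through the index, not a table scan */
  SELECT CUSTOMER_ID FROM CUSTOMER_PRODUCT WHERE PRODUCT_ID = 8;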

Boolean Domain - How to ?

Q: What is the best way to create a domain called Boolean?

A: Use


  CREATE DOMAIN D_BOOLEAN AS SMALLINT  
    DEFAULT 0 NOT NULL 
    CHECK (VALUE BETWEEN 0 AND 1);

The values can then be cast as a Boolean in Delphi. This also has the advantage of being wholly international - not every language matches T(RUE)/F(ALSE) or Y(ES)/N(O).
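
Once the domain exists, columns are declared against it like any other datatype; for example (table and column names are only illustrative):

  CREATE TABLE CUSTOMER (
    CUSTOMER_ID  INTEGER NOT NULL,
    IS_ACTIVE    D_BOOLEAN,
    PRIMARY KEY (CUSTOMER_ID)
  );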

Connection Pooling

Q1: Can anyone tell me where I can find anything about this "connection pool"? I really have only a vague idea of what it could be.

A1: Connection pooling is a programming technique that can be used with any database. Essentially, you have a controlling application to which users log on rather than logging on to the database itself. When a user requests that the application perform a task that requires database access, the application uses one of the connections it has already established to perform the query.

The application maintains a pool of connections which it uses to satisfy the needs of its users. Since for most databases, including InterBase, making a connection is a relatively expensive operation, reusing them improves performance. Five connections can serve the requirements of many more concurrent users of the application because users hold the connection only for the time it takes to execute their queries.

Using a connection pool also (in my opinion) improves security. The application determines who can perform which operations. Thus a random user could, for example, find out how many people have registered for a conference. Qualified users could register themselves and change their own registrations. Highly qualified users could change the registrations for others in their company.

SQL privileges don't map well onto such operationally based security constraints. If the users log in to the database directly, they will have the user name and password they need to exploit the data. Since select access is required to get a count of attendees, anyone (who can write a simple program) can get a full list of attendees. Anyone who makes a plausible attempt at registration can probably delete all registrations.

If the application controls access, each group of users can do exactly what the application allows and nothing more. The database user names and passwords are never disclosed.

Cost Optimizer

Q: Is there some kind of cost optimizer in IB Vx.x and, if so, what strategy does it use?

A: All versions of InterBase since V1.0 - since T0.1, actually - have used a cost-based optimizer to select join order. The costs considered are the table cardinality(1) and the index selectivity(2). The intention is to minimize the size of the intermediate products(3). Starting in V2.0 - well, maybe a little later - we added decision tree pruning(4), which reduced the cost of optimizing a 14-way join from 4 hours to 10 seconds, without significantly reducing the performance of the optimized query.

Basically, the strategy is to determine all the conjuncts between tables in the join, including those derived by distributing equalities. The optimizer then examines paths through the conjuncts that connect all tables and estimates the size of the intermediate product at each step. Some weight is given to the nature of the conjunct - equality through a unique index is best; inequality (<>) is so bad it is never used.

Where there are no indexed conjuncts between tables, it interpolates either a sort/merge join or a natural scan of one table. There's a cost estimate for each of those operations - this description is, obviously, simplified and omits the weighting factors for the different operations.

The optimizer also chooses access strategies for subqueries and applies secondary indexes to queries. InterBase's bit-mapped intermediate index structures allow it to use multiple indexes on access to a single table. The V5 optimizer has been enhanced to recognize redundant(5) and unnecessary(6) indexes.

In theory, the order of tables in an inner join statement should have no effect on the output of the optimizer. Prior to V5, that was not the case with joins expressed in SQL92 syntax (FROM ... JOIN ... ON ...) rather than the SQL89 syntax (FROM table, table WHERE ...). If you find that a query has been badly optimized and you are satisfied that the indexes you have defined are appropriate, you can create a PLAN which overrides the optimizer's decisions.
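
A hedged example of forcing a plan - the index names here are hypothetical, and in practice you would start from the plan isql reports and adjust it:

  SELECT D.NAME
  FROM DEPARTMENTS D JOIN MANAGERS M ON D.MANAGER = M.NAME
  WHERE D.NAME = 'NOD'
  PLAN JOIN (D INDEX (IDX_DEPT_NAME), M INDEX (IDX_MGR_NAME));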

Probably more than you wanted to know.

But in case not:

  1. When I was familiar with the code, cardinality was based on the number of pointer pages in the table and the size of a row. This is cheap to maintain and provides an accurate estimate of the amount of I/O necessary to retrieve the rows. The drawback is that it does not accurately represent the number of rows in a sparsely populated table - for example, one in which a number of rows chosen randomly (relative to their storage order) had been deleted. The alternatives we considered would create 'hotspots' in the database. Our decision was that the hotspots would overwhelm any gain in performance.
  2. Selectivity is the ratio of the number of distinct values in the index to the number of entries. The selectivity of an index is set when it is created, reactivated, or when someone requests that it be reset. Again, the improvement in the optimizer due to more accurate and timely values of selectivity is more than offset by the cost of maintaining the values.
  3. The intermediate product is the number of rows present at any stage of building up the result set. In a simple example, consider the query
    
      SELECT D.NAME FROM DEPARTMENTS D, MANAGERS M
        WHERE D.MANAGER = M.NAME AND D.NAME = 'NOD'
    
    For argument's sake, assume that name is a unique key in each of the tables. Manager is a non-unique indexed column in departments (a manager can handle several departments, but each department has one manager). There are 500 departments and 300 managers. If you start with the managers, you have an intermediate set of 300. Applying the manager/department conjunct gives you 500 intermediate rows. Applying the department/NOD conjunct reduces the final result to one. If you start with department/NOD, you get one row; when you apply the department/manager conjunct you still have only one row. That's the intermediate product issue.
  4. Decision tree pruning means throwing out unpromising results. The query above can be resolved in several ways:
    
      Managers/departments/'NOD' 300/500/1
      Departments/managers/'NOD' 500/500/1
      Departments/'NOD'/managers 500/1/1
      NOD/Departments/managers 1/1/1
    
    If the cases were evaluated in that order, choices two and three would be discarded after the first conjunct was evaluated (I think) because even if the next conjuncts were unique equalities, they'd still have intermediate results as large as or larger than the previous case. The doubt comes from their all being so close. Adding more terms makes some of this more imperative. Cases where NOD and managers are considered together before departments are added aren't considered because there's no conjunct between those two items. The optimizer builds up the longest strings of good choices; then, if none of the strings includes all the inputs, it adds sort/merge joins to cover non-indexed conjuncts, or natural scans if there are no conjuncts at all.
  5. Redundant indexes are indexes on columns for which other indexes have already been included. Consider a table with a non-unique index on A (descending) and another on B (ascending), plus a unique ascending index on A,B and a unique descending index on B,A - don't snicker, I've seen worse, much worse. A query that includes values for A and B would, prior to V5, use all four indexes, when either of the unique indexes would produce a single value.
  6. Unnecessary indexes are indexes that are applied even though they don't restrict the result significantly. Redundant indexes are unnecessary, but not all unnecessary indexes are redundant. Consider a table with a unique indexed column A and a non-unique indexed column B. If your query includes values for both A and B, prior to V5 InterBase would use both indexes, even though the index on A reduces the result set to one row at most.

If that isn't more than you wanted to know, your boss isn't working you hard enough.

Crossdatabase Updates

Q: Is there a way to use a trigger to update a different database hosted on the same server?

A1: Triggers have no direct access to other databases.

  • You can use a UDF that connects to the other database and makes the update (but it will be very slow!)
  • The trigger can post an event. Then you can have an application (connected to both the source and destination databases) that waits for that event and makes the update (but the trigger has no way of knowing whether the synchronizing application is running!)
  • Or the update can be done by the application that causes the trigger to fire (not a nice solution).
  • Or you can merge the two databases into one!? (the best solution, if you can afford it).

A2: No, not directly. Triggers and stored procedures can access only the context in which they are defined - the current database. However, you could post an event from a trigger to notify interested applications of significant changes and allow them to update the other database appropriately.
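
A minimal sketch of that approach - the table, trigger, and event names are illustrative, and the listening application still has to register its interest in the event:

  SET TERM ^ ;
  CREATE TRIGGER TAU_CUSTOMER FOR CUSTOMER
  AFTER UPDATE AS
  BEGIN
    /* tell any listening application that customer data has changed */
    POST_EVENT 'CUSTOMER_CHANGED';
  END ^
  SET TERM ; ^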

Events can be sensitive to environment. Check the archive for the required OS and patch levels.

Database Design: Standardization

Q1: Are there any suggested methods for database design / standardization?

A1: A convention I've found works well goes something like this:

  • Tables: Name master tables with the full name of the class they represent, e.g. CUSTOMER, ACCOUNT, PERSON. Don't use plurals. Name secondary tables so that they show the relationship, e.g. CUSTOMERACCOUNT, PERSONADDRESS, PERSONPATIENTDOCTOR.
  • Name views to indicate the contents or end-use of the record set. Prefix views with "V" e.g. VCUSTOMERADDRESSES.
  • Name procedures to indicate what they do - which almost always requires the use of a verb. GET and SET are useful in English because they are short - you might have problems finding short verbs in French. :-)) You might have to design some abbreviations for "standard" verbs. Prefix procedures with "P". e.g. PGETAGEDTRIALBALANCE, PSETCUSTOMERID. Name triggers as the name of the table that owns the trigger, prefixed with the trigger type and sequence number, e.g. BD1CUSTOMER for a Before Delete trigger on the CUSTOMER table.

These are just suggestions, of course. I'd recommend that you sit down alone for a couple of hours and draft a Naming Conventions document for circulation to all of the developers. If you have the authority, you should say "This is our new naming scheme. Please submit suggested modifications for discussion by dd/mm/yyyy". Discuss the suggested changes but make it clear that you are not discussing a totally different approach to the conventions. I promise you that protracted debate about one system of naming over another will gain nothing and lose a lot of valuable developer time.

Modify your draft if everyone agrees the changes are reasonable and THEN implement it, announcing that non-compliant names will be rejected in all peer audits from dd/mm/yyyy.

A2: Some suggestions about column and index names:

  • Primary key columns - name of table + "ID"
  • Foreign key columns - exact name of referenced column - this makes life a lot simpler when writing queries.
  • Generators - "GEN" + name of column for which the value is generated, e.g. GENCUSTOMERID.

Descriptive names like ACCOUNTNAME and CUSTOMERNAME, or HOMEADDRESS and WORKADDRESS, might also be more useful than ADDRESS1 and ADDRESS2.

You might like to consider designing a small set of abbreviations for use in naming columns, e.g. "CUST" for Customer, "ACCT" for Account. If you do this, you would need to be strict about compliance, not allowing some column names to use abbreviations whilst others use the fully extended word.

A3: One addition I might make to this is in the instance of multiple Foreign Key references to the same field within a given table: an example of this might be an ORDER, which has a "ship to" address and an "invoice to" address, both of which reference an ADDRESS_ID in the ADDRESS table. These should be named with some differentiating text at the front of the name, followed by the foreign key field name, e.g. SHIP_ADDRESS_ID and INVOICE_ADDRESS_ID.

Also, I find table names without underscores to be extremely difficult to read, so in the absence of lower case, I'll use the underscores every time.

Standardization is a CRITICAL issue, especially when:

  1. building "Enterprise" level systems,
  2. building systems that might share data with other departmental systems or access common data stores,
  3. working for customers with well defined standards (government clients often have "abbreviation algorithms" that result in strange values, but must be adhered to).

Is anything more annoying than random occurrences of personal abbreviations [DATE, DT]? Probably, but not many things that are so unnecessary.

I agree that the name must be meaningful, although the "incrementing value" approach can add some value. On one very (very, very) large project (I forget the number of tables in the data model - multiple hundreds, anyway; a 150-person team), the naming standard adopted was to preface the meaningful table name with "TXXXX_", where XXXX was an incrementing number (0001, 0002 ...) issued by the data management group. Foreign key column names always used the TXXXX_ prefix in front of the meaningful name. This was extremely helpful in finding where the heck to get data from.

Because we use table names in so many other places, and metadata object names are limited to 31 characters, I like to limit table names to about 26 characters. This lets us name a before delete trigger "TBD_tablename" without problems. We prefix all metadata objects except tables and columns with object type identifiers as follows:

  • Domains : D_domainname (e.g. D_BOOLEAN)
  • Views: V_viewname (e.g. V_MEMBER_ADDRESS)
  • Triggers:
    • before insert: TBI_tablename
    • after insert: TAI_tablename
    • before update: TBU_tablename
    • after update: TAU_tablename
    • before delete: TBD_tablename
    • after delete: TAD_tablename
    (eg TBI_SET_MEMBER_ID). Note that triggers also often use a Txx_SET_fieldname or Txx_SET_datatype construct (a short sketch follows this list).
  • Stored Procedures:
    • P_procedurename
    • P_SET_procedurename
    • P_GET_procedurename
    (eg P_GET_MEMBER_ID)
  • Generators: G_fieldname (eg G_MEMBER_ID)
  • Exceptions: E_exceptionname (eg E_SALARY_RANGE_LIMIT)
  • User Defined Functions: don't use them - but would probably use F_udfname
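
A brief sketch of the generator and trigger conventions in practice (the MEMBER table is assumed to exist; the names follow the scheme above):

  CREATE GENERATOR G_MEMBER_ID;

  SET TERM ^ ;
  CREATE TRIGGER TBI_SET_MEMBER_ID FOR MEMBER
  BEFORE INSERT AS
  BEGIN
    IF (NEW.MEMBER_ID IS NULL) THEN
      NEW.MEMBER_ID = GEN_ID(G_MEMBER_ID, 1);
  END ^
  SET TERM ; ^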

Once the basic standards are in place, you may wish to consider trying to standardize other common database design items, such as:

  1. Common field types, e.g. name Boolean fields "IS_question" (eg IS_ACTIVE)
  2. Reference table design. Most of our "simple" reference tables have a structure like the AWARD table illustrated below (a DDL sketch follows the list of uses):

       Field Name     Domain                Type
       AWARD_ID       D_PRIMARY_KEY         INTEGER
       AWARD_NAME     D_SHORT_DESCRIPTION   VARCHAR(100)
       AWARD_CODE     D_CODE                VARCHAR(5)
       AWARD_SORT     D_SORT_VALUE          INTEGER
       AWARD_COLLATE  D_COLLATE             INTEGER
       IS_ACTIVE      D_BOOLEAN             CHAR(1) ('Y', 'N')

This allows our customers to:

  • display a full description or code (AWARD_NAME or AWARD_CODE)
  • use AWARD_SORT to force sort orders other than those of the code or name. This can be very helpful in reporting (eg printing Canadian provinces West to East), or when the reference table is for workflow or some other "process" related concept that should be ordered (like order status: Received, Assessed, Costed, Authorized, Shipped, Paid).
  • use AWARD_COLLATE to collapse categories for reporting (yes, this could be done through proper normalization).
  • use IS_ACTIVE to retain values in the database for referential integrity and historical reporting purposes, that are no longer used (the lookup lists are filtered so as not to display values where IS_ACTIVE = 'N')
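
A hedged DDL sketch of such a reference table, assuming the domains listed above have already been created:

  CREATE TABLE AWARD (
    AWARD_ID       D_PRIMARY_KEY NOT NULL,
    AWARD_NAME     D_SHORT_DESCRIPTION,
    AWARD_CODE     D_CODE,
    AWARD_SORT     D_SORT_VALUE,
    AWARD_COLLATE  D_COLLATE,
    IS_ACTIVE      D_BOOLEAN,
    PRIMARY KEY (AWARD_ID)
  );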

I urge you to ignore the comments you received devaluing the effort to standardize: members of development teams want and need (and even LIKE) standards, especially those that:

  1. add value and reduce uncertainty,
  2. are well thought out (don't change a lot, creating meaningless rework), and
  3. they had input to.

Error : Undefined service gds_db/tcp

Q: What's wrong?

A1: Based on the message, I'd take a look at the services file on the server ( %systemroot%\system32\drivers\etc\services ) and check to be certain that there is still an entry for InterBase:


  gds_db     3050/tcp    # Borland InterBase

If not, add this line and be certain that there are some lines following it, even if they are just comments or blank lines, and then the connection should work again. Don't be confused by the services.sam file in the same directory; it's not used by NT, it's just there as a SAMple.

This is done when you install InterBase, but you can also do it with any text editor. Remember to add a blank line at the end of the file, because Windows has a bug such that a cr/lf is required after the last line of the services file. Remember also that you need the gds_db entry in the services file on the server, and each client machine.

Forced Writes

Q1: How do Forced Writes and Commit After Each Table work?

A: InterBase uses a strategy called "careful write" to schedule physical database writes so that the database on disk is always valid and consistent. So as long as the operating system does its job and writes disk blocks accurately and correctly and in the order requested, "nothing can go wrong" (go wrong go wrong). NT, like Unix, has a default behavior in which disk writes don't go directly to disk but hang around in a page cache until NT (or Unix) gets around to flushing them to disk. Both operating systems have a feature called "forced write" that tells the operating system "when I say write, I mean write". Unfortunately, on Unix at least, this feature incurs an unreasonably high performance penalty.

With forced write turned on, there is a close to iron-clad guarantee that if the disk survives an event, the database will be OK. If you get a power failure in the middle of writing a large disk block, the disk controller may well decide to use the last of the dwindling voltage to park the heads, and to hell with the write in operation, so the guarantee isn't perfect. And the theory does assume that InterBase actually works as intended...

With forced write turned on, you have to be very, very unlucky to lose a database. With forced write turned off, you have to have an uninterruptible power supply, a machine immune to hardware crashes, and the elusive bug-free version of NT (call 1-800-MSC-VAPOR), or be incredibly lucky.
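
Forced writes are toggled per database with gfix, for example (path and credentials are placeholders):

  gfix -user SYSDBA -password masterkey -write sync d:\data\mydb.gdb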

If you want performance on a limited luck budget, try shadowing, which guarantees that each disk block is written to at least two different disks.
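
Creating a shadow is a single statement; the path below is just an example and should, of course, point at a different physical disk:

  CREATE SHADOW 1 'e:\shadow\mydb.shd';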

Once upon a time InterBase supported journalling, which wrote after-images to a sequential journal. After a fatal crash, the database could be rebuilt from the journal. The performance impact of the journal was less than disk shadowing, but since any device eventually fills up, you were more or less obliged to pay somebody to watch a tape spin. Now, of course, disks cost $150 for 10 gigabytes, which can hold a lot of journal before it fills up. If I remember correctly, some bright light in engineering (no longer there) came up with a neat scheme to improve journalling performance significantly. The only down side was that it didn't work. Eventually somebody else decided that not working was a bad tradeoff for performance, and the whole journalling subsystem went down the tubes. A pity, all in all.

So, your choices are:

  • Shadow
  • Use forced write
  • Be lucky
  • Keep your resume in order

IBConsole, isql & WISQL

Q1: When I run queries in WISQL and end the session, it prompts me asking whether I want to commit. Even when I just do SELECT COUNT(*) queries it prompts me to commit.

A1: Right. Every data access happens in the context of a transaction implicitly or explicitly started by the client. WISQL starts a transaction when you enter the first DML statement of the session, and before the first DML statement after a commit or rollback. I suppose it could keep track of what you're doing and end read-only transactions without prompting you, but one way or another, you're going to have a transaction and it ought to end before you disconnect.

Q2: So, if I was the only one using the database, like Rebekkah said, and I committed one time, and then came back into WISQL, would it find new garbage to collect every time?

A2: You'd only find garbage if you had updated or deleted some rows. Selects and inserts don't leave any rubbish behind. If you look at a different set of data, you may find garbage there, but a single pass should remove everything from the rows you search. On the single user issue, I'm less sure. There are some optimizations that can be done if you know that you are the only one using a table. In the past, they weren't done because there was no efficient way to find the changes made by any particular transaction. That may have changed. In that case, even if you did update or delete things in the first transaction, the second wouldn't find anything to clean up.

Q3: If I started a WISQL session and no one else is on the database, and I run the SELECT COUNT the first time, InterBase then performs the garbage collection, making writes > 0. If I then run the same SELECT COUNT query again, would it still have more garbage to collect, making writes > 0?

A3: No. All the garbage collection should have happened on the first pass.

Q4: If they ran the queries one after the other, I would have expected the writes to decrease - go from 1 to 0, or 6 to 0 - with all the garbage collection happening on the first SELECT COUNT(*).

A4: Right.

Q5: And I guess I could imagine the writes going from 6 to 1 to 0 (which means it missed one the first time? Or a new one was introduced?)

A5: If some other transaction had updated or deleted rows in that set and the two counts were in different transactions and the second count transaction happened after the other transaction (and all its friends and relations) had committed, then the second count would find some more rubbish to remove.

Q6: But the original results say the writes went from 1 on the first query to 6 on the second query, which *seems* inconsistent with their statement (or my limited memory of their statement) that nothing else was going on. It sure seems like their statistics were transposed or something additional happened.

A6: The results were unusual and difficult to explain in the absence of other users. I guess I should spend a few minutes trying to reproduce those tests.

InterBase vs. Oracle

Q: We have two databases which are exactly the same. One is built in Local InterBase (which is shipped with Delphi) and one is built in Oracle. Both are filled with the same data. Then we call 'RecordCount' in our Delphi program. When we do this with InterBase, it takes about 4.5 minutes for the record count to finish. Oracle responds in 0.8 seconds! Why is a record count so slow in InterBase?

A: Unlike Oracle, Interbase implements multi-version concurrency. The database table contains not only those rows your transaction can see but also newer rows that you can't see yet, and older rows that were deleted before your transaction started but are needed to maintain read stability for older transactions.

OK? An InterBase table that contains 13000 rows that are active for your transaction may contain many times that in total, when all the versions are considered. Three transactions that start at different times can count the same table at the same instant and get different results. That seems wrong, but in fact it is the best way (I think) to give read-consistency and high throughput. One way to think of it is that time stops for each transaction when the transaction starts. Nothing that happens afterward is visible, except changes made by the transaction itself.

But InterBase can't just keep a counter somewhere saying "This table has 13000 rows," because at any instant, the table may have 10000 rows for one transaction, 13000 for another, and 15000 for a third. InterBase must read every row to be sure it is correctly part of the data set of the current transaction. Oracle is looking at a counter somewhere - perhaps in memory. InterBase is reading at least 13000 rows.

When InterBase was created, we understood that the design would make counts slow. However, the benefits outweighed that drawback in most cases. The multi-version concurrency system gives high consistency, high concurrency, and instantaneous rollback. It also eliminates the database maintenance involved in undo logs. But, if your application requires a count, that's a problem.

The best solution I've seen, other than rethinking the application and eliminating the need to count all the rows in a table, is this:

  • Create a table to hold the counts. It might have a column called "TableName" and a column called "Count". Start by counting all the tables you care about and store the results in the counts table. Index the table name. Do this while no one is inserting or deleting rows.
  • Each transaction that adds a row to a table stores (tablename, 1) in the counts table. Each transaction that deletes a row stores (tablename, -1). To get the count for your transaction, simply sum all the counts for that table name.
  • Every now and again - nightly, weekly, whatever - re-initialize the counts table, eliminating all the incremental changes and starting with a new, correct, total count.

Low concurrency impact, reasonable performance, and kind of a pain. Triggers help - see the sketch below.
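
A minimal sketch of that scheme, assuming a table called CUSTOMER whose rows we want to count (all names here are illustrative):

  CREATE TABLE ROW_COUNTS (
    TABLENAME  VARCHAR(31) NOT NULL,
    DELTA      INTEGER NOT NULL
  );
  CREATE INDEX IDX_ROW_COUNTS ON ROW_COUNTS (TABLENAME);

  /* seed it once, while no one is inserting or deleting rows */
  INSERT INTO ROW_COUNTS (TABLENAME, DELTA)
    SELECT 'CUSTOMER', COUNT(*) FROM CUSTOMER;

  SET TERM ^ ;
  CREATE TRIGGER TAI_CUSTOMER_COUNT FOR CUSTOMER
  AFTER INSERT AS
  BEGIN
    INSERT INTO ROW_COUNTS (TABLENAME, DELTA) VALUES ('CUSTOMER', 1);
  END ^

  CREATE TRIGGER TAD_CUSTOMER_COUNT FOR CUSTOMER
  AFTER DELETE AS
  BEGIN
    INSERT INTO ROW_COUNTS (TABLENAME, DELTA) VALUES ('CUSTOMER', -1);
  END ^
  SET TERM ; ^

  /* the count as seen by the current transaction */
  SELECT SUM(DELTA) FROM ROW_COUNTS WHERE TABLENAME = 'CUSTOMER';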

Linux & Firebird

Q: I've just downloaded Firebird for Linux. What's the server executable called? I couldn't find any file named ibserver there.

A: If you have the Classic version there is no server executable - at least nothing called ibserver. Classic works with inetd (or xinetd), which listens for requests on port 3050 (usually) and launches a process from there. If you look in /etc/inetd.conf you will see the name of the process launched.

The best way to check is:


  netstat -an | grep 3050

If something is listening, then the server is running. For Classic, the server process is gds_inet_server, but it only runs while a client is connected; the gds_lock_mgr process is also likely to be running.

I personally prefer Classic - it's less awkward to use - but it depends upon your circumstances; in other situations Superserver would probably be preferable.

Primary Key - How to Alter?

Q: I need to change the datatype of a field that is part of the Primary Key of a table. Example Table:


  CREATE TABLE MYTABLE (
    MYINDEX INTEGER NOT NULL,
    MYNUMBER SMALLINT NOT NULL,
    MYDATA VARCHAR(50),
    PRIMARY KEY (MYINDEX, MYNUMBER)
  );

I need to change MYNUMBER from a SMALLINT to a DECIMAL(3,1). Can I do this without destroying the table and rebuilding it?

A: Yes, but not with SQL. The SQL standard supplies no syntax for altering the datatype, scale, or length of columns. The only 'standard compliant' way to alter those characteristics is to delete and recreate the column. Since you cannot delete part of a primary key, the only way to change the primary key is to delete and recreate the table, if you must live within the bounds of SQL compliance.

InterBase allows changes to the datatype, scale, and length of columns, but that support is not visible through the SQL DDL interface. Some of those changes are destructive, and the database makes no effort to keep you from making incompatible changes. For example, if you changed the length of a char or varchar from 30 to 15, InterBase would take the change, but any data stored with more than 15 characters in that field would cause runtime errors. Similarly, you could decide to change a column from char(255) to integer. If any of the values contained characters other than digits (or leading or trailing spaces), or any of the numbers fell outside the range -2**31 to (2**31 - 1), then you'd get a runtime error.

However, changing a SMALLINT to an INTEGER or a VARCHAR(12) to a VARCHAR(20) will cause no trouble to the database. Your applications may not be too happy unless you extend their storage. You'll also have to check any variables you've defined in triggers and stored procedures. That said, here's the procedure:


  /* original table definition */
  create table mytable (myindex integer not null,
    mynumber smallint not null,
    mydata varchar (50),
    primary key (myindex, mynumber));

  commit;

  /* check the table through the metadata option on the menu */

  MYINDEX INTEGER Not Null
  MYNUMBER SMALLINT Not Null
  MYDATA VARCHAR(50) Nullable
  CONSTRAINT INTEG_13:
    Primary key (MYINDEX, MYNUMBER)

  /* find the system-generated field (domain) behind the column */
  select rdb$field_source
  from rdb$relation_fields
  where rdb$field_name = "MYNUMBER"
  and rdb$relation_name = "MYTABLE";

  RDB$FIELD_SOURCE
  ===============================
  RDB$68

  /* check the current scale of that field */
  select rdb$field_scale from rdb$fields where rdb$field_name = "RDB$68";

  RDB$FIELD_SCALE
  ===============
  0

  /* a SMALLINT with a scale of -1 is displayed as NUMERIC(4, 1) */
  update rdb$fields set rdb$field_scale = -1
  where rdb$field_name = "RDB$68";

  commit;

  SHOW TABLE MYTABLE;
  MYINDEX INTEGER Not Null
  MYNUMBER NUMERIC(4, 1) Not Null
  MYDATA VARCHAR(50) Nullable
  CONSTRAINT INTEG_13:
    Primary key (MYINDEX, MYNUMBER)

The value of rdb$field_scale is the power of ten by which the stored value is to be multiplied to produce the result intended. For example, US currency is normally stored with a scale of -2.

Single / Multiprocessor

Q: Is there a command line option to run IBServer on a single processor?

A: Have a look on the IBPhoenix site for the IBAffinity program - this will allow you to set the processor affinity, even when running as a service.

By the way, Firebird v1.0 Beta 3 will have support for a new IBConfig option to allow you to set the processor affinity mask.

UDFs

Q: I have problems with IB 5.5 and ib_udf.dll.

A: The ib_udf.dll relies on another DLL: ib_util.dll. InterBase needs to find this DLL so that ib_udf.dll can be loaded. Try moving ib_util.dll to the InterBase bin directory and retrying your test.
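
Once the DLLs can be found, each function still has to be declared to the database. As a reminder, a declaration from the ib_udf library looks like the sketch below - take the exact declarations (string lengths, and FREE_IT on newer versions) from the ib_udf.sql script shipped with your server:

  DECLARE EXTERNAL FUNCTION lower
    CSTRING(80)
    RETURNS CSTRING(80)
    ENTRY_POINT 'IB_UDF_lower' MODULE_NAME 'ib_udf';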
