Intro
This article attempts to answer the most common questions from those new to the InterBase and Firebird database servers. Please send suggestions, corrections and complaints about this FAQ to the maintainer, Rado Benc bencrado@angelfire.com.
Credits
List of contributors in no particular order: Helen Borrie, Diane Brown, Ann Harrison, Brett Bandy, Paul Beach, Bill Karwin, Kevin Lawrence, Sean Leyne, Mark O'Donohue, Ivan Prenosil, Paul Reeves, Jim Starkey, Claudio Valderrama, Chris Vallis, David Zverkic
Arrays
Q: How are InterBase arrays stored internally?

A: Arrays are layered on the internal blob mechanism. The data is preceded by some binary gook that describes the structure of the array: dimensions, bounds, data type, etc.
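For context, this is what an array column looks like at the SQL level; the table name and bounds below are only an illustration:

    /* a 1-to-12 array of floats, stored internally as a blob */
    CREATE TABLE rainfall (
        station_id  INTEGER NOT NULL,
        monthly_mm  FLOAT [1:12]
    );

The [1:12] part is the bounds information the answer refers to; it ends up in the descriptor that precedes the blob data.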
Backup / Restore & Garbage Collection
Q1: I have a 530 MByte database that I backed up to do garbage collection. This resulted in a 160 MByte file. This operation took about an hour, which is not acceptable.
A1: I suggest doing the backup with "disable garbage collection". If you're just going to replace the database with the restored version, it's a waste of time to garbage collect the original. Disabling garbage collection will make the backup run faster.

Q2: Does the restore build indexes while restoring?

A2: Yes, unless you check "deactivate indexes" when restoring.

Q3: Is this why it takes so long?

A3: Most probably.

Q4: If I "deactivate indexes" while restoring, will I have to manually activate all indexes after the restore?

A4: Yes. Here's a tip: create a SQL script with all the ALTER INDEX statements necessary to activate your indexes, and keep that around. Use it like a batch file with isql, as sketched below.
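A minimal sketch of such a script and how it might be run (the index names, database file name, and password here are only placeholders):

    /* activate_indexes.sql */
    ALTER INDEX CUST_NAME_IDX ACTIVE;
    ALTER INDEX ORDER_DATE_IDX ACTIVE;
    COMMIT;

    isql -user SYSDBA -password masterkey mydb.gdb -i activate_indexes.sql

Re-run the same script any time you restore with indexes deactivated.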
It's possible for your situation that this is worthwhile, because you can get the database up and restored more quickly, then activate indexes afterwards. The data is still accessible even if the indexes are inactive, it'll just be slower to query the tables. Another suggestion: set the
Backup / Restore Large Databases
Q: I had a database greater than 2 Gbytes, so I decided to split the db file into multiple files. I did the database backup procedure and checked the backup file size. It is 1.06 Gbytes, a lot less than the original database file (2 Gbytes). Do you think that if I restore the db into a single file the size will be less than 2 Gbytes? Or do you think it is still necessary to restore the db into multiple files?

A1: Irrelevant. Your data is bound to grow past 2 gb anyway, and you should have defined at least 1 extra secondary file from the beginning anyway. I'd define 5 files, give them page counts large enough to make them just under 2 gb, and that way you won't need to worry about it for a few months (or hopefully even a few years) at least. You can use syntax like:
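The command would look roughly like this (the database and backup file names are placeholders, and the page counts are illustrative; 250,000 pages of 8 KB is just under 2 GB per file):

    gbak -c -p 8192 -buffers 75 -user %1 -password %2 mydb.gbk
         mydb1.gdb 250000 mydb2.gdb 250000 mydb3.gdb 250000
         mydb4.gdb 250000 mydb5.gdb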
This would restore your database into 5 files, each to grow to a maximum size just under 2 gb (but initially all about 16kb), with 8kb page sizes and 75 buffers. %1 is your login name (probably SYSDBA) and %2 is your password. I chose 75 buffers for restore because my experience is that this gives the best performance on our hardware/OS/file system. After we restore we use gfix to change the buffer size to 1000 pages. InterBase has a very problematic bug in that you cannot add secondary files to a database without corrupting it (except perhaps for small databases, or perhaps for your first secondary file only). The only safe way to increase the number of database files is to use gbak to back up and restore. This is a problem because it requires you to take your database offline, or at least suddenly make your database appear to the world to be hours or days out of date. Better to create lots of extra files so that you won't have to do this often, and perhaps ISC will fix this bug by the next time you need to do it.

A2: GBAK files do not contain all the entries for an index data structure. Indexes in a GDB are stored as a kind of B-tree, which takes up space proportional to the size of the table it indexes. GBAK records the definition of the index, but not the B-tree. When you restore, it rebuilds the tree based on the values in the table. So if you have a lot of indexes, it would follow that the gbak output file would be smaller than the GDB file. When you restore, it will reallocate that space to rebuild the indexes. I always recommend defining a secondary file if you think there is a chance that the database will grow. The secondary file is very small until it is needed - unlike some database systems that require you to pre-allocate the entire 2GB even for an empty file...
Bitwise Storage
Q1: Can anyone explain to me what bitwise storage means?

A1: It's a method that works very badly in most relational databases, including InterBase, but which is common among older programmers who suffered from machines with almost no memory. Suppose you have 200 items that a customer might buy. You number them from one to 200 and include an array of 25 unsigned bytes in the customer record. For each item the customer orders, you set the bit in the array that corresponds to the item bought. It's very dense and allows you to look for combinations very effectively - as long as you're prepared to read every customer record. Ordinarily, I'd use a two dimensional array with customer id as one dimension and product id as the other. It's a nice (sort of) in-memory structure as long as your identifier space is dense, but it translates very, very badly to relational databases. One problem, perhaps the major one, is that the database wants to understand the data so it can convert it. When you construct one of these arrays out of a character string, you've lied to the database about the contents of the column. Databases don't like being lied to and have wondrous and strange ways of getting their revenge. The other problem is that (at least in InterBase) there's no way to create an index based on which bits are set - if you've set bit eight, it wants that to be 256, not item eight. Anyway, that's what I think we've been talking about.

Q2: I do use a traditional relational design to store sale transaction details (potentially 200,000 new records a day). The bitwise storage solution is for rolled-up tables.

A2: There's an InterBase subtype of text that treats a character array as an array of unsigned bytes. I would worry about passing that through middle-ware layers, and I would worry that future versions of the middle-ware might decide to do more favors for me. I'd also worry about overlaying a bit stream on a different datatype (e.g. date) because at some point InterBase might choose to change the internal representation. You'd also run a risk (I think) of confusion if you ever switched between big-endian and little-endian representation. Then too, there's the problem that no bitwise representation can be indexed effectively. If you ever want to find all the customers who bought a product, you'll be reading all the customer rows. How many different products is each customer likely to order? Is there some other data structure that will do? If not, I'd use longwords and hope nobody ever switches architectures.
Boolean Domain - How to?
Q: What is the best way to create a domain called Boolean?

A: Use a SMALLINT domain restricted to the values 0 and 1, for example:
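A minimal sketch, assuming 0 means false and 1 means true (the domain and table names are arbitrary):

    CREATE DOMAIN D_BOOLEAN AS SMALLINT
        DEFAULT 0 NOT NULL
        CHECK (VALUE IN (0, 1));

    /* then use it like any other type */
    CREATE TABLE customer (
        cust_id   INTEGER NOT NULL PRIMARY KEY,
        is_active D_BOOLEAN
    );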
The values can then be cast as a boolean in Delphi. This also has the advantage of being wholly international - not every language matches T(RUE)/F(ALSE) or Y(ES)/N(O).
Connection Pooling
Q1: Can anyone point me to anything about this "connection pool"? I have only a vague idea of what it could be.

A1: Connection pooling is a programming technique that can be used with any database. Essentially, you have a controlling application to which users log on rather than logging on to the database itself. When a user requests that the application perform a task that requires database access, the application uses one of the connections it has already established to perform the query. The application maintains a pool of connections which it uses to satisfy the needs of its users. Since for most databases, including InterBase, making a connection is a relatively expensive operation, reusing connections improves performance. Five connections can serve the requirements of many more concurrent users of the application because users hold a connection only for the time it takes to execute their queries. Using a connection pool also (in my opinion) improves security. The application determines who can perform which operations. Thus a random user could, for example, find out how many people have registered for a conference. Qualified users could register themselves and change their own registrations. Highly qualified users could change the registrations for others in their company. SQL privileges don't map well onto such operationally based security constraints. If the users log into the database directly, they will have the user name and password they need to exploit the data. Since select access is required to get a count of attendees, anyone (who can write a simple program) can get a full list of attendees. Anyone who makes a plausible attempt at registration can probably delete all registrations. If the application controls access, each group of users can do exactly what the application allows and nothing more. The database user names and passwords are never disclosed.
Cost Optimizer
Q: Is there some kind of cost optimizer in IB Vx.x and, if so, what strategy does it use?

A: All versions of InterBase since V1.0 - since T0.1, actually - have used a cost based optimizer to select join order. The costs considered are the table cardinality(1) and the index selectivity(2). The intention is to minimize the size of the intermediate products(3). Starting in V2.0 - well, maybe a little later - we added decision tree pruning(4), which reduced the cost of optimizing a 14-way join from 4 hours to 10 seconds, without significantly reducing the performance of the optimized query. Basically, the strategy is to determine all the conjuncts between tables in the join, including those derived by distributing equalities. The optimizer then examines paths through the conjuncts that connect all tables and estimates the size of the intermediate product at each step. Some weight is given to the nature of the conjunct - equality through a unique index is best; inequality (<>) is so bad it is never used. Where there are no indexed conjuncts between tables, it interpolates either a sort/merge join or a natural scan of one table. There's a cost estimate for each of those operations. That, obviously, is the strategy without the actual weighting factors for the different operations. The optimizer also chooses access strategies for subqueries and applies secondary indexes to queries. InterBase's bit-mapped intermediate index structures allow it to use multiple indexes on access to a single table. The V5 optimizer has been enhanced to recognize redundant(5) and unnecessary(6) indexes. In theory, the order of tables in an inner join statement should have no effect on the output of the optimizer. Prior to V5, that was not the case with joins expressed in SQL92 syntax (FROM ... JOIN ... ON ...). Probably more than you wanted to know. But in case not:
If that isn't more than you wanted to know, your boss isn't working you hard enough.
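If you want to see what the optimizer actually decided for a given query, isql can display the chosen plan. A rough illustration (the table and index names, and the resulting plan line, are hypothetical):

    SET PLAN ON;

    SELECT c.name, o.total
    FROM customers c JOIN orders o ON o.cust_id = c.cust_id
    WHERE c.country = 'NL';

    PLAN JOIN (C NATURAL, O INDEX (FK_ORDERS_CUST))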
Cross-Database Updates
Q: Is there a way to use a trigger to update a different database hosted on the same server?

A1: Triggers have no direct access to other databases.
A2: No, not directly. Triggers and stored procedures can access only the context in which they are defined - the current database. However, you could post an event from a trigger which notifies applications attached to other databases of significant changes and allows them to update those databases appropriately. Events can be sensitive to environment - check the archive for the required OS and patch levels.
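A rough sketch of the trigger half of that idea (the table, trigger, and event names are made up; a helper application connected to the other database would register for the event and apply the change itself):

    SET TERM !! ;
    CREATE TRIGGER customer_changed FOR customer
    AFTER UPDATE AS
    BEGIN
        POST_EVENT 'customer_changed';
    END !!
    SET TERM ; !!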
Database Design: Standardization
Q1: Are there any suggested methods for database design / standardization?

A1: A convention I've found works well goes something like this:
These are just suggestions, of course. I'd recommend that you sit down alone for a couple of hours and draft a Naming Conventions document for circulation to all of the developers. If you have the authority, you should say "This is our new naming scheme. Please submit suggested modifications for discussion by dd/mm/yyyy". Discuss the suggested changes but make it clear that you are not discussing a totally different approach to the conventions. I promise you that protracted debate about one system of naming over another will gain nothing and lose a lot of valuable developer time. Modify your draft if everyone agrees the changes are reasonable and THEN implement it, announcing that non-compliant names will be rejected in all peer audits from dd/mm/yyyy.

A2: Some suggestions about column and index names...
ACCOUNTNAME,

You might like to consider designing a small set of abbreviations for use in naming columns, e.g. "CUST" for Customer, "ACCT" for Account. If you do this, you would need to be strict about compliance, not allowing some column names to use abbreviations whilst others use the fully extended word.

A3: One addition I might make to this is in the instance of multiple foreign key references to the same field within a given table: an example of this might be an

Also, I find table names without underscores to be extremely difficult to read, so in the absence of lower case, I'll use the underscores every time. Standardization is a CRITICAL issue, especially when:
Is anything more annoying than random occurrences of personal abbreviations [DATE, DT]? Probably, but not many, and it's so unnecessary. I agree that the name must be meaningful, although the "incrementing value" approach can add some value. On one very (very, very) large project (I forget the number of tables in the data model - multiple hundreds, anyway; a 150-person team), a naming standard adopted was to preface the meaningful table name with "TXXXX_", where XXXX was an incrementing number (0001, 0002, ...) issued by the data management group. Foreign key column names always used the TXXXX_ prefix in front of the meaningful name. This was extremely helpful in finding where the heck to get data from. Because we use table names in so many other places, and metadata object names are limited to 31 characters, I like to limit table names to about 26 characters. This lets us name a before delete trigger "TBD_tablename" without problems. We prefix all metadata objects except tables and columns with object-type identifiers.
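As one concrete illustration of that trigger-naming rule (the table name and trigger body here are invented):

    /* a before-delete trigger on CUSTOMER gets the TBD_ prefix */
    SET TERM !! ;
    CREATE TRIGGER TBD_CUSTOMER FOR CUSTOMER
    BEFORE DELETE AS
    BEGIN
        EXIT;  /* housekeeping would go here */
    END !!
    SET TERM ; !!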
Once the basic standards are in place, you may wish to consider trying to standardize other common database design items, such as:
This allows our customers to:
I urge you to ignore the comments you received devaluing the effort to standardize: members of development teams want and need (and even
Error: Undefined service gds_db/tcp
Q: What's wrong?

A1: Based on the message, I'd take a look at the services file on the server ( %systemroot%\system32\drivers\etc\services ) and check to be certain that there is still an entry for InterBase:
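On a default installation the entry looks like this (3050 is the standard gds_db port):

    gds_db          3050/tcp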
If not, add this line, and be certain that there are some lines following it, even if they just contain

This is done when you install InterBase, but you can also do it with any text editor. Remember to add a blank line at the end of the file, because Windows has a bug such that a cr/lf is required after the last line of the services file. Remember also that you need the
Forced Writes
Q1: How do Forced Writes and Commit After Each Table work?

A: InterBase uses a strategy called "careful write" to schedule physical database writes so that the database on disk is always valid and consistent. So as long as the operating system does its job and writes disk blocks accurately and correctly and in the order requested, "nothing can go wrong" (go wrong go wrong). NT, like Unix, has a default behavior where disk writes don't go directly to disk but hang around a page cache until NT (or Unix) gets around to flushing them to disk. Both operating systems have a feature called "forced write" that tells the operating system "when I say write, I mean write". Unfortunately, on Unix at least, this feature incurs an unreasonably high performance penalty. With forced write turned on, there is a close to iron-clad guarantee that if the disk survives an event, the database will be OK. If you get a power failure in the process of writing a large disk block, the disk controller may well decide to use the last of the dwindling voltage to park the heads, and to hell with the write in progress, so the guarantee isn't perfect. And the theory does assume that InterBase actually works as intended... With forced write turned on, you have to be very, very unlucky to lose a database. With forced write turned off, you have to have an uninterruptible power supply, a machine immune to hardware crashes, and the elusive bug-free version of NT (call

If you want performance on a limited luck budget, try shadowing, which guarantees that each disk block is written to at least two different disks. Once upon a time InterBase supported journalling, which wrote after-images to a sequential journal. After a fatal crash, the database could be rebuilt from the journal. The performance impact of the journal was less than disk shadowing, but since any device eventually fills up, you were more or less obliged to pay somebody to watch a tape spin. Now, of course, disks cost $150 for 10 gigabytes, which can hold a lot of journal before it fills up. If I remember correctly, some bright light in engineering (no longer there) came up with a neat scheme to improve journalling performance significantly. The only downside was that it didn't work. Eventually somebody else decided that not working was a bad tradeoff for performance, and the whole journalling subsystem went down the tubes. A pity, all in all.

So, your choices are: run with forced writes on and accept the performance cost; run with forced writes off and rely on a UPS, solid hardware, and a well-behaved operating system; or add a shadow for extra protection.
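Forced writes are a per-database setting. From the command line, gfix can flip it (the database name and credentials below are placeholders):

    gfix -write sync  -user SYSDBA -password masterkey mydb.gdb
    gfix -write async -user SYSDBA -password masterkey mydb.gdb

The first form turns forced writes on (synchronous writes); the second turns them off.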
IBConsole, isql & WISQL
Q1: When I run queries in WISQL, it prompts me to ask if I want to commit when I end the session, even when I just do a SELECT.

A1: Right. Every data access happens in the context of a transaction implicitly or explicitly started by the client. WISQL starts a transaction when you enter the first DML statement of the session and before the first DML statement after a commit or rollback. I suppose it could keep track of what you're doing and end read-only transactions without prompting you, but one way or another, you're going to have a transaction and it ought to end before you disconnect.

Q2: So, if I was the only one using the database like Rebekkah said, and I committed one time, and then came back into WISQL, would it find new garbage to collect every time?

A2: You'd only find garbage if you had updated or deleted some rows. Selects and inserts don't leave any rubbish behind. If you look at a different set of data, you may find garbage there, but a single pass should remove everything from the rows you search. On the single user issue, I'm less sure. There are some optimizations that can be done if you know that you are the only one using a table. In the past, they weren't done because there was no efficient way to find the changes made by any particular transaction. That may have changed. In that case, even if you did update or delete things in the first transaction, the second wouldn't find anything to clean up.

Q3: If I started a WISQL session, and no one else is on the database, and I run the select count the first time, InterBase then performs the garbage collection, making writes > 0. If I then run the same select count query again, would it still have more garbage to collect, making writes > 0?

A3: No. All the garbage collection should have happened on the first pass.

Q4: If they ran the queries one after the other, I would have expected the writes to decrease, going from 1 to 0, or 6 to 0, with all the garbage collection happening on the first pass.

A4: Right.

Q5: And I guess I could imagine the writes going from 6 to 1 to 0 (which means it missed one the first time? or a new one was introduced?).

A5: If some other transaction had updated or deleted rows in that set, and the two counts were in different transactions, and the second count transaction happened after the other transaction (and all its friends and relations) had committed, then the second count would find some more rubbish to remove.

Q6: But the original results say the writes went from 1 on the first query to 6 on the 2nd query, which *seems* inconsistent with their statement (or my limited memory of their statement) that nothing else was going on. It sure seems like their statistics were transposed or something additional happened.

A6: The results were unusual and difficult to explain in the absence of other users. I guess I should spend a few minutes trying to reproduce those tests.
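If you want to watch this behaviour yourself, isql can print per-statement statistics, including page writes, which is where garbage collection shows up. A rough illustration (the table name is made up, and the exact figures will vary):

    SET STATS ON;
    SELECT COUNT(*) FROM sales;
    /* the statistics printed after the query include reads, writes and fetches;
       a second run should normally show writes back at or near zero */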
InterBase vs. Oracle
Q: We have two databases which are exactly the same. One is built in Local InterBase (which is shipped with Delphi) and one is built in Oracle. Both are filled with the same data. After that we do a 'recordcount' in our Delphi program. Against InterBase, it takes about 4.5 minutes for the recordcount to finish. Oracle responds in 0.8 seconds! Why is a recordcount so slow in InterBase?

A: Unlike Oracle, InterBase implements multi-version concurrency. The database table contains not only those rows your transaction can see but also newer rows that you can't see yet, and older rows that were deleted before your transaction started but are needed to maintain read stability for older transactions. OK? An InterBase table that contains 13000 rows that are active for your transaction may contain many times that in total, when all the versions are considered. Three transactions that start at different times can count the same table at the same instant and get different results. That seems wrong, but in fact it is the best way (I think) to give read consistency and high throughput. One way to think of it is that time stops for each transaction when the transaction starts. Nothing that happens afterward is visible, except changes made by the transaction itself. But InterBase can't just keep a counter somewhere saying "This table has 13000 rows," because at any instant the table may have 10000 rows for one transaction, 13000 for another, and 15000 for a third. InterBase must read every row to be sure it is correctly part of the data set of the current transaction. Oracle is looking at a counter somewhere - perhaps in memory. InterBase is reading at least 13000 rows. When InterBase was created, we understood that the design would make counts slow. However, the benefits outweighed that drawback in most cases. The multi-version concurrency system gives high consistency, high concurrency, and instantaneous rollback. It also eliminates the database maintenance involved in undo logs. But if your application requires a count, that's a problem. The best solution I've seen, other than rethinking the application and eliminating the need to count all the rows in a table, is this:
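One common variant of that solution, sketched here on the assumption that inserts and deletes are tracked through triggers (all names are illustrative): keep a small table of count deltas, have triggers insert +1 and -1 rows, and sum it when you need the total, consolidating it from time to time.

    CREATE TABLE sales_count (delta INTEGER NOT NULL);

    SET TERM !! ;
    CREATE TRIGGER sales_count_ins FOR sales
    AFTER INSERT AS
    BEGIN
        INSERT INTO sales_count (delta) VALUES (1);
    END !!
    CREATE TRIGGER sales_count_del FOR sales
    AFTER DELETE AS
    BEGIN
        INSERT INTO sales_count (delta) VALUES (-1);
    END !!
    SET TERM ; !!

    /* the current count, without touching the big table */
    SELECT SUM(delta) FROM sales_count;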
Low concurrency impact, reasonable performance, and kind of a pain. Triggers help.
Linux & Firebird
Q: I've just downloaded Firebird for Linux. What's the server executable called? I couldn't find any file named ibserver there.
A: If you have the classic version there is no server executable - at least nothing called ibserver. Classic works with inetd (or xinetd), which listens for requests on port 3050 (usually) and launches a process from there. If you look in /etc/inetd.conf you will see the name of the process launched. The best way to check is:
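Assuming the standard port, something along these lines will tell you whether anything is listening (the exact netstat flags vary a little between distributions):

    netstat -an | grep 3050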
If something is listening, then the server is running. For classic the server process is gds_inet_server, launched by inetd for each connection. I personally prefer classic - it's less awkward to use - but it depends upon your circumstances; in other situations super would probably be preferable.
Primary Key - How to Alter?
Q: I need to change the datatype of a field that is part of the Primary Key of a table. Example Table:
I need to change MYNUMBER from SMALLINT to NUMERIC(4, 1). Can this be done?

A: Yes, but not with SQL. The SQL standard supplies no syntax for altering the datatype, scale, or length of columns. The only 'standard compliant' way to alter those characteristics is to delete and recreate the field. Since you cannot delete part of a primary key, the only way to change the primary key is to delete the table, if you must live within the bounds of SQL compliance. InterBase allows changes to the datatype, scale, and length of columns, but that support is not visible through the SQL DDL interface. Some of those changes are destructive, and the database makes no effort to keep you from making incompatible changes. For example, if you changed the length of a char or varchar from 30 to 15, InterBase would take the change, but any data stored with more than 15 characters in that field would cause runtime errors. Similarly, you could decide to change a column from char(255) to integer. If any of the columns contained values other than digits (or leading or trailing spaces), or any of the digits represented numbers outside the range -2**31 to (2**31 - 1), then you'd get a runtime error. However, changing a smallint to an integer or a varchar(12) to a varchar(20) will cause no trouble to the database. Your applications may not be too happy unless you extend their storage. You'll also have to check any variables you've defined in triggers and stored procedures. That said, here's the procedure:
    create table mytable (myindex integer not null,
        mynumber smallint not null,
        mydata varchar (50),
        primary key (myindex, mynumber));
    commit;

    <check through the metadata option on the menu>

    MYINDEX                         INTEGER Not Null
    MYNUMBER                        SMALLINT Not Null
    MYDATA                          VARCHAR(50) Nullable
    CONSTRAINT INTEG_13:
        Primary key (MYINDEX, MYNUMBER)

    select rdb$field_source from rdb$relation_fields
        where rdb$field_name = "MYNUMBER" and rdb$relation_name = "MYTABLE";

    RDB$FIELD_SOURCE
    ===============================
    RDB$68

    select rdb$field_scale from rdb$fields where rdb$field_name = "RDB$68";

    RDB$FIELD_SCALE
    ===============
    0

    update rdb$fields set rdb$field_scale = -1 where rdb$field_name = "RDB$68";
    commit;

    SHOW TABLE MYTABLE

    MYINDEX                         INTEGER Not Null
    MYNUMBER                        NUMERIC(4, 1) Not Null
    MYDATA                          VARCHAR(50) Nullable
    CONSTRAINT INTEG_13:
        Primary key (MYINDEX, MYNUMBER)

The value of rdb$field_scale is the power of ten by which the stored value is to be multiplied to produce the result intended. For example, US currency is normally stored with a scale of -2.
Single / Multiprocessor
Q: Is there a command line option to run the IBServer under a single processor?

A: Have a look on the IBPhoenix site for the IBAffinity program - this will allow you to set the processor affinity, even when running as a service. By the way, Firebird v1.0 Beta 3 will have support for a new IBConfig option to allow you to set the processor affinity mask.
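If memory serves, the option that appeared in the Firebird configuration file (ibconfig) is called CPU_AFFINITY and takes a bit mask, so treat the name and value below as an assumption to verify against your release notes:

    # bind the server to the first CPU only (bit mask: CPU 0 = 1, CPU 1 = 2, both = 3)
    CPU_AFFINITY 1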
UDFs
Q: I have problems with IB 5.5 and

A: The
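For reference, UDFs are hooked up with DECLARE EXTERNAL FUNCTION; a declaration against the ib_udf library that ships with the server looks something like this (check the entry point names in the ib_udf.sql script that comes with your version):

    DECLARE EXTERNAL FUNCTION abs
        DOUBLE PRECISION
        RETURNS DOUBLE PRECISION BY VALUE
        ENTRY_POINT 'IB_UDF_abs' MODULE_NAME 'ib_udf';

    /* then */
    SELECT abs(-3.5) FROM rdb$database;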