Test Drive of Omnidex
Shawn M. Gordon
What can it do?
Omnidex provides two fundamental types of indexing for your Image database, both aimed at dramatically improving performance. The first, and original, is what the vendor, DISC, calls ASK (previously known as IMSAM). ASK keys are the equivalent of KSAM-type access and are very similar to what is provided by the relatively new Image b-trees. The biggest advantage of ASK over the native b-trees is that ASK keys can be applied to any field in a master or a detail set, not just key fields, and can be a concatenation of several fields or subsets of fields.
The other type of key structure available from DISC is the MDK (previously known as OMNIDEX) key, which is where the real power and meat exist. The MDK index is described as being an inverted b-tree. I don’t know the technical implications of that, but however they are doing it, the options are amazing. With an MDK key you can use wildcard searches, range searches, soundex pattern matching, composite keys (consisting of portions or combinations of fields), and aggregate indexes that return aggregate data such as MIN, MAX, SUM, and COUNT without having to actually read the data.
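To give a feel for what a keyword index of this kind makes possible, here is a minimal Python sketch of a toy inverted index. It is only an illustration under my own assumptions, not how Omnidex is implemented: the class name KeywordIndex, its methods, and the sample names are all hypothetical. It shows exact, wildcard, and range lookups, plus a COUNT that is answered from the index alone without touching the underlying records.

```python
import bisect
from collections import defaultdict
from fnmatch import fnmatchcase


class KeywordIndex:
    """Toy inverted index: keyword -> set of record numbers (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.sorted_keys = []  # kept sorted so range scans can use bisect

    def add(self, record_no, value):
        # Index every blank-separated word of the field, upshifted.
        for word in str(value).upper().split():
            if word not in self.postings:
                bisect.insort(self.sorted_keys, word)
            self.postings[word].add(record_no)

    def exact(self, word):
        return set(self.postings.get(word.upper(), set()))

    def wildcard(self, pattern):
        # Shell-style pattern, e.g. "GOR*", matched against every keyword.
        hits = set()
        for key in self.sorted_keys:
            if fnmatchcase(key, pattern.upper()):
                hits |= self.postings[key]
        return hits

    def key_range(self, low, high):
        # All keywords k with low <= k <= high.
        lo = bisect.bisect_left(self.sorted_keys, low.upper())
        hi = bisect.bisect_right(self.sorted_keys, high.upper())
        hits = set()
        for key in self.sorted_keys[lo:hi]:
            hits |= self.postings[key]
        return hits


# Hypothetical sample data: record number -> name field.
names = {1: "Gordon Shawn", 2: "Gorman Alice", 3: "Smith Bob", 4: "Gordon Pat"}
index = KeywordIndex()
for recno, name in names.items():
    index.add(recno, name)

gordons = index.exact("gordon")
gor_prefix = index.wildcard("GOR*")
a_to_c = index.key_range("A", "C")
gordon_count = len(gordons)  # COUNT answered from the index, no record reads
```

The point of the sketch is that every answer comes back as a set of record numbers, so counting or combining results never requires reading the data itself.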
There is a feature that is part of Image called the TPI, which stands for Third Party Indexing. This opened up the standard Image database calls with extra modes to support products such as Omnidex directly through the Image intrinsics without having to code specifically for the API of the indexing product. This has pluses and minuses, but overall it’s a good thing.
How does it work?
Some time back DISC rearchitected Omnidex for Open Systems. When they did this they moved the indexes out of Image and into a proprietary format external to the database. This had a number of advantages even for HP 3000 customers. It meant that you could add a new index and repopulate all the indexes without having exclusive access to the database. Of course, no one can be writing to the database at the same time.
You can access the indexes created by Omnidex through one of two interfaces. There is the native Omnidex API, which provides a whole host of ODX intrinsics and DBI intrinsics. Say, for example, you wanted to do a lookup on a name: you would call ODXFIND followed by a loop around ODXGET and DBGET. ODXFIND does the initial find and returns the count of qualifying records. This process is almost instantaneous and quite remarkable. You can do successive ODXFINDs against different keys and keep trimming down the qualifying record count.
Once you go into your ODXGET..DBGET loop, what happens depends on the type of set. In the case of a manual master, ODXGET returns the key of the record and you then do a DBGET mode 7. If you are working against a detail data set, ODXGET returns the relative record number and you retrieve the record with a DBGET mode 4.
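The find, narrow, and retrieve pattern described above can be sketched in Python. This is a toy in-memory stand-in for a detail set, not the real intrinsics: the class ToyIndexedSet and its methods find, get_next_id, and directed_read are hypothetical names of my own, and the actual ODXFIND, ODXGET, and DBGET parameter lists are not shown here.

```python
class ToyIndexedSet:
    """In-memory stand-in for a detail set plus keyword index (not the real API)."""

    def __init__(self, records):
        self.records = records   # record number -> dict of fields
        self.index = {}          # (field, keyword) -> set of record numbers
        for recno, rec in records.items():
            for field, value in rec.items():
                self.index.setdefault((field, str(value).upper()), set()).add(recno)
        self.qualified = None    # current qualifying list, None until first find
        self.cursor = iter(())

    def find(self, field, keyword):
        """Like the ODXFIND step: narrow the qualifying list and return its count."""
        hits = self.index.get((field, keyword.upper()), set())
        if self.qualified is None:
            self.qualified = set(hits)
        else:
            self.qualified &= hits
        self.cursor = iter(sorted(self.qualified))
        return len(self.qualified)

    def get_next_id(self):
        """Like the ODXGET step: hand back the next record number, or None."""
        return next(self.cursor, None)

    def directed_read(self, recno):
        """Like DBGET mode 4: fetch a record by its record number."""
        return self.records[recno]


# Hypothetical sample data.
customers = ToyIndexedSet({
    1: {"name": "GORDON", "state": "CA"},
    2: {"name": "GORDON", "state": "CO"},
    3: {"name": "SMITH", "state": "CA"},
})

first_count = customers.find("name", "gordon")   # initial find
second_count = customers.find("state", "ca")     # successive find trims the list

rows = []
while (recno := customers.get_next_id()) is not None:
    rows.append(customers.directed_read(recno))
```

Notice that both counts are available before a single record is read; only the final loop touches the data, and only for the records that survived both finds.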
If you are using the TPI, you use a different mode on DBFIND to get record counts and then do essentially a chained read with DBGET. This eliminates one of the calls you have to make and is easier for people to follow; however, it does leave a little of the functionality behind. There are some cute, tricky things you can do with the native API, like returning the record IDs into a file and using that for other processes within your job or tree, that you can’t get to through the TPI.
DISC claims Omnidex can build its indexes at rates over 100 million keywords per hour, with a compression technology that uses just a fraction of the data indexed. Since I didn’t actually have 100 million keywords to work on, all I can say is that it went really fast on the data I did have, typically running in a couple of minutes against a base with a half million records and about 75 keywords specified.
What I did find a little odd is when you indexed an empty database it could take 10 minutes at times depending on the capacities of the sets. It’s almost like they don’t check the number of records in the table before they start working and simply do a serial read of the entire base. I realize that you wouldn’t normally try to index an empty database, but I like to try non-standard things.
There has been work between Taurus and DISC for creating highly optimized Data Marts as well. Omnidex is perfect for indexing large data warehouse type databases, especially with the aggregate indexes where you just want to get certain types of statistical information. There are some very good white papers at both the DISC and Taurus sites on data warehousing and indexing.
The performance of Omnidex makes it ideal for indexing large amounts of data within a data warehouse, and the fit with Taurus is very good for populating the database with the data.
The idea with Omnidex is that once it’s in place, it should be pretty transparent, other than the performance improvement. By using the TPI you can keep the programming changes to a minimum as well. There were some things I did that didn’t require changes at all, so that was pretty nice. One thing you do need to do with certain third-party tools like Warehouse or Suprtool is add the Omnidex XL to the run statement to make sure they go through the TPI.
Installation and Documentation
Installation is smooth and includes a nice graphical front end that runs on a standard terminal. The documentation, however, leaves a bit to be desired. I remember when I first used Omnidex about 11 years ago, the manuals were well organized and clearly written. I was able to figure out the product pretty thoroughly just from the manuals. Since they have gone multi-platform, however, the manuals are just poorly indexed reference manuals and are not very clear. I like to think I’ve gotten much better at this stuff in the last 11 years, and I had more trouble now than I did 11 years ago. DISC really needs to make a heavy investment in cleaning up the manuals and making sure they cover the appropriate platforms clearly. The language on the TPI is especially vague.
Ever since the addition of the GUI-based OmniUtil years ago, adding indexes has become very easy. The entire operation is point and click, with drill-down menus. If you don’t know the name of a set or data item, the program will provide a pick list to choose from; see figure 1 for an example of a data set pick list.
I set up a variety of indexes in my test database. I did composite keys, composite with soundex, ASK keys, and regular MDK keys. This is all very straightforward, especially if you know what you want to do. When creating a composite key you simply enter a data item name that doesn’t exist and OmniUtil will start prompting you for fields and byte positions. OmniUtil will create installation and indexing jobs for you when you are done. You are going to want to keep these around for those times when you need to reindex, which is probably good to do once in a while as standard maintenance, just like reloading your database.
Omnidex stores its indexes external to Image now, so it’s possible to create and apply indexes without exclusive access to the database. As long as people are only reading data, you are pretty safe.
I modified my database to have all manual masters and no details and let Omnidex assign the keys. This is probably the most efficient way to do it, but you can also take advantage of the master..detail relationship and create record complexes that will return data from both sets within single queries.
We had a couple of false starts using the TPI, since all my experience with Omnidex had been with using the native API directly. We wanted to use the TPI to be consistent with our client/server development, which was using ODBC/32 and MiddleMan from Minisoft. Neither of these products will let you go directly against the Omnidex API. This wasn’t a problem with the TPI, attesting to the usefulness of the feature.
I was very pleased with the results from the reports and screens we created. The only downside was that large data loads can be VERY slow because it is adding the indexes as you go. Sometimes it’s best to disable the direct indexing and then just go back and reindex the whole base which will operate much faster, especially for loads of hundreds of thousands of records.
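The trade-off between indexing as you load and reindexing afterward can be illustrated with a small Python sketch. It is only an analogy under my own assumptions, with a sorted list standing in for the index, and the function names load_with_live_index and load_then_reindex are hypothetical. Both routes end with the same index; the difference is whether the index is updated once per record or built in a single pass after the load.

```python
import bisect


def load_with_live_index(rows):
    """Maintain the sorted key index on every insert."""
    data, index = [], []
    for key, payload in rows:
        data.append((key, payload))
        # One index update per record loaded.
        bisect.insort(index, (key, len(data) - 1))
    return data, index


def load_then_reindex(rows):
    """Load with indexing switched off, then build the index in one pass."""
    data = list(rows)
    index = sorted((key, recno) for recno, (key, _) in enumerate(data))
    return data, index


# Hypothetical sample load: (key, payload) pairs.
rows = [("SMITH", "a"), ("GORDON", "b"), ("ADAMS", "c"), ("GORDON", "d")]
live_data, live_index = load_with_live_index(rows)
bulk_data, bulk_index = load_then_reindex(rows)
```

The sketch makes no claim about actual timings; it only shows that deferring the index build does not change the result, which is why reindexing after a large load is a safe option.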
Omnidex is a terrific product. Once you have sat down and added some indexes and written a search screen or two and shown them to users, you will be a convert. Users just drop their jaws in amazement at the performance that they are seeing. It opens up a huge world of possibilities for data modeling and retrieval that you didn’t have before. The move to Open Systems for Omnidex really helped build out the technology even more; I just hope they don’t forget about their roots.
You can take the best advantage of Omnidex on a system that you are designing from scratch. That said, there are still tons of possibilities on existing systems. Since you can bolt it on without having to change your existing code, except where you want/need to, it’s easy to put in.
Omnidex isn’t cheap, and it’s hard to explain and justify to management. The best thing is to get a demo and solve a couple of current big issues for the company with it and prove what you can do. This is a powerful tool, and can be the difference between success and failure if time is of the essence.
Omnidex version 3.5
Dynamic Information Systems
5733 Central Avenue
Boulder, Colorado 80301
(303) 444-7460 FAX
Omnidex includes all the software required for host based indexing and processing. Optionally you can make use of technologies for ODBC and Internet access.
Omnidex for the HP 3000 runs on all HP 3000 Series 900s. The software is tier based, ranging from $40,000 to $110,000. Discounts are available for government or multiple CPUs. Support ranges from $3,800 to $12,000 per year; the first year is included in the purchase price and includes phone-in support, electronic support, and new releases of the software. DISC recently announced OMNIDEX for HP3000-based Data Warehouses for $60,000 to $140,000. All prices are in US dollars.
Omnidex also supports the following platforms and formats: HP9000/HP-UX, Intel/NT, Sun/Solaris, IBM/AIX, Sequent/Dynix, Digital/Unix, and Digital/VMS. OMNIDEX enhances data access into Oracle, SQL Server, Informix, Sybase, RMS, C-ISAM, as well as flat files.