Test Drive of DataBridge (formerly Warehouse/Forklift)
Shawn M. Gordon
What can it do?
First off, I want to say that the product known for years as Warehouse has slowly been changing its name to DataBridge, which refers to the Warehouse and Forklift combination. Forklift is included with Warehouse, so there is no cost associated with the change.
DataBridge’s purpose in life is to allow you to easily move data around and reformat it. This can range from the very simple to the very complex. It uses an easy-to-learn scripting language to facilitate this process. With the addition of the Forklift piece, you get a client/server front end that will allow you to graphically build your maps and then generate scripts. See Figure 1 for a Forklift mapping process.
By either writing scripts or using Forklift to generate them, you can easily (or not so easily) generate code that will perform data movement and transformation. There are plenty of built-in functions to do things like type conversion, type checking and data parsing.
How does it work?
You can approach DataBridge in one of two fashions. You can make use of the Forklift GUI client/server interface to visually build your script; see Figure 2 for an example of the main screen. This method is really nice if you are doing something like moving from one database to another, because Forklift will pull the record layout for tables and you can just draw lines between the fields.
Forklift will generate a Warehouse script that you can either execute on the PC, where it makes remote connections (which is slow), or upload to the HP and execute locally (much faster). You can also write the scripts yourself. The scripting language is interpreted, but Warehouse makes a prepass over the script first to verify the syntax, so it seems to run faster than something that interprets every line as it goes. The language is kind of a hybrid of features from C, Pascal and Basic. Don’t let that put you off though; it’s pretty straightforward. Take a look at the script generated from our mapping in Figure 1:
open FYIDB IMAGE fyidb PASS=READ MODE=5
open TEMP FIXED TEMP mode=w
format TEMP_FMT
  MAIL-NAME : IMAGE X12
  TERMINAL-NO : IMAGE I1
  PRINTER : IMAGE X8
  PHONE-EXT : IMAGE I1
  DEPT : IMAGE X26
  NODE : IMAGE X8
  FLAGS : IMAGE X16
  USER-PASS : IMAGE I1
  USER-NAME : IMAGE X30
  TIME-ON : IMAGE I2
  DATE-ON : IMAGE I1
  FLAGS2 : IMAGE X80
end
define TEMP_REC : format TEMP_FMT
read USER-M_FLOW_R = FYIDB.USER-M for MAIL-NAME = "SMG"
  setvar TEMP_REC.MAIL-NAME = MAIL-NAME
  setvar TEMP_REC.TERMINAL-NO = USER-PASS
  setvar TEMP_REC.PHONE-EXT = TERMINAL-NO
  setvar TEMP_REC.NODE = NODE
  setvar TEMP_REC.USER-NAME = PRINTER
  setvar TEMP_REC.TIME-ON = TIME-ON
  setvar TEMP_REC.DATE-ON = DATE-ON
  copy TEMP_REC to TEMP
endread
The only reason we are naming fields here is that we are not doing a straight map of all fields. Now if you wanted to do something a little more complex with the information, like totaling up a variable across all the records, you could do something like:
setvar total-bill = total-bill + (numeric(str(MY-BUFF,26,9)) * 100)
What we are doing here is using the “str” function to pull out 9 bytes starting from byte 26. MY-BUFF is an array of four items, so we are getting the third item from the array. Finally we convert the string to a numeric and multiply by 100 to get rid of the decimal.
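For readers who do not know the Warehouse language, the same arithmetic can be sketched in Python. This is only an illustration, not Warehouse code: the helper names (substr, to_whole_units), the sample buffer, and the assumption that byte positions count from 1 are all mine.

```python
from decimal import Decimal


def substr(buf: str, start: int, length: int) -> str:
    """Return `length` characters beginning at 1-based position `start`.

    Assumption: byte positions count from 1, as "starting from byte 26" suggests.
    """
    return buf[start - 1 : start - 1 + length]


def to_whole_units(buf: str, start: int = 26, length: int = 9) -> int:
    """Pull a 9-byte amount out of the buffer and scale it by 100.

    Decimal is used so that an amount such as 123.45 becomes exactly 12345.
    """
    text = substr(buf, start, length).strip()
    return int(Decimal(text) * 100)


# Hypothetical buffer: 25 filler bytes, a 9-byte amount, then more filler.
my_buff = " " * 25 + "   123.45" + " " * 10

total_bill = 0
total_bill += to_whole_units(my_buff)
```

Running the total over every record is then just a matter of calling the helper inside the read loop.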
Forklift provides a very simple point-and-click interface for building maps. You need to know the name and location of the data structure, so in the case of the HP 3000 you will supply a logon, the IP address, the base name and the password. Forklift will then pull in the data source and you can pick data sets and items from there. You can have a single source going to multiple locations as well if you want. Some of the terminology is odd at first, but once you use Forklift for a bit, it makes sense. I gave it to a contractor with the manual and after a day he knew almost as much about it as I did, so it’s pretty easy to learn.
All the Warehouse functions are available in Forklift from a picklist, so you can build up as nasty or complex a set of nested functions as you wish, just by clicking. It’s pretty fun actually. I was able to knock out useful scripts within just a few minutes without having to type anything, and I like that.
If you choose to write scripts directly, then you should grab your favorite editor and start typing, then execute the script from Warehouse to run it. I like to have small scripts lying around to test concepts with so I can make sure that I’m on the right track. I like to do this with COBOL as well. Since Warehouse can execute a script at any point in its processing, we like to create global routines and record layouts as external files and then include them in our script like you would in COBOL with the $INCLUDE statement. This gives us the obvious advantage of code and record layout reuse.
Warehouse supports the ability for you to create your own functions that can be used in expressions. While I haven’t built any yet, I can see how powerful this would be. I’m a big fan of COBOL macros, and I can see creating custom Warehouse functions in the same way.
Overall, Warehouse makes for nice shorthand. One of the neat features is the way you can create record structures, and this is one of its similarities to the C language. You can define a record type and then declare a variable that is of that type. So if you have a record structure that you need to reuse, you can declare it multiple times under different names. This is one of those features that is really handy at times, but isn’t so well documented.
Installation and Documentation
Installation of the HP software is cute. They restore one huge executable which you run, and it’s a self-extracting archive that creates the correct accounting structure and puts the files in place. I’ve never seen anyone build a self-extracting archive file on the HP before, so I found it fascinating.
Documentation is probably the only real weak part of the product. There is a tutorial guide that does an admirable job of giving you the basics, but the reference guide is rather oddly arranged and the examples, especially on the functions and how to create user-defined functions, are not so great. Some of the functions are explained in great detail with plenty of examples, but for a host of others it is difficult to work out how they behave from what is explained. This is my only real source of complaint with the software.
The Test Drive
I’ve actually used Warehouse for several years now for different types of projects. My first project was to allow the developers to copy client-specific data off of our main system and database into a test database on a test system. I set up a little MPE file where they could enter a range and/or list of client numbers and a date range for history information. Using the network interface of DataBridge, it would read the data across the network and populate the local database. The extent of the code for each table basically looked like this:
open TRACE REMOTE SMGANET user=mgr.smga IMAGE TRACE PASS=READ MODE=5
open TRACEL IMAGE TRACE.DB PASS=READ MODE=5
read RUNM = TRACE.RUN-M for CLIENT-ID = "PG"
copy RUNM to TRACEL.RUN-M
Now you’ve got to admit that that’s easy. A neat feature that I ran into by accident is that if you change datatypes in one of the data sets and do a copy, DataBridge will figure it out, tell you, and take care of the conversion. I had converted all I2 fields to I4 in the target base and forgotten that we had made the change. When I went to copy the data I got the message from DataBridge, and everything copied nicely.
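To picture what that conversion involves, here is a rough Python sketch of widening one field. It is not how DataBridge does it internally, which the product does not expose. It assumes IMAGE’s convention that the number after the I counts 16-bit words, so I2 is a 32-bit and I4 a 64-bit signed integer, stored big-endian as on the HP 3000. The function name is hypothetical.

```python
import struct


def widen_i2_to_i4(raw: bytes) -> bytes:
    """Re-encode a 4-byte signed integer (assumed I2) as 8 bytes (assumed I4).

    Big-endian byte order is assumed; the sign is preserved by unpacking to a
    Python int and packing it again at the wider size.
    """
    (value,) = struct.unpack(">i", raw)
    return struct.pack(">q", value)


# Example: a negative value survives the widening unchanged.
source_field = struct.pack(">i", -1234)
target_field = widen_i2_to_i4(source_field)
```

The point is simply that the value stays the same while its storage width doubles, which is the kind of bookkeeping the copy took care of without being asked.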
A new way that I’m using DataBridge is in manipulating data that is electronically transferred to us. Everyone sends their data in different formats, but we have to put it into a single format. So we create custom scripts for each layout to put it into a standardized format, then run a COBOL program to do the load. We use a COBOL program at this point because we have some serious edit checking and reporting to do, though sometimes we go direct to the database, depending on the data.
We also go the other direction on this, where we have a COBOL program extract the data to a standard format, then use client-specific scripts to put the data into their requested format. By putting the script names in the database and writing a loop in the job with MPEX, we are able to create processes that don’t ever need to be updated if clients come or go.
We’ve come up with a variety of really bizarre scripts, but we’ve been able to do pretty much everything we’ve wanted to. I did find that using Forklift was great for getting started, but in general you can move a lot faster writing scripts by hand once you know what you’re doing. I still go to it every once in a while; it’s a pretty handy tool and a decent way to learn.
There are tremendous possibilities opened up by using a tool like DataBridge. Initially you might look at it and think that you can just write programs to do what it does, and you would be right. The beauty of DataBridge is its ability to easily move data between different types of storage, and how quickly you can generate code. You could consider it a batch-oriented 4GL, but it doesn’t really carry the baggage of a conventional 4GL in that its performance is excellent.
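The table-driven idea can be sketched in a few lines of Python. This is not our actual job stream, which uses MPEX and Warehouse. The record layout, the script and file names, and the plan_runs helper are all hypothetical and are there only to show why nothing needs editing when the client list changes.

```python
from typing import List, NamedTuple, Tuple


class ClientScript(NamedTuple):
    client_id: str    # hypothetical key, as it might be stored in the database
    script_name: str  # hypothetical name of that client's formatting script


def plan_runs(rows: List[ClientScript], extract_file: str) -> List[Tuple[str, str, str]]:
    """Build one (script, input file, output file) step per client row.

    The loop never names a client directly, so adding or removing a row in
    the table is the only change needed when clients come or go.
    """
    return [(row.script_name, extract_file, f"{row.client_id}.OUT") for row in rows]


# Hypothetical table contents and standard-format extract file.
rows = [ClientScript("PG", "PGFMT"), ClientScript("ACME", "ACMEFMT")]
steps = plan_runs(rows, "STDEXTR")
```

Each step would then be handed to whatever actually runs the script; that part is deliberately left out here.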
There is a learning curve involved, but if you are familiar with C, Pascal, or even CI programming, it will help. What throws most COBOL programmers off is the nested functions. Once they get over that hurdle the light comes on and they start smoking through it.
Hopefully I’ve shown you some non-data warehouse uses for DataBridge. It’s certainly useful for creating and maintaining data warehouses, and the BridgeWare component for real-time data warehousing is quite a technical feat that will make your historical data that much more useful. If you are looking into data movement at all, or into some of the examples I’ve described here, I would recommend that you check out DataBridge. It’s a powerful tool with lots of possibilities.
Warehouse version 2.06.0500-M
Forklift version 1.B1.1005
Taurus Software Inc.
1032 Elwell Court
Palo Alto, CA 94303
DataBridge includes the 3000-based server software for client/server access, the Forklift client software, and the Warehouse engine that executes the scripts.
DataBridge runs on all versions of MPE/iX as well as Windows and many versions of Unix. DataBridge supports Image/SQL, Oracle, Allbase, Informix, SQL Server, and all types of flat and delimited files. Pricing on the software for the HP 3000 is tier based, ranging from $20,000 to $44,000. Discounts are available for multiple CPUs. Support is 15% of the purchase price per year and includes phone-in support, electronic support and new releases of the software. All prices are in US dollars.