The Wonderful World of Data Shaping
The Wonderful World of Data Shaping
by Chris Payne
Introduction
Data shaping is a little known feature of ADO 2.0 and above that allows you to create hierarchical recordsets, which can give you tremendous power over manipulating and displaying data. Unfortunately, data shaping has taken a back seat for many developers.
Data shaping lets you create parent-child relationships between recordsets, i.e. recordsets within recordsets, all with a single ADO object. This is a dream for a web developer, because it eliminates messy joins, complicated filtering, and the need for spaghetti-code in presentation logic. This in turn reduces the amount of traffic crossing a network and reduces overhead when interfacing with other tools, such as XML.
This article will discuss the theories and applications behind data shaping, its raison d^etre, with beginning to advanced examples, and assumes that you have a good basic knowledge of database methodology and SQL syntax.
What is Database Hierarchy?
Imagine a set of tables that hold information on a product database. Some items beg to be related in a parent-child relationship, for instance, manufacturers and products. Each manufacturer can have several products, or children. Since each product can only belong to one manufacturer; it will have only one parent. This is the idea behind relational databases, something which has been in use for quite some time.
In practice, however, things aren't so easy. We can set all kinds of foreign key relationships to enforce this bond between tables, but those relationships don't do much for you except to make sure you don't input random information. We need a way to take this parent-child relationship to the next step, and take it out of theory and put it into practice.
Why Hierarchy?
Suppose you had three tables for a product database: a manufacturer table, a product table, and transaction table, described below (note that these tables are simple and solely for demonstration):
tblManufacturer
| Column Name |
Description |
| MID |
Unique identifier for manufacturers |
| Name |
Manufacturer name |
| Address |
Manufacturer's address |
tblProducts
| Column Name |
Description |
| PID |
Unique identifier for products |
| MID |
Foreign key to tblManufacturer |
| Name |
Product name |
| Instock_Flag |
Out of stock flag |
tblTransaction
| Column Name |
Description |
| TID |
Unique identifier for transactions |
| MID |
Foreign key to tblManufacturer |
| PID |
Foreign key to tblProducts |
| Type |
Transaction Type |
These tables would hold information about manufacturers, the products each manufacturer sells, and the individual transactions associated with each, for example, a sell or buy. Both tblTransaction and tblProducts could have a many-to-one relationship with tblManufacturer.
If you wanted to display this information to a user, you might want to relate the information somehow. Typically, this could be done with SQL join statements:
SELECT tblProducts.Name, tblManufacturer.Name, Type FROM tblManufacturer JOIN tblProducts ON tblManufacturer.MID = tblProducts.MID JOIN tblTransaction ON tblManufacturer.MID = tblTransaction.MID AND tblProducts.PID = tblTransaction.PID
Or simply:
SELECT tblProducts.Name, tblManufacturer.PID, tblManufacturer.Name, Type FROM tblManufacturer, tblProducts, tblTransaction WHERE tblManufacturer.MID = tblProducts.MID AND tblManufacturer.MID = tblTransaction.MID AND tblProducts.PID = tblTransaction.PID
This would display something like the following:
| tblProducts.Name |
tblManufacturer.Name |
tblManufacturer.PID |
Type |
| Rubber band |
ACME Rubber Band Co |
1 |
Buy |
| Big Rubber band |
ACME Rubber Band Co |
1 |
Sell |
| Pencil |
Northeast Pencil Inc |
2 |
Sell |
| Sunglasses |
Shades Inc |
3 |
Sell |
| ... |
... |
... |
... |
As you can see, you get a pretty flat recordset in return, with potentially a lot of duplicate information. While this may be appropriate for some applications, it doesn't give you very much flexibility - you can't display the information in any other way than what's displayed above, and updating or inserting any information would be a chore. Often times, an ASP application needs more flexibility, especially in the display of information. This type of recordset would benefit from a way to easily relate the different tables and fields.
The Power of Hierarchy
Using data shaping, you could manipulate the above information in all sorts of wonderful ways. As discussed above, we'll relate manufacturers and products to each other as parent and child (we'll discuss transactions a bit later). Here's the syntax to create a basic data shape:
SHAPE {parent-command} [[AS] table-alias]
APPEND {child-command}
RELATE(parent-column TO child-column)
The parent and child-commands can be any valid SQL clause that returns a recordset, for instance, a SELECT statement. Note however, that SQL is not required - you can use any query language you wish. Let's try an example:
SHAPE { SELECT MID, Name, Address FROM tblManufacturer }
APPEND ({ SELECT Name, Instock_flag FROM tblProducts } AS Products
RELATE MID to MID )
This query produces the following results:
| MID |
Name |
Address |
Products |
| 1 |
ACME Rubber Band Co |
Dayton, OH |
(reference to Products chapter) |
| 2 |
Northeast Pencil Inc |
Boston, MA |
(reference to Products chapter) |
| 3 |
Shades Inc |
Boise, ID |
(reference to Products chapter) |
| ... |
... |
... |
... |
Appending a child recordset onto our first parent recordset produces a new field in the parent recordset, called Products. This field contains references to pre-filtered recordsets. Each reference contains only the products that belong to the specified manufacturer!
Also, notice the use of the word chapter. Child recordsets are called chapters to the shaping provider.
Here is some code in VBScript that presents the information to the user.
<%
Dim strSQL, rst, strConnectionString, rstChild
strConnectionString = "Provider=MSDataShape;dsn=MyDSN;database=MyDatabase"
strSQL = "SHAPE { SELECT MID, Name, Address FROM tblManufacturer } " & _
"APPEND ({ SELECT Name, Instock_flag FROM tblProducts } " & _
"AS Products RELATE MID to MID )"
Set rst = Server.CreateObject("ADODB.Recordset")
'open recordset
rst.Open strSQL, strConnectionString
Do While Not rst.EOF
Response.Write(rst("MID") & ", ")
Response.Write(rst("Name") & "<br> ")
Response.Write("Products:")
'set object to child recordset and iterate through
Set rstChild = rst("Products").Value
If Not rstChild.EOF Then
Do While Not rstChild.EOF
Response.Write(" " & rstChild("Name"))
If rstChild("InStock_Flag") Then
Response.Write("In stock")
Else
Response.Write("Out of stock")
End If
rstChild.MoveNext
Loop
rst.MoveNext
End If
Loop
rst.Close
Set rst = Nothing
%>
|
This code would produce the following output:
1, Acme Rubber Band Co
Products:
Rubber band, In stock
Big Rubber band, In stock
Small Rubber band, Out of stock
...
2, Northeast Pencil Inc
Products:
Pencil, Out of stock
Yellow pencil, In stock
...
3, Shades, Inc
Products:
Sunglasses, Out of stock
Shades, In stock
...
We set an object equal to the reference column in the parent recordset, and iterate through that as we would a normal recordset object. The results returned are only applicable to the current parent item (i.e. only products belonging to the specific manufacturer are returned). This eliminates the need to create multiple recordset objects and form sub queries for each one.
Also note the extra clause in the connection string, "Provider=MSDataShape". The provider defaults to "MSDASQL," which normally supplies the necessary data, but is unable to interpret SHAPE commands. Therefore we must tell the ADO objects to use the MSDataShape provider instead -- not specifying this will result in an error.
As you can see, this is an efficient, well formed method of displaying data -- we don't have duplicate information, everything is already grouped properly, and we don't have a headache.
Complex Enough For Ya?
Before we move onto more complex examples, let's take a closer look at data shaping.
In theory, a shaped recordset may contain the following types of fields:
- Regular data (just like a regular recordset)
- Pointers to other recordsets (children or chapters)
- Calculations on a row of the recordset
- Calculations over a column of the recordset
- Empty columns (in case you need to keep track of some temporary information while you work with a recordset)
The most important of these is probably the pointers to other recordsets -- the true power of data shaping.
The best thing about data shaping is that you aren't limited to one child, or even one level of children. You can have multiple children, grand children, great grandchildren, etc. When you get into large databases, this feature comes in handy. They syntax for adding multiple children is as follows:
SHAPE {parent-command} [[AS] table-alias]
APPEND {child-command}
RELATE(parent-column TO child-column) {,
({child SQL statement} [[AS] alias]
RELATE parent column TO child column) [[AS] alias]}
An actual query would look like this:
SHAPE {SELECT MID, Name, Address FROM tblManufacturer}
APPEND
({SELECT Name, Instock_Flag FROM tblProducts} AS Products RELATE MID TO MID),
({SELECT MID, PID, Type FROM tblTransaction} AS Transaction RELATE MID TO MID)
By adding another loop to your ASP script, you can now list all transactions for manufacturers as well.
1, Acme Rubber Band Co
Products:
Rubber band, In stock
...
Transactions:
Sell
...
2, Northeast Pencil Inc
Products:
Pencil, Out of stock
...
Transactions:
Sell
...
This is definitely very useful, but wouldn't it make more sense to relate the transactions to products as well? Enter grandchildren:
SHAPE {parent-command} [[AS] table-alias]
APPEND (
( SHAPE {child SQL statement} [[AS] alias]
APPEND ( {grandchild SQL statement } [[AS] alias]
RELATE child column TO grandchild column) [[AS] alias] )
RELATE parent column TO child column) [[AS] alias]
As you can see, you simply embed another shape statement within the append statement of the first shape command. The output would look like:
1, Acme Rubber Band Co
Products:
Rubber band, In stock
Transactions:
1, Buy
2, Sell
...
Big Rubber band, In stock
Transactions:
1, Sell
2, Sell
...
Now we have products sorted by manufacturers, and transactions sorted by manufacturers and products. Of course, if we stored more pertinent information in our databases, you would be able to easily see some very interesting data, such as buy/sell trends, but these tables are simply for demonstration purposes.
Note, however, that data shaping isn't only useful for one-to-many relationships, such as the ones we've been describing above, but that they can also be very helpful in one-to-one relationships that you would normally handle with SQL JOINS. If we stored the manufacturers' address in another table, for instance, it would be easy to simply append the address table to the manufacturers' as a child. The display of such data might also be more logical. Just another way to really put into use the concepts of normalized and relational databases.
1, Acme Rubber Band Co
Products:
Rubber band, In stock
Transactions:
1, Buy
Address:
Dayton, OH
You can easily see how complex and wonderful data shaping can get with relatively simple commands.
Computation
Using a shape compute command changes things a bit. First, I'll outline the syntax:
SHAPE {child-command} [[AS] table-alias]
COMPUTE aggregate-command-field-list
[BY grp-field-list]
This seems counter-intuitive from what we just learned. What happens is that the compute statement generates an implied parent recordset -- you don't have to explicitly specify one yourself. For instance:
SHAPE { SELECT MID, Name, Address FROM tblManufacturer } AS Manufacturer
COMPUTE Manufacturer BY Name
This statement would produce the following parent recordset:
| Manufacturer |
Name |
| (reference to Man. chapter) |
ACME Rubber Band Co |
| (reference to Man. chapter) |
Northeast Pencil Inc |
| (reference to Man. chapter) |
Shades Inc |
| .... |
... |
In essence, we are creating a parent recordset with references to children that are built from the SQL Select statement we issued, and grouping each item by manufacturer name. The child recordset for ACME Rubber Band Co would look as follows:
| MID |
Name |
Address |
| 1 |
ACME Rubber Band Co |
Dayton, OH |
This is exactly what is produced by the SQL statement we specified, filtered by the parent recordset Name. Pretty neat huh?
What exactly is the application for this? Why don't we just use a normal shape command? For one thing, the COMPUTE statement and underlying data here are very simple. Once these start getting even mildly complex, this statement can help tremendously. Secondly, it can help you group data in ways that you normally wouldn't think of. Once we start nesting SHAPE commands (i.e. creating grand children), you can organize the hierarchy by any level in the hierarchy. Meaning the top recordset, or parent, doesn't necessarily have to be the top most select statement.
A Search Engine
Imagine a search engine, where we can search for manufacturers. But, the user would rather search for the type of industry a manufacturer is in, before choosing from a list of manufacturers. How do we group this in a logical fashion? Take a look at the following statement, which assumes we have a new "Industry" table:
SHAPE (
SHAPE { SELECT MID, IID, Industry FROM tblIndustry }
APPEND (
{ SELECT MID, Manufacturer FROM tblManufacturer WHERE (Manufacturer LIKE '%" & searchstring & "%') } AS Manufacturer
RELATE MID to MID ) ) AS RS
COMPUTE RS BY Industry
Suppose the user types in "band" as the search query. The returned display could look like:
Rubber
ACME Rubber Band Co
Asian Rubber Bands
Clothing
Clothes Band, Inc
HatBand Co
Entertainment
I-Band Entertainment
...
Or simply:
Rubber
Clothing
Entertainment
...
We have returned the matching manufacturers and industries. The user may wonder how you gave him information that has nothing to do with the search term he supplied. Magic at its finest follks.
Conclusion
We can easily see how data shaping provides a powerful mechanism for logical manipulation of relational information. And we've barely touched on it in this article. Make sure to check out the resources below to get you started, but the best way to explore data shaping's potential is by jumping in and applying it to many (if not all) of your current database applications. Many times you'll see that data shaping is the way to go.
Resources
Check out these links for further resources on data shaping syntax and methodology:
- 4 Guys From Rolla - Advanced Data Shaping Techniques
- Smart Access - Data Shaping
- MSDN - Data Shaping Service for OLEDB