Description and Usage of Joins in Microsoft Query

ID: Q136699


The information in this article applies to:


SUMMARY

This article discusses using joins in Microsoft Query. A join is a method you can use with databases to define and illustrate relationships between two tables. In Microsoft Query, you can create and manipulate various types of joins. In many cases, Microsoft Query joins the tables in your query for you. Joins are represented by lines connecting the tables in the Table pane.

Topics covered in this article include the following:


   What Is a Join?
   Inner Joins
   SQL Statements
   Outer Joins
   Subtract Joins
   Full Outer Joins
   Self Joins
   Equi-Joins
   Natural Joins
   Cartesian Products 


MORE INFORMATION

What Is a Join?

A join is a connection between two tables where the two tables are merged according to a field that they have in common, creating a new virtual table (which can be saved out as a real table). For example, with the following two tables:


      Color_Table:

   Join_Field   Color_Field
   1            Red
   2            Blue
   3            Green

      Pattern_Table:

   Join_Field   Pattern_Field
   2            Striped
   3            Checkered
   4            Polka-Dot 


a simple join resembles the following:


   Join_Field   Color_Field      Pattern_Field
   2            Blue             Striped
   3            Green            Checkered 


The result table contains only records 2 and 3 in the Join_Field field because they are the only records that exist in both the Color_Table and the Pattern_Table.

A practical example of a join is a retailer's list of products and manufacturers; a Products table and a Vendors table could be joined on a Product ID field.

The Inner Join

The join in the previous example, called an inner join, is the simplest type of join. Usually, you want to use only a fraction of the fields in the tables. For example, in a real inner join, such as the one above, you may want to exclude Join_Field, so that the join looks like this:


   Color_Field      Pattern_Field
   Blue             Striped
   Green            Checkered 


In Microsoft Query, inner joins are the default join type (for more information, see page 105 in the "Microsoft Query User's Guide," version 1.0).

The SQL Statement

A Structured Query Language (SQL) SELECT statement is a type of macro that you can use when you create a join. Note that SQL is very different from Microsoft Excel's other macro languages (Visual Basic for Applications, and the Excel 4.0 macro language).

It is NOT necessary to understand SQL in order to easily create joins in Microsoft Query.

Every join has a SELECT statement associated with it. You can view the SELECT Statement for any join in Microsoft Query by clicking "SQL" on the toolbar. As with Microsoft Excel's macro recorder, you can use Query to record the SELECT statement. Unlike Microsoft Excel's macro recorder, the SELECT statement recorder is always on and cannot be turned off. Here's how the SELECT statement might look in Microsoft Query for the above inner join:


   SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field
   FROM c:\database.mdb Color_Table, c:\database.mdb
   Pattern_Table
   WHERE Color_Table.Join_Field = Pattern_Table.Join_Field 


Note that .mdb is the file name extension for a Microsoft Access database, which can have multiple tables in one file. In some other databases, such as dBASE, Paradox, and FoxPro, each table must have its own file. In these cases, the SQL syntax may appear redundant because the table name is always the same as the file name without the extension.

SQL syntax varies among querying engines; for example, in Microsoft Access the query from the above example resembles the following:


   SELECT Color_Table.[Color_Field],
   Pattern_Table.Pattern_Field
   FROM Pattern_Table INNER JOIN Color_Table ON
   Pattern_Table.[Join_Field] = Color_Table.[Join_Field]; 


The path to the table is not used in Microsoft Access because the table is in a Microsoft Access .mdb file. Even if an external table is attached and used in the query, the Microsoft Access SQL statement does not display the path to the external table.



The Outer Join

Another kind of join is called an outer join. With an outer join, you get all the records from one table and only those records from the other table that have matching values from the first table. This may leave some of the field entries blank, or "Null." For any two tables to be joined, there are two possible outer joins, the "left outer join" and the "right outer join," (so named because you usually view the tables side-by-side). Using the previous two tables in an example, the following is one of the two possible outer joins:


   Join_Field   Color_Field   Pattern_Field
   1            Red           (NULL)
   2            Blue          Striped
   3            Green         Checkered 


The other possible join is as follows:


   Join_Field   Color_Field   Pattern_Field
   2            Blue          Striped
   3            Green         Checkered
   4            (NULL)         Polka-Dot 


NOTE: In an join, you do not see the word "NULL" when you view the data; use the keyword "NULL" when you are working with joins.

In Microsoft Query, both kinds of outer join can be created easily using the mouse (for more information about this procedure, see page 112 of the "Microsoft Query User's Guide," version 1.0).

The following is how the SQL statement might look for the second example of an outer join:


   SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field
   FROM {oj c:\database.mdb Color_Table LEFT OUTER JOIN
   c:\database.mdb Pattern_Table ON Color_Table.Join_Field =
   Pattern_Table.Join_Field} 


To create a practical example of an outer join, make a list of a company's products with sales figures for the products that had been sold, but not excluding products that had not been sold. To do this, use a Product ID field to join a Products table and a Sales table.

The Subtract Join

A third kind of join is the subtract join. A subtract join is the opposite of an outer join; it includes ONLY those records in one table that don't match any record in the other table. Like outer joins, there are two possible subtract joins for any two tables that you want to join; however, they are not usually referred to as "Left subtract join" or "Right subtract join." A subtract join normally return fields from only one of the tables, because by definition the other table's fields return only NULL values. The following is one possible subtract join:


   join_Field   Color_Field
   1            Red 


and here is the other:


   Join_Field   Pattern_Field
   4            Polka-Dot 


In Microsoft Query, a subtract join is created by first creating an Outer join, and then using the criteria "IS NULL" on an appropriate field (Pattern_Field in the first example above; Color_Field in the second example) to exclude records that match between the tables.

The following is how the SQL statement might look for the first subtract join above:


   SELECT Color_Table.Join_Field, Color_Table.Color_Field
   FROM {oj c:\database.mdb Color_Table LEFT OUTER JOIN
   c:\database.mdb Pattern_Table ON Color_Table.Join_Field =
   Pattern_Table.Join_Field}
   WHERE (Pattern_Table.Pattern_Field Is Null) 


To create a practical example of a subtract join, a list of customers that had not ordered recently. To do this, use an Order ID field to join a Customers table and an Orders table.

The Full Outer Join

A fourth kind of join is the full outer join. A full outer join is a combination of an outer join with its complimentary subtract join. A full outer join includes all records from both tables and merges those records that are common between the two tables. The following is a full outer join:


   Join_Field   Color_Field   Pattern_Field
   1            Red           (NULL)
   2            Blue          Striped
   3            Green         Checkered
   4            (NULL)        Polka-Dot 




In Microsoft Query, a full outer join is created by inserting the UNION operator on its own line between the outer join's SELECT statement and the subtract join's SELECT statement (see above).

To create a full outer join in Microsoft Query, use the following steps:

  1. Create an outer join, then click New on the File menu and create a subtract join.


  2. Copy the subtract join's SQL.


  3. Switch to the outer join, type the word UNION on its own line below the outer join's SQL statement, paste the subtract join's SQL below the word UNION, and close the subtract join's window.


Here is how the SQL might look for the above full outer join (commands below the UNION operator is pasted):


   SELECT Color_Table.Join_Field, Color_Table.Color_Field,
   Pattern_Table.Pattern_Field
   FROM {oj c:\database.mdb Pattern_Table LEFT OUTER JOIN
   c:\database.mdb Color_Table ON Color_Table.Join_Field =
   Pattern_Table.Join_Field}
   UNION
   SELECT Color_Table.Join_Field, Color_Table.Color_Field,
   Pattern_Table.Pattern_Field
   FROM {oj c:\database.mdb Pattern_Table LEFT OUTER JOIN
   c:\database.mdb Color_Table ON Color_Table.Join_Field =
   Pattern_Table.Join_Field}
   WHERE (Color_Table.Color_Field Is Null) 


To create a practical example of a full outer join, merge overlapping customer lists used by different departments, including fax numbers (which were only on the first list), and Internet e-mail names (which are only in the second list). Each department could continue to use its partial list while having the complete, joined list available. They could be joined on a Customer ID field.

The Self-Join

A fifth kind of join is the self-join. A self-join is a connections where a field in a table is matched with a different field in a copy of the same table. Using this example table:


      Table_Three

   Employee_ID   Employee_Name   Reports_To
   1             Bob             3
   2             Sue             (NULL)
   3             Jim             2
   4             Jane            3 


and a copy of it, as follows:


      Table_Three_01

   Employee_ID   Employee_Name   Reports_To
   1             Bob             3
   2             Sue             (NULL)
   3             Jim             2
   4             Jane            3 


A self-join could be used to create a list of employee names with their supervisor's names. The Employee_ID in Table_Three would be joined with the Reports_To in Table_Three_01. The following is how it might look at first:


   Employee_Name   Employee_Name
   Bob             Jim
   Sue             (NULL)
   Jim             Sue
   Jane            Jim 


However, because it is confusing to have the same field name for both fields, change one of the field names, as follows:


   Employee_Name   Supervisor
   Bob             Jim
   Sue             (NULL)
   Jim             Sue
   Jane            Jim 


The following is how the SQL might look for the above self-join:


   SELECT table_three.Employee_Name,
   table_three_01.Employee_Name 'Supervisor'
   FROM c:\database.mdb table_three, c:\database.mdb
   table_three_01
   WHERE table_three.Employee_ID = table_three_01.Reports_To 


When you return data to Microsoft Excel, it is of no use to rename the field in Microsoft Query. This is true because Microsoft Excel uses the original field name. For more information about this issue, please see the following article in the Microsoft Knowledge Base:

Q121551 : XL5: Field Instead of Column Name in MSQUERY Returned to Excel

A Microsoft Excel macro must change the column name every time the returned data is refreshed (unless you return the data in a PivotTable, in which case the Pivot itself can create and keep a custom field name).

The Equi-Join and Natural Join

Almost all joins, including all examples given so far, are equi-joins and natural joins. The meanings of these terms are of little significance to the average user of Microsoft Query, but the next two paragraphs attempt to explain the terms for those who may be curious.

The equi-join is a join in which records are retrieved based on whether the join fields have matching values in both tables. That may seem like just a definition of a join, but it isn't. An example of a non-equal join is a join where records in the first table are joined to those records in the second table where the joined field in the first table is greater than (instead of equal to) the joined field in the second table (or less than, or anything besides equal to). Naturally, this returns more records than an equi-join.

A natural join is one in which only one of the two tables' joined fields is returned. Since these two fields are by definition identical in an equi-join, it is redundant to include both. For a non-equal join, it is important to include both of those fields. So, equi-joins and natural joins go together. You would want an equi- joins (which describes most joins) to be natural join by returning only one of the joined fields; but, if you ever use a non-equal join, you might also want to make it a non-natural join by returning both of the joined fields.

There are other kinds of joins. The full spectrum of joins was most recently defined in 1992 and this standard is known as SQL-92. Some joins are not important to Microsoft Excel users because these joins do things that are easier to do in Microsoft Excel.

The Cartesian Product

Trying to return data from two or more tables without any joins creates what is called a "Cartesian product." A Cartesian product is defined as all possible combinations of rows in all tables. Be sure you have joins before trying to return data, because a Cartesian product on tables with many records and/or on many tables could take several hours to complete. The following is a Cartesian product as used on the two example tables; note that this table is only 3 records times 3 records, which yields a total of 9 records. However, imagine if instead, the table was 100 records times 1,000 records times 10,000 records; then the table would contain 1,000,000,000 records!


   Join_Field   Join_Field   Color_Field   Pattern_Field
   1            2            Red           Striped
   1            3            Red           Checkered
   1            4            Red           Polka-Dot
   2            2            Blue          Striped
   2            3            Blue          Checkered
   2            4            Blue          Polka-Dot
   3            2            Green         Striped
   3            3            Green         Checkered
   3            4            Green         Polka-Dot 


Occasionally, some users want to use a Cartesian product; however, most users who get them do so accidentally, and are often confused by them. Because most users exclude most of the fields in a join, a real Cartesian product can easily look as perplexing as this:


   Color_Field
   Red
   Red
   Red
   Blue
   Blue
   Blue
   Green
   Green
   Green 


If 100 records are added to Pattern_Table, this query would have 309 records (103 records each of Red, Blue, and Green).

Cartesian products have relatively simple SELECT statements. The following is how the SQL might look for the above Cartesian product:


   SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field
   FROM c:\database.mdb Color_Table, c:\database.mdb
   Pattern_Table 


A practical example of a Cartesian product would be to create a list of all possible combinations of options on a merchandise product, with price totals for each combination.


REFERENCES

Microsoft Query "User's Guide," version 1.0, pages 101-114, 123-131

The following two books are not included with any Microsoft products and are not produced by Microsoft.

The third-party products discussed here are manufactured by vendors independent of Microsoft; we make no warranty, implied or otherwise, regarding these products' performance or reliability.

"Understanding the New SQL: A Complete Guide," Morgan Kaufmann Publishers, Inc., 1993.

"Joe Celko's SQL for Smarties: Advanced SQL Programming," Morgan Kaufmann Publishers, Inc., 1995.

For more information about creating joins in Microsoft Query, choose the Search button in Help and type:


   joins, overview 

Additional query words:


Keywords          : xlquery 
Version           : WINDOWS:1.0,2.0; MACINTOSH:1.0,98
Platform          : MACINTOSH WINDOWS 
Issue type        : 

Last Reviewed: April 14, 1999