EtcProtocol Demonstrates Pluggable Protocol Handler

ID: Q180367


The information in this article applies to:


SUMMARY

EtcProtocol is a working example of an Asynchronous Pluggable Protocol ("AsynchPP" for short). It parses a URL string of a particular form, retrieves the data for the requested resource, and then uses the correct AsynchPP interfaces to pass that data on to URLMON. In this example, EtcProtocol simply retrieves "text/html" data using another URL Moniker download and then briefly filters the HTML characters before passing them on. All of the "<" and ">" characters are HTML encoded to "&lt;" and "&gt;", respectively, which causes an HTML page to display as raw source in the browser rather than as a fully rendered page.
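
The filtering step described above can be sketched in a few lines of portable C++. This is a minimal illustration, not code from the sample: the function name EncodeAngleBrackets is hypothetical, and it handles only the two characters this article mentions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not taken from the EtcProtocol source): replaces the
// angle brackets with HTML entities so a browser displays markup as text
// instead of rendering it.
std::string EncodeAngleBrackets(const std::string& html)
{
    std::string out;
    out.reserve(html.size());
    for (std::string::size_type i = 0; i < html.size(); ++i) {
        switch (html[i]) {
        case '<': out += "&lt;"; break;   // opening bracket
        case '>': out += "&gt;"; break;   // closing bracket
        default:  out += html[i]; break;  // everything else passes through
        }
    }
    return out;
}
```

For example, passing "<b>hi</b>" to this helper yields "&lt;b&gt;hi&lt;/b&gt;". A fuller filter would presumably also encode "&", which this sketch leaves alone.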

EtcProtocol is a C++ sample that uses ATL to implement the COM server functionality. It has been built and tested for Visual C++ 5.0.


MORE INFORMATION

The following file is available for download from the Microsoft Software Library:

EtcProt.exe
NOTE: This sample is intended for demonstration purposes only and is not the ultimate implementation of an Asynchronous Pluggable Protocol. In all cases, efficiency and robustness were sacrificed in the name of simplicity and clarity. Implementations based on EtcProtocol should make an effort to improve on the basic design. To further this end, the code for EtcProtocol has been marked in several places with // TODO: comments that suggest code that could make the protocol safer, more complete, or more efficient.

Registration

The registry entries necessary for a pluggable protocol are straightforward and covered well in the main documentation for Pluggable Protocols in the Internet Client SDK. All that is required to use EtcProtocol is that EtcProtocol.dll is present on the system and then regsvr32 EtcProtocol.dll is run. EtcProtocol uses ATL's simple registration facilities via the .rgs file; open it up and take a peek.

The Basics

Pluggable Protocols can best be thought of as a new layer of abstraction in Internet Explorer 4.0's (Internet Explorer 4) process of retrieving data for rendering in the browser. URL Monikers, a COM-based download technology that introduced a new implementation of the well-known OLE interface IMoniker, were first seen with the release of Internet Explorer 3.0. URL Monikers encapsulate the data retrieval process so that data pulled from a variety of different sources and protocols can be retrieved in a generic fashion, through IMoniker::BindToStorage or IMoniker::BindToObject. Even though the Win32 Internet API (WinInet) functions are remarkably easy to use, the functions needed to retrieve FTP files are very different from the functions needed for HTTP files or especially local files. With URL Moniker downloads, the only thing that changes is the URL string - "ftp://file" or "http://file" or "file://file" - the rest is common code.

Internet Explorer 4 adds to this picture by abstracting out the piece of URLMON (the URL Moniker module) that actually retrieves the data. Because there might be many more ways to retrieve browsable documents than directly from an HTTP server, FTP server, or Gopher server, it is now possible to supplement or replace the URLMON modules with COM objects that serve as an intermediary between URLMON and the data source, whatever it might be. The pieces of this new puzzle are called Pluggable Protocol Handlers and they are basic COM objects that implement and use a basic set of documented interfaces. For example, URLMON implements the "http://" protocol as an internal COM object that uses WinInet to retrieve HTTP data from the wire and pass it on to the main implementation layer that abstracts the data into a single BindToStorage or BindToObject call. All told, Internet Explorer 4 ships with 10 AsynchPPs, some of which are quite useful, such as "res://" and "mailto://".

When AsynchPPs are thought of as simple middlemen, their responsibilities are very clear. As input, the AsynchPP must be able to parse and understand a particular URL. AsynchPPs are registered according to the protocol they implement. As per the URL spec, the "protocol" for data retrieval is specified by the prefix before the colon on the URL string, as in "protocol://resource". So when a particular protocol is asked for, URLMON will pass the whole URL string in to the AsynchPP and expect it to understand the rest of the syntax.
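
As a rough illustration of the "prefix before the colon" rule, the following portable C++ sketch splits a URL string into its protocol name and the remainder. The names ParsedUrl and SplitScheme are hypothetical and do not come from the sample; a real handler receives the full URL from URLMON and still has to interpret the remainder according to its own syntax.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch.
struct ParsedUrl {
    bool valid;            // false when no colon-delimited protocol was found
    std::string scheme;    // text before the first colon, lowercased
    std::string resource;  // everything after the colon, left untouched
};

// Splits "protocol:rest" at the first colon. Only ASCII letters are
// lowercased; no other validation is attempted.
ParsedUrl SplitScheme(const std::string& url)
{
    ParsedUrl result;
    result.valid = false;

    const std::string::size_type colon = url.find(':');
    if (colon == std::string::npos || colon == 0)
        return result;  // no protocol prefix at all

    for (std::string::size_type i = 0; i < colon; ++i) {
        char ch = url[i];
        if (ch >= 'A' && ch <= 'Z')
            ch = static_cast<char>(ch - 'A' + 'a');
        result.scheme += ch;
    }
    result.resource = url.substr(colon + 1);
    result.valid = true;
    return result;
}
```

Given "SQL://authors", this sketch reports the protocol "sql" and the remainder "//authors"; what that remainder means is entirely up to the handler registered for the protocol.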

As output, AsynchPPs are expected to produce a stream of raw data bytes that can be read in much the same manner as the IStream interface can be read. How the AsynchPP gets those bytes is its business. The whole point of having AsynchPPs is to allow for the possibility of new protocol forms other than the standard HTTP. Because the AsynchPP's requirements are so simple, anything is possible. Ideas include a "SQL://" protocol handler which could possibly retrieve data from a SQL table and format it into HTML output or even a "my http://" protocol handler which defines a new application level protocol over TCP/IP for the transfer of documents or data back and forth.

Note that Pluggable Protocols actually have been around since Internet Explorer 3. Take a look at the DevStudio 5.0 Help system, and you'll notice a working protocol handler. However, the architecture is now much more complete and allows for a greater degree of flexibility, particularly with the protocol string format (no more "mk:@progid://".)

The Interfaces

The interfaces for an AsynchPP are fairly simple and well documented in the Internet Client SDK. An AsynchPP implements mainly one interface, IInternetProtocol, with an optional second interface, IInternetProtocolInfo.

IInternetProtocol is actually derived from another interface, IInternetProtocolRoot, but currently almost all AsynchPPs will want to implement the full meal. The main methods on this interface are Start and Read. URLMON uses IInternetProtocol::Start to ask the AsynchPP to begin obtaining the data. Later, once the data is available, URLMON will use IInternetProtocol::Read to get the data bits from the AsynchPP. In a sense, IInternetProtocol is how URLMON talks to the AsynchPP.

URLMON in turn implements one main interface, IInternetProtocolSink, that AsynchPPs can use to talk back to URLMON. The three methods of main concern to EtcProtocol are ReportProgress, ReportData, and ReportResult. IInternetProtocolSink::ReportProgress is used to report information about the incoming data but not that the data is available itself. The suggested or verified MediaType of the data is a good example of such a report. IInternetProtocolSink::ReportData is used to let URLMON know about data that is available for reading via IInternetProtocol::Read. Last, IInternetProtocolSink::ReportResult is used to report errors during a data request.
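
The two directions of conversation can be modeled in portable C++ as follows. This is a deliberately simplified sketch: SketchSink and SketchProtocol are hypothetical stand-ins, the method signatures are NOT the real COM signatures, and the "download" completes immediately inside Start. Only the roles of the methods follow the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Simplified stand-in for IInternetProtocolSink (protocol -> URLMON).
struct SketchSink {
    virtual ~SketchSink() {}
    virtual void ReportProgress(const std::string& status, const std::string& text) = 0;
    virtual void ReportData(std::size_t progress, std::size_t progressMax) = 0;
    virtual void ReportResult(long resultCode) = 0;
};

// Simplified stand-in for an IInternetProtocol implementation (URLMON -> protocol).
class SketchProtocol {
public:
    SketchProtocol() : readPos_(0) {}

    // Start: begin obtaining the data and tell the sink what happened.
    void Start(const std::string& url, SketchSink& sink)
    {
        if (url.empty()) {
            sink.ReportResult(-1);  // negative value stands for an error here
            return;
        }
        buffer_ = "data for " + url;  // pretend the download finished at once
        readPos_ = 0;
        sink.ReportProgress("mime type", "text/html");    // information about the data
        sink.ReportData(buffer_.size(), buffer_.size());  // data is ready for Read
    }

    // Read: copy up to `capacity` bytes into `dest`; returns 0 at end of data.
    std::size_t Read(char* dest, std::size_t capacity)
    {
        const std::size_t remaining = buffer_.size() - readPos_;
        const std::size_t count = remaining < capacity ? remaining : capacity;
        if (count > 0)
            std::memcpy(dest, buffer_.data() + readPos_, count);
        readPos_ += count;
        return count;
    }

private:
    std::string buffer_;
    std::size_t readPos_;
};

// Test double that records the order of the sink calls.
struct RecordingSink : SketchSink {
    std::vector<std::string> calls;
    void ReportProgress(const std::string&, const std::string&) { calls.push_back("ReportProgress"); }
    void ReportData(std::size_t, std::size_t) { calls.push_back("ReportData"); }
    void ReportResult(long) { calls.push_back("ReportResult"); }
};
```

In this model a caller plays the part of URLMON: it calls Start once, waits for ReportData on its sink, and then calls Read repeatedly until it returns 0.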

In some respects, IInternetProtocolSink matches up quite closely to the IBindStatusCallback (IBSC) interface that URLMON will use to talk in turn to the ultimate client of the download. ReportProgress corresponds to IBSC::OnProgress, even so far as taking the same BINDSTATUS enumeration as one of the parameters. However, ReportProgress should refrain from passing any DATA notifications such as BINDSTATUS_BEGINDOWNLOADDATA; that is ReportData's job. (The only really important ReportProgress notification is BINDSTATUS_MIMETYPEAVAILABLE, which we'll discuss later.) ReportData matches closely with the IBSC::OnDataAvailable method, and passes the same BSCF flags. Last, ReportResult is like OnStopBinding - it reports an HRESULT and string to URLMON whenever something goes wrong inside. Usually, the string you pass won't ever see the light of day, but it is - as always in programming - good practice to return reasonable error codes. This illustration is intended as an analogy only, but it should be easy to see how calls from the AsynchPP on IInternetProtocolSink are very likely to have a direct effect on URLMON's calls to a client's IBindStatusCallback.

In our simple example, we only have two main C++ classes - CEtcPlugProt and CBindStatusCallback2. CEtcPlugProt implements the interfaces discussed above. CBindStatusCallback2 is used to download data for the protocol.

The Download & CBindStatusCallback2

The interfaces in the last section describe pretty concisely one side of the coin for AsynchPPs -- how to talk to URLMON. The other side of the coin is how the AsynchPP gets its data. When it comes to the AsynchPP's method for getting the information it needs, there are no requirements or specified interfaces. AsynchPPs are allowed to retrieve their data however they want. That is why they are so flexible. In fact, AsynchPPs don't even have to download data. Some AsynchPPs, such as the Internet Explorer 4 "mailto://" protocol, are utility protocols, not data-readers. "mailto://" invokes the registered mail application with the specified arguments and then tells URLMON that it is done without ever reporting any data available.

Given, though, that most protocols will need to do some sort of data transfer, and that most data transfer in an Internet situation will be unpredictable or slow, AsynchPPs should get their data asynchronously. That's why they're called Asynchronous Pluggable Protocols after all. With asynchronous behavior, there comes a set of expected capabilities that allow the protocol to work effectively in a browser. Four demonstrated in EtcProtocol are abort functionality, progress notifications, data size, and failure notification.

Be very careful when looking at how EtcProtocol retrieves its data. EtcProtocol is intended to be a simple demonstration so it uses a simple data retrieval process. That process is well encapsulated in the CBindStatusCallback class already implemented in ATL, which uses a URL Moniker based download and supplies an IBindStatusCallback for asynchronous notifications. Basically, it saves us from most of the ugly, hard to read, voluminous code necessary to do asynchronous communications of one type or another.

It should be immediately apparent that EtcProtocol, which URLMON is asking to retrieve data, is turning right around and asking URLMON to get that data for it. A true AsynchPP should instead implement the data retrieval more directly. In most protocol cases, URLMON couldn't do the work anyway; a "sql://" protocol would have to use the OLE DB or ADO calls instead of a URL Moniker download. For things supported in WinInet, direct WinInet calls would be superior. The goal in EtcProtocol, though, was to support a wide range of methods for getting HTML files easily. To reinvent the wheel, so to speak, just wasn't necessary. So for EtcProtocol we probably want a URL Moniker, which supports just about everything under the sun now (and even other pluggable protocols!).

ATL's CBindStatusCallback is exactly what we want - an encapsulated class that gets us the data we want asynchronously and stays out of our hair. Unfortunately, it's too encapsulated and only informs us about data availability and nothing else. So EtcProtocol instead uses a derived class, CBindStatusCallback2, to make some "extensions" so to speak to the default class. CBindStatusCallback2 relies on a handful of call-up functions where it calls up to the using class (CEtcPlugProt) to report various download happenings. The call-up mechanism was chosen because most of these call-up functions would need access to the data members of the using class. While it seriously breaks the encapsulation of the CBindStatusCallback2 class, it was quick and easy and isn't all that difficult to comply with.
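
The call-up idea can be sketched without ATL or COM. In the portable C++ below, CallUpCallback plays the part of CBindStatusCallback2 and SketchOwner plays the part of CEtcPlugProt; both class names and all signatures are hypothetical simplifications, and only the OnProgress and OnBindingFailure call-ups named in this article are modeled.

```cpp
#include <cassert>

// Stand-in for the derived callback class: it forwards download events
// "up" to the class that is using it.
template <class Owner>
class CallUpCallback {
public:
    explicit CallUpCallback(Owner* owner) : owner_(owner) {}

    // Progress from the download is passed straight up to the owner.
    void OnProgress(unsigned long progress, unsigned long progressMax)
    {
        owner_->OnProgress(progress, progressMax);
    }

    // Only failures are reported; a negative code stands for an error here.
    void OnStopBinding(long resultCode)
    {
        if (resultCode < 0)
            owner_->OnBindingFailure(resultCode);
    }

private:
    Owner* owner_;  // not owned; the owner must outlive the callback
};

// Stand-in for the using class. The call-ups touch its data members
// directly, which is the reason the article gives for this design.
class SketchOwner {
public:
    SketchOwner() : lastProgress(0), lastMax(0), failureCode(0) {}

    void OnProgress(unsigned long progress, unsigned long progressMax)
    {
        lastProgress = progress;
        lastMax = progressMax;
    }
    void OnBindingFailure(long resultCode) { failureCode = resultCode; }

    unsigned long lastProgress;
    unsigned long lastMax;
    long failureCode;
};
```

The template parameter makes the coupling explicit: the callback compiles only against an owner that supplies the expected call-up functions, which is the "compliance" cost mentioned above.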

Also note that EtcProtocol might be better implemented as a MIME filter. The URLMON architecture supports two other pluggable protocol types that are used in different situations. MIME filters can be used to filter all data seen by URLMON of a particular MIME type. EtcProtocol is much like this - it reads only data of type "text/html" and filters it to a different form. However, the difference is that we don't want all HTML data filtered, just a particular resource and only on demand. Refer to the Pluggable Protocol documentation in the Internet Client SDK for a further discussion of MIME filters and the other alternative, namespace handlers.

Abort Functionality: When working with URL Monikers, a client can abort a download by calling the IBinding::Abort method on the binding object that arrives through OnStartBinding. When requested by the IInternetProtocol::Abort method, EtcProtocol in turn stops its own internal download by using the CBindStatusCallback2::m_spBinding member, which holds a reference to the binding object.

Progress Notifications: By passing information from the IBSC::OnProgress notification up to the CEtcPlugProt::OnProgress function, EtcProtocol can in turn selectively pass some information on to URLMON about the download data as it is arriving. As discussed already, only a very small subset of notifications should be passed on through ReportProgress, and only one is really important.

Data size: A big concern for all protocols, no matter what medium they are using for obtaining data, is foreknowledge of the data size. In an asynchronous, slow-transfer situation it is very considerate to give the user some sort of progress information. In order to know how far the download has progressed at a certain point, URLMON needs to know how much is left to go.

In HTTP situations, this information is easily available. Most decent web servers will use the standard HTTP CONTENT-LENGTH header to indicate the proper size of the data. URLMON reads this header and passes the info on to IBSC::OnProgress during DATA notifications. EtcProtocol grabs this information and remembers it. We don't rely on this to know when we're done processing the data, but it is useful to keep track of for debugging purposes.

A complication in EtcProtocol is that we know the size of the incoming data, but there is no way to know the exact size of the processed data we'll pass to URLMON. We handle this by guessing and then outright lying if our first guess was wrong. The progress indicator isn't quite correct, but at least it moves.
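
One way to "guess and then lie" is sketched below in portable C++. This is not the sample's actual arithmetic: the class name ProgressGuess and the 25 percent growth allowance are assumptions made purely for illustration.

```cpp
#include <cassert>

// Hypothetical helper: tracks the maximum value to report alongside the
// number of processed bytes, so that progress never exceeds the maximum.
class ProgressGuess {
public:
    // First guess: the processed output will be somewhat larger than the
    // incoming data because each bracket expands to several characters.
    // The 25 percent allowance is an arbitrary assumption.
    explicit ProgressGuess(unsigned long incomingSize)
        : reportedMax_(incomingSize + incomingSize / 4)
    {
        if (reportedMax_ == 0)
            reportedMax_ = 1;  // avoid reporting a zero-length total
    }

    // Returns the maximum to report together with `processedSoFar`.
    unsigned long MaxFor(unsigned long processedSoFar, bool finished)
    {
        if (finished)
            reportedMax_ = processedSoFar;      // final report: progress equals max
        else if (processedSoFar >= reportedMax_)
            reportedMax_ = processedSoFar + 1;  // guess was too low: move the goalposts
        return reportedMax_;
    }

private:
    unsigned long reportedMax_;
};
```

With an incoming size of 100 bytes, the first guess is 125. If 200 bytes have already been produced and the download is still running, the reported maximum becomes 201, so the indicator keeps moving without ever claiming completion early.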

Failure Notification: CBindStatusCallback2::OnStopBinding calls over to CEtcPlugProt::OnBindingFailure whenever an error occurs during the data download. This translates pretty well into an IInternetProtocolSink::ReportResult call.

Threading Possibilities

Everyone knows how big a pain it would be if browsers locked up until all the data on a page is available. Given the current Internet Explorer 4 architecture, all the blame for such a tragedy would clearly lie with the AsynchPP, the little guy doing the download. Fortunately, the asynchronous architecture of AsynchPPs, all the complication you've just been reading about in these pages, allows for all the necessary user intervention during a lengthy download.

To further improve matters, AsynchPPs with lengthy download times should usually spawn a new thread for their download. This thread would live just for the lifetime of the download and then go away. The IInternetProtocol interfaces facilitate inter-thread communication through the IInternetProtocolSink::Switch and IInternetProtocol::Continue methods. When a worker download thread needs to send data to the main apartment UI thread, the worker thread can pack the data into the PROTOCOLDATA structure and call Switch. URLMON will handle the inter-thread communication necessary to get that PROTOCOLDATA structure and its associated bits back to the AsynchPP on the UI thread through a call on IInternetProtocol::Continue.

EtcProtocol briefly demonstrates how this works in its Start method. When URLMON requests asynchronous behavior, EtcProtocol doesn't really comply. It fully parses the URL and prepares itself for an eventual download. But before binding, it calls Switch and waits for a callback on Continue before going through with the bind.
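
The defer-then-continue pattern can be modeled in portable, single-threaded C++. Everything here is a hypothetical stand-in: SketchProtocolData is not the real PROTOCOLDATA layout, SketchUrlmon merely queues requests where real URLMON marshals them between threads, and the state value kStateBind is invented for the sketch.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Stand-in for the PROTOCOLDATA structure; the real structure differs.
struct SketchProtocolData {
    int state;            // what the protocol wants to do when called back
    std::string payload;  // data carried across the switch
};

const int kStateBind = 1;  // invented state value: "go ahead and bind"

class DeferredProtocol;

// Stand-in for URLMON's side of Switch: queue now, deliver later.
class SketchUrlmon {
public:
    void Switch(DeferredProtocol* target, const SketchProtocolData& data)
    {
        Pending item = { target, data };
        pending_.push_back(item);
    }
    void Pump();  // defined below, once DeferredProtocol is complete

private:
    struct Pending {
        DeferredProtocol* target;
        SketchProtocolData data;
    };
    std::deque<Pending> pending_;
};

class DeferredProtocol {
public:
    DeferredProtocol() : bound_(false) {}

    // Start prepares the request but does not bind; it asks for a callback.
    void Start(const std::string& url, SketchUrlmon& urlmon)
    {
        SketchProtocolData data = { kStateBind, url };
        urlmon.Switch(this, data);
    }

    // Continue receives the packed data back and performs the deferred work.
    void Continue(const SketchProtocolData& data)
    {
        if (data.state == kStateBind) {
            boundUrl_ = data.payload;
            bound_ = true;
        }
    }

    bool bound() const { return bound_; }
    const std::string& boundUrl() const { return boundUrl_; }

private:
    bool bound_;
    std::string boundUrl_;
};

// Delivers every queued request to its protocol through Continue.
inline void SketchUrlmon::Pump()
{
    while (!pending_.empty()) {
        Pending item = pending_.front();
        pending_.pop_front();
        item.target->Continue(item.data);
    }
}
```

After Start returns, nothing has been bound yet; the bind happens only when the queued request comes back through Continue, which mirrors the ordering described above.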

MIME Types: It Could Have Been Simpler

After all of that discussion, it is probably inappropriate to say that there was an easier way to filter the HTML data. If the ultimate goal of EtcProtocol was to take "text/html" data and make it show up in the browser just as source text, EtcProtocol could play some MIME games. Internet Explorer 4 is capable of showing "text/plain" documents that don't have HTML tags, just like any standard text viewer. If there were just some way to convince URLMON that the HTML data that EtcProtocol is downloading from the wire is in fact just plain text, then this would save EtcProtocol the effort of parsing through that data to hide the tags.

At first guess, it might seem that all that is necessary is for the Web server to send a CONTENT-TYPE header of "text/plain" instead of "text/html". This is a good thought but Internet Explorer 4 is too smart for that kind of trick. In fact, Internet Explorer will actually sniff suspect data to second-guess what it is actually receiving. It does this to try to clear up confusions with web servers who aren't smart enough to report the correct data type of the documents they are sending. If Internet Explorer receives a "text/plain" document that has the "<HTML>" tag, it will likely reinterpret the MIME type to be "text/html". So the content-header idea won't really work.

AsynchPPs have a second recourse - they can call ReportProgress with BINDSTATUS_MIMETYPEAVAILABLE and suggest their version of the story. Nevertheless, the results will be the same. URLMON sniffs the data after it has been passed from the pluggable protocol, even if the protocol suggested a MIME type. So BINDSTATUS_MIMETYPEAVAILABLE isn't the answer either.

But this is close. In fact, there is a special BINDSTATUS value just for this situation: BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE. URLMON turns off data sniffing when it receives this status notification. BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE indicates that the pluggable protocol handler has checked and sniffed the data itself and is convinced that the MIME type it reports is correct. In fact, this can be tested. If EtcProtocol calls ReportProgress for BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE and reports "text/plain", Internet Explorer will show the data as plain text. The parsing code could be removed and EtcProtocol would still function like a "view source" command of sorts.
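
The decision described in this section can be summarized as a small portable C++ model. It is only a sketch of the behavior as this article describes it: the enumeration, the function names, the "<html" check, and the "text/plain" fallback are hypothetical simplifications, and the real sniffing logic in URLMON is certainly more involved.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Simplified stand-ins for the two BINDSTATUS values discussed above.
enum MimeReport {
    kNoReport,
    kMimeTypeAvailable,          // a suggestion; sniffing still happens
    kVerifiedMimeTypeAvailable   // the handler vouches for the type; no sniffing
};

// Crude stand-in for data sniffing: looks for an "<html" tag, ignoring case.
bool LooksLikeHtml(const std::string& data)
{
    std::string lowered;
    lowered.reserve(data.size());
    for (std::string::size_type i = 0; i < data.size(); ++i)
        lowered += static_cast<char>(std::tolower(static_cast<unsigned char>(data[i])));
    return lowered.find("<html") != std::string::npos;
}

// Returns the MIME type the browser would end up using in this model.
std::string ResolveMimeType(MimeReport report,
                            const std::string& reportedMime,
                            const std::string& data)
{
    if (report == kVerifiedMimeTypeAvailable)
        return reportedMime;     // sniffing is switched off
    if (LooksLikeHtml(data))
        return "text/html";      // sniffing overrides the suggestion
    if (report == kMimeTypeAvailable)
        return reportedMime;     // nothing suspicious; accept the suggestion
    return "text/plain";         // arbitrary fallback for this sketch
}
```

In this model, suggesting "text/plain" for data that begins with "<HTML>" still resolves to "text/html", while the verified report keeps "text/plain".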

EtcProtocol is coded as it is, though, to demonstrate how a protocol can parse and manipulate data it receives as well as how it can store that data in an IStream object. Also, the handful of changes that CEtcPlugProt::OnData makes could easily be expanded to a whole host of features such as color coding of tags and script. This wouldn't show up in plain text.

Additional query words: APP AsynchPP


Keywords          : kbfile kbIE400 kbVC500 
Version           : WINDOWS:2.0,2.1,4.0,4.01,4.01 SP1
Platform          : WINDOWS 
Issue type        : 

Last Reviewed: July 1, 1999