EtcProtocol Demonstrates Pluggable Protocol Handler
ID: Q180367
The information in this article applies to:
- Microsoft Internet Explorer (Programming) versions 4.0, 4.01, 4.01 SP1
- The Microsoft Active Template Library (ATL), versions 2.0, 2.1, used with:
  - Microsoft Visual C++, 32-bit Editions, versions 5.0, 5.0sp3
SUMMARY
EtcProtocol is a working example of an Asynchronous Pluggable Protocol
("AsynchPP" for short). It parses a URL string of a particular form,
retrieves the data for the requested resource, and then uses the correct
AsynchPP interfaces to pass that data on to URLMON. In this example,
EtcProtocol simply retrieves "text/html" data using another URL Moniker
download and then briefly filters the HTML characters before passing them
on. All of the "<" and ">" characters are HTML encoded to "&lt;" and
"&gt;", respectively, which causes an HTML page to display as raw source in
the browser rather than a fully rendered page.
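The filtering step described above can be sketched in a few lines. This is a minimal illustration in portable standard C++, not the code from the sample; the function name EncodeAngleBrackets is hypothetical, and it assumes the data is already in a single in-memory string.

```cpp
#include <string>

// Hypothetical helper: replace each '<' and '>' with its HTML entity so that
// markup is displayed as text. All other characters, including '&', pass
// through unchanged, which matches the filtering the article describes.
std::string EncodeAngleBrackets(const std::string& html)
{
    std::string out;
    out.reserve(html.size());
    for (char ch : html) {
        switch (ch) {
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default:  out += ch;     break;
        }
    }
    return out;
}
```

A real filter would probably work on the data in chunks as it arrives rather than on one complete string.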
EtcProtocol is a C++ sample that uses ATL to implement the COM server
functionality. It has been built and tested for Visual C++ 5.0.
MORE INFORMATION
The following file is available for download from the Microsoft
Software Library:
EtcProt.exe
NOTE: This sample is intended for demonstration purposes only and is not
the ultimate implementation of an Asynchronous Pluggable Protocol. In all
cases, efficiency and robustness were sacrificed in the name of simplicity
and clarity. Implementations based on EtcProtocol should make an effort to
improve on the basic design. To further this end, the code for EtcProtocol
has been marked in several places with // TODO: comments that suggest code
that could make the protocol safer, more complete, or more efficient.
Registration
The registry entries necessary for a pluggable protocol are straightforward
and covered well in the main documentation for Pluggable Protocols in the
Internet Client SDK. To use EtcProtocol, make sure EtcProtocol.dll is
present on the system and then run regsvr32 EtcProtocol.dll. EtcProtocol
uses ATL's simple registration facilities through the .rgs file, which is
worth opening for a look.
The Basics
Pluggable Protocols can best be thought of as a new layer of abstraction in
the process that Internet Explorer 4.0 (Internet Explorer 4) uses to retrieve
data for rendering in the browser. URL Monikers, a COM-based download
technology that introduced
a new implementation of the well known OLE interface IMoniker, were first
seen with the release of Internet Explorer 3.0. URL Monikers encapsulate the data
retrieval process so that data pulled from a variety of different sources
and protocols can be retrieved in a generic fashion, through
IMoniker::BindToStorage or IMoniker::BindToObject. Even though the Win32
Internet API (WinInet) functions are remarkably easy to use, the functions
needed to
retrieve FTP files are very different from the functions needed for HTTP
files or especially local files. With URL Moniker downloads, the only thing
that changes is the URL string - "ftp://file" or "http://file" or
"file://file" - the rest is common code.
Internet Explorer 4 adds to this picture by abstracting out the piece of URLMON (the URL
Moniker module) that actually retrieves the data. Because there might be
many more ways to retrieve browsable documents than directly from an HTTP
server, FTP server, or Gopher server, it is now possible to supplement or
replace the URLMON modules with COM objects that serve as an intermediary
between URLMON and the data source, whatever it might be. The pieces of
this new puzzle are called Pluggable Protocol Handlers and they are basic
COM objects that implement and use a basic set of documented interfaces.
For example, URLMON implements the "http://" protocol as an internal COM
object that uses WinInet to retrieve HTTP data from the wire and pass it on
to the main implementation layer that abstracts the data into a single
BindToStorage or BindToObject call. All told, Internet Explorer 4 ships with 10 AsynchPPs,
some of which are quite useful, such as "res://" and "mailto://".
When AsynchPPs are thought of as simple middlemen, their responsibilities
are very clear. As input, the AsynchPP must be able to parse and understand
a particular URL. AsynchPPs are registered according to the protocol they
implement. As per the URL spec, the "protocol" for data retrieval is
specified by the prefix before the colon on the URL string, as in
"protocol://resource". So when a particular protocol is asked for, URLMON
will pass the whole URL string in to the AsynchPP and expect it to
understand the rest of the syntax.
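As a rough sketch of the first parsing step, the following portable C++ splits a URL at the first colon into the protocol prefix and the remainder. The function name SplitScheme and the "etc" prefix used in the note below are assumptions for illustration only; the URL form that EtcProtocol actually accepts is not shown here.

```cpp
#include <string>

// Hypothetical helper: split "protocol:rest" at the first colon.
// Returns false when there is no colon or the prefix is empty, in which
// case the output strings are left untouched.
bool SplitScheme(const std::string& url, std::string& scheme, std::string& rest)
{
    const std::string::size_type colon = url.find(':');
    if (colon == std::string::npos || colon == 0)
        return false;
    scheme = url.substr(0, colon);
    rest = url.substr(colon + 1);
    return true;
}
```

Given "etc://www.example.com/page.htm", this yields "etc" and "//www.example.com/page.htm". Interpreting the remainder is then entirely up to the handler.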
As output, AsynchPPs are expected to produce a stream of raw data bytes
that can be read in much the same manner as the IStream interface can be
read. How the AsynchPP gets those bytes is its business. The whole point of
having AsynchPPs is to allow for the possibility of new protocol forms
other than the standard HTTP. Because the AsynchPP's requirements are so
simple, anything is possible. Ideas include a "SQL://" protocol handler
which could possibly retrieve data from a SQL table and format it into HTML
output or even a "my http://" protocol handler which defines a new
application level protocol over TCP/IP for the transfer of documents or
data back and forth.
Note that Pluggable Protocols actually have been around since Internet
Explorer 3. Take a look at the DevStudio 5.0 Help system, and you'll notice
a working protocol handler. However, the architecture is now much more
complete and allows for a greater degree of flexibility, particularly with
the protocol string format (no more "mk:@progid://".)
The Interfaces
The interfaces for an AsynchPP are fairly simple and well documented in the
Internet Client SDK. An AsynchPP implements mainly one interface,
IInternetProtocol, with an optional second interface,
IInternetProtocolInfo.
IInternetProtocol is actually derived from another interface,
IInternetProtocolRoot, but currently almost all AsynchPPs will want to
implement the full meal. The main methods on this interface are Start and
Read. URLMON uses IInternetProtocol::Start to ask the AsynchPP to begin
obtaining the data. Later, once the data is available, URLMON will use
IInternetProtocol::Read to get the data bits from the AsynchPP. In a sense,
IInternetProtocol is how URLMON talks to the AsynchPP.
URLMON in turn implements one main interface, IInternetProtocolSink, that
AsynchPPs can use to talk back to URLMON. The three methods of main concern
to EtcProtocol are ReportProgress, ReportData, and ReportResult.
IInternetProtocolSink::ReportProgress is used to report information about
the incoming data but not that the data is available itself. The suggested
or verified MediaType of the data is a good example of such a report.
IInternetProtocolSink::ReportData is used to let URLMON know about data
that is available for reading via IInternetProtocol::Read. Last,
IInternetProtocolSink::ReportResult is used to report errors during a data
request.
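The conversation between URLMON and the handler can be pictured with a small mock. The types below are simplified stand-ins written in portable C++, not the real COM interfaces: the actual methods take more parameters and return HRESULTs, and the names SinkSketch and ProtocolSketch are hypothetical. The sketch also assumes that ReportResult is called once at the end with a success indicator, and it holds all the data in memory up front.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Simplified stand-in for IInternetProtocolSink (hypothetical signatures).
struct SinkSketch {
    std::string mimeType;       // last MIME type reported
    std::size_t available = 0;  // total bytes announced so far
    bool finished = false;      // set once a result has been reported
    bool succeeded = false;

    void ReportProgress(const std::string& mime) { mimeType = mime; }
    void ReportData(std::size_t totalAvailable) { available = totalAvailable; }
    void ReportResult(bool ok) { finished = true; succeeded = ok; }
};

// Simplified stand-in for a handler's IInternetProtocol implementation.
class ProtocolSketch {
public:
    // Start: obtain the data (here it is simply handed in) and tell the
    // sink what kind of data it is and how much can be read.
    void Start(const std::string& data, SinkSketch& sink)
    {
        buffer_ = data;
        readPos_ = 0;
        sink.ReportProgress("text/html");
        sink.ReportData(buffer_.size());
        sink.ReportResult(true);
    }

    // Read: copy up to cb bytes into dest and return the number copied.
    // A return value of 0 means there is no more data.
    std::size_t Read(char* dest, std::size_t cb)
    {
        const std::size_t n = std::min(cb, buffer_.size() - readPos_);
        std::memcpy(dest, buffer_.data() + readPos_, n);
        readPos_ += n;
        return n;
    }

private:
    std::string buffer_;
    std::size_t readPos_ = 0;
};
```

In the real architecture the data usually arrives in pieces, so ReportData would be called several times and Read would be called in between.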
In some respects, IInternetProtocolSink matches up quite closely to the
IBindStatusCallback (IBSC) interface that URLMON will use to talk in turn
to the ultimate client of the download. ReportProgress corresponds to
IBSC::OnProgress, even so far as taking the same BINDSTATUS enumeration as
one of the parameters. However, ReportProgress should refrain from passing
any DATA notifications such as BINDSTATUS_BEGINDOWNLOADDATA; that is
ReportData's job. (The only really important ReportProgress notification is
BINDSTATUS_MIMETYPEAVAILABLE, which we'll discuss later.) ReportData
matches closely with the IBSC::OnDataAvailable method, and passes the same
BSCF flags. Last, ReportResult is like OnStopBinding - it reports an
HRESULT and string to URLMON whenever something goes wrong inside. Usually,
the string you pass won't ever see the light of day, but it is - as always
in programming - good practice to return reasonable error codes. This
illustration is intended as an analogy only, but it should be easily
visible how calls from the AsynchPP on IInternetProtocolSink are very
likely to have a direct effect on URLMON's calls to a client's
IBindStatusCallback.
In our simple example, we only have two main C++ classes - CEtcPlugProt and
CBindStatusCallback2. CEtcPlugProt implements the interfaces discussed
above. CBindStatusCallback2 is used to download data for the protocol.
The Download & CBindStatusCallback2
The interfaces in the last section describe pretty concisely one side of
the coin for AsynchPPs -- how to talk to URLMON. The other side of the coin
is how the AsynchPP gets its data. When it comes to the AsynchPP's method
for getting the information it needs, there are no requirements or specified
interfaces. AsynchPPs are allowed to retrieve their data however they want.
That is why they are so flexible. In fact, AsynchPPs don't even have to
download data. Some AsynchPPs, such as the Internet Explorer 4 "mailto://" protocol, are
utility protocols, not data readers. "mailto://" invokes the registered mail
application with the specified arguments and then tells URLMON that it is
done without ever reporting any data available.
Given, though, that most protocols will need to do some sort of data
transfer, and that most data transfer in an Internet situation will be
unpredictable or slow, AsynchPPs should get their data asynchronously.
That's why they're called Asynchronous Pluggable Protocols after all. With
asynchronous behavior, there comes a set of expected capabilities that
allow the protocol to work effectively in a browser. Four demonstrated in
EtcProtocol are abort functionality, progress notifications, data size, and
failure notification.
Be very careful when looking at how EtcProtocol retrieves its data.
EtcProtocol is intended to be a simple demonstration so it uses a simple
data retrieval process. That process is well encapsulated in the
CBindStatusCallback class already implemented in ATL, which uses a URL
Moniker based download and supplies an IBindStatusCallback for asynchronous
notifications. Basically, it saves us from most of the ugly, hard to read,
voluminous code necessary to do asynchronous communications of one type or
another.
It should be immediately apparent that EtcProtocol, which URLMON asks to
retrieve data on its behalf, turns right around and asks URLMON to get that
data. A true AsynchPP should instead implement the data retrieval more
directly. In most protocol cases, URLMON couldn't do the work anyway; a
"sql://" protocol would have to use OLE DB or ADO calls
instead of a URL Moniker download. For things supported in WinInet, direct
WinInet calls would be superior. The goal in EtcProtocol, though, was to
support a wide range of methods for getting HTML files easily. To reinvent
the wheel, so to speak, just wasn't necessary. So for EtcProtocol we
probably want a URL Moniker, which supports just about everything under the
sun now (and even other pluggable protocols!).
ATL's CBindStatusCallback is exactly what we want - an encapsulated class
that gets us the data we want asynchronously and stays out of our hair.
Unfortunately, it's too encapsulated and only informs us about data
availability and nothing else. So EtcProtocol instead uses a derived class,
CBindStatusCallback2, to make some "extensions" so to speak to the default
class. CBindStatusCallback2 relies on a handful of call-up functions where
it calls up to the using class (CEtcPlugProt) to report various download
happenings. The call-up mechanism was chosen because most of these call-up
functions would need access to the data members of the using class. While
it seriously breaks the encapsulation of the CBindStatusCallback2 class, it
was quick and easy and isn't all that difficult to comply with.
Also note that EtcProtocol might be better implemented as a MIME filter.
The URLMON architecture supports two other pluggable protocol types that
are used in different situations. MIME filters can be used to filter all
data seen by URLMON of a particular MIME type. EtcProtocol is much like
this - it reads only data of type "text/html" and filters it to a different
form. However, the difference is that we don't want all HTML data filtered,
just a particular resource and only on demand. Refer to the Pluggable
Protocol documentation in the Internet Client SDK for a further discussion
of MIME filters and the other alternative, namespace handlers.
Abort Functionality: When working with URL Monikers, a client can abort a
download by calling the IBinding::Abort method on the binding object that
arrives through OnStartBinding. When requested by the
IInternetProtocol::Abort method, EtcProtocol in turn stops its own internal
download by using the CBindStatusCallback2::m_spBinding member, which holds
a reference to the binding object.
Progress Notifications: By passing information from the IBSC::OnProgress
notification up to the CEtcPlugProt::OnProgress function, EtcProtocol can
in turn selectively pass some information on to URLMON about the download
data as it is arriving. As discussed already, only a very small subset of
notifications should be passed on through ReportProgress, and only one is
really important.
Data size: A big concern for all protocols, no matter what medium they are
using for obtaining data, is a foreknowledge of the data size. In an
asynchronous, slow-transfer situation it is very considerate to give the
user some sort of progress information. In order to know how far the
download has progressed at a certain point, URLMON needs to know how much
is left to go.
In HTTP situations, this information is easily available. Most decent web
servers will use the standard HTTP CONTENT-LENGTH header to indicate the
proper size of the data. URLMON reads this header and passes the info on to
IBSC::OnProgress during DATA notifications. EtcProtocol grabs this
information and remembers it. We don't rely on this to know when we're done
processing the data, but it is useful to keep track of for debugging
purposes.
A complication in EtcProtocol is that we know the size of the incoming
data, but there is no way to know the exact size of the processed data
we'll pass to URLMON. We handle this by guessing and then outright
lying if our first guess was wrong. The progress indicator isn't quite
correct, but at least it moves.
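One way to picture the guess-and-adjust approach is the small tracker below. It is a hypothetical sketch in portable C++, not the bookkeeping used in the sample: the class name ProgressEstimate and its policy (first guess equals the incoming size, then raise the maximum whenever the output overruns it) are assumptions chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: tracks the (done, max) pair a handler might report
// when the final size of its processed output is not known in advance.
class ProgressEstimate {
public:
    // First guess: the processed output is as large as the incoming data.
    explicit ProgressEstimate(std::size_t incomingSize) : max_(incomingSize) {}

    // Record newly processed bytes. If the running total overruns the
    // current guess, raise the maximum so progress never exceeds 100%.
    void Add(std::size_t processedBytes)
    {
        done_ += processedBytes;
        if (done_ > max_)
            max_ = done_;
    }

    // At the end of the download, make the maximum match what was produced.
    void Finish() { max_ = done_; }

    std::size_t done() const { return done_; }
    std::size_t max() const { return max_; }

private:
    std::size_t done_ = 0;
    std::size_t max_;
};
```

Because encoding "<" and ">" makes the output longer than the input, a tracker like this will usually have to raise its maximum at least once.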
Failure Notification: CBindStatusCallback2::OnStopBinding calls over to
CEtcPlugProt::OnBindingFailure whenever an error occurs during the data
download. This translates pretty well into an
IInternetProtocolSink::ReportResult call.
Threading Possibilities
Everyone knows how big a pain it would be if browsers locked up until all
the data on a page is available. Given the current Internet Explorer 4 architecture, all
the blame for such a tragedy would clearly lie with the AsynchPP, the
little guy doing the download. Fortunately, the asynchronous architecture
of AsynchPPs, all the complication you've just been reading about in these
pages, allows for all the necessary user intervention during a lengthy
download.
To further improve matters, AsynchPPs with lengthy download times should
usually spawn a new thread for their download. This thread would live just
for the lifetime of the download and then go away. The IInternetProtocol
interfaces facilitate inter-thread communication through the
IInternetProtocolSink::Switch and IInternetProtocol::Continue methods. When
a worker download thread needs to send data to the main apartment UI
thread, the worker thread can pack the data into the PROTOCOLDATA structure
and call Switch. URLMON will handle the inter-thread communication
necessary to get that PROTOCOLDATA structure and its associated bits back
to the AsynchPP on the UI thread through a call on
IInternetProtocol::Continue.
EtcProtocol briefly demonstrates how this works in its Start method. When
URLMON requests asynchronous behavior, EtcProtocol doesn't really comply.
It fully parses the URL and prepares itself for an eventual download. But
before binding, it calls Switch and waits for a callback on Continue before
going through with the bind.
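The order of calls in that deferred start can be sketched with a single-threaded mock. Everything here is a simplified stand-in written in portable C++: the real PROTOCOLDATA structure carries more fields, URLMON performs the actual marshaling between threads, and the names ProtocolDataSketch, SwitchingSink, DeferredProtocol, Pump, and kBeginBind are hypothetical.

```cpp
#include <queue>
#include <string>

// Stand-in for PROTOCOLDATA; only a state value is modeled here.
struct ProtocolDataSketch { int state; };

const int kBeginBind = 1;  // hypothetical state code meaning "start the bind"

class DeferredProtocol;

// Stand-in for URLMON's sink: Switch queues the data, and Pump later hands
// each queued item back to the protocol through Continue.
class SwitchingSink {
public:
    void Switch(const ProtocolDataSketch& data) { pending_.push(data); }
    void Pump(DeferredProtocol& protocol);  // defined after DeferredProtocol

private:
    std::queue<ProtocolDataSketch> pending_;
};

class DeferredProtocol {
public:
    enum State { Idle, WaitingForContinue, Bound };

    // Start: remember the URL and ask for a callback instead of binding now.
    void Start(const std::string& url, SwitchingSink& sink)
    {
        url_ = url;
        state_ = WaitingForContinue;
        ProtocolDataSketch data = { kBeginBind };
        sink.Switch(data);
    }

    // Continue: the callback has arrived, so go through with the bind
    // (modeled here as recording which URL was bound).
    void Continue(const ProtocolDataSketch& data)
    {
        if (data.state == kBeginBind && state_ == WaitingForContinue) {
            boundUrl_ = url_;
            state_ = Bound;
        }
    }

    State state() const { return state_; }
    const std::string& boundUrl() const { return boundUrl_; }

private:
    State state_ = Idle;
    std::string url_;
    std::string boundUrl_;
};

inline void SwitchingSink::Pump(DeferredProtocol& protocol)
{
    while (!pending_.empty()) {
        const ProtocolDataSketch data = pending_.front();
        pending_.pop();
        protocol.Continue(data);
    }
}
```

In a handler with a real worker thread, Switch would be called on the worker and Continue would arrive on the UI thread; the queue above only models the hand-off.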
MIME Types: It Could Have Been Simpler
After all of that discussion, it is probably inappropriate to say that
there was an easier way to filter the HTML data. If the ultimate goal of
EtcProtocol was to take "text/html" data and make it show up in the browser
just as source text, EtcProtocol could play some MIME games. Internet Explorer 4 is capable
of showing "text/plain" documents that don't have HTML tags, just like any
standard text viewer. If there were just some way to convince URLMON that
the HTML data that EtcProtocol is downloading from the wire is in fact
just plain text, then this would save EtcProtocol the effort of
parsing through that data to hide the tags.
At first guess, it might seem that all that is necessary is for the Web
server to send a CONTENT-TYPE header of "text/plain" instead of
"text/html". This is a good thought but Internet Explorer 4 is too smart for that kind of
trick. In fact, Internet Explorer will actually sniff suspect data to second-guess what it
is actually receiving. It does this to try to clear up confusion with web
servers that aren't smart enough to report the correct data type of the
documents they are sending. If Internet Explorer receives a "text/plain" document that has
the "<HTML>" tag, it will likely reinterpret the MIME type to be
"text/html". So the content-header idea won't really work.
AsynchPPs have a second recourse - they can call ReportProgress with
BINDSTATUS_MIMETYPEAVAILABLE and suggest their version of the story.
Nevertheless, the results will be the same. URLMON sniffs the data after it
has been passed from the pluggable protocol, even if the protocol suggested
a MIME type. So BINDSTATUS_MIMETYPEAVAILABLE isn't the answer either.
But this is close. In fact, there is a special BINDSTATUS value just for
this situation: BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE. URLMON turns off data
sniffing when it receives this status notification.
BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE indicates that the pluggable protocol
handler has checked and sniffed the data itself and is convinced that the
MIME type it reports is correct. In fact, this can be tested. If
EtcProtocol calls ReportProgress for BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE
and reports "text/plain", Internet Explorer will show the data as plain text. The parsing
code could be removed and EtcProtocol would still function like a "view
source" command of sorts.
EtcProtocol is coded as it is, though, to demonstrate how a protocol can
parse and manipulate data it receives as well as how it can store that data
in an IStream object. Also, the handful of changes that
CEtcPlugProt::OnData makes could easily be expanded to a whole host of
features such as color coding of tags and script. This wouldn't show up in
plain text.
Additional query words:
APP AsynchPP
Keywords : kbfile kbIE400 kbVC500
Version : WINDOWS:2.0,2.1,4.0,4.01,4.01 SP1
Platform : WINDOWS
Issue type :
Last Reviewed: July 1, 1999