2110
PROPOSED STANDARD

MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML) (Obsoleted)

Authors: J. Palme, A. Hopmann
Date: March 1997
Area: app
Working Group: mhtml
Stream: IETF
Obsoleted by: RFC 2557

Abstract

This document describes a set of guidelines that will allow conforming mail user agents to be able to send, deliver and display these objects, such as HTML objects, that can contain links represented by URIs. [STANDARDS-TRACK]

Network Working Group                                          J. Palme
Request for Comments: 2110                     Stockholm University/KTH
Category: Standards Track                                    A. Hopmann
                                                  Microsoft Corporation
                                                             March 1997


 <span class="h1">MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)</span>

Status of this Document

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   Although HTML [<a href="./rfc1866">RFC 1866</a>] was designed within the context of MIME,
   more than the specification of HTML as defined in <a href="./rfc1866">RFC 1866</a> is needed
   for two electronic mail user agents to be able to interoperate using
   HTML as a document format. These issues include the naming of objects
   that are normally referred to by URIs, and the means of aggregating
   objects that go together. This document describes a set of guidelines
   that will allow conforming mail user agents to be able to send,
   deliver and display these objects, such as HTML objects, that can
   contain links represented by URIs. In order to be able to handle
   inter-linked objects, the document uses the MIME type
   multipart/related and specifies the MIME content-headers
   "Content-Location" and "Content-Base".

Table of Contents

   <a href="#section-1">1</a>. Introduction..............................................  <a href="#page-2">2</a>
   <a href="#section-2">2</a>. Terminology...............................................  <a href="#page-3">3</a>
      <a href="#section-2.1">2.1</a> Conformance requirement terminology...................  <a href="#page-3">3</a>
      <a href="#section-2.2">2.2</a> Other terminology.....................................  <a href="#page-4">4</a>
   <a href="#section-3">3</a>. Overview..................................................  <a href="#page-5">5</a>
   4. The Content-Location and Content-Base MIME Content Headers  6
      <a href="#section-4.1">4.1</a> MIME content headers..................................  <a href="#page-6">6</a>
      <a href="#section-4.2">4.2</a> The Content-Base header...............................  <a href="#page-7">7</a>
      <a href="#section-4.3">4.3</a> The Content-Location Header...........................  <a href="#page-7">7</a>
      <a href="#section-4.4">4.4</a> Encoding of URIs in e-mail headers....................  <a href="#page-8">8</a>
   <a href="#section-5">5</a>. Base URIs for resolution of relative URIs.................  <a href="#page-8">8</a>
   <a href="#section-6">6</a>. Sending documents without linked objects..................  <a href="#page-9">9</a>
   <a href="#section-7">7</a>. Use of the Content-Type: Multipart/related................  <a href="#page-9">9</a>
   <a href="#section-8">8</a>. Format of Links to Other Body Parts....................... <a href="#page-11">11</a>



<span class="grey">Palme & Hopmann             Standards Track                     [Page 1]</span>

<span id="page-2" ></span>
<span class="grey"><a href="./rfc2110">RFC 2110</a>                         MHTML                        March 1997</span>


      <a href="#section-8.1">8.1</a> General principle..................................... <a href="#page-11">11</a>
      <a href="#section-8.2">8.2</a> Use of the Content-Location header.................... <a href="#page-11">11</a>
      <a href="#section-8.3">8.3</a> Use of the Content-ID header and CID URLs............. <a href="#page-12">12</a>
   <a href="#section-9">9</a> Examples................................................... <a href="#page-12">12</a>
      9.1 Example of a HTML body without included linked objects 12
      9.2 Example with absolute URIs to an embedded GIF picture  13
      9.3 Example with relative URIs to an embedded GIF picture  13
      9.4 Example using CID URL and Content-ID header to an
          embedded GIF picture.................................. <a href="#page-14">14</a>
   <a href="#section-10">10</a>. Content-Disposition header............................... <a href="#page-15">15</a>
   <a href="#section-11">11</a>. Character encoding issues and end-of-line issues......... <a href="#page-15">15</a>
   <a href="#section-12">12</a>. Security Considerations.................................. <a href="#page-16">16</a>
   <a href="#section-13">13</a>. Acknowledgments.......................................... <a href="#page-17">17</a>
   <a href="#section-14">14</a>. References............................................... <a href="#page-18">18</a>
   <a href="#section-15">15</a>. Author's Address......................................... <a href="#page-19">19</a>

Mailing List Information

   Further discussion on this document should be done through the
   mailing list [email protected].

   To subscribe to this list, send a message to
      [email protected]
   which contains the text
   SUB MHTML <your name (not your e-mail address)>

   Archives of this list are available by anonymous ftp from
      <a href="FTP://SEGATE.SUNET.SE/lists/mHTML/">FTP://SEGATE.SUNET.SE/lists/mHTML/</a>
   The archives are also available by e-mail. Send a message to
   [email protected] with the text "INDEX MHTML" to get a list
   of the archive files, and then a new message "GET <file name>" to
   retrieve the archive files.

   Comments on less important details may also be sent to the editor,
   Jacob Palme <jpalme@dsv.su.se>.

   More information may also be available at URL:
   <a href="HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML">HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML</a>

<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>

   There are a number of document formats, HTML [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>], PDF [<a href="#ref-PDF" title=""Portable Document Format Reference Manual, Version 1.1"">PDF</a>] and
   VRML for example, which provide links using URIs for their
   resolution. There is an obvious need to be able to send documents in
   these formats in e-mail [RFC821=SMTP, <a href="./rfc822">RFC822</a>]. This document gives
   additional specifications on how to send such documents in MIME [RFC
   1521=MIME1] e-mail messages. This version of this standard was based
   on full consideration only of the needs for objects with links in the
<span id="page-3" ></span>
   Text/HTML media type (as defined in <a href="./rfc1866">RFC 1866</a> [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>]), but the
   standard may still be applicable also to other formats for sets of
   interlinked objects, linked by URIs. There is no conformance
   requirement that implementations claiming conformance to this
   standard be able to handle URIs in document formats other than
   HTML.

   URIs in documents in HTML and other similar formats reference other
   objects and resources, either embedded or directly accessible through
   hypertext links. When mailing such a document, it is often desirable
   to also mail all of the additional resources that are referenced in
   it; those elements are necessary for the complete interpretation of
   the primary object.

   An alternative way for sending an HTML document or other object
   containing URIs in e-mail is to only send the URL, and let the
   recipient look up the document using HTTP. That method is described
   in [<a href="#ref-URLBODY" title=""Definition of the URL MIME External-Body Access-Type"">URLBODY</a>] and is not described in this document.

   An informational RFC will at a later time be published as a
   supplement to this standard. The informational RFC will discuss
   implementation methods and some implementation problems. Implementors
   are recommended to read this informational RFC when developing
   implementations of the MHTML standard. This informational RFC is,
   when this RFC is published, still in IETF draft status, and will stay
   that way for at least six months in order to gain more implementation
   experience before it is published.

<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Terminology</span>

<span class="h3"><a class="selflink" id="section-2.1" href="#section-2.1">2.1</a> Conformance requirement terminology</span>

   This specification uses the same words as <a href="./rfc1123">RFC 1123</a> [<a href="#ref-HOSTS" title=""Requirements for Internet Hosts -- Application and Support"">HOSTS</a>] for
   defining the significance of each particular requirement. These words
   are:

   MUST    This word or the adjective "required" means that the item is
           an absolute requirement of the specification.

   SHOULD  This word or the adjective "recommended" means that there may
           exist valid reasons in particular circumstances to ignore this
           item, but the full implications should be understood and the
           case carefully weighed before choosing a different course.

<span id="page-4" ></span>

   MAY     This word or the adjective "optional" means that this item is
           truly optional. One vendor may choose to include the item
           because a particular marketplace requires it or because it
           enhances the product, for example; another vendor may omit
           the same item.

   An implementation is not compliant if it fails to satisfy one or more
   of the MUST requirements for the protocols it implements. An
   implementation that satisfies all the MUST and all the SHOULD
   requirements for its protocols is said to be "unconditionally
   compliant"; one that satisfies all the MUST requirements but not all
   the SHOULD requirements for its protocols is said to be
   "conditionally compliant."

<span class="h3"><a class="selflink" id="section-2.2" href="#section-2.2">2.2</a> Other terminology</span>

   Most of the terms used in this document are defined in other RFCs.

   Absolute URI,         See <a href="./rfc1808">RFC 1808</a> [<a href="#ref-RELURL" title=""Relative Uniform Resource Locators"">RELURL</a>].
   AbsoluteURI

   CID                   See [<a href="#ref-MIDCID" title=""Content-ID and Message-ID Uniform Resource Locators"">MIDCID</a>].

   Content-Base          See <a href="#section-4.2">section 4.2</a> below.

   Content-ID            See [<a href="#ref-MIDCID" title=""Content-ID and Message-ID Uniform Resource Locators"">MIDCID</a>].

   Content-Location      MIME message or content part header with the
                         URI of the MIME message or content part body,
                         defined in <a href="#section-4.3">section 4.3</a> below.

   Content-Transfer-     Conversion of a text into 7-bit octets as
   Encoding              specified in [<a href="#ref-MIME1" title=""MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies"">MIME1</a>].

   CR                    See [<a href="./rfc822" title=""Standard for the format of ARPA Internet text messages."">RFC822</a>].

   CRLF                  See [<a href="./rfc822" title=""Standard for the format of ARPA Internet text messages."">RFC822</a>].

   Displayed text        The text shown to the user reading a document
                         with a web browser. This may be different from
                         the HTML markup, see the definition of HTML
                         markup below.

   Header                Field in a message or content heading specifying
                         the value of one attribute.

<span id="page-5" ></span>

   Heading               Part of a message or content before the first
                         CRLFCRLF, containing formatted fields with
                         attributes of the message or content.

   HTML                  See <a href="./rfc1866">RFC 1866</a> [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>].

   HTML Aggregate        An HTML object together with some or all of the
                         objects to which that HTML object contains
                         hyperlinks.

   HTML markup           A file containing HTML encodings as specified
                         in [HTML] which may be different from the
                         displayed text which a person using a web
                         browser sees. For example, the HTML markup
                         may contain "&lt;" where the displayed text
                         contains the character "<".

   LF                    See [<a href="./rfc822" title=""Standard for the format of ARPA Internet text messages."">RFC822</a>].

   MIC                   Message Integrity Codes, codes used to verify
                         that a message has not been modified.

   MIME                  See <a href="./rfc1521">RFC 1521</a> [<a href="#ref-MIME1" title=""MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies"">MIME1</a>], [<a href="#ref-MIME2" title=""Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"">MIME2</a>].

   MUA                   Messaging User Agent.

   PDF                   Portable Document Format, see [<a href="#ref-PDF" title=""Portable Document Format Reference Manual, Version 1.1"">PDF</a>].

   Relative URI,         See <a href="./rfc1866">RFC 1866</a> [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>] and <a href="./rfc1808">RFC 1808</a> [<a href="#ref-RELURL" title=""Relative Uniform Resource Locators"">RELURL</a>].
   RelativeURI

   URI, absolute and     See <a href="./rfc1866">RFC 1866</a> [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>].
   relative

   URL                   See <a href="./rfc1738">RFC 1738</a> [<a href="#ref-URL" title=""Uniform Resource Locators (URL)"">URL</a>].

   URL, relative         See [<a href="#ref-RELURL" title=""Relative Uniform Resource Locators"">RELURL</a>].

   VRML                  Virtual Reality Markup Language.

<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Overview</span>

   An aggregate document is a MIME-encoded message that contains a root
   document as well as other data that is required in order to represent
   that document (inline pictures, style sheets, applets, etc.).
   Aggregate documents can also include additional elements that are
   linked to the first object.  It is important to keep in mind the
   differing needs of several audiences. Mail sending agents might send
<span id="page-6" ></span>
   aggregate documents as an encoding of normal day-to-day electronic
   mail. Mail sending agents might also send aggregate documents when a
   user wishes to mail a particular document from the web to someone
   else. Finally, mail sending agents might send aggregate documents as
   automatic responders, providing access to WWW resources for non-IP
   connected clients.

   Mail receiving agents also have several differing needs. Some mail
   receiving agents might be able to receive an aggregate document and
   display it just as any other text content type would be displayed.
   Others might have to pass this aggregate document to a browsing
   program, and provisions need to be made to make this possible.

   Finally, several other constraints on the problem arise. It is
   important that a document can be signed, transmitted to a client,
   and displayed with a minimum risk of breaking the message integrity
   (MIC) check that is part of the signature.

<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. The Content-Location and Content-Base MIME Content Headers</span>

<span class="h3"><a class="selflink" id="section-4.1" href="#section-4.1">4.1</a> MIME content headers</span>

   In order to resolve URI references to other body parts, two MIME
   content headers are defined, Content-Location and Content-Base. Both
   these headers can occur in any message or content heading, and will
   then be valid within this heading and for its content.

   In practice, at present only those URIs which are URLs are used, but
   it is anticipated that other forms of URIs will in the future be
   used.

   The syntax for these headers is, using the syntax definition tools
   from [<a href="./rfc822" title=""Standard for the format of ARPA Internet text messages."">RFC822</a>]:

       content-location ::= "Content-Location:" ( absoluteURI |
                            relativeURI )

       content-base ::= "Content-Base:" absoluteURI

   where URI is at present (June 1996) restricted to the syntax for URLs
   as defined in <a href="./rfc1738">RFC 1738</a> [<a href="#ref-URL" title=""Uniform Resource Locators (URL)"">URL</a>].
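
   As an illustration only, and not as part of this specification, the
   grammar above can be checked with a short Python sketch. The helper
   names is_absolute_uri and check_header are hypothetical. Treating a
   value as absolute whenever urllib.parse.urlsplit reports a scheme is
   a simplifying assumption, not a full parser for the URL syntax of
   RFC 1738.

```python
from urllib.parse import urlsplit


def is_absolute_uri(value: str) -> bool:
    """Assumption: a value is absolute when it carries a scheme such as 'http'."""
    return bool(urlsplit(value.strip()).scheme)


def check_header(name: str, value: str) -> bool:
    """Return True if the value is acceptable for the named header."""
    lowered = name.lower()
    if lowered == "content-base":
        # content-base ::= "Content-Base:" absoluteURI
        return is_absolute_uri(value)
    if lowered == "content-location":
        # content-location accepts an absolute or a relative URI,
        # so any non-empty value passes this coarse check.
        return bool(value.strip())
    raise ValueError("not a header covered by this sketch: " + name)
```

   A relative value such as "foo1.bar1" is accepted for
   Content-Location but rejected for Content-Base.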

   These two headers are valid only for exactly the content heading or
   message heading where they occur and its text. They are thus not
   valid for the parts inside multipart headings, and are thus
   meaningless in multipart headings.

<span id="page-7" ></span>

   These two headers may occur both inside and outside of a
   multipart/related part.

<span class="h3"><a class="selflink" id="section-4.2" href="#section-4.2">4.2</a> The Content-Base header</span>

   The Content-Base gives a base for relative URIs occurring in other
   heading fields and in HTML documents which do not have any BASE
   element in their HTML code. Its value MUST be an absolute URI.

   Example showing which Content-Base is valid where:

    Content-Type: Multipart/related; boundary="boundary-example-1";
                  type=Text/HTML; start=foo2*[email protected]
     ; A Content-Base header cannot be placed here, since this is a
     ; multipart MIME object.

    --boundary-example-1

    Part 1:
    Content-Type: Text/HTML; charset=US-ASCII
    Content-ID: <foo2*[email protected]>
    Content-Location: <a href="http://www.ietf.cnir.reston.va.us/images/foo1.bar1">http://www.ietf.cnir.reston.va.us/images/foo1.bar1</a>
    ;  This Content-Location must contain an absolute URI, since no base
    ;  is valid here.

    --boundary-example-1

    Part 2:
    Content-Type: Text/HTML; charset=US-ASCII
    Content-ID: <foo4*[email protected]>
    Content-Location: foo1.bar1   ; The Content-Base below applies to
                                  ; this relative URI
    Content-Base: <a href="http://www.ietf.cnri.reston.va.us/images/">http://www.ietf.cnri.reston.va.us/images/</a>

    --boundary-example-1--
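
   The resolution that applies to Part 2 of the example above can be
   sketched in Python as follows. This is an illustration under stated
   assumptions: the function name resolve_content_location is
   hypothetical, and urllib.parse.urljoin follows a later revision of
   the relative-resolution rules than RFC 1808, although the two agree
   for a simple case like this one.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_content_location(content_location: str,
                             content_base: Optional[str]) -> str:
    """Resolve a Content-Location value against an optional Content-Base."""
    if content_base is None:
        # No base is valid for this heading, so the value is left unchanged.
        return content_location
    return urljoin(content_base, content_location)


# Part 2 of the example: a relative Content-Location plus a Content-Base.
print(resolve_content_location(
    "foo1.bar1", "http://www.ietf.cnri.reston.va.us/images/"))
```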

<span class="h3"><a class="selflink" id="section-4.3" href="#section-4.3">4.3</a> The Content-Location Header</span>

   The Content-Location header specifies the URI that corresponds to the
   content of the body part in whose heading the header is placed. Its
   value CAN be an absolute or relative URI. Any URI or URL scheme may
   be used, but use of non-standardized URI or URL schemes might entail
   some risk that recipients cannot handle them correctly.

   The Content-Location header can be used to indicate that the data
   sent under this heading is also retrievable, in identical format,
   through normal use of this URI. If used for this purpose, it must
   contain an absolute URI or be resolvable, through a Content-Base
<span id="page-8" ></span>
   header, into an absolute URI. In this case, the information sent in
   the message can be seen as a cached version of the original data.

   The header can also be used for data which is not available to some
   or all recipients of the message, for example if the header refers to
   an object which is only retrievable using this URI in a restricted
   domain, such as within a company-internal web space. The header can
   even contain a fictitious URI and need in that case not be globally
   unique.

   Example:

   Content-Type: Multipart/related; boundary="boundary-example-1";
                    type=Text/HTML

      --boundary-example-1

      Part 1:
      Content-Type: Text/HTML; charset=US-ASCII

      ... ... <IMG SRC="fiction1/fiction2"> ... ...

      --boundary-example-1

      Part 2:
      Content-Type: Text/HTML; charset=US-ASCII
      Content-Location: fiction1/fiction2

      --boundary-example-1--

<span class="h3"><a class="selflink" id="section-4.4" href="#section-4.4">4.4</a> Encoding of URIs in e-mail headers</span>

   Since MIME header fields have a limited length and URIs can get quite
   long, these lines may have to be folded. If such folding is done, the
   algorithm defined in [<a href="#ref-URLBODY" title=""Definition of the URL MIME External-Body Access-Type"">URLBODY</a>] <a href="#section-3.1">section 3.1</a> should be employed.
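
   The folding algorithm itself is defined in [URLBODY] and is not
   reproduced here. The receiving side is simpler: before a folded URI
   is compared with anything, the linear white space introduced by
   folding is removed. A minimal Python sketch, assuming that LWSP
   means spaces, tabs, CR and LF, and using the hypothetical name
   unfold_uri and a placeholder host name:

```python
import re

# Assumption: LWSP covers space, horizontal tab, CR and LF.
_LWSP = re.compile(r"[ \t\r\n]+")


def unfold_uri(header_value: str) -> str:
    """Remove white space that header folding inserted into a URI."""
    return _LWSP.sub("", header_value)


folded = "http://www.example.com/a/rather/long/\r\n path/to/object.gif"
print(unfold_uri(folded))
```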

<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Base URIs for resolution of relative URIs</span>

   Relative URIs inside contents of MIME body parts are resolved
   relative to a base URI. In order to determine this base URI, the
   first-applicable method in the following list applies.

     (a) There is a base specification inside the MIME body part
          containing the link which resolves relative URIs into absolute
          URIs. For example, HTML provides the BASE element for this.

     (b) There is a Content-Base header (as defined in <a href="#section-4.2">section 4.2</a>),
          specifying the base to be used.

<span id="page-9" ></span>

     (c) There is a Content-Location header in the heading of the body
          part which can then serve as the base in the same way as the
          requested URI can serve as a base for relative URIs within a
          file retrieved via HTTP [<a href="#ref-HTTP" title="R. Fielding">HTTP</a>].

   When the methods above do not yield an absolute URI the procedure in
   <a href="#section-8.2">section 8.2</a> for matching relative URIs MUST be followed.
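
   The order of precedence in (a) to (c) can be sketched in Python as
   follows. This is one reading of the rules, not a normative
   algorithm. The function name determine_base and its three
   parameters are hypothetical; each parameter is None when the
   corresponding item is absent. A return value of None signals that
   no absolute base was found and that the matching procedure of
   section 8.2 applies instead.

```python
from typing import Optional
from urllib.parse import urlsplit


def determine_base(html_base: Optional[str],
                   content_base: Optional[str],
                   content_location: Optional[str]) -> Optional[str]:
    """Pick the base URI using the first applicable of methods (a), (b), (c)."""
    for candidate in (html_base, content_base, content_location):
        if candidate:
            # The first applicable method wins. It is usable as a base
            # only if it is an absolute URI (assumed: it has a scheme).
            return candidate if urlsplit(candidate).scheme else None
    return None
```

   For instance, a body part with no BASE element, no Content-Base and
   the relative Content-Location "fiction1/fiction2" yields None.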

<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Sending documents without linked objects</span>

   If a document, such as an HTML object, is sent without other objects,
   to which it is linked, it MAY be sent as a Text/HTML body part by
   itself.  In this case, multipart/related need not be used.
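
   As an illustration, a message of this kind can be built with the
   Python standard library. This is a sketch only; the function name
   build_standalone_html and the subject text are assumptions made for
   the example.

```python
from email.message import EmailMessage


def build_standalone_html(html: str) -> EmailMessage:
    """Build a message whose only body part is a Text/HTML object."""
    msg = EmailMessage()
    msg["Subject"] = "HTML document without linked objects"
    # A single text/html body; multipart/related is not needed here.
    msg.set_content(html, subtype="html", charset="us-ascii")
    return msg


print(build_standalone_html(
    "<html><body><p>Hello</p></body></html>").get_content_type())
```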

   Such a document may either not include any links, or contain links
   which the recipient resolves via ordinary net look up, or contain
   links which the recipient cannot resolve.

   Inclusion of links which the recipient has to look up through the net
   may not work for some recipients, since not all e-mail recipients
   have full internet connectivity. Also, such links may work for the
   sender but not for the recipient, for example when the link refers to
   a URI within a company-internal network not accessible from outside
   the company.

   Note that documents with links that the recipient cannot resolve MAY
   be sent, although this is discouraged. For example, two persons
   developing a new HTML page may exchange incomplete versions.

<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Use of the Content-Type: Multipart/related</span>

   If a message contains one or more MIME body parts containing links
   and also contains, as separate body parts, the data to which these
   links (as defined, for example, in <a href="./rfc1866">RFC 1866</a> [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>]) refer, then this
   whole set of body parts (referring body parts and referred-to body
   parts) SHOULD be sent within a multipart/related body part as defined
   in [<a href="#ref-REL" title=""The MIME Multipart/Related Content- Type"">REL</a>].

   The root body part of the multipart/related SHOULD be the start
   object for rendering, such as a text/html object that contains
   links to objects in other body parts, or a
   multipart/alternative of which at least one alternative resolves to
   such a start object.  Implementors are warned, however, that many
   mail programs treat multipart/alternative as if it had been
   multipart/mixed (even though MIME [<a href="#ref-MIME1" title=""MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies"">MIME1</a>] requires support for
   multipart/alternative).

<span id="page-10" ></span>

   [<a id="ref-REL">REL</a>] requires that the type attribute of the "Content-Type:
   Multipart/related" statement be the type of the root object, and this
   value can thus be "multipart/alternative". If the root is not the
   first body part within the multipart/related, [<a href="#ref-REL" title=""The MIME Multipart/Related Content- Type"">REL</a>] further requires
   that its Content-ID MUST be given in a start parameter to the
   "Content-Type: Multipart/related" header.

   When presenting the root body part to the user, the additional body
   parts within the multipart/related can be used:

       (a) For those recipients who only have e-mail but not full
           Internet access.

       (b) For those recipients who for other reasons, such as firewalls
           or the use of company-internal links, cannot retrieve the
           linked body parts through the net.

           Note that this means that you can, via e-mail, send HTML which
           includes URIs which the recipient cannot resolve via HTTP or
           other connectivity-requiring URIs.

       (c) For items which are not available on the web.

       (d) For any recipient to speed up access.

   The type parameter of the "Content-Type: Multipart/related" MUST be
   the same as the Content-Type of its root.
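
   A sending agent could assemble such a multipart/related with the
   Python standard library along the following lines. This is a sketch
   under stated assumptions, not a reference implementation: the
   function name build_aggregate is hypothetical, and the root is
   placed first so that no start parameter is needed.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_aggregate(html: str, gif_bytes: bytes,
                    gif_uri: str) -> MIMEMultipart:
    """Wrap an HTML root and one linked GIF in a multipart/related."""
    # The type parameter repeats the Content-Type of the root part.
    related = MIMEMultipart("related", type="text/html")
    # The root is attached first, so no start parameter is required.
    related.attach(MIMEText(html, "html", "us-ascii"))
    # The subtype is given explicitly so the library does not guess it.
    image = MIMEImage(gif_bytes, _subtype="gif")
    # The URI used in the HTML markup labels the body part carrying it.
    image["Content-Location"] = gif_uri
    related.attach(image)
    return related
```

   The caller passes the same URI in gif_uri that the HTML markup uses
   in its IMG element, so that a receiving agent can match the two.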

   When a sending MUA sends objects which were retrieved from the WWW,
   it SHOULD maintain their WWW URIs. It SHOULD NOT transform these URIs
   into some other URI form prior to transmitting them. This will allow
   the receiving MUA both to verify MICs included with the email
   message and to verify the documents against their WWW
   counterparts.

   In certain special cases this will not work if the original HTML
   document contains URIs as parameters to objects and applets. In such
   a case, it might be better to rewrite the document before sending it.
   This problem is discussed in more detail in the informational RFC
   which will be published as a supplement to this standard.

   This standard does not cover the case where a multipart/related
   contains links to MIME body parts outside of the current
   multipart/related or in other MIME messages, even if methods similar
   to those described in this standard are used. Implementors who
   provide such links are warned that mailers implementing this standard
   may not be able to resolve such links.

<span id="page-11" ></span>

   Within such a multipart/related, ALL different parts MUST have
   different Content-Location or Content-ID values.
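
   A sender can check this before transmission. The Python sketch
   below reflects one reading of the rule, namely that no two body
   parts may share a Content-Location value or share a Content-ID
   value. The function name duplicate_identifiers is hypothetical.

```python
from collections import Counter
from email.message import Message
from typing import Iterable, List


def duplicate_identifiers(parts: Iterable[Message]) -> List[str]:
    """List Content-Location and Content-ID values used by more than one part."""
    seen = Counter()
    for part in parts:
        for header in ("Content-Location", "Content-ID"):
            value = part.get(header)
            if value is not None:
                # Count each (header, value) pair across all parts.
                seen[(header.lower(), value.strip())] += 1
    return sorted(h + ": " + v for (h, v), n in seen.items() if n > 1)
```

   An empty result means that every body part can be told apart by
   the headers it carries.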

<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. Format of Links to Other Body Parts</span>

<span class="h3"><a class="selflink" id="section-8.1" href="#section-8.1">8.1</a> General principle</span>

   A body part, such as a text/HTML body part, may contain hyperlinks to
   objects which are included as other body parts in the same message
   and within the same multipart/related content. Often such linked
   objects are meant to be displayed inline to the reader of the main
   document; for example, objects referenced with the IMG tag in HTML
   [RFC 1866=HTML2].  New tags with this property are proposed in the
   ongoing development of HTML (example: applet, frame).

   In order to send such messages, there is a need to indicate which
   other body parts are referred to by the links in the body parts
   containing such links. For example, a body part of Content-Type:
   Text/HTML often has links to other objects, which might be included
   in other body parts in the same MIME message. The referencing of
   other body parts is done in the following way: For each body part
   containing links and each distinct URI within it, which refers to
   data which is sent in the same MIME message, there SHOULD be a
   separate body part within the current multipart/related part of the
   message containing this data. Each such body part SHOULD contain a
   Content-Location header (see <a href="#section-8.2">section 8.2</a>) or a Content-ID header (see
   <a href="#section-8.3">section 8.3</a>).

   An e-mail system which claims conformance to this standard MUST
   support receipt of multipart/related (as defined in <a href="#section-7">section 7</a>) with
   links between body parts using both the Content-Location (as defined
   in <a href="#section-8.2">section 8.2</a>) and the Content-ID method (as defined in <a href="#section-8.3">section</a>
   <a href="#section-8.3">8.3</a>).

<span class="h3"><a class="selflink" id="section-8.2" href="#section-8.2">8.2</a> Use of the Content-Location header</span>

   If there is a Content-Base header, then the recipient MUST employ
   relative to absolute resolution as defined in <a href="./rfc1808">RFC 1808</a> [<a href="#ref-RELURL" title=""Relative Uniform Resource Locators"">RELURL</a>] of
   relative URIs in both the HTML markup and the Content-Location header
   before matching a hyperlink in the HTML markup to a Content-Location
   header. The same applies if the Content-Location contains an absolute
   URI, and the HTML markup contains a BASE element so that relative
   URIs in the HTML markup can be resolved.

   If there is NO Content-Base header, and the Content-Location header
   contains a relative URI, then NO relative to absolute resolution
   SHOULD be performed. Matching the relative URI in the Content-
   Location header to a hyperlink in an HTML markup text is in this case
   a two-step process. First remove any LWSP from the relative URI which
   may have been introduced as described in <a href="#section-4.4">section 4.4</a>. Then perform an
   exact textual match against the HTML URIs. For this matching process,
   ignore BASE specifications, such as the BASE element in HTML. Note
   that this only applies for matching Content-Location headers, not for
   URLs in the HTML document which are resolved through network lookup
   at read time.
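
   The matching rules of this section can be sketched in Python as
   follows. This is an illustrative sketch only, not part of the
   specification. The function names are hypothetical; the standard
   library's urljoin (which follows later generic URI resolution rules
   rather than RFC 1808 verbatim) stands in for relative to absolute
   resolution; and LWSP is stripped from the header values in every
   case, which is an assumption beyond what the text requires.

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Linear white space that header folding may have introduced.
_LWSP = re.compile(r"[ \t\r\n]+")


def strip_lwsp(uri: str) -> str:
    """Remove all linear white space from a header-carried URI."""
    return _LWSP.sub("", uri)


def is_absolute(uri: str) -> bool:
    """Treat a URI as absolute when it carries a scheme."""
    return bool(urlsplit(uri).scheme)


def location_matches(html_uri: str,
                     content_location: str,
                     content_base: Optional[str] = None,
                     html_base: Optional[str] = None) -> bool:
    """Decide whether a hyperlink in the HTML refers to a body part."""
    location = strip_lwsp(content_location)
    if content_base is not None:
        # Content-Base present: resolve both sides, then compare.
        base = strip_lwsp(content_base)
        return urljoin(base, html_uri) == urljoin(base, location)
    if is_absolute(location):
        # Absolute Content-Location: the HTML BASE element, if any,
        # resolves relative URIs in the markup.
        if html_base is not None:
            return urljoin(html_base, html_uri) == location
        return html_uri == location
    # No Content-Base and a relative Content-Location: exact textual
    # match, ignoring any BASE specification.
    return html_uri == location
```

   Note that in the last branch the html_base argument is deliberately
   not consulted, mirroring the rule that BASE specifications are
   ignored for this kind of match.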

   The URI in the Content-Location header need not refer to an object
   which is actually available globally for retrieval using this URI
   (after resolution of relative URIs). However, URIs in
   Content-Location headers (if absolute, or resolvable to absolute
   URIs) SHOULD still be globally unique.

<span class="h3"><a class="selflink" id="section-8.3" href="#section-8.3">8.3</a> Use of the Content-ID header and CID URLs</span>

   When CID (Content-ID) URLs as defined in <a href="./rfc1738">RFC 1738</a> [<a href="#ref-URL" title=""Uniform Resource Locators (URL)"">URL</a>] and <a href="./rfc2111">RFC 2111</a>
   [<a href="#ref-MIDCID" title=""Content-ID and Message-ID Uniform Resource Locators"">MIDCID</a>] are used for links between body parts, the Content-Location
   statement will normally be replaced by a Content-ID header. Thus, the
   following two headers are identical in meaning:

   Content-ID: [email protected]
   Content-Location: CID: [email protected]

   Note: Content-IDs MUST be globally unique [<a href="#ref-MIME1" title=""MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies"">MIME1</a>]. It is thus not
   permitted to make them unique only within this message or within this
   multipart/related.
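
   A receiving agent could pair a CID URL found in the markup with a
   Content-ID header along the following lines. This is a minimal
   sketch under stated assumptions: the function name and the
   example.invalid identifiers are hypothetical, percent-escapes in the
   URL are decoded before comparison, and the comparison is treated as
   exact and case-sensitive.

```python
from urllib.parse import unquote


def cid_matches(url: str, content_id: str) -> bool:
    """Return True if a cid: URL designates the given Content-ID value."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "cid":
        return False
    # Decode URL escapes and drop white space after the scheme.
    wanted = unquote(rest.strip())
    # The header value is normally enclosed in angle brackets.
    have = content_id.strip()
    if have.startswith("<") and have.endswith(">"):
        have = have[1:-1]
    return wanted == have
```

   For example, cid_matches("cid:part1@example.invalid",
   "<part1@example.invalid>") holds, while a URL with any other scheme
   never matches.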

<span class="h2"><a class="selflink" id="section-9" href="#section-9">9</a>. Examples</span>

<span class="h3"><a class="selflink" id="section-9.1" href="#section-9.1">9.1</a> Example of an HTML body without included linked objects</span>

   The first example is the simplest form of an HTML email message. This
   is not an aggregate HTML object, but simply a message with a single
   HTML body part. This message contains a hyperlink but does not
   provide the ability to resolve the hyperlink. To resolve the
   hyperlink the receiving client would need either IP access to the
   Internet, or an electronic mail web gateway.

      From: [email protected]
      To: [email protected]
      Subject: A simple example
      Mime-Version: 1.0
      Content-Type: Text/HTML; charset=US-ASCII

      <HTML>
      <head></head>
      <body>
      <h1>Hi there!</h1>
      An example of an HTML message.<p>
      Try clicking <a href="http://www.resnova.com/">here.</a><p>
      </body></HTML>

<span class="h3"><a class="selflink" id="section-9.2" href="#section-9.2">9.2</a> Example with absolute URIs to an embedded GIF picture</span>

    From: [email protected]
    To: [email protected]
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: Multipart/related; boundary="boundary-example-1";
                  type=Text/HTML; start=foo3*[email protected]

    --boundary-example-1
       Content-Type: Text/HTML;charset=US-ASCII
       Content-ID: <foo3*[email protected]>

       ... text of the HTML document, which might contain a hyperlink
       to the other body part, for example through a statement such as:
       <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
        ALT="IETF logo">

    --boundary-example-1
       Content-Location:
             <a href="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif">http://www.ietf.cnri.reston.va.us/images/ietflogo.gif</a>
       Content-Type: IMAGE/GIF
       Content-Transfer-Encoding: BASE64

       R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
       NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
       etc...

    --boundary-example-1--

<span class="h3"><a class="selflink" id="section-9.3" href="#section-9.3">9.3</a> Example with relative URIs to an embedded GIF picture</span>

      From: [email protected]
      To: [email protected]
      Subject: A simple example
      Mime-Version: 1.0
      Content-Base: <a href="http://www.ietf.cnri.reston.va.us">http://www.ietf.cnri.reston.va.us</a>
      Content-Type: Multipart/related; boundary="boundary-example-1";
                    type=Text/HTML

      --boundary-example-1
         Content-Type: Text/HTML; charset=ISO-8859-1
         Content-Transfer-Encoding: QUOTED-PRINTABLE

         ... text of the HTML document, which might contain a hyperlink
         to the other body part, for example through a statement such as:
         <IMG SRC="/images/ietflogo.gif" ALT="IETF logo">
         Example of a copyright sign encoded with Quoted-Printable: =A9
         Example of a copyright sign mapped onto HTML markup: &#169;

      --boundary-example-1
         Content-Location: /images/ietflogo.gif
         Content-Type: IMAGE/GIF
         Content-Transfer-Encoding: BASE64

         R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
         NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
         etc...

      --boundary-example-1--

<span class="h3"><a class="selflink" id="section-9.4" href="#section-9.4">9.4</a> Example using CID URL and Content-ID header to an embedded GIF</span>
<span class="h3">   picture</span>

      From: [email protected]
      To: [email protected]
      Subject: A simple example
      Mime-Version: 1.0
      Content-Type: Multipart/related; boundary="boundary-example-1";
                    type=Text/HTML

      --boundary-example-1
         Content-Type: Text/HTML; charset=US-ASCII

         ... text of the HTML document, which might contain a hyperlink
         to the other body part, for example through a statement such as:
         <IMG SRC="cid:foo4*[email protected]" ALT="IETF logo">

      --boundary-example-1
         Content-ID: <foo4*[email protected]>
         Content-Type: IMAGE/GIF
         Content-Transfer-Encoding: BASE64

         R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
         NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
         etc...

      --boundary-example-1--
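
   A message shaped like the example above could be generated with the
   Python standard library email package, roughly as sketched below.
   This is not a normative procedure: the function name build_related,
   the placeholder image bytes and the example.invalid Content-ID used
   in the usage note are assumptions made for illustration.

```python
from email.message import EmailMessage


def build_related(html_text: str, gif_bytes: bytes,
                  content_id: str) -> EmailMessage:
    """Return a multipart/related message: HTML root part plus one GIF."""
    msg = EmailMessage()
    msg["Subject"] = "A simple example"
    # Root part: the HTML document that carries the cid: link.
    msg.set_content(html_text, subtype="html", charset="us-ascii")
    # Converts the message to multipart/related and appends the image
    # part, labelled with a Content-ID header (angle brackets included).
    msg.add_related(gif_bytes, maintype="image", subtype="gif",
                    cid=f"<{content_id}>")
    # The multipart/related "type" parameter names the root media type.
    msg.set_param("type", "text/html")
    return msg
```

   Calling build_related('<IMG SRC="cid:part1@example.invalid"
   ALT="logo">', b"GIF89a", "part1@example.invalid") yields a message
   whose second body part carries the matching Content-ID header and a
   BASE64 Content-Transfer-Encoding.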

<span class="h2"><a class="selflink" id="section-10" href="#section-10">10</a>. Content-Disposition header</span>

   Note the specification in [<a href="#ref-REL" title=""The MIME Multipart/Related Content- Type"">REL</a>] on the relations between Content-
   Disposition and multipart/related.

<span class="h2"><a class="selflink" id="section-11" href="#section-11">11</a>. Character encoding issues and end-of-line issues</span>

   For the encoding of characters in HTML documents and other text
   documents into a MIME-compatible octet stream, the following
   mechanisms are relevant:

   - HTML [<a href="#ref-HTML2" title=""Hypertext Markup Language - 2.0"">HTML2</a>, <a href="#ref-HTML-I18N" title=""Internationalization of the Hypertext Markup Language"">HTML-I18N</a>] as an application of SGML [<a href="#ref-SGML">SGML</a>] allows
     characters to be denoted by character entities as well as by numeric
     character references (e.g. "Latin small letter a with acute accent"
     may be represented by "&aacute;" or "&#225;") in the HTML markup.

   - HTML documents, in common with other documents of the MIME
     "Content-Type: text", can be represented in MIME using one of
     several character encodings. The MIME Content-Type "charset"
     parameter value indicates the particular encoding used. For the
     exact meaning and use of the "charset" parameter, please see
     [MIME-IMB, <a href="#section-4.2">section 4.2</a>].

      Note that the "charset" parameter refers only to the MIME
      character encoding. For example, the string "&aacute;" can be sent
      in MIME with "charset=US-ASCII", while the raw character "Latin
      small letter a with acute accent" cannot.

   The above mechanisms are well defined and documented, and therefore
   not further explained here. In sending a message, all the
   above-mentioned mechanisms MAY be used, and any mixture of them MAY
   occur when sending the document via e-mail. Receiving mail user
   agents (together with any Web browser they may use to display the
   document) MUST be capable of handling any combination of these
   mechanisms.
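
   The distinction between markup-level character references and the
   MIME "charset" can be illustrated with a short Python sketch. The
   helper names are hypothetical and the sketch only demonstrates the
   principle; it is not a complete encoder.

```python
import html


def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character of text is encodable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def to_ascii_markup(text: str) -> str:
    """Replace non-ASCII characters by numeric character references."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# The entity form is plain ASCII; the raw character is not.
entity_ok = fits_charset("&aacute;", "us-ascii")
raw_ok = fits_charset("\u00e1", "us-ascii")
# Both notations denote the same character once the markup is parsed.
same_char = html.unescape("&aacute;") == html.unescape("&#225;")
```

   Here entity_ok is true and raw_ok is false, which is the situation
   described in the note on "charset=US-ASCII" above.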

   Also note that:

   - Any documents including HTML documents that contain octet values
     outside the 7-bit range need a content-transfer-encoding applied
     before transmission over certain transport protocols
     [MIME1, chapter 5].

   - The MIME standard [<a href="#ref-MIME1" title=""MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies"">MIME1</a>] requires that documents of "Content-Type:
     Text" MUST be in canonical form before Content-Transfer-Encoding,
     i.e. that line breaks are encoded as CRLFs, not as bare CRs or bare
     LFs or something else. This is in contrast to [<a href="#ref-HTTP" title="R. Fielding">HTTP</a>] where <a href="#section-3.6.1">section</a>
     <a href="#section-3.6.1">3.6.1</a> allows other representations of line breaks.

   Note that this might cause problems with integrity checks based on
   checksums, which might not be preserved when moving a document from
   the HTTP to the MIME environment. If a document has to be converted
   in such a way that a checksum integrity check becomes invalid, then
   this integrity check header SHOULD be removed from the document.

   Other sources of problems are Content-Encoding used in HTTP but not
   allowed in MIME, and charsets that are not able to represent line
   breaks as CRLF. A good overview of the differences between HTTP and
   MIME with regard to "Content-Type: Text" can be found in [<a href="#ref-HTTP" title="R. Fielding">HTTP</a>],
   <a href="#appendix-C">appendix C</a>.

   If the original document has line breaks in the canonical form
   (CRLF), then the document SHOULD remain unconverted so that integrity
   checksums are not invalidated.

   A provider of HTML documents who wants them to be transferable via
   both HTTP and SMTP without invalidating checksum integrity checks
   should always provide original documents in the canonical form with
   CRLF for line breaks.
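
   The effect of canonicalization on checksums can be sketched as
   follows. The function names are hypothetical, and SHA-256 is chosen
   arbitrarily for the illustration; any digest over the octets behaves
   the same way.

```python
import hashlib
import re

# Matches CRLF first, so an existing CRLF is left as one line break.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def to_canonical(data: bytes) -> bytes:
    """Encode every line break (CRLF, bare CR, bare LF) as CRLF."""
    return _LINE_BREAK.sub(b"\r\n", data)


def checksum_survives(data: bytes) -> bool:
    """Return True if canonicalization leaves the digest unchanged."""
    before = hashlib.sha256(data).hexdigest()
    after = hashlib.sha256(to_canonical(data)).hexdigest()
    return before == after
```

   A document that already uses CRLF throughout passes through
   to_canonical unchanged, so its checksum survives; a document with
   bare LFs does not.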

   Some transport mechanisms may specify a default "charset" parameter
   if none is supplied [<a href="#ref-HTTP" title="R. Fielding">HTTP</a>, <a href="#ref-MIME1" title=""MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies"">MIME1</a>]. Because the default differs for
   different mechanisms, when HTML is transferred through mail, the
   charset parameter SHOULD be included, rather than relying on the
   default.

<span class="h2"><a class="selflink" id="section-12" href="#section-12">12</a>. Security Considerations</span>

   Some Security Considerations include the potential to mail someone an
   object, and claim that it is represented by a particular URI (by
   giving it a Content-Location header). There can be no assurance that
   a WWW request for that same URI would normally result in that same
   object. It might be unsuitable to cache the data in such a way that
   the cached data can be used for retrieval of this URI from other
   messages or message parts than those included in the same message as
   the Content-Location header. Because of this problem, receiving User
   Agents SHOULD NOT cache this data in the same way that data that was
   retrieved through an HTTP or FTP request might be cached.
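
   One way to respect this advice is to key stored body parts by the
   message they arrived in, so that a part can never satisfy a lookup
   made on behalf of a different message. The class below is a
   hypothetical sketch of that idea, not a prescribed design; the use of
   a message identifier string as the key is an assumption.

```python
from typing import Dict, Optional, Tuple


class MessagePartStore:
    """Holds body parts per message instead of in a shared URI cache."""

    def __init__(self) -> None:
        self._parts: Dict[Tuple[str, str], bytes] = {}

    def add(self, message_id: str, uri: str, data: bytes) -> None:
        """Remember a body part under the message that carried it."""
        self._parts[(message_id, uri)] = data

    def lookup(self, message_id: str, uri: str) -> Optional[bytes]:
        """Return the part only when asked for by the same message."""
        return self._parts.get((message_id, uri))
```

   A lookup for the same URI from another message returns None, so the
   mailed object is never served in place of the object that a WWW
   request for that URI would retrieve.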

   URLs, especially File URLs, may in their name contain company-
   internal information, which may then inadvertently be revealed to
   recipients of documents containing such URLs.

   One way of implementing messages with linked body parts is to handle
   the linked body parts in a combined mail and WWW proxy server. The
   mail client is only given the start body part, which it passes to a
   web browser. This web browser requests the linked parts from the
   proxy server. If this method is used, and if the combined server is
   used by more than one user, then methods must be employed to ensure
   that body parts of a message to one person are not retrievable by
   another person.  Use of passwords (also known as tickets or magic
   cookies) is one way of achieving this. Note that some caching WWW
   proxy servers may not distinguish between cached objects from e-mail
   and HTTP, which may be a security risk.

   In addition, by allowing people to mail aggregate objects, we are
   opening the door to other potential security problems that until now
   were only problems for WWW users. For example, some HTML documents
   now either themselves contain executable content (JavaScript) or
   contain links to executable content (The "INSERT" specification,
   Java). It would be exceedingly dangerous for a receiving User Agent
   to execute content received through a mail message without careful
   attention to restrictions on the capabilities of that executable
   content.

   Some WWW applications hide passwords and tickets (access tokens to
   information which is not meant to be available to everyone) and other
   sensitive information in hidden fields in the web documents or in
   on-the-fly constructed URLs. If a person gets such a document, and
   forwards it via e-mail, the person may inadvertently disclose
   sensitive information.

<span class="h2"><a class="selflink" id="section-13" href="#section-13">13</a>. Acknowledgments</span>

   Harald T. Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst,
   Lewis Geer, Roy Fielding, Al Gilman, Paul Hoffman, Richard W.
   Jesmajian, Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel
   LaLiberte, Ed Levinson, Jay Levitt, Albert Lunde, Larry Masinter,
   Keith Moore, Gavin Nicol, Pete Resnick, Jon Smirl, Einar Stefferud,
   Jamie Zawinski, Steve Zilles and several other people have helped us
   with preparing this document. I alone take responsibility for any
   errors which may still be in the document.

<span class="h2"><a class="selflink" id="section-14" href="#section-14">14</a>. References</span>

Ref.            Author, title
---------       --------------------------------------------------------

[<a id="ref-CONDISP">CONDISP</a>]       R. Troost, S. Dorner: "Communicating Presentation
                Information in Internet Messages: The
                Content-Disposition Header", <a href="./rfc1806">RFC 1806</a>, June 1995.

[<a id="ref-HOSTS">HOSTS</a>]         R. Braden (editor): "Requirements for Internet Hosts --
                Application and Support", STD-3, <a href="./rfc1123">RFC 1123</a>, October 1989.

[<a id="ref-HTML-I18N">HTML-I18N</a>]     F. Yergeau, G. Nicol, G. Adams, & M. Duerst:
                "Internationalization of the Hypertext Markup
                Language". <a href="./rfc2070">RFC 2070</a>, January 1997.

[<a id="ref-HTML2">HTML2</a>]         T. Berners-Lee, D. Connolly: "Hypertext Markup Language
                - 2.0", <a href="./rfc1866">RFC 1866</a>, November 1995.

[<a id="ref-HTTP">HTTP</a>]          T. Berners-Lee, R. Fielding, H. Frystyk: Hypertext
                Transfer Protocol -- HTTP/1.0. <a href="./rfc1945">RFC 1945</a>, May 1996.

[<a id="ref-MD5">MD5</a>]           R. Rivest: "The MD5 Message-Digest Algorithm", <a href="./rfc1321">RFC 1321</a>,
                April 1992.

[<a id="ref-MIDCID">MIDCID</a>]        E. Levinson: "Content-ID and Message-ID Uniform
                Resource Locators". <a href="./rfc2111">RFC 2111</a>, February 1997.

[<a id="ref-MIME-IMB">MIME-IMB</a>]      N. Freed & N. Borenstein: "Multipurpose Internet Mail
                Extensions (MIME) Part One: Format of Internet Message
                Bodies". <a href="./rfc2045">RFC 2045</a>, November 1996.

[<a id="ref-MIME1">MIME1</a>]         N. Borenstein & N. Freed: "MIME (Multipurpose Internet
                Mail Extensions) Part One: Mechanisms for Specifying and
                Describing the Format of Internet Message Bodies", <a href="./rfc1521">RFC</a>
                <a href="./rfc1521">1521</a>, Sept 1993.

[<a id="ref-MIME2">MIME2</a>]         N. Borenstein & N. Freed: "Multipurpose Internet Mail
                Extensions (MIME) Part Two: Media Types". <a href="./rfc2046">RFC 2046</a>,
                November 1996.

[<a id="ref-NEWS">NEWS</a>]          M.R. Horton, R. Adams: "Standard for interchange of
                USENET messages", <a href="./rfc1036">RFC 1036</a>, December 1987.

[<a id="ref-PDF">PDF</a>]           Bienz, T., Cohn, R. and Meehan, J.: "Portable Document
                Format Reference Manual, Version 1.1", Adobe Systems
                Inc.

[<a id="ref-REL">REL</a>]           Edward Levinson: "The MIME Multipart/Related Content-
                Type". <a href="./rfc2112">RFC 2112</a>, February 1997.

[<a id="ref-RELURL">RELURL</a>]        R. Fielding: "Relative Uniform Resource Locators", <a href="./rfc1808">RFC</a>
                <a href="./rfc1808">1808</a>, June 1995.

[<a id="ref-RFC822">RFC822</a>]        D. Crocker: "Standard for the format of ARPA Internet
                text messages." STD 11, <a href="./rfc822">RFC 822</a>, August 1982.

[<a id="ref-SGML">SGML</a>]          ISO 8879. Information Processing -- Text and Office Systems --
                Standard Generalized Markup Language (SGML),
                1986. <URL:http://www.iso.ch/cate/d16387.html>

[<a id="ref-SMTP">SMTP</a>]          J. Postel: "Simple Mail Transfer Protocol", STD 10, <a href="./rfc821">RFC</a>
                <a href="./rfc821">821</a>, August 1982.

[<a id="ref-URL">URL</a>]           T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
                Resource Locators (URL)", <a href="./rfc1738">RFC 1738</a>, December 1994.

[<a id="ref-URLBODY">URLBODY</a>]       N. Freed and Keith Moore: "Definition of the URL MIME
                External-Body Access-Type", <a href="./rfc2017">RFC 2017</a>, October 1996.

<span class="h2"><a class="selflink" id="section-15" href="#section-15">15</a>. Author's Address</span>

   For contacting the editors, preferably write to Jacob Palme rather
   than Alex Hopmann.

   Jacob Palme                          Phone: +46-8-16 16 67
   Stockholm University and KTH         Fax: +46-8-783 08 29
   Electrum 230                         E-mail: [email protected]
   S-164 40 Kista, Sweden

   Alex Hopmann                         E-mail: [email protected]
   Microsoft Corporation
   3590 North First Street
   Suite 300
   San Jose
   CA 95134

   Working group chairman:

   Einar Stefferud <stef@nma.com>