General considerations

Docxpresso is a general tool to generate online reports and business documentation in PDF, ODF, Word and RTF formats that, among other possibilities, allows for the insertion and styling of its contents using HTML5 and CSS.

Although Docxpresso can generate fairly sophisticated documents from HTML5 code, it is not meant as a tool to reproduce an arbitrary web page exactly. If precise formatting is a must, you may need to craft the HTML and CSS code carefully so that the rendered version of the web page adapts to the limitations of paged media.

To better support some general document components that are not covered by the HTML5 standard, Docxpresso extends it in a simple way to include:

  • Footnotes and endnotes
  • Charts
  • Page numbering
  • Table of Contents
  • Math

Moreover, we have adapted some of the standard HTML5 tags to what we consider their natural “paged document” equivalents, such as:

  • header and footer,
  • section,
  • table header,
  • etcetera.

This also applies to a few CSS properties that have been reinterpreted to better suit the needs of standard paged media.

Let us now go deeper into the details.

The html method

The html method is the one responsible for the insertion of HTML code.

The html method accepts the input HTML and CSS code as a string or as a URL and internally parses its contents to convert them into standard document elements.

Although HTML + CSS code is very convenient most of the time, one should be aware that parsing complicated stylesheets and intricate HTML code consumes resources and computing time. If you plan to generate a very long document and high performance is a must, we strongly recommend using the full Docxpresso API directly.

The html method public API is given by:

Signature

public html ($options)

Parameters

  • $options (type: array). This array has the following available keys and values:

    • baseURL (type: string). If set, enforces the base URL used for relative paths; otherwise it will be autodetected.
    • context (type: array). Sets the HTTP context (headers) for the HTTP request, so it is only taken into account if the HTML code has to be fetched via an HTTP request. Its use is optional: if not set, standard values that are valid in the majority of cases will be used. Depending on the server this option may be mandatory, i.e. the server will not return anything unless it receives this additional information. The keys and values are (see http://www.php.net/manual/en/context.http.php for more detailed information):
      • method (type: string, default: GET). The possible values are GET or POST.
      • header (type: array). An array with the required headers that may include, among others: ‘Referer’, ‘User-agent’, ‘Connection’, …
      • proxy (type: array). The address of the proxy server (if any).
      • request_fulluri (type: boolean, default: false). When set to true, the entire URI is used when constructing the request (some proxy servers require it).
      • follow_location (type: integer). Follow Location header redirects. Set to 0 to disable. Default value is 1.
      • max_redirects (type: integer). The maximum number of redirects to follow. A value of 1 or less means that no redirects are followed (default is 20).
      • protocol_version (type: string, default: 1.0). HTTP protocol version.
    • encoding (type: string). If set, enforces the encoding to be used; otherwise it will be autodetected. Whenever possible we recommend using UTF-8 encoding (the native encoding of Docxpresso).
    • html (type: string). It can be a string of HTML + CSS code or, if the ‘isFile’ option is set to true, the path to the HTML file.
    • isFile (type: boolean, default: false). Set to true if the HTML code has to be retrieved from an external (remote or not) file. By default the HTML is given as a string.
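
As an illustration of how these keys fit together, the following sketch builds an $options array for HTML that lives in a remote file. The helper buildRemoteHtmlOptions, the example URL and the user agent string are hypothetical, and the format of each header entry (one "Name: value" string) is an assumption borrowed from the PHP HTTP context conventions; only the key names come from the list above.

```php
<?php
/**
 * Builds the $options array for the html method when the HTML has to be fetched.
 * buildRemoteHtmlOptions is a hypothetical helper, not part of Docxpresso.
 */
function buildRemoteHtmlOptions(string $location, string $userAgent): array
{
    return array(
        'html'     => $location, // path or URL, because isFile is true
        'isFile'   => true,
        'encoding' => 'UTF-8',   // the encoding recommended above
        'context'  => array(
            'method'        => 'GET',
            // assumed format: one "Name: value" string per header
            'header'        => array('User-agent: ' . $userAgent, 'Connection: close'),
            'max_redirects' => 5,
        ),
    );
}

// hypothetical URL and user agent
$options = buildRemoteHtmlOptions('http://www.example.com/report.html', 'MyReportBot/1.0');

// placeholder path to the library; the call is skipped if it is not installed there
$lib = 'pathToDocxpresso/CreateDocument.inc';
if (is_file($lib)) {
    require_once $lib;
    $doc = new Docxpresso\CreateDocument();
    $doc->html($options);
    $doc->render('remote_html.pdf');
}
```

If the server returns the page without any extra headers, the context key can simply be left out and the default values will be used.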

Before getting into further details, here is an example that will help you rapidly grasp the power of the html method:

<?php
/**
* This sample script inserts some (extended) HTML5 code into the document
*/
require_once 'pathToDocxpresso/CreateDocument.inc';
$doc = new Docxpresso\CreateDocument();
$format = '.pdf';//.pdf, .doc, .docx, .odt, .rtf
//html code
$html = '
<html>
<head>
<style>
p {font-family: Verdana; font-size: 10pt}
h1 {color: #b70000; margin-bottom: 12pt; font-family: "Century Gothic"; page-break-before: always}
footnote {font-family: Verdana; font-size: 8pt}
.headerTable {width: 15cm; border: none}
.headerImage {width: 5cm}
.headerTitle {width: 10cm; vertical-align: middle;}
.headerTitle p {font-size: 12pt; font-weight: bold; color: #567896; font-family: "Century Gothic"}
.docFooter {border-top: 1px solid #777; color: #555; text-align: right; font-family: Verdana; font-size: 10pt}
.red {color: #b70000; font-size: 8pt; font-weight: bold}
</style>
</head>
<body>
<header>
<table class="headerTable">
<tr>
<td class="headerImage"><p><img src="Docxpresso.png"/></p></td>
<td class="headerTitle"><p>Docxpresso Sample Document</p></td>
</tr>
</table>
</header>
<h1>Sample document generated with HTML5</h1>
<p>This example only aims to illustrate how <strong>HTML5PDF</strong> renders a sample
web page in different document formats.</p>
<p>We include a footnote<footnote>Just some random text with a 
<span class="red"> little formatting</span>.</footnote>
and a simple pie chart so we get a little more sophisticated example:</p>
<chart type="pie" style="width: 15cm">
<legend />
<category name="First" value="30" />
<category name="Second" value="20" />
<category name="Third" value="25" />
<category name="Fourth" value="10" />
</chart>
<h1>Another page</h1>
<p>This is just to check how the header and footer are included in every 
single page with the correct page numbering.</p>
<footer>
<p class="docFooter">Page <page /></p>
</footer>
</body>
</html>
';
$doc->html(array('html' => $html));
$doc->render('sample_html' . $format);   
//echo a link to the generated document
echo 'You may download the generated document from the link below:<br/>';
echo '<a href="' . 'sample_html' . $format . '">Download document</a>';


As the generated document shows, a few lines of (extended) HTML5 code are enough to create a document with a header, footer, headings, paragraphs, footnotes and even charts.

Supported HTML5 tags

The parsed HTML5 tags include:

  • a: Defines a hyperlink.
  • abbr: Defines an abbreviation.
  • acronym: Defines an acronym.
  • address: Defines contact information for the author/owner of a document.
  • article: Defines an article.
  • aside: Defines content aside from the page content.
  • b: Defines bold text.
  • base: Specifies the base URL/target for all relative URLs in a document.
  • bdi: Isolates a part of text that might be formatted in a different direction from other text outside it.
  • bdo: Overrides the current text direction.
  • big: Not supported in HTML5. Use CSS instead. Defines big text.
  • blockquote: Defines a section that is quoted from another source.
  • body: Defines the document’s body.
  • br: Defines a single line break.
  • button: Defines a clickable button.
  • caption: Defines a table caption.
  • center: Defines centered text (supported although deprecated in HTML5).
  • chart: Docxpresso extension.
  • cite: Defines the title of a work.
  • code: Defines a piece of computer code.
  • col: Specifies column properties for each column within a ‘colgroup’ element.
  • colgroup: Specifies a group of one or more columns in a table for formatting.
  • command: Defines a command button that a user can invoke.
  • datalist: Specifies a list of pre-defined options for input controls.
  • date: Docxpresso extension.
  • dd: Defines a description/value of a term in a description list.
  • del: Defines text that has been deleted from a document.
  • details: Defines additional details that the user can view or hide.
  • dfn: Defines a definition term.
  • div: Defines a group of elements to be formatted via CSS.
  • dl: Defines a description list.
  • dt: Defines a term/name in a description list.
  • em: Defines emphasized text.
  • endnote: Docxpresso extension.
  • fieldset: Groups related elements in a form.
  • figcaption: Defines a caption for a ‘figure’ element.
  • figure: Specifies self-contained content.
  • footer: Defines a footer for a document or section.
  • footnote: Docxpresso extension.
  • form: Defines an HTML form for user input.
  • h1: Defines an HTML heading level 1.
  • h2: Defines an HTML heading level 2.
  • h3: Defines an HTML heading level 3.
  • h4: Defines an HTML heading level 4.
  • h5: Defines an HTML heading level 5.
  • h6: Defines an HTML heading level 6.
  • head: Defines information about the document.
  • header: Defines a header for a document or section.
  • hr: Defines a thematic change in the content.
  • html: Defines the root of an HTML document.
  • i: Defines italic text.
  • img: Defines an image.
  • input: Defines an input control.
  • ins: Defines a text that has been inserted into a document.
  • label: Defines a label for an ‘input’ element.
  • legend: Defines a caption for a ‘fieldset’ element.
  • li: Defines a list item.
  • math: Docxpresso extension.
  • mark: Defines marked/highlighted text.
  • menu: Defines a list/menu of commands.
  • menuitem: Defines an element of a menu list.
  • nav: Defines navigation links.
  • ol: Defines an ordered list.
  • option: Defines an option in a drop-down list.
  • outline: Docxpresso extension. Sets the format of the different TOC elements.
  • output: Defines the result of a calculation.
  • p: Defines a paragraph.
  • page: Docxpresso extension.
  • q: Defines a short quotation.
  • ref: Docxpresso extension. Retrieves the page or text of a bookmarked element.
  • samp: Defines sample output from a computer program.
  • section: Defines a section in a document.
  • select: Defines a drop-down list.
  • small: Defines smaller text.
  • span: Defines a chunk of formatted text.
  • strike: Not supported in HTML5. Use ‘del’ instead. Defines strikethrough text.
  • strong: Defines important text.
  • sub: Defines subscripted text.
  • summary: Defines a visible heading for a ‘details’ element.
  • sup: Defines superscripted text.
  • tab: Docxpresso extension.
  • table: Defines a table.
  • td: Defines a cell in a table.
  • textarea: Defines a multiline input control (text area).
  • tfoot: Groups the footer content in a table.
  • th: Defines a header cell in a table.
  • time: Defines a date/time.
  • toc: Docxpresso extension.
  • tr: Defines a row in a table.
  • tt: Defines teletype text (deprecated in HTML5).
  • u: Defines text that should be stylistically different from normal text.
  • ul: Defines an unordered list.
  • var: Defines a variable.
  • wbr: Defines a possible line-break.
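
Before sending a large HTML file to the html method, it may help to check which of its tags do not appear in the list above. The sketch below is a rough pre-flight check: findUnlistedTags is a hypothetical helper that relies on a regular expression rather than a real HTML parser, so treat its result as a hint. Tags that are missing from the list (such as style, or the category child of chart used in the first sample) may still be handled by Docxpresso.

```php
<?php
/**
 * Returns the tag names used in $html that do not appear in the list above.
 * Hypothetical helper: a regex heuristic, not a real HTML parser.
 */
function findUnlistedTags(string $html): array
{
    // tag names copied from the list of parsed HTML5 tags
    $listed = explode(' ',
        'a abbr acronym address article aside b base bdi bdo big blockquote body br '
        . 'button caption center chart cite code col colgroup command datalist date '
        . 'dd del details dfn div dl dt em endnote fieldset figcaption figure footer '
        . 'footnote form h1 h2 h3 h4 h5 h6 head header hr html i img input ins label '
        . 'legend li math mark menu menuitem nav ol option outline output p page q '
        . 'ref samp section select small span strike strong sub summary sup tab table '
        . 'td textarea tfoot th time toc tr tt u ul var wbr'
    );
    // capture the name of every opening tag (closing tags start with "</")
    preg_match_all('/<\s*([a-zA-Z][a-zA-Z0-9]*)/', $html, $matches);
    $used = array_unique(array_map('strtolower', $matches[1]));
    return array_values(array_diff($used, $listed));
}

$sample = '<p>Text<footnote>A note</footnote> <video src="a.mp4"></video> <canvas></canvas></p>';
$unlisted = findUnlistedTags($sample);
// $unlisted now holds 'video' and 'canvas'
```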

Supported CSS properties

The parsed CSS properties include:

Color properties

  • color: font color in hexadecimal/rgb format or standard CSS color list.
  • opacity: a number between 0 and 1 determining the transparency of the element.

Background and Border Properties

  • background: shorthand for background properties.
  • background-color: a color in hexadecimal/rgb format or standard CSS color.
  • background-image: the absolute or relative url of the image to be used as background.
  • background-position: the background image position. It can be given as a combination of the top, right, bottom, left and center keywords or in standard CSS units.
  • background-repeat: specifies if the background image is to be repeated and how (there is no support for the repeat-x and repeat-y values).
  • border: shorthand for all border properties.
  • border-bottom: shorthand for the bottom border properties.
  • border-bottom-color: bottom border color in hexadecimal/rgb format or standard CSS color.
  • border-bottom-style: bottom border style.
  • border-bottom-width: bottom border width in standard CSS units.
  • border-color: sets all border color properties in one shot.
  • border-left: shorthand for the left border properties.
  • border-left-color: left border color in hexadecimal/rgb format or standard CSS color.
  • border-left-style: left border style.
  • border-left-width: left border width in standard CSS units.
  • border-right: shorthand for the right border properties.
  • border-right-color: right border color in hexadecimal/rgb format or standard CSS color.
  • border-right-style: right border style.
  • border-right-width: right border width in standard CSS units.
  • border-style: sets all border style properties in one shot.
  • border-top: shorthand for the top border properties.
  • border-top-color: top border color in hexadecimal/rgb format or standard CSS color.
  • border-top-style: top border style.
  • border-top-width: top border width in standard CSS units.
  • border-width: sets all border width properties in one shot.

Basic Box Properties

  • bottom: distance from the bottom in standard CSS units.
  • display: sets the element display properties.
  • float: sets the relative positioning.
  • height: height in standard CSS units.
  • left: distance from the left in standard CSS units.
  • padding: shorthand for the different padding properties of an element.
  • padding-bottom: padding bottom in standard CSS units.
  • padding-left: padding left in standard CSS units.
  • padding-right: padding right in standard CSS units.
  • padding-top: padding top in standard CSS units.
  • position: sets the element type of positioning.
  • right: distance from the right in standard CSS units.
  • top: distance from the top in standard CSS units.
  • visibility: sets the visibility of the element.
  • width: width in standard CSS units.
  • vertical-align: vertical alignment (only top, middle or bottom are parsed).
  • z-index: the z-index of an absolutely positioned element.

Flexible Box Layout

  • margin: shorthand for the different margin properties of an element.
  • margin-bottom: margin bottom in standard CSS units.
  • margin-left: margin left in standard CSS units.
  • margin-right: margin right in standard CSS units.
  • margin-top: margin top in standard CSS units.
  • max-height: maximum height in standard CSS units.
  • max-width: maximum width in standard CSS units.
  • min-height: minimum height in standard CSS units.
  • min-width: minimum width in standard CSS units.

Text Properties

  • hyphens: specifies how to go about splitting words.
  • letter-spacing: spacing between characters in standard CSS units.
  • line-break: specify how (or if) to break lines.
  • line-height: sets the line height in standard CSS units.
  • text-align: sets the type of text alignment (center, left, …)
  • text-align-last: specifies how to align the last line of a text.
  • text-indent: text indent in standard CSS units.
  • text-transform: convert to lower or uppercase the text of an element.

Text Decoration Properties

  • text-decoration: sets the text decoration of the element text.
  • text-decoration-color: text decoration line color.
  • text-decoration-line: text decoration line type.
  • text-decoration-style: text decoration line style
  • text-shadow: text shadow (incomplete support).

Font Properties

  • font: shorthand for font properties.
  • font-family: font family (Arial, Verdana, …).
  • font-kerning: sets the character kerning.
  • font-size: sets the font size in standard CSS units
  • font-style: sets the font style.
  • font-variant: sets the font variant.
  • font-weight: sets the font weight (bold, normal, …)

Writing Modes Properties

  • direction: specifies the text/writing direction (left to right or right to left).
  • writing-mode: the same as direction, but it also allows control of the top/bottom direction.

Table Properties

  • border-collapse: sets the border collapse mode.
  • border-spacing: sets the border spacing (incomplete support).
  • empty-cells: determines if empty table cells are shown.

Lists Properties

  • list-style: shorthand for list properties
  • list-style-type: the list style type (disc, circle, …).

Transform Properties

  • transform: transformation to be applied to the element (only rotation is supported).

Basic User Interface Properties

  • outline: shorthand for the outline properties.
  • outline-color: outline color.
  • outline-style: outline style.
  • outline-width: outline width in standard CSS units.

Multi-column Layout Properties

  • break-after: sets if there may be a break after an element.
  • break-before: sets if there may be a break before an element.
  • break-inside: sets if there may be a break within an element.
  • column-count: number of columns.
  • column-gap: separation between columns in standard CSS units.
  • column-rule: shorthand for the rule between columns.
  • column-rule-color: rule color
  • column-rule-style: rule line style
  • column-rule-width: rule line width.
  • column-width: width of a column in standard CSS units.
  • columns: shorthand for column properties.

Paged Media

  • orphans: determines the minimum number of orphan lines in a page.
  • widows: determines the minimum number of widow lines in a page.
  • page-break-after: sets if there may be a page break after an element
  • page-break-before: sets if there may be a page break before an element
  • page-break-inside: sets if there may be a page break within an element
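
To give an idea of how the paged media and multi-column properties may be combined, here is a small sketch. The property names are taken from the lists above, but the class name twoColumns and all the values are assumptions that simply follow standard CSS; the lists do not state which values Docxpresso accepts for each property, apart from page-break-before: always, which is used in the first sample.

```php
<?php
// Hypothetical stylesheet: property names come from the lists above,
// the values are assumptions that follow standard CSS.
$css = '
h1 {page-break-before: always}
p {orphans: 2; widows: 2; text-align: justify}
.twoColumns {column-count: 2; column-gap: 1cm}
table {page-break-inside: avoid}
';

$html = '<html><head><style>' . $css . '</style></head><body>'
      . '<h1>A chapter starting on a new page</h1>'
      . '<div class="twoColumns"><p>Some text laid out in two columns.</p></div>'
      . '</body></html>';

// placeholder path to the library; the call is skipped if it is not installed there
$lib = 'pathToDocxpresso/CreateDocument.inc';
if (is_file($lib)) {
    require_once $lib;
    $doc = new Docxpresso\CreateDocument();
    $doc->html(array('html' => $html));
    $doc->render('paged_media_html.pdf');
}
```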

Some commented examples

Here we include for your convenience a few commented Docxpresso scripts that mainly use HTML5 + CSS to generate standard document elements.

Nicely formatted table

It is very simple to generate nice tables with a few lines of code:

<?php
/**
* This sample script inserts a nicely formatted HTML table into the document
*/
require_once 'pathToDocxpresso/CreateDocument.inc';
$doc = new Docxpresso\CreateDocument();
$format = '.pdf';//.pdf, .doc, .docx, .odt, .rtf
//html code
$html = '
<html>
<head>
<style>
body {font-family: Calibri; font-size: 11pt}
.niceTable {border-collapse: collapse}
.niceTable td {border: 1px solid #657899; padding: 2px 5px; width: 5cm; margin: 0}
.niceTable th {vertical-align: bottom; border-bottom: 1px solid #657899 !important; padding: 2px 5px; width: 5cm; font-weight: bold; margin: 0}
.niceTable th.firstCol {font-style: italic; border: none; text-align: right; background-color: white}
.niceTable td.firstCol {font-style: italic; border: none; border-bottom: 1px solid #ffffff !important; text-align: right; background-color: white}
.odd {background-color: #d5e0ff}
</style>
</head>
<body>
<p>Just a nicely formatted table:</p>
<table class="niceTable">
<tr>
<th class="firstCol">Table title</th>
<th>Column 1</th>
<th>Column 2</th>
</tr>
<tr class="odd">
<td class="firstCol">Row 1</td>
<td class="odd">Cell_1_1</td>
<td class="odd">Cell_1_2</td>
</tr>
<tr>
<td class="firstCol">Row 2</td>
<td>Cell_2_1</td>
<td>Cell_2_2</td>
</tr>
</table>
</body>
</html>
';
$doc->html(array('html' => $html));
$doc->render('nice_table_html' . $format);   
//echo a link to the generated document
echo 'You may download the generated document from the link below:<br/>';
echo '<a href="' . 'nice_table_html' . $format . '">Download document</a>';


The table formatting is inspired by one of the typical Word table formats. Notice that some of the CSS styles are redundant; this is to ensure nice rendering in all possible formats. The .pdf, .odt and .doc formats do not require, for example, the repeated inclusion of the odd class attribute in table cells.
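
When the table data comes from an array, the redundant class attributes can be generated rather than typed by hand. The following sketch assumes the niceTable stylesheet of the sample above; buildNiceTableRows is a hypothetical helper that is not part of Docxpresso. It repeats the odd class on both the row and its cells, as the sample does, so that every output format shows the striping.

```php
<?php
/**
 * Builds the body rows of the "niceTable" sample from an array of data.
 * Keys are the row labels (first column), values are the remaining cells.
 * buildNiceTableRows is a hypothetical helper, not part of Docxpresso.
 */
function buildNiceTableRows(array $rows): string
{
    $html = '';
    $index = 0;
    foreach ($rows as $label => $cells) {
        $index++;
        $odd = ($index % 2 === 1);
        $html .= $odd ? '<tr class="odd">' : '<tr>';
        $html .= '<td class="firstCol">' . htmlspecialchars((string) $label) . '</td>';
        foreach ($cells as $cell) {
            // the odd class is repeated on each cell for the formats that need it
            $html .= ($odd ? '<td class="odd">' : '<td>')
                   . htmlspecialchars((string) $cell) . '</td>';
        }
        $html .= '</tr>';
    }
    return $html;
}

$rowsHtml = buildNiceTableRows(array(
    'Row 1' => array('Cell_1_1', 'Cell_1_2'),
    'Row 2' => array('Cell_2_1', 'Cell_2_2'),
));
```

The resulting string can be placed between the header row and the closing table tag of the sample.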

Paragraphs with multiple elements

It is equally simple to generate out of HTML5 + CSS code a paragraph that wraps around an image and that includes links, bolded words, etcetera:

<?php
/**
* This sample script inserts a paragraph that wraps around a floating image
*/
require_once 'pathToDocxpresso/CreateDocument.inc';
$doc = new Docxpresso\CreateDocument();
$format = '.pdf';//.pdf, .doc, .docx, .odt, .rtf
//html code
$html = '
<html>
<head>
<style>
body {font-family: Georgia; font-size: 11pt}
.Docxpresso {text-indent: 1cm;}
.Docxpresso img {float: right;}
.coloured {color: #b70000;}
</style>
</head>
<body>
<p class="Docxpresso"><img src="Docxpresso.png" alt="A nice picture of the team"/>
This paragraph nicely wraps around a floating image. It is also very simple
to include all other kinds of inline embedded elements like <strong>Bolded text</strong>, <i>italics</i>
or <span class="coloured">coloured text</span>. You may also include <a href="http://www.Docxpresso.com">links</a>
and any other inline element.</p>
</body>
</html>
';
$doc->html(array('html' => $html));
$doc->render('paragraph_html' . $format);   
//echo a link to the generated document
echo 'You may download the generated document from the link below:<br/>';
echo '<a href="' . 'paragraph_html' . $format . '">Download document</a>';


To get optimal results for .rtf output, one should explicitly include the width and height attributes in the img tag.
If the target output format is .docx, the vertical positioning of the image may be fine-tuned with the CSS top property. In this particular case, to get exactly the same results as in the .pdf, .doc and .odt formats, one should use the following CSS code: .Docxpresso img {float: right; top: -25pt}
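
These two format-specific tweaks can be applied from the script itself. In the sketch below floatingImageCss is a hypothetical helper, the -25pt offset is the one quoted above for this particular sample, and the width and height values of the img tag are assumed for illustration only.

```php
<?php
/**
 * Returns the CSS rule for the floating image, adjusted for the target format.
 * floatingImageCss is a hypothetical helper, not part of Docxpresso.
 */
function floatingImageCss(string $format): string
{
    if ($format === '.docx') {
        // offset quoted above for this particular sample
        return '.Docxpresso img {float: right; top: -25pt}';
    }
    return '.Docxpresso img {float: right;}';
}

$format = '.docx';
$css = floatingImageCss($format);

// explicit dimensions help the .rtf output; the values here are assumed
$img = '<img src="Docxpresso.png" width="200" height="80" alt="A nice picture of the team"/>';
```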

HTML forms

A very convenient way to generate PDF forms is from HTML code. In the following example we create a very simple PDF form with an input field, a select dropdown menu and a checkbox:

<?php
/**
* This sample script inserts a simple HTML form into the document
*/
require_once 'pathToDocxpresso/CreateDocument.inc';
$doc = new Docxpresso\CreateDocument();
$format = '.pdf';//.pdf, .odt, other doc formats do not support forms
//html code
$html = '
<html>
<head>
<style>
body {font-family: Arial; font-size: 11pt}
input, select {margin-left: 10px}
.DocxpressoForm {border: 1px solid #333; padding: 15px 15px 0 15px; background-color: #f6f6ff; margin: 15px}
.DocxpressoForm p {margin-bottom: 10px;}
</style>
</head>
<body>
<form class="DocxpressoForm">
<p><label>Your name:</label> <input type="text" name="yourName" value="" /></p>
<p>
<label>Gender:</label> 
<select name="gender">
<option value="male">Male</option>
<option value="female" selected>Female</option>
<option value="other">Other</option>
</select>
</p>
<p><label>I like Docxpresso:</label> <input type="checkbox" name="like" value="0" checked /></p>
</form>
</body>
</html>
';
$doc->html(array('html' => $html));
$doc->render('form_html' . $format);   
//echo a link to the generated document
echo 'You may download the generated document from the link below:<br/>';
echo '<a href="' . 'form_html' . $format . '">Download document</a>';


To keep it simple, we have not included a send button to further process the form data on a web server, although there is no major problem in doing so. In the present case the data provided by the user is just saved in the PDF itself.

The Word and RTF standards do not support standard forms, so the input fields are simply ignored; that is why those formats are not offered for download.
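
A script that lets the user choose the output format may want to guard against this limitation. The following is a minimal sketch; formatSupportsForms and pickFormFormat are hypothetical helpers, and the list of form-capable formats (.pdf and .odt) is taken from the comment in the sample script above.

```php
<?php
/**
 * Tells whether the given output format keeps the form fields.
 * Hypothetical helper; the list comes from the sample script comment.
 */
function formatSupportsForms(string $format): bool
{
    return in_array(strtolower($format), array('.pdf', '.odt'), true);
}

/**
 * Returns the requested format if it supports forms, otherwise the fallback.
 */
function pickFormFormat(string $requested, string $fallback = '.pdf'): string
{
    return formatSupportsForms($requested) ? $requested : $fallback;
}

$format = pickFormFormat('.docx'); // .docx would ignore the input fields
```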

Advanced topics

If you dare, you may modify the default Docxpresso CSS stylesheet by hand by directly editing the Resources class located in the classes/Parser/HTML folder to better adapt it to your needs. Although future releases are planned to make this procedure easier, for the time being this is the only way to modify the default stylesheet.