<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Knuth linebreaking elements for Formatting Objects</title><meta name="generator" content="DocBook XSL Stylesheets V1.68.1"/><meta name="description" content="In this article I develop a new abstract representation of a&#xA;paragraph of text for the purpose of line breaking. This&#xA;representation is a reformulation of the abstract representation of&#xA;Knuth and Plass, adapted to the features of FO texts."/></head><body><div class="article" lang="en"><div class="titlepage"><div><div><h1 class="title"><a id="d0e2"/>Knuth linebreaking elements for Formatting Objects</h1></div><div><div class="authorgroup"><div class="author"><h3 class="author"><span class="firstname">Simon</span> <span class="surname">Pepping</span></h3></div></div></div><div><p class="copyright">Copyright © 2006 </p></div><div><div class="legalnotice"><a id="d0e63"/><p><a href="http://creativecommons.org/licenses/by-nc-sa/2.5/" target="_top"><span class="inlinemediaobject"><img src="http://creativecommons.org/images/public/somerights20.png"/></span></a></p><p>This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/2.5/" target="_top">Creative Commons Attribution-NonCommercial-ShareAlike
2.5 License</a></p></div></div><div><div class="revhistory"><table border="1" width="100%" summary="Revision history"><tr><th align="left" valign="top" colspan="2"><b>Revision History</b></th></tr><tr><td align="left">Revision 1.0</td><td align="left">2006-04-01</td></tr><tr><td align="left" colspan="2">First publication</td></tr><tr><td align="left">Revision 1.1</td><td align="left">2006-04-29</td></tr><tr><td align="left" colspan="2">Small updates, together with a new release of the
software</td></tr><tr><td align="left">Revision 1.3</td><td align="left">2006-05-03</td></tr><tr><td align="left" colspan="2">Added box-penalty; added section on the basic
building block; changed ‘generalized’ to ‘FO’</td></tr><tr><td align="left">Revision 1.4</td><td align="left">2006-05-27</td></tr><tr><td align="left" colspan="2">Bells and whistles, and text-align
implemented</td></tr><tr><td align="left">Revision 1.5</td><td align="left">2006-06-24</td></tr><tr><td align="left" colspan="2">Text-align-last, a complex index, documentation of the implementation</td></tr></table></div></div><div><div class="abstract"><p class="title"><b>Abstract</b></p><p>In this article I develop a new abstract representation of a
paragraph of text for the purpose of line breaking. This
representation is a reformulation of the abstract representation of
Knuth and Plass, adapted to the features of FO texts.</p></div></div></div><hr/></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#d0e78">Introduction</a></span></dt><dt><span class="section"><a href="#d0e125">The basic building block</a></span></dt><dt><span class="section"><a href="#d0e148">Building blocks for FO texts</a></span></dt><dt><span class="section"><a href="#d0e222">Text alignment</a></span></dt><dt><span class="section"><a href="#d0e280">A superset of the original Knuth elements</a></span></dt><dt><span class="section"><a href="#d0e305">Examples</a></span></dt><dd><dl><dt><span class="section"><a href="#d0e308">A normal paragraph</a></span></dt><dt><span class="section"><a href="#d0e330">A paragraph with borders at the start and end of each
line</a></span></dt><dt><span class="section"><a href="#d0e363">A paragraph consisting of a single non-breaking space</a></span></dt><dt><span class="section"><a href="#d0e373">Spaces before and after a border</a></span></dt><dt><span class="section"><a href="#d0e394">A complex index</a></span></dt></dl></dd><dt><span class="section"><a href="#d0e545">The FO linebreaking algorithm</a></span></dt><dd><dl><dt><span class="section"><a href="#d0e548">Changes compared to Knuth and Plass</a></span></dt><dt><span class="section"><a href="#d0e564">Implementation</a></span></dt><dt><span class="section"><a href="#d0e708">Bells and Whistles</a></span></dt><dt><span class="section"><a href="#d0e784">The test class</a></span></dt><dt><span class="section"><a href="#d0e940">The print representation of the test result</a></span></dt><dt><span class="section"><a href="#d0e961">The complex index in the test class</a></span></dt></dl></dd><dt><span class="section"><a href="#d0e1008">Acknowledgements</a></span></dt><dt><span class="bibliography"><a href="#d0e1030">Bibliography</a></span></dt></dl></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e78"/>Introduction</h2></div></div></div><p>The line breaking algorithm of Donald E. Knuth and Michael
F. Plass [<a href="#bib-KP">KP</a>] takes as its input an abstract
representation of the text. The building blocks of this abstract
representation are box, glue and penalty elements. A word or part of a
word is represented by a box, and a white space character or a series
of white space characters is represented by a glue element. A glue
element is a legal linebreak if it immediately follows a
box. Linebreak opportunities other than white space are represented by
penalties. Boxes have a width, glue elements have an elastic width,
which consists of a width, a stretch and a shrink, and penalties have
a penalty value and also a width. In addition to this abstract
representation, Knuth and Plass formulated rules to determine the
amount of demerits of a line. This amount is based on the penalty
value of the break point, on the amount of stretching or shrinking of
the line, on the class of the line compared to that of the surrounding
lines, and on the number of consecutive hyphenated lines. With this
representation and these rules, they were able to formulate a
total-fit algorithm which determines very good line breaks for a
paragraph of text.</p><p>The linebreaking algorithm of Knuth and Plass, together with its
abstract representation of the text using its three building blocks,
has been applied in the Formatting Object Processor <a href="http://xmlgraphics.apache.org/fop/" target="_top">FOP</a>, generally with
good results. FOP uses this algorithm not only to determine its
linebreaks but also to determine its page breaks. However, a more
refined line breaking process must take into account the precise
requirements of the <a href="http://www.w3.org/TR/xsl/" target="_top">XSL-FO</a> specification for
whitespace treatment and of the <a href="http://www.unicode.org/reports/tr14/" target="_top">Unicode Standard Annex
14</a> for linebreaking opportunities. Efforts to achieve this
have run into difficulties.</p><p>FOP's developers have identified the following problem cases,
which they could fit into the Knuth linebreak algorithm only with
great difficulty or not at all:</p><div class="orderedlist"><ol type="1" compact="compact"><li><p>…&lt;fo:inline border="…"&gt;some text &lt;/fo:inline&gt;
more … (note the spaces before and after the end of the inline
element)</p></li><li><p>&lt;fo:block&gt;&amp;nbsp;&lt;/fo:block&gt;</p></li><li><p>table header and footer repetition around line
breaks</p></li><li><p>white-space-treatment policy
(preserve/ignore-if-before/surrounding/after-linefeed) and
suppress-at-line-break property</p></li><li><p>letter spacing</p></li><li><p>The <a href="http://www.unicode.org/reports/tr14/" target="_top">Unicode Annex 14</a>
(UAX#14) specifies that linebreaks come between characters. According
to the <a href="http://www.w3.org/TR/xsl/" target="_top">XSL-FO</a>
specification, white space is by default suppressed around line
breaks. In the Knuth algorithm, linebreaks come at the space, and the
space is suppressed. Normally, the two are equivalent, but not
always. For example, when white-space-treatment="preserve", the two are
not equivalent.</p></li></ol></div><p>Most of these problem cases can be treated with specially
crafted sequences of elements. But the suppression of white space
before a line break turns out to be unachievable. Knuth's algorithm
suppresses whitespace and penalties after a linebreak but not
before. Indeed, this non-suppression of whitespace before a linebreak
has been turned into a feature, upon which the procedure to obtain
centered and ragged-left texts relies. For the usual suppression of
white space around a linebreak, the algorithm relies on the fact that
a consecutive series of white space characters can be represented as a
single glue element, which is dropped when it is the selected
linebreak. This condition is not always satisfied in the texts of an FO
processor, see item 1 above.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e125"/>The basic building block</h2></div></div></div><p>Every linebreak can be represented by the triple of ‘pre-break
text’, ‘post-break text’, ‘nobreak text’. Note that this is exactly
TeX's <code class="literal">\discretionary</code> control sequence. A paragraph
of text can be represented by a series of such triple elements,
provided their texts do not contain other linebreak opportunities than
the one they represent.</p><p>There are a few extreme cases in which the triple elements
reduce to simpler elements:</p><div class="orderedlist"><ol type="1"><li><p>The pre- and post-break texts are empty, and the
triple does not represent a linebreak opportunity. This case reduces
to a box. Note that the nobreak text is free to contain stretch or
shrink.</p></li><li><p>The nobreak text is empty. This case reduces to a
penalty. Note that it may contain both a pre-break and a post-break
text.</p></li><li><p>The pre- and post-break texts are empty, and the
triple does represent a linebreak opportunity. This case reduces to a
suppressible box which also is a penalty. Let us call it a
box-penalty. The glue element of Knuth and Plass is such a
box-penalty. Note that a box-penalty contains suppressible items and
the linebreak opportunity. In order to determine such box-penalty
elements, the process that builds the abstract representation of the
text must know the actual suppression policy.</p></li><li><p>It is convenient to define another element, which is
not directly derived from a triple element, viz. a suppressible box. A
suppressible box is a box which contains suppressible items. The box
may be suppressed if it is adjacent to a chosen linebreak, or has only
other suppressible boxes between itself and a chosen
linebreak. Whether the box is really suppressed is determined during
the linebreaking process. It depends not only on the chosen linebreak,
but also on the suppression policy. The building process can determine
suppressible items without knowledge of the actual suppression policy,
and the same building process can be applied regardless of the actual
suppression policy.</p></li></ol></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e148"/>Building blocks for FO texts</h2></div></div></div><p>Knuth and Plass designed their set of building blocks for the
types of text they were dealing with: Western text with a single white
space character between words. The XSL-FO specification addresses a
much wider range of texts, among others non-Western texts. These texts
come with new features: non-collapsible sequences of white space
characters, suppression of characters before, after or around line
breaks. For such texts a modified set of building blocks is
required.</p><p>Here I propose such a set:</p><div class="orderedlist"><ol type="1" compact="compact"><li><p>Box, with elastic width. A box has two boolean
properties:</p><div class="orderedlist"><ol type="a" compact="compact"><li><p>suppress-at-linebreak, default value false. According
to the FO specification, in an FO text it is by default true
for the space character U+0020. The user may deviate from the default
and set it to false for the space character, and to true for other
characters.</p></li><li><p>is-BP, default value false. This property indicates
whether a box corresponds to a border and/or a padding width. It is
true for boxes which are generated by padding widths and
borders.</p></li></ol></div></li><li><p>Penalty, with a penalty value and two elastic
widths. When the penalty element is the chosen linebreak, it
contributes the first elastic width before the linebreak and the
second elastic width after the linebreak.</p></li><li><p>Box-penalty, with a penalty value, and three elastic
widths. When the box-penalty element is the chosen linebreak, it
behaves as a penalty, otherwise it behaves as a box.</p></li></ol></div><p>Penalties and box-penalties are legal breakpoints. Boxes
are not.</p><p>An elastic width is a triple (width, stretch, shrink). In FO
terms this would be (optimum, maximum, minimum), where stretch =
maximum − optimum, shrink = optimum − minimum.</p><p>The actual treatment of boxes with
<code class="literal">suppress-at-linebreak=true</code>
is subject to the value of the policy property
<code class="literal">white-space-treatment</code>. Its possible values
are:</p><div class="variablelist"><dl><dt><span class="term"><code class="literal">ignore</code></span></dt><dd><p>All XML white space characters except linefeeds are
suppressed. This should be taken care of in the construction of the
abstract representation. The resulting abstract representation does
not contain suppressible boxes, and the suppression policy during
linebreaking is irrelevant.</p></dd><dt><span class="term"><code class="literal">preserve</code></span></dt><dd><p>Suppressible boxes are not suppressed before or after
linebreaks.</p></dd><dt><span class="term"><code class="literal">ignore-if-surrounding-linefeed</code></span></dt><dd><p>Suppressible boxes are suppressed both before
and after linebreaks.</p></dd><dt><span class="term"><code class="literal">ignore-if-before-linefeed</code></span></dt><dd><p>Suppressible boxes are suppressed before but not
after linebreaks.</p></dd><dt><span class="term"><code class="literal">ignore-if-after-linefeed</code></span></dt><dd><p>Suppressible boxes are suppressed after but not
before linebreaks.</p></dd></dl></div><p> Penalties are treated like boxes with
suppress-at-linebreak=true.</p><p>In the following text I will denote these building blocks for FO
texts as FO building blocks or FO elements.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e222"/>Text alignment</h2></div></div></div><p>FO defines the usual four values for text alignment:
<code class="literal">start</code>, <code class="literal">center</code>,
<code class="literal">end</code> and
<code class="literal">justify</code>. <code class="literal">start</code> and
<code class="literal">end</code> are also known as ragged right or flush left
and ragged left or flush right, respectively. These alignments can easily be produced
by inserting penalties at every linebreak opportunity. For
<code class="literal">start</code> we insert a penalty whose width before the
linebreak is zero with a non-zero stretch value. For
<code class="literal">center</code> we insert a penalty whose widths before and
after the linebreak are zero with a non-zero stretch value. For
<code class="literal">end</code> we insert a penalty whose width after the
linebreak is zero with a non-zero stretch value. For
<code class="literal">justify</code> we do nothing special. Some linebreak
opportunities may already have a penalty for other reasons, for
example a hyphen. Then the stretch value for text alignment has to be
added to the widths of that penalty.</p><p>The stretch values are in principle arbitrary. Knuth and Plass
use a value of 18 machine units = 1 em. For the stretch values in the
<code class="literal">center</code> case, Knuth and Plass use the same value
before and after the linebreak. I prefer to use half that value, so
that the total stretch added to a line due to text alignment is the
same in the three non-justified cases. This has the consequence that
the linebreak calculations in the three cases are identical. Only the
stretch is distributed differently.</p><p>FO allows the user to specify a different value for the
alignment of the last line. If the user does not specify a value, then
it is equal to the text alignment value. The case of justified text
alignment is an exception. In this case the default value for the
alignment of the last line is <code class="literal">start</code>, which
corresponds to the usual incomplete last line.</p><p>A different value for the alignment of the last line cannot be
represented in the abstract representation of the paragraph, because
the building blocks have not yet been assigned to a line. It is
possible to take the alignment of the last line into account in the
algorithm, as follows. All linebreak opportunities have a penalty
whose width after the linebreak contains a stretch component
corresponding to the text alignment value. For the linebreak
opportunity of the last line but one, the value of this stretch
component needs to be changed to that corresponding to the value for
the alignment of the last line. I store the difference between these
two values in the final penalty as the
<code class="literal">last-line-additional-before-stretch</code> member.</p><p>When in the main loop the current linebreak opportunity is the
last linebreak opportunity, I use this value. For each active node
that is considered in the inner loop, I increase the line length with
this additional stretch value. This provides the linebreak calculation
with the desired stretch for the last line. After the main loop, when
the best node for the paragraph has been determined, I modify the
penalty of its previous node, and add the additional stretch value to
its width after the linebreak. In this way, the following typesetting
phase sees an abstract representation that has all the correct stretch
values.</p><p>There is one feature that the above trick cannot get right. In
the justified case it is a good idea to use stretchable boxes for
white space characters. In the non-justified case it is better to use
fixed width boxes for them, because all the stretch is placed at the
start and/or end of the line. The stretch of the white space character
boxes in the last line corresponds to the setting of the text
alignment of the body of the paragraph and not to that of the last
line. As a consequence, this method will usually not find a
linebreaking solution for non-justified paragraphs with a justified
last line. In this case the last line will often not have any stretch,
and unless it happens to fit exactly, it will be infinitely bad. In my
opinion such combinations are unrealistic anyway.</p><p>It is possible to extend the trick so that the stretch in
the last line also agrees with the text alignment for that line. To achieve
that, one assigns an alignment-dependent expression to the stretch of
white space characters. The linebreaking calculation must know the
values of the text alignment of the body of the paragraph and of that
for the last line. During the linebreak calculations the alignment-dependent
expression is evaluated according to whether the linebreak
opportunity is the last one of the paragraph or not. I will not
explore this complicated possibility.</p><p>When the parameters of the building blocks are made
line-dependent, one should take care that the dynamic programming
“principle of optimality” remains valid. This is the case here because
we modify the parameters only in the last line.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e280"/>A superset of the original Knuth elements</h2></div></div></div><p>The FO building blocks presented above form a superset
of the original building blocks of Knuth and Plass. That means that
any set of Knuth elements can be mapped to a set of our FO
elements, with the same widths, elastic or not, and the same legal
linebreak points and penalty values.</p><p>The mapping is as follows. We map each Knuth box to an FO box
with the same width. The box has no stretch or shrink. We map each
glue to an FO box with the same elastic width. The box's property
suppress-at-linebreak is true. If the glue is a legal linebreak, that
is, if it immediately follows a box, we insert a penalty before the
corresponding FO box. The penalty has zero value and widths. Each
Knuth penalty is mapped onto an FO penalty with the same penalty
value. The width before the break is equal to the original width. The
width after the break is zero. Because the Knuth approach only
suppresses glues and penalties after a linebreak, we use the
whitespace treatment policy ignore-if-after-linefeed.</p><p>It is obvious that the above mapping produces the same legal
linebreaks: Each glue that is a legal linebreak and each penalty are
mapped onto an FO penalty, and no other FO penalties
are generated. Each glue that is a legal linebreak is also mapped onto
a suppressible FO box following an FO
penalty. Therefore, if that FO penalty is a chosen linebreak,
the following FO box is suppressed, which is equivalent to
the dropping of the glue when it is a chosen linebreak.</p><p>After a linebreak a consecutive series of glues and penalties
is dropped, up to the first Knuth box. Such a series is mapped onto
a series of suppressible FO boxes and penalties. The glues
do not generate FO penalties in the mapping, because none of
them follows a box, and therefore none of them is a legal
linebreak. Because our whitespace treatment policy is
ignore-if-after-linefeed, all such suppressible FO boxes and
penalties are suppressed, up to the first non-suppressible FO
box, corresponding to the first Knuth box. This shows that the
suppression of elements around linebreaks is also equivalent in the two
cases.</p><p>This proves that any sequence of building blocks of Knuth and
Plass is equivalent to a sequence of FO building blocks, and
that the FO building blocks form a superset of the original
building blocks. Our FO approach can therefore deal with any
situation which can be dealt with using the approach of Knuth and
Plass.</p><p>Before closing this section, I want to point out a different
mapping between the two sets of building blocks. Its validity is
limited to the case of a paragraph consisting of words and single
white space characters. The words may be hyphenated. The alignment
policy for the paragraph is justified. This is the most common case of
Western text. In this case the abstract representation of Knuth and
Plass consists of a series of boxes interrupted by a penalty for a
hyphen or a glue for a word space.</p><p>We map the boxes to FO boxes and the penalties to
FO penalties, as before. However, now we map the glues (which
are all legal linebreaks) to a suppressible FO box followed
by an FO penalty. And our whitespace treatment policy is
ignore-if-surrounding-linefeed. When a hyphenation penalty is a chosen
linebreak, both cases are identical. When a glue is a chosen linebreak
in the Knuth case, the corresponding FO penalty is the chosen
linebreak in the FO case, and the preceding suppressible
box is suppressed. This makes the behaviour in the two cases
identical.</p><p>This shows that for default text, the Knuth case can also be
mapped to the default whitespace treatment policy of <a href="http://www.w3.org/TR/xsl/" target="_top">XSL-FO</a> and to linebreak
opportunities following spaces, as prescribed in the <a href="http://www.unicode.org/reports/tr14/" target="_top">Unicode Annex
14</a>. It also shows that a Knuth glue can be thought of as a
combination of a suppressible box for the space with a penalty for the
linebreak opportunity. Only in special cases, such as centered text
alignment, does the approach by Knuth and Plass deviate markedly from
the XSL-FO and Unicode defaults.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e305"/>Examples</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e308"/>A normal paragraph</h3></div></div></div><p>The list of elements is: </p><div class="informalexample"><p><code class="literal">(box, box(suppressible, elastic), penalty(0, 0,
0))+.</code></p></div><p> Linebreaks occur at some of
the penalties. When the whitespace treatment policy is
<code class="literal">ignore-if-surrounding-linefeed</code> or
<code class="literal">ignore-if-before-linefeed</code>, the suppressible box
before each linebreak (corresponding to a wordspace) is
suppressed. Otherwise, when the whitespace treatment policy is
<code class="literal">preserve</code> or
<code class="literal">ignore-if-after-linefeed</code>, each line ends in a
whitespace.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e330"/>A paragraph with borders at the start and end of each
line</h3></div></div></div><p>The list of elements is: </p><div class="informalexample"><p><code class="literal">(box, box(suppressible, elastic), penalty(0, (x,y,z),
(x,y,z)))+.</code></p></div><p> Here
<code class="literal">x</code>, <code class="literal">y</code>, and <code class="literal">z</code>
are the width, stretch and shrink of the border. Linebreaks occur at
some of the penalties. There is a border at the end and start of each
line. When the whitespace treatment policy is
<code class="literal">ignore-if-surrounding-linefeed</code> or
<code class="literal">ignore-if-before-linefeed</code>, the suppressible box
before each linebreak (corresponding to a wordspace) is
suppressed. Otherwise, when the whitespace treatment policy is
<code class="literal">preserve</code> or
<code class="literal">ignore-if-after-linefeed</code>, each line ends in a
whitespace.</p><p>The same approach can be used for borders and padding that are
repeated around page breaks, and for table headers and footers that
are repeated around page breaks. In the table case, the width of the
header and footer may be different for each pagebreak opportunity, due
to their interaction with the cell and row borders.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e363"/>A paragraph consisting of a single non-breaking space</h3></div></div></div><p>The list of elements is: </p><div class="informalexample"><p><code class="literal">box(elastic).</code></p></div><p> The only linebreak comes at the end of the
paragraph. The non-breaking space is not suppressible, and will be the
only (non-printing) content of the line.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e373"/>Spaces before and after a border</h3></div></div></div><p>The list of elements around the border is: </p><div class="informalexample"><p><code class="literal">…, box, box(suppressible,elastic), box(BP),
box(suppressible,elastic), penalty(0), ….</code></p></div><p> There is no linebreak opportunity before the border
because it may be considered as a close operator in the sense of
<a href="http://www.unicode.org/reports/tr14/" target="_top">UAX#14</a>. There is no
linebreak opportunity after the border, because it is forbidden by the
following space. If the penalty shown is chosen as a linebreak and the
whitespace treatment policy is
<code class="literal">ignore-if-surrounding-linefeed</code> or
<code class="literal">ignore-if-before-linefeed</code>, both suppressible boxes
shown will be suppressed. The line will end in a word followed by the
border.</p><p>This is the only example that really cannot be solved with the
approach of Knuth and Plass. That is so because there is no way to
turn the space before the border into a glue. Therefore it must be
suppressed before a linebreak, which the approach of Knuth and Plass
does not provide.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e394"/>A complex index</h3></div></div></div><p>Under this title Knuth and Plass present their showcase, a
complex index with precise typesetting requirements. Can the complex
index be modelled with the FO building blocks? Above I have
proven that any situation which can be modelled with the building
blocks of Knuth and Plass can also be modelled with the FO building
blocks. Therefore, the answer is yes, the complex index can be
modelled by mapping Knuth and Plass' solution according to the above
derived mapping recipe. But in this section I will model it by
constructing a building block sequence that produces the same results
in the breaking and non-breaking cases as the sequence of Knuth and
Plass, with the default whitespace treatment policy
<code class="literal">ignore-if-surrounding-linefeed</code>.</p><p>Each index entry represents a separate paragraph for the
linebreaking algorithm. It consists of two parts, a names part and a
references part.</p><p>The names part is modelled as follows.</p><div class="orderedlist"><ol type="1"><li><p>A white space character is represented by a
suppressible box with (min opt max) = (667 1000 1000). This
corresponds to (width stretch shrink) = (6 0 2), i.e. glue(6,0,2) in
the units of Knuth and Plass.</p></li><li><p>A linebreak opportunity is represented by a penalty
with a width before the linebreak equal to (1500 1500 4500). This
corresponds to (width stretch shrink) = (9 18 0),
i.e. glue(w<sub>2</sub>,18,0) in the units of Knuth and
Plass, with w<sub>2</sub> = 9.</p></li></ol></div><p>When the white space character is the chosen linebreak,
its box is suppressed, and the penalty's width-before is
inserted. These results agree with those of Knuth and Plass: </p><div class="orderedlist"><ol type="1"><li><p>When the white space character is not the chosen
linebreak, it contributes a glue(6,0,2).</p></li><li><p>When the white space character is the chosen linebreak,
it contributes a glue(w<sub>2</sub>,18,0) before the
linebreak.</p></li></ol></div><p>The references part is modelled as follows.</p><div class="orderedlist"><ol type="1"><li><p>A white space character is represented by a
suppressible box with (min opt max) = (667 1000 1000). This
corresponds to (width stretch shrink) = (6 0 2), i.e. glue(6,0,2) of
Knuth and Plass.</p></li><li><p>A linebreak opportunity is represented by a penalty
with a value of 999 and a width after the linebreak equal to (0 0
3000). This corresponds to (width stretch shrink) = (0 18 0),
i.e. glue(0,18,0) in the units of Knuth and Plass.</p></li></ol></div><p>When the white space character is the chosen linebreak,
its box is suppressed, and the penalty's width-after is inserted at the
start of the next line. These results agree with those of Knuth and
Plass: </p><div class="orderedlist"><ol type="1"><li><p>When the white space character is not the chosen
linebreak, it contributes a glue(6,0,2).</p></li><li><p>When the white space character is the chosen linebreak,
it contributes a glue(0,18,0) after the linebreak.</p></li></ol></div><p>The transition part is modelled as follows.</p><div class="orderedlist"><ol type="1"><li><p>A white space character, which is represented by a
suppressible box with (min opt max) = (667 1000 1000). This
corresponds to (width stretch shrink) = (6 0 2), i.e. glue(6,0,2) of
Knuth and Plass.</p></li><li><p>A linebreak opportunity, which is represented by a
penalty with a width before the linebreak equal to (min opt max) = (1500 1500
4500). This corresponds to (width stretch shrink) = (9 18 0),
i.e. glue(w<sub>2</sub>,18,0) in the units of Knuth and
Plass, with w<sub>2</sub> = 9.</p></li><li><p>A leader box, which is represented by a box with (min
opt max) = (0 3600 large-number). This corresponds to a leader box
with (width stretch shrink) = (3600 large-number 3600),
i.e. leaders(3w<sub>3</sub>, large-number,
3w<sub>3</sub>) in the units of Knuth and Plass, with
w<sub>3</sub> = 1200.</p></li><li><p>A linebreak opportunity, which is represented by a
penalty with a width before the linebreak equal to (min opt max) =
(7500 7500 7500) and a width after the linebreak equal to (min opt
max) = (0 0 3000). The width before the linebreak corresponds to
(width stretch shrink) = (45 0 0),
i.e. glue(w<sub>1</sub>,0,0) in the units of Knuth and
Plass, with w<sub>1</sub> = 45. The width after the
linebreak corresponds to (width stretch shrink) = (0 18 0),
i.e. glue(0,18,0) in the units of Knuth and Plass.</p></li></ol></div><p>When none of the linebreak opportunities is the chosen
linebreak, the penalties are dropped, and the result is as
follows:</p><div class="orderedlist"><ol type="1"><li><p>The white space character contributes a
glue(6,0,2).</p></li><li><p>The leader box contributes
leaders(3w<sub>3</sub>, large-number,
3w<sub>3</sub>).</p></li></ol></div><p>This is also the result of Knuth and Plass.</p><p>When the first linebreak opportunity is the chosen linebreak,
the white space character is suppressed and the penalty's width-before
is inserted. The result is as
follows:</p><div class="orderedlist"><ol type="1"><li><p>The penalty contributes a
glue(w<sub>2</sub>,18,0) before the
linebreak.</p></li><li><p>The leader box contributes
leaders(3w<sub>3</sub>, large-number,
3w<sub>3</sub>).</p></li></ol></div><p>This is also the result of Knuth and Plass.</p><p>When the second linebreak opportunity is the chosen linebreak, the result is as follows:</p><div class="orderedlist"><ol type="1"><li><p>The white space character contributes a
glue(6,0,2).</p></li><li><p>The leader box contributes
leaders(3w<sub>3</sub>, large-number,
3w<sub>3</sub>).</p></li><li><p>The penalty contributes a
glue(w<sub>1</sub>,0,0) before the
linebreak.</p></li><li><p>The penalty contributes a glue(0,18,0) after the
linebreak.</p></li></ol></div><p>This is also the result of Knuth and Plass.</p><p>The conclusion is that the complex index can easily be modelled
in terms of the FO building blocks with the default white space
treatment policy. The resulting sequences are simpler than those of
Knuth and Plass. This follows from the property that a penalty may
contain a width after the linebreak.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e545"/>The FO linebreaking algorithm</h2></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e548"/>Changes compared to Knuth and Plass</h3></div></div></div><p>The linebreaking algorithm of Knuth and Plass works for their
elements: boxes, glues and penalties. The FO building blocks
have a few properties which require changes to the
algorithm. </p><div class="orderedlist"><ol type="1"><li><p>Elements are not only suppressible after but also
before a linebreak. As a consequence the last element that contributes
to the line no longer coincides with the linebreak element. The
linebreak nodes must track this.</p></li><li><p>Penalties not only contribute width before but also
after the linebreak. As a consequence there is a contribution to the
linewidth before the first element of the line. The linebreak nodes
must track this.</p></li><li><p>The sequence of elements that are suppressed at the
start or end of a line may contain border and padding boxes,
i.e. boxes for which is-BP is true. These boxes are not
suppressed. The linebreak nodes must track this.</p></li></ol></div><p> In Knuth and Plass' case a linebreaking node needs to
track a line from its first element to the linebreaking element. In
the FO case, a linebreaking node needs to track a line from
its first to its last element; it needs to register the contributions
of the linebreaking elements of the previous and of this line; and it
needs to register the widths of the border and padding boxes before
the first and after the last line elements.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e564"/>Implementation</h3></div></div></div><p>Together with this essay I publish a simple implementation of
the FO linebreaking algorithm. It acts on an abstract representation
of a paragraph consisting of FO building blocks. It suppresses
elements both before and after a linebreak, according to the
white-space-treatment policy set on the paragraph. Its nodes are
modified to satisfy the above tracking needs.</p><p>The implementation follows the algorithm of Knuth and Plass
closely. It is, however, a completely new implementation, programmed
in an object oriented style. Knuth and Plass describe their algorithm
in great detail. Here I will highlight the main points.</p><p>Before I start the description of the algorithm, I will define a
few terms.</p><div class="glosslist"><dl><dt>Linebreak opportunity</dt><dd><p>A position at which a linebreak may occur. Called
legal linebreak by Knuth and Plass.</p></dd><dt>Feasible linebreak</dt><dd><p>A linebreak opportunity for which there exists a
tolerable paragraph layout up to that point.</p></dd><dt>Node</dt><dd><p>A programmatic object that is created for each
feasible linebreak. It contains information needed by the algorithm to
evaluate further linebreak opportunities, and by the typesetting
process to typeset the laid out paragraph.</p></dd><dt>Active list</dt><dd><p>A list of nodes for which there is a chance that the
line between the linebreak opportunity of the node and the current
linebreak opportunity is tolerable. At the start of the algorithm, a
node is created for the pseudo-linebreak opportunity at the start of
the paragraph, and the active list is seeded with it. For each
feasible linebreak a node is created and added to the active
list. Nodes may also be deactivated, i.e. removed from the active
list. Deactivated nodes may still be referred to by active nodes and
therefore may still play a role in the layout.</p></dd><dt>Line elements</dt><dd><p>The consecutive series of building blocks in a line
that are not suppressed. The border and padding items in the
suppressed region are not counted as line elements.</p></dd></dl></div><p/><p>The core of the algorithm consists of two nested loops. The main
loop is a loop over the linebreak opportunities of the paragraph. For
each linebreak opportunity, a loop is made over the nodes in the
active list. For each active node, the line between the linebreak
opportunity to which the node refers, and the current linebreak
opportunity is evaluated. If the main loop has progressed so far
through the paragraph that the line from the active node to the
current linebreak opportunity has become too long and would have to be
shrunk beyond its minimum length, the active node is deactivated. Otherwise,
if the line from an active node to the current linebreak opportunity
falls within the tolerance, a node is created for it. At the end of
the nested loop all nodes created for this linebreak opportunity are
compared to each other. The best node represents the best layout of
the paragraph up to that linebreak opportunity. It is added to the
active list.</p><p>When the main loop has reached the end node of the paragraph,
the best node represents the best layout of the whole paragraph. It is
the result of the linebreaking calculation. Through its reference to
the previous node, i.e. the node describing the linebreak of the
preceding line, and of all nodes to their previous nodes, the program
is able to trace the linebreaks for all lines of the paragraph.</p><p>During the main loop it is possible that the active list becomes
empty, because nodes are removed and insufficient new nodes are
added. That means that the paragraph has no possible layout within the
requested tolerance. In this situation my implementation gives
up. Typesetting systems must define a strategy for such cases,
e.g. accepting the last considered line although it exceeds the
tolerance.</p><p>A node contains the following information:</p><div class="variablelist"><dl><dt><span class="term"><code class="literal">previous</code></span></dt><dd><p>The node of the linebreak of the preceding
line.</p></dd><dt><span class="term"><code class="literal">lineBeforeEndPos</code></span></dt><dd><p>The index of the paragraph element after the line
elements on the line before the linebreak</p></dd><dt><span class="term"><code class="literal">totalBoxWidthBefore</code></span></dt><dd><p>The total width of all boxes up to the element at
<code class="literal">lineBeforeEndPos</code></p></dd><dt><span class="term"><code class="literal">BPWidthBefore</code></span></dt><dd><p>The border/padding width after the line elements of
the line before the linebreak</p></dd><dt><span class="term"><code class="literal">lbPos</code></span></dt><dd><p>The index of the linebreak
opportunity</p></dd><dt><span class="term"><code class="literal">BPWidthAfter</code></span></dt><dd><p>The border/padding width before the line elements of
the line after the linebreak</p></dd><dt><span class="term"><code class="literal">lineAfterStartPos</code></span></dt><dd><p>The index of the first line element of the line after
the linebreak</p></dd><dt><span class="term"><code class="literal">totalBoxWidthAfter</code></span></dt><dd><p>The total width of all boxes up to the element at
<code class="literal">lineAfterStartPos</code></p></dd><dt><span class="term"><code class="literal">demerits</code></span></dt><dd><p>The amount of demerits of the line before the
linebreak</p></dd><dt><span class="term"><code class="literal">adjRatio</code></span></dt><dd><p>The adjustment ratio of the line before the
linebreak</p></dd><dt><span class="term"><code class="literal">adjClass</code></span></dt><dd><p>The adjustment class of the line before the
linebreak</p></dd><dt><span class="term"><code class="literal">lineNumber</code></span></dt><dd><p>The number of the line that is ended by the
linebreak</p></dd></dl></div><p>There are three items before the linebreak, and three
corresponding items after the linebreak. They hold calculated values
so that these need not be recalculated each time they are needed. The
last two items are needed by the ‘Bells and Whistles’ which will be
described below.</p><div class="figure"><a id="d0e703"/><p class="title"><b>Figure 1. The parts of a line</b></p><pre class="screen">
========================================================================

   |←previous.lbPos                                         lbPos→|
   +-+-------+--------------------------------------------+-------+-+
   | |  BAP  |               line elements                |  BAP  | | ← the line
   +-+-------+--------------------------------------------+-------+-+
    ↑    ↑   |                                            |   ↑    ↑
    |    |   |←lineAfterStartPos         lineBeforeEndPos→|   |    |
    |    |   |←totalBoxWidthAfter     totalBoxWidthBefore→|   |    |
    |    BPWidthAfter                             BPWidthBefore    |
    WidthAfter                                           WidthBefore

========================================================================
</pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e708"/>Bells and Whistles</h3></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="d0e711"/>Bells and Whistles</h4></div></div></div><p>Knuth and Plass add a few bells and whistles to their
algorithm:</p><div class="orderedlist"><ol type="1" compact="compact"><li><p>The assignment of extra demerits to consecutive
hyphenated lines.</p></li><li><p><a id="varlinelengths"/>The ability to deal with
variable line lengths.</p></li></ol></div><p>Under the title “More Bells and whistles” they present a
few more features which improve the linebreaking result.</p><div class="orderedlist"><ol start="3" type="1" compact="compact"><li><p><a id="adjclasses"/>The distinction of four classes
of lines, according to their tightness.</p></li><li><p><a id="looseness"/>The ability to loosen a
paragraph, that is, to make it one or more lines longer or shorter
than the optimal solution.</p></li></ol></div><p>All these bells and whistles are implemented.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="d0e732"/>Fitness classes</h4></div></div></div><p>The line classes in item <a href="#adjclasses">3</a> are called
fitness classes. When two consecutive lines differ by more than one
fitness class, they are considered contrasting lines and get an extra
amount of demerits. With fitness classes, the inner loop no longer
results in a single best node. There may be a
best node for each fitness class. The algorithm applies a small
optimization: If the minimum amount of demerits of a certain fitness
class differs from the minimum amount of demerits of all classes by
more than the extra amount of demerits for contrasting lines, the best
node of that class will never be part of the best layout. Therefore it
need not be remembered.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="d0e739"/>Variable line lengths</h4></div></div></div><p>Item <a href="#varlinelengths">2</a> adds some complexity to
the algorithm. The inner loop now may only compare nodes for the same
line number with each other. Therefore it must only run over active
nodes with the same line number. It may result in a best node for each
line number for which the linebreak opportunity is a feasible
linebreak.</p><p>Knuth and Plass implement this by keeping the nodes in the
active list ordered on line number. They let the inner loop run over a
part of the active list whose nodes have the same line number. I have
chosen a more explicit mechanism. The program holds a list of active
lists. The inner loop runs over one active list. The best node found
by the inner loop is added to the next active list. This inner loop is
repeated for all active lists.</p><p>There is always a limited number of line lengths; let us call
this number <code class="code">L</code>. Because the lines with line number
<code class="code">L</code> and higher have the same length, their nodes may be
compared with each other. Therefore we may put the active nodes with
line number <code class="code">L − 1</code> and higher together.</p><p>The program holds a list of <code class="code">L</code> active lists. The
first list contains only the start node, which pseudo-ends line number
−1. The following property holds: At any time during the process each
active list contains only nodes with the same line number. At the
start this is trivially true because it is true for the first active
list. The algorithm adds to each active list only nodes whose line
number is 1 higher than the line number of the preceding active
list. This keeps the above property true. The last active list is an
exception. In the last active list we collect the active nodes with
line number <code class="code">L − 1</code> and higher.</p><p>When the main loop has reached the final penalty, all best nodes
are compared together, whether we have reached line <code class="code">L</code> or
not. As before, the algorithm results in a single best node.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="d0e772"/>Looseness</h4></div></div></div><p>For item <a href="#looseness">4</a> the algorithm needs to know
the line number of the final nodes. When it has determined the best
node, it applies the required looseness by selecting the best node
whose line number is ‘looseness’ higher or lower than that of the
overall best node. Such a node is not always available, because there
may not be a tolerable paragraph layout with that number of
lines. Then the best node is selected whose line number is 1 closer to
that of the overall best node, etc.</p><p>The implementation reuses the mechanism of variable line
lengths. But there is no highest line number <code class="code">L</code>. We keep
the active nodes with different line numbers separate for all line
numbers, and the number of active lists is unbounded. This ensures
that we always select the best node for a certain line number, and in
the end know the best final node for each line number.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e784"/>The test class</h3></div></div></div><p>The implementation contains a test class which can be used from
the command line: <code class="code">java
nl.leverkruid.spepping.gkplinebreaking.XMLTestCase
file.xml</code>. Here <code class="code">file.xml</code> is an XML file which
contains <code class="code">p</code> elements in the no-name namespace. These
<code class="code">p</code> elements may have one or more of the following
attributes:</p><div class="orderedlist"><ol type="1"><li><p><code class="code">tolerance</code>, with an integer value
indicating the desired value for the tolerance. The default value is
5.</p></li><li><p><code class="code">white-space-treatment</code>, with the values
listed above. The default value is
<code class="code">ignore-if-surrounding-linefeed</code>. The value
<code class="code">ignore</code> is not implemented, because the test class does
not suppress white space during building of the abstract
representation.</p></li><li><p><code class="code">linewidth</code>, with an integer value
indicating the desired value for the linewidth. Each character gets a
width of 1000. Therefore a linewidth of 50000 is equal to the width of
50 characters. If there is no <code class="code">linewidth</code> attribute, the
<code class="code">linewidths</code> attribute is used. If that is also absent, the
default value of 50000 is used.</p></li><li><p><code class="code">linewidths</code>, with a space-separated list
of integer values. The first value is the desired linewidth of the
first line, and so on. The last value is used for all remaining
lines. The <code class="code">linewidths</code> attribute is not used if there is
also a <code class="code">linewidth</code> attribute.</p></li><li><p><code class="code">looseness</code>, with an integer value
indicating the desired number of lines by which the paragraph should
be loosened (positive value) or tightened (negative value). The
default value is 0. Due to Java, positive values must be entered
without a plus sign.</p></li><li><p><code class="code">text-align</code>, with the same values as the
FO attribute of the same name. In the centered case, Knuth and Plass
use the same stretch value as in the ragged-left and -right cases, but
apply it both to the start and to the end of the line. I apply half
that value to both ends. As a result, the stretch value in a line is
the same in all three non-justified cases.</p></li><li><p><code class="code">text-align-last</code>, with the same values as
the FO attribute of the same name.</p></li><li><p><code class="code">ragged-stretch</code>, with an integer
value. Knuth and Plass use a value of 1 em, which corresponds to 3
times their width of a hyphen, for the stretch of the linebreak
penalty for the non-justified cases. I use a similar default value,
viz. 3 times 1000. But this attribute allows one to experiment with
other values. A high value (up to Java's
<code class="code">Integer.MAX_VALUE</code>) produces a markedly different
result.</p></li></ol></div><p>
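The way the <code class="code">linewidth</code> and <code class="code">linewidths</code> attributes combine can be sketched in Java as follows. This is only an illustration of the rules listed above, not the code of the test class. The class <code class="code">LineWidths</code> and its methods are hypothetical names, and an absent attribute is assumed to be passed as <code class="code">null</code>.</p>
<pre class="programlisting">
```java
class LineWidths {
    // Default from the text: 50 characters of width 1000 each.
    static final int DEFAULT_LINEWIDTH = 50000;

    // Hypothetical helper: turns the two attribute values (null when absent)
    // into the list of line widths. A linewidth attribute takes precedence
    // over a linewidths attribute.
    static int[] resolve(String linewidth, String linewidths) {
        if (linewidth != null) {
            return new int[] { Integer.parseInt(linewidth.trim()) };
        }
        if (linewidths != null) {
            String[] parts = linewidths.trim().split("\\s+");
            int[] widths = new int[parts.length];
            int i = 0;
            for (String part : parts) {
                widths[i++] = Integer.parseInt(part);
            }
            return widths;
        }
        return new int[] { DEFAULT_LINEWIDTH };
    }

    // Width of the line with the given zero-based line number.
    // The last value applies to all remaining lines.
    static int widthOfLine(int[] widths, int lineNumber) {
        return widths[Math.min(lineNumber, widths.length - 1)];
    }

    public static void main(String[] args) {
        int[] widths = resolve(null, "50000 45000");
        System.out.println(widthOfLine(widths, 0));
        System.out.println(widthOfLine(widths, 1));
        System.out.println(widthOfLine(widths, 7));
    }
}
```
</pre>
<p>With the value <code class="code">"50000 45000"</code> the sketch yields 50000 for the first line and 45000 for every later line, which is the arrangement used for a hanging indentation by means of different line lengths.</p>
<p>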
The <code class="code">p</code> elements may of course contain text. The text may
contain <code class="code">span</code> elements in the no-name namespace. These
<code class="code">span</code> elements may have three attributes:</p><div class="orderedlist"><ol type="1"><li><p><code class="code">BAP-width</code>, whose value should be one
integer or a list of three integers separated by spaces, indicating
the desired value for the width of the border and padding. If the
value consists of three integers, they indicate, respectively, the
minimum, optimum and maximum, of that width. The default value is
0.</p></li><li><p><code class="code">BAP-conditionality</code>, with value "discard"
or "retain", indicating the desired conditionality for the border and
padding. The default value is "discard".</p></li><li><p><code class="code">letterspacing</code>, whose value should be one
integer or a list of three integers separated by spaces, indicating
the desired value for the letterspacing. If the value consists of
three integers, they indicate, respectively, the minimum, optimum and
maximum of the letterspacing. The default value is no
letterspacing.</p></li></ol></div><p>
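The values of <code class="code">BAP-width</code> and <code class="code">letterspacing</code> can be read as in the following Java sketch. It is an illustration under assumptions, not the code of the test class: the class name <code class="code">WidthRange</code> is hypothetical, and a single integer is assumed to denote a fixed width, i.e. equal minimum, optimum and maximum. The stretch and shrink follow the correspondence used earlier in this article, where (min opt max) = (667 1000 1000) has no stretch and a shrink of 333.</p>
<pre class="programlisting">
```java
class WidthRange {
    final int min;
    final int opt;
    final int max;

    WidthRange(int min, int opt, int max) {
        this.min = min;
        this.opt = opt;
        this.max = max;
    }

    // Hypothetical parser for an attribute value that holds either one
    // integer or three integers (min opt max) separated by spaces.
    static WidthRange parse(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length == 1) {
            // Assumption: a single integer is a fixed width.
            int w = Integer.parseInt(parts[0]);
            return new WidthRange(w, w, w);
        }
        if (parts.length == 3) {
            return new WidthRange(Integer.parseInt(parts[0]),
                                  Integer.parseInt(parts[1]),
                                  Integer.parseInt(parts[2]));
        }
        throw new IllegalArgumentException(
                "expected one or three integers: " + value);
    }

    // Stretch in the sense of Knuth and Plass: maximum minus optimum.
    int stretch() {
        return max - opt;
    }

    // Shrink in the sense of Knuth and Plass: optimum minus minimum.
    int shrink() {
        return opt - min;
    }

    public static void main(String[] args) {
        WidthRange r = parse("667 1000 1000");
        System.out.println(r.opt + " " + r.stretch() + " " + r.shrink());
    }
}
```
</pre>
<p>Any other number of integers is rejected in this sketch; how the test class itself reacts to such a value is not specified here.</p>
<p>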
The <code class="code">span</code> elements may of course contain text. It is a
limitation of the implementation of the test class that a
<code class="code">span</code> element should not be followed immediately by
another <code class="code">span</code> element.</p><p>Any other elements and attributes will be ignored. The
<code class="code">p</code> elements mimic <code class="code">fo:block</code> elements, and the
<code class="code">span</code> elements mimic <code class="code">fo:inline</code> elements in
the <code class="code">fo:block</code> element.</p><p>Linefeeds are treated according to the setting
<code class="code">linefeed-treatment = "space"</code>. White space is treated
according to the setting <code class="code">white-space-collapse =
"false"</code>. The setting <code class="code">white-space-treatment = "ignore"</code> is not
applied.</p><p>Linebreaks are determined by a <code class="code">LineInstance</code> of
class <code class="code">BreakIterator</code> of the ICU library. The ICU4J library
must be in the classpath. It can be obtained from <a href="http://icu.sourceforge.net/" target="_top"><code class="code">http://icu.sourceforge.net/</code></a>. Hyphenation
must be inserted into the test file explicitly, by use of hard (U+002D)
or soft (U+00AD) hyphens. Hard hyphens will of course always be
printed.</p><p>It is hoped that this arrangement allows easy testing of a
wide range of texts.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e940"/>The print representation of the test result</h3></div></div></div><p>The test class uses several symbolic notations to give an
impression of the linebreaking result.</p><div class="orderedlist"><ol type="1"><li><p>The end of a line is indicated by a vertical stroke
‘|’. This allows one to see unsuppressed spaces at the end of the
line.</p></li><li><p>The stretch is indicated by four spaces in start and
end text alignment, and by two spaces on either side of the line in
centered text alignment. When the adjustment ratio is zero, the
stretch evaluates to zero and it is not printed.</p></li><li><p>The infinite stretch in the last line is indicated by
six spaces in start and end alignment of the last line, and by three
spaces on either side of the line in centered text
alignment of the last line.</p></li><li><p>The start and end of a span are indicated by angle
brackets ‘〈’ and ‘〉’.</p></li><li><p>Letterspacing is indicated by spaces between the
letters. The space at the start and end of a word is indicated by a
‘+’. In this way it can be seen whether the latter is suppressed
around a linebreak.</p></li></ol></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="d0e961"/>The complex index in the test class</h3></div></div></div><p>The test class has additional provisions to allow index test
cases similar to the complex index example. Each index entry consists
of a <code class="code">names</code> and a <code class="code">references</code> element. It is
good XML practice to embed a pair of these elements in an entry
element, and to use an index element as the container of the entry
elements. But the test class does not enforce that. It acts on each
<code class="code">names</code> and each <code class="code">references</code> element. It will
only produce good results when one <code class="code">names</code> element is
followed by a <code class="code">references</code> element. A
<code class="code">references</code> element finishes the entry.</p><p>The <code class="code">names</code> element recognizes two attributes:
<code class="code">linewidth</code> or <code class="code">linewidths</code>, and
<code class="code">tolerance</code>, with the same values as for the paragraph
element. These values are also applied to the following
<code class="code">references</code> element. The index example has a large number
of parameters, but the test class has no provisions to modify them in
the test file by attributes. All parameters are coded as static values
of the test class.</p><p>Due to a special treatment of infinite stretch values in the
width calculations, the test class is able to use a leader box with an
infinite stretch. In the test print-out the leaders are represented by
an ellipsis ‘…’. Note that there is no space between the leaders and the
following text, but there is a linebreak opportunity at that
point.</p><p>There are two possible ways to implement hanging indentation: by
means of a width-after in the penalties, or by means of different line
lengths. The test class does not implement the penalties method. The
user has to indicate the desired amount of indentation by indicating a
second linewidth which is smaller than the first by that amount. The
test print-out does not know that the shorter linewidth is an
indentation. The linebreak calculations, however, do use the desired
line lengths.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="d0e1008"/>Acknowledgements</h2></div></div></div><p>This work would not have been possible if I had not been a
member of the development team of <a href="http://xmlgraphics.apache.org/fop/" target="_top">Apache FOP</a>. Through
the work of the team I learnt much about programming of digital
printing. My work directly builds on the work of <a href="http://xmlgraphics.apache.org/fop/team.html#lf" target="_top">Luca
Furini</a>, who brought the linebreaking algorithm of Knuth and
Plass to FOP, and provided its implementation both in linebreaking and
in page breaking. I was much inspired by the work of <a href="http://xmlgraphics.apache.org/fop/team.html#jm" target="_top">Jeremias
Märki</a>, who implemented much of FOP, partially based on Luca's
work. My work was most directly prompted by the work of <a href="http://xmlgraphics.apache.org/fop/team.html#mm" target="_top">Manuel
Mall</a>, who tirelessly insisted on the remaining problems in
FOP's white space handling, and by that of <a href="http://xmlgraphics.apache.org/fop/team.html#ad" target="_top">Andreas
Delmelle</a>, who together with Manuel worked on a solution for
these problems.</p><p>Finally, I owe many thanks to Donald E. Knuth and
Michael F. Plass for their linebreaking algorithm. Their clear
description of it made much of my implementation easy; I simply
followed their rules.</p></div><div class="bibliography"><div class="titlepage"><div><div><h2 class="title"><a id="d0e1030"/>Bibliography</h2></div></div></div><div class="bibliomixed"><a id="bib-KP"/><p class="bibliomixed">[KP] 
		<span class="author"><span class="firstname">Donald</span> <span class="othername">E.</span> <span class="surname">Knuth</span></span> and <span class="author"><span class="firstname">Michael</span> <span class="othername">F.</span> <span class="surname">Plass</span></span>, <span class="title">Breaking Lines into Paragraphs</span>,
<span class="bibliomset"><i>Software—Practice and Experience</i>,
<span class="volumenum">11</span> (<span class="pubdate">1981</span>)
<span class="pagenums">1119–1184</span></span>; <span class="bibliomset">reprinted in:
<i>Digital Typography</i>, Ch. 3,
pp. <span class="pagenums">67–155</span></span>.
	</p></div></div></div></body></html>