Here I present a modified set of building blocks (a.k.a. Knuth elements) for the abstract representation of the text of a paragraph for linebreaking. This modified set is adapted to the features of FO texts.
The linebreaking algorithm of Knuth and Plass uses three building blocks, the Knuth elements box, glue and penalty. It has been applied in FOP, the Formatting Objects Processor, generally with good results. But a more refined linebreaking process must take into account the precise requirements of the XSL-FO specification for whitespace treatment and of Unicode Standard Annex #14 for linebreaking opportunities. Efforts to achieve this have run into difficulties.
Knuth and Plass designed their set of building blocks for the type of text they were dealing with: Western text with a single whitespace between words. The XSL-FO specification addresses a much wider range of texts, including non-Western texts. These texts come with new features: non-collapsible whitespace, and suppression of characters before, after or around linebreaks. For such texts a modified set of building blocks is required.
Here I propose such a set:
Box, with elastic width. A box has two boolean properties:
suppress-at-linebreak, default value false. Under the default settings of the FO specification, it is true for the space character U+0020. The user may override this default, setting the property to false for the space character or to true for other characters.
is-BP, default value false. This property indicates whether a box corresponds to a border and/or a padding width. It is true for boxes which are generated by padding widths and borders.
Penalty, with a penalty value and two elastic widths. When the penalty element is the chosen linebreak, it contributes the first elastic width before the linebreak and the second elastic width after the linebreak.
Box-penalty, with a penalty value, and three elastic widths. When the box-penalty element is the chosen linebreak, it behaves as a penalty, otherwise it behaves as a box.
Penalties and box-penalties are legal breakpoints. Boxes are not.
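As a rough illustration, the three elements could be modelled in Java as follows. This is a minimal sketch and not the implementation mentioned in this text; all type, field and method names are hypothetical. It assumes that an elastic width consists of a natural length plus stretch and shrink amounts, that a penalty which is not chosen as the linebreak contributes no width (as in the original Knuth-Plass model), and that the three elastic widths of a box-penalty are its width as a box plus the widths before and after the linebreak.

```java
/** Elastic width: a natural length plus the amounts by which it may stretch or shrink. */
record ElasticWidth(int natural, int stretch, int shrink) {
    static final ElasticWidth ZERO = new ElasticWidth(0, 0, 0);
}

/** Common supertype of the three proposed elements. */
interface Element {
    /** Width contributed when the element is not the chosen linebreak. */
    ElasticWidth widthInLine();

    /** Only penalties and box-penalties are legal breakpoints. */
    boolean isLegalBreakpoint();
}

/** Elements that may be chosen as a linebreak. */
interface Breakpoint extends Element {
    int penalty();

    /** Width contributed before the linebreak when this element is chosen. */
    ElasticWidth widthBeforeBreak();

    /** Width contributed after the linebreak when this element is chosen. */
    ElasticWidth widthAfterBreak();

    default boolean isLegalBreakpoint() { return true; }
}

/** Box with elastic width and the two boolean properties. */
record Box(ElasticWidth width, boolean suppressAtLinebreak, boolean isBP) implements Element {
    /** Convenience constructor: both properties take their default value false. */
    Box(ElasticWidth width) { this(width, false, false); }

    public ElasticWidth widthInLine() { return width; }
    public boolean isLegalBreakpoint() { return false; }
}

/** Penalty with a penalty value and two elastic widths. */
record Penalty(int penalty, ElasticWidth widthBeforeBreak, ElasticWidth widthAfterBreak)
        implements Breakpoint {
    // Assumption: a penalty that is not the chosen linebreak contributes no width.
    public ElasticWidth widthInLine() { return ElasticWidth.ZERO; }
}

/** Box-penalty: a penalty when chosen as the linebreak, a box otherwise. */
record BoxPenalty(int penalty, ElasticWidth boxWidth,
                  ElasticWidth widthBeforeBreak, ElasticWidth widthAfterBreak)
        implements Breakpoint {
    public ElasticWidth widthInLine() { return boxWidth; }
}
```

In this sketch a space character with suppress-at-linebreak set would be created as `new Box(width, true, false)`, and a linebreaker would only consider elements for which `isLegalBreakpoint()` returns true.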
My essay “Knuth linebreaking elements for Formatting Objects” gives a detailed account of this approach. (XHTML, HTML)
In order to test my ideas in practice, I have written a simple implementation in Java of this approach: