Formatter Concept

1. Introduction.

The prototype demonstrates that a separate system of managers is unnecessary. Managers _were_ used in the prototype, and worked well, but the FO-to-manager relationship turned out to be 1:1, so a manager layer adds nothing that the FOs themselves cannot carry.

Each FO falls into one of the following categories:

  1. it generates and returns no areas at all (auxiliary formatting objects);
  2. it returns areas from its children;
  3. it generates areas of its own (and possibly returns areas from its children, if any).
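
The three categories can be sketched as a simple classification. This is only an illustration in Python: AreaRole, ROLE_BY_FO and generates_areas() are hypothetical names, and the table lists just three sample FOs rather than the full set.

```python
from enum import Enum


class AreaRole(Enum):
    """The three categories listed above (names are hypothetical)."""
    NONE = 1              # generates and returns no areas (auxiliary FO)
    RETURNS_CHILDREN = 2  # only returns areas from its children
    GENERATES = 3         # generates areas of its own


# Illustrative assignments only; a real table would cover every FO.
ROLE_BY_FO = {
    "layout-master-set": AreaRole.NONE,
    "wrapper": AreaRole.RETURNS_CHILDREN,
    "block": AreaRole.GENERATES,
}


def generates_areas(fo_name):
    """True if the named FO falls into category 3."""
    return ROLE_BY_FO[fo_name] is AreaRole.GENERATES
```

A lookup such as generates_areas("block") would then tell the formatter core whether area-creation steps apply to an FO at all.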

2. Generated Areas.

If FOs generate areas they may be differentiated further by the types of areas that they generate (? means 0 or 1, * means any number):

  1. page-sequence. Note: the page-sequence actually generates all these areas, with the assistance of the layout-master-set page and region masters.
  2. block-container
  3. block
  4. table-and-caption, table, list-block, list-item, footnote-body
  5. inline-container, external-graphic, instream-foreign-object
  6. character
  7. bidi-override, inline, basic-link
  8. leader, page-number, page-number-citation
  9. table-caption, table-cell
  10. float

3. Properties.

Refer to the properties document for detailed information.

4. Inheritance.

All formatting objects will ultimately extend an foFormattingObject base class. This class will be responsible for property inheritance, and mapping of corresponding properties (absolute <-> relative). All formatting objects share this functionality.

Formatting objects will maintain a list of properties that apply to them specifically.
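
A minimal sketch of these two base-class duties follows, in Python. Everything in it is an assumption made for illustration: the class is a stand-in for foFormattingObject, INHERITED is a small sample of inherited properties, SIDE_MAP covers only two writing modes, and only padding is mapped from absolute to relative sides.

```python
import copy

# Illustrative subset of inherited properties.
INHERITED = {"color", "font-size", "writing-mode"}

# Absolute side -> relative side, per writing-mode (two modes shown).
SIDE_MAP = {
    "lr-tb": {"top": "before", "bottom": "after", "left": "start", "right": "end"},
    "rl-tb": {"top": "before", "bottom": "after", "left": "end", "right": "start"},
}


class FoFormattingObject:
    """Hypothetical stand-in for the foFormattingObject base class."""

    def __init__(self, specified, parent=None):
        self.parent = parent
        self.specified = dict(specified)
        # Start from a deep copy of the parent's inheritable set ...
        self.inheritable = copy.deepcopy(parent.inheritable) if parent else {}
        # ... and overlay the inheritable properties specified on this FO.
        for name, value in self.specified.items():
            if name in INHERITED:
                self.inheritable[name] = value

    def get(self, name, default=None):
        """Specified value first, then the inherited one, then the default."""
        if name in self.specified:
            return self.specified[name]
        return self.inheritable.get(name, default)

    def relative_padding(self):
        """Map specified absolute padding-* properties to relative ones."""
        mode = self.get("writing-mode", "lr-tb")
        result = {}
        for side, relative in SIDE_MAP[mode].items():
            value = self.specified.get("padding-" + side)
            if value is not None:
                result["padding-" + relative] = value
        return result
```

With a root specifying writing-mode "rl-tb", a child that specifies only pad-left would report it as padding-end, while a non-inherited property such as padding-left is never picked up from the parent.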

5. Layout.

The prototype demonstrates that if a manager approach is adopted, the differences outlined in section 2, coupled with the extensive variability of properties across FOs and their subsequent effects, leave one with almost as many different kinds of managers as there are FOs. This argues for keeping layout logic in each FO; from a model standpoint this is easily justified.

There is a substantial amount of layout logic that can be placed in the foFormattingObject class. It will comprise a number of different functions, because no single layout function can serve more than a few FOs, let alone all of them. It is therefore unlikely that there will be extra levels of inheritance between foFormattingObject and the actual formatting object classes.

5.1 Pseudo-FOs.

Runs of text (PCDATA) will become special formatting objects in their own right. An unbroken run of PCDATA in the FO should appear to the formatter core as a formatting object with properties as specified by fo:character, influenced by the parent (real) FO.
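
One way to picture this is shown below, as a Python sketch under stated assumptions: Fo, TextRun and append_content() are hypothetical names, and properties are left out entirely. Adjacent text chunks are merged because a SAX parser may deliver one run of character data in several callbacks.

```python
class Fo:
    """Hypothetical minimal FO node: a name, a parent, and children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class TextRun(Fo):
    """Hypothetical pseudo-FO wrapping one unbroken run of PCDATA."""

    def __init__(self, text, parent):
        super().__init__("#text", parent)
        self.text = text


def append_content(parent, items):
    """Attach mixed content; adjacent strings merge into one TextRun."""
    for item in items:
        if isinstance(item, str):
            last = parent.children[-1] if parent.children else None
            if isinstance(last, TextRun):
                last.text += item  # same run, delivered in another chunk
            else:
                parent.children.append(TextRun(item, parent))
        else:
            item.parent = parent
            parent.children.append(item)
```

The formatter core would then see a TextRun exactly as it sees any other child FO, with the real parent FO reachable through the parent pointer.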

5.2 Lifecycle.

Here is a brief summary of the lifecycle of a formatting object:

  1. SAX element start
    1. check the parent content model to see if this element can be added in this position

      Rationale: it is impossible to completely express the FO tree relationships with XML DTDs, and very difficult (probably also impossible) to do so with schemas, because the constraints refer to _descendants_, not only to direct children.

      NOTE (EB): SGML had the "exclusion operator", which could express such conditions: "-(float)" was a way to say that "float" elements must not appear at any level below. I don't know whether this is still possible with XML and, if so, whether it was used in the FO DTD. I suggest at least testing this, because there is no point coding such checks if the validating parser can do them for you.

      NOTE (CS): Created an XSD schema for basic validation of content. The relationships above are not described in it, so those constraints must be enforced in code. Extension elements in other namespaces are not supported yet. Due to the "any property can be present on any fo" interpretation of the spec (http://lists.w3.org/Archives/Public/xsl-editors/2004JanMar/0021.html), the attributes are restricted as follows:

      • Any attribute can appear on any fo
      • If it does appear it must have a value (non zero-length)
      • The values can only be restricted as xs:string, because of expressions
      • "Any" attribute may appear if in another namespace.

    2. construct the formatting object and store the attributes as specified properties (do not actually construct any property classes);
    3. make a deep copy of the parent inheritable property set;
    4. iterate over the relevant properties for this FO, by name, and create a property instance for each, using inheritance as required, and updating the inheritable property set from step 3 as required;
    5. reconcile properties according to special constraints on this FO, if any;
    6. compute the values of corresponding properties (XSL Recommendation Section 5.3.x). Final results are 3 sets of properties, composed of relevant values: the specified properties, the inheritable set, and the computed/actual properties.
    7. Add the FO as a child to its parent, and also maintain a pointer to the parent in the child FO. NOTE (EB): steps 2-7 could be performed in a different order, and could even all be constructor duties.
    8. Call the layout() method on the FO. This causes the following sub-sequence of actions, for an FO that generates areas:
      1. create the first (or only) area with appropriate traits. This becomes the current area for this FO.
      2. if a column or page break is called for, via a property, call either newPage() or newColumn() on the current page-sequence FO. This has the effect of re-calling layout() on all incomplete FOs from the page-sequence on down to the current FO, hence creating exactly those new areas on the new page that are required, and resuming _this_ sequence of actions at step 1, but with the break conditions now satisfied. The current area (and current reference area) for this FO will change. Immediately prior to actually calling newPage() or newColumn() we call finish_area() on this FO (see "SAX element end" below).
      3. locate the _parent_ reference area. This is straightforward - get the _current_ reference area maintained by the parent FO. If the area we created in step 1 is a reference area, set that as the _current_ reference area for this FO (this is what matters for areas returned by child FOs).
      4. dimension the current area from step 1 using dimensions (remaining space in the stacking direction, especially) from the _parent_ reference area, the allocation rectangle for this area (as determined by properties on this FO; Section 4.2.3 in the XSL spec), and the stacking constraints between this area and the preceding one (e.g. resolved spaces).

        This dimensioned area may have a fixed size in both dimensions, or, as is more often the case, may have a zero current size in one dimension, with a known maximum value.

        The area, once dimensioned, will have a block-progression-direction and an inline-progression-direction, a current and a maximum dimension in the block-progression-direction, and a current and maximum dimension in the inline-progression-direction.

      5. This concludes the layout() for this FO, for its current area.
  2. Operations proceed with child FOs of this FO. According to _their_ layout() methods they will ask this FO for its _current_ reference area (step 3 of the layout() sub-sequence above), and will also ask this FO for available space. If space is available this FO will furnish dimensions, but it will _not_ make any adjustments to the dimensions of its current area yet.

    If space is _not_ available (e.g. a line area will not fit into the current block area) then we do exactly what step 2 of the layout() sub-sequence above describes. This causes layout() to be called on all ancestors up to page-sequence level, hence re-creating new areas as required, in a new column and/or page. The current area for this FO will change. Immediately prior to actually calling newPage() or newColumn() we call finish_area() on this FO (see "SAX element end" below).

    NOTE: the primary reason why more generic managers will not work well is that there are numerous constraints that vary from FO to FO. It is when a child FO is being added to an FO that the parent must satisfy these extra constraints (e.g. special rules for tables, lists).
  3. SAX element end
    1. Add the _unassigned_ areas of incomplete child FOs to the current area. Mark them as assigned.
    2. Call finish_area() on this FO. This has 2 main purposes:
      1. It calculates final positioning, in the sense of the conceptual area tree (i.e. left-position, right-position, top-position, bottom-position, left-offset, top-offset), and determines other final trait values on the current area;
      2. It calls a dimension resizing method on the parent FO. This causes the parent FO to actually adjust its current dimensions for _its_ current area (refer to the last sentence in the first paragraph of step 2, "Operations proceed with child FOs", above, for context).
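
The content-model check in the first step under "SAX element start" has to look at every ancestor, not just the parent, which is what DTDs and schemas cannot express. A minimal Python sketch of such a check follows. The rule table is illustrative only, modelled on the "-(float)" example in the note by EB rather than transcribed from the Recommendation, and EXCLUDED_DESCENDANTS and can_add() are hypothetical names.

```python
# Hypothetical rule table: ancestor name -> element names that must not
# appear anywhere beneath it. Illustrative, not taken from the spec text.
EXCLUDED_DESCENDANTS = {
    "float": {"float"},
}


def can_add(ancestor_names, child_name):
    """Check a new element against the exclusions of all its ancestors.

    ancestor_names: element names from the root down to the prospective
    parent, i.e. the stack of currently open SAX elements.
    """
    for ancestor in ancestor_names:
        if child_name in EXCLUDED_DESCENDANTS.get(ancestor, ()):
            return False
    return True
```

Because a SAX handler already holds the stack of open elements, the check costs one pass over that stack per start-element event.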

Further discussion: this sequence of operations (in particular, child areas are added as late as possible, and parents are only resized at that time) allows us to defer consideration of keeps in detail for now, with reasonable assurance that the required mechanism can be retrofitted. (AHS: I have a decent mental idea of how this would work).
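
The deferred resizing can be illustrated with a deliberately small Python sketch. It makes several assumptions: only the block-progression dimension is tracked, there are no breaks or overflow handling, content_bpd is a hypothetical stand-in for whatever an FO's own content occupies, and the page height of 100.0 is arbitrary. The method names layout() and finish_area() follow the text; everything else is invented for the example.

```python
class Area:
    """An area with a current and a maximum block-progression size."""

    def __init__(self, max_bpd):
        self.bpd = 0.0          # current size: zero until content is finished
        self.max_bpd = max_bpd  # known maximum, taken from the parent


class Fo:
    def __init__(self, name, parent=None, content_bpd=0.0):
        self.name = name
        self.parent = parent
        self.content_bpd = content_bpd  # hypothetical size of own content
        self.area = None

    def available_space(self):
        """Space a child may claim; asking does not change this FO's area."""
        return self.area.max_bpd - self.area.bpd

    def layout(self, page_bpd=100.0):
        """Element start: create the current area, dimensioned from the parent."""
        limit = self.parent.available_space() if self.parent else page_bpd
        self.area = Area(limit)

    def resize(self, child_bpd):
        """Called by a finishing child; only now does this FO's area grow."""
        self.area.bpd += child_bpd

    def finish_area(self):
        """Element end: fix the final size, then have the parent resize."""
        self.area.bpd += self.content_bpd
        if self.parent is not None:
            self.parent.resize(self.area.bpd)
```

In this sketch a parent's current size stays at zero while a child is open and changes only inside the child's finish_area() call, which is the property that should leave room for retrofitting keeps.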

6. Output.

The conceptual area tree produced by the core formatter will be fed to an SVG translator, page by page, as pages are completed. The formatter must deliver completed pages. The cache is responsible for satisfying the user agent delivery strategy.

The SVG translator will deliver a minimal but adequate subset of SVG (to be determined). The cache will store these simplified SVG pages. This format will be both an in-memory and on-disk representation.
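
The page-by-page hand-off might look like the following Python sketch. It is a guess at the shape of the pipeline, not a design: the SVG subset is still to be determined, so the example emits only svg and rect elements, and page_to_svg(), PageCache and deliver() are hypothetical names, as is the dictionary layout used for a page.

```python
def page_to_svg(page):
    """Translate one completed page into a very small SVG document.

    page: dict with "width", "height" and "areas", where each area is an
    (x, y, width, height) rectangle. This structure is an assumption.
    """
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="%g" height="%g">'
             % (page["width"], page["height"])]
    for x, y, w, h in page["areas"]:
        parts.append('<rect x="%g" y="%g" width="%g" height="%g"/>' % (x, y, w, h))
    parts.append("</svg>")
    return "".join(parts)


class PageCache:
    """Hypothetical in-memory cache of translated pages, numbered from 1."""

    def __init__(self):
        self._pages = []

    def store(self, svg):
        self._pages.append(svg)

    def page(self, number):
        return self._pages[number - 1]

    def __len__(self):
        return len(self._pages)


def deliver(completed_pages, cache):
    """Translate and cache each page as soon as the formatter completes it."""
    for page in completed_pages:  # may be a generator fed by the formatter
        cache.store(page_to_svg(page))
```

Passing a generator as completed_pages means each page is translated and stored as soon as it is finished, without waiting for the whole document.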