2 Level Cache

During formatting, forward references can delay the completion of a page. If every page carries "page 1 of n" using page-number-citation, then all the area trees must remain in memory until all the pages have been processed. To avoid this and conserve memory, the formatter will use 2 caches for intermediate files.

Cache #1: Area Tree

If a page cannot be completed while its content is being parsed, its area tree must be stored until all forward references can be resolved. Once a forward reference has been resolved, the positions and widths of the adjacent areas can be adjusted to allow for "page 1 of 9", "page 1 of 99", or similar.

Because the area tree contains significantly more information than its equivalent in the output, persisting the area tree is a much more expensive operation than persisting the output format. To reduce this cost, the page is broken into "viewports" that correspond to the viewport / reference pairs produced by the 5 page regions. Each viewport can be stored individually, so as much of the page as possible can be converted into pseudo-svg in the output, because many of the viewports are not affected by the forward reference.
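The split described above can be sketched as follows. This is only an illustration under stated assumptions: `Viewport`, `SplitResult` and `splitPage` are hypothetical names, not part of the formatter, and the area tree is reduced to a string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one viewport / reference pair of a page.
struct Viewport {
    int id;                       // 1..5, one per page region
    std::string areaTree;         // placeholder for the real area tree
    bool hasUnresolvedReference;  // true while a forward reference is pending
};

// Hypothetical result of splitting a page by viewport.
struct SplitResult {
    std::vector<int> keepInAreaTreeCache;  // area tree must be persisted
    std::vector<int> areaTreeDiscardable;  // area tree is no longer needed
};

// Decide, viewport by viewport, which area trees must be kept.
SplitResult splitPage(const std::vector<Viewport>& page) {
    SplitResult result;
    for (const Viewport& v : page) {
        if (v.hasUnresolvedReference) {
            result.keepInAreaTreeCache.push_back(v.id);
        } else {
            result.areaTreeDiscardable.push_back(v.id);
        }
    }
    return result;
}
```

With a page whose only pending reference sits in viewport 5, only that viewport ends up in `keepInAreaTreeCache`; the other four area trees can be dropped once their output has been produced.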

Example: (TODO - replace with real picture)
|   1   |
| |   | |
|2| 3 |4|
| |   | |
|   5   |

(The output cache is explained later.) The formatter is creating the page above and has the area tree for the whole page in memory. If viewport 5 contains "page 1 of ?", where ? is a page-number-citation, viewports 1 - 5 can all be converted to pseudo-svg output, with a ? placeholder in viewport 5, and sent to the output cache. Once that is done, the formatter may discard the area trees for viewports 1 - 4 because they are no longer needed. Viewport 5 is then stored in the area tree cache, and the formatter continues with the next page. The user agent determines whether to render the page immediately (quick print), store the output in the output cache for later printing, or both. When the formatter has resolved the forward reference and is ready to update viewport 5, the following sequence occurs:

  1. Request that the output cache return the "incomplete" page.
  2. Load the area tree for viewport 5 from the area tree cache.
  3. Update the area tree and make adjustments if necessary.
  4. Convert the area tree to pseudo-svg and update the output viewport.
  5. Push the "complete" page back to the user agent (for rendering, storing, or both).
  6. Discard the area tree.

If the user agent chose not to store the incomplete page, step 1 fails and the rest of the sequence is skipped. This may be useful for quick draft printing.
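The six steps, including the early exit when step 1 fails, can be sketched as below. Everything here is an assumption made for illustration: `Page`, `PendingViewport`, `OutputCache`, `AreaTreeCache` and `completePage` are hypothetical names, both caches are plain in-memory maps, and the "conversion to pseudo-svg" is a trivial text wrapper.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; none of these names come from the formatter API.
using Page = std::map<int, std::string>;  // viewport number -> pseudo-svg text

struct PendingViewport {
    int viewport;          // which viewport holds the forward reference
    std::string areaTree;  // placeholder for the stored area tree
};

struct OutputCache   { std::map<int, Page> incomplete; };          // by page number
struct AreaTreeCache { std::map<int, PendingViewport> pending; };  // by page number

// Returns true and fills `completed` if the page could be finished;
// returns false if the sequence had to be skipped.
bool completePage(OutputCache& out, AreaTreeCache& trees, int pageNum,
                  const std::string& resolved, Page& completed) {
    // Step 1: request the "incomplete" page; skip the rest if it was not stored.
    auto page = out.incomplete.find(pageNum);
    if (page == out.incomplete.end()) return false;

    // Step 2: load the area tree of the pending viewport.
    auto tree = trees.pending.find(pageNum);
    if (tree == trees.pending.end()) return false;

    // Step 3: update the area tree with the resolved reference.
    std::string updated = tree->second.areaTree;
    std::string::size_type pos = updated.find('?');
    if (pos != std::string::npos) updated.replace(pos, 1, resolved);

    // Step 4: convert to pseudo-svg (a trivial wrapper here) and
    // update the corresponding viewport of the output page.
    page->second[tree->second.viewport] = "<text>" + updated + "</text>";

    // Step 5: hand the "complete" page back to the caller (the user agent).
    completed = page->second;

    // Step 6: discard the area tree.
    trees.pending.erase(tree);
    return true;
}
```

A caller that gets `false` back simply moves on: the user agent did not keep the incomplete page, so there is nothing to rework.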

NOTE(CS): We may need a "trigger" mechanism to indicate that a page can be completed.


Next steps

Note: if the formatter tries to pull from the cache a page that the user agent has already chosen to delete, it simply means that the page need not be reworked and that the user agent was satisfied with the "incomplete" version. This can be useful, for example, for fast printing.

Cache #2: Output cache


The main part of the formatter analyses successive fo:page-sequences, but it cannot always deliver the pages to the user agent in increasing page-number order. For example, forward references require pages to be kept on hold until those references are resolved.

Conversely, a renderer such as a PDF renderer based on PDFlib might need, or find it simpler, to process the pages in sequence.

A viewer such as a KDE viewer might in turn require only the pages that the user is currently viewing (for example, the lower part of page 5, all of page 6, and the upper part of page 7). Then, long after the formatter has finished formatting the entire document, it might require another page (for example, page 8) as the user browses through the document.

Other user agents might not care at all about the order in which pages are delivered to them.

Instead of writing the logic to handle such situations in each user agent, the design implements a cache inside the formatter that can deliver pages in any order.
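As an illustration of one such delivery order, the sketch below buffers pages that arrive in any order and releases them strictly in sequence, as a sequential renderer would want. `InOrderCache`, `put` and `drain` are hypothetical names chosen for this example, and pages are kept in memory rather than in files.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical cache that accepts pages in any order and releases them
// in increasing page-number order.
class InOrderCache {
public:
    // Store a finished page; pages may arrive in any order.
    void put(int pageNum, const std::string& page) { pages_[pageNum] = page; }

    // Release every page that can now be delivered in sequence and
    // return the page numbers that were delivered.
    std::vector<int> drain() {
        std::vector<int> delivered;
        auto it = pages_.find(next_);
        while (it != pages_.end()) {
            delivered.push_back(next_);
            pages_.erase(it);
            ++next_;
            it = pages_.find(next_);
        }
        return delivered;
    }

private:
    std::map<int, std::string> pages_;
    int next_ = 0;  // assumes pages are numbered from 0
};
```

If page 1 arrives first, `drain()` delivers nothing; once pages 2 and 0 have also arrived, a single `drain()` call releases 0, 1 and 2 in that order. A viewer that asks for arbitrary pages would need a different lookup, which this sketch does not cover.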

Requirements for the formatter

The formatter is expected to pass every page to the cache exactly once, in any order.

It means that the formatter should not:

The formatter can of course pass a completed page directly to the cache if no information, such as a page-number-citation, is missing that would have prevented the page from being completed in the first place.

The formatter must tell the cache when it starts formatting a page by calling the registerPage() method. The page is considered "not ready" until the formatter calls the pageReady() method to tell the cache that it can take any appropriate action with that page. Once pageReady() returns, the page description can safely be deleted.

Since the list of pages that are being worked on is in the cache, the formatter can retrieve this information from the cache, and does not need to maintain a list of its own.
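The page life cycle described above could look roughly like this. Only the method names registerPage() and pageReady() come from the text; their parameters, the `PageCache` class and the `pagesInProgress()` query are assumptions made for the sake of the sketch.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical cache front end tracking which pages are still being formatted.
class PageCache {
public:
    // The formatter starts working on a page: it is "not ready" yet.
    void registerPage(int pageNum) { ready_[pageNum] = false; }

    // The formatter is done: the cache may now act on the page.
    void pageReady(int pageNum) {
        auto it = ready_.find(pageNum);
        assert(it != ready_.end());  // the page must have been registered
        it->second = true;
    }

    // Pages that were registered but are not ready yet, in increasing order.
    std::vector<int> pagesInProgress() const {
        std::vector<int> open;
        for (const auto& entry : ready_) {
            if (!entry.second) open.push_back(entry.first);
        }
        return open;
    }

private:
    std::map<int, bool> ready_;  // page number -> ready flag
};
```

Because the cache holds this state, the formatter can ask `pagesInProgress()` instead of keeping its own list of pages on hold.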

The formatter must keep in mind that pages are numbered starting at 0 (old C convention)!

NOTE(CS): This must change.

Requirements for the user agent

The user agent communicates with the cache with two functions of the formatter API:

LIBXSLFO_API void foRenderPage
	(const foUserAgent *userAgent,
	 const char *filename,
	 foNextDelivery firstDelivery);

void (*pageFunc)
	(const foPageDescription *pageDescription,
	 int pageNum, bool complete,
	 bool &keepPageForLaterUse,
	 foNextDelivery &nextDelivery);

The "firstDelivery" and "nextDelivery" structures are made of a number :

"keepPageForLaterUse" tells the cache that the page might still be needed later on, so it should stay in the cache on disk rather than being deleted.

File names

The cached pages are stored in files named: