This directory contains complete code and user documentation for the preproff
program, which transforms nroff/troff files in preparation for running off.
The program was developed at Hitachi Computer Products (America), Inc., in its
Open Systems Division, which is responsible for porting and adapting OSF
offerings.  Hitachi Computer Products has given the program to OSF so it can
use and distribute it as part of its documentation build tools.  In addition,
OSF licensees can adapt the program to the needs of their sites, and this file
offers guidelines for doing so.  The program is offered as is, with no
warranty attached, and so forth and so on.

  The sections of this file are:

	What to do before compiling, on any kind of porting effort
	How to change options
	How to add or delete operations from the program
	Justification for the chosen implementation of this program
	Known constraints and design flaws

--------------------------------------------------------------------------------

What to do before compiling, on any kind of porting effort

   Change the strings in site_specific.h.

   If you want to change the options available to the user (or the options run
   by default) read on.

How to change options

   This program has been designed to allow a great deal of customization at
   each site.  So if you know you only want one feature (such as the rmifdef or -d
   option), or if you want to provide a particular set of features for
   processing all books at your site, you can make changes in one place in the
   source code -- the site_specific.c file.  This section shows how to pick
   and choose among the current options.  If you want to add new features, you
   also have to read the following section, "How to add or delete operations
   from the program."

   As you read this explanation, look at the array in site_specific.c.  The
   members of the structure are defined in filter.h.

   The program accepts options on three levels.  Each level is controlled by a
   member of each element in the site_specific.c array, so you can control the
   use of an option at a given level by flipping that member from TRUE to
   FALSE, or vice versa:

      1.  Defaults -- features that take effect if the user enters no
	  options at all on the command line.  These defaults also apply if
	  the user enters some options but does not enter -standard.

	  Currently, nothing runs by default.  All input passes through
	  unchanged.

	  To make an option run by default, change the on_by_default member of
	  its element to TRUE.  For instance, if you replace the two lines for
	  the "soelim" option in site_specific.c with the following, the
	  -soelim option will always run unless the user explicitly specifies
	  -nosoelim.

	  "soelim",	"Read in files within .so requests, so this program can act on them too",
				TRUE,	     TRUE,	    TRUE,	    FALSE,

      2.  Standard options -- combinations of features that every user should
	  invoke when running off documents at your site.  These options are
	  invoked by the -standard option, and override the defaults described
	  in level 1.  You, as the person in possession of the source code,
	  can decide what features are invoked by -standard.

	  To make an option standard, change the standard_turns_on member of
	  its element to TRUE.  To eliminate an option from the standard set,
	  change its standard_turns_on member to FALSE.  For instance, the
	  following lines make "soelim" non-standard:

	  "soelim",	"Read in files within .so requests, so this program can act on them too",
				TRUE,	     FALSE,	     FALSE,	      FALSE,

      3.  Individual options -- the user can always override the defaults and
	  the -standard features by specifying other options.  The default or
	  -standard features can be turned off by specifying an option
	  beginning with "-no".

	  If you definitely do not want anyone at your site to use a specific
	  option, change its available member to FALSE.  It will not be
	  accepted by the program and will not be displayed in a help message.
	  Here is an example making the -soelim option totally unavailable:

	  "soelim",	"Read in files within .so requests, so this program can act on them too",
				FALSE,	      FALSE,	      FALSE,	      FALSE,

    Needless to say (we're all used to working with documentation, aren't we?),
    you should update the user documentation in the howto file whenever you
    make a change to the program.

How to add or delete operations from the program

   This program lumps a lot of useful activities into one lex-generated filter,
   which makes a single pass through an input stream.  For the application
   this program is supporting -- the conversion of general OSF input to
   proprietary troff source -- this approach might be a little more efficient
   than the traditional UNIX practice of passing text through a series of
   single-purpose filters.  Makefile writers (and possibly individual users)
   will find it easier to specify different options to a single command than
   to order a set of filters on a command line.

   But there are certainly maintenance problems created by the single-pass
   approach:

	* You have to be extremely careful about interactions between
	  activities.  This is particularly true because lex(1) has a
	  complicated way of choosing among possible actions.  Thus, in
	  this program, every string that kicks off some activity also
	  needs a matching rule in the <IGNORE_LINES> state.  To be more
	  precise, even when the lexical analyzer is supposed to be
	  ignoring lines, any string that can fire an action will still
	  fire it; to suppress the action, you have to precede the rule
	  with an overriding <IGNORE_LINES> rule.

	* The program has to maintain several stacks (of input files for .so
	  requests, of #ifdef lines, and of its own internal %Start states)
	  which really go beyond the bounds of what lex(1) is meant to do.
	  The code reaches into the internally-generated code of lex.yy.c to
	  hack up its own way of maintaining stacks of input files and states.
	  While OSF has found that the program compiles and runs on different
	  systems, this might not always be true.  The program is probably
	  completely unadaptable to GNU's flex, for example.

	* The program has grown much larger and more complicated than a
	  lexical analyzer should be.  This is because it was originally
	  created to handle a couple small tasks that required more
	  sophisticated parsing than a sed or awk script could handle -- then
	  the program proved so useful that the Hitachi site kept adding
	  features incrementally.  By now, any addition you make will probably
	  cause lex(1) to run out of buffers, and you will have to increase %p
	  or add other such buffer indicators.  The program also takes a long
	  time to run through lex(1).
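    To make the first point above concrete, here is what such a pair of
    rules can look like.  This is an illustration only; the pattern and
    the action name process_so_request() are invented, not copied from
    the program:

```lex
%{
/* Illustration only.  With an inclusive %Start state, the unprefixed
 * rule below would still fire while ignoring lines; since both
 * patterns match the same text, lex(1) picks the rule that appears
 * first, so the <IGNORE_LINES> rule must precede the general one. */
%}
%Start IGNORE_LINES
%%
<IGNORE_LINES>^\.so.*\n    ;  /* swallow the line; no action fires */
^\.so.*\n                  process_so_request(yytext);
```

    As for the third point, when lex(1) does run out of room, the usual
    fix is a table-size declaration in the definitions section, such as
    %p 6000 (positions) or %a 4000 (transitions); the sizes here are
    arbitrary.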

    You might decide, therefore, to break this program up and use only one or
    two of the features in the enum processing_type list (see filter.h).
    Considering how big this file is, lex(1) does an excellent job and the
    program is quite robust.

    If you want to add a feature to this program, here is a brief checklist of
    things you have to do.  It's quite terse; you really must use the existing
    features as a model.

	* Create another line in both enum processing_type (defined in
	  filter.h) and the ARG_STRUCT filter array (defined in
	  site_specific.c).

	* Add an action that conditionally checks the actual_value member of
	  the filter array, and does what you want when the flag is set.
	  Remember to pass through the yytext unchanged when the flag is clear.

	* If you have to put the lexical analyzer into a new state, be sure to
	  maintain the state stack through PUSH and POP in the
	  monitor_state_stack() call.

	* Remember to change the user documentation.

    You can take out a feature to make the executable file a little smaller
    and faster, but that has dangers that complement the dangers of adding a
    feature.  So if you want to delete something, make it unavailable through
    the site_specific.c file, as described under "How to change options."

Justification for the chosen implementation of this program

    This program does two things by brute force, in lex(1) and in supporting
    functions, that most programs would use yacc(1) to do.  First, the program
    maintains a stack of lex(1) start states (the strings in <> angle
    brackets), which requires some playing around with internal lex(1)
    variables.  Second, the program does its own rather complicated expression
    evaluation for #ifdef and related directives.

    Certainly it would have been better to use yacc(1) for traditional parsing
    and stack maintenance.  But the lex/yacc model of parsing does not support
    the type of application this program represents.  Normally, lex(1)
    destroys the input or breaks it into tiny tokens so that yacc(1) can
    create a structure out of totally different material.  But this program
    passes 99% of its input through unchanged. To break the input into tokens
    would be absurd.  The alternative way to use yacc(1) would be a bit of a
    hack itself: having yacc(1) parse the hard parts like #ifdef and .so
    lines, and then having yacc(1) communicate the results back to lex(1)
    through global variables.  The logistics of trying to get yacc(1) to do
    this at the right boundaries (just before lex(1) processes a newline) are
    too much of a headache.  Still, maybe someone can figure out a way to
    solve these problems someday -- and the resulting program will be shorter,
    easier to maintain, and more portable.

Known constraints and design flaws

    * Stored pathnames, for use in error messages, are restricted to 80
      characters.  This does not restrict in any way the length of a pathname
      for actual processing.  It just means that a ridiculously long pathname
      will be truncated to the first 80 characters within an error message.

    * The program enforces an unwritten rule that every #ifdef and its
      corresponding #endif must be in the same file.  By doing so, the program
      provides the user with the useful service of issuing an error message at
      the end of the file, when an #endif is missing.  The alternative -- to
      wait for the end of all input before reporting a mismatch -- would give
      users a highly dubious enhancement of being able to put #ifdef and
      #endif in different files, strung together.

      However, unclosed #ifdef blocks can cause the wrong file name to appear
      in an error message.  That is, if file1 reads in file2 through a .so
      command, and file2 contains an #ifdef without a corresponding #endif,
      the program reports the error, but gives a line number from file1 along
      with the name of file2.  This is because the errors are reported from
      the UNWIND command of the monitor_ifdef_stack function, while filenames
      are maintained by the monitor_file_stack function.  Getting them to
      coordinate error information because of a rare user error would be far
      too much trouble.

      Report bugs or suggestions to:

		Hitachi Computer Products, Inc.
		Open Systems Software Development
		Technical Publications Group
		Reservoir Place
		1601 Trapelo Road
		Waltham, Mass.	02154

		e-mail:
		   uunet!hicomb!nancy
		   hicomb!nancy@uunet.uu.net
