Implementation notes:



Specification as-immplemented
-----------------------------


The [Disclaimer] text block has a recommended maximum of no more than 96 lines.
This is neither checked nor enforced.


The [Redistribution Notes] text block has a recommended maximum of no more 
than 24 lines.  This is neither checked nor enforced.


The ICM Family Description] text block has a recommended maximum of no more
than 4 lines.  This is neither checked nor enforced.


Unless specifically restricted by the specificiation, as is the case with
a 'node name', user defined identifiers may use a sequence of any legal 
dark printing character, bounded by white space.  Thus, '*&^%$' is a 
legal name, and would be accepted as a 'model name'.  Why anyone would 
actually want to do this is quite beyond me, but by my reading of the 
specification it is not actually ruled out.


The parser requires a Model_pinmap at both ends of the main path in a 
[Tree Path Description].  It is not clear to me that this is required by 
the specification or not, but it seems to be.  The first Model_pinmap is
expressly required, the second is implied by the wording "A second 
Model_pinmap is used at the end of the path description to referene the
pin map used for the other end of the interconnect".  Unlike the first,
this wording does NOT say it is required, but seems to make a pretty clear
statement that it is there.


Fork/Endfork blocks are allowed to nest within enclosing Fork/Endfork
blocks in a [Tree Path Description].  There is no wording to prohibit
this.  The wording about Forks is that "The sections (zero or more) 
between the Fork and Endfork subparameters are connected together as in 
any other path description."  Since the main path description expressly 
allows the use of Forks, I conclude that Fork paths also allow the use 
of Forks.  The parser permits Fork/Endfork nesting to any depth, and
therefore the construction of a generalized binary tree of interconnected
paths.  Naturally, the parser's internal representation of this structure
is also a binary tree.


SLM_* Modes must use matrices that consist solely of diagonal elements.
The parser does not enforce this by limiting the representation of these 
matrices to the "Diagonal_matrix" format.  Instead, it permits any 
representation to be used, and then expressly tests the resulting matrix 
for diagonality.

The various technical notes concerning the values of RLGC matrices from
pages 54-59 are not tested.  These comments are either assertions on
the meaning of the data, or require action by a simulator to evaluate 
properly.

The [End ICM Section] requires a test string argument.  This is probably
a hold-over from earlier revisions of the specification, when this
value was required to match the Section name given on the [Begin ICM Section]
keyword.  Since it's in the spec, the parser does require that this
string be present, but will only warn if it does not match.


Technical
---------

The parser is implemented in two parts; a lexical analyzer implemented
in LEX, and a parser implemented in YACC.

These tools are not ideally suited to the ICM language, as the language
itself is far from regular.  Fundamentally, this results in the lexical
analyzer needing to know far more about the structure of the language than 
it might otherwise.  Ideally, the lexical analyzer should be stateless.

This one is not stateless;  it does retain a certain limited knowledge
of what it has recently seen, particularly  with regard to what the last
keyword was, and thus what it's legitimate subparameters are.  

It also possesses a number of token processing substates to decode several
kinds of values such as numbers, strings, and names of various types.
The lexical analyzer cannot, however, readily determine in all cases
when these substates are appropriate to the current context.

There are a lot of shift/reduce and some reduce/reduce conflicts
in the YACC grammar.  Nearly all of them arose when an extensive series of
error recovery points were added to the grammar.  The actions that
FLEX takes to resolve these are indeed the ones I intend.  I see no 
good way to get rid of these conflicts without sacrificing a great deal 
of error recovery capability.  Suggestions welcome.


Hinting
-------

The analyzer does not maintain extensive knowledge of the contextual 
state of the parse.  What it does retain it little more than 'the current 
keyword is X', 'the current subparameter is Y', and a hint regarding the
possible nature of the next token.  This limited information is entirely 
inadequate to parse complex structures such as tables, multi-line lists, 
and optional arguments.

The parser possess much richer contextual state, and is capable of
making these sorts of distinctions.  Based on it's context, the parser
may drive hints to the analyzer whenever the analyzer's default behaviour
is inadequate to a particular situation.  The hint results in the lexical 
analyzer pushing itself into a substate to try to find a token of the 
hinted type.  This may fail.  If so, the analyzer pops its state and 
reverts to it's default interpretation of the input stream.

Of course, this will result in a parse error when an unexpected token
is returned to the parser.  The parser has to be prepared to deal with
this condition.  But then, that was always it's task in the first place.

For the most part, this works quite well, and severely limits the amount
of state that the lexical analyzer must retain.  Unfortunately, there
are some circumstances in which it's rather difficult to ensure that
correct hinting occurs to the lexical analyzer before the next token
is fetched.  

In particular, whenever the parser needs to obtain a 'look ahead' token 
to determine what state it should enter next, hinting becomes problematic
as the parser simply fetches the look-ahead token without regard to 
any hinting.  If you think about it, that is all it can really do.  There
is no way to hint if a choice must be made among several possibilities
based on what comes next.

This is suprisingly rare, though.  It's dealt with by coercing an
appropriate hint in advance of such a decision point.  Coercion is 
accomplished by embedding in-line hint 'actions' directly within the 
ruleset at the appropriate points.  Fortunately there aren't very many 
places where this sort of ad-hoc hackery is necessary.

For the most part, all the necessary hints can be emitted automagically
by simply defining targets such as 'number', which emit a hint and then 
immediately fetch the required token.  There are only a handful of such 
rules that are needed; the rest of the ruleset simply uses them whenever
it's appropriate to do so.

If the parser and the lexical analyzer should ever disagree as to what type 
of token to expect next, the parser obviously wins.  The analyzer sets 
it's internal hint before returning to the parser; the parser establishes 
it's hint, if any, before calling the analyzer again, overriding
whatever the analyzer may have pre-established.

Certain tokens can never be overridden regardless of the presence of any
hints.  For example, [keywords] are always recognized as such regardless 
of what else may be going on.


regarding the parse ruleset
---------------------------

Trade offs have been made within the parser to maximize both the
clarity of error messages, and the ability to recover from syntax errors.

While it is possible to construct a ruleset that rigidly expresses the
allowed syntax of the ICM language, the result is that the parser will
usually give up on the first error, or worse, like a compiler, generate 
nonsense errors for every remaining line of the input file as it struggles
to find some match with it's ruleset.

As a result, the ruleset as implemented is primarily expressed as iterators 
that gather up sections of the ICM input file, and expressly check whether 
or not a particular element is legitimate within a given section.

For example, the first iterator breaks down the ICM input into it's header,
the models, and the sections.  The header is then broken down by another
iterator looking for the various legal elements of a header.  Finally
a validation routine tests the proposed header to ensure it is proper, that
everything is there that should be there, and that any necessary ordering
rules were observed while it was being assembled.  Similar tactics are
employed with the ICM models and sections.  

The result is that the parser can more easily resyncronize itself when a 
syntax error is detected, and it can recognize more of the input file 
even if some input must be discarded.


String token values
-------------------

The lexical analyzer maintains a table of strings that it constructs on the
fly.  This table is stored in a balanced b-tree.  The routines for doing this 
are general purpose, but are easily capable of storing millions of strings
and can locate the stored copies extremely rapidly.

Only ONE UNIQUE COPY of any string is ever stored in the b-tree.  Attempts
to store a string redundantly simply result in the lookup and return of 
the pointer to the previously stored string.

Whenever a string value is returned by the analyzer, it actually returns the 
pointer of the stored string.  The parser may depend on the stability of 
this pointer;  it will never be deleted.  Of course, the parser should neither
free nor modify these strings in any way.  They do not belong to it.

This allows the parser to treat string pointers as if they are symbolic IDs.
It is legitimate for the parser to compare two occurance of a string for
equality by simply checking if they both in fact possess the same pointer.  

Routines are provided for the parser to register it's own strings with the
analyzer as well so that they too may be used in this manner.  This is 
exploited in a number of places.

For example, the parser may check that a returned string token is 'Yes' by 
looking up the 'Yes' in the analyzer's symbol table and then determining if
the address returned is in fact the same as the token in question.  
This type of comparison is not an error, although admittedly it does look
strange if you don't understand what's going on, as the first impulse is
to use strncmp().
