SETTING UP THE CODE

This directory contains a bottom-up chart parser as described in Natural
Language Understanding, 2nd edition. Once you have the files in your
directory, edit the function "loadf" in the file LOADP so that it looks in
the appropriate directory. From then on, "loadf" is used to load every file
in the system. Once this function is defined, the system can be loaded by
loading the LOADP file.
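
As a rough illustration of the edit described above, a directory-aware loader
might look like the following. This is only a sketch: the directory path and
the helper names *parser-directory* and parser-file-path are assumptions made
for this example, and the actual definition of "loadf" in LOADP may be written
differently.

```lisp
;; Hypothetical sketch; adapt it to the definition actually found in LOADP.
(defparameter *parser-directory* #p"/usr/local/lisp/parser/"
  "Assumed location of the parser files; change it to your own directory.")

(defun parser-file-path (name)
  "Return the full pathname of the file NAME inside *parser-directory*."
  (merge-pathnames name *parser-directory*))

(defun loadf (name)
  "Load the file NAME from the parser directory."
  (load (parser-file-path name)))
```

With such a definition in place, (loadf "LOADP") would load the LOADP file from
the configured directory regardless of the current working directory.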

Note that your lisp system should be case-insensitive, as case is
not consistent in the code. This is typically the default. If not, you
can set this by (setf (readtable-case *readtable*) :upcase).
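
A quick way to check the reader's behavior before loading the system is
sketched below. The function name is an assumption made for this example and
is not part of the parser.

```lisp
(defun reader-folds-case-p ()
  "True if the current readtable maps differently cased names to one symbol."
  (eq (read-from-string "loadChapter4")
      (read-from-string "LOADCHAPTER4")))
```

If (reader-folds-case-p) returns NIL, the readtable-case setting shown above is
needed before loading the code.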


EXAMPLES FROM THE BOOK

Once the parser is loaded, there are functions that will load in the example
grammars found in the book. The load files are systematically named. The
function (loadChapter4), for instance, loads in all the grammars and lexicons 
found in chapter 4. The grammars and lexicon are stored in variables that are
named by the figure number in the text. The variable *lexicon4-4*, for instance,
gives the lexicon in figure 4.4, and *grammar4-5* gives grammar 4.5. Typically,
the main example grammar and lexicon in the chapter are pre-defined as indicated.
If you want to try other grammars listed in the text, you can define whatever
combination of grammars you wish manually using the variables. A complete list
of such functions is:
   (loadChapter4)    basic grammars with features
   (loadChapter5)    questions and relative clauses
   (loadChapter7)    stochastic parsing
Chapter 9 includes semantic interpretation and is organized by section:
   (loadSection9-3)  basic semantic interpretation
   (loadSection9-5)  more semantic interpretation
   (loadSection9-6)  semantic interpretation using features only
   (loadSection9-7)  sentence generation

What follows is the documentation for the parser.


      ===============================================================
      ===============================================================
      ||     A LEFT-CORNER BOTTOM UP PARSER WITH FEATURES          ||
      ===============================================================
      ===============================================================


GRAMMATICAL RULES

The basic syntax for a grammatical rule is
	(<lhs constit> <rule id> <rhs constit 1> ... <rhs constit n>)
A constituent is specified in the form
	(<atomic category> (<feat1> <val1>) ... (<featn> <valn>))
Values may be atoms, other feature-value lists, or variables. A variable
can be written in several different forms:
        ?<atomic name> - an unconstrained variable (e.g., ?A)
        (? <atomic name>) - same as first version (e.g., (? A))
        (? <atomic name> <val1> ... <valn>) - a variable constrained to be one of 
                                    the indicated values (e.g., (? A 3s 3p))
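
The three variable forms can be recognized and reduced to a single list form
with a few lines of Lisp. The following is a minimal sketch written for this
document, not the parser's own reader; the function names canonical-variable
and allowed-values are assumptions.

```lisp
(defun canonical-variable (x)
  "Return X in the list form (? name val...), or NIL if X is not a variable."
  (cond ((and (symbolp x)
              (> (length (symbol-name x)) 1)
              (char= (char (symbol-name x) 0) #\?))
         ;; ?A becomes (? A); the name is interned in the symbol's own package.
         (list '? (intern (subseq (symbol-name x) 1)
                          (or (symbol-package x) *package*))))
        ((and (consp x)
              (eq (first x) '?)
              (second x)
              (symbolp (second x)))
         ;; (? A) and (? A 3s 3p) are already in list form.
         x)
        (t nil)))

(defun allowed-values (x)
  "Return the values a variable is constrained to, or NIL if unconstrained."
  (cddr (canonical-variable x)))
```

Under this sketch ?A and (? A) are treated alike, and only the third form
yields a non-empty list of allowed values.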

Constituents on the right hand side may be designated as head constituents by 
enclosing them in a list with the first element being the atom HEAD.

For example, the following is the S rule
 	((s (agr ?a)) -1> (np (agr ?a)) (head (vp (agr ?a))))

This rule has as its left hand side an S with the AGR feature set to the
variable ?a, a rule identifier of "-1>", and two constituents on the right hand
side: an NP and a VP, both with an AGR feature that must unify with the S AGR
feature. The VP is the head constituent.
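
To make the layout of a rule concrete, the sketch below splits a rule into its
left hand side, identifier, right hand side, and head. It is an illustration
only; the names head-marked-p and rule-parts are assumptions and do not come
from the parser's code.

```lisp
(defun head-marked-p (constit)
  "True if CONSTIT is wrapped in a list whose first element is HEAD."
  (and (consp constit) (eq (first constit) 'head)))

(defun rule-parts (rule)
  "Split RULE into a property list with keys :lhs, :id, :rhs and :head."
  (destructuring-bind (lhs id &rest rhs) rule
    (list :lhs lhs
          :id id
          ;; Right hand side with the HEAD wrappers removed.
          :rhs (mapcar (lambda (c) (if (head-marked-p c) (second c) c)) rhs)
          ;; First constituent marked as a head, or NIL if none is marked.
          :head (second (find-if #'head-marked-p rhs)))))
```

Applied to the S rule, the :head entry would be the VP constituent and the
:rhs entry would list the NP followed by the VP.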

GRAMMARS

A grammar is a list of rules, together with a prefix that indicates what input 
format is being used. The rules above are in what is called cat format, where 
the cat feature is not explicitly present. For example, here is a small grammar 
in CAT format.

(setq *testGrammar1*
  '(cat
    ((s (agr (? a))) 1 (np (agr (? a))) (vp (agr (? a))))
    ((np (agr (? a))) 2 (art (agr (? a))) (n (agr (? a))))
    ((vp (agr (? a)) (vform (? v))) 3 (v (agr (? a)) (vform (? v)) (subcat _none)))
    ((vp (agr (? a)) (vform (? v))) 4 
          (v (agr (? a)) (vform (? v)) (subcat _np)) (np))))

The head feature input format allows the user to specify head features for each 
category. If this format is used, every rule should have at least one head on 
the right hand side. The head features for the grammar are specified using the 
form
      (Headfeatures (<cat1> <feat1.1> ... <feat1.m>) ... (<catn> <featn.1> ... <featn.k>)).
Here is the same grammar as *testGrammar1*, but in head feature format:

(setq *testGrammar2*
     '((headfeatures (VP vform agr) (NP agr))
        ((s (agr (? a))) 1 (np (agr (? a))) (head (vp (agr (? a)))))
        ((np) 2 (art (agr (? a))) (head (n (agr (? a)))))
        ((vp) 3 (head (v (subcat _none))))
        ((vp) 4 (head (v (subcat _np))) (np))))

These two grammars would generate exactly the same grammar in the internal 
format.
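
One way to picture the expansion of head features is sketched below. It assumes
that head features are looked up by the category on the left hand side of the
rule and that a missing feature is filled with a fresh variable named after the
feature. Both points, as well as the function names, are assumptions made for
this illustration; the parser's internal conversion may differ in detail.

```lisp
(defun feature-value (constit feat)
  "Return the (feat value) pair of CONSTIT, or NIL if the feature is absent."
  (assoc feat (rest constit)))

(defun add-feature (constit feat value)
  "Return a copy of CONSTIT with (FEAT VALUE) appended."
  (append constit (list (list feat value))))

(defun expand-head-features (rule head-table)
  "Copy the head features listed in HEAD-TABLE between the lhs and the head.
HEAD-TABLE is an alist such as ((vp vform agr) (np agr))."
  (destructuring-bind (lhs id &rest rhs) rule
    (let* ((pos (position 'head rhs :key #'first))
           (head (and pos (second (nth pos rhs)))))
      (when head
        (dolist (feat (rest (assoc (first lhs) head-table)))
          (let ((on-lhs (feature-value lhs feat))
                (on-head (feature-value head feat)))
            (cond ((and on-lhs on-head))   ; both given: leave them alone
                  (on-lhs (setf head (add-feature head feat (second on-lhs))))
                  (on-head (setf lhs (add-feature lhs feat (second on-head))))
                  (t (let ((var (list '? feat)))   ; shared fresh variable
                       (setf lhs (add-feature lhs feat var)
                             head (add-feature head feat var))))))))
      ;; Rebuild the rule, putting the updated head back in its position.
      (append (list lhs id)
              (loop for c in rhs
                    for i from 0
                    collect (if (eql i pos) (list 'head head) c))))))
```

Applied to rule 2 of the head feature grammar, this sketch copies the AGR value
of the head N up to the NP, which matches rule 2 of the CAT format grammar
apart from the HEAD marker.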

For improved tracing (and for use in later extensions), you should declare all 
lexical categories by setting the variable *lexical-cats*. This variable is 
preset by the system to the following, so if this covers all your categories, 
you need not do anything.

(setq *lexical-cats* 
      '(n v adj art p aux pro qdet pp-wrd name to))

LEXICON FORMAT

A lexicon consists of a list of word entries of the form
     (<word> <constit>)
where the constit is in abbreviated format as described above.
Here's a sample lexicon:

(setq *Lexicon1*
  '((dog (n (agr 3s) (root dog)))
    (dogs (n (agr 3p) (root dog)))
    (pizza (n (agr 3s) (root pizza)))
    (saw (v (agr (? a1)) (vform past) (subcat _np) (root see)))
    (barks (v (agr 3s) (vform pres) (subcat _none) (root bark)))
    (the (art (agr 3s) (root the)))))
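
Because a lexicon in this input format is an ordinary list, the entries for a
word can be collected with standard list functions. The sketch below works on
the raw input format only, not on the active lexicon built by the system. The
names *demo-lexicon* and entries-for are assumptions, and the noun entry for
"saw" is a hypothetical addition used to show a word with two entries.

```lisp
(defparameter *demo-lexicon*
  '((dog (n (agr 3s) (root dog)))
    (saw (v (agr (? a1)) (vform past) (subcat _np) (root see)))
    ;; Hypothetical second entry, not part of the sample lexicon above.
    (saw (n (agr 3s) (root saw)))))

(defun entries-for (word lexicon)
  "Return the constituents listed for WORD in LEXICON, in order."
  (mapcar #'second
          (remove-if-not (lambda (entry) (eq (first entry) word)) lexicon)))
```

Here (entries-for 'saw *demo-lexicon*) would return both the verb and the noun
constituent.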

DEFINING AND INSPECTING THE ACTIVE LEXICON AND GRAMMAR

The following functions are provided to define which grammars and lexicons are 
to be used by the parser. These functions also convert the input formats into 
the internal formats. For defining the grammar, there are two functions. One 
redefines the active grammar to a new grammar while the other adds additional 
rules to the active grammar.

(make-grammar *testGrammar1*) - defines the active grammar to be the expanded 
                          version of *testGrammar1*

(augment-grammar *grammar2*) - adds the rules in *grammar2* to the active 
                          grammar.

There are two functions for accessing the active grammar:

(get-grammar) returns the complete active grammar

(show-grammar) prints the grammar out in a slightly better format.
             This may take a number of optional arguments, in
             which case it prints just the rules whose ID matches
             one of the arguments, e.g., (show-grammar '-s1> '-s2>)
             would print out just those rules with ID -s1> or -s2>.

For the lexicon, a similar set of functions is provided:

(make-lexicon *lexicon1*) - defines the active lexicon to be the expanded 
                          version of *lexicon1*

(augment-lexicon *lexicon2*) - adds the lexical entries in *lexicon2* to the 
                          active lexicon

There are two functions for accessing the active lexicon:

(get-lexicon) returns the complete active lexicon 

(defined-words) returns a list of all the words defined

To examine the lexical entries for a particular word, just call the
parser on that word and then inspect the chart.

RUNNING THE PARSER

The parser is called using the function BU-parse with a list of words as its 
argument, e.g.,

  (BU-parse '(The dog saw the pizza))

SEARCH OPTIONS

You can choose whether the parser should stop when the first complete
interpretation is found or whether it should find all interpretations:

(set-find-all) - find all interpretations (the default)
(set-best-only) - stop when the first complete interpretation is found


TRACING FUNCTIONS

There are two levels of tracing provided:

(traceon) calling this will cause a concise trace of each constituent as it is 
          added to the chart (the default)
(verboseon)  calling this traces each arc extension as well.

(verboseoff) this stops the extended tracing (it is identical
             to calling (traceon))

(traceoff) disables tracing

Individual rules can be traced using the function 

(trace-rule <id>) - traces all activity involving the rule with 
             the indicated id. For example (trace-rule '-5-8-1>) 
             would produce a trace message whenever an arc derived
             from rule -5-8-1> was generated or extended.


DISPLAYING THE CHART

There are several ways to access the chart built from the last parse:

(show-chart) prints out the complete chart, showing all chart entries as
               they were built bottom-up.

(show-best) - prints out a list of the best constituents found,
              where "best" is measured by length of constituent.
              If the parse is successful these are, of course, the
              constituents that span the entire sentence. The function
              takes optional args indicating the start and end position.
              e.g., (show-best 3) prints out the best constituents
              from position 3 to the end of the sentence, and 
              (show-best 2 8) would show the best constituents
              between positions 2 and 8.

(show-answers) like (show-best) except that it prints out  the whole
               parse tree for the best constituents. This function also
               binds the variables in the subconstituents as it
               prints giving a better view of the overall parse tree 
               than the full chart where the constituents are shown
               as they were defined bottom-up. The function also takes
               optional arguments just like (show-best).

EXAMPLE SESSION

To see the system in action, try the following with the variables as defined
above.
(make-grammar *testGrammar1*)
(make-lexicon *lexicon1*)
(BU-parse '(The dog saw the pizza))
(show-chart)
(show-answers)

GAPS

To use the GAP feature, you must enable it using the function
(enableGaps). This will make the parser insert the appropriate gap
feature propagation in all rules that do not have an explicit GAP feature
defined. The value of the gap feature is an embedded constituent. To
indicate that a feature value is a constituent, a special flag "%" is
used. Thus the rule for a simple wh-question might be as follows:

        ((s)
        -5-8-3>
        (np (wh q) (gap -) (agr ?a))
        (head (s (inv +) (gap (% np (agr  ?a))))))

i.e., the value of the gap feature is a constituent of category NP,
with an AGR feature set to the value ?a.
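
The "%" convention can be illustrated with a small helper that tells a
constituent-valued feature apart from an ordinary value. This is a sketch for
illustration; the function name embedded-constituent is an assumption and is
not part of the parser.

```lisp
(defun embedded-constituent (value)
  "If VALUE is flagged with %, return the constituent it holds, else NIL."
  (when (and (consp value)
             (symbolp (first value))
             (string= (symbol-name (first value)) "%"))
    ;; Drop the % flag; what remains is a constituent such as (np (agr ?a)).
    (rest value)))
```

For the rule above, the value of the GAP feature on the head S would yield the
NP constituent, while a plain value such as "-" would yield NIL.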

DEBUGGING GRAMMARS

One nice feature of bottom up algorithms is that they allow for some effective 
debugging strategies. If you find that a sentence that you think should parse 
doesn't, try each of the constituents one at a time and see what the parser 
produces. If you don't get the right analysis, then try additional subparts of 
the sentence until you find the answer. For example, say you try
   (bu-parse '(the angry man ate the pizza))
and it doesn't produce a complete interpretation. You might then try
    (bu-parse '(the angry man))
and see if this produces an appropriate NP analysis. Say it does. Then try
    (bu-parse '(ate the pizza))
and see if this produces the appropriate VP analysis. If it doesn't, then 
there's probably a problem with your VP -> V NP rule, or your lexical entry for 
"ate". If it does produce the right interpretation, then the problem must be 
with your S -> NP VP rule (e.g., maybe the rule is missing, or maybe a feature 
equation is wrong, etc.).
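
When it is not obvious where the constituents of a failing sentence begin and
end, the candidate halves can be enumerated mechanically. The helper below is a
sketch for that purpose; the name split-points is an assumption and the
function is not part of the parser.

```lisp
(defun split-points (words)
  "Return every way to split WORDS into a non-empty prefix and suffix."
  (loop for i from 1 below (length words)
        collect (list (subseq words 0 i) (subseq words i))))
```

For the six-word sentence above this gives five splits, one of which pairs
(the angry man) with (ate the pizza). Each half can then be handed to BU-parse
in turn and the result inspected with (show-best).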


STOCHASTIC PARSING

The existing chart parser can also be used as a stochastic parser.
Details on doing this can be found in the README file in the directory STAT.


NOTES:
Please report all bugs and suggestions for improving the system or the documentation to James Allen at james@cs.rochester.edu.
