STOCHASTIC PARSING

The existing chart parser can also be used as a stochastic parser.
The probabilistic information required is stored in four tables
containing 
    the bigram probabilities
    the context-independent rule probabilities
    the context-dependent rule probabilities
    the lexical generation probabilities
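To make the roles of these tables concrete, here is a small Python sketch with toy values (the table names and numbers are illustrative only, not the contents of the "stats" file). It shows how the bigram and lexical generation tables combine to score one tagging of a sentence, the quantity that the forward algorithm sums over all taggings:

```python
# Toy versions of the four probability tables (illustrative numbers only,
# not the values stored in the "stats" file).

# Bigram probabilities: P(category | previous category)
bigram = {("<s>", "NAME"): 0.4, ("NAME", "V"): 0.8,
          ("V", "ART"): 0.5, ("ART", "N"): 0.9}

# Context-independent rule probabilities: P(rule | left-hand category)
cf_rule = {"S -> NP VP": 0.7, "NP -> NAME": 0.3,
           "NP -> ART N": 0.4, "VP -> V NP": 0.5}

# Context-dependent rule probabilities: P(rule | category and context word)
cs_rule = {("NP -> ART N", "a"): 0.8}

# Lexical generation probabilities: P(word | category)
lexgen = {("jack", "NAME"): 0.05, ("is", "V"): 0.1,
          ("a", "ART"): 0.3, ("man", "N"): 0.02}

def tag_sequence_probability(words, tags):
    """Bigram * lexical-generation score for one tagging of a sentence,
    the quantity the forward algorithm sums over all possible taggings."""
    p = 1.0
    prev = "<s>"
    for w, t in zip(words, tags):
        p *= bigram.get((prev, t), 0.0) * lexgen.get((w, t), 0.0)
        prev = t
    return p

print(tag_sequence_probability(["jack", "is", "a", "man"],
                               ["NAME", "V", "ART", "N"]))
```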

A sample set of tables can be found in the file "stats",
which contains information derived from the examples in Chapter 7.
The function "loadChapter7" loads a lexicon and a grammar that is a
superset of grammar 7.17, together with these statistics. Once the file
is loaded, you can select the parsing mode as follows:

    (use-CF-probabilities) - parse using the context-independent rule probabilities
        and lexical probabilities generated by the forward algorithm (i.e.,
        the parser described in Section 7.5).
    (use-CS-probabilities) - parse using the context-dependent rule probabilities
        as described in Section 7.7.
    (no-probabilities) - parse without probabilistic information (the default mode)

As before, you can also select whether to stop at the first parse found with (set-best-only)
or to find all interpretations with (set-find-all), which is the normal default.
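The difference between the two modes can be sketched with a toy probabilistic chart (CKY-style) parser in Python. The grammar below is invented for illustration and echoes the "Jack is a man" ambiguity used later in this file; it is not the grammar or parser distributed with the code:

```python
from collections import defaultdict
from itertools import product

# Toy PCFG in Chomsky normal form, echoing the "Jack is a man" ambiguity
# ("a" can be an article or a noun). Numbers are illustrative only.
binary = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 1.0,
          ("NP", ("ART", "N")): 0.6, ("NP", ("N", "N")): 0.1}
lexical = {("NP", "jack"): 0.3, ("V", "is"): 1.0, ("ART", "a"): 1.0,
           ("N", "a"): 0.1, ("N", "man"): 0.9}

def cky_parse_probabilities(words):
    """Return the probability of every complete S parse (find-all mode);
    max() of the result corresponds to best-only mode."""
    n = len(words)
    # chart[i][j][cat] = probabilities of parses of words[i:j] as cat
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (cat, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][cat].append(p)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (cat, (b, c)), p in binary.items():
                    for pb, pc in product(chart[i][k][b], chart[k][j][c]):
                        chart[i][j][cat].append(p * pb * pc)
    return chart[0][n]["S"]

probs = cky_parse_probabilities(["jack", "is", "a", "man"])
print(sorted(probs))   # all interpretations, as in (set-find-all)
print(max(probs))      # best parse only, as in (set-best-only)
```

A real best-only parser would prune low-probability constituents as it goes rather than enumerating everything first; this sketch only shows the two kinds of answer.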

TRAINING GRAMMARS

Once you have a grammar and lexicon defined, you can create training
data by parsing sample sentences, selecting the correct interpretation
by hand, and saving it in a file. For example, the sentence
  Jack is a man.
has two readings according to this grammar:

(show-answers)

 THE COMPLETE PARSES FOUND
S33:<S ((INV -) (VFORM PRES) (AGR 3S) (1 NP26) (2 VP32))> from 0 to 4
  NP26:<NP ((AGR 3S) (1 NAME23))> from 0 to 1
    NAME23:<NAME ((LEX JACK) (AGR 3S) (ROOT JACK1))> from 0 to 1
  VP32:<VP ((VFORM PRES) (AGR 3S) (1 V23) (2 NP30))> from 1 to 4
    V23:<V ((LEX IS) (ROOT BE1) (VFORM PRES) (AGR 3S))> from 1 to 2
    NP30:<NP ((AGR 3S) (1 ART23) (2 N25))> from 2 to 4
      ART23:<ART ((LEX A) (AGR 3S) (ROOT A1))> from 2 to 3
      N25:<N ((LEX MAN) (ROOT MAN1) (AGR 3S))> from 3 to 4

S31:<S ((INV -) (VFORM PRES) (AGR 3S) (1 NP26) (2 VP31))> from 0 to 4
  NP26:<NP ((AGR 3S) (1 NAME23))> from 0 to 1
    NAME23:<NAME ((LEX JACK) (AGR 3S) (ROOT JACK1))> from 0 to 1
  VP31:<VP ((VFORM PRES) (AGR 3S) (1 V23) (2 NP31))> from 1 to 4
    V23:<V ((LEX IS) (ROOT BE1) (VFORM PRES) (AGR 3S))> from 1 to 2
    NP31:<NP ((AGR 3S) (1 N24) (2 N25))> from 2 to 4
      N24:<N ((LEX A) (ROOT A1) (AGR 3S))> from 2 to 3
      N25:<N ((LEX MAN) (ROOT MAN1) (AGR 3S))> from 3 to 4

The intended meaning is S33, in which "a" is analyzed as an article.

The function WRITE-TREE will print out the tree in a format suitable
for reading by the training process.

? (write-tree 's33)

(S -7-17-1> ((INV -) (VFORM PRES) (AGR 3S) (1 NP26) (2 VP32))
  (NP -7-17-10> ((AGR 3S) (1 NAME23))
    (NAME JACK1 ((LEX JACK) (AGR 3S) (ROOT JACK1))))
  (VP -7-17-3> ((VFORM PRES) (AGR 3S) (1 V23) (2 NP30))
    (V BE01 ((LEX IS) (ROOT BE1) (VFORM PRES) (AGR 3S)))
    (NP -7-17-8> ((AGR 3S) (1 ART23) (2 N25))
      (ART A1 ((LEX A) (AGR 3S) (ROOT A1)))
      (N MAN1 ((LEX MAN) (ROOT MAN1) (AGR 3S))))))

The file "corpus" contains a set of trees for the grammar provided
for Chapter 7. The trees are in a list assigned to the global variable *corpus*.

The actual probability tables are computed using the function DOCORPUS, which
takes one argument, a list of trees to train on.

e.g., (DOCORPUS *corpus*)
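The core of this kind of training is just counting rule uses in the trees and normalizing by the count of each left-hand-side category. The Python sketch below illustrates that idea on a hand-built tree; the tree encoding, function names, and numbers are all invented for illustration and do not reproduce DOCORPUS itself:

```python
from collections import Counter, defaultdict

# A tree is (category, child, ...); a leaf child is just a word string.
# This one-tree "corpus" mirrors the S33 parse of "Jack is a man".
corpus = [
    ("S",
     ("NP", ("NAME", "jack")),
     ("VP", ("V", "is"),
            ("NP", ("ART", "a"), ("N", "man")))),
]

def count_rules(tree, counts):
    """Recursively count each grammar rule used in a tree."""
    cat, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return  # lexical entry; it belongs in the lexical-generation table
    rhs = tuple(child[0] for child in children)
    counts[(cat, rhs)] += 1
    for child in children:
        count_rules(child, counts)

counts = Counter()
for tree in corpus:
    count_rules(tree, counts)

# Context-independent rule probability: count(rule) / count(LHS category)
lhs_total = defaultdict(int)
for (cat, rhs), c in counts.items():
    lhs_total[cat] += c
rule_prob = {rule: c / lhs_total[rule[0]] for rule, c in counts.items()}
print(rule_prob)
```

With only one tree, NP is expanded twice (NAME, and ART N), so each NP rule gets probability 0.5, while the S and VP rules get probability 1.0.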


(DUMP-STATS) prints out the statistics in a form that can be
used to reset the tables later. To be useful, of course, you would
want to redirect this output to a file.


