Skip to content

Latest commit

 

History

History

cmn

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Mandarin Chinese Grammar 

Authors: Nat Hillard, Dan Flickinger, Chunlei Yang

Date:   13-Oct-09, CSLI, Stanford University
First version under revision control

16-July-11
Completed analysis of full MRS test suite

------------------------------------------------------------------------------
LinGO Grammar Matrix

$Id: README,v 1.13 2008-05-23 01:44:21 sfd Exp $

NB: This version of the Matrix requires LKB version
$Date: 2008-05-23 01:44:21 $ or later.  

Changes since v 0.6:

1. Head types

   We had previously hesitated to posit head types, as we
   expect the exact subhierarchy under the type head to be
   language-specific, even for those head types that are
   found cross-linguistically.  However, this hesitation
   was hampering our ability to develop lexical types for
   the Matrix.  In this version, we have included 10 basic
   head types, as well as types for all possible
   disjunctive combinations for the basic 10.  See further
   documentation near 'head' in matrix.tdl.  Note that the
   disjunctive types are stored in a separate file
   head-types.tdl.

   We still have not posited any head features.  In order
   to add features to one of the head types (disjunctive
   or otherwise) already defined in the Matrix, you'll
   need to use the new 'type addendum' syntax.  (This is
   why a current LKB is required.)  Type addenda can be
   used to add information (not override information)
   associated with previously defined types.  They have :+
   in place of :=.  Type addenda can add parent types,
   constraints, or documentation strings, as long as they
   are consistent with the existing definition.  As with
   subtypes, it is good practice to refrain from stating
   redundant information on type addenda.  The LKB will
   print an error if a parent is declared redundantly.  No
   such checking exists for constraints.
	
   NB: The :+ syntax is not yet supported in the PET
   system.  

2. Lexical types (mostly from v 0.7)

   The Matrix now has a set of underspecified lexical types.
   Lexical entries are classified along multiple dimensions,
   including part-of-speech and subcategorization.  Particular
   grammars can cross-classify these types to create the
   lexical types they need.  We're particularly interested to
   learn about cases where additional types parallel to those
   posited in this part of the Matrix are required.

3. anti-synsem

   The type anti-synsem is still present, but no longer explicitly
   used in the Matrix.  (In previous versions of the Matrix,
   the mother of the head-subj phrase was SUBJ < anti-synsem >
   instead of SUBJ < >.  This and similar constraints are part
   of English-specific analyses.)

4. COMPS and the head-subject phrase

   In response to Ellingsen 2003, we have removed the requirement
   that complements be realized before subjects cross-linguistically.
   We expect this to be true in many, but not all, languages.

5. MC and ROOT

   Again in response to Ellingsen 2003, we have removed many
   of the constraints on the value of MC (`main clause'). 
   This feature is meant to be used for phenomena which are
   restricted to either main (MC +) or subordinate (MC -) 
   clauses.  The particular constraints present in earlier versions
   of the Matrix were part of English-specific analyses.

   The feature ROOT has been removed.  Its intended purpose
   was redundant with the combination of MC and root conditions.

6. Changes to the script

   The script (lkb/script) has been changed somewhat, including:

	-- (index-for-generator) is run at load time,
	since most Matrix grammars are small and since
        it is now relatively easy to create grammars which
        can be used for parsing and generation.  We have 
        found that the generator can be very useful in 
        detecting flaws in a grammar.  For a grammar with
        a large lexicon, this may be too slow.  If so,
        just comment it out in the script.

	-- There is a short expression at the end which
    	can be used to change the default string (or indeed
	list of strings) in the parse dialogue to something
	appropriate for your language.  Uncomment this expression
	and customize it appropriately.  

	-- The default script no longer uses a cache file for
 	the lexicon.  For lexicons with >1000 words, the 
	cache may be a good idea.  See the comments in the
	script file for how to invoke caching.

7. labels.tdl

   With the addition of head types, we are now able to include
   a basic set of node labels.  As noted in labels.tdl, these
   are provided only for the convenience of the grammar developer,
   and do not have any theoretical status.  As such, even though
   they are provided as part of the Matrix distribution, they
   should be customized (edited) without hesitation.

8. Other minor technical changes

   The value of INSTLOC is now string (formerly "instloc") for
   compatibility with recent versions of the LKB.

   The head daughters of head-modifier phrases are no longer
   required to have empty COMPS lists.  Furthermore, additional
   features are matched between modifiers' MOD values and 
   the head daughters' SYNSEMs.

   The feature ARG-S has been moved from the type local
   to the type word-or-lexrule and renamed as ARG-ST to
   differentiate it a bit better from ARGS.

9. Some semantic types (adv-relation, prep-mod-relation, 
   verb-ellipsis-relation, and unspec-compound-relation)
   have been removed, as these types did not add any features
   nor express any constraints.  In their place, existing
   argn relations should be used, with appropriate PRED values.
   adv-relation and verb-ellipsis-relation should be replaced
   with arg1-ev-relation, prep-mod-relation with arg12-ev-relation
   and unspec-compound-relation with arg12-relation.

------------------------------------------------------------------------------

LinGO Grammar Matrix v 0.6, October 15, 2003 (dpf)

This is a minor tuning of version 0.5, including a refinement of the KEYS
attributes, more normalization of predicate names (especially for messages),
some bug fixes in the syntactic rule schemata, and a few additional lexical
types.  Details on these changes will be available in the soon-to-be-released 
Matrix Users' Guide.

For those who have already developed a grammar baaed on the Matrix, the
following changes will have to be made manually in your language-specific
files in order to make them consistent with this version:

1. Changes in feature geometry

   a. SYNSEM.LOCAL.LKEYS ==>
      SYNSEM.LKEYS

      The feature LKEYS has been moved up from LOCAL to SYNSEM, to shorten
      this frequently mentioned path.  (NB: As a related change, the type
      'local-basic' which formerly introduced LKEYS has been deleted.)

   b. LKEYS.--KEYREL ==>
      LKEYS.KEYREL

      LKEYS.--ALTKEYREL ==>
      LKEYS.ALTKEYREL

      These two attributes are the only pointers to relations in the RELS
      list for lexical types, and since they are not shortcuts, the leading
      hyphens have been dropped.  (NB: Since the attributes --COMPKEY and
      --OCOMPKEY are just shortcuts, the leading hyphens for these two names
      remain as a reminder).

   c. CAT.HEAD.KEYS.MESSAGE ==>
      CONT.MSG

      Since the message value of a headed phrase is not always identified 
      with that of its head daughter, it was an error to make the attribute
      MESSAGE a head feature.  This attribute is now moved to CONT, and its
      name shortened for convenience to MSG.


2. Strings and symbols: RULE-NAME
   
   The type 'symbol', which along with 'string' was a subtype of 'atom', has
   been dropped, since the distinction between symbols and strings is not
   useful, and was a source of potential confusion.  So any attributes whose
   values were of type 'symbol' should be changed to be of type 'string',
   and values assigned to these attributes should be converted accordingly.
   In particular, in subtypes of 'rule', the value of the attribute RULE-NAME
   should be changed to be enclosed in double quotes; e.g.
          [ RULE-NAME 'subj-head ] ==>
          [ RULE-NAME "subj-head" ]

3. KEYS attributes

   The attributes KEY and ALTKEY can have as values subsorts of the type
   'predsort' (the same kinds of values allowed for the attribute PREDSORT
   in semantic relations).  These attributes enable a word or phrase to be
   semantically selected by a predicate, and as head features they propagate
   up from the lexical head of the phrase.  For example, a verb can select
   for a prepositional phrase headed by a particular preposition, as long as
   the preposition has lexically assigned a specific value (a subtype of
   predsort) to its SYNSEM.LOCAL.CAT.HEAD.KEYS.KEY attribute, and the verb 
   similarly constrains the KEY value of its PP complement (accessed via the 
   SYNSEM.LKEYS.--COMPKEY or --OCOMPKEY of the verb).  Note that the values
   of KEY and ALTKEY are of the same type as the values of the PRED attribute
   within semantic relations, but it is not always the case that the KEY
   value of a sign is identified with the PRED value of one of the relations
   in its RELS list.  Typically, closed-class lexical entries may identify
   KEY and PRED values, but open-class lexical entries won't, since their
   KEY value will be some underspecified subtype of 'predsort' (e.g.
   'noun_rel' or 'verb_rel')

4. Relations and messages

   The revisions introduced in version 0.5 for improved MRSs have led to a
   potential confusion in naming of relations and their PRED values, so we
   introduce a simple naming convention where all subtypes of the type
   'relation' bear the suffix "-relation" as part of their name, and all
   values of the attribute PRED (subsorts of 'predsort') within relations 
   bear the suffix "_rel" as part of their name.

   In keeping with the reduction to a small number of subtypes of 'relation',
   the value of the attribute MSG is now always the relation subtype 'message'
   with appropriate values in the PRED attribute of the 'message' relation,
   drawing from subtypes of the type 'predsort'.  For example, in the type
   'imperative-clause', the following change has been made: 

   imperative-clause := clause &
     [ SYNSEM.LOCAL.CAT.HEAD.KEYS.MESSAGE command ].    ==>

   imperative-clause := clause &
     [ SYNSEM.LOCAL.CONT.MSG.PRED command_m_rel ].

5. Rules

   This version incorporates several corrections and improvements to the
   definitions of lexical and syntactic rules proposed by colleagues working 
   on the Japanese and Norwegian grammars, as follows:

   a. In the definition of 'lex-rule', the order of appending of the RELS 
      lists has been reversed, for convenience.
   b. The type 'basic-head-subj-phrase' no longer inherits from the type
      'head-compositional' - this was an error preventing coherent MRSs.
   c. The type 'basic-extracted-comp-phrase' no longer identifies the LEX value
      of mother and daughter - this too was an error making the rule unusable.
   d. The type 'basic-head-mod-phrase-simple' no longer identifies the value
      of HOOK on mother and nonhead daughter, since this is no longer uniform
      for scopal and intersective modifiers.  Instead this identification is
      done in the type 'scopal-mod-phrase'; in contrast, the type
      'isect-mod-phrase' now inherits from 'head-compositional', identifying
      the HOOK values of mother and head daughter.
   e. In a related change, the type 'extracted-adj-phrase' is now restricted
      to extracting intersective modifiers, so that the value of HOOK can be
      correctly constrained.

6. Lexeme types

   In order to capture the usual configuration of semantic constraints for
   open-class lexical entries, the types 'lex-item', 'norm-lex-item', and
   'lexeme' have been added.  Some closed-class lexical entries, like those
   for determiners in English, do not conform to the constraints in 
   'norm-lex-item', but most lexical entries will.  We further add the
   constraint that the outputs of lexeme-to-lexeme rules will conform to the 
   constraints in 'norm-lex-item'.  We look forward to feedback, as always.
  
------------------------------------------------------------------------------
Grammar matrix v 0.5, August 15, 2003 (dpf)

This is an upgrade of version 0.4 of the grammar matrix, with some
further normalization of relation names and MRS feature geometry to be
consistent with the Copestake et al.  paper, "Introduction to MRS",
being readied for publication.

If you have already developed a grammar based on the matrix, you will
need to make at least one set of manual adjustments to your
language-specific grammar files, since the location of the KEYS
attribute has changed, and the constraints on its attributes have also
changed.  The KEYS attributes had been used in matrix-derived grammars
for two distinct purposes, first to simplify the notation when defining
lexical types, and second to express constraints on semantic selection
within phrases.  The first usage was a convenient shorthand notation
which is irrelevant to phrasal signs, while the second is crucial in
constraining phrases.  These two notions are now distinct in the
matrix, with the attribute LKEYS now containing these 'shorthand'
attributes convenient for defining lexical types, and the attribute
KEYS now made a HEAD feature.  The attributes in KEYS are also more
strictly constrained, with KEY and ALTKEY no longer taking whole
relations as values, but only semantic sorts (see the User Guide for
elaboration).  Likewise, the MESSAGE attribute now simply takes a
'message' type (or the distinguished type 'no-msg') as its value,
rather than a difference list.

Obligatory changes to make to language-specific grammar files:

(1) Where your grammar used the KEY and ALTKEY attributes to constrain
    the properties of a selected constituent (complement, specifier, 
    subject, or modifier), change these values of KEYS.KEY and 
    KEYS.ALTKEY to be subtypes of the type 'semsort'.  See the User 
    Guide for elaboration.
(2) Change these paths for SYNSEM.LOCAL.KEYS.KEY and ...ALTKEY to be
    SYNSEM.LOCAL.CAT.HEAD.KEYS.KEY and ...ALTKEY
(3) Where your grammar used the KEY and ALTKEY attributes to constrain
    the value of a lexical type's own semantic relations, change these
    paths for SYNSEM.LOCAL.KEYS.KEY and ...ALTKEY to be
    SYNSEM.LOCAL.LKEYS.--KEYREL and ...--ALTKEYREL
(4) Change the value of KEYS.MESSAGE by removing the diff-list brackets.
(5) Change the paths SYNSEM.LOCAL.KEYS.MESSAGE to
    SYNSEM.LOCAL.CAT.HEAD.KEYS.MESSAGE
(6) Change the values for --COMPKEY and --OCOMPKEY to be the semantic
    sort of the relevant complement, rather than the type of a relation
    (again, see the User Guide for elaboration of semantic sorts).
(7) Change the paths SYNSEM.LOCAL.KEYS.--COMPKEY and ...--OCOMPKEY to
    SYNSEM.LOCAL.LKEYS.--COMPKEY and ...--OCOMPKEY

In addition, you may need to make further adjustments, depending on
whether you have made explicit reference to the affected features or
types, which have been changed as follows:

(a) Deleted feature

    The feature E-INDEX was introduced into the matrix for v 0.4,based 
    on its use in the ERG at the time for treating the semantics of 
    predicative PPs and gerunds.  However, improved analysis of English 
    has removed the current motivation for this attribute in HOOK, so 
    it has been deleted from the matrix in order to be consistent with 
    the emerging MRS documentation.

(b) Renaming of type 'mrs-thing', and changes to its subtypes

    The name of the supertype of 'individual' and 'handle' has been 
    renamed from 'mrs-thing' to 'semarg' (for 'semantic argument').  
    Also, one of its subtypes 'non-expl' has been deleted, since it was 
    confusingly redundant with the type 'event-or-ref-index'.  
    Corresponding adjustments have been made to the type hierarchy under 
    'semarg', though the leaf types remain the same.

(c) Renaming of other relations

    To support a more consistent naming convention for relations, any 
    relation or predicate whose name formerly ended in "-rel" now has a 
    name which is like the previous one except that the hyphen ("-") is 
    always replaced with an underscore ("_").  An explanation of the 
    naming conventions can be found in the Matrix User Guide.

------------------------------------------------------------------------------
Grammar matrix v 0.4, March 10, 2003

This is a minor upgrade of the first version of the grammar matrix
(v 0.3), designed to standardize the feature geometry and naming
conventions for MRS feature structures, and to enable stronger
principles of semantic composition, as presented in Copestake,
Lascarides, and Flickinger (2001).

If you have already developed a grammar based on the matrix, you will
need to make the following manual adjustments to your language-specific
grammar files:

(1) Renamed features 
    Summary: Naming conventions now made consistent with soon-to-be-published
    standard reference on MRS.
    Recommended procedure: Do a global replace for each of the following in
    all of your *.tdl files:
    LISZT    -->  RELS
    H-CONS   -->  HCONS
    TOP      -->  LTOP
    HNDL     -->  LBL
    SC-ARG   -->  HARG
    OUTSCPD  -->  LARG
    SOA      -->  MARG
    RESTR    -->  RSTR
    BV       -->  ARG0
    EVENT    -->  ARG0
    INST     -->  ARG0
    LABEL    -->  WLINK

(2) Introduction of HOOK attribute
    Summary: The externally visible attributes of an MRS are now grouped
    within a single attribute called HOOK, which is consistently used in
    constructions to identify the properties of the semantic head daughter
    with those of the phrase.  The features in HOOK include the familiar
    LTOP (formerly TOP), INDEX, and E-INDEX, as well as a new feature XARG
    which is unified with the semantic index of the controlled argument of
    a phrase (to simplify the definition of e.g. equi and raising types)
    Recommended procedure: In each of your *.tdl files, search for each 
    occurrence of the three features LTOP, INDEX, and E-INDEX, and insert 
    HOOK into the path preceding each feature.  In some cases, you will see 
    that you can simplify the re-entrancies in your feature structures by 
    referring to HOOK instead of individually referring to each of the three
    attributes separately.  In addition, consider revising your lexical types 
    for equi and raising predicates to make use of the new XARG feature, 
    which should enable you to avoid reference to arguments of arguments.
    
(3) Naming of argument roles (ARG1, ARG2, ARG3, ARG4)
    Summary: Each relation now assigns its first (least oblique) argument
    to ARG1, its next argument to ARG2, and so on.   The major change from
    the first version of the matrix is to assign objects of transitive verbs
    to ARG2 rather than ARG3, and similarly for objects of prepositions.
    Recommended procedure: In each of your *.tdl files, search for ARG3, and 
    consider replacing it with ARG2.  Check all other role name assignments 
    to ensure that role names are assigned consistently.

(4) Basic relation types
    Summary: The inventory of basic relation types has been simplified.
    Recommended procedure: Review the subtypes that your grammar defined for
    the original basic relation types, and revise them to employ the new
    relation types, consistent with the changes made in step (3) above.  Note
    that a basic relation type has been added for quantifiers: quant-rel.

(5) Deleted features (--TOPKEY)
    Summary: Some semantics-related features proved to be unnecessary
    Recommended procedure: None required, unless your grammar makes use of
    the feature --TOPKEY, in which case you may choose to introduce this
    feature as part of your language-specific inventory of features.


------------------------------------------------------------------------------
[Original notes for v 0.3]

This is an extremely preliminary first cut at the grammar
matrix.  It has not been tested except by being loaded into
the LKB.  It contains the following:

-- basic types which define the feature geometry
-- types for MRS semantics
-- underspecified supertypes of lexical rules
-- underspecified supertypes of phrase structure rules

Of these, the last were the most hastily thrown together.
They are basically taken from the syntax.tdl file of the LinGO
English grammar, and then simplified by removing constraints
that are either likely to be specific to English or are
related to the LinGO analysis of coordination.

These phrase structure rule types are of necessity underconstrained
and merely instantiating them will surely lead to a grammar
with gross overgeneration.  Thus, it is expected that they
will be either augmented directly, or via subtypes that fill
in some of the missing constraints.  One clear example of this
is the lack of constraints on the HEAD values in phrase structure
rules.  Since it's not clear what the appropriate 'universal'
head type hierarchy will/could be, I've refrained from even 
defining types like 'verbal'...

Similarly, certain parts of the type hierarchy might need to be
modified.  In an ideal world, the matrix type hierarchy would only
need to be extended at the bottom for individual grammars.  However,
it is not clear that this is possible or desirable even in principle,
and it is certainly not the case for this preliminary first version!
(Defaults may help here...)

The single biggest gap in the matrix is the utter lack of
lexical types.  I hope that it can be useful even with this
huge lacuna.  Since the matrix holds so closely to the LinGO
grammar, the lexical types of the LinGO grammar should be 
used as models for creating lexical types.  Note in particular
that the rules assume lexical threading of NON-LOCAL features.
Beware that some feature names (notably HNDL) and many type names
differ between the matrix and the LinGO grammar, even when they
are logically and mnemonically related.

Future versions of the matrix should include further documentation
as well as more types (especially lexical types).  Revisions to
the types included in this version should also be anticipated, 
since it seems extremely unlikely that this first guess as to
what's universally useful will turn out to be entirely correct.

Stephan Oepen has kindly cleaned up the collateral .lsp files
included in this distribution of the matrix.  Take a look at
lkb/script for information on how various files (including .tdl
files) are included, and which aspects of the grammar should
be encoded in which .tdl files.

April 5, 2002 (erb)

Added improvements to supertypes for lexical rules.  "Derivational"
lexical rules are now lexeme-to-lexeme, rather than word-to-word
(a hold-over from PAGE).  Lexeme-to-lexeme rules can be spelling
changing, and apply _inside_ lexeme-to-word, or inflectional rules,
as expected.

June 18, 2002 (oe)

Fix generator support (by adding a suitable `mrsglobals.lsp'); more
cleaning up of `script' and related collateral files.