From: Christopher Potts
Date: July 28, 2011 22:13:49 MDT
To: Class list
Subject: Computational Pragmatics: updates based on your hypotheses

Computational Pragmaticists!

In the final part of class on Tuesday, you all hypothesized a lot of interesting relationships between various pieces of information in the Penn Discourse Treebank. The ideas were extremely valuable, both for getting to know the corpus and for the experiments I want to discuss tomorrow. I'm writing to let you know about updates to the codebase and PDTB write-up based on this discussion. (I'll also review this new material at the start of class tomorrow.)

It would be helpful if you downloaded the following files before class. If you downloaded them before, please do so again, so that you get the updates:

http://compprag.christopherpotts.net/code-data/pdtb.py
http://compprag.christopherpotts.net/code-data/pdtb_functions.py

Overview of the updates:

1. Rajesh's comments made it clear that it is valuable to have general methods for identifying the linear relationship between Arg1 and Arg2. I've now extended pdtb.py with a full set of methods for this:
http://compprag.christopherpotts.net/pdtb.html#relarg

2. The discussion helped me see that we need a general method for exploring relationships between annotations. I've updated pdtb_functions.py with a function contingencies() that should facilitate such exploration. The basic idea is that you give this function two functions on datum objects, and it calculates observed/expected values for the relationship between those two variables (the function values).
Here's a description with implementations of four of the suggestions people made in class:
http://compprag.christopherpotts.net/pdtb.html#contingencies

The examples:

* Relate Relation to the primary semantic class
* Relate Relation to Arg order
* Relate Arg order to primary semantic class
* Relate Explicit connectives by precedence argument order
* Relate negation (im)balance to primary semantic class

Any additional suggestions? I think we should warm up tomorrow by running some new comparisons.

---Chris
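
P.S. In case it helps to see the observed/expected idea from item 2 concretely, here is a minimal sketch. It is not the code in pdtb_functions.py: the function name observed_expected, the dictionary-based toy data, and the choice to compute expected counts under independence (row total times column total, divided by the grand total) are all assumptions made for illustration.

```python
from collections import Counter


def observed_expected(data, row_func, col_func):
    """Cross-tabulate row_func(d) against col_func(d) for every d in data.

    Hypothetical helper, not the course's own function. Returns a dict
    mapping (row_value, col_value) to (observed, expected), where expected
    is the count predicted if the two variables were independent:
    row_total * col_total / grand_total.
    """
    observed = Counter((row_func(d), col_func(d)) for d in data)
    row_totals = Counter()
    col_totals = Counter()
    for (r, c), n in observed.items():
        row_totals[r] += n
        col_totals[c] += n
    total = sum(observed.values())
    # Include unattested cells too; a Counter returns 0 for missing keys.
    return {
        (r, c): (observed[(r, c)], row_totals[r] * col_totals[c] / total)
        for r in row_totals
        for c in col_totals
    }


# Toy stand-ins for datum objects (invented values, not real corpus data).
toy = [
    {"relation": "Explicit", "sem": "Comparison"},
    {"relation": "Explicit", "sem": "Comparison"},
    {"relation": "Explicit", "sem": "Expansion"},
    {"relation": "Implicit", "sem": "Expansion"},
]

table = observed_expected(
    toy,
    lambda d: d["relation"],  # first variable: a function on a datum
    lambda d: d["sem"],       # second variable: another function on a datum
)
for key in sorted(table):
    obs, exp = table[key]
    print(key, obs, exp)
```

Cells where the observed count is well above or below the expected count are the ones worth a closer look. With the real corpus, the two lambdas would be replaced by whatever functions on datum objects you want to compare.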