Readme file for RCD.pl
14 Oct 2004
Adam Albright (albright@mit.edu)

RCD.pl is a simple implementation of Tesar & Smolensky's (1996) Recursive Constraint Demotion (RCD) algorithm.  It makes use of the Comparative Tableau notation (Prince 2000, 2002).

It takes input files of tableaus, in the same format as Bruce Hayes' OTSoft tableaus.
Input files should be in tab-delimited text format.
It may be easiest to edit files in Excel, and save them as tab-delimited text.
The format can be inferred by reading the program, but for reference:

Lines 1 and 2: constraint names
Both start with three empty fields (three tabs)
Line 1 = "full" constraint names,
Line 2 = "abbreviated" constraint names (used to print tableaus, etc., where space might be an issue)
For the purposes of RCD.pl, there is no real reason to make the names different (though I do so in TesarSmolensky6.txt, just to provide both Tesar & Smolensky's constraint names and more intuitive names).


The remainder of the file is tableaus of input forms, candidates, and constraint violations.

Column 1: inputs (URs)
Column 2: candidates
Column 3: 1 = winner, blank (or 0) = loser
Columns 4-n: constraint violations (blank = 0; otherwise give the number of violations)

All candidates for a particular input must be listed together (that is, you cannot list some candidates for a particular UR, then candidates for a different UR, then come back to the first UR).
It does not matter whether the winner comes first, last, or somewhere in the middle.
The UR is listed only on the first line of each "tableau"; the first field is blank for subsequent candidates.

This is, perhaps, clearest from an example:

			*si	*s	*sh	F(ant)
			*si	*s	*sh	F(ant)
sa	sa	1		1		
	sha				1	1
sha	sha	1			1	
	sa			1		1
si	si		1	1		
	shi	1			1	1
shi	shi	1			1	
	si			1		1
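The format above can also be made concrete with a short parsing sketch.  This is written in Python purely for illustration; it is not the Perl code in RCD.pl, and the function name read_tableaus and the dictionary keys are hypothetical.  It assumes the first two non-blank lines are the constraint-name lines and that every UR appears only on the first row of its tableau.

```python
def read_tableaus(text):
    """Parse OTSoft-style tab-delimited text (illustrative sketch only)."""
    lines = [line for line in text.splitlines() if line.strip()]
    full_names = lines[0].split("\t")[3:]    # line 1: "full" names
    short_names = lines[1].split("\t")[3:]   # line 2: "abbreviated" names
    n = len(full_names)
    tableaus = []
    for line in lines[2:]:
        fields = line.split("\t")
        fields += [""] * (3 + n - len(fields))   # pad if trailing tabs were trimmed
        ur, form, win = (f.strip() for f in fields[:3])
        if ur:                                   # a non-blank column 1 starts a new tableau
            tableaus.append({"input": ur, "candidates": []})
        elif not tableaus:
            raise ValueError("first candidate row has no input (UR)")
        violations = [int(f) if f.strip() else 0 for f in fields[3:3 + n]]
        tableaus[-1]["candidates"].append(
            {"form": form, "winner": win == "1", "violations": violations})
    return full_names, short_names, tableaus


# Two of the tableaus from the example, with tabs written out as \t.
example = "\n".join([
    "\t\t\t*si\t*s\t*sh\tF(ant)",
    "\t\t\t*si\t*s\t*sh\tF(ant)",
    "sa\tsa\t1\t\t1\t\t",
    "\tsha\t\t\t\t1\t1",
    "si\tsi\t\t1\t1\t\t",
    "\tshi\t1\t\t\t1\t1",
])
full_names, short_names, tableaus = read_tableaus(example)
```

Blank violation fields are read as 0, and a row counts as the winner only when column 3 is exactly 1, following the column descriptions above.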


The program itself has a fairly straightforward structure.  It starts by asking for a filename (so you do not need to hard-code it and change the program each time you want to use a different input file).
(In Perl, <STDIN> reads a line from the terminal, just like reading a line from a file.)
It then reads in the input file, storing the data in arrays.
(The violations are stored in a three-dimensional array: $violations[$input][$candidate][$constraint].)
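To picture the three-dimensional layout, here is a small Python analogue using nested lists.  It only illustrates the indexing scheme (input, then candidate, then constraint); the variable names are assumptions, and the values are taken from the /sa/ and /si/ tableaus in the example above.

```python
constraints = ["*si", "*s", "*sh", "F(ant)"]
inputs = ["sa", "si"]
candidates = [["sa", "sha"], ["si", "shi"]]   # candidates[i] belongs to inputs[i]

# violations[i][c][k] = violations of constraint k by candidate c of input i
violations = [
    [[0, 1, 0, 0], [0, 0, 1, 1]],   # /sa/: sa, sha
    [[1, 1, 0, 0], [0, 0, 1, 1]],   # /si/: si, shi
]

k = constraints.index("*sh")
shi_sh = violations[1][1][k]        # violations of *sh by candidate shi for /si/
```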

It then turns the tableaus into mark-data pairs by comparing each losing candidate against the winner: for each constraint, it subtracts the violation counts to see which form the constraint prefers (if any), and marks the constraint W if it prefers the winner or L if it prefers the loser.
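The comparison step can be sketched as follows.  This is a Python illustration under stated assumptions, not the script's own code: the function names compare and mark_data_pairs are hypothetical, each candidate is given as a (form, is_winner, violations) tuple, and each input is assumed to have exactly one winner.

```python
def compare(winner, loser):
    """Return 'W', 'L', or '' for each constraint of one winner~loser pair."""
    marks = []
    for w, l in zip(winner, loser):
        diff = l - w                  # positive: the loser has more violations
        marks.append("W" if diff > 0 else "L" if diff < 0 else "")
    return marks


def mark_data_pairs(candidates):
    """Pair the single winner of one tableau with each of its losers."""
    winners = [c for c in candidates if c[1]]
    if len(winners) != 1:
        raise ValueError("expected exactly one winner per input")
    win_form, _, win_viols = winners[0]
    return [(win_form, form, compare(win_viols, viols))
            for form, is_winner, viols in candidates if not is_winner]


# The /si/ tableau from the example: shi wins over si.
si_tableau = [("si", False, [1, 1, 0, 0]), ("shi", True, [0, 0, 1, 1])]
pairs = mark_data_pairs(si_tableau)
```

For this tableau, *si and *s have more violations in the loser and so are marked W, while *sh and F(ant) have more violations in the winner and are marked L.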

It then ranks the constraints recursively (in a subroutine).
Ranking procedure:
	Loop through all constraints that have not been placed in a higher stratum
		For each constraint, check all still-unexplained mark-data pairs
		If the constraint ever favors losers, check whether there's a W in a higher stratum
		If not, demote to the next lower stratum
	Once all constraints have been checked, go through active mark-data pairs and see which are now explained
	Run RCD on remaining constraints and mark-data pairs (recursion)

If one application of RCD (one attempt to build a stratum) does not explain any new mark-data pairs, no further progress is possible, and the program gives up.
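For comparison, here is a minimal recursive sketch of RCD in Python.  It follows the common textbook formulation (put into the current stratum every remaining constraint that prefers no loser among the still-unexplained pairs), which may differ in its bookkeeping from the procedure in RCD.pl.  The function name rcd and the dictionary representation of mark-data pairs are assumptions made for the example.

```python
def rcd(constraints, pairs):
    """Return a list of strata (highest first), or None if ranking fails.

    pairs: list of dicts mapping constraint name -> 'W', 'L', or ''.
    """
    if not constraints:
        # Nothing left to rank: succeed only if every pair is explained.
        return [] if not pairs else None
    # Constraints that prefer no loser among the remaining pairs.
    stratum = [c for c in constraints
               if not any(p.get(c) == "L" for p in pairs)]
    if not stratum:
        return None                   # every remaining constraint prefers some loser
    rest_constraints = [c for c in constraints if c not in stratum]
    # A pair is explained once a constraint in this stratum prefers its winner.
    rest_pairs = [p for p in pairs
                  if not any(p.get(c) == "W" for c in stratum)]
    lower = rcd(rest_constraints, rest_pairs)
    return None if lower is None else [stratum] + lower


names = ["*si", "*s", "*sh", "F(ant)"]
example_pairs = [
    {"*s": "L", "*sh": "W", "F(ant)": "W"},               # /sa/:  sa ~ sha
    {"*s": "W", "*sh": "L", "F(ant)": "W"},               # /sha/: sha ~ sa
    {"*si": "W", "*s": "W", "*sh": "L", "F(ant)": "L"},   # /si/:  shi ~ si
]
strata = rcd(names, example_pairs)
# strata == [["*si"], ["F(ant)"], ["*s", "*sh"]]
```

On these three pairs from the example tableaus, the sketch ranks *si above F(ant), and F(ant) above *s and *sh.  It returns None for contradictory data such as one pair with A = W, B = L and another with A = L, B = W.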

Once a ranking has been found, the program prints out the final strata and quits.
(A better program would print out tableaus of how the data is correctly derived with the ranking that has been discovered, and have the ability to run the grammar on test forms that aren't part of the data.  I did not spend time implementing this at the moment, because OTSoft does it already.)


******PLEASE NOTE: this implementation was created for class demonstration purposes, and as such, it has been tested primarily on small, simple files.  Although its implementation of the RCD algorithm should be sound, it has not been tested extensively, and comes with no guarantees.  If you are interested in using this script to carry out phonological analysis (as opposed to constructing class demos), it would be wise to confirm its results with a more thoroughly debugged program, such as OTSoft (by Hayes, Zuraw and Tesar).