Document Distance: Data Sets
Here are nine sample text files, mostly from Project Gutenberg, for use as input to the document distance problem:
- t1.verne.txt: Verne's In the Year 2889 (TXT)
- t2.bobsey.txt: Hope's The Bobbsey Twins on Blueberry Island (TXT)
- t3.lewis.txt: Lewis and Clark's History of the Expedition under the Command of Captains Lewis and Clark (Vol. I) (TXT - 1MB)
- t4.arabian.txt: Anon's The Arabian Nights Entertainments Complete (TXT - 2.9MB)
- t5.churchill.txt: Churchill's The Complete Works of Winston Churchill (TXT - 9.1MB)
- t6.onemillion.txt: List of one million integers (from 000000 to 999999) (TXT - 7.6MB)
- t7.tenmillion.txt: List of ten million integers (from 0000000 to 9999999) (TXT - 86MB)
- t8.shakespeare.txt: The Complete Works of William Shakespeare (TXT - 5.3MB)
- t9.bacon.txt: Essays by Francis Bacon (TXT)
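The two integer lists above (t6 and t7) are synthetic, so a similar file can be regenerated locally rather than downloaded. The following is a minimal sketch, not the script used to produce the originals: the function name `write_integer_list` is hypothetical, and the one-integer-per-line layout is an assumption, since the listing only states the range and the zero padding.

```python
import io


def write_integer_list(out, count, width):
    """Write the integers 0 .. count-1 to `out`, zero-padded to `width` digits.

    Assumption: one integer per line. The original files may use a
    different separator or line ending.
    """
    for i in range(count):
        out.write(f"{i:0{width}d}\n")


# Small in-memory demonstration with three-digit numbers (000 to 999).
buf = io.StringIO()
write_integer_list(buf, 1000, 3)
sample = buf.getvalue().splitlines()
print(sample[0], sample[-1], len(sample))
```

To approximate t6.onemillion.txt, the same function could be called with an open file, `count=10**6`, and `width=6`; for t7.tenmillion.txt, `count=10**7` and `width=7`. The resulting sizes may differ from those listed if the originals use a different line ending.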