Homework #3
One interesting application of statistics is identifying the authors of historical works that were published under pseudonyms. A classic paper on this topic is “On Sentence-Length as a Statistical Characteristic of Style in Prose: With Application to Two Cases of Disputed Authorship” by G. Udny Yule, published in the journal Biometrika (Vol. 30, No. 3/4, Jan. 1939, pp. 363–390). In that paper, Yule identified sentence length as a feature that tends to remain consistent across written works by the same author.
For this project, you are going to figure out how to estimate the distribution of sentence lengths for a book of your choosing.
Find a book that is mainly text and relatively uncluttered with pictures, graphs, etc.
Choose a random sample of 20 sentences from throughout the book and count how many words they have in them.
Use two different sampling methods, chosen from simple random sampling, stratified sampling, cluster sampling, and systematic sampling.
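If you would like to see how the four sampling methods differ in practice, here is a minimal Python sketch. It assumes you have already typed the book's sentences into lists grouped by chapter, which the assignment does not require; picking sentences by hand from a printed book works just as well. The function names, the choice of chapters as strata and clusters, and the equal split across chapters in the stratified sample are all illustrative assumptions, not part of the assignment.

```python
import random


def word_count(sentence):
    """Count the words in a sentence by splitting on whitespace."""
    return len(sentence.split())


def simple_random_sample(sentences, n, rng):
    """Every sentence in the book has the same chance of being chosen."""
    return rng.sample(sentences, n)


def systematic_sample(sentences, n, rng):
    """Pick a random starting point, then take every k-th sentence."""
    k = len(sentences) // n          # step size between chosen sentences
    start = rng.randrange(k)         # random start within the first step
    return [sentences[start + i * k] for i in range(n)]


def stratified_sample(chapters, n, rng):
    """Treat each chapter as a stratum and draw the same number from each.

    Assumes n divides evenly by the number of chapters (equal allocation).
    """
    per_chapter = n // len(chapters)
    chosen = []
    for chapter in chapters:
        chosen.extend(rng.sample(chapter, per_chapter))
    return chosen


def cluster_sample(chapters, n_clusters, rng):
    """Pick whole chapters at random and keep every sentence in them."""
    picked = rng.sample(chapters, n_clusters)
    return [sentence for chapter in picked for sentence in chapter]


# Made-up stand-in for a real book: 4 chapters of 25 placeholder sentences.
toy_book = [
    [" ".join(["word"] * ((c + i) % 7 + 2)) for i in range(25)]
    for c in range(4)
]
all_sentences = [s for chapter in toy_book for s in chapter]

rng = random.Random()  # pass a seed, e.g. random.Random(1), to repeat a draw
sample = systematic_sample(all_sentences, 20, rng)
lengths = [word_count(s) for s in sample]
```

Note that the cluster sample's size depends on how many sentences the chosen clusters contain, so to get about 20 sentences you would need small clusters, such as pages or paragraphs, rather than whole chapters.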
For each of your two sampling methods, make a table showing how many sentences there were of each length (two words, three words, etc.) and answer the following questions: