Analysis and raw data for Safi CT symposium paper (Arxiv.org:1204.2043)

mmadsen/SAA2012

README: Analysis for Mark E. Madsen's 2012 SAA Paper

Abstract

Improving the Fit Between Model and Data: Extreme Value Analysis of Unbiased Transmission

Increasingly, neutral models form the basis for explanatory models in archaeology. Many of these models assume that class frequencies represent a synchronic sample. The assumption, however, is rarely justifiable since archaeological deposits are time-transgressive, and the result of time-averaging changes the distribution of observables. Class richness, for example, is an additive “maximum” value for accretional assemblages. Class frequencies, therefore, should be modeled using extreme value distributions of transmission processes, not central limit behavior. This change points to improvements that can be made to numerical methods that evaluate frequency expectations due to random copying.

Description

This repository contains the data files and R analysis scripts used to create graphs and results appearing in my SAA symposium paper, and in the draft paper posted to Arxiv.org (http://arxiv.org/abs/1204.2043).

The data files are in the rawdata directory, in a gzipped tar file (for maximum compression). Inside there are data files combining all samples for Slatkin tests, Kn measurements, T_f or IQV values, and trait lifetimes.
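
As a minimal sketch of unpacking the archive from within R: the archive's exact file name is not given here, so the example searches `rawdata` by extension rather than assuming a name. The helper names `find_archives` and `extract_archive` are hypothetical; `untar()` is part of base R's `utils` package.

```r
# Find gzipped tar archives in a directory. The archive name is not stated
# in this README, so we match on the usual extensions instead of guessing.
find_archives <- function(dir = "rawdata") {
  list.files(dir, pattern = "\\.(tar\\.gz|tgz)$", full.names = TRUE)
}

# Extract one archive (by default next to itself) and return the paths
# of the members it contained.
extract_archive <- function(archive, exdir = dirname(archive)) {
  contents <- untar(archive, list = TRUE)  # list members without extracting
  untar(archive, exdir = exdir)            # extract into exdir
  file.path(exdir, contents)
}

# Unpack whatever archives are present; does nothing if none are found.
for (archive in find_archives()) {
  print(extract_archive(archive))
}
```

Run this from the repository root so that the relative path `rawdata` resolves.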

These are not the raw TransmissionFramework output, which occupies many gigabytes of raw text files. The data files given here were produced by running the analysis scripts in the TransmissionFramework "scripts" directory against the output directories from the simulation runs.

The R analysis scripts may require packages that are not part of a base R installation, such as Hadley Wickham's superb and indispensable plyr, reshape, and ggplot2. In all cases, however, the packages were installable from the R console (or from within an R IDE; I strongly recommend RStudio) without a separate download.
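
A quick way to see which packages are missing before running a script is sketched below. The package list covers only those named in this README, so individual scripts may load others; `missing_packages` is a hypothetical helper name.

```r
# Packages named in this README; individual scripts may load others.
needed <- c("plyr", "reshape", "ggplot2", "xtable")

# Return the subset of packages that cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  # Print the command rather than running it, so nothing is installed silently.
  message("Install with: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}
```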

Each R analysis script operates separately, bringing in the appropriate raw data file, reshaping/renaming the data frame as needed, performing analyses, and constructing graphs and tables.
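
The general shape of such a script can be sketched as follows. This is an illustration only: the toy data frame stands in for one of the combined data files, every column name is hypothetical, and base R's `aggregate()` is used in place of the plyr and ggplot2 calls the actual scripts rely on, so that the sketch runs without extra packages.

```r
# Toy stand-in for one combined data file; all column names are hypothetical.
samples <- data.frame(
  popsize  = c(100, 100, 200, 200),
  mutation = c(0.01, 0.01, 0.01, 0.01),
  kn       = c(4, 6, 7, 9)
)

# Step 1: rename columns to something readable for plots and tables.
names(samples) <- c("population_size", "innovation_rate", "richness")

# Step 2: summarise richness within each parameter combination.
summary_table <- aggregate(
  richness ~ population_size + innovation_rate,
  data = samples,
  FUN  = mean
)

# Step 3: inspect the result; a real script would go on to plot and tabulate it.
print(summary_table)
```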

Any table in the paper draft that uses R output is generally autogenerated from within R itself, for replicability. Thanks to the superb xtable library, very few formatting tweaks were necessary.
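
For readers unfamiliar with xtable, a minimal sketch of generating a table from a data frame follows. The data frame, its column names, and the caption are toy values for illustration, not results from the paper. By default `print()` on an xtable object emits LaTeX.

```r
# Toy table; the column names and values are illustrative only.
results <- data.frame(sample_size = c(10, 100), classes = c(3, 5))

# Guarded so the sketch still runs where xtable is not installed.
if (requireNamespace("xtable", quietly = TRUE)) {
  tab <- xtable::xtable(results, caption = "Illustrative summary table")
  # Emit LaTeX without the row-name column.
  print(tab, include.rownames = FALSE)
}
```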

Mathematica Files

The mathematica directory contains several notebooks which record various analyses of the neutral unbiased model. Many of the results are not directly used in this paper, but one of the notebooks has an analysis of mean trait lifetime and expected K_n from equations given by Ewens 2004.
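
For readers without Mathematica, the expected value of K_n can also be checked numerically in R. The sketch below assumes the standard infinite-alleles result, E[K_n] = sum over i = 0..n-1 of theta / (theta + i), where theta is the scaled innovation rate. Whether the notebook uses exactly this form is an assumption, and `expected_kn` is a hypothetical name.

```r
# Expected number of distinct variants in a sample of size n under the
# neutral infinite-alleles model:
#   E[K_n] = sum_{i = 0}^{n - 1} theta / (theta + i)
expected_kn <- function(n, theta) {
  stopifnot(n >= 1, theta > 0)
  sum(theta / (theta + 0:(n - 1)))
}

# With theta = 1 this is the n-th harmonic number (about 5.187 for n = 100).
expected_kn(n = 100, theta = 1)
```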

Much of the probability distribution analysis requires a Mathematica add-on called MathStatica. I used Version 2.5 Gold, with a modified bugfix supplied directly by Colin Rose, the author of the package. This package is needed to perform symbolic and numerical calculations on the exact PMF of K_n, instead of the approximations given by Ewens. You may not be able to evaluate that notebook without MathStatica, which I urge anyone who works with stochastic models and arbitrary PDFs to get.
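
If MathStatica is unavailable, the exact PMF of K_n can still be evaluated numerically, though not symbolically. The sketch below is a stand-in and not what the notebook does. It relies on the standard fact that, under the Ewens sampling formula, K_n is a sum of independent Bernoulli indicators with success probabilities theta / (theta + i) for i = 0..n-1. The function name `kn_pmf` is hypothetical.

```r
# Exact PMF of K_n, built by convolving one Bernoulli indicator at a time.
# Returns a vector of length n + 1 holding P(K_n = 0), ..., P(K_n = n).
kn_pmf <- function(n, theta) {
  stopifnot(n >= 1, theta > 0)
  pmf <- 1                         # P(K_0 = 0) = 1 before any draws
  for (i in 0:(n - 1)) {
    q <- theta / (theta + i)       # chance that draw i + 1 is a new variant
    pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
  }
  names(pmf) <- 0:n
  pmf
}

# Example: distribution of the number of variants in a sample of 20.
round(kn_pmf(n = 20, theta = 1), 4)
```

Because each step only multiplies probabilities, this avoids the overflow that arises when computing Stirling numbers directly for large n.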
