man/flatfile.Rd

\docType{methods}
\name{flatfile}
\alias{flatfile}
\title{Flat files}
\description{
\pkg{Distance} allows data to be loaded as a "flat file" and
analysed (to obtain abundance estimates) straight away,
provided that the flat file is in the correct format.
The file can be read from, for example, an Excel
spreadsheet using \code{read.xls} in \pkg{gdata}, or from CSV
using \code{\link{read.csv}}.
}
\details{
Each row of the data table corresponds to one observation
and must have the following columns:
\tabular{ll}{
\code{distance} \tab observed distance to object \cr
\code{Sample.Label} \tab identifier for the sample (transect id) \cr
\code{Effort} \tab effort for this transect (e.g. line transect
length or number of times point transect was visited) \cr
\code{Region.Label} \tab code for regional strata (see below) \cr
\code{Area} \tab area of the stratum}

Note that in the simplest case (one area surveyed only
once) there is only one \code{Region.Label} and a single
corresponding \code{Area} duplicated for each observation.
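
As a minimal sketch of this layout, a flat file for a single
stratum could be built by hand as a data frame. The object name
\code{flat} and all values below are invented purely for
illustration and are not survey data.
\preformatted{
# one stratum ("StudyArea") of area 100, two transects, three observations;
# Region.Label and Area are recycled so they repeat on every row
flat <- data.frame(Region.Label = "StudyArea",
                   Area         = 100,
                   Sample.Label = c("T1", "T1", "T2"),
                   Effort       = c(10, 10, 12),
                   distance     = c(0.12, 0.48, 0.30))
}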

The example given below was provided by Eric Rexstad.
}
\examples{
\donttest{
library(Distance)
# The gdata package must be installed from CRAN; read.xls requires a system
# with perl installed (usually fine for Linux/Mac)
library(gdata)

# Get the path to the example file shipped with Distance;
# opening the file at this path shows the expected format
minke.filepath <- system.file("minke.xlsx",package="Distance")

# Load the Excel file, note that header=FALSE and we add column names after
minke <- read.xls(minke.filepath, stringsAsFactors=FALSE, header=FALSE)
names(minke) <- c("Region.Label", "Area", "Sample.Label", "Effort","distance")
# One may want to call edit(minke) or head(minke) at this point
# to examine the data format

# Because of the way the file was saved, and R's default behaviour
# for numbers stored with many decimal places, the Effort column is
# read as strings rather than numbers (see str(minke)), so we must
# coerce it to numeric
minke$Effort <- as.numeric(minke$Effort)
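
## A sketch of the CSV route mentioned in the description.
# Assumption: the table is written to a temporary file only to show the
# round trip; csv.path and minke.csv are names used just for illustration.
csv.path <- tempfile(fileext=".csv")
write.csv(minke, csv.path, row.names=FALSE)
# read the flat file back in, keeping character columns as strings
minke.csv <- read.csv(csv.path, stringsAsFactors=FALSE)
# check the column names and types before analysis
str(minke.csv)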

## perform an analysis using the exact distances
pooled.exact <- ds(minke, truncation=1.5, key="hr", order=0)
summary(pooled.exact)


## Try a binned analysis
# first define the bins
dist.bins <- c(0,.214, .428,.643,.857,1.071,1.286,1.5)
pooled.binned <- ds(minke, truncation=1.5, cutpoints=dist.bins, key="hr", order=0)

# binned with stratum as a covariate
minke$stratum <- ifelse(minke$Region.Label=="North", "N", "S")
strat.covar.binned <- ds(minke, truncation=1.5, key="hr",
                         formula=~as.factor(stratum), cutpoints=dist.bins)

# Stratified by North/South
full.strat.binned.North <- ds(minke[minke$Region.Label=="North",],
                  truncation=1.5, key="hr", order=0, cutpoints=dist.bins)
full.strat.binned.South <- ds(minke[minke$Region.Label=="South",],
                     truncation=1.5, key="hr", order=0, cutpoints=dist.bins)

## model summaries
model.sel.bin <- data.frame(name=c("Pooled f(0)", "Stratum covariate",
                                   "Full stratification"),
                            aic=c(pooled.binned$ddf$criterion,
                                  strat.covar.binned$ddf$criterion,
                                  full.strat.binned.North$ddf$criterion+
                                  full.strat.binned.South$ddf$criterion))

# Note that the model with stratum as a covariate is the most parsimonious
print(model.sel.bin)
}
}