Final_Proj_Outline.Rmd

---
title: "Final_Project_Outline"
author: "Akhila, Brock, Jeff, Kyle"
date: "10/28/2019"
output: html_document
---
# Final Project Outline

##Tasks and Roles

- Kyle:
  - select: get rid of columns not going to use 
  - filter : remove cases with all missing
  - rename: change variable names 
  
- Jeff:
  - table of company, size (number of cases)
  - reorder: columns 24-28 (make column 25 come after column 24)

- Akhila:
  - frequency count for each question, then bar plot OR recode from string to values
  - figure: likert scale plots 
      cols: Q24 -- has 6 different parts
      -through Q28 

- Brock:
  - figure of the heatmap


## Dataset

Our project dataset is derived from a survey designed to get employers
perspectives of individuals with disabilities in the workforce. 

![](Figures/survey_shot.png)

The survey was developed by researchers at the University of Oregon. US Business
Data was used to generate a pool of potential employers to participate in the
survey. The survey included demographic information and company practices.
Informed consent was received for each participating employer and study
procedures were approved by the University of Oregon Institutional Review Board.

The dataset contains 1,686 records  
![](Figures/data_shot.png)

## Data Preperation

Preliminary work will include some basic transformations as we move from raw to
tidy data. As we attempt to answer our research questions we may decide it is
best to work with only a subset of the data. In that case we would filter the
data. For example, we will examine missing data and determine whether we can
consider that data is missing at random. We may then filter our data to include
only employers who provided complete or near complete data. After we have
transformed our dataset we will begin the process of visualization. This will
include, but not limited to, examining bar charts, histograms, line charts and
scatterplots using _ggplot_ . We will also examine basic descriptive data (e.g.,
means, Standard deviations, correlations, etc.). The visual and descriptive
analysis will help inform on possible data reduction approaches. For example, we
may reduce the number of variables we examine by creating scales via exploratory
or confirmatory factor analysis.

## Meeting Final Requirements:

### Draft Data Preparation Script (25 points)
At the end of Week 8, we will have a draft of the data preparation complete,
including moving the data from its raw to tidy form and a variety of exploratory
data visualizations. Final project will use the following functions:  

* `gather`
* `separate`
* `select`
* `filter`
* `spread`
* `group_by`
* `summarize`

### Peer Review of Data Preparation Script (25 points)
During our peer review, we will note:  
(a) at least three areas of strength,  
(b) at least one thing you learned from reviewing their script, and  
(c) at least one and no more than three areas for improvement.  

### Final Project Presentation (25 points)
The final presentation will be equal parts “journey” and substantive
findings/conclusions/results. We will present for approximately
10 minutes each (20-40 minutes per group), although the time may change
depending on the enrollment of the class.

### Final Paper (110 points)
The final project will:   
(a) be a reproducible and dynamic APA manuscript
produced with R Markdown, via the {papaja} package and include references to the
extant literature;   
(b) be housed on GitHub, with contributions from all authors
obvious;   
(c) demonstrate moving data from its raw “messy” format to a tidy data
format through the R Markdown file, but not in the final document;   
(d) include at least two exploratory data visualizations, and   
(e) include at least summary statistics of the data in tables, although fitted 
models of any sort are an added bonus (not literally, there are not extra points
for fitting a model).

#### The points for the final project are broken down as follows: 
Writing (abstract, intro, methods, results, discussion, references) 30 points (25%)   
Document is fully reproducible and housed on GitHub 25 points (21%)   
Demonstrate use of inline code 5 points (4%)  
Demonstrate tidying messy data 30 points (25%)   
Two data visualizations 20 points (10 points each) (17%)   
Production of at least one table (of summary statistics or model results) 10 points (8%)