
FelipenerySilva/Analytics-for-Business-Intelligence-


Data Analytics for Business Intelligence

This repository contains an R project with data analytics exercise questions for business intelligence.

The project was part of my Analytics for Business Intelligence course in the University of Newcastle (UoN) Master of Data Science degree. It is presented as an R Markdown document.

Question 1

A company producing snack foods uses an automated system to package 200g packets of chips. Management must ensure that the system (machine) is working properly; consistently underweight packets will result in consumer dissatisfaction, whilst consistently overweight packets result in additional costs being absorbed by the company. One of the specification requirements for the machine to be deemed to be working correctly is that the mean weight of the packets of chips is 200g. The quality assurance department obtained a random sample of 80 packets of chips and measured the content (in grams) of each packet in order to test the mean weight. The data is contained in file “Question1.csv”.

(i) Construct the 95% confidence interval for the population mean weight of the packets of chips.
(ii) What assumptions need to be met for this interval to be valid?
(iii) Interpret the 95% confidence interval in words.
(iv) Explain, using support from the 95% confidence interval, what you would conclude about whether the company’s system is meeting the specification requirement that the mean weight of the packets of chips is 200g.
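
One way to approach part (i) is sketched below in base R. It is a minimal illustration, not the solution from the project: the weights are simulated placeholders rather than the contents of “Question1.csv”, and the column name `weight` in the commented-out `read.csv()` line is an assumption. The interval is computed by hand from the t distribution and then compared with the one reported by `t.test()`.

```r
# Sketch only: simulated weights stand in for the real data in Question1.csv.
set.seed(1)
weights <- rnorm(80, mean = 200, sd = 2)  # hypothetical sample of 80 packet weights (g)

# With the real file, a line like this would replace the simulation above
# (the column name `weight` is an assumption):
# weights <- read.csv("Question1.csv")$weight

n      <- length(weights)
xbar   <- mean(weights)                 # sample mean
se     <- sd(weights) / sqrt(n)         # standard error of the mean
t_crit <- qt(0.975, df = n - 1)         # two-sided 95% critical value

# 95% confidence interval computed by hand
ci_manual <- c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)

# t.test() reports the same interval
ci_builtin <- t.test(weights, conf.level = 0.95)$conf.int

print(ci_manual)
print(ci_builtin)

# For part (iv): is the specified mean of 200 g inside the interval?
contains_200 <- ci_manual[["lower"]] <= 200 && 200 <= ci_manual[["upper"]]
print(contains_200)
```

If 200 lies inside the interval, the sample is consistent with the specification at the 5% level; if it lies outside, the sample suggests the mean weight differs from 200g.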

Question 2

Two software auditing systems (System A and System B) were assessed for their ease of use by comparing the times taken by people (auditors) to complete a series of tasks using each system. A random sample of 100 auditors was selected and each auditor was assigned to a different randomly selected task. Each auditor used both systems A and B to complete his/her designated task. The two times were recorded (in minutes) for each auditor. The data is contained in file “Question2.csv”.

Implement the following six steps to test whether there is a difference, at a 5% significance level, between the ease of use of the two systems based upon the mean times to complete the tasks.

(i) Define the parameters and state the null and alternative hypotheses.
(ii) Check the assumptions for this hypothesis test.
(iii) Find the test statistic.
(iv) State the null distribution.
(v) Calculate the p-value.
(vi) Write the conclusion in plain language.
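
Because each auditor used both systems, the two times are paired, so a paired t-test on the within-auditor differences is one natural reading of the steps above. The sketch below assumes that reading and uses simulated times in place of “Question2.csv”; the vector names `time_a` and `time_b` are hypothetical. It computes the test statistic and two-sided p-value by hand and checks them against `t.test(..., paired = TRUE)`.

```r
# Sketch only: simulated times stand in for the real data in Question2.csv.
set.seed(2)
time_a <- rnorm(100, mean = 30, sd = 5)            # hypothetical minutes on System A
time_b <- time_a + rnorm(100, mean = 0, sd = 3)    # hypothetical minutes on System B

# Work with the per-auditor differences (A minus B)
d <- time_a - time_b
n <- length(d)

# Assumption check (step ii): the differences should be roughly normal,
# e.g. inspect hist(d) or qqnorm(d); with n = 100 the t-test is fairly robust.

# Test statistic (step iii) under H0: mean difference = 0
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Null distribution (step iv): t with n - 1 degrees of freedom
df <- n - 1

# Two-sided p-value (step v)
p_value <- 2 * pt(-abs(t_stat), df = df)

# The built-in paired test gives the same statistic and p-value
res <- t.test(time_a, time_b, paired = TRUE)

print(c(t = t_stat, df = df, p = p_value))
print(res)
```

A p-value below 0.05 would lead to rejecting the null hypothesis of equal mean completion times at the 5% significance level.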

Question 3

In a study of multifunction inkjet printers, a random sample of 30 all-in-one printers was selected from a supplier, and their printing rates (in pages per minute) and retail prices (in dollars) were recorded. The data is contained in file “Question3.csv”: columns A and B are the printing rates and the printer prices, respectively.

(i) Does the scatterplot suggest a linear relationship between printer price and printing rate?
(ii) Fit a simple linear regression model with printer price as the response variable and printing rate as the predictor variable. Write down the regression equation and interpret the numbers in the equation.
(iii) Use the fitted regression equation to estimate the price of a printer with a printing rate of 2 pages per minute.
(iv) Is there sufficient evidence that there is a relationship between printer price and printing rate? Be sure to state the null and alternative hypotheses, the p-value and an appropriate conclusion.
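
The regression steps above can be sketched with `lm()` in base R. This is an illustration under assumptions, not the project's own analysis: the data are simulated placeholders rather than the contents of “Question3.csv”, and the column names `rate` and `price` are hypothetical labels for columns A and B.

```r
# Sketch only: simulated values stand in for the real data in Question3.csv.
set.seed(3)
rate  <- runif(30, min = 1, max = 5)              # hypothetical pages per minute
price <- 100 + 40 * rate + rnorm(30, sd = 20)     # hypothetical prices in dollars
printers <- data.frame(rate = rate, price = price)

# With the real file (assuming a header row; the names are assigned by position):
# printers <- read.csv("Question3.csv")
# names(printers)[1:2] <- c("rate", "price")

# Part (i): a scatterplot of price against rate
# plot(price ~ rate, data = printers)

# Part (ii): simple linear regression, price as response and rate as predictor
fit   <- lm(price ~ rate, data = printers)
coefs <- coef(fit)   # coefs[1] is the intercept, coefs[2] is the slope
print(coefs)

# Part (iii): estimated price at a printing rate of 2 pages per minute
pred_2ppm <- predict(fit, newdata = data.frame(rate = 2))
print(pred_2ppm)

# Part (iv): p-value for H0: slope = 0 against H1: slope != 0
slope_p <- summary(fit)$coefficients["rate", "Pr(>|t|)"]
print(slope_p)
```

The slope estimates the change in price, in dollars, for each additional page per minute. The intercept is the fitted price at a rate of zero, which may have no practical meaning if no printer in the sample prints that slowly.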