
honestus/AIDA


AIDA

This project is a statistical analysis of Italian firms. It was developed for educational purposes as part of the Statistical Methods for Data Science course (A.Y. 2017-2018) at the Università di Pisa.

We analyzed Italian firms by trying to answer some common questions in the economic and statistical literature:

  • which measure of size best describes a firm's size;
  • how strongly different measures of firm size are correlated;
  • how firm sizes are distributed;
  • how firm growth is distributed;
  • whether the mean growth is statistically different from zero;
  • whether the growth distribution is symmetric or asymmetric.
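The growth questions above can be made concrete with a small sketch. The README does not say how growth is defined; we assume here the log-difference of a size measure between consecutive years, a common choice in the firm-growth literature. The sketch is in Python for illustration only (the project itself is in R), and the function names are hypothetical, not taken from functions.R. Pooling growth rates across firms and years treats them as independent observations, which is itself an assumption.

```python
import math
import statistics


def log_growth_rates(sizes):
    """Log growth rates g_t = ln(S_t) - ln(S_{t-1}) for one firm's size series.

    Assumes every size is strictly positive (the log is undefined otherwise).
    """
    return [math.log(curr) - math.log(prev) for prev, curr in zip(sizes, sizes[1:])]


def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean == mu0, to compare with a t distribution with n - 1 df."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; close to 0 for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical revenue series, not taken from the AIDA data.
series = {"firm_a": [100, 110, 121], "firm_b": [50, 50, 40]}
growth = [g for s in series.values() for g in log_growth_rates(s)]
print(one_sample_t(growth), sample_skewness(growth))
```

A large absolute t statistic argues against a zero mean growth, while a skewness far from zero points to an asymmetric growth distribution.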

We also compared these behaviours across distinct subsamples of the whole dataset, such as different subsectors, different years and different firm sizes.

Tools and Technologies used

All the analyses were done with R (version 3.5.0).

We used the dplyr package for data manipulation and the poweRlaw package for fitting power-law distributions. For plotting we mostly used the ggplot2 package. Any other required package is listed in the packages.txt file.

Files description

A brief description of the distinct directories and files you may find in this repository:

  • the data directory contains the RData files with our original data;
  • the files directory contains:
    • distrResults, which contains the RData files with the results of the distributions fitted on the different (sub)samples;
    • images, which contains all the images (plots, confidence intervals, etc.);
  • utils.R is an R file with general utilities (e.g. loading the required packages and loading the datasets into the current workspace);
  • functions.R is an R script containing several helper functions used in the analysis;
  • first_analysis.R contains a general analysis of the whole dataset, e.g. basic statistics of each feature;
  • correlation.R contains the correlation analysis and the linear regression for the Employee and Revenue attributes;
  • test_distr.R contains the analysis of the firm size distribution;
  • powerlaw.R further analyzes the power-law hypothesis for firm size, using the Employee attribute;
  • growth_rate_dist.R and all the remaining files whose names start with "growth" (one for each (sub)sample) contain the analysis of the growth of Italian firms;
  • distributionResultsAnalysis contains the analysis of the fitted-distribution results stored in "files/distrResults";
  • packages.txt contains a list of the packages needed to perform the analysis.
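The power-law hypothesis examined in powerlaw.R rests on estimating the exponent of the distribution's tail. The README does not describe the estimator used, so the following is only a background sketch, in Python for illustration, of the maximum-likelihood estimator from Clauset, Shalizi and Newman (2009) for a fixed xmin. The function name is hypothetical. Choosing xmin (for example by minimizing the Kolmogorov-Smirnov distance) and goodness-of-fit testing are deliberately left out. Since employee counts are integers, the discrete approximation with xmin - 1/2 is offered as an option.

```python
import math


def powerlaw_alpha(xs, xmin, discrete=False):
    """MLE of the power-law exponent alpha for the tail x >= xmin.

    Continuous case: alpha = 1 + n / sum(ln(x / xmin)).
    Discrete case (integer data such as employee counts), using the
    approximation alpha ~= 1 + n / sum(ln(x / (xmin - 0.5))).
    Returns (alpha, standard_error), with standard_error = (alpha - 1) / sqrt(n).
    """
    tail = [x for x in xs if x >= xmin]  # only the tail is fitted
    n = len(tail)
    if n == 0:
        raise ValueError("no observations at or above xmin")
    denom = xmin - 0.5 if discrete else xmin
    log_sum = sum(math.log(x / denom) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal xmin; alpha is undefined")
    alpha = 1 + n / log_sum
    return alpha, (alpha - 1) / math.sqrt(n)
```

For example, powerlaw_alpha(employees, xmin=10, discrete=True) would fit the tail of a hypothetical list of employee counts at or above 10; the exponent obtained depends strongly on the chosen xmin.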

For more detailed explanations of the procedures and the results, please read our final report.