methodo_prog_quanti_quanti.Rmd

---
title: "methodo_prog_quanti_quanti"
output: pdf_document
---

# 1. Distribution of sensory attributes

Fonctions d'importation : 
```{r}
experts <- read.table(file="data/perfumes_qda_experts.csv",header=TRUE, sep=",",quote="\"")
```
```{r}
experts <- read.csv(file = "data/perfumes_qda_experts.csv" )
```

Première approche d'un jeu de données:
```{r}
summary(experts)
```

Détection d'erreur de nature de variable et correction :
```{r}
experts$Product <- as.factor(experts$Product)
experts$Panelist <- as.factor(experts$Panelist)
experts$Session <- as.factor(experts$Session)
experts$Rank <- as.factor(experts$Rank)
```

Première initiation à une boucle : 
```{r}
for (col in colnames(experts[,1:4])){
  experts[, col] <- as.factor(experts[, col])
}
```

## Histogram

### CRAN functions 

Développement au fur et à mesure de `hist()`

```{r}
hist(experts$Spicy)
```
```{r}
hist(experts$Spicy, 
     breaks=50)
```
```{r}
hist(experts$Spicy, 
     breaks=50, 
     probability=TRUE)
```
```{r}
hist(experts$Spicy, 
     breaks=50, 
     probability=TRUE,
     main = "Histogram of Spicy")
```

### Using the `ggplot2` package

Initiation au mécanisme de ggplot :

```{r}
library(ggplot2)

ggplot(experts)+
  aes(x=Spicy)+
  geom_histogram()
```

Utilisation de `..density..` :
```{r}
ggplot(experts)+
  aes(x=Spicy, y=..density..)+
  geom_histogram()
```

Mécanisme de regroupement par `fill` : 
```{r}
ggplot(experts)+
  aes(x=Spicy, y=..density.., fill=Product)+
  geom_histogram()
```

Ajout de titre : 
```{r}
ggplot(experts)+
  aes(x=Spicy, y=..density.., fill=Product)+
  geom_histogram() +
  labs(title="Histogram of Spicy" ,x="Spicy values", y = "Frequency")
```

## Density

### CRAN functions 

Initiation au plot de base :

```{r}
d <- density(experts$Spicy) 
plot(d, main = "Density of Spicy") 
```

Ajouter des élements :

```{r}
hist(experts$Vanilla, probability = TRUE, main = "Histogram of Vanilla")
lines(density(experts$Vanilla))
```

Ajout de couleurs, plot de deux variables :

```{r}
plot(density(experts$Vanilla), col="blue", main="Vanilla and Floral density")
lines(density(experts$Floral), col="red")
```

Ajout d'une légende : 
```{r}
plot(density(experts$Vanilla), col="blue", main="Vanilla and Floral density")
lines(density(experts$Floral), col="red")

legend(x=10, y=0.2, legend=c("Vanilla", "Floral"),col=c("blue", "red"), lty=1)
```

### `ggplot2` functions 

Utilisation d'un nouveau type de représentation:

```{r}
ggplot(experts) + 
  aes(x=Vanilla) + 
  geom_density()+
  labs(title="Density of Vanilla" ,x="Vanilla values", y = "Density") 
```

```{r}
ggplot(experts) + 
 aes(x=Vanilla, y=..density..) + 
 geom_histogram()+
 geom_density() +
 labs(title="Density of Vanilla" ,x="Vanilla values", y = "Density")
```

```{r}
ggplot(experts) + 
  aes(x=Vanilla, color=Product) + 
  geom_density() +
  labs(title="Density of Vanilla for each product" ,x="Vanilla values", y = "Density")
```
Ajout d'un élément esthétique au sein même de la fonction de plot :
```{r}
ggplot(experts) + 
  aes(x=Vanilla, fill=Product) + 
  geom_density(alpha=0.5)  +
  labs(title="Density of Vanilla for each product" ,x="Vanilla values", y = "Density")
```

A ce stade il est initié : 
- l'import d'un document
- une boucle for 
- la conversion en variable catégorielle
- les plots de base de R avec leurs arguments principaux (hist, plot, lines, legend)
- le calcul d'une densité à partir de valeurs (density)
- le fonctionnement de ggplot (l'architecture de base + le plot de plusieurs graphes selon des modalités + la personnalisation de la légende, des titres et du remplissage couleur)

## Descriptors

### Creation of a dataframe

Création d'un data frame :
```{r}
descriptors <- data.frame("mean"=double(), "sd"=double(), "median"=double(), "q1"=double(), "q3"=double())
```

Calcul des indicateurs et ajout de ligne à un data frame :
```{r}
for (a in 5:16){
  me <- mean(experts[,a])
  sd <- sqrt(var(experts[,a]))
  med <- quantile(experts[,a], 0.5)
  q1 <- quantile(experts[,a], 0.25)
  q3 <- quantile(experts[,a], 0.75)
  descriptors <- rbind(descriptors, c(me, sd, med, q1, q3))
}
```

Modification des noms de colonnes et des lignes: 
```{r}
colnames(descriptors) <- c("mean", "sd", "median", "q1", "q3")
rownames(descriptors) <- colnames(experts[,5:16])
```

### Visualization of descriptors 

#### Boxplot 

##### CRAN function

Faire plusieurs graphes sur la même fenêtre et fonction boxplot() :
```{r}
par(mfrow=c(1,3))

for (attribute in colnames(experts[,5:7])){
  boxplot(experts[,attribute], main = attribute)
}
```

Ajout de ligne sur un graphe déjà affiché :
```{r}
par(mfrow=c(1,3))

for (attribute in colnames(experts[,5:7])){
  boxplot(experts[,attribute], main = attribute)
  abline(h=mean(experts[,attribute]))
}
```

##### ggplot2 function

Afficher plusieurs graphes ggplot avec `gridExtra` + geom_boxplot() :
```{r}
library(gridExtra)

g1 <-  ggplot(experts)+
          aes(y=Spicy)+
          geom_boxplot()

g2 <-  ggplot(experts)+
          aes(y=Heady)+
          geom_boxplot()

g3 <-  ggplot(experts)+
          aes(y=Fruity)+
          geom_boxplot()

grid.arrange(g1,g2,g3, nrow=1, ncol=3)
```

#### Density

##### CRAN function 

Affichage direct d'un graphe pour donner une représentation visuelle des indicateurs sur une densité, par d'exercice dessus mais code disponible:

```{r}
d<-density(experts$Vanilla)

# Plot the line
plot(d, main="Vanilla Distribution and quantiles")
q25 <- which.max(cumsum(d$y/sum(d$y)) >= 0.25)
q95 <- which.max(cumsum(d$y/sum(d$y)) >= 0.95)

# Plot the shading
polygon(c(-5, d$x[1:q25], d$x[q25]), c(0, d$y[1:q25], 0), col = 'lightblue')
polygon(c(d$x[q95], d$x[d$x > d$x[q95]], 15),c(0, d$y[d$x > d$x[q95]], 0),col = "lightblue")

# Plot the vline for mean
abline(v=mean(experts$Vanilla))
text(mean(experts$Vanilla),0.2, "mean", pos=2)

# Plot the vline for Q3
abline(v=d$x[q95])
text(d$x[q95],0.2, "Q3", pos=2)

# Plot the vline for Q1
abline(v=d$x[q25])
text(d$x[q25],0.2, "Q1", pos=2)
```

##### ggplot function 

Ajout d'une ligne sur un ggplot au moyen d'une variable stockée au préalable :

```{r}
mean_vanilla <- mean(experts$Vanilla)

ggplot(experts)+
  aes(x=Vanilla)+
  geom_density()+
  geom_vline(xintercept=mean_vanilla)
```

```{r}
q1 <- quantile(experts$Vanilla, 0.25)
q2 <- quantile(experts$Vanilla, 0.75)

ggplot(experts)+
  aes(x=Vanilla)+
  geom_density()+
  geom_vline(xintercept=q1)+
  geom_vline(xintercept=q2)
```

Ajout d'une couleur choisie :
```{r}
ggplot(experts)+
  aes(x=Vanilla)+
  geom_density(fill='red', alpha=0.5)+
  geom_vline(xintercept=mean_vanilla)
```

# 2. Product effect

Initiation à dplyr et ses verbes : 

```{r}
library(dplyr)

df <- experts %>%  
  # Select 3 products and 1 sensory attribute
  select(c(Product, Floral)) %>% 
  filter(Product == "J'adore ET" | Product == "Angel" | Product == "Chanel N5" ) %>%
  # Add the mean's column
  group_by(Product) %>% 
  mutate(mu=mean(Floral))
```

En ayant une colonne correspondant aux moyennes, plus besoin de les stocker dans une variable au préalable : 
```{r}
ggplot(df) + 
  aes(x=Floral, color=Product) + 
  geom_density() + 
  geom_vline(aes(xintercept=mu, color=Product)) + 
  labs(title="Density of Floral according three products")
```

Même chose mais avec un autre type de représentation :
```{r}
ggplot(df) + 
  aes(y=Floral, x= Product, color=Product) + 
  geom_boxplot() + 
  labs(title="Boxplot of Floral according three products")
```

# 3. Differences between products 

Nouvelle fonction de ggplot, `stat_summary` permettant de faire des calculs sans nécessiter à construire un df avant :
```{r}
ggplot(df)+
  aes(y=Floral, x= Product, color=Product) + 
  geom_boxplot() +
  stat_summary(mapping=aes(group=1), fun=mean, geom="line", color="black") + 
  stat_summary(fun=mean, geom="point")
```

Rien de nouveau, réutilisation de dplyr :
```{r}
df <- experts %>%  
  select(c(Product, Floral, Citrus, Spicy, Heady)) %>% 
  filter(Product == "J'adore ET" | Product == "Angel" | Product == "Chanel N5" )
```

Personnalisation des légendes et axes avec theme():
```{r}
# First sensory attribute
a1 <- ggplot(df)+
  aes(y=Floral, x= Product, color=Product)+geom_boxplot() + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  # Delete the x-axis:
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

# Second sensory attribute
a2 <- ggplot(df)+
  aes(y=Spicy, x= Product, color=Product)+geom_boxplot() + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

# Third sensory attribute
a3 <- ggplot(df)+
  aes(y=Citrus, x= Product, color=Product)+geom_boxplot() + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

# Fourth sensory attribute
a4 <- ggplot(df)+
  aes(y=Heady, x= Product, color=Product)+geom_boxplot() + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

grid.arrange(a1, a2, a3,a4, ncol=2, nrow = 2)
```

Rien de nouveau :
```{r}
a1 <- ggplot(df)+
  aes(y=Floral, x= Product, color=Product)+ 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

a2 <- ggplot(df)+
  aes(y=Citrus, x= Product, color=Product) + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

a3 <- ggplot(df)+
  aes(y=Spicy, x= Product, color=Product) + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

a4 <- ggplot(df)+aes(y=Heady, x= Product, color=Product) + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

grid.arrange(a1, a2, a3, a4, ncol=2, nrow = 2)
```

Utiliser une boucle pour appliquer un même calcul à chaque colonne (presque du copié collé des boucles déjà utilisées):

```{r}
for (attribute in colnames(df)[-1]){
  df[, attribute] <- df[, attribute]/sd(df[, attribute])
}
```

Rien de plus:
```{r}
a1 <- ggplot(df)+
  aes(y=Floral, x= Product, color=Product)+ 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

a2 <- ggplot(df)+
  aes(y=Citrus, x= Product, color=Product) + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

a3 <- ggplot(df)+
  aes(y=Spicy, x= Product, color=Product) + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

a4 <- ggplot(df)+
  aes(y=Heady, x= Product, color=Product) + 
  stat_summary(fun=mean, geom="line", aes(group=1), color="black") + 
  stat_summary(fun=mean, geom="point")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank())

grid.arrange(a1, a2, a3, a4, ncol=2, nrow = 2)
```

Créer un nouveau tableau à partir d'un autre avec dplyr et initiation à un nouveau verbe `summarize()`:
```{r}
means <- df %>% group_by(Product) %>% summarise(
  mean_Spicy=mean(Spicy),
  mean_Citrus=mean(Citrus),
  mean_Floral=mean(Floral)
)
```

Application plus large:
```{r}
df.means <- experts %>%  
  select(c(Product, Floral, Citrus, Spicy, Heady, Fruity, Green, Vanilla, Woody)) %>% 
  filter(Product == "J'adore ET" | Product == "Angel" | Product == "Chanel N5" | Product == "Coco Mademoiselle"| Product == "Aromatics Elixir"| Product == "Cinéma"| Product == "J'adore EP"| Product == "Shalimar")

for (attribute in colnames(df.means)[-1]){
  df.means[, attribute] <- df.means[, attribute]/sd(df.means[, attribute])
}

means.V2 <- df.means %>% group_by(Product) %>% summarise(
  mean_Spicy=mean(Spicy),
  mean_Citrus=mean(Citrus),
  mean_Floral=mean(Floral), 
  mean_Heady=mean(Heady), 
  mean_Fruity=mean(Fruity), 
  mean_Green=mean(Green), 
  mean_Vanilla=mean(Vanilla), 
  mean_Woody=mean(Woody)
)
```

# 4. Notion of metric

Nouvelles fonctions de R : as.matrix() et dist()

```{r}
spicy.matrix <- as.matrix(dist(means.V2$mean_Spicy))
citrus.matrix <- as.matrix(dist(means.V2$mean_Citrus))
floral.matrix <- as.matrix(dist(means.V2$mean_Floral))
heady.matrix <- as.matrix(dist(means.V2$mean_Heady))
fruity.matrix <- as.matrix(dist(means.V2$mean_Fruity))
green.matrix <- as.matrix(dist(means.V2$mean_Green))
Vanilla.matrix <- as.matrix(dist(means.V2$mean_Vanilla))
woody.matrix <- as.matrix(dist(means.V2$mean_Woody))
```

Application de fonctionnalités de ggplot déjà vues:
```{r}
a1 <- ggplot(means.V2)+
  aes(x = mean_Spicy, y=mean_Citrus, color=Product)+
  geom_path(aes(group=1),color="black")+
  geom_point()

a2 <- ggplot(means.V2)+
  aes(x = mean_Spicy, y=mean_Floral, color=Product)+
  geom_path(aes(group=1),color="black")+
  geom_point()

a3 <- ggplot(means.V2)+
  aes(x = mean_Citrus, y=mean_Floral, color=Product)+
  geom_path(aes(group=1),color="black")+
  geom_point()

grid.arrange(a1, a2, a3, ncol=2, nrow = 2)
```

Nouvelle fonction : data.frame() et cov()

```{r}
means.variables <- data.frame(means.V2, row.names = 1)
cov.att <- cov(means.variables)
```

Rien de plus:
```{r}
dist.prod <- as.matrix(dist(means.variables))
```

# 5. Structure 

Nouvelle fonction : heatmap()
```{r}
heatmap(cov.att)
heatmap(dist.prod)
```

*Functions used : heatmap()*

# 6. Inertia 

Application de calcul mathématiques en programmation : 
```{r}
Products_sc_Mat <- as.matrix(dist(scale(dist.prod))^2)
sum(Products_sc_Mat)/(2*dim(Products_sc_Mat)[1]*(dim(Products_sc_Mat)[1]-1))

Att_sc_Mat <- as.matrix(dist(scale(cov.att))^2)
sum(Att_sc_Mat)/(2*dim(Att_sc_Mat)[1]*(dim(Att_sc_Mat)[1]-1))
```

Pareil:
```{r}
G1 <- as.data.frame(cov.att) %>% select(mean_Citrus, mean_Floral, mean_Fruity, mean_Green)
G2 <- as.data.frame(cov.att) %>% select(mean_Spicy, mean_Heady, mean_Woody, mean_Vanilla)

Att_G1_sc_Mat <- as.matrix(dist(scale(G1))^2)
inertia.G1 <- sum(Att_G1_sc_Mat)/(2*dim(Att_G1_sc_Mat)[1]*(dim(Att_G1_sc_Mat)[1]-1))

Att_G2_sc_Mat <- as.matrix(dist(scale(G2))^2)
inertia.G2 <- sum(Att_G2_sc_Mat)/(2*dim(Att_G2_sc_Mat)[1]*(dim(Att_G2_sc_Mat)[1]-1))

inertia.G1+inertia.G2
```

Pareil:
```{r}
G1 <- as.data.frame(dist.prod) %>% select("Aromatics Elixir", Shalimar, "Chanel N5", Angel)
G2 <- as.data.frame(dist.prod) %>% select(Cinéma, "Coco Mademoiselle", "J'adore EP", "J'adore ET")

Att_G1_sc_Mat <- as.matrix(dist(scale(G1))^2)
inertia.G1 <- sum(Att_G1_sc_Mat)/(2*dim(Att_G1_sc_Mat)[1]*(dim(Att_G1_sc_Mat)[1]-1))

Att_G2_sc_Mat <- as.matrix(dist(scale(G2))^2)
inertia.G2 <- sum(Att_G2_sc_Mat)/(2*dim(Att_G2_sc_Mat)[1]*(dim(Att_G2_sc_Mat)[1]-1))

inertia.G1+inertia.G2
```

# 7. PCA

## FactoMineR

Nouvelle fonction : `PCA()` 

```{r}
library(FactoMineR)
res<-PCA(scale(means.variables), graph = FALSE, scale.unit = F)
res$ind$coord
res$var$coord
```

## Decomposition with svd() 

Calcul mathématiques à la main + fonction svd()

```{r}
svd <- svd(scale(means.variables))
diag <- diag(svd$d)

#Verification
svd$u%*%diag%*%t(svd$v)
scale(as.matrix(means.variables))

#Individuals coordinates
scale(means.variables)%*%svd$v
#Variables coordinates
t(scale(means.variables))%*%svd$u/sqrt(dim(means.variables)[1])

#Comparate with PCA()
res$ind$coord
res$var$coord
```

## Using Nipals algorithm

Ecrire une fonction sur R:
```{r}
NIPALS <- function(X){
  X = as.matrix(X)
  N = nrow(X)
  M = ncol(X)
  
  D = diag(1/N, N)
  Xini = X
  qrX=qr(X)
  rang = qrX$rank
  vec=matrix(0,nrow=M,ncol=rang) 
  t=X[,1]
  i=1
  
  p=t(X)%*%t%*%(1/(t(t)%*%t))
  p=p/as.numeric(sqrt(t(p)%*%p))
  print(rang)
  while (i<rang+1) {
    norm=1
    while(norm>0.000001){
      t=(X%*%p)%*%(1/(t(p)%*%p))
      p2=t(X)%*%t%*%(1/(t(t)%*%t))
      p2=p2/as.numeric(sqrt(t(p2)%*%p2))
      diff=p2-p
      norm=t(diff)%*%diff
      p=p2
      print(p)
      print(i)
    }
    vec[,i]=p
    X=X-(t%*%t(p))
    i=i+1
  }
  return(vec)
}

NIPALS(means.variables)
svd(means.variables)$v
```

# 7. Supplementary informations 


## Supplementary variables 
 
## Supplementary individuals


# 8. Ponderate ACP