Commit
Merge branch 'master' of https://github.com/ayrna/orca
javism committed Jan 25, 2018
2 parents e8122be + d002acd commit 8e19e08
Showing 4 changed files with 120 additions and 120 deletions.
86 changes: 46 additions & 40 deletions doc/orca-tutorial-3.md
@@ -88,6 +88,7 @@ end
hold off;
```
which generates the following figure:

![Projections of POM for melanoma](tutorial/images/POMMelanomaProjections.png)

As can be seen, no pattern is projected beyond the last threshold, so the last class is never predicted. Note that POM is a linear model, which can limit its accuracy. We can check this in the confusion matrix:
@@ -120,6 +121,7 @@ legend(arrayfun(@(num) sprintf('C%d', num), 1:Q, 'UniformOutput', false))
hold off;
```
which generates the plot:

![Projections of POM for melanoma with colours](tutorial/images/POMMelanomaProjectionsColours.png)

As can be observed, the three patterns from the last class are never correctly classified.
@@ -140,6 +142,7 @@ for i=1:size(info.model.thresholds,1)
end
hold off;
```

![Cumulative probabilities by this set of thresholds](tutorial/images/POMMelanomaCumProb.png)

```MATLAB
@@ -154,6 +157,7 @@ for i=1:size(info.model.thresholds,1)
end
hold off;
```

![Individual probabilities by this set of thresholds](tutorial/images/POMMelanomaProb.png)

As can be seen, projections close to the thresholds can be assigned to different classes according to the probability distribution. However, following the spirit of threshold models, the implementation of POM included in ORCA classifies the patterns according to their position with respect to the thresholds.
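
The difference between the two decision rules can be sketched with a few lines of base MATLAB. This is only an illustration, not ORCA's code: the thresholds and projections below are hypothetical values, and the variable names `predThreshold` and `predProb` are introduced here. The cumulative and individual probabilities are computed with the same logit expressions used in the plots above.

```MATLAB
% Hypothetical thresholds and projections, for illustration only
thresholds = [-1; 0.2; 0.5; 3];   % Q-1 = 4 thresholds, so Q = 5 classes
x = [-2; 0.3; 0.45; 4];           % projected values of four patterns

Q = numel(thresholds) + 1;
n = numel(x);

% Cumulative probabilities P(y <= q | x) with the logit link
f = repmat(thresholds', n, 1) - repmat(x, 1, Q-1);
cumProb = [1./(1+exp(-f)) ones(n,1)];

% Individual class probabilities as differences of consecutive columns
prob = cumProb;
prob(:,2:end) = cumProb(:,2:end) - cumProb(:,1:(end-1));

% Rule 1: class given by the position with respect to the thresholds
predThreshold = sum(repmat(x, 1, Q-1) > repmat(thresholds', n, 1), 2) + 1;

% Rule 2: class with the highest individual probability
[~, predProb] = max(prob, [], 2);

disp([predThreshold predProb]);
```

When two thresholds are very close to each other, a projection falling between them can receive a higher probability for a neighbouring class, so the two columns printed by `disp` need not agree.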
@@ -342,6 +346,7 @@ end
legend('SVOREX');
hold off;
```

![Comparison of SVORIM and SVOREX](tutorial/images/SVORIM_SVOREX.png)

By fine-tuning the parameters a bit, we can improve the results:
@@ -383,50 +388,46 @@ fprintf('REDSVM Accuracy: %f\n', CCR.calculateMetric(test.targets,info.predicted
fprintf('REDSVM MAE: %f\n', MAE.calculateMetric(test.targets,info.predictedTest));
```

To better understand the relevance of parameters selection process, the following code optimizes parameters `k` and `k` using a pair of train and validation data. Then, it plots corresponding heatmaps for `Acc` and `AMAE`. Note that the optimal combination may differ depending of the selected performance metric.
To better understand the relevance of the parameter selection process, the following code optimizes the parameters `k` and `C` using 3-fold cross-validation for each combination. Then, it plots the corresponding validation results for `Acc` and `AMAE`. Note that the optimal combination may differ depending on the selected performance metric. Depending on your version of Matlab, a `contourf` or a `heatmap` is used for each metric.

```MATLAB
%% REDSVM optimization
clear T Ts;
Metrics = {@MZE,@AMAE};
Ts = cell(size(Metrics,2),1);
for m = 1:size(Metrics,2)
mObj = Metrics{m}();
fprintf('Grid search to optimize %s for REDSVM\n', mObj.name);
bestError=Inf;
T = table();
for C=10.^(-3:1:3)
for k=10.^(-3:1:3)
param = struct('C',C,'k',k);
info = algorithmObj.runAlgorithm(train,test,param);
error = mObj.calculateMetric(test.targets,info.predictedTest);
if error < bestError
bestError = error;
bestParam = param;
end
param.error = error;
T = [T; struct2table(param)];
fprintf('.');
end
>> if verLessThan('matlab', '9.2')
% Use contours
figure;
hold on;
for m = 1:size(Metrics,2)
mObj = Metrics{m}();
subplot(size(Metrics,2),1,m)
x = Ts{m}{:,1};
y = Ts{m}{:,2};
z = Ts{m}{:,3};
numPoints=100;
[xi, yi] = meshgrid(linspace(min(x),max(x),numPoints),linspace(min(y),max(y),numPoints));
zi = griddata(x,y,z, xi,yi);
contourf(xi,yi,zi,15);
set(gca, 'XScale', 'log');
set(gca, 'YScale', 'log');
colorbar;
title([mObj.name ' optimization for REDSVM']);
end
Ts{m} = T;
fprintf('\nBest Results REDSVM C %f, k %f --> %s: %f\n', bestParam.C, bestParam.k, mObj.name, bestError);
hold off;
else
% Use heatmaps
fprintf('Generating heat maps\n');
figure;
subplot(2,1,1)
heatmap(Ts{1},'C','k','ColorVariable','error');
title('MZE optimization for REDSVM');
subplot(2,1,2)
heatmap(Ts{2},'C','k','ColorVariable','error');
title('AMAE optimization for REDSVM');
end
fprintf('Generating heat maps\n');
figure;
subplot(2,1,1)
h = heatmap(Ts{1},'C','k','ColorVariable','error');
title('MZE optimization for REDSVM');
subplot(2,1,2)
h = heatmap(Ts{2},'C','k','ColorVariable','error');
title('AMAE optimization for REDSVM');
```
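
For a k-fold validation to take effect, each fold has to train on one part of `train` and validate on the held-out part. In `exampleMelanomaTM.m` as changed by this commit, the fold loop passes `train` and `test` unchanged to `runAlgorithm`, so the partition created with `cvpartition` is not used inside the loop. The following is a minimal sketch of how the per-fold structures could be built, assuming the usual `patterns`/`targets` fields. The toy data, the deterministic fold assignment `foldId` and the names `foldTrain` and `foldVal` are assumptions made for illustration; the ORCA calls are left as comments because they need the toolbox on the path.

```MATLAB
% Toy data standing in for the ORCA train structure (hypothetical values)
train.patterns = rand(30,4);
train.targets  = sort(repmat((1:3)',10,1));

nFolds = 3;
n = numel(train.targets);

% Deterministic fold assignment, for illustration only
foldId = mod((1:n)', nFolds) + 1;

valSizes = zeros(nFolds,1);
for ff = 1:nFolds
    vaIdx = (foldId == ff);   % patterns held out for validation in fold ff
    trIdx = ~vaIdx;           % remaining patterns used for training

    foldTrain = struct('patterns', train.patterns(trIdx,:), ...
                       'targets',  train.targets(trIdx));
    foldVal   = struct('patterns', train.patterns(vaIdx,:), ...
                       'targets',  train.targets(vaIdx));

    % With ORCA available, the fold would be evaluated as in the script:
    % info  = algorithmObj.runAlgorithm(foldTrain, foldVal, param);
    % error = error + mObj.calculateMetric(foldVal.targets, info.predictedTest);

    valSizes(ff) = numel(foldVal.targets);
end
```

With a `cvpartition` object such as `CVO`, the index vectors could instead be taken from `CVO.training(ff)` and `CVO.test(ff)`. The dot syntax matters in this script because the variable `test` would otherwise shadow the `test` function.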

![REDSVM heapmap to show crossvalidation](tutorial/images/redsvm-melanoma-heatmap.png)
![REDSVM heatmap to show crossvalidation](tutorial/images/redsvm-melanoma-heatmap.png)

![REDSVM contourf to show crossvalidation](tutorial/images/redsvm-melanoma-contour.png)

## Kernel discriminant learning for ordinal regression (KDLOR)

@@ -492,6 +493,7 @@ end
legend(arrayfun(@(num) sprintf('C%d', num), 1:Q, 'UniformOutput', false))
hold off;
```

![Projection of KDLOR for the melanoma dataset](tutorial/images/KDLORProjectionMelanoma.png)
---

@@ -558,16 +560,20 @@ If we check the dataset used for POM:
>> scatter(newTrain.patterns(:,1),newTrain.patterns(:,2),7,newTrain.targets);
```

![Intermediate dataset of the custom ensemble](tutorial/images/ensembleMelanoma.png)
we can see that, although the correlation of both projections is quite high, some patterns can be refined by considering both projections.

---

***Exercise 3***: construct a similar ensemble but using different SVORIM projections with different parameters for the `C` value. The number of members of the ensemble should be a parameter.
***Exercise 4***: construct a similar ensemble but using different SVORIM projections with different subsets of input variables (40% of the variables, chosen at random). The number of members of the ensemble should be a parameter (try 50).

----

***Exercise 5***: construct a similar ensemble but using different SVORIM projections with different parameters for the `C` value.

---

***Exercise 4***: construct a similar ensemble but using different SVORIM projections with different subsets of patterns and different subsets of input variables (randomization). The number of members of the ensemble should remain as a parameter.

# References

Binary file added doc/tutorial/images/redsvm-melanoma-contour.png
119 changes: 64 additions & 55 deletions src/code-examples/exampleMelanomaTM.m
@@ -3,9 +3,10 @@
addpath ../Algorithms

if (exist ('OCTAVE_VERSION', 'builtin') > 0)
try
pkg load statistics
try
graphics_toolkit ("gnuplot")
catch
catch
error("This code uses gnuplot for plotting. Please install gnuplot and restart Octave to run this code.")
end
end
@@ -50,16 +51,14 @@
end
y1=get(gca,'ylim');
for i=1:size(info.model.thresholds,1)
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
end
hold off;

% Check confusion matrix
confusionmat(test.targets,info.predictedTest)

% Visualize the projection with colors

%clf
figure; hold on;
Q = size(info.model.thresholds,1)+1;
if (exist ('OCTAVE_VERSION', 'builtin') > 0)
@@ -74,7 +73,7 @@
end
y1=get(gca,'ylim');
for i=1:size(info.model.thresholds,1)
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
end
%legend('C1','C2','C3','C4','C5');
legend(arrayfun(@(num) sprintf('C%d', num), 1:Q, 'UniformOutput', false))
@@ -86,21 +85,21 @@
x = linspace(min(info.model.thresholds-3),max(info.model.thresholds+3),numPoints);
f = repmat(info.model.thresholds',numPoints,1) - repmat(x',1,Q-1);
cumProb = [1./(1+exp(-f)) ones(numPoints,1)]; %logit function
plot(x,cumProb,'-');
plot(x,cumProb,'-','LineWidth',1);
y1=get(gca,'ylim');
for i=1:size(info.model.thresholds,1)
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
end
hold off;

% Visualize the individual probabilities
figure; hold on;
prob = cumProb;
prob(:,2:end) = prob(:,2:end) - prob(:,1:(end-1));
plot(x,prob,'-');
plot(x,prob,'-','LineWidth',1);
y1=get(gca,'ylim');
for i=1:size(info.model.thresholds,1)
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
end
hold off;

@@ -158,14 +157,14 @@
plot(svorimProjections,test.targets, 'o');
y1=get(gca,'ylim');
for i=1:size(svorimThresholds,2)
line([svorimThresholds(i) svorimThresholds(i)],y1,'Color',[1 0 0]);
line([svorimThresholds(i) svorimThresholds(i)],y1,'Color',[1 0 0]);
end
legend('SVORIM');
subplot(2,1,2)
plot(svorexProjections,test.targets, 'o');
y1=get(gca,'ylim');
for i=1:size(svorexThresholds,2)
line([svorexThresholds(i) svorexThresholds(i)],y1,'Color',[1 0 0]);
line([svorexThresholds(i) svorexThresholds(i)],y1,'Color',[1 0 0]);
end
legend('SVOREX');
hold off;
@@ -185,37 +184,42 @@
%% Apply the REDSVM model
% Create the REDSVM object
algorithmObj = REDSVM();

% Train REDSVM
info = algorithmObj.runAlgorithm(train,test,struct('C',10,'k',0.001));

% Evaluate the model
fprintf('REDSVM method\n---------------\n');
fprintf('REDSVM Accuracy: %f\n', CCR.calculateMetric(test.targets,info.predictedTest));
fprintf('REDSVM MAE: %f\n', MAE.calculateMetric(test.targets,info.predictedTest));

%% REDSVM optimization
%% REDSVM optimization
clear T Ts;

Metrics = {@MZE,@AMAE};
setC = 10.^(-3:1:3);
setk = 10.^(-3:1:3);
% TODO: fix for Octave since table() is not supported
Ts = cell(size(Metrics,2),1);

% TODO: alternative Octave code
nFolds = 3;
CVO = cvpartition(train.targets,'KFold',nFolds);
for m = 1:size(Metrics,2)
mObj = Metrics{m}();
fprintf('Grid search to optimize %s for REDSVM\n', mObj.name);
bestError=Inf;
if (~exist ('OCTAVE_VERSION', 'builtin') > 0)
T = table();
end
for C=setC
for k=setk
param = struct('C',C,'k',k);
info = algorithmObj.runAlgorithm(train,test,param);
error = mObj.calculateMetric(test.targets,info.predictedTest);
for C=10.^(-3:1:3)
for k=10.^(-3:1:3)
error=0;
for ff = 1:nFolds
param = struct('C',C,'k',k);
info = algorithmObj.runAlgorithm(train,test,param);
error = error + mObj.calculateMetric(test.targets,info.predictedTest);

end
error = error / nFolds;
if error < bestError
bestError = error;
bestParam = param;
@@ -233,40 +237,44 @@
fprintf('\nBest Results REDSVM C %f, k %f --> %s: %f\n', bestParam.C, bestParam.k, mObj.name, bestError);
end

% Depending on matlab version we perform a different plot
if (exist ('OCTAVE_VERSION', 'builtin') > 0)
fprintf('This plot is not supported in octave\n');
fprintf('This type of graphic is not supported in Octave\n');
else
fprintf('Generating heat maps\n');
figure;
subplot(2,1,1)
if verLessThan('matlab','9.2')
Data = zeros(length(setC), length(setk));
for i=1:length(setC)
Data(i,:)= Ts{1}.error(Ts{1}.k==setk(i));
end
imagesc(Data);
colorbar;
else
heatmap(Ts{1},'C','k','ColorVariable','error');
end
title('MZE optimization for REDSVM');

subplot(2,1,2)
if verLessThan('matlab','9.2')
Data = zeros(length(setC), length(setk));
for i=1:length(setC)
Data(i,:)= Ts{2}.error(Ts{2}.k==setk(i));
end
imagesc(Data);
colorbar;
else
heatmap(Ts{2},'C','k','ColorVariable','error');
end
title('AMAE optimization for REDSVM');
if verLessThan('matlab', '9.2')
% Use contours
figure;
hold on;
for m = 1:size(Metrics,2)
mObj = Metrics{m}();
subplot(size(Metrics,2),1,m)
x = Ts{m}{:,1};
y = Ts{m}{:,2};
z = Ts{m}{:,3};
numPoints=100;
[xi, yi] = meshgrid(linspace(min(x),max(x),numPoints),linspace(min(y),max(y),numPoints));
zi = griddata(x,y,z, xi,yi);
contourf(xi,yi,zi,15);
set(gca, 'XScale', 'log');
set(gca, 'YScale', 'log');
colorbar;
title([mObj.name ' optimization for REDSVM']);
end
hold off;
else
% Use heatmaps
fprintf('Generating heat maps\n');
figure;
subplot(2,1,1)
heatmap(Ts{1},'C','k','ColorVariable','error');
title('MZE optimization for REDSVM');

subplot(2,1,2)
heatmap(Ts{2},'C','k','ColorVariable','error');
title('AMAE optimization for REDSVM');
end
end

%% Apply the KDLOR model
%% Apply the KDLOR model
% Create the KDLOR object
algorithmObj = KDLOR('kernelType','rbf');

@@ -298,13 +306,13 @@

y1=get(gca,'ylim');
for i=1:size(info.model.thresholds,1)
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
line([info.model.thresholds(i) info.model.thresholds(i)],y1,'Color',[1 0 0]);
end
%legend('C1','C2','C3','C4','C5');
legend(arrayfun(@(num) sprintf('C%d', num), 1:Q, 'UniformOutput', false))
hold off;

%% Apply the ORBoost model
%% Apply the ORBoost model
% Create the ORBoost object
algorithmObj = ORBoost('weights',true);

@@ -339,4 +347,5 @@
fprintf('SVORIM+SVOREX+POM Accuracy: %f\n', CCR.calculateMetric(test.targets,info.predictedTest));
fprintf('SVORIM+SVOREX+POM MAE: %f\n', MAE.calculateMetric(test.targets,info.predictedTest));

scatter(newTrain.patterns(:,1),newTrain.patterns(:,2),7,newTrain.targets);
scatter(newTrain.patterns(:,1),newTrain.patterns(:,2),7,newTrain.targets);
