Previously Asked Questions (PAQ) #26

tobybreckon commented Sep 11, 2019

A set of collated previously (frequently) asked questions - for things not covered in the existing README or in the full accompanying (peer-reviewed) research papers [Dunnings and Breckon, In Proc. International Conference on Image Processing, IEEE, 2018] and [Samarth, Bhowmik and Breckon, In Proc. International Conference on Machine Learning Applications, IEEE, 2019].

Before posting an issue please check here first (in addition to the README and papers) ...

Q: How many epochs did you train the models for?
A: 150 (for the 2018 paper models), 30 (for the 2019 paper models).

Q: Did you train the network architectures you use from scratch or use transfer learning from another dataset?
A: Trained from scratch - because the bespoke simplified architectures we defined meant we couldn't simply copy across weights from the larger (original) architecture trained on say ImageNet. Similarly, pre-training these bespoke, simple architectures on a complex task such as ImageNet 1000 class object classification made no sense - they would perform very poorly indeed as they are much simpler, cut-down versions of the original.

Q: How were the labels encoded during training?
A: The one-hot encoding follows alphabetical order, with 'fire' being the first class (0) and 'nofire' the second (1), such that:

  • fire = (1,0)
  • nofire = (0,1)
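
For example, a minimal Python sketch of this encoding (illustrative only - the class list and helper function here are assumptions, not the actual training code):

import numpy as np

CLASSES = ['fire', 'nofire']  # alphabetical order -> 'fire' is class 0

def one_hot(label):
    # return the one-hot vector for a 'fire' / 'nofire' label
    vec = np.zeros(len(CLASSES), dtype=np.float32)
    vec[CLASSES.index(label)] = 1.0
    return vec

# one_hot('fire')   -> [1., 0.]
# one_hot('nofire') -> [0., 1.]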

Q: Did you normalize your input pixels to the range 0 -> 1 ( /= 255)?
A: No we didn't - I checked all the original code and it appears this was not done in any of train/validation/test, but I accept it would have been a good idea.
To confirm - all our results are on the basis of inputs in the range 0->255 across all three colour channels. This appears to be a clear oversight. If you do normalize the dataset images for future work, it may be better to try p_normalised = (p - 127.5)/255 for a given pixel p, to give you a -/+ normalized input, as is common in the default training for this type/generation of network.
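
For future work only, a minimal sketch of this suggested normalization (do not apply it when running the released models, which expect raw 0->255 inputs):

import numpy as np

def normalise(image_uint8):
    # map 0->255 pixel values to a signed range centred on zero
    return (image_uint8.astype(np.float32) - 127.5) / 255.0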

Q: When I train a model using {VGG.., ResNet.., DenseNet, ..., some other new very deep/complex net architecture} I don't get the same performance you report. Why is this?
A: The whole premise of our papers is that simpler network architectures appear to outperform more complex ones for the fire detection task - hence our networks are experimentally defined sub-architectures of larger ones. You are seeing the exact point we make in the papers in action.

Q: Is there code available for network training?
A: No. In order to make the repository easier to maintain, only the inference code, trained models and dataset are made available. The models we use are quite basic architectures that can be readily dropped into any mainstream tutorial on how to train a deep convolutional neural network for image classification.

Q: Can I use this with <insert some other tool/framework>?
A: Yes, you are free to do so but tools beyond the listed dependencies of the project (see README) are beyond the scope of what we can immediately provide any help/support on. Details in the README show how to convert the trained models for use with other deep learning frameworks.

Q: Does this network perform smoke detection (after all "there is no smoke without fire")?
A: No. As the name suggests it does fire detection. For smoke detection see, for example, this work or this repository (many others exist).

Q: When I tried your network it didn't get the fire detection correct all of the time?
A: See the paper - statistical accuracy is ~93%+, not 100% (fire detection is more difficult than one may imagine, especially in terms of reducing false positives).

Q: When I use this network on completely different resolution imagery than what it was trained on (in the dataset), I get very different results?
A: The basic principles of machine learning tell us that any machine learning model will perform differently on data that is drawn from a different distribution than the model was trained on. You are seeing this principle in action.

Q: I have this problem .. < insert details > .. with getting this to work under Microsoft Windows. Can you help?
A: Sorry - support is provided for Linux and Unix based operating systems only. Perhaps try using the Windows Subsystem for Linux or installing Linux.

Q: Did you experiment with and without the use of LRN and/or dropout?
A: Yes, we found the effect of both was negligible in the overall statistical performance of the network. Following best practice to avoid overfitting for network architectures of this type, both are used in the final trained models.

Q: I get an error with one of the import lines in Python or at the line using ximgproc in the superpixel example script. Can you help?
A: Make sure all of the required dependencies are installed correctly and working including OpenCV with the additional modules built/enabled. You may find a subset of the tools in this test repo useful and also this OpenCV test script.
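
As a quick sanity check (a sketch, not the referenced test scripts), the following confirms whether your OpenCV build includes the extra (contrib) modules needed for the superpixel example:

import cv2

print(cv2.__version__)
print(hasattr(cv2, 'ximgproc'))  # should print True if the extra modules are enabled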

Q: I am quite new to deep learning and don't really understand part of your architecture. Can you help?
A: The architectures we use are based on some of the most established, written about and discussed architectures in the field - in fact they are simplified versions of them. Please read the research papers, then read the references from the papers on the key architectures we build upon - if you need further background please refer to http://www.deeplearningbook.org/.

Q: Did you prepare the validation and test sets in the same way as you described for the training set?
A: Yes - see [Dunnings/Breckon 2018] paper p.4, column 1, para 2, "CNN training and evaluation ...."

Q: Are the accuracy/F/P scores averaged over all predicted fire superpixels or over all images for which superpixels had been classified?

A: No - as this is a binary problem (fire = {1,0}) they are calculated over all the superpixels for which we have ground truth (i.e. labelled as fire/no-fire in the available dataset).
This is done at a per-superpixel level in Table 3 (lower) and at a per-image* level in Table 3 (upper) of the [Dunnings/Breckon, 2018] paper, for the dataset as set out in the dataset download provided (the README.txt in the dataset download steers you to the exact training/test data).

*i.e. if an image contains a fire superpixel then the image contains fire (processing is on N superpixels, not the whole image)
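
A minimal sketch of the per-image* rule described above (names are illustrative only):

def image_level_prediction(superpixel_predictions):
    # superpixel_predictions: list of 0/1 flags, one per processed superpixel
    # the image is labelled as containing fire if any superpixel is predicted as fire
    return int(any(superpixel_predictions))

# image_level_prediction([0, 0, 1, 0]) -> 1 (image contains fire)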

Q: How did you convert the ground truth annotation for images (was it based on bounding boxes?) to ground truth for individual superpixels for training?
A: We don't have ground truth annotation for whole/full images in terms of where the fire is in our own dataset (bounding box or binary mask) - all the images are only globally labelled as containing fire = {true, false}.

We then split a subset of these into individual superpixels - all of them (!) - and manually sorted them as {superpixel of fire} and {superpixel not of fire}. As per the dataset (see available download) - each superpixel is stored "in place" in the full image frame (rest of image black, superpixel has original colour). This is used for training.
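
A minimal sketch of this superpixel extraction (assumes OpenCV with the ximgproc contrib module; parameter values are illustrative, not the exact ones used):

import cv2
import numpy as np

def superpixels_in_place(image):
    # segment the image into SLIC superpixels
    slic = cv2.ximgproc.createSuperpixelSLIC(image, region_size=100)
    slic.iterate(10)
    labels = slic.getLabels()
    for sp_id in range(slic.getNumberOfSuperpixels()):
        out = np.zeros_like(image)   # rest of the frame stays black
        mask = (labels == sp_id)
        out[mask] = image[mask]      # superpixel keeps its original colour
        yield out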

Q: How did you merge and convert positive superpixels for each image into a bounding rectangle for computing the similarity metric shown in the paper [Dunnings/Breckon, 2018]?
A: We create a binary image (+ve fire superpixels as +1, -ve fire superpixels as 0 - or similar) - we then compute the concave contour of this region using the Douglas-Peucker algorithm (OpenCV - findContours() - https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm) and then find the min (x,y) and max (x,y) values from this contour. From these 4 min/max x and y values (2 for x, 2 for y) we can compute the bounding rectangle, which we then use for the similarity metric (S in our [Dunnings/Breckon, 2018] paper, computed as per ref [23]).
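
A minimal sketch of this step (not our exact code; assumes mask is a binary image with the +ve fire superpixels set to 1, uses the OpenCV 4.x findContours() return signature, and simplifies the contour approximation):

import cv2
import numpy as np

def fire_bounding_rect(mask):
    # extract contours of the positive fire superpixel region(s)
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    points = np.vstack(contours)     # merge all contour points
    return cv2.boundingRect(points)  # min/max (x,y) -> (x, y, w, h)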

To then generate this similarity metric against images with bounding box ground truth - we used the ground truth annotation available from Steffens et al. [23], not our own dataset (in our [Dunnings/Breckon, 2018] paper S only appears in Table 3 lower against the Steffens et al. dataset).

Q: How do I use this repository code to train my own models?
A: This repository provides inference / run-time code only, not training code (see earlier). If you wish to train your own model, follow a tutorial in the documentation for your deep machine learning environment of choice (e.g. TensorFlow/PyTorch/TFLearn/Keras/MXNet...) on how to (a) define your own custom model architecture and (b) train a model using a custom dataset. Subsequently, use this knowledge to define a model architecture matching those described in our code/papers/diagrams and also to train from our fire dataset that you have downloaded (see the README.txt file within the dataset download for details on which images to train on).
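
For illustration, a generic TFLearn training sketch (the architecture and parameters below are placeholders, not the FireNet / InceptionVx-OnFire definitions - take those from the papers and the repo code):

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# simple placeholder CNN - replace with the architecture from the papers
net = input_data(shape=[None, 224, 224, 3])
net = conv_2d(net, 64, 5, activation='relu')
net = max_pool_2d(net, 3, strides=2)
net = conv_2d(net, 128, 4, activation='relu')
net = max_pool_2d(net, 3, strides=2)
net = fully_connected(net, 2, activation='softmax')  # {fire, nofire}
net = regression(net, optimizer='momentum', loss='categorical_crossentropy')

model = tflearn.DNN(net)
# X: images with pixels in 0->255, Y: one-hot labels as per the encoding above
# model.fit(X, Y, n_epoch=150, validation_set=0.2, batch_size=64)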

Q: When I convert / load the model into another deep learning framework the performance seems to differ? (or it always returns "no fire" or always "fire")
A: Make sure that:

  • your use of one-hot labelling is the same as the one we used for {fire, nofire} (see Q/A above) and not reversed
  • the input pixels are in the original 0->255 range and not rescaled to -1->1 or 0->1 otherwise the network will fail (for frameworks/tools that subtract a mean channel value and rescale using a standard deviation, specify a zero mean and standard deviation of 1 for the input to retain the original 0->255 pixel range)
  • the channel ordering for your image is BGR (as is standard for OpenCV) and not RGB (as is often standard in other frameworks and image loaders) if you are using the FireNet, InceptionV1onFire or InceptionV3OnFire / InceptionV4OnFire binary models. For the superpixel InceptionV3OnFire / InceptionV4OnFire models use RGB channel ordering.
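
For example, a minimal input-preparation sketch consistent with the checklist above (the 224x224 input size is an assumption - check the input dimensions of the specific model you are using):

import cv2
import numpy as np

def prepare_input(frame_bgr, size=(224, 224), rgb=False):
    # resize but do NOT rescale pixel values - keep the 0->255 range
    img = cv2.resize(frame_bgr, size, interpolation=cv2.INTER_AREA)
    if rgb:  # only for the superpixel InceptionV3/V4-OnFire models
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img.astype(np.float32)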

Q: How do I get this to work with TensorFlow 2.0?

A: The only available workflow to make this repo work on your system is via a TensorFlow 1.x virtual environment, as TFLearn is not supported with TensorFlow 2.x. Install the required packages for a virtual environment as follows ...

virtualenv --system-site-packages -p python3 ~/venv/tf-1.15-gpu
source ~/venv/tf-1.15-gpu/bin/activate
pip install tensorflow-gpu==1.15
pip install tflearn
....
python3 firenet.py models/test.mp4

This assumes you have OpenCV installed system-wide with the extra modules enabled as required.

Q: In the 2019 paper you say that you use an 80:20 split of your dataset into training and validation and then have an additional set for cross validation - "An additional set of 2,931 images was used for cross validation." Is this correct, as you appear to have two validation datasets?
A: Sorry, this appears to be an error in the paper and should say "... dataset is split (80:20 split) into two portions for training and validation. An additional set of 2,931 images was used for statistical evaluation (i.e. testing)." It is slightly better stated in the 2018 paper, where the numbers match but we use a 70:30 split instead.

Q: I converted the ...V3/V4-OnFire models to protocolbuf (pb) and tflite using your scripts but the performance seems really bad - what am I doing wrong? / why is this so?

A: As per the caveat in the main README - if you need to convert the models to protocol buffer (.pb) format (used by OpenCV DNN, TensorFlow, ...) or to tflite (used with TensorFlow), then use the FireNet or InceptionV1-OnFire / SP-InceptionV1-OnFire versions for the moment. Due to a long-standing issue in TensorFlow with the export (freezing) of the Batch Normalization layers, the protocol buffer (.pb) and .tflite versions of the ...V3-OnFire and ...V4-OnFire models have significantly lower performance (which is to be expected, given the approach we use to work around the problem).

An alternative is to perhaps convert from the original format (tensorflow checkpoint) to others (such as PyTorch, MXNet, Keras, ...) using the extensive deep neural network model conversion tools offered by the MMdnn project.

Q: ... ?
A: ...
