Analyzes regarding Duplicate Reimbursements #286

Open
wants to merge 26 commits into
base: master

@silviodc silviodc commented Oct 6, 2017

Detecting duplicate Reimbursements using dhash.

The last commit contains the notebook that detects duplicate reimbursements. It uses a difference hash (dhash) and the Hamming distance.
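To illustrate the idea, here is a minimal sketch of a difference hash compared by Hamming distance. It is not the code from the notebook: the function names and the `threshold` value are hypothetical, and the input is assumed to be a grayscale grid that has already been shrunk to 8 rows by 9 columns (for example with an image library).

```python
def dhash(pixels):
    """Difference hash of a grayscale grid (a list of equally long rows).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so a grid of 8 rows x 9 columns yields a 64-bit hash.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a, hash_b, threshold=5):
    # threshold is a hypothetical cut-off, not a value taken from the notebook
    return hamming_distance(hash_a, hash_b) <= threshold
```

Two scans of the same receipt should produce hashes that differ in only a few bits, while unrelated documents should differ in many more, so a small distance flags a candidate duplicate for review.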

The other files concern a future implementation, such as a CFMT (Compact Fourier-Mellin Transform) block, to make the detection more precise.

It is related to issue: #32

…-amor

# Conflicts:
#	conda_requirements.txt
#	research/Dockerfile
#	research/requirements.txt
# Including pdf > png
# png > sift descriptors
# png > keras classifier
PDF to PNG ok
PNG to SIFT (error in opencv)
Change the workflow for png references
Download files OK
Split Files ok
Testing training ...
# Building Reference Dataset ok
# Building Keras model and evaluation OK
# PDF-> PNG OK
# Using dhash to detect near duplications.
@silviodc silviodc changed the title Duplicate Image Detection in Reimbursements First steps to detect Duplicate Reimbursements Oct 6, 2017
# Inclusion of Fourier transformation to detect rotation, zoom, and filters.
@silviodc silviodc changed the title First steps to detect Duplicate Reimbursements Analyzes regarding Duplicate Reimbursements Oct 6, 2017
Collaborator

anaschwendler commented Oct 11, 2017

Hi @silviodc, thanks for the contribution!

What I did to test this PR:

  1. Clone the project:
$ git clone git@github.com:datasciencebr/serenata-de-amor.git
  2. Change to serenata's folder:
$ cd serenata-de-amor
  3. Change to @silviodc's branch:
$ git checkout -b silviodc-silvio-cardoso master
$ git pull https://github.com/silviodc/serenata-de-amor.git silvio-cardoso
  4. The steps to run the project:
$ conda update conda
$ conda create --name serenata_de_amor python=3
$ source activate serenata_de_amor
$ ./setup
  5. Open the Jupyter notebook from the project:
$ jupyter notebook
  6. Access http://localhost:8888/notebooks/research/develop/2017-05-05-silvio-Detecting-duplicates.ipynb

I really liked your work on it, it looks really impressive!
Is there something that you are aiming to do more?

There is only one thing that I'll ask of you, and then, as far as I'm concerned, we can merge it!

tensorflow>=1.2.1
h5py>=2.7.0
Pillow>=4.2.1
opencv-python
Collaborator

Can you add the versions of the libraries that you are using? It helps in case they change something that breaks it ;)
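One possible way to collect those versions is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`); the helper name `pin_requirements` and its parameters are hypothetical and not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_requirements(names, lookup=version, operator=">="):
    """Return requirement lines such as 'Pillow>=4.2.1' for installed packages."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}{operator}{lookup(name)}")
        except PackageNotFoundError:
            # Leave the package unpinned when it is not installed locally.
            lines.append(name)
    return lines


if __name__ == "__main__":
    packages = ["tensorflow", "h5py", "Pillow", "opencv-python"]
    print("\n".join(pin_requirements(packages)))
```

Running it inside the project's environment prints one line per package, ready to paste into the requirements file.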

Author

Done!

@silviodc
Author

Is there something that you are aiming to do more?

No. I guess I'm finished with these analyses.

4 participants