Capstone project.
Sebastian Thomas @ neue fische Bootcamp Data Science
(datascience at sebastianthomas dot de)
When ordering medicines, hospitals have to deal with a multitude of different article descriptions for identical products. With cat-AI-log, article duplicates and similar articles can be found and articles can be catalogued in product groups using human-assisted artificial intelligence.
cat-AI-log...
- finds similar articles in different forms
- recognizes dosage forms and physical quantities
- handles spelling mistakes made by the user
- handles spelling mistakes in the data
- allocates known and similar articles correctly in many cases
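How tolerance to spelling mistakes can work is easiest to see in a small example. The following is only a minimal sketch using fuzzy string matching from Python's standard library; it is not the project's actual implementation. The vocabulary, the function names `correct_token` and `correct_query`, and the cutoff value are assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary (assumption); the real article data is not published.
KNOWN_TERMS = ["ibuprofen", "paracetamol", "tabletten", "kapseln", "salbe"]


def correct_token(token, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return the closest vocabulary term, or the lowercased token if none is close."""
    lowered = token.lower()
    matches = get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else lowered


def correct_query(query):
    """Correct every whitespace-separated token of a search query."""
    return " ".join(correct_token(token) for token in query.split())
```

The same function could be applied to user queries and to the article descriptions themselves, which would cover both kinds of spelling mistakes listed above. A lower cutoff catches more typos but also produces more false corrections.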
Main analysis:
- Part 1: Data mining
- Part 2: Data preprocessing (Data cleaning, Feature engineering)
- Part 3: Exploratory data analysis
- Part 4: Predictive analysis
- Part 5: The search engine
- Part 6: Visualization
- Part 7: Unit tests
Side path Data Mining:
- Mining 1: PDF mining of IFA dosage forms
- Mining 2: HTML mining of DocMorris dosage forms
- Mining 3: Construction of a replacement dictionary from DocMorris to IFA
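A replacement dictionary as mentioned in Mining 3 could look roughly like the sketch below. The entries are illustrative placeholders, not the mined data, and the names `REPLACEMENTS` and `normalize_dosage_form` are hypothetical.

```python
# Placeholder mapping from shop-style dosage form names to short codes.
# These entries are assumptions for illustration, not the project's mined data.
REPLACEMENTS = {
    "filmtabletten": "FTA",
    "tabletten": "TAB",
    "hartkapseln": "HKP",
}


def normalize_dosage_form(name, replacements=REPLACEMENTS):
    """Map a dosage form name to its replacement; keep unknown names unchanged."""
    cleaned = name.strip()
    return replacements.get(cleaned.lower(), cleaned)
```

Keeping unknown names unchanged makes gaps in the dictionary visible, so a specialist can add the missing entries later.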
Main development:
- Spelling correction (Unit tests)
- Search (Unit tests)
- Quotient Extraction (Unit tests)
- Helper
- Another rough helper
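As an illustration of the quotient extraction item above, the following sketch pulls a quantity quotient such as "100 mg/5 ml" out of an article description with a regular expression. It assumes that "quotient" refers to concentrations written as one quantity divided by another; the pattern and the function name `extract_quotient` are hypothetical and not taken from the project.

```python
import re

# Number (dot or comma decimal), unit, slash, optional number, unit.
QUOTIENT = re.compile(
    r"(?P<num>\d+(?:[.,]\d+)?)\s*(?P<num_unit>[a-zA-Zµ]+)"
    r"\s*/\s*"
    r"(?P<den>\d+(?:[.,]\d+)?)?\s*(?P<den_unit>[a-zA-Zµ]+)"
)


def _to_float(text):
    """Parse a number that may use a decimal comma."""
    return float(text.replace(",", "."))


def extract_quotient(description):
    """Return (numerator, unit, denominator, unit) or None if no quotient is found."""
    match = QUOTIENT.search(description)
    if match is None:
        return None
    # A missing denominator value, as in "mg/ml", is read as 1.
    denominator = _to_float(match.group("den")) if match.group("den") else 1.0
    return (
        _to_float(match.group("num")),
        match.group("num_unit"),
        denominator,
        match.group("den_unit"),
    )
```

For example, `extract_quotient("Saft 100 mg/5 ml")` yields the numerator 100.0 in mg and the denominator 5.0 in ml, while a description without a slash-separated quantity yields `None`.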
Demo web frontend:
Due to publication restrictions, the data and the output of the project are not provided.
Future work:
- an iterative process between the AI and a specialist will improve the data
- the improved data will in turn improve the AI
- the ordering of search results could be improved
- with more data (e.g. active ingredients), a recommender for generics could be built
- cat-AI-log could learn from former spelling mistakes to improve performance