Rishi500067313/Corona_data_analysis_using_hive

Corona_data_analysis_using_hive

Choose a domain where you can apply big data analysis. In the selected domain:

  1. Write a schema of some important tables.
  2. Identify 10 problems that an individual might face.
  3. Find a solution to these challenges using big data analysis. Explain your solution with the help of HiveQL.

My chosen domain is the Government and Health sector.

There are five tables of data on the current COVID-19 pandemic, named:

  1. Total_India_case
  2. Datewise_India_case
  3. Daily_casedeath
  4. CFR
  5. Total_confirm_permillion

The questions and their solutions are as follows:

  1. What is the total number of confirmed deaths due to the current pandemic in India? This is calculated using the sum() function on the total_india_case table in Hive. The result was 489 as of 2020-04-18.

    image
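
    As a rough illustration of the sum() approach described above, the query might look like the following sketch. The column name `deaths` is an assumption, since the actual schema of total_india_case is not shown here.

    ```sql
    -- Sketch only: `deaths` is an assumed column name, not confirmed by the source.
    -- Assumes total_india_case holds one row per state with cumulative counts.
    SELECT SUM(deaths) AS total_deaths
    FROM total_india_case;
    ```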

  2. What is the daily number of confirmed cases in our country? This was calculated by combining the sum() and count() functions in a query, which gave an average of 240.8113 as of 2020-04-18.

    image
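
    One possible way to combine sum() and count() for a daily average is sketched below. It assumes datewise_india_case has one row per day, with a hypothetical `report_date` column stored as a 'yyyy-MM-dd' string and a hypothetical `new_confirmed` column holding that day's new cases.

    ```sql
    -- Sketch only: `report_date` and `new_confirmed` are assumed column names.
    -- Dividing the total of new cases by the number of days gives the daily average.
    SELECT SUM(new_confirmed) / COUNT(*) AS avg_daily_confirmed
    FROM datewise_india_case
    WHERE report_date <= '2020-04-18';
    ```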

  3. In India, which states are most adversely affected by the mass spread of Covid-19? This problem was solved using a simple SQL where clause with multiple conditions on the required table. The result shows six states that are in the worst condition in the pandemic.

    image

  4. Which state is in good condition and is predicted to recover soon from this mass spread? The aim is to identify a recovering state, or a state in good condition. Multiple conditions were applied to the table to get the result, together with a percentage for each state, where a lower percentage means a better result.

    image

  5. What is the case fatality rate (CFR) of each state in India? CFR is an important factor in determining the extent of the mass spread. It alerts us to the severity of the situation and is easy to calculate.

    image

    CFR is not constant; it varies with many parameters. Based on the data acquired and available, this is the current CFR of each state. When there are people who have the disease but are not diagnosed, the CFR will overestimate the true risk of death. There might be many undiagnosed people, so testing is also a major factor on which CFR depends.

    image
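
    CFR is conventionally the number of confirmed deaths divided by the number of confirmed cases, expressed as a percentage. A minimal sketch of a per-state calculation follows, assuming total_india_case has one row per state with hypothetical columns `state`, `confirmed` and `deaths`. The repository may instead read precomputed values from the CFR table.

    ```sql
    -- Sketch only: `state`, `confirmed` and `deaths` are assumed column names.
    -- Rows with zero confirmed cases are skipped to avoid dividing by zero.
    SELECT state,
           ROUND(deaths / confirmed * 100, 3) AS cfr_percent
    FROM total_india_case
    WHERE confirmed > 0
    ORDER BY cfr_percent DESC;
    ```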

  6. What is the current average CFR of India? What does it tell us? Here we used the SQL function avg() to calculate the average CFR across the whole of India, which turned out to be 2.566.

    image
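
    The avg() step could be sketched as below, assuming the CFR table stores one row per state with a hypothetical `cfr_percent` column.

    ```sql
    -- Sketch only: `cfr_percent` is an assumed column name in the CFR table.
    SELECT AVG(cfr_percent) AS avg_cfr
    FROM cfr;
    ```

    Note that an average of per-state values weights every state equally, so it can differ from a national figure computed as total deaths divided by total confirmed cases.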

  7. Maharashtra has been declared severely affected, and so have many other states. How much have Maharashtra and the other affected states contributed to the total Indian cases of Covid-19? This problem took two queries, because a single combined query became very complex. The first query takes the total confirmed cases in India. The second uses that total in the formula below, together with max(), group by and so on.

Contribution per state = (Total confirmed in state / Total in India) * 100

image

image
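
The contribution formula can also be sketched as a single query. This is not the two-query approach with max() and group by described above; it swaps in a cross join against a one-row subquery that holds the national total. The column names `state` and `confirmed` are assumptions, as is the layout of one row per state.

```sql
-- Sketch only: `state` and `confirmed` are assumed column names.
-- The subquery computes the national total once; the cross join
-- attaches that single value to every state row.
SELECT t.state,
       ROUND(t.confirmed / tot.total_confirmed * 100, 2) AS contribution_percent
FROM total_india_case t
CROSS JOIN (
  SELECT SUM(confirmed) AS total_confirmed
  FROM total_india_case
) tot
ORDER BY contribution_percent DESC;
```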

  8. Which state is considered to be untouched or unaffected by the Covid-19 virus?

    This problem is solved with a simple SQL query that finds a state where the effect of the pandemic and mass spread is almost nil. The query returned Nagaland.

    image
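
    The exact conditions used are not shown, so the following is only one plausible form of such a query: it returns the state with the fewest confirmed cases. The column names `state` and `confirmed` are assumptions.

    ```sql
    -- Sketch only: `state` and `confirmed` are assumed column names.
    -- Sorting ascending and keeping one row yields the least-affected state.
    SELECT state, confirmed
    FROM total_india_case
    ORDER BY confirmed ASC
    LIMIT 1;
    ```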

  9. What is the estimated IFR (Infection Fatality Rate) for the whole of India?

    This problem is solved using several functions and a formula. The data was limited and not updated regularly, so the IFR for India as of 2020-04-18 is 3.305.

    The IFR is the number of deaths from a disease divided by the total number of cases, expressed here as a percentage.

    image
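
    A minimal sketch of the formula above, assuming total_india_case has one row per state with hypothetical `deaths` and `confirmed` columns. Because it relies on confirmed cases only, it is an estimate under the same data limits mentioned above.

    ```sql
    -- Sketch only: `deaths` and `confirmed` are assumed column names.
    -- Deaths divided by cases, scaled to a percentage.
    SELECT ROUND(SUM(deaths) / SUM(confirmed) * 100, 3) AS ifr_percent
    FROM total_india_case;
    ```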
