Samuel Ntsua samuel-ntsua

Hi there, I'm Samuel Ntsua

Analyst at UNC-Chapel Hill

I am enthusiastic about data. I use it to build reliable, rational arguments and to turn business rules and concepts, which often arrive as ambiguous or incomplete instructions, into working program logic. At my current job, I harvest, move, transform, and store data, and I automate the process. I write scripts in Bash, PowerShell, Python, SQL, and Stata to build multi-panel and hierarchical datasets out of administrative data and survey sampling data. I am seeking a mid-career role as a Data Scientist, Data Engineer, or Machine Learning Engineer, where I can propel a data team's efforts and challenge myself in a production environment.

Skills

Connect With Me

Stock Exchange Data Analysis using Big-Data tools such as Hadoop, HIVE and Sqoop.

Objectives

Use Hive and Sqoop features for data engineering and analysis, and share the resulting actionable insights.

Technology/Techniques Used

python3 mysql hiveQL hue-api hadoop-hdfs sqoop-import

DataScience_Capstone_Project

Objectives

Predict whether or not a patient has diabetes, based on the diagnostic measurements included in the dataset.
Build a model that accurately predicts whether the patients in the dataset have diabetes.

Technology/Techniques Used

Pandas NumPy machine-learning-algorithms scikit-learn xgboost missing-values analysis dimensionality reduction seaborn-plots extratrees GitLab

Mercedes-Benz Greener Manufacturing

Objectives

Use XGBoost to narrow down the features while still getting a good prediction of the vehicle safety standard, thus reducing the time a Mercedes-Benz spends on the test bench.

Technology/Techniques Used

Pandas NumPy machine-learning-algorithms scikit-learn xgboost label encoder dimensionality reduction seaborn-plots GitLab

Data Science with R Programming

Objectives

To record patient statistics, the agency wants to find the age category of people who frequent the hospital and have the maximum expenditure.
To rank diagnoses and treatments by severity and identify the expensive treatments, the agency wants to find the diagnosis-related group that has the maximum hospitalization and expenditure.
To make sure that there is no malpractice, the agency needs to analyze whether the race of the patient is related to the hospitalization costs.
To allocate resources properly, the agency has to analyze the severity of the hospital costs by age and gender. Since the length of stay is the crucial factor for inpatients, the agency also wants to find out whether the length of stay can be predicted from age, gender, and race.
To perform a complete analysis, the agency wants to find the variable that mainly affects the hospital costs.
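
As a rough illustration of the length-of-stay question above, here is a minimal sketch of a one-variable least-squares fit. The project itself was done in R; this Python version, the helper names `fit_line` and `r_squared`, and the toy numbers are all assumptions made for illustration, and it uses age alone rather than age, gender, and race.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("ys must contain at least two distinct values")
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Toy data, invented for this sketch (not from the hospital dataset).
ages = [1, 5, 10, 15, 17]
length_of_stay = [2, 3, 3, 5, 4]

slope, intercept = fit_line(ages, length_of_stay)
fit_quality = r_squared(ages, length_of_stay, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={fit_quality:.3f}")
```

A low R² on the real data would suggest that age alone does not predict the length of stay, which is the kind of answer the agency is looking for.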

Technology/Techniques Used

r-programming-language/rstudio supervised learning linear regression GitLab

DataScience_with_Python

Objectives

Technology/Techniques Used

Pandas NumPy supervised learning linear regression scikit-learn xgboost seaborn-plots GitLab

Tableau_project

Objectives

Compute and display a country's economic growth indicator as well as the percentage of its population who purchased life insurance.
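
The two measures can be sketched outside Tableau as plain arithmetic. The project does not say which growth indicator it uses, so this sketch assumes year-over-year percentage growth; the function names and all figures are hypothetical.

```python
def growth_rate(previous: float, current: float) -> float:
    """Year-over-year growth as a percentage of the previous value."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def insured_share(policy_holders: int, population: int) -> float:
    """Percentage of the population that purchased life insurance."""
    if population <= 0:
        raise ValueError("population must be positive")
    return policy_holders / population * 100.0


# Made-up figures for illustration only; not taken from the project.
gdp_by_year = {2019: 200.0, 2020: 210.0, 2021: 231.0}

years = sorted(gdp_by_year)
growth_by_year = {
    year: growth_rate(gdp_by_year[prev], gdp_by_year[year])
    for prev, year in zip(years, years[1:])
}
share = insured_share(policy_holders=250_000, population=1_000_000)

for year, pct in growth_by_year.items():
    print(f"{year}: {pct:.1f}% growth")
print(f"insured share: {share:.1f}%")
```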

Technology/Techniques Used

Tableau public growth-kpi linear-trend kpi-dashboard data merge statistical measures computation
