cardosop/Big-Data

Big Data Specialization

https://www.coursera.org/specializations/big-data

Drive better business decisions with an overview of how big data is organized, analyzed, and interpreted. Apply your insights to real-world problems and questions.

Do you need to understand big data and how it will impact your business? This Specialization is for you. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can perform predictive modeling and leverage graph analytics to model problems. This specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets. In the final Capstone Project, developed in partnership with data software company Splunk, you’ll apply the skills you learned to do basic analyses of big data.

21 weeks - 147 hours

Certificate: https://www.coursera.org/account/accomplishments/specialization/certificate/JHYY9FP3L7UR

Course 1: Introduction to Big Data

https://www.coursera.org/learn/big-data-introduction/home/info

Interested in increasing your knowledge of the Big Data landscape? This course is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems. It is for those who want to start thinking about how Big Data might be useful in their business or career. It provides an introduction to one of the most common frameworks, Hadoop, that has made big data analysis easier and more accessible -- increasing the potential for data to transform our world!

At the end of this course, you will be able to:

  • Describe the Big Data landscape, including examples of real-world big data problems and the three key sources of Big Data: people, organizations, and sensors.

  • Explain the V’s of Big Data (volume, velocity, variety, veracity, valence, and value) and why each impacts data collection, monitoring, storage, analysis and reporting.

  • Get value out of Big Data by using a 5-step process to structure your analysis.

  • Distinguish big data problems from problems that are not, and recast big data problems as data science questions.

  • Provide an explanation of the architectural components and programming models used for scalable big data analysis.

  • Summarize the features and value of core Hadoop stack components including the YARN resource and job management system, the HDFS file system and the MapReduce programming model.

  • Install and run a program using Hadoop!
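
The MapReduce programming model mentioned above can be illustrated with a word count. The following is a minimal single-process sketch in plain Python, not Hadoop code and not taken from the course materials; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel on different machines and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Tiny illustrative input; a real job would read files from HDFS.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

The three functions are kept separate on purpose: each one corresponds to a stage that a framework such as Hadoop distributes across machines.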

This course is for those new to data science. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments.

3 weeks - 18 hours

Certificate: https://www.coursera.org/account/accomplishments/verify/KUW2YVYSAHF3

Course 2: Big Data Modeling and Management Systems

https://www.coursera.org/learn/big-data-management/home/info

Once you’ve identified a big data issue to analyze, how do you collect, store and organize your data using Big Data solutions? In this course, you will experience various data genres and the management tools appropriate for each. You will be able to describe the reasons behind the evolving plethora of new big data platforms from the perspective of big data management systems and analytical tools. Through guided hands-on tutorials, you will become familiar with techniques using real-time and semi-structured data examples. Systems and tools discussed include: AsterixDB, HP Vertica, Impala, Neo4j, Redis, SparkSQL. This course provides techniques to extract value from existing untapped data sources and to discover new data sources.

At the end of this course, you will be able to:

  • Recognize different data elements in your own work and in everyday life problems
  • Explain why your team needs to design a Big Data Infrastructure Plan and Information System Design
  • Identify the frequent data operations required for various types of data
  • Select a data model to suit the characteristics of your data
  • Apply techniques to handle streaming data
  • Differentiate between a traditional Database Management System and a Big Data Management System
  • Appreciate why there are so many data management systems
  • Design a big data information system for an online game company

This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.

6 weeks - 18 hours

Certificate: https://www.coursera.org/account/accomplishments/certificate/KZFQT5WV4DVN

Course 3: Big Data Integration and Processing

https://www.coursera.org/learn/big-data-integration-processing/home/info

At the end of the course, you will be able to:

  • Retrieve data from example database and big data management systems
  • Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
  • Identify when a big data problem needs data integration
  • Execute simple big data integration and processing on Hadoop and Spark platforms

This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.

6 weeks - 48 hours

Certificate: https://www.coursera.org/account/accomplishments/certificate/45HGG4UPUBQL

Course 4: Machine Learning With Big Data

https://www.coursera.org/learn/big-data-machine-learning/home/info

Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems.

At the end of the course, you will be able to:

  • Design an approach to leverage data using the steps in the machine learning process.
  • Apply machine learning techniques to explore and prepare data for modeling.
  • Identify the type of machine learning problem in order to apply the appropriate set of techniques.
  • Construct models that learn from data using widely available open source tools.
  • Analyze big data problems using scalable machine learning algorithms on Spark.

5 weeks - 25 hours

Certificate: https://www.coursera.org/account/accomplishments/certificate/7LLFLJSE9CEE

Course 5: Graph Analytics for Big Data

https://www.coursera.org/learn/big-data-graph-analytics/home/info

Want to understand your data network structure and how it changes under different conditions? Curious to know how to identify closely interacting clusters within a graph? Have you heard of the fast-growing area of graph analytics and want to learn more? This course gives you a broad overview of the field of graph analytics so you can learn new ways to model, store, retrieve and analyze graph-structured data.

After completing this course, you will be able to model a problem into a graph database and perform analytical tasks over the graph in a scalable manner. Better yet, you will be able to apply these techniques to understand the significance of your data sets for your own projects.
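
As a rough illustration of modeling a problem as a graph and finding groups of connected nodes, here is a minimal sketch in plain Python. It is an assumption-laden simplification: it finds connected components with a breadth-first search, which is a much coarser notion than the "closely interacting clusters" the course covers, and it does not use a graph database. The function name `connected_components` and the sample names are hypothetical.

```python
from collections import deque


def connected_components(edges):
    """Return a list of node sets, one per connected component.

    `edges` is an iterable of (node_a, node_b) pairs for an undirected graph.
    """
    # Build an adjacency map so each node knows its direct neighbors.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical "who interacts with whom" edges.
    sample_edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
    for group in connected_components(sample_edges):
        print(sorted(group))
```

This in-memory version only works for graphs that fit on one machine; scalable graph analytics of the kind the course describes would rely on a distributed engine or a graph database instead.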

4 weeks - 20 hours

Certificate: https://www.coursera.org/account/accomplishments/certificate/WWACALWXN32G

Course 6: Big Data - Capstone Project

https://www.coursera.org/learn/big-data-project/home/info

Welcome to the Capstone Project for Big Data! In this culminating project, you will build a big data ecosystem using tools and methods from the earlier courses in this specialization. You will analyze a data set simulating big data generated from a large number of users who are playing our imaginary game "Catch the Pink Flamingo". During the five-week Capstone Project, you will walk through the typical big data science steps for acquiring, exploring, preparing, analyzing, and reporting. In the first two weeks, we will introduce you to the data set and guide you through some exploratory analysis using tools such as Splunk and Open Office. Then we will move into more challenging big data problems requiring the more advanced tools you have learned, including KNIME, Spark's MLlib and Gephi. Finally, during the fifth and final week, we will show you how to bring it all together to create engaging and compelling reports and slide presentations. As a result of our collaboration with Splunk, a software company focused on analyzing machine-generated big data, learners with the top projects will be eligible to present to Splunk and meet Splunk recruiters and engineering leadership.

6 weeks - 24 hours

Certificate: https://www.coursera.org/account/accomplishments/certificate/JHGLL7CE69CQ

About

UCSD COURSERA
