
OpenCCML: OpenStack Cloud Computing Machine Learning Platform

Category:

  • Cloud Computing and Machine Learning Application

Subject:

  • A platform for processing data in the cloud, built on OpenStack, using Spark for data distribution and the Hadoop Filesystem (HDFS) for data storage

Project Story:

  • This project is the final degree project of my Computer Science degree at the University of Granada.

Table of Contents

  1. Project Description

Project Description

This project consists of the implementation of a cloud computing platform for processing data sets provided by users. The processing applies a machine learning algorithm, chosen by the user, to the uploaded data.

To make this happen, we first build the platform on OpenStack, which provides our application with everything related to the virtual machines: images, volumes, CPUs, etc.

Once we have the required machines, we use Spark to distribute the data to be processed across them.

To store the data sets to be processed, we use the Hadoop Filesystem (HDFS).
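As a minimal sketch of how these pieces fit together (the Spark master URL, HDFS path, and column names below are hypothetical, and the platform's actual code may differ), a job could read a user's data set from HDFS and apply a clustering algorithm from Spark's MLlib:

# Sketch: read a user-provided data set from HDFS and run K-Means with Spark MLlib.
# The master URL, HDFS path, and feature column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = (SparkSession.builder
         .appName("OpenCCML-example")
         .master("spark://master-node:7077")   # hypothetical Spark master on the OpenStack cluster
         .getOrCreate())

# Data set uploaded by the user and stored in HDFS (placeholder path)
df = spark.read.csv("hdfs://master-node:9000/datasets/user_dataset.csv",
                    header=True, inferSchema=True)

# Assemble the numeric columns into a single feature vector (placeholder column names)
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
features = assembler.transform(df)

# Apply the machine learning algorithm chosen by the user, e.g. K-Means with 3 clusters
model = KMeans(k=3, seed=1).fit(features)
model.transform(features).select("features", "prediction").show()

spark.stop()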

clustering-openstack

Project Description

In this project we describe the mechanism used to deploy a Spark cluster on OpenStack. The topics covered are listed below.

Table of Contents

  1. Introduction

  2. OpenStack commands used in this project

  3. Scripting

  4. Spark configuration

Create instances

To launch an instance, we must at least specify the flavor, image name, network, security group, key, and instance name.

We will work with the Name or ID of the elements shown in the listings below:


List the flavors

openstack flavor list 

Screenshot: Flavor list

A flavor specifies a virtual resource allocation profile which includes processor, memory, and storage.


List the available images

openstack image list

Screenshot: List of images

We will use CentOS 7 or Fedora images.


List the available networks

openstack network list

Screenshot: Network list

List the created security groups

openstack security group list

Screenshot: Security groups


Create the instance

Example:

openstack server create --flavor XXXXX --image XXXXXX  --nic net-id=XXXXXXX --security-group XXXXXX  --key-name XXXXXXX provider-instance

With our data:

openstack server create --flavor 3 --image CentOS7  --nic net-id=55c3bd97-fef8-47cf-bde7-a7f6c22f2d2c --security-group default --key-name rashadkey provider-instance

Screenshot: Created instance


Floating IP

  • List the floating IPs

openstack floating ip list

Screenshot: List of floating IPs

For each floating IP address that is allocated to your project, the command outputs the ID of the floating IP address, the actual floating IP address, the private IP address of the instance the floating IP address is associated with, and the ID for the port that the floating IP address is connected to.


  • Disassociate the floating IP from an instance:

First, let us see the available instances ("servers"):

Screenshot: Server list

Let's disassociate the floating IP of the instance named CirrOS-cloud-init. We can see that its floating IP is 192.168.10.53. To disassociate it, we do the following:

openstack server remove floating ip CirrOS-cloud-init 192.168.10.53

Screenshot: Disassociate floating IP

We can see that this floating IP is no longer associated with the specified instance.


  • Create floating IP

To associate a floating IP, we can use an existing one or create a new one and allocate it to our project, before assigning it to an instance.

openstack floating ip create <network>

Screenshot: Create floating IP

We can see that the network in our example is external.


  • Associate a floating IP with an instance

openstack server add floating ip CirrOS-cloud-init 192.168.10.68

Screenshot: Associated floating IP

Now we can see that the instance called CirrOS-cloud-init has an associated floating IP, 192.168.10.68, which we created in the previous step.


Assign an internal IP (Private Network)

In contrast to the floating IP, the internal IP is assigned automatically to the instance when it is created. So we only need to assign an internal IP to an instance in two cases:

  1. To change the current internal IP
  2. To assign one when the instance has no internal IP associated
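As a hedged example of the second case (whether this subcommand is available depends on the python-openstackclient version; <instance> and <network> are placeholders), an additional internal IP can be attached with:

openstack server add fixed ip <instance> <network>

This allocates a fixed IP on the given internal network and attaches it to the instance.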

Script

The script allows us to create a cluster of instances with specific configuration parameters.

Input Parameters:

  • Options: {start, status and delete}

    • Start: Create a new cluster with a name;
    • Delete: Remove all instances associated with the cluster;
    • Status: Check the status of the cluster.
  • Name of the cluster (identifier of the cluster) # On creating, deleting or checking

  • Number of master nodes # On creating

  • Number of slave nodes # On creating

  • IP of the master node (floating) # On creating

  • IPs of the slave nodes (internal) # On creating

  • Flavor of the set of instances # On creating

  • Network Name for the instances # On creating

  • Security group # On creating

  • Key Name for the instances # On creating

  • Image for the instances # On creating

Parameter Code:

import argparse


def main():
    # Parse the cluster-management parameters described above
    parser = argparse.ArgumentParser(description='ClusterOpenStack')

    parser.add_argument('-op', '--operation', help='Operation on the cluster (start, status, delete)', required=True)
    parser.add_argument('-name', '--name', help='Name of the cluster', required=True)
    parser.add_argument('-nm', '--nummasters', help='Number of master nodes', required=True)
    parser.add_argument('-ns', '--numslaves', help='Number of slave nodes', required=True)
    parser.add_argument('-ipm', '--ipmasters', help='IPs of Masters', required=True)
    parser.add_argument('-ips', '--ipslaves', help='IPs of Slaves', required=True)
    parser.add_argument('-fl', '--flavor', help='Flavor of the instances', required=True)
    parser.add_argument('-n', '--network', help='Network name or ID', required=True)
    parser.add_argument('-s', '--security', help='Security group name', required=True)
    parser.add_argument('-k', '--key', help='Key pair name', required=True)  # assumed flag for the key-name parameter listed above
    parser.add_argument('-i', '--image', help='Image name', required=True)

    # vars() converts the Namespace into a dict, so parameters are accessed by key
    args = vars(parser.parse_args())

    print(args['operation'])
    print(args['name'])


if __name__ == '__main__':
    main()
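For illustration only (the script file name and the parameter values are hypothetical, and -k/--key is the assumed flag noted in the code above), starting a cluster could look like:

python cluster_openstack.py -op start -name testcluster -nm 1 -ns 3 -ipm <floating-ip> -ips <internal-ips> -fl 3 -n <network> -s default -k rashadkey -i CentOS7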

Configure the instances of the cluster

In this section, we want to study how we can install Apache and PHP, for example, on the instances created for a cluster, using Ansible.

In this README I only cover what is necessary for this project; if you want a deeper study of Ansible, I am working on one in another repository, which you can find here.

When we talk about Ansible, we have to talk about its playbooks.

Playbooks are expressed in YAML format (see YAML Syntax) and have a minimum of syntax, which intentionally tries not to be a programming language or script, but rather a model of a configuration or a process.

Each playbook is composed of one or more ‘plays’ in a list.

The goal of a play is to map a group of hosts to some well-defined roles, represented by things Ansible calls tasks. At a basic level, a task is nothing more than a call to an Ansible module (see About Modules).
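As a minimal sketch of such a playbook (the inventory group name "cluster" is an assumption, and the real playbook for this project may differ), installing Apache and PHP on the CentOS instances could look like:

# Hypothetical playbook: install Apache and PHP on the cluster instances (CentOS)
- hosts: cluster          # assumed inventory group containing the cluster nodes
  become: yes
  tasks:
    - name: Install Apache and PHP
      yum:
        name:
          - httpd
          - php
        state: present

    - name: Start and enable Apache
      service:
        name: httpd
        state: started
        enabled: yes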

Spark configuration

Bibliography

