Flume TwitterAgent.conf and demo02
lucas91batista committed Sep 13, 2019
1 parent 0471782 commit eda64c0
Showing 2 changed files with 245 additions and 0 deletions.
200 changes: 200 additions & 0 deletions labs/lab6-flume/lab6-flume.demo02.ipynb
@@ -0,0 +1,200 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Criando uma App no Twitter\n",
"\n",
"## Acessar o endereço abaixo e criar uma App: https://apps.twitter.com/\n",
"\n",
"Criar login, senha e logar\n",
"\n",
"Criar uma nova App clicando em Create New App\n",
"\n",
"Definir os detalhes da aplicação: nome, descrição, website, etc\n",
"\n",
"**No menu \"Keys and Tokens\" gerar as chaves da App para usar na configuração do Flume e substituir no arquivo twitterAgent.conf abaixo.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Configurando o agent, source, channel e sink\n",
"\n",
"**Agent**: Apenas um agente chamado *TwitterAgent*\n",
"\n",
"**Source**: Twitter\n",
"\n",
"**Channel**: Memória\n",
"\n",
"**Sink**: Registra os dados no HDFS"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#Nome dos componentes do agente\n",
"TwitterAgent.sources = Twitter\n",
"TwitterAgent.channels = MemChannel\n",
"TwitterAgent.sinks = HDFS\n",
"\n",
"#Configuração do Source\n",
"TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource\n",
"TwitterAgent.sources.Twitter.consumerKey = 6BWSmQX6AUfKhcNsMeav9zhi2\n",
"TwitterAgent.sources.Twitter.consumerSecret = DcYHM3EFR5oJR7VEq8cBVtjTPQxftI9PMrST71P7oXW0BlGiZv\n",
"TwitterAgent.sources.Twitter.accessToken = 1046705580-MTYNfMbLL6XSyQQgeL3Sah9RejwDRK5caBO9GRZ\n",
"TwitterAgent.sources.Twitter.accessTokenSecret = 2FRFyHQEdAIFVCFgTxyxCvl4zqoNtTEMZuwJHCfhXW2jk\n",
"TwitterAgent.sources.Twitter.keywords = #hadoop, #flume, #bigdata\n",
"\n",
"#Configuração do Sink\n",
"TwitterAgent.sinks.HDFS.type = hdfs\n",
"TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/matheus\n",
"TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream\n",
"TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text\n",
"\n",
"#number of events written to file before it is flushed to HDFS\n",
"TwitterAgent.sinks.HDFS.hdfs.batchSize = 50 \n",
"\n",
"#File size to trigger roll, in bytes (0: never roll based on file size)\n",
"TwitterAgent.sinks.HDFS.hdfs.rollSize = 0\n",
"\n",
"#Number of events written to file before it rolled (0 = never roll based on number of events)\n",
"TwitterAgent.sinks.HDFS.hdfs.rollCount = 50\n",
"\n",
"#Number of seconds to wait before rolling current file (0 = never roll based on time interval)\n",
"TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0\n",
"\n",
"#Configuração do Channel\n",
"TwitterAgent.channels.MemChannel.type = memory\n",
"#The maximum number of events stored in the channel\n",
"TwitterAgent.channels.MemChannel.capacity = 100\n",
"#The maximum number of events the channel will take from a source or give to a sink per transaction\n",
"TwitterAgent.channels.MemChannel.transactionCapacity = 100\n",
"\n",
"#Conectando Source, Sink, Channel\n",
"TwitterAgent.sources.Twitter.channels = MemChannel\n",
"TwitterAgent.sinks.HDFS.channel = MemChannel\n",
"\n",
"\n",
"#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json\n",
"#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object"
]
}
],
"source": [
"!cat /home/jovyan/labs/lab6-flume/twitterAgent.conf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nota: \n",
"Usando as configurações descritas no site do Flume - **TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource** - na configuração do source, o arquivo é gerado com caracteres ilegíveis. Assim, iremos utilizar **TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource** e temos que copiar o arquivo flume-sources-1.0-SNAPSHOT.jar para a pasta do Flume para o seu correto funcionamento."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"!cp resources/flume-sources-1.0-SNAPSHOT.jar ~/resources/local/flume-${FLUME_VERSION}/lib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Abrir um terminal e iniciar o FlumeAgent\n",
"\n",
"``` bash\n",
"flume-ng agent --conf conf --conf-file labs/lab6-flume/twitterAgent.conf --name TwitterAgent -Dflume.looger=INFO,console\n",
"``` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TrendTopics\n",
"Criando um rank de palavras que contém #"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No configs found; falling back on auto-configuration\n",
"No configs specified for hadoop runner\n",
"Looking for hadoop binary in /home/jovyan/resources/local/hadoop-2.9.2/bin...\n",
"Found hadoop binary: /home/jovyan/resources/local/hadoop-2.9.2/bin/hadoop\n",
"Using Hadoop version 2.9.2\n",
"Creating temp directory /tmp/mrjob-ex-3.jovyan.20190913.175842.269968\n",
"uploading working dir files to hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd...\n",
"Copying other local files to hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/\n",
"Running step 1 of 2...\n",
" packageJobJar: [/tmp/hadoop-unjar4483334449690778934/] [] /tmp/streamjob4300404582507966699.jar tmpDir=null\n",
" Connecting to ResourceManager at /0.0.0.0:8032\n",
" Connecting to ResourceManager at /0.0.0.0:8032\n",
" Total input files to process : 4\n",
" Cleaning up the staging area /tmp/hadoop-yarn/staging/jovyan/.staging/job_1568395552187_0004\n",
" Error Launching job : Not a file: hdfs://localhost:9000/user/matheus/output/output8\n",
" Streaming Command Failed!\n",
"Attempting to fetch counters from logs...\n",
"Can't fetch history log; missing job ID\n",
"No counters found\n",
"Scanning logs for probable cause of failure...\n",
"Can't fetch history log; missing job ID\n",
"Can't fetch task logs; missing application ID\n",
"Step 1 of 2 failed: Command '['/home/jovyan/resources/local/hadoop-2.9.2/bin/hadoop', 'jar', '/home/jovyan/resources/local/hadoop-2.9.2/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar', '-files', 'hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd/mrjob-ex-3.py#mrjob-ex-3.py,hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/matheus/*', '-output', 'hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/step-output/0000', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 mrjob-ex-3.py --step-num=0 --mapper', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 mrjob-ex-3.py --step-num=0 --reducer']' returned non-zero exit status 1280.\n"
]
}
],
"source": [
"!python resources/mrjob-ex-3.py -r hadoop --hadoop-streaming-jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar hdfs:///user/matheus/*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
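The "Trending Topics" cell in the notebook above ranks words that contain `#`. The script it runs, `resources/mrjob-ex-3.py`, is not part of this commit, so the following is only a minimal sketch of the same idea in plain Python, not the lab's actual MapReduce job. The function name `rank_hashtags` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Matches a '#' followed by one or more word characters (letters, digits, underscore).
HASHTAG = re.compile(r"#\w+")


def rank_hashtags(lines, top_n=10):
    """Return the top_n (hashtag, count) pairs, most frequent first.

    Hashtags are lower-cased so '#Hadoop' and '#hadoop' count as the same tag.
    """
    counts = Counter()
    for line in lines:
        # "Map" step: extract every hashtag in the line.
        # "Reduce" step: the Counter sums the occurrences per hashtag.
        counts.update(tag.lower() for tag in HASHTAG.findall(line))
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical sample input; in the lab the lines would come from HDFS.
    sample = [
        "Learning #Hadoop and #Flume today",
        "#hadoop streaming with #bigdata",
        "no tags here",
    ]
    for tag, count in rank_hashtags(sample):
        print(tag, count)
```

Extracting tags and summing counts are kept as separate steps so the sketch maps onto a mapper that emits `(hashtag, 1)` and a reducer that sums the values.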
45 changes: 45 additions & 0 deletions labs/lab6-flume/twitterAgent.conf
@@ -0,0 +1,45 @@
#Agent component names
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

#Source configuration
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = #hadoop, #flume, #bigdata

#Sink configuration
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/matheus
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

#Number of events written to file before it is flushed to HDFS
TwitterAgent.sinks.HDFS.hdfs.batchSize = 50

#File size to trigger roll, in bytes (0 = never roll based on file size)
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

#Number of events written to file before it is rolled (0 = never roll based on number of events)
TwitterAgent.sinks.HDFS.hdfs.rollCount = 50

#Number of seconds to wait before rolling current file (0 = never roll based on time interval)
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0

#Channel configuration
TwitterAgent.channels.MemChannel.type = memory
#The maximum number of events stored in the channel
TwitterAgent.channels.MemChannel.capacity = 100
#The maximum number of events the channel will take from a source or give to a sink per transaction
TwitterAgent.channels.MemChannel.transactionCapacity = 100

#Connecting Source, Sink, and Channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel


#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json
#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
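The configuration above wires one source and one sink to a single memory channel. As a hedged illustration of that wiring, the sketch below parses such a file and reports any source or sink that points at a channel not declared in `<agent>.channels`, plus any channel whose `transactionCapacity` exceeds its `capacity`. The helper names `parse_properties` and `check_agent` are hypothetical and are not part of Flume; the parser is a simplification that only handles plain `key = value` lines and `#`/`!` comments.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may contain '=' or '#'.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of human-readable wiring problems for the given agent name."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())

    for source in props.get(f"{agent}.sources", "").split():
        # A source may feed several channels (space-separated list).
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {source} uses undeclared channel {ch!r}")

    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink drains exactly one channel.
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch!r}")

    for ch in sorted(channels):
        capacity = props.get(f"{agent}.channels.{ch}.capacity")
        tx = props.get(f"{agent}.channels.{ch}.transactionCapacity")
        # Only compare when both values are set explicitly in the file.
        if capacity is not None and tx is not None and int(tx) > int(capacity):
            problems.append(f"channel {ch}: transactionCapacity {tx} > capacity {capacity}")

    return problems


if __name__ == "__main__":
    # The path is the one used in the notebook; adjust it to your checkout.
    with open("labs/lab6-flume/twitterAgent.conf", encoding="utf-8") as fh:
        for problem in check_agent(parse_properties(fh.read()), "TwitterAgent"):
            print(problem)
```

Run against the file in this commit, the check should print nothing, since both `Twitter` and `HDFS` reference `MemChannel` and the two capacities are equal.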
