-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
stevelowenthal
committed
Sep 10, 2015
1 parent
132e07a
commit fa11543
Showing
3 changed files
with
691 additions
and
338 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,324 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Lets \"pivot\" data and nest it into a list.\n", | ||
"We currently have our tracks_by_album table modeled as such:\n", | ||
"\n", | ||
"````\n", | ||
"CREATE TABLE music.tracks_by_album (\n", | ||
" album_title text,\n", | ||
" album_year int,\n", | ||
" track_number int,\n", | ||
" album_genre text static,\n", | ||
" performer text static,\n", | ||
" track_title text,\n", | ||
" PRIMARY KEY ((album_title, album_year), track_number)\n", | ||
"````\n", | ||
"\n", | ||
"Let's model it as one partition per album, and the track titles in a list\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 31, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%Cql create table if not exists music.albums (album_title text,\n", | ||
"album_genre text,\n", | ||
"album_year int,\n", | ||
"performer text,\n", | ||
"tracks list<text>, \n", | ||
"primary key ((album_title, album_year)) )" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 59, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<table><tr><tr></table>" | ||
] | ||
}, | ||
"execution_count": 59, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"%%Cql truncate music.albums" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Now a handy case class" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 47, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"case class Track(album_title: String,\n", | ||
"album_year:Int, number:Int,\n", | ||
"album_genre: String,\n", | ||
"performer: String,\n", | ||
"track_title: String)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### And an RDD to pick up the original data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 48, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"val tracks = sc.cassandraTable(\"music\",\"tracks_by_album\").as(Track)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 49, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"Track(Duos For Violin and Cello,2000,1,Classical,Nigel Kennedy,Sonata for Violin and Cello - Allegro)" | ||
] | ||
}, | ||
"execution_count": 49, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"tracks.first" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create a Pair RDD with key, track name. The key in this case (album_title, year). So we want tuples ((album_title, year), track_title). Also filter out tracks that are null" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 51, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"val track_pairs = tracks.filter(t => t.track_title != null).map(t => ((t.album_title, t.album_year), t.track_title))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 36, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"track_pairs.first.toString" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Use groupByBey, to convert it to (key, collection of track names)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 52, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"val albums_with_tracks = track_pairs.groupByKey" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 38, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"albums_with_tracks.first.toString\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Flatten it out so we have (album_title, year, collection of tracks)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 53, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"val albums_with_tracks2 = albums_with_tracks.map{ case ((album, year), tracks) => (album, year, tracks) }" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 54, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(Bramble Rose,2002,CompactBuffer(Trouble Over Me, Virginia, No One Can Warn You, Neighborhood, Bird Of Freedom, Bramble Rose, I Know Him Too, Sunday, Supposed To Make You Happy, Diamond Shoes, Are You Still In Love With Me?, When I Cross Over))" | ||
] | ||
}, | ||
"execution_count": 54, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"albums_with_tracks2.first.toString" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 61, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"albums_with_tracks2.saveToCassandra(\"music\",\"albums\", SomeColumns(\"album_title\", \"album_year\", \"tracks\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## ... And check it out" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 62, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<table><tr><th>album_title</th><th>album_year</th><th>album_genre</th><th>performer</th><th>tracks</th><tr><tr><td>Duos For Violin and Cello</td><td>2000</td><td><null></td><td><null></td><td>[Sonata for Violin and Cello - Allegro, Sonata for Violin and Cello - Tres vif, Sonata for Violin and Cello - Lent, Sonata for Violin and Cello - Vif, avec entrain, Passacaglia, Duo for Violin and Cello Op. 7 - Allegro serioso, non troppo, Duo for Violin and Cello Op. 7 - Adagio-Andante-Tempo I, Duo for Violin and Cello Op. 7 - Maestoso e largamente, ma non troppo lento-Presto, Two-Part Intervention No. 6 in E]</td></tr><tr><td>Golden Boy Elvis</td><td>1981</td><td><null></td><td><null></td><td>[She's Not You, Return To Sender, Joshua Fit The Battle, Don't, Tutti Frutti, It's Now Or Never, Surrender, Do The Clam, Kiss Me Quick, Such A Night, Blueberry Hill, Don't Be Cruel, Heartbreak Hotel, Fun In Acapulco, Hound Dog, Wooden Heart]</td></tr><tr><td>Home</td><td>1995</td><td><null></td><td><null></td><td>[I Believe, Let Me Be The One, All Along, Oh Virginia, Nora, Would You Be There, Home, End Of The World, Heaven, Forever For Tonight, Lucky To Be Here]</td></tr><tr><td>Little Sweet Delirium</td><td>1996</td><td><null></td><td><null></td><td>[Chase, Revolution, 829, Reaction, Hands Like Wings, God Shaped Hole, Fall, Point of You]</td></tr><tr><td>Merry Christmas From Elvis Presley, A</td><td>1982</td><td><null></td><td><null></td><td>[O Come All Ye Faithful, The First Noel, On A Snowy Christmas Night, Winter Wonderland, The Wonderful World Of Christmas, It Won't Seem Like Christmas, I'll Be Home On Christmas Day, If I Get Home On Christmas Day, Holly Leaves And Christmas Trees, Merry Christmas Baby, Silver Bells]</td></tr></table>" | ||
] | ||
}, | ||
"execution_count": 62, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"%%Cql select * from music.albums limit 5;" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Now we need to fill in genre and year. Let's just upsert them in! This will also insert any albums with no tracks" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 57, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"tracks.saveToCassandra(\"music\",\"albums\",SomeColumns(\"album_title\",\"album_year\",\"album_genre\",\"performer\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 58, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<table><tr><th>album_title</th><th>album_year</th><th>album_genre</th><th>performer</th><th>tracks</th><tr><tr><td>Duos For Violin and Cello</td><td>2000</td><td>Classical</td><td>Nigel Kennedy</td><td>[Sonata for Violin and Cello - Allegro, Sonata for Violin and Cello - Tres vif, Sonata for Violin and Cello - Lent, Sonata for Violin and Cello - Vif, avec entrain, Passacaglia, Duo for Violin and Cello Op. 7 - Allegro serioso, non troppo, Duo for Violin and Cello Op. 7 - Adagio-Andante-Tempo I, Duo for Violin and Cello Op. 7 - Maestoso e largamente, ma non troppo lento-Presto, Two-Part Intervention No. 6 in E]</td></tr><tr><td>Golden Boy Elvis</td><td>1981</td><td>Rock</td><td>Elvis Presley</td><td>[She's Not You, Return To Sender, Joshua Fit The Battle, Don't, Tutti Frutti, It's Now Or Never, Surrender, Do The Clam, Kiss Me Quick, Such A Night, Blueberry Hill, Don't Be Cruel, Heartbreak Hotel, Fun In Acapulco, Hound Dog, Wooden Heart]</td></tr><tr><td>Home</td><td>1995</td><td>Rock</td><td>Blessid Union of Souls</td><td>[I Believe, Let Me Be The One, All Along, Oh Virginia, Nora, Would You Be There, Home, End Of The World, Heaven, Forever For Tonight, Lucky To Be Here]</td></tr><tr><td>Little Sweet Delirium</td><td>1996</td><td>Unknown</td><td>Creeker</td><td>[Chase, Revolution, 829, Reaction, Hands Like Wings, God Shaped Hole, Fall, Point of You]</td></tr><tr><td>Merry Christmas From Elvis Presley, A</td><td>1982</td><td>Rock</td><td>Elvis Presley</td><td>[O Come All Ye Faithful, The First Noel, On A Snowy Christmas Night, Winter Wonderland, The Wonderful World Of Christmas, It Won't Seem Like Christmas, I'll Be Home On Christmas Day, If I Get Home On Christmas Day, Holly Leaves And Christmas Trees, Merry Christmas Baby, Silver Bells]</td></tr></table>" | ||
] | ||
}, | ||
"execution_count": 58, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"%%Cql select * from music.albums limit 5;" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Spark 1.2.1 (Scala 2.10.4)", | ||
"language": "scala", | ||
"name": "spark" | ||
}, | ||
"language_info": { | ||
"name": "scala" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
Oops, something went wrong.