ericanaglik/Tweet-Generator

Title project of my Computer Science 1.2 class

About CS 1.2

A project-based course that looks under the hood at data structures and algorithms to see how they work. In addition to implementing these structures in an application, students build them from scratch, analyze their complexity, and benchmark their performance to understand their tradeoffs and when to use them in practice. Students write scripts, functions, and library modules that use text-processing tools like regular expressions, and they construct and sample probability distributions to create a Markov language model. Along the way they gain insight into how grammar works and into natural language processing techniques.

Assignment Requirements

Create a "tweet generator" that samples words from a Markov chain built over a text corpus.
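
To make the requirement concrete, here is a minimal sketch of a first-order Markov chain generator. It is not the code from this repository: the function names (`build_markov_chain`, `sample`, `generate_sentence`) and the toy corpus are assumptions chosen for illustration, and the standard library's `random.choices` stands in for a hand-written sampler.

```python
import random
from collections import defaultdict


def build_markov_chain(words):
    """Map each word to a histogram of the words observed right after it."""
    chain = defaultdict(lambda: defaultdict(int))
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word][next_word] += 1
    return chain


def sample(histogram, rng=random):
    """Pick a key with probability proportional to its observed count."""
    choices = list(histogram.keys())
    weights = list(histogram.values())
    return rng.choices(choices, weights=weights, k=1)[0]


def generate_sentence(chain, start_word, max_words=10, rng=random):
    """Random walk over the chain, stopping at a dead end or at max_words."""
    words = [start_word]
    while len(words) < max_words and words[-1] in chain:
        words.append(sample(chain[words[-1]], rng))
    return " ".join(words)


# Hypothetical toy corpus; a real run would read a much larger text file.
corpus = "one fish two fish red fish blue fish".split()
chain = build_markov_chain(corpus)
print(generate_sentence(chain, "one", max_words=8))
```

Each word's histogram doubles as a probability distribution, so a word that follows "fish" more often in the corpus is proportionally more likely to be chosen during the walk.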

What I Did/Learned:

  1. Strings and Random Numbers:

    • Create Python scripts and modules
    • Access command-line arguments
    • Read files and extract lines of text
    • Remove whitespace from strings
    • Store and access elements in a list
    • Generate random integers in a range
  2. Histogram Data Structures:

    • Split strings into components to find words
    • Build a histogram to count word occurrences
    • Create and use dictionary, list, and tuple data types
  3. Probability and Sampling:

    • Sample words according to their observed frequencies
    • Compare tradeoffs with different sampling techniques
    • Validate sampling techniques based on relative probabilities
  4. Flask Web App Development:

    • Set up Python virtual environments for package isolation
    • Build and test simple Flask web apps on local computers
    • Deploy Flask web apps to Heroku cloud hosted servers
  5. Application Architecture:

    • Assess aspects of code quality including organization and modularity
    • Refactor functions that operate on data structures into class instance methods
    • Plan application architecture to prepare for future expansion
  6. Generating Sentences:

    • Build Markov chains based on observed frequency of adjacent words in text
    • Generate sentences by sampling words during a random walk on the Markov chain
  7. Arrays and Linked Lists:

    • Diagram abstract concepts implemented with nested data structures
    • Describe and diagram how arrays and linked lists are stored in memory
    • Describe how dynamic arrays automatically resize when more space is needed
    • Compare advantages and disadvantages of dynamic arrays with linked lists
    • Implement essential linked list class instance methods using node objects
  8. Hash Tables:

    • Describe what a hash function does to enable mapping arbitrary keys to integers
    • Describe and diagram how a hash table uses arrays and linked lists to store key-value entries
    • Explain what a hash collision is, why it happens, and at least one resolution technique
    • Compare advantages and disadvantages of using hash tables versus arrays or linked lists
    • Implement essential hash table class instance methods
  9. Algorithm Analysis:

    • Describe and diagram in detail how a hash table uses arrays and linked lists to store key-value entries
    • Explain how to add a new key-value entry to a hash table and how to get the value associated with a given key
    • Identify key ingredients used to build a hash table: hash function, indexed array of buckets, and linked lists to store multiple key-value entries per bucket
    • Perform basic analysis of algorithm time complexity with big O notation
  10. Higher Order Markov Chains:

    • Build higher order Markov chains based on observed frequency of n-grams (tuples of n words) in text
    • Generate sentences by sampling words during a random walk on the higher order Markov chain
    • Utilize a linked list as a queue to track words in a Markov chain's n-gram window
  11. Regular Expressions:

    • Use regular expressions to clean up and remove junk text from corpus
    • Use regular expressions to create a more intelligent word tokenizer
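
The last two topics can be sketched together as follows. This is an illustrative example under stated assumptions, not the repository's implementation: the names `clean`, `tokenize`, `build_ngram_chain`, and `generate` are hypothetical, the regular expressions are one possible choice, and `collections.deque` is swapped in for the hand-built linked list queue described above.

```python
import random
import re
from collections import Counter, defaultdict, deque


def clean(text):
    """Strip URLs and @mentions, assumed here to be the junk in the corpus."""
    return re.sub(r"https?://\S+|@\w+", " ", text)


def tokenize(text):
    """Lowercase the text and keep runs of letters, allowing inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def build_ngram_chain(words, order=2):
    """Map each tuple of `order` words to a Counter of the words that follow it."""
    chain = defaultdict(Counter)
    window = deque(words[:order], maxlen=order)
    for next_word in words[order:]:
        chain[tuple(window)][next_word] += 1
        window.append(next_word)  # maxlen drops the oldest word automatically
    return chain


def generate(chain, seed, max_words=12, rng=random):
    """Random walk: sample a follower of the current window, then slide it."""
    window = deque(seed, maxlen=len(seed))
    output = list(seed)
    while len(output) < max_words and tuple(window) in chain:
        followers = chain[tuple(window)]
        word = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
        output.append(word)
        window.append(word)
    return " ".join(output)


# Hypothetical sample text containing a mention and a link to clean away.
text = "I like cats. I like dogs! @bob Dogs don't like cats http://t.co/x"
tokens = tokenize(clean(text))
chain = build_ngram_chain(tokens, order=2)
print(generate(chain, ("i", "like")))
```

A fixed-size queue fits the n-gram window because every step adds one word at the back and discards one from the front, which is the same access pattern a linked list queue supports in constant time.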

About

Python application that generates tweets
