themains/domain_knowledge

Domain Knowledge: Learning with Pydomains

You are what you browse. More or less. We jest, just a bit.

To make it easier to learn from browsing data, we developed a Python package, pydomains. The package provides multiple ways to infer the kind of content a domain hosts. To illustrate its power (and the general workflow), we use it to answer two important questions:

  1. Do poor people, minorities, and the less well-educated visit sites that distribute malware or engage in phishing more frequently than their respective complementary groups (the better-off, the racial majority, and the better educated)?

  2. How does consumption of pornography vary by education and age?
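Both questions reduce to the same computation: label each visited domain with a content category, then compare the share of visits to that category across demographic groups. The sketch below illustrates that idea with the standard library only. It does not use pydomains or comScore data; the domain labels, the visit records, the group names, and the helper `share_by_group` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical domain labels. In practice these would come from a
# curated list or a classifier, not a hand-written dictionary.
DOMAIN_LABELS = {
    "example-news.com": "news",
    "bad-downloads.example": "malware",
    "adult-site.example": "porn",
}

# Hypothetical browsing records: (user_id, education_group, domain).
VISITS = [
    ("u1", "college", "example-news.com"),
    ("u1", "college", "bad-downloads.example"),
    ("u2", "no_college", "bad-downloads.example"),
    ("u2", "no_college", "bad-downloads.example"),
    ("u2", "no_college", "example-news.com"),
    ("u3", "college", "adult-site.example"),
]


def share_by_group(visits, labels, category):
    """Return {group: share of that group's visits to `category` domains}."""
    totals = defaultdict(int)  # all visits per group
    hits = defaultdict(int)    # visits to the category per group
    for _user, group, domain in visits:
        totals[group] += 1
        # Domains without a label count as visits but never as hits.
        if labels.get(domain) == category:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


malware_share = share_by_group(VISITS, DOMAIN_LABELS, "malware")
porn_share = share_by_group(VISITS, DOMAIN_LABELS, "porn")
```

This sketch pools visits within each group, so heavy browsers dominate the estimate. Computing a share per user first and then averaging within groups is an equally plausible design, and the two can give different answers.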

Data

Scripts

  1. Malware by Age, Race, Education
  2. Pornography Consumption by Age and Education for comScore 2004
    • We pick 2004 because we also have Trusted Source API data for that year. We plan to present supplementary data and analysis that illustrate some of the issues with comScore data, but much of that is beyond the scope of this illustration, and we may cover it separately.

Outputs

Authors

Suriyan Laohaprapanon and Gaurav Sood