Update API to focus on variable relationships #20

Open
emjun opened this issue Mar 18, 2021 · 0 comments

emjun commented Mar 18, 2021

The most recent revision attempts to make variable relationships clear and obvious from the syntax. A nice consequence of this revision is that the conceptual differences between Tisane and existing software tools become more apparent.

Variables

An end-user expresses variables according to their data type. If the end-user later provides data, the variable names should match the column names. For nominal or ordinal variables, end-users must also specify the cardinality if they do not intend to provide data. If end-users do provide data, cardinality is not required; Tisane calculates and populates these fields internally.
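As a rough illustration of what calculating cardinality from data could look like, the sketch below counts the distinct values in a column. The helper name `infer_cardinality` and the sample column are hypothetical; this is an assumption about the idea, not Tisane's internal implementation.

```python
# Hypothetical helper, NOT part of Tisane: count the distinct,
# non-missing values observed in one column of data.
def infer_cardinality(values):
    return len({v for v in values if v is not None})


# Made-up sample column for a nominal variable such as 'season'.
season_column = ["spring", "summer", "fall", "winter", "spring", None]
season_cardinality = infer_cardinality(season_column)
print(season_cardinality)
```

With data like this available, the end-user would not need to pass `cardinality` by hand.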

Variables are observed values of a measure. Variables can be measures of interest, as in dependent and independent variables. Variables can also be id numbers that act as keys to a dataframe (e.g., participant id).

import tisane as ts

# Example 1: 
hw = ts.Numeric('Homework') # 'Homework' is the column name
race = ts.Nominal('Race', cardinality=5) # there are 5 groups/options for the variable race
math = ts.Numeric('MathAchievement') 
mean_ses = ts.Numeric('Mean_SES')
student = ts.Nominal('student id', cardinality=100) # IDs for the 100 students included in this study
school = ts.Nominal('school', cardinality=10) # IDs for the 10 schools, 10 students/school

# Example 2: 
leaf_length = ts.Numeric('length')
fertilizer = ts.Nominal('fertilizer condition', cardinality=2)
season = ts.Nominal('season', cardinality=4)
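plant = ts.Nominal('plant id') # no cardinality given, so data must be provided later
bed = ts.Nominal('plant bed') # no cardinality given, so data must be provided later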

An end-user expresses two kinds of relationships between variables: those that come from domain theory (conceptual models) and those that describe how the data were measured.

Conceptual Relationships

There are two types of conceptual relationships: cause and associates_with.

# Example 1
hw.cause(math) # Hours spent on homework causes math achievement. 
race.associates_with(math) # Math scores and race are associated with each other. 

# Example 2
fertilizer.cause(leaf_length) # Fertilizer causes leaf growth

Definitions:

  • cause: The LHS variable causes the RHS variable. The RHS variable cannot also cause the LHS variable.
  • associates_with: The LHS and RHS variables are associated/related in some way that is not causal.

Tisane provides aliases for both: cause/causes and associate_with/associates_with.
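The definitions above can be sketched with a small stand-in class. Everything here (the `Var` class, its `effects` and `associated` attributes, and the error raised) is a hypothetical illustration of the stated rules, not Tisane's actual implementation.

```python
# Hypothetical stand-in for a variable; it does not use Tisane.
class Var:
    def __init__(self, name):
        self.name = name
        self.effects = set()     # names of variables this one causes
        self.associated = set()  # names of variables associated with this one

    def cause(self, other):
        # The RHS variable cannot also cause the LHS variable,
        # so reject the edge if the reverse edge already exists.
        if self.name in other.effects:
            raise ValueError(f"{other.name} already causes {self.name}")
        self.effects.add(other.name)

    causes = cause  # alias

    def associates_with(self, other):
        # Association is not directional, so record it on both sides.
        self.associated.add(other.name)
        other.associated.add(self.name)

    associate_with = associates_with  # alias


hw, math, race = Var("Homework"), Var("MathAchievement"), Var("Race")
hw.cause(math)
race.associates_with(math)

try:
    math.causes(hw)  # reverse of an existing causal edge
    reverse_allowed = True
except ValueError:
    reverse_allowed = False
print(reverse_allowed)
```

The asymmetry check is the only real logic here: cause is one-directional, while associates_with is stored symmetrically.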

Data measurement relationships

There are three types of data measurement relationships: (1) measurement attribution, (2) treatment for experiments, and (3) data hierarchies.

Measurement attribution

# Example 1: 
student.has(hw)
student.has(race)
student.has(math)
school.has(mean_ses)

# Example 2: 
plant.has(leaf_length)

Definition:

  • has distinguishes "levels" of observations by attributing variables to each level. In Example 1, there are two levels: student and school. Each student has a value for homework, race, and math. Each school has a value for mean_ses.

Idea: Create a separate Data type for "ID" and enforce that only variables of type "ID" can have other variables.
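One way the "ID" idea could look is sketched below. The class names `Variable` and `ID` and the `measures` attribute are assumptions made for illustration; they are not an existing Tisane API.

```python
# Hypothetical sketch: only ID variables get a has() method.
class Variable:
    def __init__(self, name):
        self.name = name


class ID(Variable):
    def __init__(self, name):
        super().__init__(name)
        self.measures = []  # names of variables attributed to this level

    def has(self, measure):
        self.measures.append(measure.name)


student, school = ID("student id"), ID("school")
hw, race, math, mean_ses = (
    Variable(n) for n in ("Homework", "Race", "MathAchievement", "Mean_SES")
)

student.has(hw)
student.has(race)
student.has(math)
school.has(mean_ses)

# Each ID defines one level of observation.
levels = {unit.name: unit.measures for unit in (student, school)}
print(levels)
```

Because `has` is defined only on `ID`, a call such as `hw.has(math)` would fail with an `AttributeError`, which is one simple way to enforce the restriction.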

Treatment

End-users can express experimental treatments/manipulations.

# Example 2: 
fertilizer.treats(bed)

Only Example 2 is an experiment. Each bed is treated with a fertilizer. In other words, fertilizer is a bed-level manipulation.

Definition:

  • treats expresses the explicit/intentional manipulation of variables in an experiment. X.treats(Y) is internally equivalent to Y.has(X), which means that each Y has an observation for X.

Idea: Check that the LHS variable of treats has a causal relationship (in the graph) with the DV? And keep treats and has different from one another.
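The equivalence stated in the definition (X.treats(Y) stored as Y.has(X)) can be sketched as follows. The `Var` class and the shared `edges` set are hypothetical; the sketch follows the definition as written and does not implement the idea of keeping the two relationships distinct.

```python
# Hypothetical sketch: relationships are stored as edges in a shared set.
class Var:
    def __init__(self, name, edges):
        self.name = name
        self.edges = edges  # shared set of (relation, source, target) tuples

    def has(self, other):
        self.edges.add(("has", self.name, other.name))

    def treats(self, other):
        # X.treats(Y) is recorded as Y.has(X).
        other.has(self)


def build(use_treats):
    edges = set()
    fertilizer = Var("fertilizer condition", edges)
    bed = Var("plant bed", edges)
    if use_treats:
        fertilizer.treats(bed)
    else:
        bed.has(fertilizer)
    return edges


via_treats = build(True)
via_has = build(False)
print(via_treats == via_has)
```

Keeping treats and has different, as the idea suggests, would mean storing a distinct relation label for treats rather than delegating to has.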

Data hierarchies

Data can be clustered or nested. Tisane provides support for expressing two possible sources of clustering: (1) repeated measures and (2) nested relationships.

# Example 1 
student.nest_under(school) # Students belong to a school. Students within a school might be more similar to each other than to students in other schools.

# Example 2 
plant.nest_under(bed) # Plants belong in plant beds. 
plant.repeats(measure=leaf_length, repetitions=season) # Repeatedly measure the same plant once per season

Definitions:

  • nest_under nests one variable under another.
  • repeats means the LHS variable provides multiple values of the measure. Each value is enumerated/indexed by the repetitions variable (e.g., season). If a plant provides multiple measures per season, another column for indexing each measure is required.
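To make the indexing requirement for repeats concrete, the sketch below checks whether each (plant, season) pair identifies exactly one measurement. The rows are made-up sample data and `needs_extra_index` is a hypothetical helper, not a Tisane function.

```python
from collections import Counter

# Made-up long-format data: one row per plant per season.
rows = [
    {"plant id": 1, "season": "spring", "length": 2.1},
    {"plant id": 1, "season": "summer", "length": 3.4},
    {"plant id": 2, "season": "spring", "length": 1.9},
    {"plant id": 2, "season": "summer", "length": 3.0},
]


def needs_extra_index(rows, unit, repetitions):
    """Return True if some unit has more than one row per repetition value."""
    counts = Counter((row[unit], row[repetitions]) for row in rows)
    return any(count > 1 for count in counts.values())


once_per_season = needs_extra_index(rows, "plant id", "season")

# A second spring measurement for plant 1 breaks the one-per-season pattern.
rows_with_repeat = rows + [{"plant id": 1, "season": "spring", "length": 2.3}]
twice_in_a_season = needs_extra_index(rows_with_repeat, "plant id", "season")

print(once_per_season, twice_in_a_season)
```

When the check returns True, season alone no longer enumerates a plant's measurements, which is the case where another indexing column is required.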