Welcome to your CDK Python project!

This is a blank project for CDK development with Python.

The cdk.json file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 (or python for Windows) executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.

To manually create a virtualenv on macOS and Linux:

$ python3 -m venv .venv

After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

$ source .venv/bin/activate

If you are on a Windows platform, you would activate the virtualenv like this:

% .venv\Scripts\activate.bat

Once the virtualenv is activated, you can install the required dependencies.

$ pip install -r requirements.txt

At this point you can synthesize the CloudFormation template for this code.

$ cdk synth

To add additional dependencies, for example other CDK libraries, add them to your requirements.txt file and rerun the pip install -r requirements.txt command.

Useful commands

- cdk ls: list all stacks in the app
- cdk synth: emit the synthesized CloudFormation template
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare the deployed stack with the current state
- cdk docs: open the CDK documentation

Enjoy!

Building the stacks

CognitoStack:

cdk deploy CognitoStack-dev --context env=dev
cdk deploy CognitoStack-stage --context env=stage
cdk deploy CognitoStack-prod --context env=prod

EmotionDataStack:

cdk deploy EmotionDataStack-dev --context env=dev
cdk deploy EmotionDataStack-stage --context env=stage
cdk deploy EmotionDataStack-prod --context env=prod
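Each command above selects the target environment through the env context value. A minimal sketch of how app.py might turn that value into the stack ids used above; the helper name stack_id is hypothetical and the repository's actual app.py may do this differently. In a CDK app, the value would typically be read with app.node.try_get_context("env"), which returns None when no --context env=... flag is passed.

```python
from typing import Optional

# Environments taken from the deploy commands above.
VALID_ENVS = ("dev", "stage", "prod")


def stack_id(base_name: str, env: Optional[str]) -> str:
    """Build a stack id such as "CognitoStack-dev" (hypothetical helper).

    In app.py, `env` would usually come from app.node.try_get_context("env").
    Failing early on a missing or unknown env avoids deploying a stack with
    an unexpected name.
    """
    if env not in VALID_ENVS:
        raise ValueError(
            f"Unknown env {env!r}; pass --context env=<{'|'.join(VALID_ENVS)}>"
        )
    return f"{base_name}-{env}"


if __name__ == "__main__":
    print(stack_id("CognitoStack", "dev"))       # CognitoStack-dev
    print(stack_id("EmotionDataStack", "prod"))  # EmotionDataStack-prod
```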

Sampling

Some notes on the sampling strategies used during survey creation.

The purpose of every sampling strategy is to find a specific number of filenames, i.e. the items to attach to a specific survey. All sampling strategies prioritize filenames that occurred least frequently in previous surveys of the same project, so that the number of ratings per item is evenly distributed in the long term.
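Both strategies below start from a frequency2filename dict that groups filenames by how often they were used before. A minimal sketch of how it could be built, assuming previous surveys are available as lists of filenames; the function name and input shapes are hypothetical, not the project's actual code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def build_frequency2filename(
    all_filenames: Iterable[str],
    previous_surveys: Iterable[Iterable[str]],
) -> Dict[int, List[str]]:
    """Group filenames by their number of occurrences in previous surveys.

    Filenames that never appeared get frequency 0, so they are picked first.
    """
    counts: Counter = Counter()
    for survey in previous_surveys:
        counts.update(survey)  # count each filename in this survey
    frequency2filename: Dict[int, List[str]] = defaultdict(list)
    for filename in all_filenames:
        # Counter returns 0 for filenames it has never seen.
        frequency2filename[counts[filename]].append(filename)
    # Plain dict with keys ordered from lowest to highest frequency.
    return dict(sorted(frequency2filename.items()))


if __name__ == "__main__":
    print(build_frequency2filename(["f1", "f2", "f3"], [["f1"], ["f1", "f2"]]))
    # {0: ['f3'], 1: ['f2'], 2: ['f1']}
```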

General input parameters:

valence: The emotional valence of the generated items.
- Type: str or None
- Possible Values:
  - "pos": Only sample filenames of positive and neutral emotional valence.
  - "neg": Only sample filenames of negative and neutral emotional valence.
  - None: Sample from all filenames.
balanced_sampling_enabled: Whether to use balanced or randomized filename sampling.
- Type: bool
- Possible Values:
  - True: Use balanced sampling.
  - False: Use randomized sampling.
samples_per_survey: How many samples to generate.
- Type: int
- Possible values:
  - Min: 1
  - Max: dataset size
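A minimal sketch of the valence filter and the samples_per_survey range check described above. It assumes each filename carries one of the hypothetical labels "pos", "neg" or "neu", and it treats the size of the eligible pool as the dataset size; all names here are illustrative only.

```python
from typing import Dict, List, Optional


def filter_by_valence(
    filename2valence: Dict[str, str], valence: Optional[str]
) -> List[str]:
    """Return the filenames eligible for a survey with the given valence.

    Assumes hypothetical labels "pos", "neg" and "neu" per filename.
    "pos" keeps positive + neutral, "neg" keeps negative + neutral,
    None keeps everything.
    """
    if valence is None:
        return list(filename2valence)
    if valence not in ("pos", "neg"):
        raise ValueError(f"valence must be 'pos', 'neg' or None, got {valence!r}")
    allowed = {valence, "neu"}
    return [f for f, v in filename2valence.items() if v in allowed]


def check_samples_per_survey(samples_per_survey: int, dataset_size: int) -> None:
    """Enforce the documented range 1 <= samples_per_survey <= dataset size."""
    if not 1 <= samples_per_survey <= dataset_size:
        raise ValueError(
            f"samples_per_survey must be in [1, {dataset_size}], "
            f"got {samples_per_survey}"
        )
```

Usage: filter first, then validate against the size of the filtered pool, e.g. `pool = filter_by_valence(labels, "pos")` followed by `check_samples_per_survey(n, len(pool))`.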

Balanced filename sampling

The purpose of balanced filename sampling is to generate a specific number of filenames with a balanced distribution of emotion ids.

Input parameters:

emotions_per_survey: How many emotions should be included in each survey.
- Type: int
- Possible Values:
  - Min: 1
  - Max: number of emotions in the dataset

Algorithm

Here follows a general description of the algorithm for balanced sampling.

1. Generate a frequency2filename dict whose keys are frequencies (of previous occurrence) and whose values are lists of filenames.
2. Check whether emotions_per_survey is less than the total number of emotions present in the dataset. If so:
   - Select emotions_per_survey emotions, starting from the lowest frequencies.
3. Calculate the distribution of emotion ids in the full dataset (or in the selected subset of emotions/valence).
4. From each emotion, collect a number of samples (prioritizing the lowest frequencies in frequency2filename) such that the distribution of these samples approximately matches the general distribution.
   - Note: this is subject to rounding errors when the proportions are rounded to a discrete number of samples.
5. Check whether more samples are needed.
6. If so, fill up by adding one sample of each emotion in turn.
   - Stop when there are enough samples, i.e. when the number of samples equals samples_per_survey.
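The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: filename2emotion is a hypothetical filename-to-emotion-id mapping; ties within one frequency are broken by shuffling; step 2 is read as "take the first emotions_per_survey distinct emotions met while walking filenames from least to most used"; and the proportional quotas in step 4 are rounded down, so the first pass never exceeds samples_per_survey and the fill-up loop covers the shortfall.

```python
import random
from typing import Dict, List, Optional


def balanced_sample(
    frequency2filename: Dict[int, List[str]],
    filename2emotion: Dict[str, int],
    samples_per_survey: int,
    emotions_per_survey: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    if rng is None:
        rng = random.Random()
    # Step 1 input: order filenames from least to most used,
    # shuffling within each frequency so ties are broken randomly.
    ordered: List[str] = []
    for freq in sorted(frequency2filename):
        bucket = list(frequency2filename[freq])
        rng.shuffle(bucket)
        ordered.extend(bucket)

    # Step 2: restrict to emotions_per_survey emotions, taken from the
    # least-used filenames first (one reading of the description).
    all_emotions = {filename2emotion[f] for f in ordered}
    if emotions_per_survey < len(all_emotions):
        selected: List[int] = []
        for f in ordered:
            emotion = filename2emotion[f]
            if emotion not in selected:
                selected.append(emotion)
            if len(selected) == emotions_per_survey:
                break
        ordered = [f for f in ordered if filename2emotion[f] in selected]

    # Per-emotion queues, each still ordered by ascending frequency.
    queues: Dict[int, List[str]] = {}
    for f in ordered:
        queues.setdefault(filename2emotion[f], []).append(f)

    # Steps 3-4: quota proportional to each emotion's share of the pool,
    # floored with integer arithmetic so the total never overshoots.
    total = len(ordered)
    chosen: List[str] = []
    for queue in queues.values():
        quota = len(queue) * samples_per_survey // total
        chosen.extend(queue[:quota])
        del queue[:quota]

    # Steps 5-6: top up with one file per emotion until the target is met.
    while len(chosen) < samples_per_survey and any(queues.values()):
        for queue in queues.values():
            if queue and len(chosen) < samples_per_survey:
                chosen.append(queue.pop(0))
    return chosen
```

Flooring instead of rounding to the nearest integer is a deliberate choice in this sketch: it keeps step 4 from ever producing more than samples_per_survey items, so the only correction needed is the top-up in step 6.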

Randomized filename sampling

Algorithm

1. Generate a frequency2filename dict whose keys are frequencies (of previous occurrence) and whose values are lists of filenames.
2. Take all filenames at the lowest frequency.
3. Shuffle them.
4. Add filenames to the return list until either:
   - the number specified in samples_per_survey is reached, or
   - the filenames at the current frequency run out. If so, take all filenames at the next frequency and restart at step 3.
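A minimal sketch of the randomized strategy, assuming a frequency2filename dict as described in step 1; the function name is hypothetical and this is not the project's actual code.

```python
import random
from typing import Dict, List, Optional


def randomized_sample(
    frequency2filename: Dict[int, List[str]],
    samples_per_survey: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    if rng is None:
        rng = random.Random()
    chosen: List[str] = []
    # Step 2: walk frequencies from lowest to highest.
    for freq in sorted(frequency2filename):
        bucket = list(frequency2filename[freq])  # copy; don't mutate the input
        rng.shuffle(bucket)                      # step 3
        for filename in bucket:                  # step 4
            if len(chosen) == samples_per_survey:
                return chosen
            chosen.append(filename)
        # Bucket exhausted: move to the next frequency (back to step 3).
    return chosen
```

If samples_per_survey exceeds the number of available filenames, this sketch simply returns every filename; the documented range check (Max: dataset size) is expected to prevent that case.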

timlac/reco-aws-cdk-infrastructure

Folders and files

Latest commit

History

Repository files navigation

Welcome to your CDK Python project!

Useful commands

Building the stacks

Sampling

general input parameters:

Balanced filename sampling

Algorithm

Randomized filename sampling

Algorith

About

Resources

Stars

Watchers

Forks

Languages