Skip to content

vsoch/codeart-examples

Repository files navigation

ColorArt Examples

GitHub actions status

This is an examples repository to supplement codeart. The repository was getting large so I decided to move image and data files over here.

Dockerfiles

The Dockerfiles dinosaur dataset has about 100K Dockerfiles, and this small example shows creating visualizations for them. The interactive version isnt' very useful here because we only have one file type, and the range of colors is limited from greens to blues. This project helped to develop the sorted interactive version of the colormap.

Parse Repo

The parse_repo folder shows how to parse a repository (spack) and then generate a color gradient lookup. The first attempt at the color grid can be seen here and this was updated to better organize, seen here. For the second, since the Word2Vec model derives similar embeddings represented in color, this means that similar colors equate to similar terms. You can explore the visualization with this knowledge.

To generate a custom codeart image (with text), the library can also do that, with an example here and shown below.

parse_repo/text.png

And of course it would be more appropriate to write the name of the software as the text instead!

parse_repo/spack-text.png

and of course you have to zoom in to see that the pixels are actually code, colored based on their context thanks to Word2Vec.

Finally, to generate a d3 tree that shows code images in a folder on mouseover, see this example.

parse_repo/tree.png

It's not a great visualization, could be a lot better.

Interactive

The interactive version shows the interactive color grid, where groups (in this case extensions) are colored based on salience. Click an extension to see relevant terms in the embeddings model.

Abstract

The abstract versions, including those for:

are graphics generated with pictures of the code themself. By far, the all files generation is the most interesting and abstract!

parse_repo/all.png

The interactive version is cooler as you can click on any of the files to see in slightly more detail.

Parse by Year

Is an example project to use codeart to parse a large Python code base, determine year of creation using the GitHub api, and then break into groups based on the year. This small project helped me to develop the interactive colomap example, which you can view here or the previous version (not sorted) here

The example also generates static images, along with a gif (animation) to show the change in data over time.

colormap-groups.gif

Of course this was impossible to explore, hence why I made the interactive version.

Derive Colormap

The derive_colormap folder an example to show working on deriving a colormap from a set of embeddings. This means that we start with 3d, project to 2d, and then use Voronoi to fill border cells. Here is the original 3d map:

derive_colormap/colormap-3d.png

which we then project to 2d

derive_colormap/colormap-2d-example.png

And then the Voronoi exercise didn't work out as intended (and I pursued other methods instead)

derive_colormap/sploosh.png

And ultimately wound up developing an interactive colormap (too large to add to the repo here!) that is better sorted by rows and columns (and generates in a reasonable time). Note that since this is for all of my Python code, the interface is a bit clunky. Probably something with canvas would work better here for this number of points. The notebook is useful to get someone started with a similar project.

Parse Folders

The parse_folders is a similar effort to "Parse Repo" above, but instead we parse folders on our local file system and generate a colors gradient. This is the color gradient across all extensions for the model:

parse_folders/colors-gradient.png

Or instead, you can generate an interactive version that allows for mousing over colors to see terms, and clicking on the list of extensions to see relevant terms. The opacity corresponds to the relative count of the term for the extension. For spack, most terms will be derived from Python files.

GitHub Workflow

A github workflow is provided to create a gallery! The workflow opens a pull request, and deploys the files to the docs folder. You can see the live version here.

docs/codeart.png

About

Separate examples repository for codeart Python module, since data and image files tend to be large.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published