Portfolio

This repository contains examples of how I have built a cloud-ish infrastructure to leverage a big data platform. It is meant to give potential employers an idea of what they can expect from me as a developer.

For anyone not wishing to employ me, this is an example of how I would build a microservice-based infrastructure for looking up information in an environment centered around an HBASE index, using Spring Boot / Spring Cloud for the implementation and GraphQL to wrap the API, all neatly packaged in Docker images for deployment.

It is very much a work in progress, and I plan on adding new features continuously until I run out of ideas or get the job I want.

Quick-and-dirty structural diagram

Diagram of the services in the architecture and the flow of data

  • The boxes are the individual parts of the system, with hard edges for the proxy and soft edges for the internal services.
  • The database icon represents the usage-data storage, which is implemented using SQLite and jOOQ.
  • The clouds represent related systems outside the infrastructure. None of these exist in the repository.
  • The arrows represent the flow of data in the system. Queries are not considered data here.
  • The thick dotted border represents the services that register themselves with the discovery service. These can all be scaled up and down as required, though external limitations may apply.

Quickstart

To just see the infrastructure (Tested in Chrome and Firefox)

When running the example queries below, log in using service-user and password.

Note that this cluster is down, since I am not actually looking for work at the moment.

Examples (currently inactive)

  • Eureka: The Eureka server I use for service discovery
  • Admin: A server running the Spring Boot Admin tool, connected to the other services
  • HAL Browser: A browser for the HAL layer in the API. Use "Authorization: Basic c2VydmljZS11c2VyOnBhc3N3b3Jk" as a custom request header (see the sketch after this list for how that value is derived).
  • Query history (Reactive): A simple JavaScript component rendering the search history using data from the Legal Service. Uses React to do live updating.
  • Selector suggestion: Example of how selector suggestions are implemented.
  • Selector lookup: Example of how to look up a selector and data related to it.
  • Inverted index lookup: Example of how to browse values in the inverted index.
  • Minified GraphQL example: Example of how the same query looks when run without GraphiQL.
  • Document lookup: Example of how documents and document fields can be looked up.
  • Mutation example: Example of a mutation storing user activity into the database.
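
The Authorization value above is just standard HTTP Basic authentication for the service-user / password pair. As a minimal sketch (plain Java, not code from this repository), the header value can be produced like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeaderSketch {
	public static void main(String[] args) {
		// HTTP Basic authentication is simply "Basic " + base64("user:password")
		String credentials = "service-user:password";
		String headerValue = "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
		// Prints the same value as used in the custom request header above
		System.out.println("Authorization: " + headerValue);
	}
}
```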

Feel free to experiment, but also note that the resources are very limited (an AWS t2.medium instance with 2 cores and 4 GiB of RAM, and a t2.micro with 1 core and 1 GiB of RAM, running 9 services and a MySQL server simultaneously). If it is very slow, or starts hanging and/or failing, either scale back your queries or come back at a later time and try again.

To familiarize yourself with GraphiQL, I recommend going through their introduction to queries. The model can be browsed directly from the tool (in the right pane), and it will try to auto-complete queries while you type.

For a local setup

To run it, first make sure that ports 80, 8000, 8001, 8100, 8120, 8150, 8200, 8300 and 8350 are available on your system, or simply run it and check these first if a service fails (a quick availability check is sketched below).
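
As a convenience (this is not part of the repository), a quick way to verify that the ports are free is to try binding to each of them:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheckSketch {
	public static void main(String[] args) {
		int[] ports = { 80, 8000, 8001, 8100, 8120, 8150, 8200, 8300, 8350 };
		for (int port : ports) {
			// If binding succeeds, nothing else is currently using the port
			try (ServerSocket socket = new ServerSocket(port)) {
				System.out.println("Port " + port + " is available");
			} catch (IOException e) {
				System.out.println("Port " + port + " is in use (or, for port 80 on Linux, may require elevated privileges)");
			}
		}
	}
}
```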

  • Make sure you have Java (I currently use OpenJDK 11).
  • Check out the entire repository.
  • Import the root pom into your favorite IDE.
  • A database server is no longer required: the usage-data service now uses SQLite for easier setup and execution, so simply running the service should be enough (previously it required a MySQL server with a database named "usage_data").
  • If you are running Linux, make sure the maximum number of open files is larger than 200K (check with ulimit -n).
  • Start the Config service from the config folder (net.thomas.portfolio.config.ConfigApplication).
  • Start the infrastructure service from the infrastructure folder (net.thomas.portfolio.infrastructure.InfrastructureMasterApplication).
  • Start the Admin service from the admin folder (net.thomas.portfolio.infrastructure.AdminApplication).
  • Start the Proxy service from the proxy folder (net.thomas.portfolio.infrastructure.ProxyApplication).
  • Run each service using its respective net.thomas.portfolio.*.*ServiceApplication.java. Order should not matter if you start all of them shortly after each other; otherwise, make sure to start the HbaseIndexingService first.
  • Personally, I use a launch group in Eclipse to start the services all at once (with a delay of 10 seconds after starting the config server and the infrastructure master).
  • Note that the HbaseIndexing service isn't fully ready until it has created the sample data, stored it, and run the processing step on it. It will tell you in the log when the initialization is done. The schema can be pulled immediately after Tomcat has started.

Now you can do as described above, but locally (without HTTPS, though).
Note that unless you also set up a local reverse proxy, you will need to specify ports directly when running queries (as opposed to the examples above). For instance, the HBASE service should be running at localhost:8120/HbaseIndex/. You should be able both to go through the proxy on port 80 and to go directly to each service using its port.
Also note that GraphiQL requires GraphQL and itself to be running at the root level (localhost:8100/graphql); GraphQL is now also available behind the /Nexus context path.

Status at the moment

The project contains a set of services:

Nexus

GraphQL service that enables easy access to the other services

Source

Settings
  • Port: 8100
  • Technologies: Graph(i)QL, Spring
  • User: service-user
  • Password: password
  • Endpoints:
    • /Nexus/schema.json
    • /Nexus/graphql
    • /Nexus/graphiql

This is where all the other services are tied together. Using the GraphiQL interface, it is possible to interact transparently with the data in the other services.
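
As a rough sketch of how a query could be sent to the GraphQL endpoint from plain Java 11 (the query fields here are hypothetical; the real ones can be browsed via GraphiQL or /Nexus/schema.json):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class NexusQuerySketch {
	public static void main(String[] args) throws Exception {
		// Hypothetical query; the actual field names are defined by the schema
		String body = "{\"query\": \"{ suggest(simpleRepresentation: \\\"example\\\") { uid } }\"}";
		String authorization = "Basic " + Base64.getEncoder().encodeToString("service-user:password".getBytes());
		HttpRequest request = HttpRequest.newBuilder()
				.uri(URI.create("http://localhost:8100/graphql"))
				.header("Authorization", authorization)
				.header("Content-Type", "application/json")
				.POST(HttpRequest.BodyPublishers.ofString(body))
				.build();
		HttpResponse<String> response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
		System.out.println(response.body());
	}
}
```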

HBASE Index

Fake HBASE service, allowing for model-discovery and emulating lookups in HBASE tables

Source

Settings
  • Port: 8120
  • Technologies: Spring
  • User: service-user
  • Password: password
  • Endpoints: Swagger

The purpose of this service is to emulate lookups into HBASE tables. This should be seen as an index built on top of whatever data is ingested into the infrastructure, with a data model representing the content of the index.

When started, it will generate a sample data set (based on a random seed, default 1234) using a sample data model, both of which are exposed to the infrastructure on demand.
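
The actual generator is more involved, but the core idea of getting a deterministic sample data set from a seed can be sketched like this (the data produced here is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SampleDataSketch {
	public static void main(String[] args) {
		// The same seed yields the same sample data set on every startup (default 1234)
		Random random = new Random(1234);
		List<String> sampleSelectors = new ArrayList<>();
		for (int i = 0; i < 10; i++) {
			sampleSelectors.add("user" + random.nextInt(1000) + "@example.com");
		}
		System.out.println(sampleSelectors);
	}
}
```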

Legal

Legal service responsible for validating legal requirements and audit logging model access

Source

Settings
  • Port: 8350
  • Technologies: JavaScript, React, WebSocket, Spring
  • User: service-user
  • Password: password
  • Endpoints

The purpose here is to enclose all logic related to legal requirements into a separate service, to allow the developers working with the legal department to focus on a simple API instead of actual usage scenarios.

It has two responsibilities (a rough interface sketch follows the list below):

  • allow the system to check the legality of a query before actually executing it
  • audit log the execution of a query into protected data on demand
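
A hypothetical sketch of the kind of API this results in (not the actual interface from the repository) could look like this:

```java
// Hypothetical sketch of the contract the Legal service offers the rest of the
// system; the real service exposes equivalent operations over REST.
public interface LegalServiceSketch {
	/** Check whether a query against protected data may legally be executed. */
	Legality checkLegalityOfQuery(String selectorUid, String justification);

	/** Audit log that a query against protected data was actually executed. */
	boolean auditLogQuery(String selectorUid, String justification);

	enum Legality {
		LEGAL, ILLEGAL, UNKNOWN
	}
}
```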

Render

Rendering service that can look up data in the HBASE index and then render it in a meaningful manner

Source

Settings
  • Port: 8150
  • Technologies: Spring
  • User: service-user
  • Password: password
  • Endpoints: Swagger

One could argue that this service contains functionality that should be a part of the HBASE index service, but I have chosen to separate them, because I expect that the teams maintaining either service will be very different (HBASE specialists vs. front-end specialists). Still, it is heavily model dependent, and will likely need to be updated any time the HBASE index service is updated.

Note that HTML rendering has not been implemented yet.
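
As an illustration of the kind of contract this separation implies (hypothetical names, not the repository's actual API):

```java
// Hypothetical sketch of a model-dependent renderer contract; the real service
// exposes similar operations over REST rather than as a Java interface.
public interface RendererSketch {
	/** Short, human-readable representation of an entity, e.g. for suggestions and lists. */
	String renderAsSimpleRepresentation(String entityUid);

	/** Plain-text representation, e.g. for previews of documents. */
	String renderAsText(String entityUid);

	/** HTML representation; not implemented yet in the actual service. */
	String renderAsHtml(String entityUid);
}
```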

Usage data

Service responsible for storing and showing user interaction with the model

Source

Settings
  • Port: 8200
  • Technologies: SQLite, jOOQ, Spring
  • User: service-user
  • Password: password
  • Endpoints: Swagger

To enable storage of data about usage of the data model, this service employs an SQLite backend and exposes two endpoints for manipulating its contents. jOOQ is used as a middle layer to enable compile-time validation of SQL queries.

On startup it will open the SQLite database specified in the properties and make sure it can access a database schema with the name "usage-data". If the database doesn't contain one, it will be created with the necessary tables. If it does, the service will assume that it has the correct structure and attempt to use it.
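
The repository presumably relies on jOOQ's generated classes for the compile-time checking mentioned above; the sketch below instead uses jOOQ's plain SQL DSL against SQLite, with made-up table and column names, and assumes the sqlite-jdbc driver and jOOQ are on the classpath:

```java
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class UsageDataSketch {
	public static void main(String[] args) throws Exception {
		// Hypothetical SQLite file and table; the real names are defined by the service
		try (Connection connection = DriverManager.getConnection("jdbc:sqlite:usage-data.db")) {
			DSLContext context = DSL.using(connection, SQLDialect.SQLITE);
			context.execute("CREATE TABLE IF NOT EXISTS user_accessed_document (username TEXT, document_uid TEXT)");
			// Store a usage event
			context.insertInto(table("user_accessed_document"), field("username"), field("document_uid"))
					.values("service-user", "AA01BB02")
					.execute();
			// Read everything back
			System.out.println(context.select().from(table("user_accessed_document")).fetch());
		}
	}
}
```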

Analytics

Fake service representing interaction with the analytical information in the company

Source

Settings
  • Port: 8300
  • Technologies: Spring
  • User: service-user
  • Password: password
  • Endpoints: Swagger

Another fake service, this time representing the existing analytical knowledge in the company, outside this infrastructure.

Eureka

Simple discovery service implementation with a Hystrix UI

Source

Settings
  • Port: 8000
  • Technologies: Eureka, Hystrix, Spring
  • Endpoints:
    • /Infrastructure/

This is my discovery service for the infrastructure. It doesn't really contain any code, just configuration. All standard Eureka endpoints are accessible through the sub-context-path /eureka. It also enables the Hystrix UI, if you have access to a Hystrix stream you want to monitor.
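
Since it is mostly configuration, the minimal Spring Cloud boilerplate for a Eureka server is enough to convey the idea (a sketch, not necessarily identical to the repository's class):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

// Minimal Eureka server; ports, context paths and peer settings live in the
// externalized configuration served by the Config service.
@SpringBootApplication
@EnableEurekaServer
public class DiscoveryServerSketch {
	public static void main(String[] args) {
		SpringApplication.run(DiscoveryServerSketch.class, args);
	}
}
```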

Admin

Service for monitoring the individual services in the infrastructure

Source

Settings
  • Port: 8001
  • Technologies: Spring Boot Admin, Spring
  • Endpoints:
    • /Admin/

The Admin service / UI gives easy access to data from the actuator endpoints. It uses the discovery service both for service discovery and for looking up context paths.

Zuul

Reverse proxy for hiding ports, handling HTTPS and simplifying some endpoints

Source

Settings
  • Port: 443
  • Technologies: Spring, Zuul, WebSocket
  • Endpoints:
    • All of the above

Reverse proxy for the entire setup and single point of access to the services. Also responsible for diverting HTTP calls to HTTPS and is the only encrypted service in the infrastructure at the moment.
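
The proxy, too, is mostly configuration; a minimal Zuul-based reverse proxy typically boils down to something like this sketch, with the actual routes defined in the externalized configuration:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

// Minimal Zuul reverse proxy; the routes (e.g. forwarding /Nexus/** to the Nexus
// service discovered through Eureka) are configured in properties, not in code.
@SpringBootApplication
@EnableZuulProxy
public class ReverseProxySketch {
	public static void main(String[] args) {
		SpringApplication.run(ReverseProxySketch.class, args);
	}
}
```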

Development strategy and major design principles used

This project was created as a greenfield project, but it also contains code fragments from several older projects. Every component added has been (re-)written following the approach and principles below.

Prioritization

In general, my focus is on getting features to market as soon as possible, both for the added value and to gain feedback early. Secondly, I prioritize expanding the core of stability in the more mature part of the system. I try to observe the following:

  • Never get sucked down by irrelevant details; time spent working on one component is time not spent working on everything else
  • Make it work at all, before trying to make it nice
  • Early feedback is key for quality; rather than maturing a feature extensively, throw it out of the nest and check whether it can fly
  • Write tests for units that worry you right away, do the rest when the units have matured reasonably
  • Consider (and preferably fix) all warnings and bugs as soon as possible
  • Whenever possible, write tests for public bugs before fixing them to guarantee recurrences will be caught and fixed
  • If someone else already made it for you, consider if using their solution is better than building your own
  • Adding features is gold, but remember to also go back and clean up the code; try to always keep up with changes in the immediate code base

Development approach

I use a modern IDE (preferably Eclipse) and a proper build pipeline (IDE -> VCS -> build server -> artifact store -> deployment server -> execution environment) when possible, and build my own when not.
I value using static code analyzers like FindBugs, pylint (for Python) and the IDE itself, and coverage tools like Emma. Finally, I use debugging for simple problems and profiling (e.g. VisualVM) to track down and fix the harder issues.
For bug-tracking and issue management, I have experience using JIRA and some companion Atlassian products, but in this project I use paper for now.

These are just tools, though. When I develop a new feature, the steps I go through are often the following:

  1. Plan layout in relation to the existing infrastructure based on domain knowledge and feature requirements
  2. Define points of contact with the existing system and planned (near future) sister components
  3. Build prototype component super-structure, faking the details (as little as is required to emulate the actual component)
  4. Deploy the system and make sure the fake component behaves as intended; change whatever doesn't and make the fake "production ready"
  5. Either check for third-party tools that match / can be used for parts of the implementation, or reason about why it should be implemented directly
  6. Either replace the fake with the integration of a third-party tool or implement the details
  7. Deploy the component with the rest of the system and check that everything works
  8. Cleanup obvious omissions and do general refactoring of the system
  9. Over time, visit the component once in a while and check if anything should be refactored based on other feature implementations

If I am "just" adding features to an existing component, many of the steps are pretty light-weight or perhaps even skipped, but it is still the primary approach I use.

Design principles

When I write code, there is a set of principles that I try to respect more than others. "Try", because it is a process, not a goal, but I still value these highly. If you already know "Clean Code" by Robert C. Martin (Uncle Bob), much of this will seem familiar. I do not agree, however, with the principle that you should always prioritize code quality over development speed; rather, I believe both to be equally important. I do not expect you to agree with all of these, but I stand by them and update them as I evolve.

  • Only start feature implementations that can be completed in at most a few days, and preferably in a matter of hours; for larger features, make the sub-features production ready before working on the complex features that depend on them
  • Write the code to be read, using comments to elaborate invisible details or APIs, but not to explain the code itself
  • Use meaningful names; use the domain, spend time choosing them, don't use personal acronyms, and refactor when encountering strange or misleading names
  • Keep it small; start large, but work towards short functions, short classes and split into meaningful sub-classes when appropriate
  • Consider every single warning (both during development and when using static analysis) in the code and decide how to handle it; leave nothing for the build server
  • Code coverage is a tool, not a goal; 100% in itself is irrelevant, but the correct level of testing makes refactoring easy
  • Perfection is a beacon, continuous improvement is a path; use the Boy Scout Rule, but keep moving in the right direction
  • Make it work at all before worrying about details and niceness
  • Stay agile and "light-weight" for as long as possible; prioritize change initially, and add documentation and fine-grained tests when the right level of maturity has been reached
  • SOLID
  • KISS
  • When checking in, try to make it clear what is changed, e.g. by using a ticket id and a few lines about the specific change if it deviates from the ticket description
  • Commit often, merge often, deploy often, get feedback often; argue why not rather than why