etlog - eduroam traffic log analysis

Basic info

etlog can be accessed at etlog.cesnet.cz. It gathers and analyzes the national RADIUS log files generated by the eduroam service and presents them to users. etlog is intended for both users and administrators.

Some of the main reasons to create etlog were:

  • create a generic interface for processing, analyzing and searching the RADIUS log files
  • create a system for generating statistics and reports
  • create a system for trend analysis, which can signal service problems
  • create a system for anomaly detection (authentication errors, device or identity theft, ...)

etlog is a web application consisting of Node.js, the Express web application framework and MongoDB.

Server setup

The application is set up on Debian jessie. It runs as the user etlog and its root is /home/etlog/etlog/. It listens for incoming HTTP connections on port 8080. An Apache web server sits in front of the application and acts as a reverse proxy for it.

The main purpose of putting Apache in front of the application is authentication. Apache uses the shibd module for authentication in the Czech identity federation eduid.cz.

User setup

Add an unprivileged user for the application:

adduser etlog

Network setup

The application runs as an unprivileged user, so it cannot bind to the standard HTTP and HTTPS ports; port 8080 is used instead. The Apache web server sits in front of the application web server and proxies all incoming requests to it. Automatic redirection from port 80 to port 443 is also handled by Apache.

Shibboleth setup

Documentation used for the Shibboleth setup is located at http://www.eduid.cz/cs/tech/sp/shibboleth.

IdP attributes

etlog assumes that the user's eduroam identity is the same as their eduPersonPrincipalName. If that is not true, the user's home IdP can implement the eduroamUID attribute, which contains the user's eduroam identity (or multiple identities). If the attribute is not implemented by the user's home IdP, their eduPersonPrincipalName is used as their eduroam identity in etlog. The user's home IdP must release the attribute at least for the entityID https://etlog.cesnet.cz/shibboleth. An implementation on Shibboleth IdP 3 may look like:

<AttributeDefinition id="eduroamUID" xsi:type="ScriptedAttribute">
  <Dependency ref="uid" />
  <AttributeEncoder xsi:type="SAML1String" name="http://eduroam.cz/attributes/eduroamUID" />
  <AttributeEncoder xsi:type="SAML2String" name="http://eduroam.cz/attributes/eduroamUID" friendlyName="eduroamUID" />
  <Script>
    <![CDATA[
      if (typeof uid != "undefined" && uid != null) {
          eduroamUID.addValue(uid.getValues().get(0) + "@eduroam.%{idp.scope}");
      }
    ]]>
  </Script>
</AttributeDefinition>

At the SAML level, the messages can look like:

<saml2:Attribute FriendlyName="eduroamUID"
                Name="http://eduroam.cz/attributes/eduroamUID"
                NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri">

 <saml2:AttributeValue xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                       xsi:type="xsd:string">
                       user@org.eu</saml2:AttributeValue>
 <saml2:AttributeValue xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                       xsi:type="xsd:string">
                       user@org.cz</saml2:AttributeValue>
</saml2:Attribute>

Apache setup

Apache in conjunction with Shibboleth is responsible for authenticating users into the application. After successful authentication, Apache proxies the request to the application web server.

Installation of apache webserver:

apt-get install apache2 libapache2-mod-proxy-html

Set up the server certificate in /etc/ssl/certs/etlog.cesnet.cz.crt.pem and the private key in /etc/ssl/private/etlog.cesnet.cz.key.pem.

Add the intermediate certificate to /etc/ssl/certs/etlog.cesnet.cz.crt.pem:

cd /tmp
wget https://pki.cesnet.cz/certs/TERENA_SSL_CA_3.pem
cat TERENA_SSL_CA_3.pem >> /etc/ssl/certs/etlog.cesnet.cz.crt.pem
rm TERENA_SSL_CA_3.pem
cd

SSL default vhost and module are enabled by:

a2enmod ssl
a2dissite 000-default
a2ensite default-ssl
service apache2 restart

Proxy is enabled by:

a2enmod proxy
a2enmod proxy_http
service apache2 restart

Headers and remote ip are enabled by:

a2enmod headers
a2enmod remoteip
service apache2 restart

Configuration for default ssl apache vhost is in /etc/apache2/sites-enabled/default-ssl.conf. Set the configuration as below:

<VirtualHost *:80>
    ServerAdmin info@eduroam.cz
    ServerName etlog.cesnet.cz
    Redirect permanent "/" "https://etlog.cesnet.cz/"
</VirtualHost>

<IfModule mod_ssl.c>

    # application virtualhost
    <VirtualHost _default_:443>
        ServerAdmin info@eduroam.cz
        ServerName etlog.cesnet.cz
        DocumentRoot /var/www/html

        ErrorLog ${APACHE_LOG_DIR}/etlog_error.log
        CustomLog ${APACHE_LOG_DIR}/etlog_access.log combined
        SSLEngine on

        SSLCertificateFile  /etc/ssl/certs/...
        SSLCertificateKeyFile /etc/ssl/private/...

        BrowserMatch "MSIE [2-6]" \
                nokeepalive ssl-unclean-shutdown \
                downgrade-1.0 force-response-1.0
        # MSIE 7 and newer should be able to use keepalive
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

        # HSTS
        Header always set Strict-Transport-Security "max-age=63072000; includeSubdomains;"


        <Location />
            # Shibboleth configuration for /
            AuthType shibboleth
            Require shibboleth
            ShibRequestSetting requireSession 1

            # pass the REMOTE_USER environment variable to the application
            RequestHeader set REMOTE_USER %{REMOTE_USER}s

            # to set additional headers, add a directive for each specific environment variable
            RequestHeader set entitlement %{entitlement}e
            RequestHeader set eduroamUID %{eduroamUID}e

            # proxy
            ProxyPass http://127.0.0.1:8080/
            ProxyPassReverse http://127.0.0.1:8080/
        </Location>

        # authentication exemption for .well-known
        <Location "/.well-known/security.txt">
            AuthType shibboleth
            Require shibboleth
            ShibRequestSetting requireSession 0
        </Location>

        ProxyRequests Off
        RemoteIPHeader X-Forwarded-For
        RequestHeader set X-Forwarded-Proto "https"
    </VirtualHost>

    # virtualhost for NRPE
    <VirtualHost 127.0.0.1:443>
        ServerAdmin info@eduroam.cz
        ServerName etlog.cesnet.cz
        DocumentRoot /var/www/html

        ErrorLog ${APACHE_LOG_DIR}/etlog_error.log
        CustomLog ${APACHE_LOG_DIR}/etlog_access.log combined
        SSLEngine on

        SSLCertificateFile  /etc/ssl/certs/...
        SSLCertificateKeyFile /etc/ssl/private/...

        BrowserMatch "MSIE [2-6]" \
                nokeepalive ssl-unclean-shutdown \
                downgrade-1.0 force-response-1.0
        # MSIE 7 and newer should be able to use keepalive
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

        # HSTS
        Header always set Strict-Transport-Security "max-age=63072000; includeSubdomains;"

        <Location />
            # proxy
            ProxyPass http://127.0.0.1:8080/
            ProxyPassReverse http://127.0.0.1:8080/
        </Location>

        ProxyRequests Off
        RemoteIPHeader X-Forwarded-For
        RequestHeader set X-Forwarded-Proto "https"
    </VirtualHost>

    # virtualhost for API queries and for ermon
    <VirtualHost etlog.cesnet.cz:8443>
        ServerAdmin info@eduroam.cz
        ServerName etlog.cesnet.cz
        DocumentRoot /var/www/html

        ErrorLog ${APACHE_LOG_DIR}/etlog_error.log
        CustomLog ${APACHE_LOG_DIR}/etlog_access.log combined
        SSLEngine on

        SSLCertificateFile  /etc/ssl/certs/...
        SSLCertificateKeyFile /etc/ssl/private/...

        BrowserMatch "MSIE [2-6]" \
                nokeepalive ssl-unclean-shutdown \
                downgrade-1.0 force-response-1.0
        # MSIE 7 and newer should be able to use keepalive
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

        # HSTS
        Header always set Strict-Transport-Security "max-age=63072000; includeSubdomains;"

        <Location />
            # proxy
            ProxyPass http://127.0.0.1:8080/
            ProxyPassReverse http://127.0.0.1:8080/
        </Location>

        ProxyRequests Off
        RemoteIPHeader X-Forwarded-For
        RequestHeader set X-Forwarded-Proto "https"
    </VirtualHost>
</IfModule>

Set listening ports in /etc/apache2/ports.conf:

# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default.conf

Listen 80

<IfModule ssl_module>
	Listen 443
	Listen 8443
</IfModule>

<IfModule mod_gnutls.c>
	Listen 443
	Listen 8443
</IfModule>

# vim: syntax=apache ts=4 sw=4 sts=4 sr noet

Configure apache log rotation in /etc/logrotate.d/apache2:

/var/log/apache2/*.log {
	monthly
	missingok
	rotate 12
	compress
	delaycompress
	notifempty
	create 640 root adm
	sharedscripts
	postrotate
                if /etc/init.d/apache2 status > /dev/null ; then \
                    /etc/init.d/apache2 reload > /dev/null; \
                fi;
	endscript
	prerotate
		if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
			run-parts /etc/logrotate.d/httpd-prerotate; \
		fi; \
	endscript
}

Additional settings are needed according to https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig. The page says:

Finally, on non-Windows systems you should make sure Apache is configured in so-called "worker" mode, using the "worker" MPM, either via a setting in an OS-supplied file like /etc/sysconfig/httpd or in the Apache configuration directly. Many servers come incorrectly configured in "prefork" mode, which emulates Apache 1.3's process model and causes vastly greater resource usage inside the shibd daemon.

Enable mpm-worker by:

a2dismod mpm_prefork
a2dismod mpm_event
a2enmod mpm_worker
service apache2 restart
apachectl -M | grep worker

Syslog setup

RADIUS data are acquired through syslog. Installation and configuration:

cd /etc/ssl/certs/
wget https://crt.cesnet-ca.cz/CESNET_CA_Root.pem
wget https://crt.cesnet-ca.cz/CESNET_CA_3.pem
c_rehash
apt-get install syslog-ng
cat > /etc/syslog-ng/conf.d/etlog-fticks.conf
source net {
  tcp(
    port(1999)
    tls( ca_dir("/etc/ssl/certs")
    key-file("/home/etlog/etlog/cert/etlog.cesnet.cz.key.pem")
    cert-file("/home/etlog/etlog/cert/etlog.cesnet.cz.crt.pem"))
  );
};

destination fticks { file("/home/etlog/logs/fticks/fticks-$YEAR-$MONTH-$DAY" owner("etlog") group("etlog") perm(0600)); };

log { source(net); destination(fticks); };
^D
service syslog-ng restart

su - etlog
mkdir -p ~/logs/{fticks,transform,mongo,invalid_records,systemd,ldap}

The code above installs the certificates required for the syslog TLS connection. The next part installs syslog-ng and creates its configuration. The last part creates the directories /home/etlog/logs/fticks, /home/etlog/logs/transform, /home/etlog/logs/mongo, /home/etlog/logs/invalid_records, /home/etlog/logs/systemd and /home/etlog/logs/ldap. Log files created by syslog are located in /home/etlog/logs/fticks. LDAP related files are in /home/etlog/logs/ldap.

Systemd log files

Systemd is used to integrate the application into the system. Logging needs to be configured to capture its output to log files:

cat >> /etc/syslog-ng/conf.d/etlog-logs.conf

filter f_etlog { facility(local0); };

destination etlog_logs { file("/home/etlog/logs/systemd/log-$YEAR-$MONTH-$DAY" owner("etlog") group("etlog") perm(0600)); };

log { source(s_src); filter(f_etlog); destination(etlog_logs); };
^D

Application log files are located in /home/etlog/logs/systemd/.

Cron setup

Cron is used to run tasks periodically. Application logic tasks are set up inside the application; importing of incoming logs is set up in the user's crontab.

System

The user's crontab can be edited with crontab -e. It contains the following jobs:

| command | interval | description |
|---------|----------|-------------|
| /home/etlog/etlog/scripts/data_import.sh | every 5 minutes | new data importing |
| /home/etlog/etlog/scripts/ldap/admins.sh | every 5 minutes | ldap synchronization |
| /home/etlog/etlog/scripts/ldap/realms.sh | every day at 0:30 | synchronization of all known Czech realms |
| /home/etlog/etlog/scripts/invalid_records.sh | every day at 1:00 | generating files with invalid records |
| /home/etlog/etlog/scripts/invalid_records_mail.sh | every Monday at 6:00 | sending the report about invalid records |
| /home/etlog/etlog/scripts/archive.sh | every Monday at 6:05 | archiving old log files |
| /home/etlog/etlog/scripts/detection_data/create_detection_data.sh &>/dev/null | every Monday at 6:10 | generating login count graphs |
| /home/etlog/etlog/scripts/concurrent_users/update_data.sh | every Saturday at 4:30 | generating old concurrent users data |

Crontab contents:

*/5 *  *   *   *     /home/etlog/etlog/scripts/data_import.sh
*/5 *  *   *   *     /home/etlog/etlog/scripts/ldap/admins.sh
30  0  *   *   *     /home/etlog/etlog/scripts/ldap/realms.sh
0   1  *   *   *     /home/etlog/etlog/scripts/invalid_records.sh
0   6  *   *   1     /home/etlog/etlog/scripts/invalid_records_mail.sh
5   6  *   *   1     /home/etlog/etlog/scripts/archive.sh
10  6  *   *   1     /home/etlog/etlog/scripts/detection_data/create_detection_data.sh &>/dev/null
30  4  *   *   6     /home/etlog/etlog/scripts/concurrent_users/update_data.sh

Node.js

Setup is defined in cron.js. The table below defines how the tasks are run.

Every task in the table below generates data for the collection of the same name.

| task name | interval |
|-----------|----------|
| failed_logins | every day at 02:05:00 |
| mac_count | every day at 02:15:00 |
| roaming | every day at 02:20:00 |
| shared_mac | every day at 02:25:00 |
| realm_logins | every day at 02:35:00 |
| visinst_logins | every day at 02:40:00 |
| heat_map | every day at 02:45:00 |
| unique_users | every day at 02:55:00 |
| concurrent_users | every day at 03:10:00 |
| users_mac | every 15 minutes |

Other tasks:

| task name | interval |
|-----------|----------|
| retention | every day at 03:00:00 |

The retention task deletes data older than 365 days from the logs collection.
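
A minimal sketch of what this retention job can look like in the mongo shell (the actual implementation lives in cron/delete_logs.js; the 365-day cutoff is taken from the text above):

// delete all log records older than 365 days (illustrative sketch)
var cutoff = new Date(Date.now() - 365 * 24 * 60 * 60 * 1000);
db.logs.remove({ timestamp : { $lt : cutoff } });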

A monthly report about failed logins is sent at 5:59 on the first day of every month. For details see reports.

Mail setup

Mail is handled by the postfix mail server. The postfix configuration type is set to "Internet site". Listening only on the localhost address (for both IPv4 and IPv6) is configured with inet_interfaces = localhost in /etc/postfix/main.cf.

Node.js

Setup is done with the nodemailer package in the file mail.js.

Packages

These packages are necessary for etlog to run:

openssl git tmux htop iptables-persistent curl make syslog-ng gawk logtail postfix mailutils bc duply ncftp lftp libkrb5-dev libapache2-mod-shib2 apache2 libapache2-mod-proxy-html apache2-bin apache2-data apache2-utils ldapscripts ldap-utils pwgen sharutils libdate-manip-perl libxml-libxml-perl libgps-point-perl libjson-perl

Other special packages, along with their installation, are described below.

Backup

The duply package is used for system backup. Configuration is in /etc/duply/system/conf. The list of files to include in or exclude from the backup is defined in /etc/duply/system/exclude.

Backup is executed from root's crontab. The backup script runs every day; for details see root's crontab file.

Database

mongodump is used for database backup; it is a utility for creating a binary export of the contents of a database. The script /etc/duply/system/pre is launched before every system backup and performs the database backup using mongodump. The binary export of the database is located in /home/etlog/backup/dump.

MongoDB

MongoDB is a document oriented database.

Installation

At the time of writing this guide, no official documentation for installation on Debian jessie is available.

apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/debian jessie/mongodb-org/3.2 main" | tee /etc/apt/sources.list.d/mongodb-org-3.2.list
apt-get update
apt-get install mongodb-org
systemctl enable mongod
service mongod start

Configuration

Disable transparent huge pages (THP) by following the guide in the official docs. THP is disabled using an init script. No further configuration should be needed.

Time data

MongoDB stores all time data using the Date data type, which stores time in UTC. This presents an issue when incoming data contain local time, which is offset from UTC. The official MongoDB docs say:

MongoDB stores times in UTC by default, and will convert any local time representations into this form. Applications that must operate or report on some unmodified local time value may store the time zone alongside the UTC timestamp, and compute the original local time in their application logic.

Data will be converted back to the original time when presented to the user. Conversion from UTC to local time in the aggregation pipeline can be done using:

req.db.logs.aggregate([ { $sort : { timestamp : 1 } }, { $limit : 1 },
{ $project : { timestamp : 1, _id : 0 } } ],
function(err, doc) {
  ret.logs.min = convert(doc[0].timestamp).toISOString();
});


// --------------------------------------------------------------------------------------
// convert UTC to localtime based on input
// --------------------------------------------------------------------------------------
function convert(date)
{
  var d = new Date(date);
  // getTimezoneOffset() returns the offset in minutes (-60 or -120,
  // depending on daylight saving time), so multiply by 60 * 1000
  // to get milliseconds
  d.setTime(d.getTime() + (-1 * d.getTimezoneOffset() * 60 * 1000));
  return d;
}

Usage

The database can be accessed with the mongo command.
Data are divided into databases, just as in SQL databases. Each database consists of collections, which are the equivalent of SQL tables. Collections consist of documents, which use BSON notation, based on JSON.

Basic commands:

show databases lists all available databases.

use my_database switches the current database to my_database.

show collections lists the collections of the current database.

db.my_collection.find({}) displays all documents in my_collection.

db.my_collection.find({}).limit(5) displays 5 documents from my_collection.

db.my_collection.find({}).limit(5).pretty() displays 5 nicely formatted documents from my_collection.
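
Put together, a short mongo shell session over the etlog database may look like this (collection and field names are taken from the sections below):

use etlog
show collections
// five most recent raw log records
db.logs.find({}).sort({ timestamp : -1 }).limit(5).pretty()
// all known realms, without the internal _id field
db.realms.find({}, { realm : 1, _id : 0 })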

Node.js

Node.js is server-side JavaScript. Because the version of Node.js available in Debian jessie is very old (0.10.29~dfsg-2), a newer version must be installed. At the time of writing this guide, the current version of Node.js is 6.5.

Installation

apt-get install curl
curl -sL https://deb.nodesource.com/setup_6.x | bash -
apt-get install nodejs

Application internals

etlog consists of Node.js, the Express web application framework and MongoDB. It uses many auxiliary JavaScript modules. All the necessary modules, including their specific versions, can be found in the file package.json.

Database

The application uses the database etlog, which is separated into several collections.

Collections

In the tables below, the note column is only explanatory; it is not actually present in the database. Every document also has a field _id, which serves internal MongoDB purposes and is not shown in the tables below.

logs

Collection represents raw RADIUS log records transformed to JSON format. For details on the data transformation see scripts/fticks_to_bson.sh.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp of authentication |
| realm | String | domain part of the username |
| viscountry | String | visited country |
| visinst | String | visited institution |
| csi | String | MAC address |
| pn | String | username |
| result | String | result of authentication |

users_mac

Collection defines the binding between a user and all MAC addresses they used for successful authentication to eduroam.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| username | String | username |
| addrs | Array | array of the user's MAC addresses |

mac_count

Collection contains a mapping of users to the MAC addresses they used for successful authentication, for every day. Each user with more than 2 devices (assuming a notebook and a smartphone) is inserted. The address count and all used MAC addresses are also available.

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.
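
For illustration, such a day-level timestamp can be produced in JavaScript like this (a sketch; the exact code in the cron tasks may differ):

// normalize a date to 00:00:00.000 of the corresponding day
function dayTimestamp(date) {
  var d = new Date(date);
  d.setHours(0, 0, 0, 0);   // local midnight; the cron tasks decide the zone handling
  return d;
}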

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| username | String | username |
| count | Number | MAC address count |
| addrs | Array | array of MAC addresses |
| timestamp | Date | timestamp |

roaming

Collection contains roaming related data. For every known institution there is the number of roamings provided and roamings used, for every day.

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| inst_name | String | name of the institution |
| used_count | Number | count of the institution's users authenticated |
| provided_count | Number | count of authentications provided |
| timestamp | Date | timestamp |

failed_logins

Collection contains information about users who failed to authenticate, for every day. Any user with at least one unsuccessful authentication is inserted. Counts of both successful and unsuccessful authentications are available, along with a field representing their ratio (see below).

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| username | String | username |
| timestamp | Date | timestamp |
| fail_count | Number | count of failed login attempts |
| ok_count | Number | count of successful login attempts |
| ratio | Number | ratio of fail_count to (ok_count + fail_count) |

realm_logins

Collection contains login counts for realms. Both successful and unsuccessful logins are counted; the values are saved in ok_count and fail_count. Unique counts are also gathered and saved in grouped_ok_count and grouped_fail_count.

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp |
| realm | String | realm |
| ok_count | Number | count of successful logins |
| grouped_ok_count | Number | unique count of successful logins |
| fail_count | Number | count of unsuccessful logins |
| grouped_fail_count | Number | unique count of unsuccessful logins |

visinst_logins

Collection contains login counts for visited institutions. Both successful and unsuccessful logins are counted; the values are saved in ok_count and fail_count. Unique counts are also gathered and saved in grouped_ok_count and grouped_fail_count.

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp |
| realm | String | visited institution |
| ok_count | Number | count of successful logins |
| grouped_ok_count | Number | unique count of successful logins |
| fail_count | Number | count of unsuccessful logins |
| grouped_fail_count | Number | unique count of unsuccessful logins |

concurrent_users

Collection contains data about users who logged in at different locations concurrently. For a user to be in the collection, the time difference between the authentication at the first visited institution and at the second visited institution must be lower than time_needed. The value of the time_needed field is computed from geographic information about the institutions and a feasible travel speed.
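
The exact computation is not reproduced here; a minimal sketch under the stated assumptions (great-circle distance between the two institutions divided by an assumed maximum travel speed; the speed constant is hypothetical) could look like:

// great-circle (haversine) distance in meters between two points
function haversine(lat1, lon1, lat2, lon2) {
  var R = 6371000;   // Earth radius in meters
  var toRad = function (x) { return x * Math.PI / 180; };
  var dLat = toRad(lat2 - lat1);
  var dLon = toRad(lon2 - lon1);
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * R * Math.asin(Math.sqrt(a));
}

var MAX_SPEED = 250 / 3.6;   // hypothetical maximum travel speed: 250 km/h in m/s

// time in seconds needed to travel between two institutions, as in time_needed
function timeNeeded(inst1, inst2) {   // inst = { lat : ..., lon : ... }
  var dist = haversine(inst1.lat, inst1.lon, inst2.lat, inst2.lon);
  return dist / MAX_SPEED;
}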

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp |
| timestamp_1 | Date | timestamp of the first authentication |
| timestamp_2 | Date | timestamp of the second authentication |
| visinst_1 | String | first visited institution |
| visinst_2 | String | second visited institution |
| username | String | username |
| mac_address | String | MAC address related to the incident |
| time_needed | Number | time needed to travel from visinst_1 to visinst_2, in seconds |
| dist | Number | distance between the institutions, in meters |
| revision | Number | revision number |

Data update

Input data are stored in scripts/concurrent_users/inst.json. The data are converted from a source XML document containing geographical data for all institutions. The conversion script is scripts/concurrent_users/inst.pl. Each run of the cron job which computes new collection data works with this input JSON.

Data are automatically updated by the script scripts/concurrent_users/update_data.sh, which runs every Saturday at 04:30. It fetches a new version of institution.xml and compares it to the locally saved one. If the files differ, a new version of the input data is created by scripts/concurrent_users/inst.pl and used to compute new database data for the concurrent_users collection. Newly computed data reach 14 days back. A new revision of the data is also saved.

concurrent_rev

Collection contains all available revisions of the concurrent_users collection data. Data in this collection can be used to retrieve data from a specific revision.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| revisions | Array | array of all available revisions |

unique_users

Collection contains unique MAC addresses for realms, for every day. Addresses of users from the realm are in the array realm_addrs. Addresses of users from other institutions which used the realm as a visited institution are in the array visinst_addrs.

The timestamp field is populated with artificial data, only to indicate which interval the record belongs to. The inserted timestamp is a JavaScript Date for the corresponding day at 00:00:00.000 (hours, minutes, seconds, milliseconds). The lowest distinction interval for the timestamp is 24 hours.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp |
| realm | String | name of the institution |
| realm_addrs | Array | array of the institution's users' addresses |
| visinst_addrs | Array | array of addresses of users who visited the realm |

realm_admins

Collection contains an array of administrator emails for institutions. Each institution may have administrators specified.

If the institution is defined and has administrators defined, the administrator(s) get a report once every month. For more see reports.

The only exception is the realm "cz", which does not correspond to any institution. In this case, the administrator receives reports with the most significant problems found.

Realms are hierarchical; they are domain names, which use DNS. Every institution has its domain and may have subdomains. All of these are different realms. Depending on the size of a subdomain/realm, it may be efficient for each one to have separate administration.

This collection is used to determine the administrators of a specific realm. If the realm is defined, the administrators can be notified about events in their realm.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| realm | String | realm |
| admin | String | administrator's email address |
| notify_enabled | Boolean | flag whether the administrator should be notified |

Data insertion can be done easily with:

use etlog
db.realm_admins.insert({realm : "cvut.cz", admins : [ "administrator@cvut.cz" ]})

Data can be updated easily with:

use etlog
db.realm_admins.update({realm : "cvut.cz"}, { $addToSet : { admins : "administrator2@cvut.cz" } } )

Data can be erased easily with:

use etlog
db.realm_admins.remove({realm : "cvut.cz"})

realm_admin_logins

Collection maps administrators to the realms they manage: it contains each administrator's possible login identities, notification address and administered realms.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| admin_login_ids | Array | array of possible login identities |
| admin_notify_address | String | admin's email address |
| administered_realms | Array | array of realms which the admin manages |

shared_mac

Collection contains records about MAC addresses which have been used for successful authentication by multiple different usernames, for every day.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp |
| mac_address | String | MAC address |
| users | Array | array of users which have used the specific MAC address |
| count | Number | number of users |

realms

Collection contains all known realms from the Czech Republic.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| realm | String | realm |

privileged_ips

Collection contains all IP addresses which are allowed to do machine processing of the data.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| ip | String | IP address |
| hostname | String | hostname of the IP address |
| comment | String | comment |

heat_map

Collection contains data for every known realm (see realms), for every day. The attribute realm represents the institution from which the users roam. The attribute institutions is an array which contains institution names (also named realm) and counts. This array represents the institutions visited by users of the given realm on the given day.

The collection has the following structure:

| field name | data type | note |
|------------|-----------|------|
| timestamp | Date | timestamp |
| realm | String | institution name |
| institutions | Array | array of other institutions |

One record may look like:

{
        "_id" : ObjectId("5812681deb7bfee4dcde417d"),
        "realm" : "ufa.cas.cz",
        "timestamp" : ISODate("2016-10-25T22:00:00Z"),
        "institutions" : [
                {
                        "count" : 1,
                        "realm" : "utia.cas.cz"
                },
                {
                        "count" : 9,
                        "realm" : "asu.cas.cz"
                },
                {
                        "count" : 17,
                        "realm" : "ig.cas.cz"
                }
        ]
}

sessions

Collection contains user sessions. The collection data are managed by connect-mongo. Data are updated dynamically based on user authentication and role changes. All relevant information for each authenticated user is stored.
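
A minimal sketch of such a session store setup with express-session and connect-mongo (configuration values are illustrative; the authoritative setup is in app.js):

var session = require('express-session');
var MongoStore = require('connect-mongo')(session);

// store sessions in the sessions collection of the etlog database;
// "app" is the Express application
app.use(session({
  secret : 'change-me',   // illustrative value
  resave : false,
  saveUninitialized : false,
  store : new MongoStore({ url : 'mongodb://localhost/etlog' })
}));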

Indexes

Indexes are used to speed up queries. The following indexes are used:

| collection name | indexed fields |
|-----------------|----------------|
| failed_logins | _id, timestamp |
| logs | _id, timestamp, realm, visinst, pn, csi, result |
| realms | _id, realm |
| mac_count | _id, timestamp |
| shared_mac | _id, mac_address |
| privileged_ips | _id |
| realm_admins | _id |
| roaming | _id, timestamp |
| users_mac | _id, username |
| heat_map | _id, timestamp, realm |
| realm_logins | _id, timestamp, realm |
| visinst_logins | _id, timestamp, realm |
| unique_users | _id, timestamp, realm |
| concurrent_users | _id, timestamp, username |
| sessions | _id, expires |
| realm_admin_logins | _id, admin |
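
The index definitions live in scripts/indexes.js; for illustration, a few of the indexes above can be created in the mongo shell like this:

use etlog
// single-field indexes on the logs collection, as listed above
db.logs.createIndex({ timestamp : 1 })
db.logs.createIndex({ realm : 1 })
db.logs.createIndex({ pn : 1 })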

Reports

The application produces periodical reports. A report is an email sent to eduroam administrators.

Weekly reports

The weekly report is sent only to the national RADIUS administrator. It contains information about invalid records from the past week.

Monthly reports

The monthly report is sent to all administrators defined in realm_admins who have the notify_enabled flag set to true. It contains the 100 users with the most failed logins from the corresponding realm. The limit of 100 users is defined in config.js.

Configuration

Report configuration is located in the config directory. Weekly report configuration is in config/invalid_records_mail. Monthly report configuration is in config/config.js.

Privilege levels

The application has three privilege levels: user, realm admin and admin. The user is a regular user with no special permissions and is the least privileged. The realm admin is an admin of one or more specific realms. The admin is a global admin of all existing realms.

Mapping users to privileges

The authentication mechanism can provide additional information about users. Based on the provided information, a user can be recognized as a realm admin or an admin. The mapping of groups provided by the authentication process to privilege levels is defined in config/config.js.
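
The exact contents of config/config.js are not reproduced here; a hypothetical sketch of such a group-to-privilege mapping might be:

// hypothetical sketch; real group names and structure are defined in config/config.js
module.exports = {
  privileges : {
    'etlog_admins' : 'admin',        // members become global admins
    'realm_admins' : 'realm_admin'   // members become admins of their realm(s)
    // any other user falls back to the plain user level
  }
};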

Application structure

  /home/etlog/etlog                - application root
  |-- app.js                       - main application file, contains application configuration
  |-- auth.js                      - authentication configuration
  |-- bin                   
      `-- www                      - script to start the application
  |-- cert                         - certificate related files
  |-- config                       - configuration files for reports
  |-- cron                         - cron tasks
      `-- delete_logs.js           - cron task for deleting old data from logs collection
      `-- failed_logins.js         - cron task for generating failed_logins collection data
      `-- heat_map.js              - cron task for generating heat_map collection data
      `-- mac_count.js             - cron task for generating mac_count collection data
      `-- roaming.js               - cron task for generating roaming collection data
      `-- service_state.js         - cron task for checking service state in all known realms
      `-- shared_mac.js            - cron task for generating shared mac address data
      `-- succ_logins.js           - cron task for generating succ_logins collection data
      `-- users_mac.js             - cron task for mapping users and mac addresses
  |-- cron.js                      - cron task definitions
  |-- db.js                        - database and schema configuration
  |-- doc                          - documentation
  |-- error_handling.js            - middleware error handlers
  |-- gulpfile.js                  - definition of gulp tasks
  |-- javascripts                  - directory with source frontend javascript files
  |-- LICENSE                      - project LICENSE
  |-- mail.js                      - mail api
  |-- mongo_queries                - directory with mongo shell queries for debugging purposes
  |-- node_modules                 - application dependency files
  |-- package.json                 - definition of application dependencies and properties
  |-- public                       - directory for referring public files
      `-- partials                 - directory for generated html files from pug templates
  |-- README.md                    - link to doc/notes.md
  |-- request.js                   - wrapper to backend api
  |-- routes                       - application routes
  |-- routes.js                    - mapping of routes to application
  |-- scripts                      - various scripts
      `-- archive.sh               - script for archiving old data
      `-- data_import.sh           - cron script to import live data delivered by syslog
      `-- detection_data           - files for generating service state detection data
      `-- fticks_to_bson.sh        - transformation script from fticks to bson
      `-- indexes.js               - simple file with used indexes
      `-- invalid_records_mail.sh  - script for generating weekly invalid records report
      `-- invalid_records.sh       - script for generating invalid record files
      `-- old_data.sh              - script to import old data
      `-- process_old_data.js      - script to generate database data from old data
  |-- stylesheets                  - source frontend css files
  |-- views                        - templates of displayed pages
      `-- templates                - directory with pug templates for html pages

Gulp

Gulp is a build system which can be used for various tasks.

Installation

Gulp must be installed globally by the root user:

npm install -g gulp-cli

Usage

Everything gulp does is defined in gulpfile.js. Once tasks are defined, they can be run using gulp. When no particular task is given as a parameter, all tasks are run.

views

Gulp is used to generate html files from pug templating language. Pug files are in views/templates/, html output is in public/partials. Task is run by gulp views.

css

Gulp is used to generate single css file from all used css files. Source css files are in stylesheets, concatenated and minified output is in public/stylesheets/app.min.css. Task is run by gulp css.

js

Gulp is used to generate single javascript file from all used javascript files. Source javascript files are in javascripts/, concatenated output is in public/javascripts/app.js. Task is run by gulp js.
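
For illustration, the views and js tasks could be declared in a gulp 3 style gulpfile.js roughly as follows (plugin names are assumptions; the authoritative definitions are in gulpfile.js):

var gulp = require('gulp');
var pug = require('gulp-pug');         // assumed plugin for pug -> html compilation
var concat = require('gulp-concat');   // assumed plugin for file concatenation

// compile pug templates to html partials
gulp.task('views', function () {
  return gulp.src('views/templates/*.pug')
    .pipe(pug())
    .pipe(gulp.dest('public/partials'));
});

// concatenate frontend javascript into one file
gulp.task('js', function () {
  return gulp.src('javascripts/*.js')
    .pipe(concat('app.js'))
    .pipe(gulp.dest('public/javascripts'));
});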

Log files

Everything related to log files is located in /home/etlog/logs.

  /home/etlog/logs           - log files root
  |-- fticks                 - directory with log files and offset files
  |-- last_date              - file with date of last processed log file 
  |-- mongo                  - directory with log files generated by mongoimport
  |-- transform              - directory with files related to transformation from F-Ticks to BSON
      `-- err-*              - file containing line numbers of invalid records
      `-- last_*             - file containing number of last processed line of corresponding log file
  |-- invalid_records        - directory with files containing invalid records for every day
  |-- access                 - webserver access log files for every day

New data

Incoming syslog data are processed by scripts/data_import.sh and subsequently by scripts/fticks_to_bson.sh. Data are converted from the F-Ticks format to BSON.

Data are processed every 5 minutes from the user's crontab. The last date file (/home/etlog/logs/last_date) contains the date of the last processed log file. The file is updated every day, when the last part of the data is imported.

A last_* file for every processed log file contains the number of the last processed line. The file is updated on every cron job run. It is used to calculate absolute line numbers for error reporting.

Filtering

The following filtering and replacements are done on incoming data:

  • All unprintable characters (ASCII codes 0 - 31), with the exception of newline (\n, ASCII code 10), are replaced with a string representing their code. For example, a backspace (\b, ASCII code 8) will be replaced with the string "<8>".

  • Backslash ('\') and double quote ('"') are escaped: '\' becomes '\\' and '"' becomes '\"'.

  • The number of fields in each record is checked: each record must contain exactly 7 fields - REALM, VISCOUNTRY, VISINST, CSI, PN, RESULT + the initial log part.

  • Each of the attributes REALM, VISCOUNTRY, VISINST and RESULT must be separated from its value by exactly one '=' character, and its value must not be empty.

  • The VISINST value must begin with the character '1'.

  • The CSI value after normalization (all byte separators are deleted, e.g. 123456789abc) must be 12 characters long (see the sketch below).

Data which do not meet the filtering criteria are considered invalid and are not imported into the database. Information about invalid records is written to the error log files - see error log files.
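
For illustration, the CSI normalization check from the list above can be expressed in JavaScript like this (a sketch; the real check is implemented in scripts/fticks_to_bson.sh):

// sketch: normalize a MAC address and check its length
function validCsi(csi) {
  var normalized = csi.replace(/[^0-9a-fA-F]/g, '');   // drop all byte separators
  return normalized.length === 12;                     // e.g. "123456789abc"
}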

Error log files

The transform error log file has the structure filename:line number:error reason and may look like:

/home/etlog/logs/fticks/fticks-2016-10-20:681871: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:682504: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:683314: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:684293: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:685727: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:686547: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:688106: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:688122: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:688317: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:689784: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:690431: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:690872: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:692246: skipped, invalid mac address

Archiving

Older data are archived to save space. Data are archived every Monday at 6:05. F-Ticks files, transform log files and invalid-record files older than one week (up to 14 days back) are compressed using gzip.

API

api-query-params

For ease of use, a mapping from the query string to MongoDB queries is used, provided by the api-query-params module. The official documentation provides full usage information. The module is slightly modified to support various timestamp formats and to map them correctly to the backend API.
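
For illustration, the module turns a query string into MongoDB query parts roughly like this (a sketch; depending on the module version, the require may need .default):

var aqp = require('api-query-params');

var query = aqp('timestamp>2016-10-01&fail_count>100&sort=-fail_count&limit=10');
// query.filter -> { timestamp : { $gt : <Date> }, fail_count : { $gt : 100 } }
// query.sort   -> { fail_count : -1 }
// query.limit  -> 10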

The table below defines operator usage:

| URI | example | explanation |
|-----|---------|-------------|
| key=val | type=public | equal |
| key>val | count>5 | greater |
| key>=val | rating>=9.5 | greater or equal |
| key<val | createdAt<2016-01-01 | lower |
| key<=val | score<=-5 | lower or equal |
| key!=val | status!=success | not equal |
| key=val1,val2 | country=GB,US | equal to any of the listed values |
| key!=val1,val2 | lang!=fr,en | equal to none of the listed values |
| key | phone | exists |
| !key | !email | does not exist |
| key=/value/<opts> | email=/@gmail\.com$/i | regex match |
| key!=/value/<opts> | phone!=/^06/ | regex not match |

Other operators:

| operator | example | explanation |
|----------|---------|-------------|
| skip | skip=10 | skip 10 items before presenting to the user |
| limit | limit=10 | limit the query to 10 items |
| sort | sort=key | sort ascending by key |
| sort | sort=-key | sort descending by key |
| sort | sort=-key1,-key2 | sort descending by both key1 and key2 |

Application API:

| URL | query string variables | note |
|-----|------------------------|------|
| /api/search/ | timestamp, pn, [ csi, result, realm, visinst ] | |
| /api/failed_logins/ | timestamp, [ username, fail_count, ok_count, ratio ] | |
| /api/mac_count/ | timestamp, [ username, count, addrs ] | |
| /api/roaming/most_provided/ | timestamp, [ inst_name, provided_count ] | |
| /api/roaming/most_used/ | timestamp, [ inst_name, used_count ] | |
| /api/shared_mac/ | timestamp, [ count, mac_address, users ] | |
| /api/heat_map/ | timestamp, [ realm, institutions.realm, institutions.count ] | |
| /api/succ_logins/ | timestamp, [ username, count ] | |
| /api/db_data/ | | URL with the current data state |
| /api/realms/ | | URL returning the list of realms from the realms collection |
| /api/realm_logins | timestamp, [ realm ] | |
| /api/visinst_logins | timestamp, [ realm ] | |
| /api/unique_users/realm | timestamp, realm | |
| /api/unique_users/visinst | timestamp, realm | |
| /api/concurrent_users | timestamp, [ username, visinst_1, visinst_2, revision, diff_needed_timediff ] | |
| /api/concurrent_inst | timestamp | |
| /api/count/mac_count | timestamp, [ username, count, addrs ] | returns the count of records for the mac_count collection |
| /api/count/shared_mac | timestamp, [ count, mac_address, users ] | returns the count of records for the shared_mac collection |
| /api/count/concurrent_users | timestamp, [ username, visinst_1, visinst_2, revision, diff_needed_timediff ] | returns the count of records for the concurrent_users collection |
| /api/count/logs | timestamp, [ pn, csi, realm, visinst, result ] | returns the count of records for the logs collection |

Examples

The examples below use the curl command, but any other method of retrieving HTTP content (wget, a browser, ...) can be used. Some of the commands below may take a few seconds to complete.

Basic examples
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/roaming/most_provided/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/roaming/most_used/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/failed_logins/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/shared_mac/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-10-07'

Advanced examples
# get mac count records for 2016-10-07 with more than 5 mac addresses
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07&count>5'

# get mac count records for 2016-10-07 with more than 5 mac addresses, sort from most to least
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07&count>5&sort=-count'

# get mac count records for 2016-10-07 with a mac address count between 5 and 15, sort from most to least
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07&count>5&count<15&sort=-count'

# get most provided roaming records for 2016-10-07 with more than 1000 provided roamings, sort from most to least
curl 'https://etlog.cesnet.cz/api/roaming/most_provided/?timestamp=2016-10-07&provided_count>1000&sort=-provided_count'

# get most used roaming records for 2016-10-07 with more than 100 used roamings, sort from most to least
curl 'https://etlog.cesnet.cz/api/roaming/most_used/?timestamp=2016-10-07&used_count>100&sort=-used_count'

# get failed logins records for 2016-10-07 with ratio between 0.4 and 0.9, sort from most to least
curl 'https://etlog.cesnet.cz/api/failed_logins/?timestamp=2016-10-07&ratio>0.4&ratio<0.9&sort=-ratio'

# get failed logins records for 2016-10-07 only for users with realms ending '.cz' with fail count more than 500, sort from most to least
curl 'https://etlog.cesnet.cz/api/failed_logins/?username=/\.cz$/&timestamp=2016-10-07&fail_count>500&sort=-fail_count'

# get failed logins records from 2016-09-20 to 2016-09-30 only for users with realms ending '.edu'
# with fail count more than 100, sort from most to least
curl 'https://etlog.cesnet.cz/api/failed_logins/?username=/\.edu$/&timestamp>2016-09-20&timestamp<2016-09-30&fail_count>100&sort=-fail_count'

# get failed logins records from 2016-09-20 to 2016-10-10 only for users from realm 'fit.cvut.cz'
# with no successful logins, sort from most failed logins to least
# get only 10 results
curl 'https://etlog.cesnet.cz/api/failed_logins/?username=/.*@fit\.cvut\.cz$/&timestamp>2016-09-20&timestamp<2016-10-10&ok_count=0&sort=-fail_count&limit=10'

# get heat map data for 2016-08-30 for institution 'cvut.cz'
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-08-30&realm=cvut.cz'

# get heat map data for 2016-08-30 where institution 'vfn.cz' was the visited institution
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-08-30&institutions.realm=vfn.cz'

# get heat map data for 2016-08-30 where the visited count was more than 1000
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-08-30&institutions.count>1000'

Timestamp

The timestamp value must be in one of the formats in the table below. All timestamps used for querying must have hours, minutes, seconds and milliseconds set to 0 when these are specified. When not specified, hours, minutes, seconds and milliseconds are automatically set to 0.

| format | example |
|--------|---------|
| ISO-8601 | 2016-10-06T22:00:00.000Z |
| reduced ISO-8601 | 2016-10-06T22:00:00 |
| %Y-%m-%d | 2016-10-06 |

Routes

Frontend

The frontend is built with the Pug (formerly Jade) template engine. Only the index page is loaded through the templating engine. All other pages are compiled to HTML by gulp. This is necessary because all other pages are loaded dynamically via Angular, which is not able to use the templating engine.

Angular

AngularJS is a complete JavaScript-based open-source front-end web application framework. It enables dynamic content manipulation through HTML element attributes.

The frontend has the following structure:

| state | url | title |
|-------|-----|-------|
| search | /#/search?pn&csi | etlog: obecné vyhledávání |
| mac_count | /#/mac_count | etlog: počet zařízení |
| shared_mac | /#/shared_mac | etlog: sdílená zařízení |
| failed_logins | /#/failed_logins | etlog: neúspěšná přihlášení |
| heat_map | /#/heat_map | etlog: mapa roamingu |
| orgs_roaming_most_used | /#/orgs_roaming_most_provided | etlog: organizace nejvíce poskytující konektivitu |
| orgs_roaming_most_provided | /#/orgs_roaming_most_used | etlog: organizace nejvíce využívající roaming |
| roaming_activity | /#/roaming_activity | etlog: aktivita eduroamu |
| detection_data | /#/detection_data | etlog: absolutní počet přihlášení |
| detection_data_grouped | /#/detection_data_grouped | etlog: normalizovaný počet přihlášení |
| notifications | /#/notifications | etlog: správa notifikací |

Backend

The application API is described in the API section. This section describes classic HTML pages.

| URL | explanation |
|-----|-------------|
| / | title page |

System integration

The application is integrated into the system using systemd. systemd is an init system used in Linux distributions.

Service configuration is in /etc/systemd/system/etlog.service. File contents:

[Service]
ExecStart=/usr/bin/npm --prefix /home/etlog/etlog/ start
WorkingDirectory=/home/etlog/etlog/
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogFacility=local0
SyslogLevel=info
SyslogIdentifier=etlog
User=etlog
Group=etlog

[Install]
WantedBy=multi-user.target

The service is enabled with systemctl enable etlog and launched with systemctl start etlog. If the application crashes for any reason, systemd automatically restarts it.