CyberSecurity_Strategies

Machine-based Text Analytics of CyberSecurity Strategies

The cybersecurity strategies can be downloaded from this website from the International Telecommunications Union:

National Cybersecurity Strategies repository

Here you can see a human analysis of cyberscurity laws. It would be interesting to see whether a similar analysis per country could be done using machine-learning or other text mining techniques:

Cyberwellness Profiles

Suggested analysis

###Text mining

Create a text version of each of the 73 cybersecurity strategy documents at the link above
Break down each document into sentences
Classify each sentence and try to assign it a tag from this list below. It is expected that each document will have hundreds or thousands of sentences tagged and many sentences which can't be tagged because they are note related to any of the tags below.

These labels come from the headers in the cyberwellness profiles linked above

Category	Sub category
LEGAL MEASURES	CRIMINAL LEGISLATION, REGULATION AND COMPLIANCE
TECHNICAL MEASURES	CIRT, STANDARDS, CERTIFICATION
ORGANIZATION MEASURES	POLICY, ROADMAP FOR GOVERNANCE, RESPONSIBLE AGENCY, NATIONAL BENCHMARKING
CAPACITY BUILDING	STANDARDISATION DEVELOPMENT, MANPOWER DEVELOPMENT, PROFESSIONAL CERTIFICATION, AGENCY CERTIFICATION
COOPERATION	INTRA-STATE COOPERATION, INTRA-AGENCY COOPERATION, PUBLIC SECTOR PARTNERSHIP, INTERNATIONAL COOPERATION
CHILD ONLINE PROTECTION	NATIONAL LEGISLATION, UN CONVENTION AND PROTOCOL, INSTITUTIONAL SUPPORT, REPORTING MECHANISM

For example, let's classify a sentence from imaginary country "Aztlan": "This country has implemented strict criminal legislation and certification of technology specialists for cybersecurity"

A suggested schema to store this sentence after the analysis is: (suggested json or yaml format):

Sentence id number: 10384648   # A unique number to identify this sentence  
Source Document: "Aztlan Cyber Security Strategy 2016.pdf"   # the name of the .pdf where the sentence comes from  
Country: "Aztlan"  
Sentence: "This country has implemented strict criminal legislation and certification of technology specialists for cybersecurity"  
Classification Tags: "Legal - Criminal legislation", "Technical measures - certification"  # from classification process  
Topics or keywords: "children, attacks, green computing, denial of service, etc"  # from unsupervised topic discovery  
Sentence Position in Document: page #, or byte   (this is so then we can link to the sentence)

Search (optional but very useful)

If you used a search engine for any of the steps above this information is unnecessary to you, but if you did not, keep reading:

If the thousands of sentences in the documents could be ingested into an Apache Solr or Elastic Search server they could be easily searched by the end-users with their specific queries.

Unsupervised topic extraction (optional but useful)

If you can also do an unsupervised analysis to extract the topics of the sentences, this would also be useful. This would show that there might be some important topics which do not fit in any of the classification categories of the table above.

Expected result

After all your efforts above, the expected product is a web page where end-users can interact with the data. The webpage should include:

A visualization where the user can graphically explore the dataset
It must include an area where the user can read the tagged sentences
It must have buttons or check boxes to filter the sentences displayed (Country, Source Document, Tags). Multiple filters should be allowed simultaneously. (E.g. Filter by country and by 2 tags)
Ideally the users shuld be able to query the dataset freely.

For example this could be a reulting dashboard, but please don't limit your creativity to this:

Software and hardware requirements / recommendations

Web interface: it is recommended the web interface uses only static html, javascript and css files. In this way the hosting can be done moved around easily without dependencies on php, pyhton, java, etc. Suggested frameworks include: http://vuejs.org/ , https://facebook.github.io/react/, https://angularjs.org/
Backend or database: as above, if possible do not create any dependency on a database or search server for serving the application. It is suggested that you use database or search engine servers for processing the data, but the result ideally will all be saved in static files (json, csv, yaml, xml, etc.) which can be stored and served from many hosting environment.
If your organization has servers with long-term life and support you might want to use these as a backend, but your organization must be committed to offering the service for at least five years.

Examples (demos and code)

Includes a search engine and a rich visualization https://unite.un.org/ideas/content/whs-explorer Text analysis offline and just created a visualization of the results https://unite.un.org/ideas/content/links-sustainable-cities

The screenshot below was built using data from the Fordham Design Lab branch: https://github.com/ICT4SD/CyberSecurity_Strategies/tree/Fordham_DesignLab

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
dashboard_fordham_design_lab.qvf		dashboard_fordham_design_lab.qvf
requirements.txt		requirements.txt
results_concatenated.json		results_concatenated.json
screenshot.jpg		screenshot.jpg
sentence_classifier.ipynb		sentence_classifier.ipynb
visual_demo.JPG		visual_demo.JPG

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

dashboard_fordham_design_lab.qvf

dashboard_fordham_design_lab.qvf

requirements.txt

requirements.txt

results_concatenated.json

results_concatenated.json

screenshot.jpg

screenshot.jpg

sentence_classifier.ipynb

sentence_classifier.ipynb

visual_demo.JPG

visual_demo.JPG

Repository files navigation

CyberSecurity_Strategies

Suggested analysis

Search (optional but very useful)

Unsupervised topic extraction (optional but useful)

Expected result

Software and hardware requirements / recommendations

Examples (demos and code)

About

Releases

Packages

Languages

License

LordDarkula/CyberSecurity_Strategies

Folders and files

Latest commit

History

Repository files navigation

CyberSecurity_Strategies

Suggested analysis

Search (optional but very useful)

Unsupervised topic extraction (optional but useful)

Expected result

Software and hardware requirements / recommendations

Examples (demos and code)

About

Topics

Resources

License

Stars

Watchers

Forks

Languages