added training notebooks

johnwunder · johnwunder · commit f12c102308de · 2019-10-17T12:16:03.000-04:00
diff --git a/README.md b/README.md
@@ -13,6 +13,9 @@ Note: this repository is a work in progress. In the coming months we will be add
 2. Activate the environment: `source env/bin/activate`
 3. Install requirements into the virtual environment: `pip3 install -r requirements.txt`
 
+## Training
+This repository also contains Jupyter notebooks and other material for ATT&CK training. The `trainings` directory holds that content, which can be launched via Binder. The `binder` directory holds the requirements file that Binder uses to build the notebook environment, per the Binder documentation.
+
 ## Related MITRE Work
 #### CTI
 [Cyber Threat Intelligence repository](https://github.com/mitre/cti) of the ATT&CK catalog expressed in STIX 2.0 JSON.
diff --git a/binder/requirements.txt b/binder/requirements.txt
@@ -0,0 +1,56 @@
+altair==3.2.0
+antlr4-python3-runtime==4.7.2
+appnope==0.1.0
+attackcti==0.2.7
+attrs==19.2.0
+backcall==0.1.0
+bleach==3.1.0
+certifi==2019.9.11
+chardet==3.0.4
+decorator==4.4.0
+defusedxml==0.6.0
+entrypoints==0.3
+idna==2.8
+ipykernel==5.1.2
+ipython==7.8.0
+ipython-genutils==0.2.0
+jedi==0.15.1
+Jinja2==2.10.1
+jsonschema==3.0.2
+jupyter-client==5.3.3
+jupyter-core==4.5.0
+MarkupSafe==1.1.1
+mistune==0.8.4
+nbconvert==5.6.0
+nbformat==4.4.0
+notebook==6.0.1
+numpy==1.17.2
+pandas==0.25.1
+pandocfilters==1.4.2
+parso==0.5.1
+pexpect==4.7.0
+pickleshare==0.7.5
+prometheus-client==0.7.1
+prompt-toolkit==2.0.9
+ptyprocess==0.6.0
+Pygments==2.4.2
+pyrsistent==0.15.4
+python-dateutil==2.8.0
+pytz==2019.2
+pyzmq==18.1.0
+requests==2.22.0
+Send2Trash==1.5.0
+simplejson==3.16.0
+six==1.12.0
+stix2==1.2.0
+stix2-patterns==1.1.0
+taxii2-client==0.5.0
+terminado==0.8.2
+testpath==0.4.2
+toolz==0.10.0
+tornado==6.0.3
+traitlets==4.3.2
+urllib3==1.25.6
+vega==2.6.0
+wcwidth==0.1.7
+webencodings==0.5.1
diff --git a/trainings/detection-training/.gitignore b/trainings/detection-training/.gitignore
@@ -0,0 +1,2 @@
+data_sources.json
+**/*.ipynb_checkpoints/
diff --git a/trainings/detection-training/Data Sources.ipynb b/trainings/detection-training/Data Sources.ipynb
@@ -0,0 +1,290 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Data Source Investigation\n",
+    "\n",
+    "We'll use the [ATT&CK Python Client](https://github.com/hunters-forge/ATTACK-Python-Client) to manually examine the techniques, list the data sources, and build a heatmap out of our selected sources.\n",
+    "\n",
+    "If you're looking for less development or a more in-depth, fine-grained dive, check out:\n",
+    "* [DeTT&CT](https://github.com/rabobank-cdc/DeTTECT)\n",
+    "* [ATTACKdatamap](https://github.com/olafhartong/ATTACKdatamap)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Import the packages we'll need\n",
+    "\n",
+    "# Some basic python and jupyter stuff\n",
+    "from collections import defaultdict\n",
+    "import json\n",
+    "from IPython.display import FileLink, FileLinks\n",
+    "\n",
+    "# Visualization and data libraries\n",
+    "import altair as alt\n",
+    "import pandas as pd\n",
+    "\n",
+    "# ATT&CK Python Client, by @HuntersForge (https://github.com/hunters-forge/ATTACK-Python-Client)\n",
+    "from attackcti import attack_client\n",
+    "\n",
+    "# Because this is in Jupyter notebooks we need to enable that renderer for the altair charts to work\n",
+    "alt.renderers.enable('notebook')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Get the ATT&CK Enterprise techniques using the client library"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = attack_client()\n",
+    "all_techniques = client.get_enterprise()['techniques'] # Note - this takes a few seconds to download and parse\n",
+    "\n",
+    "print(\"Got {} techniques\".format(len(all_techniques)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Analyze the data sources and build a chart to understand the most valuable ones\n",
+    "\n",
+    "We'll build up a dictionary that counts data sources by the number of techniques they can help detect."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Collect unique data sources from the techniques\n",
+    "techniques_by_source = defaultdict(lambda: {'count': 0, 'techniques': []})\n",
+    "\n",
+    "# Loop through all techniques, then through all data sources on that technique\n",
+    "for technique in all_techniques:\n",
+    "    for ds in technique.get('x_mitre_data_sources', []):\n",
+    "        techniques_by_source[ds]['count'] += 1\n",
+    "        # External_ID in the first external reference is the T#### number\n",
+    "        techniques_by_source[ds]['techniques'].append(technique.external_references[0].external_id)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a pandas dataframe out of that result and show count column for the top 15\n",
+    "df = pd.DataFrame.from_dict(techniques_by_source, orient='index', columns=['count', 'techniques']).rename_axis('source')\n",
+    "top_15 = df.sort_values('count', ascending=False)[0:15]\n",
+    "top_15[['count']]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Show the chart in altair\n",
+    "\n",
+    "Altair can easily turn pandas dataframes into visualizations. Here we show a bar chart of every data source, sorted so that the sources that help detect the most techniques come first."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "alt.Chart(df.reset_index()).mark_bar().encode(\n",
+    "    y=alt.Y(\n",
+    "        'source',\n",
+    "        sort=alt.EncodingSortField(\n",
+    "            field=\"count\",  # The field to use for the sort\n",
+    "            order=\"descending\"  # The order to sort in\n",
+    "        )\n",
+    "    ),\n",
+    "    x='count'\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Building a Heatmap\n",
+    "\n",
+    "But what if you only have certain data already, and don't have flexibility to add different ones? That's the case for our exercise! How do you know what techniques you can detect based on that?\n",
+    "\n",
+    "We can generate a heatmap from the data we built earlier by mapping the data sources we know we have onto the data sources listed here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# First, list the data sources alphabetically so we can figure out which ones we have\n",
+    "\n",
+    "df.reset_index().sort_values('source')[['source', 'count']].style.hide_index()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### List the data sources that are available\n",
+    "\n",
+    "In the list below, add the data sources that we have available in BRAWL. As a reminder, we have:\n",
+    "* Sysmon\n",
+    "* Windows event logs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Case sensitive!!!\n",
+    "sources_we_have = [\n",
+    "    'Windows event logs',\n",
+    "    'Process monitoring'\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Calculate the techniques for which we have some detection capability"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Map each technique ID to the number of available data sources that cover it\n",
+    "techniques = defaultdict(int)\n",
+    "\n",
+    "for ds in sources_we_have:\n",
+    "    for technique in techniques_by_source[ds]['techniques']:\n",
+    "        techniques[technique] += 1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\"You can detect {} out of {} techniques\".format(len(techniques), len(all_techniques)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Generate the heatmap"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate that heatmap!\n",
+    "\n",
+    "def technique_score(t):\n",
+    "    if t not in techniques:\n",
+    "        return 0.0\n",
+    "    elif techniques[t] > 1:\n",
+    "        return 1.0\n",
+    "    else: # count of sources == 1\n",
+    "        return 0.5\n",
+    "\n",
+    "heatmap = {\n",
+    "    'version': \"2.1\",\n",
+    "    'name': 'Detection Possibilities',\n",
+    "    'domain': \"mitre-enterprise\",\n",
+    "    'showTacticRowBackground': True,\n",
+    "    'gradient': {\n",
+    "        'colors': [\n",
+    "            '#ffffff',\n",
+    "            '#66b1ff'\n",
+    "        ],\n",
+    "        'minValue': 0.0,\n",
+    "        'maxValue': 1.0\n",
+    "    },\n",
+    "    'techniques': [{'techniqueID': t, 'score': technique_score(t)} for t in techniques]\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write as a JSON file and show a download link\n",
+    "with open('data_sources.json', 'w') as f:\n",
+    "    json.dump(heatmap, f)\n",
+    "\n",
+    "FileLink('data_sources.json')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "heatmap"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "detection-training",
+   "language": "python",
+   "name": "detection-training"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/trainings/detection-training/Prioritization Scenarios.ipynb b/trainings/detection-training/Prioritization Scenarios.ipynb
diff --git a/trainings/detection-training/README.md b/trainings/detection-training/README.md
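
For reviewers who want to check the notebook's counting and scoring logic without downloading ATT&CK data, here is a minimal standalone sketch. It is not part of the commit. The sample records, their IDs, and the helper names `index_by_source` and `coverage_scores` are hypothetical; only the `x_mitre_data_sources` field and the 0.5 / 1.0 scoring rule come from the notebook.

```python
from collections import defaultdict

# Hypothetical sample records standing in for the STIX technique objects
# that the notebook fetches through attackcti. The IDs are made up.
sample_techniques = [
    {"id": "T0001", "x_mitre_data_sources": ["Process monitoring", "Windows event logs"]},
    {"id": "T0002", "x_mitre_data_sources": ["Process monitoring"]},
    {"id": "T0003", "x_mitre_data_sources": ["Netflow/Enclave netflow"]},
    {"id": "T0004"},  # a technique with no data sources listed
]


def index_by_source(techniques):
    """Group technique IDs under each data source that can help detect them."""
    by_source = defaultdict(list)
    for technique in techniques:
        for ds in technique.get("x_mitre_data_sources", []):
            by_source[ds].append(technique["id"])
    return by_source


def coverage_scores(by_source, sources_we_have):
    """Score 0.5 for one covering source, 1.0 for two or more (as in the notebook)."""
    hits = defaultdict(int)
    for ds in sources_we_have:
        # .get() avoids silently creating an entry for a misspelled source name
        for technique_id in by_source.get(ds, []):
            hits[technique_id] += 1
    return {tid: (1.0 if count > 1 else 0.5) for tid, count in hits.items()}


by_source = index_by_source(sample_techniques)
scores = coverage_scores(by_source, ["Windows event logs", "Process monitoring"])
print(scores)
```

Techniques that none of the chosen sources cover are simply absent from the result, which matches how the notebook builds the `techniques` list for the heatmap.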
