|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Data Source Investigation\n", |
| 8 | + "\n", |
| 9 | + "Let's use the [ATT&CK Python Client](https://github.com/hunters-forge/ATTACK-Python-Client) to manually examine the techniques, list the data sources, and build a heatmap out of our selected sources.\n", |
| 10 | + "\n", |
| 11 | + "If you're looking for less development work, or a more in-depth and fine-grained dive, check out:\n", |
| 12 | + "\n", |
| 13 | + "* [DeTTECT](https://github.com/rabobank-cdc/DeTTECT)\n", |
| 14 | + "* [ATTACKdatamap](https://github.com/olafhartong/ATTACKdatamap)\n", |
| 15 | + "\n", |
| 16 | + "*Consider: What have you used to track data sources? What has worked well, and what has not worked so well?*" |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "code", |
| 21 | + "execution_count": null, |
| 22 | + "metadata": {}, |
| 23 | + "outputs": [], |
| 24 | + "source": [ |
| 25 | + "# Import the packages we'll need\n", |
| 26 | + "\n", |
| 27 | + "# Some basic python and jupyter stuff\n", |
| 28 | + "from collections import defaultdict\n", |
| 29 | + "import json\n", |
| 30 | + "from IPython.display import FileLink, FileLinks\n", |
| 31 | + "\n", |
| 32 | + "# Visualization and data libraries\n", |
| 33 | + "import altair as alt\n", |
| 34 | + "import pandas as pd\n", |
| 35 | + "\n", |
| 36 | + "# ATT&CK Python Client, by @HuntersForge (https://github.com/hunters-forge/ATTACK-Python-Client)\n", |
| 37 | + "from attackcti import attack_client\n", |
| 38 | + "\n", |
| 39 | + "# Because this runs in a Jupyter notebook, we need to enable the notebook renderer for the altair charts to display\n", |
| 40 | + "alt.renderers.enable('notebook')" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "metadata": {}, |
| 46 | + "source": [ |
| 47 | + "## Get the ATT&CK Enterprise techniques using the client library" |
| 48 | + ] |
| 49 | + }, |
| 50 | + { |
| 51 | + "cell_type": "code", |
| 52 | + "execution_count": null, |
| 53 | + "metadata": {}, |
| 54 | + "outputs": [], |
| 55 | + "source": [ |
| 56 | + "client = attack_client()\n", |
| 57 | + "all_techniques = client.get_enterprise()['techniques'] # Note - this takes a few seconds to download and parse\n", |
| 58 | + "\n", |
| 59 | + "print(\"Got {} techniques\".format(len(all_techniques)))" |
| 60 | + ] |
| 61 | + }, |
| 62 | + { |
| 63 | + "cell_type": "markdown", |
| 64 | + "metadata": {}, |
| 65 | + "source": [ |
| 66 | + "## Analyze the data sources and build a chart to understand the most valuable ones\n", |
| 67 | + "\n", |
| 68 | + "We'll build up a dictionary that counts data sources by the number of techniques they can help detect." |
| 69 | + ] |
| 70 | + }, |
| 71 | + { |
| 72 | + "cell_type": "code", |
| 73 | + "execution_count": null, |
| 74 | + "metadata": {}, |
| 75 | + "outputs": [], |
| 76 | + "source": [ |
| 77 | + "# Collect unique data sources from the techniques\n", |
| 78 | + "techniques_by_source = defaultdict(lambda: {'count': 0, 'techniques': []})\n", |
| 79 | + "\n", |
| 80 | + "# Loop through all techniques, then through all data sources on that technique\n", |
| 81 | + "for technique in all_techniques:\n", |
| 82 | + " for ds in technique.get('x_mitre_data_sources', []):\n", |
| 83 | + " techniques_by_source[ds]['count'] += 1\n", |
| 84 | + " # The external_id of the first external reference is the ATT&CK technique ID (T####)\n", |
| 85 | + " techniques_by_source[ds]['techniques'].append(technique.external_references[0].external_id)" |
| 86 | + ] |
| 87 | + }, |
| 88 | + { |
| 89 | + "cell_type": "code", |
| 90 | + "execution_count": null, |
| 91 | + "metadata": {}, |
| 92 | + "outputs": [], |
| 93 | + "source": [ |
| 94 | + "# Create a pandas dataframe out of that result and show count column for the top 15\n", |
| 95 | + "df = pd.DataFrame.from_dict(techniques_by_source, orient='index', columns=['count', 'techniques']).rename_axis('source')\n", |
| 96 | + "top_15 = df.sort_values('count', ascending=False)[0:15]\n", |
| 97 | + "top_15[['count']]" |
| 98 | + ] |
| 99 | + }, |
| 100 | + { |
| 101 | + "cell_type": "markdown", |
| 102 | + "metadata": {}, |
| 103 | + "source": [ |
| 104 | + "## Show the chart in altair\n", |
| 105 | + "\n", |
| 106 | + "Altair makes it easy to turn pandas dataframes into visualizations. In this case, we show a simple sorted bar chart that you can scan." |
| 107 | + ] |
| 108 | + }, |
| 109 | + { |
| 110 | + "cell_type": "code", |
| 111 | + "execution_count": null, |
| 112 | + "metadata": {}, |
| 113 | + "outputs": [], |
| 114 | + "source": [ |
| 115 | + "# Flatten the index into a 'source' column so altair can chart it\n", |
| 116 | + "\n", |
| 117 | + "alt.Chart(df.reset_index().sort_values('count', ascending=False)).mark_bar().encode(\n", |
| 118 | + " y=alt.Y(\n", |
| 119 | + " 'source',\n", |
| 120 | + " sort=alt.EncodingSortField(\n", |
| 121 | + " field=\"count\",\n", |
| 122 | + " order=\"descending\"\n", |
| 123 | + " )\n", |
| 124 | + " ),\n", |
| 125 | + " x='count'\n", |
| 126 | + ")" |
| 127 | + ] |
| 128 | + }, |
| 129 | + { |
| 130 | + "cell_type": "markdown", |
| 131 | + "metadata": {}, |
| 132 | + "source": [ |
| 133 | + "## Advanced Filtering (BONUS)\n", |
| 134 | + "\n", |
| 135 | + "How would you alter this chart to only consider some techniques? Maybe (peeking ahead) we have a list of threat actors or techniques we want to prioritize? Can you generate a chart that prioritizes techniques used by APT1 or APT3?" |
| 136 | + ] |
| 137 | + }, |
| 138 | + { |
| 139 | + "cell_type": "code", |
| 140 | + "execution_count": null, |
| 141 | + "metadata": {}, |
| 142 | + "outputs": [], |
| 143 | + "source": [ |
| 144 | + "# TODO: Your code to show a similar chart for APT1 and APT3" |
| 145 | + ] |
| 146 | + }, |
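One way to approach the bonus, sketched with placeholder data: assume we already have the set of technique IDs attributed to APT1/APT3 (in the notebook you would derive these from the ATT&CK group and relationship data rather than hard-coding them), then re-count data sources over only those techniques. The `priority_techniques` set and the miniature `techniques_by_source` below are made-up stand-ins for the real values built earlier:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical: technique IDs attributed to the groups we care about.
# In the notebook, pull these from the ATT&CK data instead.
priority_techniques = {'T1003', 'T1059', 'T1086'}

# Same shape as the techniques_by_source dict built earlier:
# {data source -> {'count': n, 'techniques': [T#### IDs]}}
techniques_by_source = {
    'Process monitoring': {'count': 3, 'techniques': ['T1003', 'T1059', 'T1027']},
    'File monitoring':    {'count': 2, 'techniques': ['T1059', 'T1105']},
}

# Re-count each data source, keeping only the prioritized techniques
filtered = defaultdict(lambda: {'count': 0, 'techniques': []})
for source, info in techniques_by_source.items():
    for t in info['techniques']:
        if t in priority_techniques:
            filtered[source]['count'] += 1
            filtered[source]['techniques'].append(t)

# The resulting dataframe can be fed to the same altair chart as above
df_filtered = pd.DataFrame.from_dict(dict(filtered), orient='index').rename_axis('source')
print(df_filtered.sort_values('count', ascending=False)[['count']])
```

The only change from the earlier counting loop is the `if t in priority_techniques` filter; everything downstream (sorting, charting) works unchanged on the filtered dataframe.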
| 147 | + { |
| 148 | + "cell_type": "markdown", |
| 149 | + "metadata": {}, |
| 150 | + "source": [ |
| 151 | + "## Building a Heatmap\n", |
| 152 | + "\n", |
| 153 | + "But what if you already collect only certain data, and don't have the flexibility to add new sources? That's the case for our exercise! How do you know which techniques you can detect with just those?\n", |
| 154 | + "\n", |
| 155 | + "We can generate a heatmap based on the data we created earlier. We can map the data sources we know we have into the data sources here." |
| 156 | + ] |
| 157 | + }, |
| 158 | + { |
| 159 | + "cell_type": "code", |
| 160 | + "execution_count": null, |
| 161 | + "metadata": {}, |
| 162 | + "outputs": [], |
| 163 | + "source": [ |
| 164 | + "# First, list the data sources alphabetically so we can figure out which ones we have\n", |
| 165 | + "\n", |
| 166 | + "df.sort_index()[['count']]" |
| 167 | + ] |
| 168 | + }, |
| 169 | + { |
| 170 | + "cell_type": "markdown", |
| 171 | + "metadata": {}, |
| 172 | + "source": [ |
| 173 | + "### List the data sources that are available\n", |
| 174 | + "\n", |
| 175 | + "In the list below, add the data sources that we have available in BRAWL. As a reminder, we have:\n", |
| 176 | + "* Sysmon\n", |
| 177 | + "* Windows event logs (common security, authentication, and audit logs)" |
| 178 | + ] |
| 179 | + }, |
| 180 | + { |
| 181 | + "cell_type": "code", |
| 182 | + "execution_count": null, |
| 183 | + "metadata": {}, |
| 184 | + "outputs": [], |
| 185 | + "source": [ |
| 186 | + "# Case sensitive!!!\n", |
| 187 | + "sources_we_have = [\n", |
| 188 | + " '' # e.g. 'Web proxy'\n", |
| 189 | + "]" |
| 190 | + ] |
| 191 | + }, |
| 192 | + { |
| 193 | + "cell_type": "markdown", |
| 194 | + "metadata": {}, |
| 195 | + "source": [ |
| 196 | + "### Calculate the techniques for which we have some detection capability" |
| 197 | + ] |
| 198 | + }, |
| 199 | + { |
| 200 | + "cell_type": "code", |
| 201 | + "execution_count": null, |
| 202 | + "metadata": {}, |
| 203 | + "outputs": [], |
| 204 | + "source": [ |
| 205 | + "techniques = defaultdict(lambda: 0)\n", |
| 206 | + "\n", |
| 207 | + "for ds in sources_we_have:\n", |
| 208 | + " for technique in techniques_by_source[ds]['techniques']:\n", |
| 209 | + " techniques[technique] += 1" |
| 210 | + ] |
| 211 | + }, |
| 212 | + { |
| 213 | + "cell_type": "code", |
| 214 | + "execution_count": null, |
| 215 | + "metadata": {}, |
| 216 | + "outputs": [], |
| 217 | + "source": [ |
| 218 | + "print(\"You can detect {} out of {} techniques\".format(len(techniques), len(all_techniques)))" |
| 219 | + ] |
| 220 | + }, |
| 221 | + { |
| 222 | + "cell_type": "markdown", |
| 223 | + "metadata": {}, |
| 224 | + "source": [ |
| 225 | + "### Generate the heatmap" |
| 226 | + ] |
| 227 | + }, |
| 228 | + { |
| 229 | + "cell_type": "code", |
| 230 | + "execution_count": null, |
| 231 | + "metadata": {}, |
| 232 | + "outputs": [], |
| 233 | + "source": [ |
| 234 | + "# Generate that heatmap!\n", |
| 235 | + "\n", |
| 236 | + "def technique_score(t):\n", |
| 237 | + " if t not in techniques:\n", |
| 238 | + " return 0.0\n", |
| 239 | + " elif techniques[t] > 1:\n", |
| 240 | + " return 1.0\n", |
| 241 | + " else: # count of sources == 1\n", |
| 242 | + " return 0.5\n", |
| 243 | + "\n", |
| 244 | + "heatmap = {\n", |
| 245 | + " 'version': \"2.1\",\n", |
| 246 | + " 'name': 'Detection Possibilities',\n", |
| 247 | + " 'domain': \"mitre-enterprise\",\n", |
| 248 | + " 'showTacticRowBackground': True,\n", |
| 249 | + " 'gradient': {\n", |
| 250 | + " 'colors': [\n", |
| 251 | + " '#ffffff',\n", |
| 252 | + " '#66b1ff'\n", |
| 253 | + " ],\n", |
| 254 | + " 'minValue': 0.0,\n", |
| 255 | + " 'maxValue': 1.0\n", |
| 256 | + " },\n", |
| 257 | + " 'techniques': [{'techniqueID': t, 'score': technique_score(t)} for t in techniques]\n", |
| 258 | + "}" |
| 259 | + ] |
| 260 | + }, |
| 261 | + { |
| 262 | + "cell_type": "code", |
| 263 | + "execution_count": null, |
| 264 | + "metadata": {}, |
| 265 | + "outputs": [], |
| 266 | + "source": [ |
| 267 | + "# Write as a JSON file and show a download link\n", |
| 268 | + "with open('data_sources.json', 'w') as f:\n", |
| 269 | + " f.write(json.dumps(heatmap))\n", |
| 270 | + " \n", |
| 271 | + "FileLink('data_sources.json')" |
| 272 | + ] |
| 273 | + }, |
| 274 | + { |
| 275 | + "cell_type": "markdown", |
| 276 | + "metadata": {}, |
| 277 | + "source": [ |
| 278 | + "# Overlaying Priorities with Data Sources\n", |
| 279 | + "\n", |
| 280 | + "The reason we collect data is of course to help us detect attacks, so let's see how the data that we've collected measures up.\n", |
| 281 | + "\n", |
| 282 | + "How would you do this?\n", |
| 283 | + "\n", |
| 284 | + "How would you show gaps in data source coverage?" |
| 285 | + ] |
| 286 | + }, |
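One possible sketch for the overlay question, again with placeholder data: compare a prioritized technique list against the `techniques` dict built earlier (technique ID mapped to the number of our data sources that can help detect it). Anything in the priority list with no covering source is a gap. The `priority_techniques` list and the miniature `techniques` dict below are hypothetical stand-ins:

```python
# Hypothetical priority list (e.g. from a threat-actor profile); in the
# notebook these would come from the ATT&CK group data instead.
priority_techniques = ['T1003', 'T1021', 'T1059']

# Mirrors the earlier cell: technique ID -> number of available
# data sources that can help detect it.
techniques = {'T1003': 2, 'T1059': 1}

# A prioritized technique is a gap if none of our data sources cover it
gaps = [t for t in priority_techniques if techniques.get(t, 0) == 0]
covered = {t: techniques[t] for t in priority_techniques if t in techniques}

print("Coverage:", covered)  # these counts could feed heatmap scores
print("Gaps:", gaps)
```

To visualize the gaps, the same Navigator layer trick from the heatmap cell works: score covered priority techniques with one color gradient and gap techniques with a contrasting score (or a `color` override) so missing coverage stands out.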
| 287 | + { |
| 288 | + "cell_type": "code", |
| 289 | + "execution_count": null, |
| 290 | + "metadata": {}, |
| 291 | + "outputs": [], |
| 292 | + "source": [] |
| 293 | + } |
| 294 | + ], |
| 295 | + "metadata": { |
| 296 | + "kernelspec": { |
| 297 | + "display_name": "detection-training", |
| 298 | + "language": "python", |
| 299 | + "name": "detection-training" |
| 300 | + }, |
| 301 | + "language_info": { |
| 302 | + "codemirror_mode": { |
| 303 | + "name": "ipython", |
| 304 | + "version": 3 |
| 305 | + }, |
| 306 | + "file_extension": ".py", |
| 307 | + "mimetype": "text/x-python", |
| 308 | + "name": "python", |
| 309 | + "nbconvert_exporter": "python", |
| 310 | + "pygments_lexer": "ipython3", |
| 311 | + "version": "3.7.1" |
| 312 | + } |
| 313 | + }, |
| 314 | + "nbformat": 4, |
| 315 | + "nbformat_minor": 2 |
| 316 | +} |