A few days ago, we released DAISY, the new Data Science and Artificial Intelligence for DFIR Virtual Machine, which includes precooked outputs from different tools and Jupyter Notebooks to apply Data Science to them. On this documentation page we explain what is included and try to answer some of the main questions you may have when working with DAISY: How much RAM do I need to work with DAISY? What is the minimum required to start playing and learning the basics?
If you only want to know what you can run with different configurations, take a look at the demo section.
First of all, it is important to know that there are two versions of DAISY in the downloads section. The demo version brings some precooked evidence and notebooks and is perfect for learning and playing, while the production version is completely clean and a better option for your production environment. Here we explain the demo version of DAISY, which contains outputs from the following evidence:
One disk of the following cases:
This evidence has been used to produce a precooked output/notebook file for the following tools:
All these precooked outputs can be found under /mnt/data/Precooked/. Here you will find a folder for each case, containing different outputs ready to be used by the precreated Jupyter Notebooks and other tools:
The precooked notebooks can be found under /opt/ds4n6/anaconda3/Notebooks/. Here you will find two folders:
In both versions (production and demo), DAISY is preconfigured with 8GB of RAM. This is not an arbitrary decision: it is the minimum recommended to load all the precooked evidence and play with the Data Science, and it is also the minimum required by other tools such as TimeSketch. However, 8GB of RAM is not enough to load the outputs of some tools, such as plaso when all the parsers are run against a whole evidence disk, so in these cases DAISY includes a reduced version of the output files that can be used with 8GB. Likewise, all the demo notebooks are ready to load the reduced evidence if necessary.
Does this mean we cannot use DAISY with less than 8GB of RAM available for the VM? Absolutely not; it only means you won't be able to run all the tools and prepared notebooks. Let's take a look at the demo evidence.
The good news is that almost all of the precooked outputs can be used with only 4GB of RAM. So 4GB is actually the minimum recommended to learn Data Science and play with DAISY. Remember that if you don't shut down the kernels of the notebooks you run, the available RAM will decrease and you won't be able to run all the notebooks, so it is good practice to shut down or restart the kernel when you stop working with a notebook.
Here is a table with the results of processing the evidence with 4GB of RAM. Each notebook takes about one minute to process on a laptop with the DAISY standard configuration:
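To see how much memory is left before opening another notebook, a quick check from a notebook cell can help. The following is a minimal sketch, assuming the DAISY guest is a Linux system that exposes `/proc/meminfo`; the function names `parse_meminfo` and `available_gib` are hypothetical helpers and are not part of DAISY.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo_path="/proc/meminfo"):
    """Return the kernel's estimate of available memory, in GiB.

    Assumes a Linux kernel that reports the MemAvailable field.
    """
    with open(meminfo_path) as handle:
        info = parse_meminfo(handle.read())
    return info["MemAvailable"] / (1024 * 1024)
```

Calling `available_gib()` before and after shutting down an idle kernel gives a rough idea of how much RAM that kernel was holding.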
Notebook Template | Works with 4GB | Evidence Case | Tool | Output Size |
---|---|---|---|---|
plaso-evtx | X | Szechuan | plaso | 16MB |
volatility | X | Szechuan | volatility | 101MB |
kape | X | Szechuan | kape | 35MB |
autoruns | X | Szechuan | autoruns | 1.1MB |
fls | X | Ah2-polivio | fls | 104MB |
mactime | X | Musctf19 | mactime | 184MB |
plaso (reduced output) | | Musctf19 | plaso | 489MB |
plaso | | Musctf19 | plaso | 2.6GB |
So, what about plaso? As you can see in the table above, the full plaso output is 2.6GB (remember that we got the outputs by running all the parsers on the whole disk), so we have created a reduced version of 489MB that can be used with 8GB of RAM. If you want to use the full plaso file, you will need 14GB of RAM in your VM.
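The RAM guidance on this page can be collected into a small lookup. The sketch below only encodes the figures stated here (4GB for the rows marked "X" in the table, 8GB for the reduced plaso output, 14GB for the full one); the names `MIN_RAM_GB` and `runnable_notebooks` are hypothetical and do not exist in DAISY.

```python
# (notebook template, minimum VM RAM in GB), copied from this page.
# The 4 GB entries are the rows marked "X" in the table; the two plaso
# figures come from the paragraph above.
MIN_RAM_GB = [
    ("plaso-evtx", 4),
    ("volatility", 4),
    ("kape", 4),
    ("autoruns", 4),
    ("fls", 4),
    ("mactime", 4),
    ("plaso (reduced output)", 8),
    ("plaso", 14),
]


def runnable_notebooks(vm_ram_gb):
    """Return the notebook templates this page says fit in the given VM RAM.

    An empty list means the page gives no guidance for that amount of RAM,
    not that nothing will run.
    """
    return [name for name, needed in MIN_RAM_GB if vm_ram_gb >= needed]
```

For example, `runnable_notebooks(8)` returns every template except the full `plaso` one.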
As a final tip for the notebooks, with 8GB of RAM you can run all the notebooks, except for the full plaso one, without any kernel restart.
We also have a precooked file for TimeSketch, so you can create a demo investigation timeline to try the TimeSketch and Picatrix functions. In the demo version, the file can be found at: /mnt/data/Precooked/Szechuan/szechuan_dc01_plaso_log2timeline_reduced.csv
As mentioned above, 8GB of RAM is not enough to load a CSV file produced by running all the Plaso parsers on the whole disk, so we have created a reduced version of the Plaso output. With this file and 8GB of RAM, you can create the investigation, run analyses and use all the TimeSketch features.
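Before uploading a large CSV, it can be useful to look at its header and row count without loading the whole file into memory. The following is a minimal sketch using only the Python standard library; `summarize_rows` and `summarize_csv` are hypothetical helper names, and the sketch makes no assumption about the column names in the Plaso output.

```python
import csv


def summarize_rows(lines, sample_rows=5):
    """Return (header, first rows, data row count) from an iterable of CSV lines.

    Rows are consumed one at a time, so memory use stays small regardless
    of how large the input is.
    """
    reader = csv.reader(lines)
    header = next(reader, [])
    sample = []
    total = 0
    for row in reader:
        if total < sample_rows:
            sample.append(row)
        total += 1
    return header, sample, total


def summarize_csv(path, sample_rows=5):
    """Open a CSV file and summarize it row by row."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        return summarize_rows(handle, sample_rows)
```

In the demo version this could be called as `summarize_csv("/mnt/data/Precooked/Szechuan/szechuan_dc01_plaso_log2timeline_reduced.csv")`. It is only a sanity check and does not replace the TimeSketch import itself.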