Channel: Digital Forensics – Forensic Focus – Articles

Digital Forensics Resources


by Scar de Courcier

One of the most frequent questions I’m asked by digital forensics students is about resources: where can they go to continue learning, where can they find out more about the industry, what are the best blogs and social accounts out there for DFIR people?

The below is by no means an exhaustive list, but here are some of the places I get my computer forensics news from, which you might find helpful.

Forensic 4:Cast’s list of award nominees is a good place to start if you’re looking for inspiration. Forensic Focus is on there twice this year, and the other nominees are also well worth a look. Cindy Murphy’s blog over at Gillware.com is a popular one among practitioners; Phill Moore’s This Week In 4n6 rounds up the latest industry news every week; DFIR.training has a whole load of handy resources and a new social network for the community; and About DFIR is a compendium project which aims to bring together everything you need in one space.

Oleg Skulkin, who co-wrote Windows Forensics Cookbook with me last year, is one of the minds behind Cyber Forensicator, a digital forensics blog which publishes updates regularly and is a great place to check out new papers and articles.

As one of the most well-known names in digital forensics, Harlan Carvey’s blog is unsurprisingly very popular among practitioners and students alike. Over at 4N6IR, James posts mainly about Windows forensics, with helpful in-depth articles featuring step-by-step guides along with screenshots.

The aptly named blog A Fistful Of Dongles is Eric Huber’s place to discuss everything DFIR-related. I’ve been surprised over the years at just how many dongles and wires I’ve accumulated in the service of digital forensic investigation; after a while it gets kind of addictive.

If like me you’re a bit of a Kali fan, Blackmore Ops will be an invaluable resource; I have so many of their pages bookmarked that I might as well just commit the whole site to memory.

On the vendor side, Magnet Forensics have an excellent blog with frequent posts from Christa Miller, founder Jad Saliba and more. BlackBag Tech’s blog provides a great window into the industry, and they often focus on a specific element across a series of posts, so keep an eye out for those too. AccessData’s blog focuses quite heavily on their own releases, but you’ll also find the occasional gem discussing recent trends in the industry.

The digital forensics community is also very active on Twitter: following the #digitalforensics and #DFIR hashtags will help you stay up to date. I’ve also put together a list of digital forensics practitioners on Twitter, which may be helpful.

Facebook and Instagram aren’t so well-frequented by the community, although there are still a few spaces you might want to check out. I’m a member of the Digital Forensics and Cyber Security – Digital Forensics – Ethical Hacking groups, although both of those tend to be more cybersecurity focused than specifically digital forensics based.

If you’re studying at the moment and looking to get into the industry, why not start your own blog? Getting your writing out there is one of the best ways to receive extra feedback on your projects and get to know others in the field. It’ll also put you on the radar of some of the companies in the area, and who knows where that might lead?

What are your favourite online digital forensics resources?


Apple iPhone Forensics: Significant Locations


by Patrick Siewert, Principal Consultant, Pro Digital Forensic Consulting

I recently attended a conference of civil litigators in Virginia. During the cocktail hour and after a very interactive CLE presentation on “Leveraging Data in Insurance Fraud Investigations”, I was talking with a few attendees about the different types of data available to them in their investigation and litigation of insurance fraud claims. Admittedly, I was taken aback when one of the attorneys mentioned to me the “Significant Locations” that are logged on iPhones and showed me the locations on his. This is probably because I have most (or all) location services turned off on my personal device, so I’d never given it much thought. However, the conversation raised the question: are these artifacts available through forensic data extraction and analysis? If so (or if not), how do we access them? And what value might they serve in both criminal and civil investigations?

For the extraction, testing and exhibits illustrated here, we used an iPhone 5s running iOS 11.2.6, with Cellebrite Physical Analyzer v. 7.5 used for the extraction and analysis. As mentioned later, location services must be turned ON on the device in order for this information to be logged, as detailed in the UFED Device Extraction Info below.

Where & What Are “Significant Locations?”

The first step is to identify where and what “Significant Locations” are. The artifact is available to view on the device at Settings > Privacy > Location Services > System Services > Significant Locations (see below).

If location services are turned OFF, the significant locations data will not be logged and therefore unavailable. Interestingly, to access Significant Locations on the device, the passcode or Touch ID must be entered, as shown below.

As we should all know by now, we need to obtain the passcode in some way (consent, court order, Gray Key, etc.) in order to facilitate data extraction in iOS 11 regardless, so while this may seem like an obstacle, it’s just another reason to obtain the passcode.

Upon accessing Significant Locations, a disclaimer is present, which reads as follows.

The final sentence, stating that Significant Locations are encrypted, already gives us a clue about whether UFED will be able to parse this data, but more on that a little later.

What’s Inside Significant Locations?

Once accessed, the Significant Locations are presented as a list, shown here:

Some interesting things of note about these particular locations: this device doesn’t travel much. The 13 locations logged in Henrico (Richmond/Midlothian), VA are related to the home location(s) of the device, which is already good information to have in the course of an investigation. The device visits Williamsburg, which explains the listings for that location. All of the remaining locations relate to an April 2018 trip from Richmond, VA to Cincinnati, OH and back. The device stopped in Beaver, WV and Beckley, WV. Covington, KY is across the Ohio River from Cincinnati, where a dinner stop was made. A stop in Fishersville, VA was made to get gas on the way back from Cincinnati. Essentially, we have a road map of the trip to and from Cincinnati.

Further inspection of the locations where there are multiple listings reveals even more detail about where the device has been, as shown here in the Richmond, VA area:

And even more as shown here in the Cincinnati, OH area:

What’s most interesting about these artifacts is that at no time was the device connected to any wireless networks in either location, save one in the Mt. Adams section of Cincinnati. Yet in some instances, the business name and/or street address is listed in the log.

UFED Extraction & Access To “Significant Locations”

An Advanced Logical (option 1) encrypted extraction was conducted in Cellebrite UFED Physical Analyzer v. 7.5 to see if this data would be available through mobile forensic data extraction. When the names of the locations were searched globally in the case, no results were presented. When the term “Significant” was searched globally in the case, the following artifacts were located at var/root/library/caches/locationd:

The highlighted .plist files were exported and opened in Xcode on a Mac system. None of these artifacts presented any data that was readily identifiable as useful. Is it possible that these artifacts are encoded within the extraction data and could therefore be located? Sure, but for the purposes of this article, those measures were not undertaken. As these artifacts sit behind a double security wall (the main passcode, then re-entry of the passcode to access Significant Locations on the device), it is logical to conclude that they are not accessible through mobile forensic data extraction (i.e., they are encrypted).
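As an illustrative aside for readers who want to examine exported property lists themselves, here is a minimal Python sketch using the standard-library plistlib module. The sample.plist file and its keys are hypothetical, generated on the spot for the demonstration; on a real locationd cache you would expect opaque, encrypted payloads rather than readable data.

```python
import plistlib

def summarize_plist(path):
    """Return the sorted top-level keys of a property list file,
    or the type name if the root object is not a dictionary."""
    with open(path, "rb") as f:
        data = plistlib.load(f)  # auto-detects binary vs. XML plists
    if isinstance(data, dict):
        return sorted(data.keys())
    return type(data).__name__

# Write a small hypothetical sample plist to demonstrate round-tripping.
sample = {"CachedLocations": [], "LastUpdated": "2018-04-01"}
with open("sample.plist", "wb") as f:
    plistlib.dump(sample, f, fmt=plistlib.FMT_BINARY)

print(summarize_plist("sample.plist"))  # ['CachedLocations', 'LastUpdated']
```

This kind of quick inspection is an alternative to opening exported plists in Xcode, and works for both binary and XML property lists.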

How Does This Help Your Case?

To recap: we located the Significant Locations on the device and performed a data extraction, and it appears that these locations are not part of any readable portion of that data. So how can we best incorporate this data into our investigations to add value? Unfortunately, the best answer is the “old fashioned way”: access the device, navigate to “Significant Locations” and document each entry through photographs (NOT screenshots). Depending on the level of usage of the device, this can be tedious and time-consuming, but the value of the data cannot be overlooked.

In criminal cases, this data can help put the device in locations where the suspect may have been (or not have been) during the time of the incident. It can also help identify home locations and frequently visited locations, which can increase investigative leads, present additional accomplices, serve to impeach statements already made and more. Naturally, accessing the device is key. It bears noting that the “Significant Locations” data, combined with cellular provider call detail records could help paint a more thorough picture of the device location and/or movements than either one or the other alone.

In civil litigation, this data can be used in much the same way, but more likely to prove or disprove frequent locations, known associates (paramours, accomplices, etc.), and to help confirm or refute deposition or trial testimony. If your case involves insurance fraud and the claimant says that he cannot travel, this data helps refute that statement without the need to obtain cellular carrier records. But again, ideally we would couple this data with cellular location data to paint a more complete picture of the device usage patterns.

A couple of final notes about the existence of this data. First, it can be deleted: note in the image above that a “Clear History” option is present; if the user selects it, the logging is reset. It also appears (from checking a separate device with this logging turned on) that the data is stored for approximately six months. Whether the data would transfer from an older device to an upgraded device is unknown; further testing would need to be conducted. Finally, it is also unknown whether this data would be more readily accessible through mobile forensic data extraction on a jailbroken device.

Conclusions

This data is a proverbial gold mine, but it’s one we need to access in ways we generally don’t like to – by manipulating the device and accessing the UI. However, this is still a valid form of analysis and documentation, especially when the access limitations on iOS devices force us to use tools and techniques other than those that are automated. As with most things in forensics, simply knowing where to look, how the data got there and how best to use it to confirm or refute the other aspects of your case is (about) half the battle. We all know Google, Apple and the cellular carriers are tracking us. Let’s start using that data to help serve justice, no matter what we’re investigating!

About The Author

Patrick Siewert is the Principal Consultant of Pro Digital Forensic Consulting, based in Richmond, Virginia. In 15 years of law enforcement, he investigated hundreds of high-tech crimes, incorporating digital forensics into the investigations, and was responsible for investigating some of the highest jury and plea bargain child exploitation investigations in Virginia court history. Patrick is a graduate of SCERS, BCERT, the Reid School of Interview & Interrogation and multiple online investigation schools (among others). He is a Cellebrite Certified Operator and Physical Analyst. He continues to hone his digital forensic expertise in the private sector while growing his consulting & investigation business marketed toward litigators, professional investigators and corporations, while keeping in touch with the public safety community as a Law Enforcement Instructor.

Email: ProDigitalConsulting@gmail.com
Web: www.ProDigital4n6.com
Linked In: https://www.linkedin.com/company/professional-digital-forensic-consulting-llc
Twitter: @ProDigital4n6

Techno Security Myrtle Beach 2018 – Recap


by Scar de Courcier

This article is a recap of some of the main highlights from the Techno Security & Forensic Investigation Conference 2018, which took place in Myrtle Beach, SC from the 3rd-6th June 2018.

Under the sunny skies of South Carolina, the digital forensic community got together at the beginning of June this year to discuss topics ranging from international espionage to the admissibility of evidence obtained from the cloud. The conference was split into several streams: audit / risk management; forensics; information security; and investigations. There were also labs run by Cellebrite and Magnet Forensics, and various sponsor demos throughout the conference. The exhibition hall was open at various points throughout the day, allowing attendees to meet representatives from universities, forensics companies and law enforcement agencies and discuss current industry trends.

The first session Forensic Focus attended was conducted by Richard Spradley from Whooster, who was discussing how to decode investigative data in real-time. Spradley talked about how VOIP and burner phones are the hardest devices to investigate, but there are ways of identifying people using such phone numbers. Often a person will use a burner phone for more than one thing; while they might not use it to call their friends, they may place a personal ad, for example. Geographical identifiers are also important and may be able to give you a back door into a phone, especially if you have a partial name or frequently used alias.

Mark Spencer from Arsenal Consulting then spoke about what happens when things go wrong in a digital forensic investigation, particularly in a high stakes case. Attendees discovered the full story behind the forged digital forensics report which was discussed in our forums last year: a fascinating and definitely high-stakes investigation! The main takeaway? Timelines can lie to you. It is possible, in certain cases, that every timestamp has been forged and there is no ‘hidden’ timestamp that will help you in these situations.

Yulia Samoteykina and Mokosiy from Atola discussed the need for speed in digital investigations, and demonstrated how their new Atola TaskForce tool can help to ease the pain of large-scale investigations. They quoted the results of Forensic Focus’ 2015 survey, specifically the response to the question ‘What is the biggest challenge facing digital forensic investigators today?’

The proliferation of devices and the number of damaged drives investigators are having to look at are both important challenges in digital forensics. It was interesting to see Atola’s latest offering and its ability to address these issues, particularly for cases that require very quick turnaround times.

The keynote address on the second day of the conference was by Roman Yampolskiy, who looked at AI and its implications for the future of cybersecurity. Sticking with the subject of new advances in technology, Jerry Diamond from MSAB discussed drone forensics and some of the unique challenges of extracting data from drones.

Admissibility of evidence from the cloud is something that affects law enforcement agencies around the world, and in the afternoon on Monday a panel session convened to discuss this topic. One of the main areas of concern is that case law is being developed as we go along, so it can be hard to understand what is and what is not allowed to be admitted as evidence. Consent is another issue: if a suspect won’t give you access to their device but their spouse gives you access to the cloud account to which they know the password, will that stand up in court? The consensus seemed to be that it generally would, especially if the cloud account was shared by both parties, but there were questions around exactly what could be gathered from the cloud without compromising investigative integrity.

John Wilson from Discovery Squared presented an interesting talk about investigations involving Bitcoin and other cryptocurrencies. While these are in theory anonymous, it can sometimes be possible to trace a trail and end up with more information than you might have expected.

Abdul Hassan from the International Counter Terrorism Forensics Foundation opened the day on Thursday with an Early Riser Session about counter terror forensics. International law was a big point for consideration in this session: terrorists know where INTERPOL faces restrictions and they deliberately locate their servers in these territories in an attempt to foil investigations.

Magnet Forensics’ Jessica Hyde then ran an invigorating session about using operating systems, memory and other artifacts to piece together elements of an investigation. There will be a webinar on the subject later this month – watch this space!

After lunch, retired FBI SSA Bob Osgood talked attendees through the investigation into Robert Hanssen, an FBI agent who was also working as a Russian spy. Digital forensics was instrumental in his arrest and eventual conviction: the final nail in the coffin was his PDA, which contained notes in which he’d written the locations of the drop-offs for the Russians.

The final day of the conference began with Amber Schroader from Paraben demonstrating some of the key challenges in smartphone investigation, and how they can be eased with comprehensive investigative procedures and intelligent outsourcing. Wednesday ended with a fascinating session about how deep learning techniques can be used to detect indecent images and videos of children, and some attendees dispersed while others stayed on for the training sessions which were taking place on Thursday.

The next Techno Security & Forensic Investigation conference will take place in Texas in September – register here.

Deep Learning At The Shallow End: Malware Classification For Non-Domain Experts


by Quan Le, Oisín Boydell, Brian Mac Namee & Mark Scanlon

Abstract

Current malware detection and classification approaches generally rely on time-consuming and knowledge-intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data-driven approach for complex pattern and feature identification.

1. Introduction

In law enforcement agencies throughout the world, there are growing digital forensic backlogs of unimaged, unprocessed, and unanalyzed digital devices stored in evidence lockers [1]. This growth is attributable to several compounding factors. The sheer volume of cases requiring digital forensic processing extends far beyond digitally executed crimes such as phishing, online sharing of illicit content, online credit card fraud, etc., to “traditional” crimes such as murder, stalking, financial fraud, etc. The volume of data to be analysed per case is continuously growing, and there is a limited supply of trained personnel capable of the expert, court-admissible, reproducible analysis that digital forensic processing requires.

In order to address the latter factor, many police forces have been implementing a first responder/triage model to enable onsite evidence seizure and securing the integrity of the evidence gathered [2]. These models train field officers in the proficient handling of digital devices at a crime scene enabling the available expert digital investigators to remain in the laboratory processing cases. In this model, the first responders are not trained in the analysis or investigation phase of the case, but can ensure the integrity and court-admissibility of the gathered evidence.

While physical resourcing in terms of hardware, training first responders, and increased numbers of expertly skilled personnel can increase an agency’s digital forensic capacity, the digital forensic research community has identified the need for automation and intelligent evidence processing [3]. One of the more labor intensive and highly-skilled tasks encountered in digital forensic investigation is malware analysis. A common technique for analyzing malware is to execute it in a sandbox/virtual machine to gain insight into the attack vector, payload installation, network communications, and behavior of the software, with multiple snapshots taken throughout the analysis of the malware lifecycle. This is an arduous, time-consuming, manual task that can often span several days. A survey of digital forensic examiners conducted by Hibshi et al. [4] found that users are often overwhelmed by the amount of technical background required to use common forensic tools. This results in a high barrier to entry for digital investigators to expand their skillset to incorporate additional topics of expertise, such as malware analysis.

Artificial Intelligence (AI) combined with automation of digital evidence processing at appropriate stages of an investigation has significant potential to aid digital investigators. AI can expedite the investigative process and ultimately reduce case backlog while avoiding bias and prejudice [5]. Overviews of the applications of AI to security and digital forensics are provided in [6] and [7]. A number of approaches have been implemented to aid digital forensic investigation through AI techniques [8, 9], automation [10], and big data processing [11].

1.1. Contribution of this Work

The contribution of this work can be summarized as:

  • An overview of existing techniques for malware analysis from a manual and automated perspective.
  • An approach to enable malware classification by malware analysis non-experts, i.e., no expertise required on behalf of the user in reverse engineering/binary disassembly, assembly language, behavioral analysis, etc.
  • Without using complex feature engineering, our deep learning model achieves a high accuracy of 98.2% in classifying raw binary files into one of 9 classes of malware. Our model takes 0.02 seconds to process one binary file in our experiments on a regular desktop workstation; this short processing time is of potential practical importance when applying the model in reality.
  • Our one-dimensional representation of a raw binary file is similar to the image representation of a raw binary file [12], but it is simpler and preserves the sequential order of the byte code in the binaries. This sequential representation makes it natural to apply a Convolutional Neural Network – Bidirectional Long Short-Term Memory architecture (CNN-BiLSTM) on top of it, helping us achieve better performance than using the CNN model alone.
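To make the one-dimensional representation concrete, here is a minimal sketch in plain Python of turning a raw binary into a fixed-length, order-preserving byte sequence. The zero-padding and evenly-strided downsampling shown are illustrative stand-ins, not necessarily the exact context-independent normalization the paper describes later in Section 3.2.

```python
import zlib  # used here only to generate some sample "binary" content

def bytes_to_sequence(raw, target_len):
    """Represent a raw binary as a fixed-length 1-D sequence of byte
    values (0-255), preserving sequential order.

    If the file is longer than target_len, evenly spaced bytes are
    sampled; if shorter, the sequence is zero-padded at the end.
    """
    seq = list(raw)
    if len(seq) >= target_len:
        step = len(seq) / target_len
        return [seq[int(i * step)] for i in range(target_len)]
    return seq + [0] * (target_len - len(seq))

# Generate an incompressible-looking blob standing in for a malware binary.
blob = zlib.compress(b"example executable content" * 50)
seq = bytes_to_sequence(blob, 64)
assert len(seq) == 64 and all(0 <= b <= 255 for b in seq)
```

A sequence like this can be fed directly to a 1-D convolutional layer, whereas the image representation of [12] first wraps the bytes into two dimensions.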

2. Literature Review/State of the Art

There is a growing need for non-expert tools to perform digital evidence discovery and analysis [3, 13]. Due to the increasing delays in processing digital forensic evidence in law enforcement agencies throughout the world, there has been a focus in the digital forensic research and vendor communities in empowering the non-expert case detective to perform some preliminary analysis on the gathered evidence in a forensically sound manner [14]. To this end, the Netherlands Forensic Institute (NFI) have implemented a Digital Forensics as a Service solution to expedite digital forensic processing [15]. This system facilitates the case officer in uploading evidence to a private cloud-based system. Preliminary preprocessing takes place and the officer is able to browse the evidence to unearth potentially case-progressing information.

2.1. Digital Forensic Backlog

Storage capacities are increasing exponentially while cybercrime-related court cases are being dismissed. According to Ratnayake et al. [16], the likelihood of a prosecution can be lessened by the uncertainty in determining the age of a victim portrayed in a digital image. Their work considered a challenge parallel to age estimation: scanning the sheer surface of modern disk drives. They note the backlog that is imminent due to the lack of both relevant experts to analyze an offense and a laborious digital forensic process. Per Scanlon [1], these factors will continue to limit the throughput of digital forensic laboratories and will therefore hinder digital forensic investigators in the future.

2.2. Machine Learning for Malware Analysis

Machine learning offers the ability to reduce much of the manual effort required with traditional approaches to malware analysis, as well as increased accuracy in malware detection and classification. In the context of malware analysis, a machine learning model is trained on a dataset of existing labeled malware examples, with the labeling either in terms of malicious or benign in the case of binary classification, or in terms of the type or family of malware for multi-class classification. In either case, the model learns the differentiating features between the classes and so is able to infer, for a new and previously unseen example, whether it is malicious or benign, or which malware family it belongs to with a certain degree of accuracy.

Of course there are many different types and variations of machine learning algorithms and the training examples can be represented in many different ways, which all influence the classification accuracy of the resulting model. Research in the field generally involves the evaluation of different machine learning algorithms and approaches, in conjunction with different and novel types of features derived from the data. Many different approaches have been proposed and a comprehensive review of the literature is provided by both Ucci et al. [17] and Gandotra et al. [18].

In the next section, we focus specifically on approaches based on deep learning (a type of machine learning) as these are most related to our work. However, the types of features used and how they are extracted in the general context of machine learning for malware classification is also of key relevance. Machine learning reduces much of the manual effort required with traditional approaches to malware analysis by automatically learning to differentiate between malicious or benign, or different families of malware. However, the analysis and extraction of the features from the data, over which the machine learning model operates, still requires a high level of domain expertise in conjunction with complex and time consuming processes.

There are two families of features used in malware analysis: those which can be extracted from the static malware bytecode, and those which require the malware code to be executed (typically in a sandbox environment). Static features include information such as processor instructions, null-terminated strings and other static resources contained in the code, static system library imports, system API calls, etc. Features derived from executed code capture how the malware interacts with the wider operating system and network, and can include dynamic system API calls and interactions with other system resources such as memory, storage and the wider network, e.g., connecting to external resources over the Internet.
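As a concrete example of the simplest static feature mentioned above, the sketch below pulls printable ASCII strings out of raw bytes, much like the Unix strings utility; the sample bytes are invented for illustration, and real feature pipelines would additionally extract imports, API calls and so on with specialist tooling.

```python
import re

def extract_ascii_strings(raw, min_len=4):
    """Return runs of at least min_len printable ASCII characters
    found in a raw binary, a classic static malware feature."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, raw)]

# Hypothetical fragment of a Windows executable's bytes.
sample = b"\x00\x01MZ\x90kernel32.dll\x00\xffGetProcAddress\x00\x02"
print(extract_ascii_strings(sample))  # ['kernel32.dll', 'GetProcAddress']
```

Library names and API call strings recovered this way are exactly the kind of signal traditional classifiers are trained on.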

Although dynamic features extracted from executed code are generally more time and computational resource consuming to extract than features from the static code, both cases require specialist tools and software environments – not to mention a high level of domain expertise required to understand and extract them. The core benefit of our approach, which we present in detail in Section 3, is that our deep learning model requires only the raw, static bytecode as input with no additional feature extraction or feature engineering.

Before moving on to review general deep learning approaches for malware classification in the next section, we first discuss two machine learning approaches which attempt to make use of the raw, static bytecode in a way which has some similarities to our work. Nataraj et al. [12] interpret the raw bytecode as greyscale image data, where each byte represents a greyscale pixel, and artificially wrap the byte sequence into a two-dimensional array. They then treat the malware classification task as image classification by applying various feature extraction and feature engineering techniques from the image processing field, and use machine learning over these. Inspired by this approach, Ahmadi et al. [19] use a similar representation of the data and evaluate their technique using the same dataset with which we evaluate our work; however, they do not make use of deep learning. We provide a comparison of classification accuracy to our approach in Section 4.1. The application of image classification techniques to the malware domain, however, still requires the use of complex feature extraction procedures and domain expertise in their application.
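The wrapping step of the greyscale-image representation can be sketched in a few lines of Python; the row width and the zero-padding of the final partial row are illustrative choices, not the exact settings used by Nataraj et al.

```python
def bytes_to_greyscale(raw, width):
    """Wrap a 1-D byte sequence into rows of `width` greyscale pixel
    values (0-255), zero-padding the final partial row. The result is
    the 2-D 'image' over which image-processing features are computed."""
    rows = []
    for i in range(0, len(raw), width):
        row = list(raw[i:i + width])
        row += [0] * (width - len(row))  # pad the last row if needed
        rows.append(row)
    return rows

img = bytes_to_greyscale(bytes(range(10)), width=4)
# 10 bytes wrapped at width 4 -> 3 rows, the last zero-padded
assert img == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

Note that this wrapping imposes an artificial second dimension: bytes that end up vertically adjacent are `width` positions apart in the file, which is one motivation for the purely sequential representation used in this work.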

2.3. Deep Learning for Malware Classification

Deep Learning ([20],[21]) is a machine learning approach that has experienced a lot of interest over the last 5 years. Although artificial neural networks have been studied for decades, recent advances in computing power and increased data volumes have enabled the application of multi-layer neural networks (deep neural networks) to large training datasets, which has resulted in significant performance improvements over traditional machine learning techniques. Deep learning is now responsible for the state-of-the-art in many different machine learning tasks on different types of data, e.g., image classification [22] and natural language understanding and translation [23]. Malware classification has also attracted the attention of deep learning researchers.

The majority of deep learning approaches applied to malware classification involve training deep neural networks over the same types of extracted features on which traditional machine learning approaches are applied. These features require specialist knowledge and tools to generate and usually involve either the parsing or disassembly of the malware binary or running the malware in a sandbox environment and logging and analyzing the process execution and process memory, i.e., what the executed binary actually does [24]. We survey various applications of deep learning to malware classification from the perspective of which types of data and features are used.

2.3.1. Features from Static Code

Saxe and Berlin [25] present a deep feed forward neural network for binary malware classification that is trained on various features extracted from the static malware binary: system library imports, ASCII printable strings, metadata fields in the executable as well as sequences of bytes from the raw code. All these features require further processing and are then fed into a four layer feed forward network.

Hardy et al. [26] propose the DL4MD framework (Deep Learning Framework for Intelligent Malware Detection), which is trained over API calls extracted from malware binaries. An API database is required to convert the call references, as extracted from the code, into 32-bit global IDs representing the API calls. These features are then used as input to a deep learning architecture based on stacked autoencoders.

Davis and Wolff [27] discuss an approach in which they apply a convolutional neural network for binary classification to disassembled malware byte code. The raw disassembled code is further processed to generate a more regularized set of features. For example, they extract the individual x86 processor instructions, which are variable length, and then apply padding or truncation to create fixed-length features. They also parse the disassembled code to extract code imports, from which they generate a further fixed-length feature vector for each example.

All the aforementioned approaches require differing degrees of in-depth analysis of the disassembled code to extract domain-specific features, which are then fed into various deep learning architectures. A key differentiator of our approach is that we require no domain-specific analysis or parsing of the raw malware executable byte code. Our deep learning architecture needs no additional information about the meaning of the raw data or how the neural network should interpret it. Although we still need to normalize the length of the input data, as this is a basic requirement of the deep learning architecture we use, we do so at the level of the entire raw malware file, using a context-independent method described in Section 3.2.

Our methodology eliminates the need for complex feature engineering requiring expert domain knowledge and tools such as disassemblers, is not limited to malware compiled for a specific processor or operating system, and allows the deep neural network to learn complex features directly from the data rather than being constrained to those engineered by human experts.

2.3.2. Features Extracted from Executed Code

In addition to the approaches summarized above, which apply deep learning to features from parsed and disassembled static malware code, many approaches use features derived from running the malware in a sandbox environment and analyzing the behavior of the running process. Although the key advantage of our methodology is that it requires only the raw malware byte code as input, we include the following summary of these alternative approaches.

As with more traditional machine learning based malware classification, system API calls logged from running malware processes are a popular source of input features. Dahl et al. trained neural networks of between one and three hidden layers on features generated from system API calls as well as null-terminated strings extracted from the process memory [28]. A random projection technique was used to reduce the dimensionality of the features to a level manageable by the neural network. Huang and Stokes [29] propose an alternative deep learning architecture using similar features; however, their model addresses multi-task learning, in which the same architecture provides both a binary malware/benign classification and a classification of the malware type.

David and Netanyahu apply a deep belief network (DBN) to log files generated directly by a sandbox environment, which captures API calls as well as other events from the running malware as a sequential representation [30]. Similarly, Pascanu et al. [31] apply a Recurrent Neural Network (RNN) to event streams of API calls, in this case encoded as 114 higher-level events by a malware analysis engine. The use of an RNN captures the relationships of these events across time, and is similar in function to the LSTM component of our deep learning architecture; however, we use it to capture the positional relationships of patterns within the static malware bytecode file rather than temporal relationships.

Kolosnjaji et al. [32] propose a similar deep learning architecture to our methodology, which is also based on CNN and LSTM layers. However, the input data are sequences of system API calls extracted using the same sandbox environment as used by David and Netanyahu’s approach discussed above [30]. The CNN layers capture local sequences of API calls, whilst the LSTM layers model the relationships between these local sequences across time. In our approach, since we do not require the actual execution of the malware code, the CNN layers instead capture local sequences and patterns within the bytecode on a spatial level, and the LSTM layers model their longer distance relationships throughout the file.

Rather than using simple API call sequences, Tobiyama et al. [33] use a more detailed representation of malware process behavior. They record details of each operation, such as process name, ID, event name, and the path of the current directory in which the operation is executed. They then apply an RNN to construct a behavioral language model from this data, whose output is converted into feature images. A CNN is then trained over these feature images to produce binary malware/benign classifications. As with the previously outlined approaches that use features extracted from executing the malware code, the process required to collect the data is complex and time consuming. In this particular case, each malware or benign example was executed and analyzed for 100 minutes (5 minutes of logging followed by a 5-minute interval, repeated 10 times). A complex sandbox environment was also needed, which likely contributed to the limited size of the evaluation dataset: only 81 malware and 69 benign examples.

In real-world scenarios, malware defense systems that utilize machine learning based malware classification must be able to adapt to new variants and respond to new types of malware. If an approach requires complex, time- and resource-consuming processes to extract the features required by the machine learning model, this adversely impacts the usefulness of the solution. This is a key motivation for our approach, and so we focus on using only the static, raw malware bytecode with minimal data preprocessing.

Before describing our methodology in detail in the next section, we conclude our literature review with the two approaches most similar to ours. Raff et al. [34] describe a very similar motivation for their deep learning approach to malware classification: the need to remove the requirement for complex, manual feature engineering. Like our work, they focus on the raw malware bytecode and the application of deep learning techniques directly to this data. However, when faced with the challenge of working with such long sequences of bytes, they took a different approach, designing an atypical deep learning architecture that can handle such long input sequences. Our solution, by contrast, is simply to use a generic data scaling approach (down-sampling) as a pre-processing step, after which a more standard deep learning architecture can be applied. Although this approach, which by its nature reduces the detail in the data, might intuitively be expected to drastically reduce classification accuracy, we show through evaluation that sufficient signal remains in the data for the deep learning network to exploit and achieve very high accuracy levels.

Finally, motivated by Ahmadi's work [19], and with similarities to [12], Gibert [35] applied a CNN to malware bytecode represented as two-dimensional greyscale images. A down-sampling approach similar to ours was applied to normalize the size of each sample to 32 × 32 pixels. The key differences with our approach are that we use the raw malware bytecode in its original one-dimensional representation (we do not artificially wrap the byte sequence to create a 2D representation), and that we preserve more detail by down-sampling the data to 10,000 bytes rather than 1,024 (32 × 32). In terms of deep learning architectures, we use LSTM layers on top of CNN layers to capture relationships among local patterns across the entire malware sample. We used the same evaluation dataset and experimental setup as Gibert so that we could directly compare approaches, and we observed a significant increase in classification accuracy with our approach, which we present in more detail in Section 4.1.

3. Methodology

In this section, we describe our deep learning based approach for malware classification in detail, including the dataset we used for our experiments, data preprocessing, deep learning architectures, and experimental design.

3.1. Dataset

For our experiments, we used the malware data from the Microsoft Malware Classification Challenge (BIG 2015) on Kaggle [36]. Although the Kaggle challenge itself finished in 2015, the labeled training dataset of 10,868 samples is still available and represents a large collection of examples classified into malware classes, as shown in Table 1. As well as being able to use this data to both train and evaluate our own deep learning approaches, the Kaggle challenge still allows the submission of predictions for a separate unlabeled test set of 10,873 samples for evaluation.

Each labeled malware example consists of a raw hexadecimal representation of the file's binary content, without the PE header (to ensure sterility). In addition, a metadata representation is also provided, which includes details of function calls, embedded strings, etc., extracted using a disassembler tool. As the focus of our work is applying deep learning techniques to classify malware based on the raw binary file content, we consider only the raw hexadecimal file representations, which we convert to their binary form.

3.2. Data Pre-processing

One of the benefits of deep learning over other machine learning techniques is its ability to be applied to raw data without the need for manual, domain-specific feature engineering. This is a key motivation for our work: the ability to efficiently classify malware without requiring specialist expertise and time-consuming processes to identify and extract malware signatures. To parallelize the computation when training and testing the models efficiently, our deep learning approach requires that each file have a standard size, and malware file sizes are highly variable, as shown in Figure 1.

In addition to having the same size, from a computational perspective our deep learning methods require that this size be constrained so as to keep the model training process practical on standard hardware. There are a number of options for standardizing the file size, including padding and truncation; however, our deep learning models are designed to identify and detect common patterns and structure within the malware file data, so we want to preserve the original structure as much as possible. To this end, we used a generic image scaling algorithm, where the file byte code is interpreted as a one-dimensional 'image' and scaled to a fixed target size. This is a type of lossy data compression; however, by using an image scaling algorithm, we aim to limit the distortion of spatial patterns present in the data. Compared to approaches that convert a malware binary file to a 2D image before classification, our approach is simpler, since we do not have to decide on the height and width of the image.

Converting a binary file to a byte stream also preserves the order of the binary code in the original file, and this sequential representation of the raw binary files makes it natural to apply a recurrent neural network architecture. In the experiments that follow, we scale each raw malware file to a size of 10,000 bytes using the OpenCV computer vision library [37], i.e., after scaling, one malware sample corresponds to one sequence of 10,000 1-byte values.
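This scaling step can be sketched as follows. For portability, the sketch uses NumPy linear interpolation as a stand-in for the OpenCV resize call used in our pipeline; the helper name and interpolation choice are illustrative, not the exact implementation.

```python
import numpy as np

def scale_bytes(raw: bytes, target_len: int = 10_000) -> np.ndarray:
    """Scale a raw byte sequence to a fixed length by treating it as a
    one-dimensional 'image' and interpolating linearly (a stand-in for
    the OpenCV resize used in the paper)."""
    x = np.frombuffer(raw, dtype=np.uint8).astype(np.float32)
    src = np.linspace(0.0, 1.0, num=x.size)   # source sample positions
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.rint(np.interp(dst, src, x)).astype(np.uint8)

sample = bytes(range(256)) * 100              # 25,600-byte dummy file
scaled = scale_bytes(sample)
print(scaled.shape)                           # (10000,)
```

Whatever the original file size, the output is a fixed-length sequence of 1-byte values suitable for batched training.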

Figure 2 shows a number of example malware files which have been scaled using this approach, and then represented as two dimensional greyscale images (one byte per pixel), where the images are wrapped into two dimensions purely for visualization purposes. The spatial patterns in the data both on a local scale and on a file level are visible and it is these raw features and patterns that our deep learning architecture is designed to exploit.

3.3. Deep Learning Architectures

We utilize different deep learning architectures in our experiments. We first apply multiple convolutional neural network (CNN) layers [38] to the one-dimensional sequential representation of the file. Since convolutional layers are shift invariant, they help the models capture one-dimensional spatial patterns of a malware class wherever they appear in the file.

On top of the convolutional layers, we apply two different approaches. In our first model, we connect the outputs of the convolutional layers to a dense layer, then to an output layer with a softmax activation to classify each input into 1 of the 9 classes of malware, as shown in Figure 3. This CNN-based approach classifies the one-dimensional representation of the binary file using local patterns of each malware class, and is the dominant and very successful neural network architecture in image classification [39].

For the second and third models, we apply recurrent neural network layers, the Long Short-Term Memory (LSTM) module [40], on top of the convolutional layers, before feeding the output of the recurrent layer to the output layer to classify the input into 1 of the 9 malware classes. Our rationale is that, since there are dependencies between different pieces of code in a binary file, a recurrent layer on top of the CNN layers helps summarize the content of the whole file into one feature vector before feeding it to the output layer. In model 2, CNN-UniLSTM, we apply one forward LSTM layer on top of the convolutional layers, where the connecting direction of the cells in the LSTM runs from the beginning to the end of the file, as shown in Figure 4. However, since the dependencies between code in a binary file do not run in only one direction, we design our third model, CNN-BiLSTM, in which we connect the outputs of the convolutional layers to one forward LSTM layer and one backward LSTM layer. The outputs of the two LSTM layers are then concatenated and fed to the output layer, as can be seen in Figure 5.

3.4. Experiment Protocol

Since we only have labels for the malware files in the training set of the Kaggle challenge, all experimental results reported here, except for the final step of submitting predictions on the test set to the Kaggle website, are measured on this set of samples. For simplicity, we refer to the training set of the Kaggle challenge as the main dataset.

After the preprocessing step, we have 10,860 labeled samples in our dataset. Since this is not an especially large number, we use five-fold cross-validation to obtain a more robust accuracy measure. The dataset is shuffled and divided into 5 equal parts, each with roughly the same class distribution as the main dataset. For a chosen deep learning configuration, we take each of the 5 parts in turn as the held-out part, train one model on the other 4 parts, and record the predictions for the samples in the held-out part. We then assemble the predictions for all 5 parts and use them to compute the performance of the chosen deep learning configuration.
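The fold construction described above amounts to a stratified split. A minimal sketch in NumPy; the helper name and round-robin assignment are ours, for illustration only:

```python
import numpy as np

def stratified_folds(y, k=5, seed=0):
    """Assign each sample to one of k folds so that every fold has
    roughly the same class distribution as the full dataset."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)          # all samples of class c
        rng.shuffle(idx)
        fold[idx] = np.arange(idx.size) % k   # deal them out round-robin
    return fold

y = np.array([0] * 50 + [1] * 10)             # toy imbalanced labels
folds = stratified_folds(y)
print(np.bincount(folds[y == 1]))             # [2 2 2 2 2]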

The distribution of the classes in the dataset is highly imbalanced, with the number of samples per class ranging from 42 for the class Simda to 2,942 for the class Kelihos_v3. Besides using micro-averaged classification accuracy to report the performance of a model, we also assess performance via the macro-averaged F1-score over the classes. The F1-score for any one class is the harmonic mean of the precision and recall on that class, and the macro-averaged F1-score treats the performance on each class with equal importance.
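For concreteness, the macro-averaged F1-score can be computed directly from per-class precision and recall (an illustrative helper, not code from our implementation):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1: compute F1 per class, then take the unweighted
    mean so every class counts equally regardless of its size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
print(round(score, 4))                        # 0.7333
```

A classifier that ignores a rare class is penalized heavily here, whereas micro-averaged accuracy would barely notice.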

We take one additional step to address the class imbalance problem. In one training step of a deep learning model, a batch of a chosen size, e.g., 64 samples, is drawn from the training data, and forward computation followed by backpropagation modifies the weights of the model toward better performance. The default sampling mode, where all samples are drawn randomly from the training data, will draw mostly from the populous classes while likely missing samples from a rare class such as Simda. To address this, in addition to the default sampling procedure, we test a class rebalance sampling approach, where for each batch we draw approximately the same number of samples from each class at random. One batch of samples, of size batch_size × sequence_length, is fed to the deep learning model without a data normalization step.
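A sketch of the class-rebalance batch generator; the function name and sampling-with-replacement detail are our illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rebalanced_batch(X, y, batch_size=64, rng=None):
    """Draw a training batch with roughly equal samples per class,
    sampling with replacement so rare classes are never missed."""
    rng = rng or np.random.default_rng()
    classes = np.unique(y)
    per_class = max(1, batch_size // classes.size)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=per_class, replace=True)
        for c in classes
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]

X = np.zeros((1000, 10))                          # dummy features
y = np.array([0] * 950 + [1] * 42 + [2] * 8)      # skewed classes
Xb, yb = rebalanced_batch(X, y, batch_size=63)
print(np.bincount(yb))                            # [21 21 21]
```

Even though class 2 makes up under 1% of the data, every batch contains it in equal proportion.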

In total, we have 6 deep learning configurations: each is a combination of one of 3 deep learning architectures (CNN, CNN-UniLSTM, CNN-BiLSTM) and one of 2 batch sampling procedures used in training (the default sampling mode and the class rebalance sampling mode). All models have 3 convolutional layers, while the hyperparameters of a configuration, i.e., the number of nodes in each layer, are chosen by performance in the cross-validation procedure.

To avoid overfitting, we use L2 regularization to constrain the weights of the convolutional layers, and dropout in the dense and LSTM layers. We choose a batch size of 64. Other hyperparameters, e.g., the number of nodes in each layer, are chosen through the 5-fold cross-validation procedure.

Once the best deep learning configuration is chosen, we retrain the model on the whole training set, predict the labels for the malware files in the unlabeled test set, and submit them to the Kaggle website to obtain the test set average log-loss; a low average log-loss corresponds to a high classification accuracy.
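For reference, the average log-loss reported by the Kaggle leaderboard is the standard multi-class cross-entropy: the negative mean log-probability assigned to the true class. A minimal sketch:

```python
import numpy as np

def avg_log_loss(y_true, probs, eps=1e-15):
    """Average log-loss (cross-entropy): -mean over samples of the log
    of the predicted probability for the true class. Probabilities are
    clipped to avoid log(0)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    picked = probs[np.arange(len(y_true)), np.asarray(y_true)]
    return float(-np.mean(np.log(picked)))

loss = avg_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
print(round(loss, 4))                             # 0.1643
```

Unlike accuracy, this metric rewards well-calibrated confidence: a correct but hesitant prediction still incurs a nonzero loss.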

4. Results and Discussion

4.1. Results

Our final deep learning models' hyperparameters are as follows. All models have 3 convolutional layers with the rectified linear unit (ReLU) activation function; the numbers of filters in the 3 layers are 30, 50, and 90. For the CNN models, the outputs of the convolutional layers are connected to a dense layer of 256 units, then fed to the output layer. For the CNN-UniLSTM and CNN-BiLSTM models, we connect the outputs of the convolutional layers to 1 (UniLSTM) or 2 (BiLSTM) LSTM layers, each with 128 hidden units; the outputs of the LSTM layers are then connected to the output layer. As described earlier, to complete a deep learning configuration, each architecture (CNN, CNN-UniLSTM, CNN-BiLSTM) is paired with one of the 2 data batch generators: the default sampling batch generator (DSBG) or the class rebalance batch generator (CRBG). The models are implemented using the Keras library with the TensorFlow backend.
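The CNN-BiLSTM variant can be sketched in Keras as below. The filter counts (30/50/90), ReLU activations, LSTM width (128 units per direction), and 9-class softmax output follow the text; the kernel sizes, strides, and pooling are not specified in the paper and are assumptions made here for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bilstm(seq_len=10_000, n_classes=9):
    """Sketch of the CNN-BiLSTM architecture: 3 Conv1D layers over the
    10,000-byte sequence, then forward + backward LSTMs (outputs
    concatenated), then a softmax over the 9 malware classes."""
    return models.Sequential([
        layers.Input(shape=(seq_len, 1)),
        # Kernel sizes/strides/pooling below are illustrative guesses.
        layers.Conv1D(30, 16, strides=4, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(50, 8, strides=2, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(90, 4, activation="relu"),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_cnn_bilstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.output_shape)                         # (None, 9)
```

Swapping the Bidirectional wrapper for a plain LSTM layer gives the CNN-UniLSTM variant; replacing it with Flatten plus a 256-unit dense layer gives the plain CNN model.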

In the 5-fold cross-validation procedure, we train each model for 100 epochs on an Nvidia 1080 Ti GPU; the weights of the model are updated by the Adam optimization method [41] to minimize the average log-loss criterion (i.e., the average cross-entropy).

Table 2 reports the number of parameters and the training time for the 6 deep learning configurations. We report the average accuracy and the F1-score of different deep learning configurations in Table 3.

From the results, the CNN-BiLSTM configuration with the class rebalance sampling batch generator has the best F1-score and the best accuracy on validation data. We therefore train our final model with this configuration on the entire training dataset, where 9/10 of the dataset is used to tune the weights of the model and the remaining 1/10 is used as validation data to choose the best model among the 100 epochs.

Figure 6 visualizes the loss and the accuracy on the training and validation data for the final model.

The final CNN-BiLSTM model achieves an average log-loss of 0.0762 on the validation data and a validation accuracy of 98.80%. Upon submitting this model's predictions for the test malware files to Kaggle, we receive two average log-loss scores: a public score of 0.0655, calculated from 30% of the test dataset, and a private score of 0.0774, calculated from 70% of the test dataset. These results align with the log-loss obtained on the validation data, indicating that our final model generalizes well to new data.

Table 4 reports the time our final model takes to pre-process and predict the classes for the 10,873 test files. To simulate a real-life deployment situation, we load our final model onto a CPU (Intel Core i7-6850K) to perform the predictions.

4.2. Discussion

Our experiments show that the one-dimensional representation of the raw binary file is a good representation for the malware classification problem. It is very similar to the image representation of the raw binary file; however, it is simpler, it preserves the sequential order of code in the raw binary file, and one does not have to decide on the ratio between width and height in an image representation.

Our use of the class rebalance sampling procedure improves both the accuracy and the F1-score of all the CNN-LSTM models (both UniLSTM and BiLSTM). We believe this improvement arises because including samples of all classes in each batch gives backpropagation a better signal for tuning the parameters of the models.

The best performance was achieved when training the CNN-BiLSTM with the class rebalance sampling procedure. Due to the sequential dependency when computing the cells in the LSTM layer, the CNN-BiLSTM cannot utilize the GPU as efficiently as the CNN model: with the same batch sampling procedure, training a CNN model is 10 times faster than training a CNN-BiLSTM model. On the other hand, the CNN-BiLSTM model uses 268,000 parameters while the CNN model uses 1.84 million parameters. When we use both models to predict the classes of raw binary files on the CPU, the CNN-BiLSTM model is only 1.5 times slower than the CNN model. The CNN-UniLSTM model trained with the class rebalance sampling procedure is a good compromise: it takes less time to train than the CNN-BiLSTM model but still achieves good performance.

The results also show that adding a second dependency direction in the binary code, i.e., going from only the forward LSTM layer (CNN-UniLSTM) to both forward and backward layers (CNN-BiLSTM), improves the performance of the deep learning model. However, the bigger jump in performance comes from moving from the CNN architecture to the CNN-LSTM architecture.

Ahmadi et al. [19] also evaluate a machine learning based approach to malware classification using the Kaggle dataset. Their feature engineering approach used a combination of different features extracted from the raw binary files and the disassembled files, one of which is a set of features extracted from the image representation of the raw binary files. Using the XGBoost classifier on these extracted image-representation features, they obtain 95.5% accuracy in the 5-fold cross-validation procedure, as shown in Table 4 of [19]. While our one-dimensional representation of the raw binary file is similar to the image representation, our deep learning approach uses no feature extraction on top of it, and our best deep learning model obtains an accuracy of 98.2%, better than the previous feature engineering approach.

Another advantage of the deep learning approach is the time it takes to classify a new binary file. While training the models requires a GPU, the final model needs only a CPU to predict the malware class of a new binary file. Using our regular workstation with a 6-core Intel i7-6850K processor, our final model takes on average 1/50 of a second to classify a binary file; this includes the time taken to convert the binary file to its one-dimensional representation as well as the prediction time. By comparison, two image feature extraction techniques in [19] take on average 3/4 and 1.5 seconds per binary file, as can be seen in Figure 8 of [19].

Gibert Llauradó [35] (Chapter 5) uses an approach similar to ours in applying convolutional neural networks to the image representation of the raw binary file. The CNN model they describe has 34.5 million parameters; it has a public score of 0.1176 and a private score of 0.1348. Our CNN-BiLSTM model achieved better performance, with a public score of 0.0655 and a private score of 0.0774, while using 268,000 parameters.

5. Concluding Remarks

Our deep learning approach achieves a high performance of 98.2% accuracy in the cross-validation procedure, and the final model has 98.8% accuracy on the validation data. The appeal of the outlined deep learning approach for malware classification is twofold. Firstly, it does not require feature engineering, which is a significant obstacle for researchers who are not familiar with the field. Secondly, the model takes very little time to classify a binary file (0.02 seconds in our experiments), making it practical for real-world use.

The results also show that the class rebalance batch sampling procedure can be used to address the class imbalance problem in the dataset. In practice, new malware files belonging to the malware families recognized by the model will be found over time. With the deep learning approach, one can start from an existing model and update it with new training data to improve its accuracy, so the cost of retraining the model is small.

Our one-dimensional representation of the raw binary has its limitations: it does not consider the semantics of the binary code in the raw binary file. However, as our experiments show, there are spatial patterns characteristic of each malware class in the raw binary files, and deep learning models can use them to predict the class of a malware file effectively. Gibert Llauradó [35] shows that deep learning can also be applied successfully to the disassembled files, which suggests there is merit in considering the semantic meaning of each byte, even if the reverse engineering step is not conducted through disassembling the raw binary files.

5.1. Future Work

For future work, we would like to test our deep learning approach on bigger datasets with more malware classes. One direction is to preserve the semantic meaning of each byte of the raw binary file during preprocessing, though this requires a suitable way to compress a large binary file (approximately 60 MB) to a small size without losing the semantic meaning of the bytes in the final representation. Another useful extension would be to modify our deep learning model so that it can detect whether a new binary file belongs to one of the known classes or to a new malware class. Finally, we could apply more complex deep learning architectures to attain better performance; for example, we could add residual modules [42] to the model to alleviate the vanishing gradient problem.

References

1. Scanlon M. Battling the digital forensic backlog through data deduplication. In: 2016 Sixth International Conference on Innovative Computing Technology (INTECH). 2016:10–14. doi:10.1109/INTECH.2016.7845139.
2. Hitchcock B, Le-Khac NA, Scanlon M. Tiered forensic methodology model for digital field triage by non-digital evidence specialists. Digital Investigation 2016;16:S75–S85. doi:10.1016/j.diin.2016.01.010.
3. Sun D. Forensics tool for examination and recovery of computer data. 2010. US Patent 7,644,138.
4. Hibshi H, Vidas T, Cranor L. Usability of forensics tools: a user study. In: IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on. IEEE; 2011:81–91.
5. James JI, Gladyshev P. Challenges with automation in digital forensic investigations. arXiv preprint arXiv:13034498 2013.
6. Franke K, Srihari SN. Computational forensics: An overview. In: International Workshop on Computational Forensics. Springer; 2008:1–10.
7. Mitchell FR. An overview of artificial intelligence based pattern matching in a security and digital forensic context. In: Cyberpatterns. Springer; 2014:215–222.
8. Mohammed H, Clarke N, Li F. An automated approach for digital forensic analysis of heterogeneous big data. The Journal of Digital Forensics, Security and Law: JDFSL 2016;11(2):137.
9. Rughani PH, Bhatt P. Machine learning forensics: A new branch of digital forensics. International Journal of Advanced Research in Computer Science 2017;8(8).
10. In de Braekt R, Le-Khac NA, Farina J, Scanlon M, Kechadi MT. Increasing Digital Investigator Availability through Efficient Workflow Management and Automation. In: The 4th International Symposium on Digital Forensics and Security (ISDFS 2016). Little Rock, AR, USA: IEEE; 2016:68–73.
11. Guarino A. Digital forensics as a big data challenge. In: ISSE 2013 Securing Electronic Business Processes. Springer; 2013:197–203.
12. Nataraj L, Karthikeyan S, Jacob G, Manjunath BS. Malware images: Visualization and automatic classification. In: Proceedings of the 8th International Symposium on Visualization for Cyber Security. VizSec ’11; New York, NY, USA: ACM. ISBN 978-1-4503-0679-9; 2011:4:1–4:7. doi:10.1145/2016904.2016908.
13. van de Weil E, Scanlon M, Le-Khac NA. Network Intell: Enabling the Non-Expert Analysis of Large Volumes of Intercepted Network Traffic. Heidelberg, Germany: Springer; 2018.
14. Lee S, Savoldi A, Lim KS, Park JH, Lee S. A proposal for automating investigations in live forensics. Computer Standards and Interfaces 2010;32(5):246–255. doi:10.1016/j.csi.2009.09.001; information and communications security, privacy and trust: Standards and Regulations.
15. Casey E, Barnum S, Griffith R, Snyder J, van Beek H, Nelson A. Advancing coordinated cyber-investigations and tool interoperability using a community developed specification language. Digital Investigation 2017;22:14–45.
16. Ratnayake M, Obertová Z, Dose M, Gabriel P, Bröker H, Brauckmann M, Barkus A, Rizgeliene R, Tutkuviene J, Ritz-Timme S, et al. The juvenile face as a suitable age indicator in child pornography cases: a pilot study on the reliability of automated and visual estimation approaches. International journal of legal medicine 2014;128(5):803–808.
17. Ucci D, Aniello L, Baldoni R. Survey on the usage of machine learning techniques for malware analysis. CoRR 2017;abs/1710.08189. URL: http://arxiv.org/abs/1710.08189.
18. Gandotra E, Bansal D, Sofat S. Malware analysis and classification: A survey 2014;05:56–64.
19. Ahmadi M, Ulyanov D, Semenov S, Trofimov M, Giacinto G. Novel feature extraction, selection and fusion for effective malware family classification. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. CODASPY ’16; New York, NY, USA: ACM. ISBN 978-1-4503-3935-3; 2016:183–194. doi:10.1145/2857705.2857713.
20. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–444.
21. Schmidhuber J. Deep learning in neural networks: An overview. Neural networks 2015;61:85–117.
22. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. CoRR 2017;abs/1709.01507. URL: http://arxiv.org/abs/1709.01507.
23. Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing. CoRR 2017;abs/1708.02709. URL: http://arxiv.org/abs/1708.02709.
24. Schaefer E, Le-Khac NA, Scanlon M. Integration of Ether Unpacker into Ragpicker for plugin-based Malware Analysis and Identification. In: Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS 2017). Dublin, Ireland: ACPI; 2017:419–425.
25. Saxe J, Berlin K. Deep neural network based malware detection using two dimensional binary program features. In: 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). 2015:11–20. doi:10.1109/MALWARE.2015.7413680.
26. Hardy W, Chen L, Hou S, Ye Y, Li X. Dl4md: A deep learning framework for intelligent malware detection. Athens: The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp); 2016:61–67.
27. Davis A, Wolff M. Deep Learning on Disassembly Data; 2015. URL: https://www.blackhat.com/docs/us-15/materials/ us-15-Davis-Deep-Learning-On-Disassembly.pdf.
28. Dahl GE, Stokes JW, Deng L, Yu D. Large-scale malware classification using random projections and neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013:3422–3426. doi:10.1109/ICASSP.2013.6638293.
29. Huang W, Stokes JW. Mtnet: A multi-task neural network for dynamic malware classification. In: Caballero J, Zurutuza U, Rodríguez RJ, eds. Detection of Intrusions and Malware, and Vulnerability Assessment. Cham: Springer International Publishing. ISBN 978-3-319-40667- 1; 2016:399–418.
30. David OE, Netanyahu NS. Deepsign: Deep learning for automatic malware signature generation and classification. In: 2015 International Joint Conference on Neural Networks (IJCNN). 2015:1–8. doi:10.1109/ IJCNN.2015.7280815.
31. Pascanu R, Stokes JW, Sanossian H, Marinescu M, Thomas A. Malware classification with recurrent networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2015:1916–1920. doi:10.1109/ICASSP.2015.7178304.
32. Kolosnjaji B, Zarras A, Webster G, Eckert C. Deep learning for classification of malware system call sequences. In: Kang BH, Bai Q, eds. AI 2016: Advances in Artificial Intelligence. Cham: Springer International Publishing. ISBN 978-3-319-50127-7; 2016:137–149.
33. Tobiyama S, Yamaguchi Y, Shimada H, Ikuse T, Yagi T. Malware detection with deep neural network using process behavior. In: 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC); vol. 2. 2016:577–582. doi:10.1109/COMPSAC.2016.151.
34. Raff E, Barker J, Sylvester J, Brandon R, Catanzaro B, Nicholas C. Malware Detection by Eating a Whole EXE. ArXiv e-prints 2017.
35. Gibert Llauradó D. Convolutional neural networks for malware classification. Master’s thesis; Universitat Politècnica de Catalunya; 2016.
36. Ronen R, Radu M, Feuerstein C, Yom-Tov E, Ahmadi M. Microsoft malware classification challenge. arXiv preprint arXiv:180210135 2018;.
37. Bradski G. The opencv library. Dr Dobb’s Journal: Software Tools for the Professional Programmer 2000;25(11):120–123.
38. LeCun Y, Bengio Y, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 1995;3361(10):1995.
39. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012:1097–1105.
40. Hochreiter S, Schmidhuber J. Long short-term memory. Neural computation 1997;9(8):1735–1780.
41. Kingma D, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:14126980 2014.
42. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016:770–778.


Electromagnetic Side-Channel Attacks: Potential For Progressing Hindered Digital Forensic Analysis


by Asanka Sayakkara, Nhien-An Le-Khac & Mark Scanlon

Abstract

Digital forensics is a fast-growing field involving the discovery and analysis of digital evidence acquired from electronic devices to assist investigations for law enforcement. Traditional digital forensic investigative approaches are often hampered by the data contained on these devices being encrypted. Furthermore, the increasing use of IoT devices with limited standardisation makes it difficult to analyse them with traditional techniques. This paper argues that electromagnetic side-channel analysis has significant potential to progress investigations obstructed by data encryption. Several potential avenues towards this goal are discussed.

1 Introduction

The increasing consumer reliance on electronic devices has risen to a level where it is easier for attackers to compromise the privacy and security of an individual's digital information than by any other means. Private information is stored on a wide variety of digital platforms including mobile phones, personal computers, social media profiles, cloud storage, etc. [28]. The recent emergence of Internet of Things (IoT) devices, which integrate into the fabric of everyday life, enables the digital recording of even more personal information. The field of information security deals with the challenge of keeping this sensitive data from falling into the hands of unauthorized parties. However, when criminal and illegal activities involve electronic and computing devices, law enforcement authorities require access to each suspect's private data, under warrant, in order to collect potentially pertinent evidence [30]. In this regard, the fields of information security and digital forensics are juxtaposed with each other.

Modern personal computers and mobile devices provide a facility to encrypt hard disks and other non-volatile data storage. While this functionality was first offered as an option to users on initial setup of these devices, it is increasingly the default behaviour, especially in mobile environments such as iOS and Android [2]. Although IoT devices have limited data processing power and storage capabilities, lightweight cryptographic mechanisms are employed on many platforms. Encrypted data has long been identified as a potentially rich source of evidence, and many cases have been hampered when encrypted data was encountered [17]. With respect to IoT devices, even if encryption is not employed, the lack of standardised interfaces to access the stored data can still pose a challenge.

Side-channel analysis has proven to be effective against many security mechanisms on computing systems: accessing unauthorized regions of volatile and non-volatile storage, intercepting the regular operations of applications and processes, and many other useful possibilities exist [24]. Among the various side-channel attacks, electromagnetic (EM) side-channel analysis is an important class of attack which does not require an attacker to have physical access to the target device. This means that passive observation of unintentional EM wave emissions from a target device opens up a window for an attacker to infer the activities being performed and the data being handled on the target [35]. Without running any specific software on the target device or tapping into its internal hardware, EM side-channel attacks can provide a seamless access point for the attacker. Recent advances in the domain show that such attacks are capable of retrieving sensitive data, such as encryption keys [26].

Most mobile and IoT devices seized for forensic investigation tend to be powered on when they are found. However, legal requirements for digital forensic investigation demand that, ideally, investigations should be performed without inadvertently, or intentionally, modifying any information. Meeting this requirement often prevents an investigator from compromising the software and hardware while acquiring evidence [9]. Due to its nature, EM side-channel analysis has a desirable hands-off quality from a forensic perspective and has the potential to act as a means of unobtrusively accessing the internal information of a device. A variety of avenues, ranging from simple activity recognition to breaking encryption, could be beneficial to a digital forensic investigator. In this work, various potential applications of EM side-channel analysis in the domain of digital forensics are discussed.

2 Digital Forensic Analysis

A typical digital forensic investigation starts when law enforcement encounters an electronic device at a crime scene or seizes it from a person under investigation. These devices can vary from traditional personal computers and mobile devices to IoT devices, such as smart home devices and wearables. The seized devices are usually handed over to a digital forensic laboratory where specialists perform the investigation on the device [9]. Initially, pictures and notes are taken of the physical condition of the device. For personal computers, the investigation mainly focuses on the data stored in non-volatile memory, i.e., the hard disk or solid state drive. A forensically-sound disk image is acquired, which is analysed using specialised software tools to identify pertinent information.

The sole purpose of acquiring a disk image from the device under investigation is to prevent the investigative procedure from inadvertently making changes to the device. Popular tools such as EnCase and The Sleuth Kit are designed to extract information from disk images. In contrast to personal computers, the forensic analysis of mobile devices typically requires specialised hardware tools due to the fact that different makes and models of mobile devices have different internal structures. Even though various commercial tools are available for mobile devices, they need to be updated each time a new device model comes onto the market. The maintainers of commercial tools for forensic evidence acquisition on mobile devices are struggling to keep up with the highly dynamic ecosystem of mobile devices [2].

IoT devices have become ubiquitous in everyday life and collect a large volume of information that can be useful in a forensic investigation [21]. For example, a fitness wearable can contain highly precise information regarding the movements of its owner, which can assist in identifying where the person was at a particular point in time. Similarly, a smart TV or a smart light bulb may contain information regarding the usage patterns of the owner and might hint at the presence of the owner on a premises at a particular time. However, IoT-focused digital forensic tools are extremely limited. In fact, many IoT devices are not usable in investigations due to the unavailability of support from commercial vendors or open-source projects. The large variety of IoT devices on the market makes it virtually impossible to support all of them within a limited tool set.

Whenever encryption is involved in the storage of a device being investigated, forensic tools are unable to extract information [17]. From the investigator's perspective, a very limited number of workarounds are potentially viable. The obvious approach is to ask the device owner for the decryption key or password; however, if the device owner is not cooperative, this approach is not viable. Another possible approach involves seeking the assistance of the device vendor to unlock access to the data using whatever capabilities the vendor holds. However, many recent cases indicate that even the device vendors do not have access to the encrypted data storage on devices they produce. Under these circumstances, forensic investigations may end up unable to collect the required evidence from the devices that have been seized [34].

Figure 1 illustrates the workflow of actions taken in a typical digital forensic analysis of a device. The usual sequence of actions to analyse non-volatile storage has to be altered if the device uses encryption to protect data. If the device is turned on at the time it is seized, there is an opportunity to use EM side-channel analysis as a live data forensic technique on the device.

3 Electromagnetic Side-Channels

Passing time-varying electric currents through conductors causes EM waves to radiate into the environment. As computing devices consist of electronic circuits, they unintentionally generate EM emissions during their internal operations [11]. Depending on the exact component of a device that contributes, the resulting EM emission can unintentionally contain information about the activities associated with that component. For example, computer displays are a strong EM wave source known to leak information about the images being displayed on screen [33]. Similarly, central processing units (CPUs) of computers are known to provide hints about the CPU activities being performed [5]. From a digital forensic perspective, EM emissions associated with CPU operations are of specific interest.

In order to use EM emissions as a side-channel information source, it is necessary to capture the signals with sufficient accuracy. Professionals in radio frequency (RF) engineering and related fields typically use oscilloscopes and spectrum analysers to measure EM emissions from electronic devices for purposes such as electromagnetic compatibility (EMC) testing. However, cheap, off-the-shelf software defined radios (SDRs) are becoming increasingly popular among EM side-channel security researchers due to their lower cost and ease of use with configurable software components [32].

When an EM signal acquired from a target device, i.e., an EM trace, is illustrated as a waveform or a spectrogram, it is possible to visually distinguish individual operations of the CPU. Using these illustrations directly to eavesdrop on CPU activities is called simple electromagnetic analysis (SEMA), and has been widely used to demonstrate attacks on computer systems [14]. By monitoring the instructions being executed on the CPU, an attacker gains several capabilities, including reverse engineering unknown software, monitoring the control flow of known software, etc.
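As a minimal illustration of the signal processing behind SEMA, the following Python sketch computes a spectrogram and locates the dominant emission frequency in two time windows. The sample rate, tone frequencies, and synthetic trace are all assumptions standing in for a real capture:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic EM trace: two tones stand in for the emissions of two
# distinct CPU operations (hypothetical 1 MS/s capture).
fs = 1_000_000
t = np.arange(200_000) / fs
trace = np.where(t < 0.1,
                 np.sin(2 * np.pi * 50_000 * t),    # "operation A"
                 np.sin(2 * np.pi * 120_000 * t))   # "operation B"

f, seg_t, Sxx = spectrogram(trace, fs=fs, nperseg=1024)

# A SEMA analyst looks for exactly this kind of shift in the dominant
# frequency bin between the two halves of the capture.
early = f[np.argmax(Sxx[:, seg_t < 0.1].mean(axis=1))]
late = f[np.argmax(Sxx[:, seg_t >= 0.1].mean(axis=1))]
print(early, late)
```

In a real measurement the two "operations" would of course be far less cleanly separated, which is why the later sections turn to statistical and learning-based techniques.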

Differential electromagnetic analysis (DEMA) is an advanced technique to eavesdrop on critical variables being handled by algorithms running on a CPU [14, 15]. For example, when a cryptographic algorithm performs data encryption continuously over a period of time using a single encryption key, the observed EM traces have a strong correlation with that specific reused encryption key. DEMA attacks utilise this correlation between the secret key and the EM traces to reduce the number of brute-force guesses an attacker has to make in order to determine the secret key's bit pattern. It has been shown that DEMA is successful against many cryptographic algorithms, including AES, RSA and many others [25, 37].
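The statistical core of such an attack can be sketched in a few lines. The substitution table, the Hamming-weight leakage model, and the simulated traces below are illustrative assumptions rather than a real AES target; the point is that only the correct key guess correlates with the measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative substitution table and Hamming-weight leakage model.
SBOX = rng.permutation(256)
HW = np.array([bin(v).count("1") for v in range(256)])

secret_key = 0x3A
plaintexts = rng.integers(0, 256, size=2000)

# Simulated EM traces: leakage of the S-box output plus measurement noise.
traces = HW[SBOX[plaintexts ^ secret_key]] + rng.normal(0.0, 1.0, 2000)

# DEMA-style key search: the guess whose predicted leakage correlates
# best with the observed traces reveals the secret key byte.
scores = [abs(np.corrcoef(HW[SBOX[plaintexts ^ k]], traces)[0, 1])
          for k in range(256)]
recovered = int(np.argmax(scores))
print(hex(recovered))  # matches hex(secret_key)
```

Against a real device the noise is much higher and the leakage model must be fitted to the hardware, which is why practical DEMA attacks need large numbers of traces.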

Recognising the threat of EM side-channel attacks to computer systems, various countermeasures have been proposed that involve both hardware and software modifications [25, 27, 36]. Among the software-based countermeasures, two important methods are masking variables and randomizing the operations of algorithms in order to make it difficult for an external observer to identify them. Similarly, major hardware countermeasures include minimizing the EM emission intensity by employing obfuscation techniques and the use of dual line logic. Even though proper implementation of such countermeasures can place a barrier in front of attackers, many computing devices do not implement these techniques, leaving the window for EM side-channel attacks open. Furthermore, it has been shown that even when such countermeasures are implemented on devices, they do not completely prevent EM side-channel attacks; they simply increase the difficulty for the attacker by requiring more observations and a larger number of EM traces to carry out the same attack procedure.
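As a concrete example of the masking countermeasure mentioned above, the sketch below performs a table lookup without the unmasked secret ever appearing as an operand. The substitution table is a toy assumption; real implementations mask every intermediate value of the cipher:

```python
import secrets

# Toy substitution table (an assumption; real ciphers use their own S-box).
SBOX = [(i * 7 + 3) % 256 for i in range(256)]

def masked_lookup(x_masked, m_in, sbox):
    """Look up sbox[x] given only the masked input x ^ m_in."""
    m_out = secrets.randbelow(256)          # fresh output mask
    # Build a table mapping masked inputs to masked outputs, so the
    # unmasked value is never directly processed.
    masked_sbox = [sbox[i ^ m_in] ^ m_out for i in range(256)]
    return masked_sbox[x_masked], m_out

secret = 0x5C
m_in = secrets.randbelow(256)
y_masked, m_out = masked_lookup(secret ^ m_in, m_in, SBOX)

# Unmasking (done only where safe) recovers the true lookup result.
print(y_masked ^ m_out == SBOX[secret])
```

Because the leakage at any single point depends on a random mask, a first-order DEMA attack on one intermediate no longer correlates with the secret; higher-order attacks remain possible but require many more traces.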

Figure 2 illustrates a forensic investigative setting for EM side-channel analysis. The device under investigation (DUI) is placed inside an EMC/anechoic chamber to prevent external EM interference and vibrations from affecting the accuracy of the EM measurement. The signals are captured using a magnetic loop antenna and converted to an In-phase and Quadrature (I/Q) data stream that is subsequently analysed on a computer system.

4 Electromagnetic Side-Channels for Forensics

Given the current challenges in digital forensics and the state of the art of EM side-channel analysis, it is important to identify the future potential impact of these attacks on digital forensics. This section highlights some of the potential ways this impact may occur under several key themes. Many of these approaches are already starting to be realised, while others are ambitious predictions that could prove significantly beneficial.

4.1 More Frequent Cryptographic Operations

EM side-channel attacks require a large number of traces acquired from a target device while the device is performing cryptographic operations using a single key. It has been demonstrated that such attacks are viable under laboratory conditions. However, on most PC operating systems, it is rare to find practical situations where an attacker can observe EM emissions from a device for an extended period of time, since cryptographic operations typically occur less often than under laboratory experimental conditions. The most common encryption occurring on many personal devices is secure socket layer (SSL) based web traffic.

Encrypted storage is becoming commonplace in both desktop and mobile devices. Access to encrypted file systems causes an increased number of cryptographic CPU operations. Live data forensic techniques can help to perform investigations on such devices [13]. However, forensic investigators often encounter powered-on but locked devices. As long as the device is reading and writing to the encrypted storage, EM emissions should reflect the cryptographic operations on the device. Therefore, an attacker can straightforwardly force the victim device to perform cryptographic operations in order to acquire side-channel traces for key extraction.

4.2 Combined Side-Channel Attacks

Instead of using a single side-channel attack in isolation, combinations of multiple side-channel attacks directed towards a single computer system can prove more fruitful. It has been proven that power and EM side-channel analysis can be combined to achieve better results [1]. Some operations of the CPU may be more clearly reflected in the device's power consumption than in its EM emissions, and vice versa.

Sometimes, combining conventional attacks, e.g., spyware and worms, with EM side-channel attacks can produce new kinds of compound attacks that are difficult to counteract. For example, malware running on a victim computer can help an EM side-channel attacker extract more information than the EM side-channel alone would allow. This can be achieved by running specially selected instruction sequences on the CPU to intentionally emit encoded EM signals. Yang et al. [38] illustrated a mechanism to intentionally modulate the EM emissions of electronic and electromechanical devices to exfiltrate data from the device to an external receiver. This hints at the potential for employing these unintentional EM side-channels to intentionally and covertly transmit data wherever necessary.

There are two potential avenues for malware-assisted EM side-channel attacks. Firstly, malicious JavaScript can be embedded in a website, using cross-site scripting (XSS) or otherwise, to read the contents of a user's screen and encode that information into deliberate CPU EM emissions. Secondly, TEMPEST-style attacks on computer monitors can be combined with other attacks to increase the attack surface for air-gapped computer equipment [12]. For example, malware running on a target computer could read local files and encode that information into the computer's video output. Image steganographic techniques can be used to hide the encoded data from the human user's view [6]. Meanwhile, a TEMPEST-style attack can be performed on the computer's monitor in order to extract the video frames, ultimately leaking the data to the attacker.

4.3 Backscatter Side-Channels

RFID tags communicate a unique identification number by changing the impedance of their antenna, which results in amplitude modulation (AM) of the carrier wave, while using the same carrier wave as the power source to run the tag's electronic components. While the primary purpose of RFID is to communicate a hard-coded tag ID, attempts have been made to transmit dynamic sensor data from the tag to the RFID reader by modulating it in the same way [23].

While traditional RFID technology relies on the carrier wave provided by the reader device for power and communication, various ambient RF signals can also be used as the carrier wave for communication between two devices. If one device can modulate the ambient RF signal, the other device can recognize this modulation. This approach of using ambient RF signals for wireless communication is called backscatter communication, and has recently received significant attention from the IoT research community [20]. Various carrier wave sources have been tested in the literature, such as TV transmission stations and WiFi access points [19, 39].

It is important to study this ambient backscatter communication phenomenon in the context of EM side-channel analysis. The internal operations of electronic circuits (including CPUs) could demonstrate the backscatter effect on ambient RF sources during their operation. The potential of using externally generated RF signals near a target CPU, and whether internal CPU operations modulate the RF signal in some predictable manner, requires further exploration. Laptop computers have been shown to modulate signals from commercial AM radio stations, which hints at the possibility of this phenomenon [10]. Instead of blindly scanning the RF spectrum for the CPU's EM emission frequency, such a backscatter technique would enable the attacker to provide both the external RF source and the RF receiver on a specific frequency. This frequency can be selected to avoid external interference, making targeted monitoring for specific modulation patterns viable. Such targeted monitoring helps to reduce, or even eliminate, issues such as signal interference and false positives.

4.4 File Signatures

Many types of digital multimedia content, including image, audio, and video files, are stored in a compressed format for efficient storage and distribution [3]. As a result, when a computer starts playing an audio/video file in a specific format, e.g., MPEG-2 Audio Layer III, AAC, MPEG-4, etc., or attempts to display a compressed image format, e.g., JPEG, GIF, etc., the corresponding decompression software has to process the content. Since the software's execution path is governed by the media file's content, the instruction execution sequence will also depend on the media file. Therefore, it is possible that the CPU might emit EM patterns unique to a specific file being handled. This could potentially lead to the ability to identify the files being handled by a device.

While there have been attempts to build EM emission signatures of hardware devices and the specific software running on them for profiling purposes, such as the RF-DNA technique [8], the possibility of profiling specific media files using the EM emissions they cause is a potential avenue for future exploration. Searching for a known file, such as known illegal content, on a target device is a challenge that the digital forensics community has been attempting to solve in efficient and effective ways, as manual comparison is often overly arduous for expert investigators [18]. When a device is handling a file, passive observation of EM emissions can help to profile the file being handled by the device. This can later be compared with a known set of file signatures to confirm the access or processing of a specific file on the target device.
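The matching step this would require can be sketched very simply. The signature database and the noisy observation below are simulated stand-ins for real EM profiles; real matching would need robust, alignment-tolerant features rather than raw traces:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical signature database: one reference EM pattern per known file.
signatures = {name: rng.normal(size=500)
              for name in ("file_a", "file_b", "file_c")}

# Passive observation: a noisy re-capture of file_b's emission pattern.
observed = signatures["file_b"] + rng.normal(0.0, 0.5, 500)

def best_match(obs, db):
    """Return the signature whose Pearson correlation with obs is highest."""
    return max(db, key=lambda name: np.corrcoef(db[name], obs)[0, 1])

print(best_match(observed, signatures))  # file_b
```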

4.5 Packet Analysis at Network Devices

A wide variety of special-purpose computers are used in specialised application environments, including network routers and switches. There can often be an operational need to investigate a live network, focusing on the data-link and IP layers of the networking stack. In such cases, it is necessary to run network analysis software tools on specific interfaces of host computers [7]. Analysing the network purely based on the traffic going through routers and switches in order to observe live events is a challenging task. In situations like this, the EM emissions of routers and switches might be able to provide an approximate picture of the workload and traffic on the network. It has been shown that EM emissions observed from Ethernet cables can lead to identifying the MAC addresses of frames being handled by networking devices [29]. In that demonstration, the attackers used a technique similar to SEMA.

When IP packets are being switched at routers, the router has to update certain fields in the packet, including the time-to-live (TTL) and the header checksum. After updating these fields, the router forwards the packet to the relevant network interface. If the EM emission patterns of the router forwarding a packet to an interface and processing a packet are distinguishable, there are opportunities to perform interesting analysis on routers by observing their EM emissions. Packets that contain a specific payload, such as malware that comes from or is addressed to a specific host, and network-based attacks, e.g., DoS attacks, might be identifiable. Similarly, an attacker could gather EM emissions from a router to eavesdrop on the data being delivered through a wired network. Such possibilities are important from a digital forensic perspective when network analysis tools cannot be attached to a live system for analysis.
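The per-hop update described above is a small, fixed computation, which is precisely the kind of repetitive operation that might produce a recognisable EM pattern. The sketch below shows it for a minimal IPv4 header; the field values are illustrative:

```python
def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words (RFC 791 header checksum)."""
    s = 0
    for i in range(0, len(header), 2):
        s += int.from_bytes(header[i:i + 2], "big")
        s = (s & 0xFFFF) + (s >> 16)      # fold carries back in
    return (~s) & 0xFFFF

# Minimal 20-byte header: version/IHL and TTL set, other fields zeroed.
hdr = bytearray(20)
hdr[0] = 0x45
hdr[8] = 64                               # TTL
hdr[10:12] = ipv4_checksum(bytes(hdr)).to_bytes(2, "big")

# The forwarding step a router performs for every packet:
hdr[8] -= 1                               # decrement TTL
hdr[10:12] = b"\x00\x00"                  # zero the checksum field...
hdr[10:12] = ipv4_checksum(bytes(hdr)).to_bytes(2, "big")  # ...recompute

# A receiver's validity check: the checksum over the full header is zero.
print(hdr[8], ipv4_checksum(bytes(hdr)))
```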

4.6 Easy Access to Electromagnetic Spectrum

EM side-channel analysis attacks traditionally involve expensive hardware, including RF probes, oscilloscopes, spectrum analysers, and data acquisition modules. Such devices are mostly used in EM-insulated laboratory environments. Moreover, the configuration and operation of these devices requires specialized domain knowledge. Information security specialists and digital forensic analysts might not have access to such hardware and might not possess the specialized knowledge required for its operation. While DIY enthusiasts have attempted to build such tools at lower cost, such efforts come with a penalty of lower precision and accuracy. This situation places a significant barrier in the way of wide adoption of EM side-channel analysis.

Recent advancements in SDR hardware open new opportunities for non-specialists to access the radio spectrum. Affordable SDR hardware and freely available software libraries can be used to process and decode various wireless communication protocols. The ever-increasing processing power and memory capacity of personal computers supports the use of SDR software tools at high sampling rates. EM side-channel attackers have recently started to use SDR tools as a more affordable alternative to expensive RF signal acquisition hardware. Following this trend, digital forensic analysis should be possible by leveraging EM side-channels detected with SDR-based hardware and software platforms.

4.7 Advancements in Machine Learning

Recent advances in the area of artificial intelligence (AI) have demonstrated promising applications across many other domains of computer science. Various tasks where human intuition was required for decision making are now being handled by machine learning and deep learning based algorithms. Software libraries and frameworks are becoming increasingly available to assist in building applications with intelligent capabilities. Examples include the automated detection of malicious programs, image manipulation, and network anomaly detection.

EM side-channel analysis techniques that previously required human intervention, such as SEMA and spectrogram pattern observation, can be automated through the development of AI techniques. It is possible to extract better information from EM traces than current manual observations are capable of achieving. Several examples of existing work have already leveraged AI techniques to recognize EM trace patterns, which strongly hints at the future role AI algorithms can play in EM side-channel analysis for digital forensics [4, 16, 22, 31].
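The pattern-recognition step can be sketched with even a very simple learner. The synthetic traces and the nearest-centroid classifier below are stand-ins for real captures and for the more capable models the cited work employs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated labelled EM traces: each class is a distinct tone plus noise,
# standing in for the emission patterns of two CPU activities.
def make_trace(cycles):
    t = np.arange(256)
    return np.sin(2 * np.pi * cycles * t / 256) + rng.normal(0.0, 0.3, 256)

X = np.array([make_trace(c) for c in [8] * 50 + [20] * 50])
y = np.array([0] * 50 + [1] * 50)

# Feature extraction: magnitude spectrum of each trace.
feats = np.abs(np.fft.rfft(X, axis=1))

# Minimal learner: nearest centroid in feature space.
centroids = np.array([feats[y == c].mean(axis=0) for c in (0, 1)])

# Classify a previously unseen trace of the second activity.
unknown = np.abs(np.fft.rfft(make_trace(20)))
pred = int(np.argmin(((centroids - unknown) ** 2).sum(axis=1)))
print(pred)  # class 1
```

Real EM traces are far noisier and not time-aligned, which is where the deep learning approaches referenced above earn their keep.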

5 Discussion and Future Work

When digital evidence is presented to a court of law as part of an investigation, the evidence acquisition procedure can be thoroughly questioned and challenged. This is because legal processes follow strict procedures to ensure fairness to all parties involved. As a result, digital forensic evidence acquisition procedures must be documented and auditable. The digital evidence acquisition procedures, practices and tools currently in use are time-tested and resilient against such legal challenges. Therefore, whenever a completely new way of acquiring digital evidence is introduced, it has to be thoroughly scrutinized before it can withstand reliability challenges in a court of law.

Many of the EM side-channel attacks demonstrated in the literature were performed under controlled laboratory conditions where the attacker had the choice of target device. The attackers therefore had the freedom to avoid potential pitfalls that could affect the end result. In order to make such attacks realistic and reliable enough to perform on any arbitrary device encountered, further research is necessary. Sometimes, an EM side-channel attack that is easy to execute for a malicious objective can still be too unreliable and insufficiently trustworthy for a digital forensic investigation. This suggests that, for EM side-channel analysis to be leveraged for digital forensic purposes, well-tested tools and frameworks need to be developed so that the digital forensic community can gradually build trust in the technique.

Our future work is directed towards this goal of establishing EM side-channel analysis as a reliable digital forensic practice to overcome the challenges currently faced. Due to the lack of realistic and reliable attack demonstrations, further evaluations are necessary to confirm that various published attacks are applicable to the wide variety of devices on the market. How to increase the reliability of these attacks also needs to be explored. Many digital forensic specialists working in law enforcement and industry may not be experienced in operating radio frequency data acquisition devices; therefore, easily operable tools are necessary.

6 Conclusion

This work discussed the challenges faced by digital forensic investigators due to encrypted storage on computing devices and IoT devices with non-uniform internal designs. EM side-channel analysis techniques, which have been successfully demonstrated to leak critical information from computing devices, are considered as a potential solution. Various applicable scenarios for the technique in the digital forensic domain are identified. While the EM side-channel analysis domain is still in its infancy with respect to addressing the pressing encryption issue in digital forensics, the aforementioned application scenarios indicate that the combination can produce promising results in the future.

References

[1] Dakshi Agrawal, Josyula R Rao, and Pankaj Rohatgi. 2003. Multi-channel attacks. In International Workshop on Cryptographic Hardware and Embedded Systems (CHES). Springer, 2–16.
[2] Mohd Shahdi Ahmad, Nur Emyra Musa, Rathidevi Nadarajah, Rosilah Hassan, and Nor Effendy Othman. 2013. Comparison between android and iOS Operating System in terms of security. In 8th International Conference on Information Technology in Asia (CITA). IEEE, 1–4.
[3] Vasudev Bhaskaran and Konstantinos Konstantinides. 1997. Image and video compression standards: algorithms and architectures. Vol. 408. Springer Science & Business Media.
[4] Robert Callan, Farnaz Behrang, Alenka Zajic, Milos Prvulovic, and Alessandro Orso. 2016. Zero-overhead pro€filing via em emanations. In Proceedings of the 25th International Symposium on So‡ftware Testing and Analysis. ACM, 401–412.
[5] Robert Callan, Alenka Zajic, and Milos Prvulovic. 2014. A practical methodology for measuring the side-channel signal available to the aŠttacker for instructionlevel events. In 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 242–254.
[6] Abbas Cheddad, Joan Condell, Kevin Curran, and Paul Mc KeviŠ. 2010. Digital image steganography: Survey and analysis of current methods. Signal Processing 90, 3 (2010), 727–752.
[7] Vicka Corey, Charles Peterman, Sybil Shearin, Michael S Greenberg, and James Van Bokkelen. 2002. Network forensics analysis. IEEE Internet Computing 6, 6 (2002), 60–66.
[8] Randall D Deppensmith and Samuel J Stone. 2014. Optimized fi€ngerprint generation using unintentional emission radio-frequency distinct native aŠttributes (RF-DNA). In Aerospace and Electronics Conference, NAECON 2014-IEEE National. IEEE, 327–330.
[9] Xiaoyu Du, Nhien-An Le-Khac, and Mark Scanlon. 2017. Evaluation of Digital Forensic Process Models with Respect to Digital Forensics as a Service. In Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS 2017). ACPI, Dublin, Ireland, 573–581.
[10] William Entriken. System Bus Radio. hŠps://github.com/fulldecent/ system-bus-radio. Accessed: 2018-01-26.
[11] Robin Getz and Bob Moeckel. 1996. Understanding and eliminating EMI in Microcontroller Applications. National Semiconductor (1996).
[12] Mordechai Guri, Assaf Kachlon, Ofer Hasson, Gabi Kedma, Yisroel Mirsky, and Yuval Elovici. 2015. GSMem: Data Ex€filtration from Air-Gapped Computers over GSM Frequencies. In USENIX Security Symposium. 849–864.
[13] Brian Hay, MaŠ Bishop, and Kara Nance. 2009. Live analysis: Progress and challenges. IEEE Security & Privacy 7, 2 (2009).
[14] Paul Kocher, Joshua Ja‚e, and Benjamin Jun. 1999. Di‚fferential power analysis. In Advances in Cryptology (CRYPTO ‘99). Springer, 789–789.
[15] Paul Kocher, Joshua Ja‚e, Benjamin Jun, and Pankaj Rohatgi. 2011. Introduction to di‚fferential power analysis. Journal of Cryptographic Engineering 1, 1 (2011), 5–27.
[16] Liran Lerman, Gianluca Bontempi, and Olivier Markowitch. 2011. Side channel aŠttack: an approach based on machine learning. In Proceedings of 2nd International Workshop on Constructive Side-Channel Analysis and Security Design (COSADE). Schindler and Huss, 29–41.
[17] David Lillis, BreŠ Becker, Tadhg O’Sullivan, and Mark Scanlon. 2016. Current Challenges and Future Research Areas for Digital Forensic Investigation. In Šthe 11th ADFSL Conference on Digital Forensics, Security and Law (CDFSL 2016). ADFSL, Daytona Beach, FL, USA, 9–20.
[18] David Lillis, Frank Breitinger, and Mark Scanlon. 2018. Hierarchical Bloom Filter Trees for Approximate Matching. Journal of Digital Forensics, Security and Law 13, 1 (01 2018).
[19] Vincent Liu, Aaron Parks, Vamsi Talla, Shyamnath Gollakota, David Wetherall, and Joshua R Smith. 2013. Ambient backscaŠtter: wireless communication out of thin air. ACM SIGCOMM Computer Communication Review 43, 4 (2013), 39–50.
[20] W. Liu, K. Huang, X. Zhou, and S. Durrani. 2017. Full-Duplex Backscatter Interference Networks Based on Time-Hopping Spread Spectrum. IEEE Transactions on Wireless Communications 16, 7 (July 2017), 4361–4377. DOI: hŠp://dx.doi.org/10.1109/TWC.2017.2697864
[21] Aine MacDermoŠ, Œar Baker, and Qi Shi. 2018. IoT Forensics: Challenges For thŒe IoA Era. In New Technologies, Mobility and Security (NTMS), 2018 9th IFIP International Conference on. IEEE, 1–5.
[22] Alireza Nazari, Nader Sehatbakhsh, Monjur Alam, Alenka Zajic, and Milos Prvulovic. 2017. EDDIE: EM-Based Detection of Deviations in Program Execution. In Proceedings of the 44th Annual International Symposium on Computer Architecture. ACM, 333–346.
[23] Sheshidher Nyalamadugu, Naveen Soodini, Madhurima Maddela, Subramanian Nambi, and Stuart M Wentworth. 2004. Radio frequency identi€fication sensors. In ASEE Southeast Section Conference. 1–9.
[24] Romain Poussier, Vincent Grosso, and François-Xavier Standaert. 2015. Comparing approaches to rank estimation for side-channel security evaluations. In International Conference on Smart Card Research and Advanced Applications. Springer, 125–142.
[25] Jean-Jacques Qu‹isquater and David Samyde. 2001. Electromagnetic Analysis (EMA): Measures and counter-measures for smart cards. Smart Card Programming and Security (2001), 200–210.
[26] C. Ramsay and J. Lohuis. White Paper: TEMPEST aˆttacks against AES covertly stealing keys for 200 euros. Technical Report. Fox-IT, Netherlands. 10 pages. hŠttps://www.fox-it.com/nl/wp-content/uploads/sites/12/Tempest_aŠttacks_against_AES.pdf
[27] Hendra Saputra, Narayanan Vijaykrishnan, M Kandemir, Mary Jane Irwin, R Brooks, Soontae Kim, and Wei Zhang. 2003. Masking the energy behavior of DES encryption. In Proceedings of the conference on Design, Automation and Test in Europe-Volume 1. IEEE Computer Society, 10084.
[28] Mark Scanlon, Jason Farina, and M-Tahar Kechadi. 2015. Network Investigation Methodology for BitTorrent Sync: A Peer-to-Peer Based File Synchronisation Service. Computers & Security 54 (10 2015), 27 – 43. DOI:hŠp://dx.doi.org/10. 1016/j.cose.2015.05.003
[29] MaŠhias Schulz, Patrick Klapper, MaŠhias Hollick, Erik Tews, and Stefan Katzenbeisser. 2016. Trust the wire, they always told me!: On practical non-destructive wire-tap aŠttacks against Ethernet. In Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. ACM, 43–48.
[30] Somayeh Soltani and Seyed Amin Hosseini Seno. 2017. A survey on digital evidence collection and analysis. In 7th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE, 247–253.
[31] Barron Stone and Samuel Stone. 2016. Comparison of Radio Frequency Based Techniques for Device Discrimination and Operation Identi€cation. In 11th International Conference on Cyber Warfare and Security: ICCWS2016. Academic Conferences and Publishing Limited, 475.
[32] Walter HW TuŠlebee. 2003. So‡ftware de€fined radio: enabling technologies. John Wiley & Sons.
[33] Wim Van Eck. 1985. Electromagnetic radiation from video display units: An eavesdropping risk? Computers & Security 4, 4 (1985), 269–286.
[34] Eva A Vincze. 2016. Challenges in digital forensics. Police Practice and Research 17, 2 (2016), 183–194.
[35] Satohiro Wakabayashi, Seita Maruyama, Tatsuya Mori, Shigeki Goto, Masahiro Kinugawa, and Yu-ichi Hayashi. 2017. POSTER: Is Active Electromagnetic Sidechannel AŠttack Practical? In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2587–2589.
[36] Marc WiŠeman and Martijn Oostdijk. 2008. Secure application programming in the presence of side channel aŠttacks. In RSA Conference, Vol. 2008.
[37] Marc F WiŠeman, Jasper GJ van Woudenberg, and Federico Menarini. 2011. Defeating RSA Multiply-Always and Message Blinding Countermeasures. In Cryptographers€ Track at the RSA Conference (CT-RSA), Vol. 6558. Springer, 77– 88.
[38] Chouchang Jack Yang and Alanson P Sample. 2017. EM-Comm: Touch-based Communication via Modulated Electromagnetic Emissions. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017), 118.
[39] Pengyu Zhang, Dinesh Bharadia, Kiran Joshi, and Sachin KaŠi. 2016. Hitchhike: Practical backscaŠtter using commodity wifi€. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems. ACM, 259–271.

Download the paper here.

Using IMAP Internal Date for Forensic Email Authentication


by Arman Gungor

Internal Date is an IMAP Message Attribute that indicates the internal date and time of a message on an IMAP server. This is a different timestamp than the Origination Date field found in the message header and can be instrumental in authenticating email messages on an IMAP server.

Let’s start with an example. The perpetrator wants to fabricate an email message and make it look like he sent it back in December 2016 from his GoDaddy email account to the Yahoo! email account of his business partner.

He takes a genuine message between the parties from December 2017, edits the subject and the message body to his heart’s content and makes sure to pick a suitable date in December 2016.

He considers producing this forged email message directly. But then, wouldn’t it be more realistic if this message were on the email server? So, he opens Thunderbird where he has his GoDaddy account set up and drags the forged email into his “Sent Items” folder. Thunderbird thinks for a second or two, and voilà—the message is there!

The message looks as follows on GoDaddy’s webmail interface:

Figure 1 — Email message with altered date, as seen on GoDaddy webmail

As seen in the screenshot above, the message is listed under the “Sent Items” folder in the mailbox, and it has a date of Dec 3, 2016.

Proud of his accomplishment, the perpetrator invites the forensic examiner to review his mailbox and authenticate the message.

The forensic examiner acquires the message from GoDaddy over IMAP and finds that the raw message looks as follows:

Figure 2 — Email message with altered date; raw view

The User-Agent: Workspace Webmail 6.9.36 header field is consistent with an email sent via GoDaddy webmail. The basic email metadata mirrors what we see in GoDaddy’s webmail user interface. But the examiner caught a big break here: the Message-Id header field contains what appears to be a timestamp.

Message-Id: <20171203153135.4b5c8628937d16bc17cc44c9ad222e17.7ac1f537cc.wbe@email15.godaddy.com>

The Internet Message Format specification defines the Message-ID field as follows:

The “Message-ID:” field provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it (see below). This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one version of a particular message; subsequent revisions to the message each receive new message identifiers.

How the Message-Id field is populated is implementation-dependent. The host can generate the identifier in its own way, as long as uniqueness is guaranteed. In this case, it seems that GoDaddy prepended the timestamp 12/03/2017 15:31:35 to the Message-Id value. This is a full year after the timestamp we see in the message header, which warrants further review. The forensic examiner does her own independent research and testing to confirm whether or not emails sent via GoDaddy webmail have Message-Id values that start with a timestamp reflecting when the message was sent.
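The article includes no code, but the extraction step is easy to sketch. The snippet below (Python, with a hypothetical helper name; the 14-digit prefix is an observation about this particular value, not a documented GoDaddy format) pulls the apparent timestamp out of such a Message-Id:

```python
import re
from datetime import datetime

def message_id_timestamp(message_id):
    """Extract an apparent YYYYMMDDHHMMSS timestamp prefix from a
    Message-Id, if one is present; return None otherwise."""
    match = re.match(r"<(\d{14})\.", message_id)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")

msg_id = ("<20171203153135.4b5c8628937d16bc17cc44c9ad222e17"
          ".7ac1f537cc.wbe@email15.godaddy.com>")
print(message_id_timestamp(msg_id))  # 2017-12-03 15:31:35
```

A helper like this is only a first-pass filter; any hit still needs to be validated against known behavior of the originating mail system.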

It is important to note that not all Message-Id values contain timestamps. Here is one from Yahoo!:

Message-ID: <2103767838.439800.1512250767353@mail.yahoo.com>

So, let’s think about what else the forensic examiner can find to support her finding.

Internal Date Message Attribute

Another data point we can check on an IMAP server to verify a message’s timestamp is the IMAP Internal Date message attribute. This date is not part of the message itself; it is kept by the server and indicates the date and time of the message on the server, similar in a way to file system timestamps for files.

The IMAP 4 Specification indicates that when the APPEND command is used to add a message to a mailbox, the Internal Date Message Attribute should be set to the date/time argument supplied with the APPEND command, if present, or to the current date and time by default.

Luckily for the forensic examiner, when messages are added to a mailbox using most mainstream email clients, the APPEND command is issued without a date/time argument. So, in most cases, the Internal Date Message Attribute should reflect the date and time when the message was added to the mailbox.
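As a rough sketch of what this looks like at the protocol level, Python’s standard `imaplib` exposes the same mechanics: `Time2Internaldate()` formats the optional date/time argument to APPEND, and passing `None` for that argument is what most clients effectively do (the `append_without_date` helper name below is mine, not from the article):

```python
import imaplib
import time

# How an explicit date/time argument for APPEND would be formatted.
# The epoch value corresponds to a local time of 3 Dec 2016 21:39:27.
stamp = imaplib.Time2Internaldate(
    time.mktime((2016, 12, 3, 21, 39, 27, 0, 0, -1)))
print(stamp)  # e.g. '"03-Dec-2016 21:39:27 -0700"' (offset is local)

def append_without_date(imap, folder, raw_message):
    """What most mainstream clients effectively do: pass date_time=None,
    so the server stamps Internal Date with the current date and time."""
    imap.append(folder, None, None, raw_message)
```

In other words, a forger who wanted a plausible Internal Date would have to go out of his way to supply that argument; off-the-shelf clients simply don’t.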

In this case, when the examiner looks at the raw IMAP communication log that Forensic Email Collector provides, she sees the following line:

INFO Imap(2)[6] Response: * 1 FETCH (UID 8 RFC822.SIZE 576 FLAGS (\Seen) INTERNALDATE "21-Jun-2018 17:51:47 +0000" ENVELOPE ("Sat, 3 Dec 2016 21:39:27 -0700" "Test Message" ((NIL NIL "john.doe" "godaddy.com")) ((NIL NIL "john.doe" "godaddy.com")) ((NIL NIL "john.doe" "godaddy.com")) (("jane.doe@yahoo.com" NIL "jane.doe" "yahoo.com")) NIL NIL NIL "<20171203153135.4b5c8628937d16bc17cc44c9ad222e17.7ac1f537cc.wbe@email15.godaddy.com>"))

So, although the Origination Date found in the message header is 12/3/2016 21:39:27 (-07:00), the Internal Date kept by the server is 6/21/2018 17:51:47 (UTC). Based on the examiner’s findings so far, it looks like this message was originally sent on 12/3/2017 and then added to the mailbox on 6/21/2018. Let’s see if the examiner can find any additional evidence to support her findings.
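The discrepancy can be quantified directly from the two timestamps in that log line. A quick Python sketch, with the values copied from the FETCH response:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Timestamps taken from the FETCH response in the acquisition log.
internal_date = datetime.strptime("21-Jun-2018 17:51:47 +0000",
                                  "%d-%b-%Y %H:%M:%S %z")
origination_date = parsedate_to_datetime("Sat, 3 Dec 2016 21:39:27 -0700")

# The message claims to have been sent well over a year before it
# ever existed on the server:
print((internal_date - origination_date).days)  # 564
```

Both values are timezone-aware, so the subtraction is done in absolute time and the offset difference is handled correctly.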

Unique Identifier (UID) Message Attribute

The IMAP Protocol Specification defines the UID Message Attribute as a unique identifier that must not refer to any other message in the mail folder. Furthermore, UID values are assigned in a strictly ascending manner. Each message added to the mailbox is assigned a higher UID than the previous messages.

So, if our message with the fraudulent date was added to the mailbox after the fact, it should have a larger UID value compared to other genuine messages from the same time period.

In this case, checking the acquisition log confirms that the suspect message has, in fact, the largest UID value and sequence number in the “Sent Items” folder.

Because of this, it was also acquired as the last item in the “Sent Items” folder by Forensic Email Collector as acquisition takes place in ascending UID order. This corroborates the examiner’s finding that the suspect message was added to the mailbox on 6/21/2018 at 17:51:47 (UTC), which is just hours before the forensic examination.
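The ordering check the examiner performed can be sketched as a simple scan over (UID, header date) pairs. The UIDs and dates below are hypothetical stand-ins for an acquisition log, not data from the article:

```python
from email.utils import parsedate_to_datetime

# Hypothetical (UID, Date header) pairs from a "Sent Items" folder.
messages = [
    (3, "Thu, 1 Dec 2016 10:02:11 -0700"),
    (4, "Mon, 5 Dec 2016 08:15:40 -0700"),
    (8, "Sat, 3 Dec 2016 21:39:27 -0700"),  # the suspect message
]

dates = [(uid, parsedate_to_datetime(d)) for uid, d in messages]
# UIDs are assigned in ascending order, so a message is suspicious if
# any message with an earlier header date received a LATER (larger) UID;
# equivalently, flag uid when some smaller UID carries a later date:
suspects = [uid for uid, d in dates
            if any(u2 < uid and d2 > d for u2, d2 in dates)]
print(suspects)  # [8]
```

Run across a whole folder, a check like this surfaces exactly the kind of header-date/UID mismatch described above without reading every message by hand.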

Next Steps

The forensic examiner has found some interesting evidence from the email message itself and from the IMAP server. Two new dates have been identified: 12/03/2017, when the email message appears to have been sent based on its Message-Id, and 6/21/2018, when the email message appears to have been added to the IMAP mailbox.

The forensic examiner will now want to review the workstation where the suspect email message was created and sent, and look at a number of artifacts such as Shellbags, internet history, and LNK files and attempt to correlate the activity on the workstation to the information found on the suspect message and the IMAP server. With some luck, she might be able to determine which software the suspect used to download the email message, modify it and append it to the IMAP server. She might even find some evidence of the suspect Googling how to commit email forgery!

Conclusion

Forensic examiners review a long list of data points when authenticating email messages. In addition to the information one can find within the email message itself, email servers often contain metadata about a message that is stored outside of the message. Internal Date and Unique Identifier (UID) message attributes are two such data points that can be utilized to help confirm findings during forensic email investigations.

Additionally, when looking at an entire IMAP mailbox, forensic examiners can utilize the Internal Date and Unique Identifier (UID) message attributes to quickly identify and hone in on suspicious messages that have a significant discrepancy between their header dates and internal dates and sequence numbers.

About The Author

Arman Gungor, CCE, is a digital forensics and eDiscovery expert and the founder of Metaspike. He has over 21 years’ computer and technology experience and has been appointed by courts as a neutral computer forensics expert as well as a neutral eDiscovery consultant.

Techno Security & Digital Forensics 2018 – San Antonio September 17-19


From the 17th to the 19th of September 2018, Forensic Focus will be attending the Techno Security & Digital Forensics Conference in San Antonio, Texas, USA. If there are any topics you’d particularly like us to cover, or any speakers you think we should interview, please let us know in the comments.

Below is an overview of the subjects and speakers that will be featured at Techno Security. The conference has four tracks: audit / risk management; forensics; information security; and investigations, along with sponsor demos. Forensic Focus will be concentrating on the digital forensics track throughout the event.

Monday September 17th

The conference will begin at midday on the 17th of September, with specialists from SUMURI talking about APFS and considerations for forensic investigators. Alongside this, Jessica Hyde from Magnet Forensics will be discussing how the nature of communication is becoming increasingly interconnected, and taking a look at the implications of this for digital forensics.

GDPR is a hot topic internationally at the moment, so this will also be a subject of discussion on the first day, with John Wilson from Discovery Squared leading a session on what data processors need to know about GDPR requirements. Alongside this, Julie Lewis from Digital Mountain will show us how to acquire and analyse digital data from social media and smartphone apps.

Now that cloud storage is one of the most popular ways to save personal data, it’s becoming increasingly relevant to digital forensic investigations. Katherine Helenek from Digital Intelligence will talk us through trace analysis of cloud storage in the early afternoon, followed by product demos by Oxygen and Magnet Forensics.

John Priest from Cellebrite will spend some time in the afternoon discussing drones – “the threat from above” – and how these are being used in crimes around the world. Meanwhile Donald Malloy from OATH will be demonstrating how to secure smart devices, with a particular focus on new developments in the Internet of Things.

Several sessions will focus on computer security, including talks about the tenets of a robust information security program; the impact of resiliency on cyber security; and a look at how trade secrets are under attack and what to do about it. The final session of the day will look at the growing “digital data universe” – the sheer number of items forensic investigators have to examine nowadays – and how companies can adapt and thrive in this environment.

Tuesday September 18th

The day will begin at 8:00am with some investigations sessions, which are currently being developed. There will also be a talk by Abdul Hassan of the International Counter Terror Foundation looking at how to use social media in counter terror investigations. In the cyber security track alongside these talks, NIST’s framework will be discussed; and there will also be a conversation about recent advances and current controversies in facial biometrics.

Product demos from Magnet Forensics will be available throughout the morning, and CRU will be showcasing some of their new acquisition hardware and its abilities. SecureWorks will be talking through some of their case studies from 2018, with various different types of attacks under discussion.

Renata Spinks from Resilient Cyber Services will be talking about risk management in cyber security, alongside a demo from Oxygen Forensics looking at data acquired from drones. Meanwhile Rich Frawley from ADF Solutions will show some best practices in digital forensic acquisition, from getting a warrant to ensuring your paperwork is courtroom-ready.

What do you do about litigation in digital forensics? Perhaps it’s not something your company has given much thought to yet, especially if you’re new to the field. It’s an important part of running any company though, and when dealing with sensitive case data it can be fraught with complications. Gregory Braunton is going to spend some time on Tuesday morning talking through some of the pertinent points for consideration and how to make sure you’re up to speed.

Keith Leavitt from Cellebrite will be taking a look at how to identify and isolate malware on Android devices, while John Wilson from Discovery Squared will talk about Bitcoin and how it’s sometimes possible to uncover a cryptocurrency trail in an investigation.

Resuming the topic of large data sets and the interconnected nature of communication, Nick Drehel from AccessData will show attendees how to use QUIN-C to collaborate on investigations.

After lunch there will be product demos from Cellebrite, followed by Kimberly Calhoun from ILC looking at how artificial intelligence is being used in predictive policing analytics. Alongside this Jamie McQuaid from Magnet Forensics will talk about how mobile forensic investigations have changed over the past few years, and what developments we can expect in the future.

The mid-afternoon sessions will begin with a talk about the hidden recesses of the dark web and its marketplaces, alongside a demonstration by Amber Schroader of Paraben showing us how to process smartphones and the data we should find when we do. The final sessions of the day will focus on damaged device forensics and machine learning.

Wednesday September 19th

Jeff Shackleford from PassMark will kick off the final day of the conference with a talk about using Windows PowerShell and the command prompt as investigative tools. Kim-Kwang Raymond Choo will be looking at how to bridge the gap between research and practice when bringing new investigators into the field. ‘Securing the Digital Homeland’ will be another topic of discussion during the morning sessions, followed by a look at Alexa and similar devices from a security perspective.

Auditing big data systems is next on the list for the cybersecurity stream, while over in digital forensics we will be looking at how to use open source tools to get past encryption. Mark Spencer from Arsenal Consulting will share the same information he presented at Techno Security Myrtle Beach earlier this year, looking at high stakes evidence tampering and the failure of digital forensics. If you didn’t catch that session at Myrtle Beach, it’s certainly recommended!

At 10.30am Jason Hale from One Source Discovery will take a look at the current state of USB device forensics and how to improve it. Meanwhile Vico Marziale from BlackBag will show us how Mac’s Spotlight feature can be of use in investigations.

Triage and backlogs are always huge topics for investigators, and automating as much of the process as possible can be a big help. That’s what Andrew von Ramin Mapp from Data Analyzers will be focusing on directly after lunch, alongside another session about GDPR implications for the industry, and a look at how organisations are securing Microsoft cloud data on their devices.

Chuck Easttom will be showing us how to conduct dark web investigations in the afternoon, followed by Mark Hallman from SANS demonstrating how to filter Plaso data for use in investigations. The final session of the day will focus on how to use G Suite reports in digital forensic cases.

To view the full conference program and register to attend, please visit the official website. Forensic Focus readers can enjoy a 30% discount on the registration price by entering the code FFOCUSTX18 when booking. 

If there are any talks you would specifically like us to cover, or any speakers you’d especially like to see interviewed, please leave a comment below or email scar@forensicfocus.com.

Giving Back In DFIR


by Jessica Hyde, Magnet Forensics

A few months back I was on my way to BSides NoVa, having a conversation with someone competing in the CTF about where his team would donate the prize money to if they won. I suggested some organizations related to helping young people learn about Information Security. A few hours later, I was relaying the story to a friend and she mentioned that she wasn’t aware of many of the groups that I was referencing. At that point, I realized that information needed to be shared.

A few months later I was at BSidesRoc and heard an incredible keynote by Matt Mitchell, Practical Security: Real World Lessons. In this presentation, Matt talked about a gamut of ways that Information Security professionals could use their skills to help others. He spoke about work that he and other hackers do that has meaning in different ways. I was inspired and started looking for ways that we can use our skills as Digital Forensic professionals to give back.

DFIR Hierarchy of Needs

This thought, about using our skills for good, kept brewing. As I contemplated how this began to feel like a need, I realized that there was almost a Maslow’s Hierarchy of Needs for Digital Forensics. The more I thought about it, the more sense it made, and I realized I had seen other examiners mention a similar momentum through the pyramid of how they use their DFIR skills — and that the top of the pyramid was Giving Back in the DFIR community. The pyramid, as I described last week on the Cyber.Now podcast, has four layers: Fundamentals and Training; Independent Casework and Continued Education; Sharing Information with the Community; and Giving Back.

The fundamentals and training that we all need to become digital forensic/incident response professionals form the base of the pyramid. One of the important realizations about this pyramid is that even when the lower level needs are met, those needs continue. As forensic professionals, it is imperative to continue our training throughout our careers and be cognizant that as new platforms, devices, operating systems, file systems, applications, etc. come into our space, we need to ensure that we continue to build on learning those fundamentals. If you are looking to cover the fundamentals and training necessary to begin your DFIR career, I recommend checking out resources like DFIR Training and About DFIR or taking a training course like AX100 Forensic Fundamentals.

From the fundamentals and training, we can progress to independent casework and continued education. This is the area where we are rewarded by learning new things through analysis and learning to master individual skills. We get satisfaction from knowing that we solved the puzzle presented in the case. You may have discovered a new artifact, used research you found from others, gotten data from a phone using advanced means, or completed complex analysis to feel that sense of success. You may also find great satisfaction not only in the technical challenges of the work that you overcome daily, but in the mission you serve, be it exonerating the innocent, finding evidence that helps a victim, or finding the information to stop the bleeding in an intrusion and protect an important asset.

Once that competency is there, or even as the examiner is gaining it, the next level of the pyramid is sharing info with the DFIR community. Harlan Carvey examines this layer of the pyramid in his “Beyond Getting Started” blog post. Brett Shavers expanded on these thoughts in his “Sharing is Caring” blog post, which I encourage you to read. There are a multitude of ways that an examiner can share, including sharing scripts, artifact information, teaching, responding to community questions, mentoring, podcasts, presentations, forensic challenges, creating test data, researching, and writing. Writing can include blogging, peer review, academic journals, and books. I detailed these thoughts and examples in a blog post late last year, “The Importance of Sharing in DFIR”.

Sharing itself is a way to give back to the community. But what about that top layer of the pyramid? What are the ways we can use our skills to give back beyond the traditional sharing of DFIR knowledge? What are ways that we can get even larger outreach and find new ways to share? After being inspired by Matt Mitchell’s keynote at BSidesRoc, I started keeping a list of ideas and organizations and I want to share those with you now. There are so many great people finding ways to give back to the broader community using their DFIR skills.

Ways to Give Back

As we take a look at ways to give back using our DFIR skills or to the community, I will be introducing example organizations and groups. As a disclaimer, inclusion in this blog does not imply endorsement by Magnet Forensics. Additionally, this post is in no way an exhaustive list of organizations or groups in each of the areas mentioned; they are simply given as examples of groups in the area. I encourage everyone to carefully research and consider any organization they choose to assist in any way and ensure it aligns with their interests, ethics, and ideals.

Giving back can take a variety of formats. It could be sharing your skills, volunteering your time, or making a donation. As different methods of sharing are discussed and examples provided, I encourage you to figure out what works right for you. Many of these ideas can be done on your own or through groups that already exist. Many times, an existing concept can be modeled in your community, and you may want to look to organizations as a source of information and knowledge, or to donate your time, money, or skills.

Teaching about Digital Safety and Security

Lots of groups would benefit from learning more about digital safety and security. Sharing the knowledge you have can be invaluable to marginalized or vulnerable groups. This could involve anything from teaching young teens about Internet safety by speaking at a school or camp, volunteering to explain phishing scams to the elderly, or helping victims of abuse avoid being targeted by their abusers via their mobile devices, to sharing information security tips in your community. In my previous neighborhood, for example, I volunteered to speak to a camp for teenage girls about Social Media Safety and Responsibility. We discussed a variety of topics, they asked great questions, and we all learned together. There is a great article on Wired about how Eva Galperin is helping victims of domestic violence. Or maybe you can consider throwing a Cryptoparty like Matt Mitchell does to share information in your local community.

Help Others Learn About or Get a Start in DFIR

There are a variety of ways to help people learn about the field and get started. One of the great ways is to mentor others. If you are seeking out young people to mentor, consider an underrepresented person who may benefit from your experience and knowledge. Another idea is to participate in a resume review session like the ones Lesley Carhart often hosts at InfoSec conferences. There is also a large list of organizations geared towards introducing young people and others to digital forensics, information security, and Science Technology Engineering and Math (STEM) as a whole. Some of these organizations are listed below and include links to their pages:

One example of a group that focuses on introducing digital forensics concepts to underrepresented youth is the Cyber Sleuth Science Lab, which just completed a week-long digital forensics camp for 80 high-school-age students in Baltimore, MD. In talking to a friend, Richie Cyrus, about possibly volunteering for a Cyber Sleuth Science Lab, he said something that stuck with me: his mother had taught him that it is our responsibility to “send the elevator back down.” I really liked that.

Scholastic Competitions

Another way to help young people who are interested in DFIR or the greater Information Security field is to volunteer to assist with the wide array of scholastic competitions that exist. David Cowen writes about his experience volunteering with the Collegiate Cyber Defense Competition and explains a bit about it in his blog post on his Hacking Exposed Computer Forensics Blog. It is a good example of what one of these events is like. Here is a list of other scholastic competitions.

Organizations / Conferences / Workshops Geared Toward Women

There are a variety of groups dedicated to inspiring those who identify as women to grow and develop in this field, and these organizations often offer a variety of ways to participate. This is just a small sample of groups in this category. In fact, this post is being released while I am taking part in a DFIR Women’s Lunch, held in conjunction with the DFRWS conference, where we are discussing ways to give back to the DFIR community. There are groups like:

Scholarships

There are also a variety of scholarships available. Many of these scholarships are to bring underrepresented groups to training or educational events. Please consider sharing information about these scholarships with people who may be interested in them.

Summary

I hope that, just as I was inspired to think of ways to give back to the greater good with my DFIR skills, I have provided those of you looking to give back with some ideas for doing so. There are a multitude of ways you can help others. Giving back allows us all to serve the greater good and ensure that we leave everything a bit better than we found it. You are also creating positive experiences for others to associate with our profession, which may expose more people to the field or inspire future forensic examiners.

Do you have other ideas for contributing to the greater good using your skills or by inspiring others? If so, please feel free to share. I look forward to hearing how people are helping others.

Questions or comments? More ways to share? Reach out to Jessica at: Jessica.Hyde@magnetforensics.com

This article was originally posted on Magnet Forensics’ blog. Magnet is a global leader in digital forensic technology with solutions being used in cases ranging from child protection to counter terror and everything in between. Find out more here.


Reducing The Mental Stress Of Investigators

by Eric Oldenburg, Griffeye 

We recently met up with Eric Oldenburg, Griffeye’s new Law Enforcement Liaison in North America, and heard about his new role. Here, he explains how reducing the mental stress of investigators is a driving force for him, one that led him to work for Griffeye, and why the mental health of investigators is a subject we must talk about more.


In 2001 I started working with child sexual abuse crimes at the Internet Crimes Against Children Taskforce (ICAC). Looking back, I can’t think of any other work that is more fulfilling, but before taking the job I had no idea how mentally difficult it would be. At that time, I don’t believe anyone had a great understanding of the trauma this material causes the investigators and examiners who have to view it. But when you break it down, and you really think about it, you are watching children being abused, which is a terrible and unnatural thing. So as a result, it comes with a lot of problems.

When stress starts to take its toll

Speaking for myself, I started to feel mentally stressed after about four years. I often came home from work mad and I didn’t know why. My home life with my family suffered and my marriage was under a lot of stress – to the point where I almost got divorced. I also had physical issues. But thanks to my wife, I realized I needed a break from the job.

The plan was to take just a one-year break and then go back as a computer forensics examiner, but it ended up being four years. The reason I eventually went back was that a lot of other great examiners had started to leave the ICAC unit for the same reason. It wasn’t just me – and I realized that this is a huge problem. So this time, my mission was to find alternate workflows and ways to minimize the exposure. This is when I first came into contact with Griffeye.

A man called Dave

One of the people who left as a result of mental stress was Dave. I often tell people about him because he was one of the best examiners I have ever met, but three years into it we lost him. One day he just raised his hand and said, “I can’t do this, it’s too much”.

My belief is that the better you are at your job as an examiner, the harder you work, and the quicker you get to the point where you can’t do it anymore. And for Dave, it got so far that the videos were playing in his head 24 hours a day, seven days a week. He just couldn’t get them out of his head.

He also ended up getting divorced because of it, and that is a unique problem. Usually, your partner is your support staff – the one you lean on. But when you are exposed to child sexual abuse every day, you don’t want to put that on your significant other. And also, you don’t want to talk about it because you don’t want to go through it again. So, that means you have nowhere to put it – there’s nowhere for this pain to go. And because of that, great investigators like Dave quit.

When I asked Dave about permission to tell his story, his answer was “Please, tell everybody. Tell everybody you can that it happened to me. I am a cautionary tale.” So, I tell his story as often as I can because it gets to the core of why the mental health of CSA investigators needs more resources and why solutions like Griffeye are so important. We need to ensure that great people keep doing great work to protect children.

The need to reduce exposure

Unfortunately, it takes around two to three years for most examiners to learn everything and get really good at the job – almost the same time frame in which exposure fatigue sets in. It takes huge investments of time and experience to create really good examiners – and when they finally get there, they are so sick of the job that they leave. So we lose both good people and great skills. That is why we must use solutions that extend the time these examiners stay in the game. In my opinion, the only way that is going to happen is to reduce the exposure.

I often say that people have an expiration date, and it differs from person to person. For one person it might be a couple of months before they can’t handle it anymore. For someone else, it’s ten years; and some people are just not suited for the job at all. But if we can reduce the exposure we have come a long way.

I liken it to a boxing match. If the boxer keeps the guard up and doesn’t get punched as much, he can stay all ten rounds in a fight. But if he constantly gets hit over and over and over again, he goes down. So, the mission is to keep the “fighters” in the fight longer, hopefully the whole ten rounds – and that is why I am so happy about working for Griffeye. The core of Griffeye is to help investigators in their job. And also, I am happy to see that the mental health of investigators is something that more and more people have come to recognize.

Find out more about Eric’s role at Griffeye and how he makes use of Griffeye Analyze to reduce exposure.

Have Your Say In The House Of Lords’ Select Committee On Science And Technology

Controversy has been raging around ISO 17025 ever since the standard was adopted for digital forensics back in October 2017. Although many people who work in the industry agree that standardisation is advisable and probably necessary if we are to keep moving forward, there have been many criticisms of ISO 17025 and its effectiveness when it comes to digital forensics.

The root of the problem seems to be that ISO 17025 was not specifically designed for digital forensics; instead, it takes the standards of ‘wet’ or traditional forensics and applies them to computing devices. This raises a number of issues, not least the fact that technology is constantly advancing; in a field where most large apps are updated at least a couple of times per month, it becomes very difficult to properly standardise tools and methodologies.

Another concern for many people is the cost associated with accrediting a lab and keeping up with ISO 17025. Reports of accreditation costing in excess of £50,000 have made some practitioners nervous about applying.

If you want your opinion on ISO 17025 to be heard by the people who make the decisions, now’s your chance. The House of Lords’ Science and Technology Select Committee is conducting an enquiry into forensic science and inviting individuals and companies to submit evidence for consideration.

In total there are seventeen questions making up the enquiry, three of which are specifically focused on digital forensics:

  1. Are there gaps in the current evidence base for digital evidence detection, recovery, integrity, storage and interpretation?
  2. Is enough being done to prepare for the increasing role that digital forensics will have in the future?
  3. Does the Criminal Justice System have the capacity to deal with the increased evidence load that digital forensics generates?

The current enquiry springs from the 2015 Government Chief Science Advisor’s Report, Forensic Science And Beyond, which included a section on ‘the domain of cyberspace’. The report discussed questions such as the global nature of cybercrime; the proliferation of devices and data; and a shortage of skills in the field. These are all questions that need to be addressed, and in our own recent survey Forensic Focus’ readers on the whole agreed that standardisation would be a positive step for the industry.

When asked whether their organisations were planning to attain ISO 17025 accreditation, the majority of respondents to our survey said ‘No’.

However, when asked about the necessity of a standard for digital forensics in general, many people replied more positively.

Almost 62% of respondents agreed that some means of standardisation is necessary for the community. However, people were less likely to advocate for ISO 17025 specifically, with just 23% of people agreeing that it would be good for digital forensics. Just under 23% of respondents were on the fence about whether ISO 17025 could be used to cover all necessary aspects of digital forensics standardisation, but only 6.67% of people thought it could.

So if we agree that standardisation would be helpful, why isn’t the digital forensics community embracing ISO 17025 with open arms? Forensic Focus asked respondents to our recent survey for their thoughts on the standard; here are some of the replies.

“[ISO 17025 accreditation is] too expensive, the money could be spent on training.”

The financial shortcomings presented a popular reason for people’s reluctance to put themselves forward for accreditation. Running a forensics lab is an expensive endeavour at the best of times, whether you’re creating your own tools or using other people’s. There will always be a new development needed, or training on a new product, or a new operating system that suddenly plunges everything into the dark again.

Several people spoke about their frustration at the onus being on the labs themselves to demonstrate digital forensics tools’ effectiveness, rather than on vendors:

“[ISO 17025 is] inappropriate for the UK LE digital forensics community. Standardisation is a good thing, so why are different forces receiving different advice and assessments? Why does each force have to validate their tool use, why aren’t the tool vendors being assessed directly?”

Conversations both online and offline recently have veered towards this topic, often with people expressing confusion about what they are even meant to do in order to become accredited.

“It’s an incredibly time consuming & expensive process – there’s no central governance to go [to] for help, or to share best practice. Everyone seems to be going it alone.”

Respondents also expressed concern that the standard was focusing on the wrong things, and meant that people were spending less time on the jobs at hand and more time on the bureaucracy required to keep up the standard, thus arguably having the opposite effect from its goal.

“ISO 17025 should have been driven from the centre and should not force each organisation to spend considerable time and effort to get to a place where it is obvious people need to be employed simply to [be] administrators and checkers. At the moment valuable time is spent not processing case work but checking others’ work or following a tick box regime rather than empowering people to think for themselves, solving problems in a logical way appropriate to the investigation in hand.”

Some are worried that larger corporations will see it as a money-making exercise rather than a way to ensure consistent standards across the industry, and that this may have a negative effect on digital forensics as a whole. The amount of time it takes to adhere to ISO 17025 is another frequently cited challenge.

“It is being massively interpreted across the public sector. It is supposed to set standards; however, to reach those new standards, inconsistent procedures are being put into place. ISO is seen by many as purely a money-making exercise and is not respected by a lot of colleagues. Where law enforcement is concerned, it has massively increased the time taken to examine an exhibit, with little or no benefit in return.”

Other respondents were skeptical about the usefulness of standardisation on the whole:

“It is liable to create too much emphasis on having the accreditation, which organisations are spending an obsessive amount of time on, in turn neglecting the core role of doing digital forensics. As long as protocols are adhered to within the law of the land then that should be sufficient. The evidence test in a courtroom will NOT be whether you have the ISO standard! A DFI whose organisation has ISO will likely achieve the same/similar results to a DFI who does not have ISO.”

The general feeling among the community seems to be that standardisation on the whole is a good idea, but that ISO 17025 might not be the right way to go about it. If you’d like to have your say and make your views heard, you can find more details about how to do so in this PDF or at Parliament.uk. Responses need to be no longer than six sides of A4 paper, and they must be submitted to the Committee by the 14th of September 2018.

Drone Forensics Gets A Boost With New Data On NIST Website

by Richard Press, NIST

Aerial drones might someday deliver online purchases to your home. But in some prisons, drone delivery is already a thing. Drones have been spotted flying drugs, cell phones and other contraband over prison walls, and in several cases, drug traffickers have used drones to ferry narcotics across the border.

If those drones are captured, investigators will try to extract data from them that might point to a suspect. But there are many types of drones, each with its own quirks, and that can make data extraction tricky. It would help if investigators could instantly conjure another drone of the same type to practice on first, and while that may not be possible, they can now do the next best thing: download a “forensic image” of that type of drone.

A forensic image is a complete data extraction from a digital device, and NIST maintains a repository of images made from personal computers, mobile phones, tablets, hard drives and other storage media. The images in NIST’s Computer Forensic Reference Datasets, or CFReDS, contain simulated digital evidence and are available to download for free. Recently, NIST opened a new section of CFReDS dedicated to drones, where forensic experts can find images of 14 popular makes and models, a number that is expected to grow to 30 by December 2018.

Kaitlyn Fox, a laboratory assistant at VTO Labs, inspects an aerial drone while VTO chief technology officer Steve Watson reviews data from the drone. Photo courtesy of VTO Labs.

“The drone images will allow investigators to do a dry run before working on high-profile cases,” said Barbara Guttman, manager of digital forensic research at NIST. “You don’t want to practice on evidence.”

The drone images were created by VTO Labs, a Colorado-based digital forensics and cybersecurity firm. NIST added the images to CFReDS because that website is well-known within the digital forensics community. “Listing the drone images there is the fastest way to get them out to experts in the field,” Guttman said.

Work on the drone images began in May of last year, when VTO Labs received a contract from the Department of Homeland Security’s (DHS) Science and Technology Directorate.

Aerial drones at the VTO Labs field research station in Colorado. Photo courtesy of VTO Labs.

“When we proposed this project, there was little existing research in this space,” said Steve Watson, chief technology officer at VTO. The drone research was needed not only to combat drug smuggling, but also to allow officials to respond more quickly should a drone ever be used as a weapon inside the United States.

For each make and model of drone he studied for this DHS-funded project, Watson purchased three and flew them until they accumulated a baseline of data. He then extracted data from one while leaving it intact. He disassembled a second and extracted data from its circuit board and onboard cameras. With the third, he removed all the chips and extracted data from them directly. He also disassembled and extracted data from the pilot controls and other remotely connected devices.

“The forensic images contain all the 1s and 0s we recovered from each model,” Watson said. The images were created using industry standard data formats so that investigators can connect to them using forensic software tools and inspect their contents. The images for each model also come with step-by-step, photo-illustrated teardown instructions.

Watson was able to retrieve serial numbers, flight paths, launch and landing locations, photos and videos. On one model, he found a database that stores a user’s credit card information.

Investigators can use the images to practice recovering data, including deleted files. Universities and forensic labs can use them for training, proficiency testing and research. And application developers can use the images to test their software. “If you’re writing tools for drone forensics, you need a lot of drones to test them on,” Guttman said.

A description of the drone images and instructions for accessing them are available on the new drones section of the CFReDS website.

This article was originally published on NIST.gov.

ICDF2C 2018 – New Orleans September 10-12

From the 10th to the 12th of September 2018, Forensic Focus will be attending ICDF2C in New Orleans. Below is an overview of the subjects and speakers that will be featured at the conference. If there are any topics you’d particularly like us to cover, or any speakers you think we should interview, please let us know in the comments.

Monday September 10th

Following the general welcome and a keynote by Dr. Deborah Frincke from the NSA, the first full session will be on carving and data hiding. After lunch two workshops will be run by security experts Riscure, looking at how to extract secrets from encrypted devices.

In the evening there will be a gala dinner where attendees can network, discuss the talks that took place throughout the day, and get to know a little about the area.

Tuesday September 11th

The second day of the conference will be packed with sessions. The opening keynote will be given by Golden G. Richard III, and then after the coffee break researchers from the University of New Haven will look at cryptowallet application analysis. They will then demonstrate AndroParse, which extracts data from Android devices.

After lunch there will be a short sponsor talk by Atola Tech, and then a group from Old Dominion University will show us a hybrid intrusion detection system for worm attacks. Vikram Harichandran from MITRE will demo CASE, a new standard for improving digital forensic investigations, and then there will be a coffee break and poster session before we reconvene for the final talks of the afternoon.

Session III will be devoted to forensic readiness and will begin with researchers from the University of Pretoria showing us their new framework for ransomware investigations. After this a group from Estonia and Spain will look at how to forensically analyse data from an online game on the Steam platform. The final session will see Jieun Dukko and Michael Shin from Texas Tech University showing their digital forensic investigation and verification model for industrial espionage.

Wednesday September 12th

Day three will begin with the team from University College Dublin giving an overview of solid state drive forensics: where we currently stand, and where to go next. Neil Rowe of the US Naval Postgraduate School will then show us how to associate drives based on their artifact and metadata distributions.

After the coffee break, researchers from the Air Force Institute of Technology will look at reconstructing digital forensic event graphs. The final session of the conference will see a group of researchers from Nanjing University and the University of Texas looking at multi-item passphrases and how to protect oneself against offline guessing attacks.

Following lunch there may be a sightseeing tour depending on interest.

To view the full conference program and register to attend, please visit the official website.

If there are any talks you would specifically like us to cover, or any speakers you’d especially like to see interviewed, please leave a comment below or email scar@forensicfocus.com.

Database of Software “Fingerprints” Expands to Include Computer Games

by Richard Press, NIST

One of the largest software libraries in the world just grew larger. The National Software Reference Library (NSRL), which archives copies of the world’s most widely installed software titles, has expanded to include computer game software from three popular PC gaming distribution platforms—Steam, Origin and Blizzard.

The NSRL, which is maintained by computer scientists at the National Institute of Standards and Technology (NIST), allows cybersecurity and forensics experts to keep track of the immense and ever-growing volume of software on the world’s computers, mobile phones and other digital devices. It is the largest publicly known collection of its kind in the world.

To people who work in cybersecurity and digital forensics, the world is a vast and ever-rising ocean of digital objects. NIST’s Reference Data Set—a list of more than 40 million hashes, or digital “fingerprints” of known software files—helps them quickly find what they’re looking for.
Credit: K. Irvine/NIST

The NSRL does not loan out the software in its collection. However, NIST runs every file in the NSRL through an algorithm that generates a digital “fingerprint”—a 60-character string of letters and numbers, also known as a hash, that uniquely identifies that file. Every quarter, NIST releases an updated list of hashes to the public. The list, which NIST calls the Reference Data Set, or RDS, can be freely downloaded from the agency’s website. The latest RDS contains more than 40 million hashes, including those for the recently added video game files.

To people who work in the fields of cybersecurity and digital forensics, the world is a vast and ever-rising ocean of digital objects. The RDS allows them to navigate that ocean and quickly find what they’re looking for.

Many crimes today involve some form of digital evidence, and the NSRL helps investigators to process that evidence more quickly. If investigators have a seized hard drive or mobile phone, for instance, they can quickly hash all the files on that device, then compare that hash list to NIST’s RDS. All the files that match can typically be ignored because they are known software files that wouldn’t contain information relevant to the investigation.

“After they filter out all of the known files, they’re left with everything that’s not recognized,” said Doug White, the NIST computer scientist who runs the NSRL. “Those are the files that might be interesting.”
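The filtering workflow described above can be sketched in a few lines of Python. This is a simplified illustration, not the NSRL’s own tooling: the hash set here is a stand-in (in practice it would be loaded from the downloaded RDS), and SHA-1 is used as the hash type.

```python
import hashlib
from pathlib import Path

# Stand-in for the known-file list; in practice these hashes
# would be loaded from the downloaded NSRL RDS.
KNOWN_HASHES = {
    "da39a3ee5e6b4b0d3255bfef95601890afd80709",  # SHA-1 of an empty file
}

def sha1_of(path):
    """Hash a file in chunks so large evidence files don't exhaust memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def unknown_files(root):
    """Yield only the files whose hashes are NOT in the known set --
    the files that 'might be interesting'."""
    for p in Path(root).rglob("*"):
        if p.is_file() and sha1_of(p) not in KNOWN_HASHES:
            yield p
```

Running `unknown_files` over a mounted image discards every file that matches the known list and leaves the examiner with only the unrecognized remainder.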

Digital forensic investigators at all levels of government and in private industry rely on the RDS to efficiently manage their caseload.

The NSRL contains operating system software, office software, media players, device drivers—all types of software files that are commonly installed on personal computers. In 2016, the NSRL expanded to include hundreds of thousands of mobile apps, which extended its usefulness to mobile phones.

The recent addition of gaming software to the NSRL reflects the growing popularity of that software category. “We’re not watching what gamers are doing,” White said. “But we need to include gaming software in the NSRL if we want to stay relevant.”

Among the video game titles added to the NSRL are “PlayerUnknown’s Battlegrounds,” “World of Warcraft” and “Mass Effect.”

“These games are insanely popular,” said Eric Trapnell, a NIST computer scientist who helped curate the collection and is a gamer in his spare time. “Some of them have install bases in the millions.”

Many of the titles were donated to the NSRL by Valve Software, which owns the Steam platform; Electronic Arts, which owns Origin; and Activision Blizzard, which owns Blizzard. Other titles were purchased if their install base was large enough to justify the expense. All titles in the NSRL are properly licensed and acquired.

While the NSRL exists primarily to support cybersecurity and law enforcement efforts, it is also considered a repository of culturally significant digital artifacts. While important books, films and audio recordings are preserved at the Library of Congress, the NSRL functions as a national software archive. Historians consider this important because most of modern culture is both produced and consumed using software.

“Think of all the PowerPoints and Word documents that have tremendous historical significance,” said Trevor Owens, head of Digital Content Management at the Library of Congress. He might have added digital artworks, maps and interactive media. “Those documents might be lost, if future historians don’t have access to a comprehensive collection of software.”

An earlier batch of video games was added to the NSRL two years ago, including first editions of “Mario Bros.,” “Asteroids” and “Sim City,” preserving these retro titles and associated artwork for posterity.

While law enforcement professionals and digital culture geeks might seem strange bedfellows, White says he’s not surprised by their shared interest in the software library. “We preserve the software and make the RDS available to the public,” White said. “The more people who find that useful, the better.”

This article was originally published on NIST.gov.

Word Forensic Analysis And Compound File Binary Format

by Arman Gungor

Microsoft Word forensic analysis is something digital forensic investigators do quite often for document authentication. Because of the great popularity of Microsoft Office, many important business documents such as contracts and memoranda are created using Word. When things go south, some of these documents become key evidence and subject to forensic authentication.

My goal in this article is to review a sample Word document in Word Binary File Format, take a look at the underlying data in Compound File Binary (CFB) file format and see what we can find out beyond what mainstream tools show us.

I chose a sample in Word Binary Format (i.e., .doc) rather than in Word Extensions to the Office Open XML File Format (i.e., .docx) because many other file types in the Microsoft universe, such as MSG files, are also based on the CFB file format. I consider CFB to be a treasure trove of forensic artifacts.

Target Document for Word Forensic Analysis

Our target Word document was created on 8/30/2018 8:19 PM (PDT) using Word 2007 on a computer running Windows 7 SP1. It was saved as a DOC file using the “Word 97-2003 Document” option in Word’s file save dialog. While installing Office 2007, the suspect had chosen “Chris Doe” and “CD” as his “User name” and “Initials” respectively. These preferences are shown in Word options as follows:

Manipulation by the Suspect

It is important to the suspect that this Word document appears to have been created in February 2007. He is somewhat tech-savvy and identifies the FILETIME structures in the Summary Information stream of the document. The creation date (GKPIDSI_CREATE_DTM), last save date (GKPIDSI_LASTSAVE_DTM), and last printed date (GKPIDSI_LASTPRINTED) timestamps look as follows:

Internal Timestamps in The Summary Information Stream Found during Word Forensic Examination

The suspect uses an online date converter and converts the date February 8, 2007 16:15:19 UTC to FILETIME format and arrives at the value “80456D539C4BC701”. He makes a working copy of the file using Windows Explorer, and then replaces the bytes above for the creation timestamp with his new FILETIME value.
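The conversion the suspect performed with an online tool is easy to reproduce. A minimal Python sketch, relying on FILETIME being a little-endian 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC):

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def to_filetime_bytes(dt):
    """Encode a datetime as the 8 little-endian FILETIME bytes
    (100-nanosecond ticks since 1601-01-01 UTC)."""
    # Whole microseconds since the epoch, times 10 -> 100 ns ticks.
    ticks = (dt - FILETIME_EPOCH) // timedelta(microseconds=1) * 10
    return struct.pack("<Q", ticks)

dt = datetime(2007, 2, 8, 16, 15, 19, tzinfo=timezone.utc)
print(to_filetime_bytes(dt).hex().upper())  # 80456D539C4BC701
```

The output matches the byte sequence the suspect patched into the Summary Information stream.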

The suspect checks the internal metadata of the Word document using freely available tools such as olefile and ExifTool, and sees that the creation date internal timestamp is reported as he expected. The following is the output from olefile (emphasis added):

The suspect then fires up Word to see if his edits are recognized by Word as he intended. Word shows the properties of the file as follows:

To his surprise, the original creation date of 8/30/2018 8:19 PM (UTC -7) is still there! Running a search for the byte sequence for that FILETIME value (006A3A5CD940D401) returns no results. So, where is this date coming from?

He also notices a discrepancy between how Word counts the number of words in the document (538), and how ExifTool and olefile count them (539). He is not too concerned about this from a Word forensic authentication perspective.

Perplexed by the mysterious creation date, the suspect goes back to the drawing board, does more research and learns that Word documents also contain a Dop structure, which stores their creation (dttmCreated), last modification (dttmRevised), and last print (dttmLastPrint) dates as DTTM structures.

The DTTM structure is quite different from a FILETIME structure. It looks as follows:

* Day of the week is an unsigned integer starting with Sunday (0x0) and ending with Saturday (0x6).

The suspect finally finds the DTTM structure that represents the creation date of the document (dttmCreated). It looks as follows:

Word dttmcreated Value in Dop Found during Word Forensic Analysis

He then converts his desired creation date (Thursday, February 8, 2007 08:15 AM (UTC -8)) to DTTM as follows (note that the DTTM structure does not contain any data for seconds, nor does it contain time zone information):

This results in a DTTM value of 0F42 B286. Once the byte sequence is replaced, Word shows the internal creation timestamp as follows:

Word document metadata after manipulation
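The DTTM packing above can be reproduced with a short Python sketch. The bit layout follows the MS-DOC specification: minutes, hours, day of month, month, year-minus-1900 and weekday, packed from least to most significant bits.

```python
import struct

def dttm_bytes(minute, hour, day, month, year, weekday):
    """Pack a date into the 4 little-endian DTTM bytes.
    Fields, least to most significant: mint(6) hr(5) dom(5)
    mon(4) yr(9, years since 1900) wdy(3, Sunday = 0)."""
    value = (minute | hour << 6 | day << 11 | month << 16
             | (year - 1900) << 20 | weekday << 29)
    return struct.pack("<I", value)

# Thursday, February 8, 2007, 08:15 (weekday 4, counting from Sunday = 0)
print(dttm_bytes(15, 8, 8, 2, 2007, 4).hex().upper())  # 0F42B286
```

Note that, as the article points out, there is nowhere to put seconds or a time zone offset: the structure simply has no bits for them.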

Pleased with his accomplishment, the suspect emails the manipulated document and calls it a “native ESI production”. This way, he thinks, he won’t have to worry about inconsistencies in the file system timestamps. Although, with some more effort, he is confident that he could doctor them, too.

Forensic Authentication of the Word Document

The forensic examiner receives a copy of the email containing the manipulated Word document for forensic authentication. The email is in MSG format, exported from the mailbox of the attorney who hired her. This is not ideal, but it is the best available copy she has access to at that moment.

After making a preservation copy, she starts by examining the attachments table in the MSG file. The IAttach interface shows the following MAPI properties for the attachment:

MAPI Properties for Attachment Found during Word Forensic Authentication

Manually decoding the FILETIME values for PR_CREATION_TIME and PR_LAST_MODIFICATION_TIME, she finds a creation timestamp of 9/10/2018 22:20:46.9509489 (UTC) and a last modification timestamp of 9/11/2018 04:20:34.1458881 (UTC). The email containing the attachment has creation and sent dates (PR_CREATION_TIME and PR_CLIENT_SUBMIT_TIME) that are both several hours later—9/11/2018 20:48 (UTC).
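Decoding in the other direction is just as mechanical. A minimal sketch, assuming the MAPI property value is read as a little-endian 64-bit tick count; note that Python's datetime only keeps microseconds, so the final 100-nanosecond digit of a high-resolution timestamp is truncated:

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(raw8):
    """Decode 8 little-endian FILETIME bytes into a UTC datetime
    (rounded down to microsecond resolution)."""
    ticks = struct.unpack("<Q", raw8)[0]
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

# Demonstrated on the manipulated creation timestamp from earlier:
raw = bytes.fromhex("80456D539C4BC701")
print(filetime_to_datetime(raw))  # 2007-02-08 16:15:19+00:00
```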

Considering the presence of high-resolution timestamps and the timing of when the email was sent, it is very likely that the creation and last modification timestamps the examiner identified above were the file system timestamps of the Word document on the suspect’s system when he attached the file to the email. Moreover, it is likely that the file resided in a file system with high timestamp resolution, such as NTFS. She makes a note of these timestamps.

The forensic examiner then saves the Word attachment out to a folder for further analysis. She notes that when the attachment is saved, the creation file system timestamp is preserved (i.e., 9/10/2018 22:20:46.9509489 (UTC)), but the last modification file system timestamp is set to the time when she saved the attachment.

Keeping this in mind, she runs the file through X-Ways 19.5 and extracts internal file metadata to get her examination started. X-Ways shows the following information:

Word Document Metadata Extracted by X-Ways Forensics 19.5 during Word Forensic Authentication

There are a few things here that she finds interesting from a Word forensic authentication perspective:

Application Version (AppVersion)

X-Ways reported an AppVersion value of 12.0. Our forensic examiner wants to manually verify where this value is coming from. The Document Summary Information stream of the Word document contains a property named GKPIDDSI_VERSION. This property specifies the version of the application that wrote the property set storage.

In this case, the value of this property is set to 000C 0000. The 000C 0000 bytes indicate the major and minor version of the application, which is interpreted as 12.0 in the following manner:

0xC0000 is equal to 786,432 in decimal, which was the “version” value reported by olefile earlier in the article.

Word 12.0 is also known as Word 2007, which was released in late 2006. So, the application version does not pose a problem with the apparent creation date of the document, which is in February 2007.
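The interpretation can be checked in a couple of lines. The raw value below is the 786,432 that olefile reported; splitting the dword into high (major) and low (minor) words yields the version string:

```python
# GKPIDDSI_VERSION: high word = major version, low word = minor version
version = 786_432  # raw dword (0x000C0000), as reported by olefile
major, minor = version >> 16, version & 0xFFFF
print(f"AppVersion: {major}.{minor}")  # AppVersion: 12.0
```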

Operating System Version (OSVersion)

In addition to the AppVersion, both the Summary Information and the Document Summary Information streams in a Word document contain a 4-byte PropertySetSystemIdentifier structure. The first two bytes of the structure indicate the major and minor versions of the operating system that wrote the property set. The last two bytes represent the OSType. According to the specification, OSType must be 0x0002.

Operating System Version Found during Word Forensic Analysis

In the screenshot above, you can see the PropertySetSystemIdentifier structure highlighted. The 06 and 01 values indicate the major and minor version of the OS respectively. Windows 6.1 represents Windows 7, which was released to the public in the second half of 2009, which is after the apparent creation date of the Word document.
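Decoding the four bytes manually is straightforward. The sketch below follows the layout described above — first two bytes are the OS major and minor versions, the last two the OSType — and assumes the OSType word is stored little-endian; the input bytes match the screenshot:

```python
def decode_system_identifier(raw: bytes) -> tuple[int, int, int]:
    """Split a 4-byte PropertySetSystemIdentifier into (major, minor, os_type)."""
    os_major, os_minor = raw[0], raw[1]
    os_type = int.from_bytes(raw[2:4], "little")  # must be 0x0002 per the spec
    return os_major, os_minor, os_type

major, minor, os_type = decode_system_identifier(bytes.fromhex("06010200"))
print(f"OS version {major}.{minor}, OSType {os_type:#06x}")  # OS version 6.1, OSType 0x0002
```

Reading the OSType word as if it were part of the version number is exactly the kind of parsing slip that produces readings like “6.1.2”.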

It is easy to jump to a conclusion here and consider this a red flag. However, the forensic examiner knows that when the Word document is saved, the property sets are re-written and the AppVersion and OSVersion values are updated to reflect the application and OS that were used during the last save. Since the Word document was last modified in 2018, it is possible that Windows 7 was used in a subsequent save, but not necessarily when the document was initially created.

Root Entry Date

In addition to the internal creation and last modification timestamps, the Word document contains a FILETIME structure that represents the modification timestamp of the root entry of the CFB format file. This value looks as follows:

CFB Root Entry Modification Date Found during Word Forensic Examination

The FILETIME value 90AD48A1DA40D401 represents 8/31/2018 03:28:05.3530000 (UTC). The digital forensics expert notes a few things here:

  1. The root entry modification date is within the same minute as the internal last modification timestamp of the document. This makes sense, as saving the document via Word would cause the modification date of the root entry to be updated.
  2. The root entry timestamp has millisecond precision although the FILETIME structure allows for higher precision. This is consistent with a Word document in this format.
  3. This value matches what X-Ways reported as “Internal Modification”.

Resolution of The Timestamps in The Summary Information Stream

When the forensic examiner looks at the internal timestamps found in the Summary Information stream of the document, she sees the following:

Resolution of The Timestamps found in The Summary Information Stream during Word Forensic Analysis

The timestamps are as follows:

Last printed date: 003C84C7D940D401 (8/31/2018 03:22:00.0000000)
Creation date: 80456D539C4BC701 (2/8/2007 16:15:19.0000000)
Last modification date: 00E0179EDA40D401 (8/31/2018 03:28:00.0000000)
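These values are plain FILETIMEs — 100-nanosecond ticks since January 1, 1601 (UTC), stored little-endian — so both the decoding and the precision check can be scripted. A minimal sketch (function names are my own):

```python
from datetime import datetime, timedelta

def decode_filetime(hex_bytes: str) -> datetime:
    """Decode a little-endian FILETIME: 100 ns ticks since 1601-01-01 (UTC)."""
    ticks = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    return datetime(1601, 1, 1) + timedelta(microseconds=ticks // 10)

def precision(hex_bytes: str) -> str:
    """Classify the resolution actually used by a FILETIME value."""
    ticks = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    if ticks % 600_000_000 == 0:   # whole minutes
        return "minute"
    if ticks % 10_000_000 == 0:    # whole seconds
        return "second"
    return "sub-second"

for label, raw in [("Last printed", "003C84C7D940D401"),
                   ("Creation", "80456D539C4BC701"),
                   ("Last modified", "00E0179EDA40D401")]:
    print(f"{label}: {decode_filetime(raw)} ({precision(raw)} precision)")
```

Checked side by side like this, the creation timestamp’s odd second precision stands out immediately against the minute precision of the other two values.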

As you will notice in the highlighted digits, both the last printed and the last modification timestamps have minute precision, while the creation timestamp has second precision. Although the FILETIME structure allows for much higher precision, in my experience, all three timestamps found in the Summary Information stream of Word documents typically have minute precision.

The computer forensics expert notes that the creation timestamp found in the Summary Information stream of the Word document has inconsistent precision compared to the other timestamps. This could be because the timestamp was altered outside of Word.

Word Forensic Authentication Findings & Next Steps

The forensic examiner had limited information to work with in this case. She had to look at the Word document in isolation and search for any inconsistencies. A summary of her key findings is as follows:

  1. The file system modification timestamp of the Word document (9/11/2018) did not match the last modification timestamps found inside the Word document (8/31/2018). This suggests the file system timestamp was changed after the Word document was last saved, possibly by someone altering the document outside of Word.
  2. The internal creation date timestamp found in the Summary Information stream of the Word document had an inconsistent resolution when compared to other timestamps in the same stream.

At this point, the forensic examiner will want to review the workstation where the Word document was created, modified and accessed. Examining the artifacts found on the workstation, she will attempt to confirm that the Word document was backdated, and find out how.

Conclusions and Notes

First of all, some of the readers might be thinking that this is not the most efficient way to backdate a Word document. It requires quite a bit of technical knowledge and leaves a lot of room for error. You are right! My goal was to highlight certain data structures that may be valuable to fellow digital forensic examiners—not to help the bad guys get more proficient in document forgery.

It is important to note that the mainstream digital forensics tools that I tested (i.e., X-Ways v19.5 and FTK v6.4, as well as freely available, general-purpose tools such as olefile and ExifTool) did not parse the DTTM structures in the Dop. So, when there was a discrepancy between the dates found in the Summary Information stream of the Word document and those in the Dop, I was able to observe the discrepancy only in Word, and by manual examination. It pays to check the Dop against the Summary Information stream if manipulation of the document is suspected.

The various FILETIME structures used throughout the CFB file format have different resolutions by design. For example, the creation, last modification and last print timestamps in the Summary Information stream of the document have minute precision; the edit time value (GKPIDSI_EDITTIME) found in the Summary Information stream of the document represents a duration in 100-nanosecond intervals rather than a full date; and the modification timestamp of the root entry has millisecond precision. It is important to be familiar with the resolution of each timestamp for Word forensic authentication.

AppVersion and OSVersion are data points that can be helpful in identifying discrepancies in a Word document. For example, if the OSVersion points to Windows 10, but none of the timestamps in the document are after 2013, you might want to take a closer look at that document.

Finally, if you perform Word forensic authentication, I strongly recommend that you get familiar with the Microsoft specifications listed below as references. You will see that the CFB file format contains a ton of interesting information which could be valuable in your next investigation. You can also manually verify values parsed by your forensic tools, and find evidence that’s beyond what mainstream tools are able to report.

For example, comparing the outputs of X-Ways v19.5 and FTK v6.4, I found that FTK did not report the AppVersion or the root entry modification date, and reported the OSVersion of the Word document as 6.1.2, which is not entirely correct as the “2” at the end refers to the OSType, not version. Such interpretation issues can quickly be cleared up if you are able to write a quick script, or fire up your hex editor and take a look for yourself.

References:

  • Word (.doc) Binary File Format [MS-DOC]
  • Office Common Data Types and Objects Structures [MS-OSHARED]
  • Compound File Binary File Format [MS-CFB]
  • Object Linking and Embedding (OLE) Property Set Data Structures [MS-OLEPS]

About The Author

Arman Gungor, CCE, is a digital forensics and eDiscovery expert and the founder of Metaspike. He has over 21 years’ computer and technology experience and has been appointed by courts as a neutral computer forensics expert as well as a neutral eDiscovery consultant.

Walkthrough: Analyze DI Face Detection Recognition

Let’s check out the new Face Detection features within Griffeye Analyze DI. Make sure that both the Face and Video utility packs are activated in the Analyze Forensic Market before you create your case. Once you’ve done that, we can go ahead and create a new case and bring in our data. I’m going to call it the ‘Training Case – Faces’ and bring in the folder containing the images in our investigation.

Now, when you bring the case in, make sure that you have the Face Detection and Recognition turned on in the video options and, after the import is done, make sure that you have the face detection app checked so that it can run after it ingests the data. Now, Griffeye is going to go through its normal ingestion process, analyzing and doing what it needs to do, and once this process is done, it will begin running the face detection app that we can then use later, as you can see here.

Now realize, this may take a little bit of time, as it has to go through all the files and detect faces in each and every one of them.

Once the face detection has been completed, notice in the grid view that there’s an additional column now for the number of faces detected in each file, which we can sort by if we’d like. I’m going to sort this from fewest faces to most, scroll back up, and now I’m going to select an image with 16 faces in it, go back to the thumbnail view, and I’ve sorted by the number of faces.

I can also filter down to files that have faces, files that have more than one face, or I can select all the files that don’t have faces. If I locate a file within my case that contains a face and I want to search for additional faces that are similar to that face, I can right-click, go to search, and then, Similar Faces. And this will bring me to the search results and show me all the files that it has found similar faces in, including video.

Let’s take a look at the video that the face was detected in a little more closely. I go to the File View, select Video Player, and I can filter this video down to just the segments that show the face. I can actually select a specific face if I’d like. This is the one it’s found in the video, so I’m going to select it. And notice that the play controls now have changed and I have a filtered view. The video is now playing only the segment that contains the face that I ran the search for.

I can also perform a similarity search based on that face by clicking on Video Chart, and I’m going to locate where in the video I want to perform a similarity search – right about here – and very similar to an image file, I can just select the area of the face and then click the button that says search for similar faces. And when it does so, it’s kicked me back out to the search view and has shown me the similar face results.

I can also perform an external image search for a similar face in the Search tab, making sure I have my search method checked to Similar Faces, and then I’ll add the external image of the face I’m looking for. It then runs the similarity search and gives me the results of all the files within the case that contain that face. Another way is dragging a file externally into the file view, and then using the selection tool to zoom in on the face, and then selecting the box to search for similar faces. Again, that sends me back to the search view, and now it’s shown me all the files, including videos, that have similar faces.

We can now create a visual diagram in Analyze Relations of any file selected based on similar faces found. We right-click, open Analyze Relations, and from here, when we select the Relations wheel, we have the option of Face. This will display a visual diagram of all the faces related to this particular face, and then we can expand it out, and continue on, and relate other files as necessary.

Analyze DI helps investigators work through complicated cases by improving triage. It makes it easy to import, process and review large amounts of information. Find out more on Griffeye’s website.


Opinion: Is ISO17025 The Right Standard For Digital Forensics?

by Rich2005

Standardisation is currently the subject of animated discussion among digital forensic examiners worldwide. In this opinion piece, Rich2005 looks at the challenges of the ISO17025 standard for digital forensics and why it might not be the best choice for the field. Please note that the views contained within this article are the opinions of its author and do not necessarily reflect the views of Forensic Focus.

In my opinion ISO17025 is a dangerous standard. It gives the illusion of accuracy and reliability, whilst in the real world it may actually lead to poorer results via static process-following, bulk-evidence production, and the assumption of reliable results at the expense of properly considering the complexities of individual cases.

You only have to read the changelogs for all the main forensics tools, which come out daily/weekly/monthly, and will have passed someone’s ISO testing, to know that they were never as reliable as that “tested” tool was purported to be. In fact, whatever baseline testing you do, there’s no assurance that just because you’ve tested it once, and it has passed, that it will do so the second time, with the same set of data, or a different set of data. 

To use a standard laboratory example: testing for the presence/value of one single marker, with one single method, you might have a huge raft of factors to consider which might influence the test. This testing for that single marker could well have an entire book’s worth of information documenting the processes, potentially influencing factors, tolerances, and so forth. This might include reams of data on testing in order to back up why this test and process can be used reliably, in mass-processing, and what the tolerance levels are for validity, trust and reliability.

The problem is that digital forensics tools are almost never testing for one value or marker in a data set that rarely changes in structure.

Instead we are often testing for a huge number of values, markers or structures, using logic which is often the best guess of a programmer based on known information. The source code of the program generating the value/marker/structure may not be available. We might be testing against a data set that continually changes in structure, and then have to try to interpret and present the results in an intelligible form!

On top of that, these values will regularly change as the originating programs are updated, hardware is updated, firmware is updated, and so on and so forth. Let alone any “cross-talk” from other programs or applications that might use or modify the generated structures subsequent to their creation.

You can be almost certain that any tool that is being used under ISO17025 certification right now has flaws in it that will not be detected by the limited testing people will do on it, simply to acquire the certification. Proponents of ISO17025 would say that finding some flaws is better than finding none, and to an extent that’s right, however in the real world there is a cost in doing this.

This cost is threefold: time, money, and potentially accuracy. The time and money aspects are relatively obvious, but the expense of accuracy might come because it leads to tools not being used until they are verified. That verification will likely not come immediately, and if there’s work that does need to be done immediately, and cannot wait for verification, then it could result in an older version of a tool being used. This version might contain bugs that a later version fixes, but if the latest version hasn’t been verified then the examiner would be forced to use the older version. Of course, it’s always possible that a new version of a tool introduces its own errors, but generally speaking it would seem logical that a more recent version of a tool is going to have fixed more problems than it has created. In my view it’s therefore a giant waste of time, effort, and money to generate the ridiculous mass of documentation ISO17025 requires, and get people into a rigid process-following mindset.

I would bet my house on the fact that the limited ISO17025 testing that every lab supposedly following it will do, which costs huge amounts of time and money, would still find very few errors, if any; and if it does, most would be the sort of obvious error/failure any vaguely competent examiner would have spotted anyway. Ultimately the field of digital forensics is so vast and complex that no results from a tool should be treated as if they are 100% reliable.

Instead of trying to prove that something inherently unreliable is reliable and trustworthy, the focus should be on how you could really have a better degree of confidence in any evidence produced.

In the court case context, the best chance of spotting deficiencies in digital evidence produced by the prosecution is by a defence examiner reviewing the work (and vice versa). Therefore cutting down on legal aid and the time a defence examiner might get to assist a client, whilst ramping up things like ISO17025, is either mad or negligent.

The only benefit will be potentially to save government money at the expense of people in the justice system whilst punishing small companies and individual experts without massive budgets, for whom the cost of ISO17025 is disproportionately exorbitant in comparison to the monetary value of work they do. I say ‘monetary value of the work they do’ because many of the finest experts in this field – and I would venture many fields – often run their own small businesses, and aren’t part of a large corporate entity that might hope to spread the excessive cost across huge volumes of work, and indeed win it purely on the basis of ISO17025 certification, ahead of others who can’t realistically afford it or justify the cost.

Regardless of ISO17025, this field is always liable to be significantly at the mercy of an individual examiner’s skill and experience, and equally importantly the time they’re able to work on a job, or limited process they’re under instruction to follow.

For a long time there’s been a “race to the bottom” in digital forensics, and many perfectly competent examiners will be caught between the desire to investigate a case thoroughly and their employer wanting to turn a profit, perhaps limiting their time allowed to investigate the job, limiting the scope of the job, forcing them to adhere to a strict process for the purposes of ISO17025 or templated sales documents/contracts, and so on.

To use an example: certain jobs where possession of material is illegal might well have processes applied to the digital data to identify the illegal material, whether manual or automated, and then report the results, along with associated details of the systems etc. Of course we’d all want the who/what/when/where/why/how detailed in the report as much as possible, but bluntly this does not always happen (and forensic examiners simply do not always get unlimited time to investigate each item – I’d argue it’s not uncommon that they don’t get sufficient time to take the significantly well-rounded look at the case that a criminal matter might justify).

In my own personal experience I’ve had cases where it’s taken little more than an hour of reading the case papers and looking at a forensic image to be confident there’s no case to answer for a defendant (and seeing the case subsequently thrown out within minutes of the court date starting). This wasn’t because the original examiner did anything “wrong” or “in bad faith”. However they simply reported the presence of material, some supporting equipment/OS details, and the case progressed all the way to attendance at trial. At great expense to the state, no doubt, and undoubtedly distress to the accused.

To try to give an analogy in a fictitious case: let’s say someone’s murdered their wife by poisoning her.

The prosecution examiner gets given a couple of days to examine the computer and report their findings (not much time, I know – but don’t be surprised how little time might get spent on a job these days!) They process the computer in various tools, and after initial reviews don’t find anything of particular relevance, but then come across an ebook named “How To Murder My Wife” and another called “A History of Chemicals and Poisons”, both in the ebooks folder of the user directory for the main suspect.

So, this obviously gets flagged up immediately, and reported upon. It comes to court, this is presented, and along with other weaker circumstantial evidence from the prosecution, the individual is convicted, with (for the sake of argument) no defence expert allocated to examine the computer, and the suspect denying knowledge of these books or of having read them. Had that examiner been given slightly longer to examine the case, they might have presented different findings.

Upon appeal the suspect gets a defence examiner, who confirms the presence of the relevant ebooks, and confirms that they were in the ebooks folder, along with 10,000 other ebooks, many of which had likely been extracted from a single zip file, downloaded via a torrent program. The link to the torrent is identified in an email from one of their friends, saying there was a good book in there about car repair, and that they should have a look. The book about car repair is one of the few ebooks that had been more recently accessed and had registry evidence of being viewed. The poison and murder books had no such evidence. In light of this there’s a retrial and the accused is subsequently acquitted.

I am absolutely certain that if people do not get defence experts when they’re faced with computer-based evidence, then there WILL be miscarriages of justice. Or if defence experts carry out examinations to the letter of their request, in order to stay within their allocated time, the same issue will arise. Miscarriages of justice, or “close calls”, can easily occur, not just through bad intentions but also by simply not being allocated enough time to do a sufficiently thorough job.

Spending more money on things like ISO certification while there is immense pressure on forensic companies, forensic units, and legal aid, is like spending £100,000 on a sticker saying “seaworthy” for your boat because you’ve had someone verify that you’ve got a written process that you follow for maintenance, whilst the boat has a big hole in the side and is taking on water.

Obviously, getting an opposing expert is not a solution to all problems with digital evidence, and is only an extra layer of safety/verification, however I’d say it’s an infinitely more important one than ISO17025 will ever be (certainly by a factor of 10 or 100 times more important). As is providing sufficient time for an examiner to complete a competent job, rather than a brief tick-box job.

If they really do want to continue the obsession with testing tools, then they should set up a central body to verify forensic tools. They should then report on issues they identify publicly, so that everyone is aware, and can compensate for this whilst waiting for a manufacturer fix, or working around the issue.

I see little value in the rigid process documentation, or setting out decision trees a mile long, all of which would have the end point of essentially trying to comply with the ACPO principles and produce best evidence with the least modification possible, whilst being documented.

They should scrap ISO17025 for digital forensics. It’s not fit for purpose, and simply will never achieve any meaningful degree of improvement in the reliability of evidence in this field. If they really want to prevent miscarriages of justice, then there are tangible things they could do to improve this rather than a pointless, expensive certification that hardly anyone in the field has any faith in.

What do you think? Do you agree with Rich2005 that strictly adhering to standards such as ISO17025 could lead to miscarriages of justice? Share your thoughts in the comments below, or if you’d like to submit your own opinion piece, you can email it to scar@forensicfocus.com.

The views contained within this article are the opinions of its author and do not necessarily reflect the views of Forensic Focus. 

Walkthrough: Oxygen Forensic Detective Latest Features

Within Oxygen, you’re able to not only connect one device, but several devices, and image them simultaneously. Oxygen’s extractor runs independently of Oxygen Detective, and that’s what allows you to run several different extractions at the same time; there is no limit other than what the machine you are using will allow.

So again, with Oxygen Extractor, you’re able to open them up and have several open at the same time, and begin multiple device acquisitions. One of the other things that we’re able to do with our imaging is with drones, specifically DJI drones. We can get a full physical from a DJI drone by simply connecting to the device, applying an exploit, and then extracting the physical image from that drone via the micro USB port that is on the drone itself.

Oxygen was the first to start extracting data from cloud services, and now, just like device extractions, we can have multiple cloud extractions going at the same time. So, just like with devices, you’re able to extract data on several different cases if they all involve cloud data. Again, you are only limited by the machine you are using and its capabilities.

Next, I’m going to talk about one of the newest programs available within Oxygen Forensic Detective: our KeyScout program, which allows you to go on to a machine and extract all of the usernames and passwords. In addition, you’re able to get Wi-Fi names and passwords as well. You can either start KeyScout on a machine that you’re operating on, or you can add it to a removable media device, plug it into a target machine, and then run it, and it will produce all the usernames and passwords.

So, when it’s up and running, if we hit Search, it’s now going to search across this computer and identify all of the usernames and passwords. Now, this covers all of the major browsers, like Chrome, Firefox and Internet Explorer, so this won’t always be what was specifically typed on this machine; it could include carry-overs from other accounts that those browsers may use. With this, you’re going to get tokens or passwords, and the program will tell you whether it was a token or a password. You can also click on the Wi-Fi Access Points tab, and this is going to give you all of the Wi-Fi networks that have been connected to, with the passwords. Again, this doesn’t always mean this computer was necessarily connected to these Wi-Fi access points, but the user was, and that could be from a number of different devices as well.

Now, with Oxygen KeyScout, we would then save this file into what’s called an Oxygen credentials package. We save that, and then we’re able to take that file and bring it into our cloud extractions, and go across and search for all of the possible services that these usernames or passwords may work for in getting cloud data.

Within Oxygen Forensic Detective, there are several ways to go about extracting data from the cloud. You can open up Oxygen Forensic Cloud Extractor and you can start a new extraction, you can import that credentials package that you would have created using Oxygen Forensic KeyScout, you can extract iCloud tokens from a Windows PC if they’re … with the Windows application installed. You can search for credentials on a target computer within the extractor. And then the fifth option is also to be able to decrypt WhatsApp backup files that maybe you have found on an SD card or an internal memory of an Android device.

So, if we open up a new extraction, this is where we’d enter in all of our case details. Once we do that, then we’re able to select which platforms we want to extract data from, based on what usernames and passwords we have, or tokens. And all we have to do is simply click, add credentials, type in the username and password, hit Apply. And then it’s going to try to go out and validate that username and password, or, if you have the token, to try to extract that data.

So, if we go out, enter in our target account, and then enter in the password, we apply that – it now puts an indicator that we’re searching for one account on Facebook. And we can do this for any of these accounts that are available for cloud extraction. We hit Next, this is where it goes out and tries to validate our credentials or that token. If we get the green checkmark, it’s going to say … that means it’s good to go. And then, here we can also add any other additional cloud services, now that we know that that username and password does in fact work.

When we hit Next, it’s going to show, based on what services we’re trying to extract from – this being Facebook – all the different categories of data that could be available from the Facebook cloud. Also, one of the important things we can do is set a date range. This can be very important on the criminal side, if the investigation is limited to a date range based on the incident that took place; or, on a civil case, whatever date range is pertinent. Being able to extract only the date range that is relevant to your case matters, because with a lot of these cloud services there could potentially be years and years’ worth of data, so you would definitely want to set a date range.

Now, that was one way to access cloud data. If you’ve extracted a device, whether it’s an Android or an iOS device, you potentially are going to have cloud accounts available on there. Within Oxygen Detective, it’s going to identify those cloud accounts for you. When you go in there, it’s going to tell you the service, the account with the usernames, and whether we were able to extract a password or a token.

Now, from here, we can either save this account data – and that’s going to save it, just like KeyScout did, into an Oxygen credentials package that we can then later import into our cloud extractor. Or, we can, right from here, extract with our cloud extractor, and what that’s going to do for us is it’s going to put in all of the case details for us, as you can see. And then, we can change this data or we can add data to it based on our case.

We hit Next. Now it’s going to automatically go out and identify all the potential services that we can extract data from. And again, it’s going to identify whether it was a username and password or if it was an authentication token. Once we have that, we can now hit Next, and it’s going to identify it, by the numbers, indicating how many accounts we’re possibly going to get with these different services.

Now, at this point, if we knew a username and password for some of these other accounts – for example, Dropbox or OneDrive – we could then enter in those credentials now as well. And again, if we hit Next, it’s going to take us to that same menu, where it’s going to identify and validate these accounts. Once they’re validated, we can then go into the extraction process, which is going to identify what data types we’re going to get from these different services. And also, we can then select our date range as well.

Also, one of the big additions to Oxygen Forensic Detective is for law enforcement agencies who utilize Grayshift’s tool, the GrayKey – we can bring in those images as well by simply navigating to the Import GrayKey Image option, and then parsing all of that great data that we’re able to get from a GrayKey.

ICDF2C 2018 – Recap

This article is a recap of some of the main highlights from the ICDF2C conference 2018, which took place in New Orleans, LA, USA from the 10th-12th September.

The program began on Monday 10th September with the usual welcome registration. The conference was held at Chateau LeMoyne in New Orleans’ French Quarter: a beautiful hotel complete with pool and resident terrapins!

Once attendees were registered we gathered in the conference room for the opening keynote address. Given by Dr. Deborah Frincke of the National Security Agency, it talked through some of the NSA’s techniques within the realm of digital forensics, and how cooperation works both within and between agencies in the USA and abroad. It was interesting to hear about the research and analysis conducted by such an important body, although of course there was a lot left unsaid.

An important point that came out of Dr. Frincke’s discussion was that it’s very easy to say intelligence agencies should share more, but if you have everybody sharing at all levels it ends up being too chaotic. It is therefore important to work out what counts as a ‘need to know’ and what is a ‘need to share’. This cycle needs to be constantly updated.

Following a coffee break, we reconvened for the next sessions, which were focused on data carving and hiding. I liked the way the sessions were grouped together, with two or three talks on the same topic following each other. It helped to keep things on track and meant that often the talks complemented each other really well.

We began with research on linear function detection approaches for memory carving, from Lorenz Liebler & Harald Baier of the University of Applied Sciences Darmstadt. This was followed by Thomas Göbel demonstrating fishy, a new framework for implementing filesystem-based data hiding techniques.

Monday afternoon was devoted to workshops by Riscure, which allowed attendees to get some hands-on experience.

The gala dinner at the nearby Royal Sonesta Hotel allowed attendees to network over dinner and continue discussing the topics that had been brought to light in the talks during the day.

Tuesday morning’s keynote was given by Golden G. Richard III, who talked about memory forensics and strongly recommended The Art of Memory Forensics by Ligh, Case, Levy & Walters for anyone who is looking for an in-depth walk through the topic. Richard highlighted the importance of memory forensics to the field as a whole, saying that ‘memory is the new hard drive’ and that anyone who isn’t yet au fait with memory forensics techniques is already falling behind.

Automation and machine learning were looked at as possible aids to forensic investigation, but while acknowledging their utility the speaker warned against leaning on them too heavily. It is important to remember that these are helpful tools, not catch-all solutions.

Following this session we saw two papers discussing Android forensics. The first, If I Had A Million Cryptos: Cryptowallet Application Analysis and A Trojan Proof-of-Concept, looked at the forensic analysis of cryptocurrency; and the next session focused on AndroParse, a new Android feature extraction framework and dataset from a team at the University of New Haven.

New Haven’s Cyber Forensics Research & Education Group publish a lot of interesting research, which you can access here.

Following lunch, Atola Technology gave a presentation and brief demo talking about damaged drives and other challenges facing digital forensic investigators today. They showed how their TaskForce tool can help to image damaged drives and to deal with cases where several drives need to be imaged at once.

The next three sessions followed on from this theme, looking at common challenges in digital forensics and how they might be addressed. Hassan Hadi Latheeth Al-Maksousy and Michele C. Weigle presented a paper on hybrid intrusion detection for worm attacks, and then Vikram Harichandran from MITRE took to the stage to introduce CASE, which is quickly gaining popularity among forensic investigators. CASE stands for Cyber-investigation Analysis Standard Expression and looks to create an ontology for practitioners; you can find out more here.

Andrew Case then discussed the rise of memory forensics and reiterated how important it is becoming, especially in the face of modern threats.

“If you’re working in incident response, but you’re not getting a memory sample and doing memory analysis, there’s really no point.” – Andrew Case

The Best Paper awards were given out in the early afternoon of the second day, and this time the award went to two winners: Lorenz Liebler & Harald Baier for their work on memory carving, and Trevor Haigh, Frank Breitinger & Ibrahim Baggili for their paper If I Had A Million Cryptos: Cryptowallet Application Analysis and A Trojan Proof of Concept.

Forensic readiness was the overarching topic of the final three sessions of the day. First of all researchers from the University of Pretoria showed a readiness framework for ransomware intrusion, and then Raquel Tabuyo-Benito, Hayretdin Bahsi and Pedro Peris-Lopez’s paper looking at the forensic analysis of an online game on the Steam platform was the subject of discussion. The day ended with Jieun Dokko from Texas Tech University demonstrating a digital forensic investigation and verification model for industrial espionage.

On the final day we spent some time talking about developments that would be of use to the digital forensics world. Ibrahim Baggili talked about the need for more conferences where people sit around on circular tables, talk about what’s happening in the industry and work through potential solutions. A lot of conferences focus either on vendor demonstrations or academic research, without necessarily taking into account important current developments and creating working groups that could go off and do some good in the field.

Other suggestions for improvements to digital forensics as a whole included making changes to education, such as bringing together more multidisciplinary strands into computer forensics courses, and making grants available for students who want to present at conferences but lack the funding to do so. You can read more suggestions from researchers at the University of New Haven here.

Multi-item passphrases were a hot topic of discussion on the last day of the conference, with Jaryn Shen from Nanjing University talking about public misconceptions of computer security and how they impact user privacy as well as digital forensic investigations. More than 10% of users select one of the top 100 passwords, making their accounts much less secure. But how do we make passwords easy for users to remember, yet hard for others to guess? This and other questions were discussed in the last session of the conference.
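That 10% statistic is why most modern password policies screen new passwords against a blocklist of common choices. A minimal sketch of such a check; the tiny blocklist here is purely illustrative, where a real deployment would load a large breach-derived list running to many thousands of entries:

```python
# Purely illustrative blocklist; a real one would be derived from
# breach corpora and contain tens of thousands of entries or more.
COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein", "111111"}

def is_weak(password: str) -> bool:
    """Flag passwords that appear on the common-password blocklist."""
    return password.lower() in COMMON_PASSWORDS

print(is_weak("Password"))                      # → True
print(is_weak("correct horse battery staple"))  # → False
```

A multi-item passphrase like the second example stays memorable while escaping the blocklist entirely, which is exactly the remember-versus-guess trade-off the session was concerned with.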

Next year’s ICDF2C conference will take place in Milan, Italy. Keep an eye on the website for more details – see you there!

Findings From The Forensic Focus 2018 Survey

Earlier this year, Forensic Focus conducted a survey of its members to find out a bit more about them, their roles in the industry, and common challenges facing digital forensic practitioners today. Below is a brief run-down of the results.

First of all, some demographic details. The majority of our members are situated in either the USA (36%) or the UK (22%). Other countries represented include Australia, Belarus, Belgium, France, India and Poland. 89% of respondents were male, and 11% female.

Law enforcement was the most popular sector, with 39% of respondents; slightly behind it at 35% were people working in the corporate sector. Among those who answered ‘Other’ were retired people, consultants, and individual freelancers. The vast majority stated their position as ‘Analyst’, with ‘Technician’, ‘Director’ and ‘Manager’ closely behind.

We also asked respondents about how they had entered the field in the first place. The most popular answers to this question were ‘After studying a related discipline’ and ‘Career move from law enforcement’. Several people reported having ended up in digital forensics almost by accident, with quotes like “After handling incidents in the past, I inadvertently created a new role for myself” being fairly common.

Moving on to the challenges faced by digital forensics examiners, the most common was encryption and anti-forensics techniques. The volume of data in each case was another important challenge, as were a lack of training and insufficient funding or resources.

ISO 17025 being a hot topic in digital forensics at the moment, we also included a few questions about this and its usefulness in investigations.

Interestingly, 42% of respondents said their organisations were not planning to attain ISO 17025 accreditation, with only 12% giving a definite ‘Yes’. However, 62% said they either agreed or strongly agreed with the statement “A formal means of standardisation is necessary for the digital forensics community”, demonstrating that the need for standardisation as a concept is agreed upon within the industry, but perhaps ISO 17025 might not be the best way to achieve this.

Only 1.75% of people believed strongly that ISO 17025 would help their organisation’s processes or prospects, and only 2.7% said they thought the standard covered all necessary aspects of digital forensics standardisation.

We then asked people to share their thoughts on ISO 17025 in particular, and standardisation in general, in a freeform comment box. Common responses included:

  • ISO 17025 is too expensive and this money could better be spent elsewhere, for example on training.
  • Tool vendors should be responsible for validation, rather than each group of users having to do so independently.
  • In the UK, police forces are being given different advice about, and assessments for, ISO 17025, which seems to defeat the object of standardisation.
  • Digital forensics moves too quickly for a standard such as ISO 17025 to keep up.

Some of the more in-depth comments included the following.

“ISO 17025 should have been driven from the centre and should not each force an organisation to spend considerable time and effort to get to a place where it is obvious people need to be employed simply to be administrators and checkers. At the moment valuable time is spent not processing case work but checking others’ work or following a tick-box regime rather than empowering people to think for themselves, solving problems in a logical way appropriate to the investigation in hand.”

“It is being massively interpreted across the public sector. It is supposed to set standards, however, to reach those new standards, inconsistent procedures are being put into place. ISO is seen by many as purely a money-making exercise and is not respected by a lot of colleagues. Where law enforcement is concerned, it has massively increased the time taken to examine an exhibit, with little or no benefit in return.”

“It is liable to create too much emphasis on having the accreditation, which organisations are spending an obsessive amount of time on, in turn neglecting the core role of doing digital forensics. As long as protocols are adhered to within the law of the land then that should be sufficient. The evidence test in a courtroom will NOT be whether you have the ISO standard! A digital forensic investigator whose organisation has ISO will likely achieve same/similar results to a DFI who does not have ISO.”

The final section of the survey allowed respondents to share their views about digital forensics in general, and to talk about any important points that had not come up so far. This drew some interesting responses, including some people discussing how most digital forensic events are catered towards criminal rather than civil matters:

“I would like to see some more distinctions drawn between forensics as it applies to criminal vs. civil matters. Every time I attend an event, I’m struck by the dichotomy, and frankly, how little of what is discussed applies in the civil sphere – to the point that it’s close to being a waste of time. I guess what I’m trying to say is that they’re very different areas, and next to nothing is catered to the civil side.”

The high cost of forensic tools was also a point of contention, with one respondent pointing out that this cost gets passed on to the client, meaning that fewer people employ forensic analysts than perhaps otherwise would.

Gender bias in digital forensics, which has been the subject of several talks and panel sessions at recent conferences, came up as a challenge in the survey.

“Still experiencing gender bias when asking for training dollars, the men typically get approved throughout the year, the women typically receive a single approval each year. It’s maddening.”

In summary, then, it seems digital forensics still has quite a way to go in several areas, from standardisation to gender bias. But on the whole people had positive things to say about the industry, and work is being done in several different areas to address such challenges as triage, encryption and accreditation.

Techno Security TX 2018 – Recap

This article is a recap of some of the main highlights from Techno Security TX 2018, which took place in San Antonio, Texas from the 17th-19th September.

The conference had four tracks: forensics; information security; audit / risk management; and investigations, along with sponsor demos. Forensic Focus attended the forensics and investigations tracks during the event.

Monday September 17th

Magnet Forensics’ Jessica Hyde opened the conference with a discussion on the proliferation of devices. With 20 billion connected devices projected to be online by 2020, this is a growing concern in the industry. And considering that the results of our latest survey show that data triage is one of the biggest challenges investigators face, it’s certainly a topic that requires attention. Hyde also mentioned the importance of verification and validation in the industry.

But how do we measure success in machine learning? This handy slide was a useful reference point.

For anyone who’s still not sure about the potential applications of machine learning, some of those featured in this talk included identifying things that fell outside the norm, for example finding unique patterns in network activity which may help you to find a bot or intrusion.
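The slide itself isn't reproduced here, but success in a classification task, such as flagging anomalous network activity, is conventionally measured with precision, recall and the F1 score, all derived from confusion-matrix counts. A minimal sketch:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard classification metrics from confusion-matrix counts:
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # of everything flagged, how much was real?
    recall = tp / (tp + fn) if tp + fn else 0.0       # of everything real, how much was flagged?
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)             # harmonic mean of the two
    return precision, recall, f1

# E.g. an intrusion detector that flagged 8 real intrusions, flagged 2
# benign events by mistake, and missed 2 real intrusions entirely.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, round(f1, 3))  # → 0.8 0.8 0.8
```

For anomaly detection the trade-off between the two is where the judgment lies: a detector tuned for high recall will drown analysts in false positives, while one tuned for high precision will quietly miss intrusions.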

Presenters from Digital Intelligence then took to the stage to talk about the forensic analysis of cloud storage. Increasing amounts of personal and professional data are stored in the cloud, with options such as iCloud, Dropbox and Box.com growing in popularity. The reliability of dates and times was one common challenge encountered when forensically analysing cloud accounts, but we also learned some of the positive applications of cloud forensics.

Retired FBI agent Matteo Valles then spoke about how every industry is affected by theft of trade secrets, and how many stakeholders don’t take this area seriously enough. Sometimes people don’t even realise their companies have trade secrets, but they all do; it simply means whatever information you don’t want your competitors to have. This could be anything from new machine learning algorithms to your customer list. Other examples of trade secrets include software, source codes, research & development data, product specs, prototypes, future products, marketing plans, recipes, algorithms, merger & acquisition plans, customer lists, pricing info, suppliers & vendors, and formulas. As you can see, the area covers a multitude of subjects! And with instances of FBI cases concerning trade secrets having increased 100% since 2009, it’s very important to make this a priority for your business.

Nick Drehel, the VP of Training at AccessData, ran a session demonstrating how their Quin-C tool can help accelerate investigations using collaborative methods. Quin-C speeds up data processing and analysis time and has a flexible, customisable user interface that aids investigators particularly in larger cases.

Tuesday September 18th

Abdul Hassan’s talk on counter terror analysis using social media was as popular at this chapter as it was in Myrtle Beach; if you get a chance to see it, it’s well worth going along. Hassan’s vast experience in the industry makes this a fascinating delve beneath the surface of counter terror investigations.

ADF’s Richard Frawley spoke about best practices for on-scene investigations, which can be hampered by non-technical investigators who may not understand how to best preserve digital evidence. Triage once again came up as a challenge.

Jamie McQuaid from Magnet Forensics showed attendees how AXIOM can be used in fraud investigations. Now that fraud, like most other industries, has gone digital, it can be harder to trace what’s been happening and ultimately secure a conviction. One of the main ways investigators find the details they need is by uncovering a hard drive or storage device that was previously unknown to their client; but knowing where to look and what to look for is paramount.

At 10.30am a panel discussion was held on the topic of women in cyber security and digital forensics. This was a fascinating discussion in which five women talked about their experiences in the industry, how to address the challenges faced, and how to encourage the next generation.

Some of the excellent advice given included:

  • Always get a mentor – and not just a mentor, but multiple mentors, because they’ll help you to get out of your own head and figure out the right path.
  • If you can’t achieve something, have a plan B, plan C, plan D, and keep going until you reach something you can do.
  • If you don’t wake up jazzed in the morning about where you’re going [in life], then it’s time to move on.
  • Find yourself, where you belong, what works for you, and go with it. Be honest with yourself.
  • Remember that it’s OK to ask for mentorship, and it’s OK to offer it.

This marks the second recent discussion about women giving back in DFIR, after the Women In Forensics lunch which Forensic Focus co-hosted with Magnet Forensics at DFRWS in Rhode Island. It’s heartening to see this trend towards inclusion in the industry, and that so many people are enthusiastic about finding out how they can give back and help the next generation. Jessica Hyde wrote an excellent article back in July detailing some of the practical ways in which you can help.

In the afternoon Jamie McQuaid took to the stage again to discuss mobile device investigations and how to look for data on the mobile devices of the future. One of the main issues he pointed out was the number of investigators who don’t understand what their tools are doing, so are stumped by new updates to operating systems or apps.

McQuaid also pointed out that not all images are created equal. It is very important to understand what is included in the image you’re currently acquiring; both app-level and device-level encryption will have a significant impact, but things like this are overlooked surprisingly often. He also highlighted the importance of not being dependent on one single tool but instead having a number of different options in your arsenal, since no one tool can possibly keep up with all the device and application updates that are constantly being released.

Encryption was the number one challenge mentioned by respondents to our survey, and it’s partly due to a lack of knowledge about encryption techniques that people miss out on data they might otherwise have been able to acquire.

There are two different types of application-based encryption you may encounter:

  • Encrypted databases & files, where the whole file is encrypted and can’t be opened with typical tools prior to decryption.
  • Encrypted content, which is a little more forgiving.

When an app says ‘end-to-end encryption’ they usually mean ‘encryption in transit’; there will often still be unencrypted data on the device. It is always worth double checking this when conducting your investigations, because you may be able to acquire more data than you’d thought.
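One practical way to tell the two cases apart: most mobile apps store their data in SQLite, and an unencrypted SQLite database always begins with the 16-byte magic header `SQLite format 3\0`, whereas a whole-file-encrypted one looks like random data from byte zero. A quick sketch of that check (synthetic files stand in for real evidence here):

```python
import os
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # standard 16-byte SQLite file header

def looks_like_plain_sqlite(path: str) -> bool:
    """True if the file begins with the standard SQLite header, i.e. the
    database is not whole-file encrypted (fields inside it may still be)."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC

# Two synthetic files: one with a valid header, one standing in for a
# fully encrypted database (random bytes in place of ciphertext).
plain = tempfile.NamedTemporaryFile(delete=False)
plain.write(SQLITE_MAGIC + b"\x00" * 84)
plain.close()

cipher = tempfile.NamedTemporaryFile(delete=False)
cipher.write(os.urandom(100))
cipher.close()

plain_ok = looks_like_plain_sqlite(plain.name)
cipher_ok = looks_like_plain_sqlite(cipher.name)
os.unlink(plain.name)
os.unlink(cipher.name)

print(plain_ok, cipher_ok)  # → True False
```

Note the caveat in the docstring: a readable header only rules out whole-file encryption. App-level ‘encrypted content’ can still leave individual fields unreadable even though the database itself opens normally.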

Following on from this, Joe Sylve discussed the importance of snapshots in APFS investigations, which can be viewed as a webinar here if you missed it. Amber Schroader from Paraben then spoke about smartphone processing and the kinds of app data investigators should be able to find.

Wednesday September 19th

In another talk that was previously given at Myrtle Beach, Mark Spencer showed what happens when a high stakes failure occurs in digital forensics. Again, this is a presentation not to be missed: the scale and scope of the investigation in question is fascinating, and its effects far-reaching!

Meanwhile we learned about what data can be gleaned from Alexa and other voice-based assistants, which collect much more data than the average user probably realises. Jason Hale followed this with a talk about USB device forensics and how we can improve this area.

Chuck Easttom kicked off the afternoon sessions by discussing dark web markets and how to investigate them. This took the audience on a whistle-stop tour through some of the most nefarious sites around and gave a brief overview of how it might be possible to catch their owners. He also discussed some ways for investigators to keep themselves safe whilst working on dark web cases.

The final session of the conference looked at G Suite products and how much data can be found on these. Essentially it depends on the level of product bought; G Suite Basic only goes back six months, whereas the more advanced options store data for much longer. However, it is still possible to acquire a fair amount of data even from G Suite Basic accounts.

If you do need to look at G Suite data in an investigation, Google’s online API explorer may be helpful.

The next chapter of Techno Security will take place in San Diego, CA from the 11th-13th March 2019. Next year’s Texas event will once again be held in San Antonio from the 30th September-2nd October, and the South Carolina chapter will be right in between these two, in Myrtle Beach from the 2nd-5th June. Find out more, submit your papers and register to attend on the conference website.
