Examining Data Runs of a Fragmented File in NTFS
While examining an acquired image of a flash drive in a recent case, I came across the need to manually recover a fragmented file from an NTFS formatted volume. I needed to manually perform this process for two reasons. First, I needed to validate my software and be confident that it was in fact producing correct results. Secondly, I wanted to manually replicate the process so that I could develop a deeper understanding of how a fragmented file is tracked by the Master File Table.
The plan is to recreate the steps that lead to a file becoming fragmented on an NTFS volume. Once we have successfully written a fragmented file to our test media, we will look at its MFT record and examine the data runs contained in the data attribute.
To conduct our test we will be using a 256MB Flash Drive. Since we are going to be adding data to this media and then examining it with a hex viewer, the first thing that we need to do to prepare this media is sterilize it. Sterilizing a drive is the process of writing a known hex value to every sector of a piece of media so that it can overwrite any and all data that previously resided on that piece of media. For the purposes of this article, I used Active Kill Disk, which is a light, powerful and free utility. While the media was being sterilized, I proceeded to the next step.
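Before moving on, it can be worth confirming that the sterilization actually worked. The sketch below is one possible way to do that against an image file of the media. The function name is_sterile, the default fill value of 0x00, and the chunk size are my own assumptions for illustration; use whatever fill value your wiping tool was configured to write.

```python
def is_sterile(image_path, fill=0x00, chunk_size=1024 * 1024):
    """Return True if every byte in the file equals `fill` (assumed wipe value)."""
    fill_byte = bytes([fill])
    with open(image_path, "rb") as image:
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                # Reached the end without finding a stray byte.
                return True
            # Every byte in the chunk must be the fill byte.
            if chunk.count(fill_byte) != len(chunk):
                return False
```

Reading in chunks keeps memory use flat, which matters once the media is larger than a 256MB flash drive.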
I navigated to the Desktop of my Windows 7 computer and created a folder named “Test”.
Inside of this folder, I created three txt files. These are the files that we will copy to our test media. The files are named TEST1.txt, TEST2.txt, and TEST3.txt. Each of these files contains 1000 bytes of data. TEST1.txt has 1000 number ones (1). Yes, one thousand of them, one after the other. TEST2.txt has 1000 number twos (2), and TEST3.txt has 1000 number threes (3).
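If you would rather not type one thousand characters by hand, the three files can be generated with a few lines of Python. This is only a convenience sketch; the function name make_test_files is hypothetical, and the folder path is whatever you pass in.

```python
from pathlib import Path


def make_test_files(folder):
    """Create TEST1.txt, TEST2.txt and TEST3.txt, each holding 1000 identical digits."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for digit in ("1", "2", "3"):
        path = folder / f"TEST{digit}.txt"
        # 1000 ASCII bytes, no trailing newline, so the size is exactly 1000.
        path.write_bytes(digit.encode("ascii") * 1000)
    return folder
```

Writing raw bytes rather than text avoids any newline or encoding surprises that would change the file size.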
We will copy these files into the media in a specific order. The numbers in the files will aid us in identifying the files when looking at the media through the hex viewer.
Now that the media is sterilized, let’s format it. I pulled the media from the computer and inserted it back into a USB port. Within a second or two, Windows 7 asked me to format the media.
Since this is a test to be conducted on the NTFS file system, I formatted the drive to NTFS. I chose an allocation unit size (cluster size) of 512 bytes, so that the bytes per sector and cluster size would be the same, 512 bytes.
My computer successfully formatted the drive without errors. The operating system assigned it logical letter G. I right clicked on the media and looked at the properties.
Now that the drive is formatted, we can begin to write data to the media. Copy the TEST1.txt file from the Test folder and paste it onto the media. Next, copy and paste the TEST2.txt file onto the media.
At this point there should only be two files on the test media. Here is where it gets interesting. In order to fragment TEST1.txt we are going to add another 1000 ones to the file. Adding another 1000 bytes of data into the file will double its size from 1000 bytes to 2000 bytes. Open the TEST1.txt file in Notepad and add another 1000 ones onto the file, save it and close it.
This is what it should look like.
SIDE NOTE: Notice that Windows now reads the file as having 1.95KB of data. Even though I know that there are exactly 2000 bytes of data in the file, Windows reads 1.95KB rather than a rounded 2KB. Actually, Windows is right. The reason is that there are 1024 bytes in a kilobyte (KB). From the 2000 bytes of data in the file, Windows uses 1024 bytes to make up 1.0KB. The remaining 976 bytes are divided by 1024, which gives 0.953125. Windows adds that fraction to the 1.0KB and displays 1.95KB to us.
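The arithmetic in the side note can be checked in a couple of lines. This is a sketch of the calculation only, not a claim about how Windows formats sizes internally; the function name bytes_to_kb is my own.

```python
def bytes_to_kb(size_bytes):
    """Convert a byte count to kilobytes using 1024 bytes per KB."""
    whole_kb, leftover = divmod(size_bytes, 1024)
    return whole_kb + leftover / 1024


print(divmod(2000, 1024))            # (1, 976)
print(bytes_to_kb(2000))             # 1.953125
print(f"{bytes_to_kb(2000):.2f}KB")  # 1.95KB
```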
Let’s continue with the test. Now, copy the TEST3.txt file from the test folder and paste it into the test media. Now, go back to the TEST1.txt file and add another 1000 ones to the file. Test1.txt should now have 3000 bytes of data.
This is what it should look like.
Our test media is now complete and ready for examination. Let’s look at the media down at the hex level. For the purposes of this article, I used a demo version of WinHex 16.3.
After firing up WinHex and opening our test media as a physical device, I saw that WinHex reports the TEST1.txt file as starting in cluster 288 of the media, which also happens to be sector 288. The TEST2.txt file starts in cluster 290, which also happens to be sector 290. And lastly, the TEST3.txt file starts in cluster 294, which also happens to be sector 294. Notice that WinHex recognizes that TEST1.txt is three times larger than TEST2.txt and TEST3.txt.
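Cluster and sector numbers match here only because we formatted with a 512-byte cluster on 512-byte sectors. A small sketch of the conversion makes that explicit. The constant and function names are my own, and the byte offsets are relative to the start of the volume; if the volume begins at some offset on the physical device, that offset would have to be added, which I am not accounting for here.

```python
BYTES_PER_SECTOR = 512    # sector size of the test media
BYTES_PER_CLUSTER = 512   # allocation unit size chosen at format time


def cluster_to_sector(cluster):
    """First sector of a cluster, relative to the start of the volume."""
    return cluster * (BYTES_PER_CLUSTER // BYTES_PER_SECTOR)


def cluster_to_byte_offset(cluster):
    """Byte offset of a cluster, relative to the start of the volume."""
    return cluster * BYTES_PER_CLUSTER


for name, cluster in (("TEST1.txt", 288), ("TEST2.txt", 290), ("TEST3.txt", 294)):
    print(name, cluster_to_sector(cluster), hex(cluster_to_byte_offset(cluster)))
```

With a larger cluster size, such as 4096 bytes, cluster 288 would start at sector 2304 instead, which is why the two numbers should not be treated as interchangeable in general.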
Let’s go to each one of the clusters and see what we find.
Cluster 288: First cluster of TEST1.txt
Cluster 289: Second cluster of TEST1.txt
Cluster 290: First cluster of TEST2.txt
Cluster 291: Second cluster of TEST2.txt
Cluster 292: Third cluster of TEST1.txt (Fragmented File)
Cluster 293: Fourth cluster of TEST1.txt (Fragmented File, continuation)
Cluster 294: First cluster of TEST3.txt
Cluster 295: Second cluster of TEST3.txt
Cluster 296: Fifth cluster of TEST1.txt (Fragmented File, continuation)
Cluster 297: Sixth cluster of TEST1.txt (Fragmented File, Final)
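The cluster list above can be collapsed into (starting cluster, length) pairs, which is the shape of information a data run records. The following is a minimal sketch; the function name group_into_runs is hypothetical, and it assumes the clusters are passed in the file's logical order.

```python
def group_into_runs(clusters):
    """Collapse an ordered cluster list into (start_cluster, length) runs."""
    runs = []
    for cluster in clusters:
        if runs and cluster == runs[-1][0] + runs[-1][1]:
            # Cluster directly follows the current run, so extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Gap found: a new fragment begins here.
            runs.append((cluster, 1))
    return runs


test1_clusters = [288, 289, 292, 293, 296, 297]
print(group_into_runs(test1_clusters))  # [(288, 2), (292, 2), (296, 2)]
```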
TEST2.txt and TEST3.txt were each 1000 bytes in length and each occupied two clusters on the media. TEST1.txt was 3000 bytes and occupied six clusters. When we first wrote TEST1.txt onto the media it was only 1000 bytes in length, so it occupied only two clusters, 288 and 289. TEST2.txt was then written immediately after TEST1.txt. We then went back to TEST1.txt and added 1000 bytes, which doubled its length. Because TEST2.txt already occupied the clusters immediately after TEST1.txt, the operating system had no choice but to place the data that no longer fit in clusters 288 and 289 into the next available clusters, which were 292 and 293. This action caused TEST1.txt to become fragmented. A fragmented file is a file whose data is written to the disk in non-contiguous pieces, or fragments, hence the term. We then added TEST3.txt and again went back to TEST1.txt and added another 1000 bytes of data, which landed in clusters 296 and 297.
Because the data in the TEST1.txt file was written to three different areas of the media, its MFT record should contain three data runs in its data attribute. Let’s look at the MFT’s data attribute.
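The sequence of writes can be replayed with a toy model. To be clear, this is not how the NTFS allocator really works; it is a deliberately simplified sketch that always hands out the next unused cluster, which happens to be enough to reproduce the layout seen in this test. The class name ToyVolume, the helper clusters_needed, and the starting cluster of 288 are assumptions taken from the observations above.

```python
def clusters_needed(size_bytes, cluster_size=512):
    """Number of whole clusters required to hold size_bytes."""
    return -(-size_bytes // cluster_size)  # ceiling division


class ToyVolume:
    """Simplified model: always allocates the next unused cluster."""

    def __init__(self, first_free=288, cluster_size=512):
        self.next_free = first_free
        self.cluster_size = cluster_size
        self.files = {}  # file name -> list of clusters in logical order

    def write(self, name, total_size):
        """Create the file or grow it to total_size bytes."""
        owned = self.files.setdefault(name, [])
        extra = clusters_needed(total_size, self.cluster_size) - len(owned)
        for _ in range(max(extra, 0)):
            owned.append(self.next_free)
            self.next_free += 1
        return owned


volume = ToyVolume()
volume.write("TEST1.txt", 1000)
volume.write("TEST2.txt", 1000)
volume.write("TEST1.txt", 2000)
volume.write("TEST3.txt", 1000)
volume.write("TEST1.txt", 3000)
print(volume.files["TEST1.txt"])  # [288, 289, 292, 293, 296, 297]
```

A real volume with free space scattered around would behave differently, so treat this purely as an illustration of why interleaved writes produce fragments.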
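Below is TEST1.txt’s MFT record. Highlighted in blue is the record’s data attribute. Notice the attribute type identifier 0x80, which is stored on disk in little-endian order and therefore appears in the hex view as the bytes 80 00 00 00.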
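Locating the data attribute can also be done programmatically by walking the attributes of the record. The sketch below relies on the commonly documented layout: the 2-byte value at offset 0x14 of the record header points to the first attribute, each attribute starts with a 4-byte type followed by a 4-byte length, and the type 0xFFFFFFFF marks the end. The function name find_attribute is hypothetical, and the sketch ignores the update sequence (fixup) values, so it is for illustration rather than production use.

```python
import struct

ATTR_DATA = 0x80        # $DATA attribute type
ATTR_END = 0xFFFFFFFF   # end-of-attributes marker


def find_attribute(record, wanted_type=ATTR_DATA):
    """Return (offset, length) of the first attribute of wanted_type, or None."""
    # Offset 0x14 in the record header holds the offset of the first attribute.
    (offset,) = struct.unpack_from("<H", record, 0x14)
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == ATTR_END or attr_len == 0:
            return None
        if attr_type == wanted_type:
            return offset, attr_len
        offset += attr_len  # jump to the next attribute
    return None


# The four bytes seen in the hex view decode to the type value 0x80.
print(hex(struct.unpack("<I", bytes.fromhex("80000000"))[0]))  # 0x80
```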
Here is a closer look.
The attribute in fact contains three data runs. Here are all the data runs.
Let’s analyze each one of the runs individually.
The first data run has the hex value 21 02 20 01. The first byte, 0x21, is a header made of two nibbles. The right (low) nibble gives the number of bytes used to store the length of the run in clusters (1 = one byte, and that byte is 0x02), so the run is contiguous for 2 clusters. The left (high) nibble gives the number of bytes used to store the starting cluster (2 = two bytes, 0x20 0x01). These bytes are stored in little-endian order, so they are read as 0x0120, which is 288 in decimal. The first run therefore starts at cluster 288 and covers 2 contiguous clusters.
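The nibble split and the little-endian conversion described above can be reproduced in a few lines. The function name split_run_header is my own.

```python
def split_run_header(header):
    """Return (length_field_size, offset_field_size) from a data run header byte."""
    length_size = header & 0x0F   # right (low) nibble
    offset_size = header >> 4     # left (high) nibble
    return length_size, offset_size


run = bytes.fromhex("21022001")
length_size, offset_size = split_run_header(run[0])
length = int.from_bytes(run[1:1 + length_size], "little")
start = int.from_bytes(run[1 + length_size:1 + length_size + offset_size], "little")
print(length_size, offset_size, length, start)  # 1 2 2 288
```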
The second data run has the hex value 11 02 04. The right nibble of the header byte 0x11 indicates that one byte stores the run length (that byte is 0x02), so the file is again contiguous for 2 clusters. The left nibble of 0x11 indicates that one byte (0x04) stores the cluster offset. A single byte reads the same in either byte order, so the value is simply 4. In every run after the first, this value is a signed offset relative to the starting cluster of the previous run, not an absolute cluster number. The file’s data therefore continues at 288 + 4 = cluster 292 for 2 contiguous clusters. The third run is decoded the same way: its offset is measured from the start of the second run (cluster 292) and leads to cluster 296 for the final 2 clusters.
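Putting the rules together, a whole run list can be decoded in one pass. This is a minimal sketch under stated assumptions: the function name decode_data_runs is hypothetical, offsets are treated as signed and relative to the previous run's start, and a run with no offset field is reported as sparse. The first two runs in the example are the bytes discussed above; the bytes for the third run (11 02 04) and the 00 terminator are my inference from the observed cluster layout, not values quoted in the text.

```python
def decode_data_runs(raw):
    """Decode NTFS data run bytes into a list of (start_cluster, length) tuples."""
    runs = []
    pos = 0
    previous_start = 0
    while pos < len(raw) and raw[pos] != 0x00:  # a 0x00 header ends the list
        header = raw[pos]
        length_size = header & 0x0F
        offset_size = header >> 4
        pos += 1
        length = int.from_bytes(raw[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            # No offset field: treat as a sparse run with no clusters on disk.
            runs.append((None, length))
            continue
        # Offsets are signed so a later run can sit before the previous one.
        delta = int.from_bytes(raw[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        previous_start += delta
        runs.append((previous_start, length))
    return runs


# Third run and terminator are assumed from the cluster layout.
run_list = bytes.fromhex("21022001" "110204" "110204" "00")
print(decode_data_runs(run_list))  # [(288, 2), (292, 2), (296, 2)]
```

Comparing this output against the clusters seen in the hex view is the same validation that was performed by hand in this test.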
The data belonging to TEST1.txt was written to clusters 288, 289, 292, 293, 296 and 297. The file’s data was written to a total of six clusters that were all accounted for by the file’s Master File Table record.
If this test helped you understand the data runs of a fragmented file, and you were able to use it in the course of your investigation, we would like to hear from you. Please post your comments or email the author of this article at firstname.lastname@example.org.