Examining Data Runs of a Fragmented File in NTFS

Jun 03, 2012




While examining an acquired image of a flash drive in a recent case, I came across the need to manually recover a fragmented file from an NTFS-formatted volume. I needed to perform this process manually for two reasons. First, I needed to validate my software and be confident that it was in fact producing correct results. Second, I wanted to replicate the process by hand so that I could develop a deeper understanding of how a fragmented file is tracked by the Master File Table (MFT).

The Goal

The plan is to recreate the steps that lead to a file becoming fragmented in an NTFS volume. Once we have successfully written a fragmented file to our test media, we will look at its MFT record to examine the data runs contained in the data attribute.

The Test

To conduct our test we will be using a 256MB flash drive. Since we are going to be adding data to this media and then examining it with a hex viewer, the first thing we need to do is sterilize it. Sterilizing a drive is the process of writing a known hex value to every sector of the media so that any data that previously resided on it is overwritten. For the purposes of this article, I used Active Kill Disk, which is a light, powerful and free utility. While the media was being sterilized, I proceeded to the next step.

I navigated to the Desktop of my Windows 7 computer and created a folder named “Test”.


Inside this folder, I created three txt files. These are the files that we will be copying to our test media. The files are named TEST1.txt, TEST2.txt, and TEST3.txt. Each of these files contains 1000 bytes of data. TEST1.txt has 1000 number ones (1). Yes, one thousand of them, one after the other. TEST2.txt has 1000 number twos (2), and TEST3.txt has 1000 number threes (3).
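The three files can also be generated with a script rather than typed by hand. The following is a minimal sketch, assuming Python 3.9 or later; the function name `make_test_files` is hypothetical and is not part of the original procedure.

```python
from pathlib import Path


def make_test_files(folder: Path, size: int = 1000) -> list[Path]:
    """Create TEST1.txt, TEST2.txt and TEST3.txt in `folder`.

    Each file holds `size` copies of its own digit as ASCII text,
    with no trailing newline, so the file size is exactly `size` bytes.
    """
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for digit in ("1", "2", "3"):
        path = folder / f"TEST{digit}.txt"
        path.write_bytes(digit.encode("ascii") * size)
        created.append(path)
    return created
```

Calling `make_test_files(Path("Test"))` would create the folder and the three 1000-byte files in the current directory.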


We will copy these files into the media in a specific order. The numbers in the files will aid us in identifying the files when looking at the media through the hex viewer.


Now that the media is sterilized, let’s format it. I pulled the media from the computer and reinserted it into a USB port. Within a second or two, Windows 7 asked me to format the media.


Since this is a test to be conducted on the NTFS file system, I formatted the drive to NTFS. I chose an allocation unit size (cluster size) of 512 bytes, so that the bytes per sector and cluster size would be the same, 512 bytes.


My computer successfully formatted the drive without errors. The operating system assigned it drive letter G. I right-clicked on the media and looked at the properties.


Now that the drive is formatted, we will begin to write data to the media. Copy the TEST1.txt file from the Test folder and paste it onto the media. Next, copy and paste the TEST2.txt file onto the media.


At this point there should only be two files on the test media. Here is where it gets interesting. In order to fragment TEST1.txt we are going to add another 1000 ones to the file. Adding another 1000 bytes of data into the file will double its size from 1000 bytes to 2000 bytes. Open the TEST1.txt file in Notepad and add another 1000 ones onto the file, save it and close it.

This is what it should look like.


SIDE NOTE: Notice that Windows now reports the file size as 1.95 KB. Even though I know that there are exactly 2000 bytes of data in the file, Windows reports 1.95 KB rather than a rounded 2 KB. Windows is right: it counts 1024 bytes in a kilobyte (KB). Of the 2000 bytes in the file, the first 1024 make up 1.0 KB. The remaining 976 bytes divided by 1024 come to 0.953125. Windows adds this to the 1.0 KB and displays the result as 1.95 KB.
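The arithmetic in the side note can be checked in a couple of lines. This is only a sketch of the calculation; the helper name `to_kb` is hypothetical, and it makes no claim about how Windows formats the number internally.

```python
def to_kb(size_bytes: int) -> float:
    """Convert a byte count to kilobytes, using 1024 bytes per KB."""
    return size_bytes / 1024


size = 2000
whole_kb, remainder = divmod(size, 1024)   # 1 whole KB, 976 bytes left over
fraction = remainder / 1024                # 976 / 1024 = 0.953125
print(whole_kb, remainder, fraction)       # 1 976 0.953125
print(f"{to_kb(size):.2f} KB")             # 1.95 KB
```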

Let’s continue with the test. Copy the TEST3.txt file from the Test folder and paste it onto the test media. Then go back to the TEST1.txt file on the media and add another 1000 ones to it. TEST1.txt should now have 3000 bytes of data.

This is what it should look like.




Our test media is now complete and ready for examination. Let’s look at the media down at the hex level. For the purposes of this article, I used a demo version of WinHex 16.3.

After firing up WinHex and opening our test media as a physical device, I saw that WinHex reports that TEST1.txt starts in cluster 288 of the media, which is also sector 288 because each cluster is a single sector. TEST2.txt starts in cluster 290 (sector 290), and TEST3.txt starts in cluster 294 (sector 294). Notice that WinHex recognizes that TEST1.txt is three times larger than TEST2.txt and TEST3.txt.
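To jump to a cluster in a hex viewer by byte offset, the cluster number has to be converted. The sketch below assumes the values used in this test (512 bytes per sector, one sector per cluster) and assumes the volume starts at sector 0 of the device, as the matching cluster and sector numbers suggest. The `volume_start_sector` parameter is a hypothetical addition for media where the volume begins later.

```python
BYTES_PER_SECTOR = 512      # from the format step in this test
SECTORS_PER_CLUSTER = 1     # 512-byte allocation unit size


def cluster_to_offset(cluster: int, volume_start_sector: int = 0) -> int:
    """Return the byte offset of a cluster from the start of the device."""
    sector = volume_start_sector + cluster * SECTORS_PER_CLUSTER
    return sector * BYTES_PER_SECTOR


for name, cluster in (("TEST1.txt", 288), ("TEST2.txt", 290), ("TEST3.txt", 294)):
    offset = cluster_to_offset(cluster)
    print(f"{name}: cluster {cluster} -> byte offset {offset} (0x{offset:X})")
```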



Let’s go to each one of the clusters and see what we find.

Cluster 288: First cluster of TEST1.txt


Cluster 289: Second cluster of TEST1.txt


Cluster 290: First cluster of TEST2.txt


Cluster 291: Second cluster of TEST2.txt


Cluster 292: Third cluster of TEST1.txt (Fragmented File)


Cluster 293: Fourth cluster of TEST1.txt (Fragmented File, continuation)


Cluster 294: First cluster of TEST3.txt


Cluster 295: Second cluster of TEST3.txt


Cluster 296: Fifth cluster of TEST1.txt (Fragmented File, continuation)


Cluster 297: Sixth cluster of TEST1.txt (Fragmented File, Final)



TEST2.txt and TEST3.txt are each 1000 bytes in length, and each occupies two clusters on the media. TEST1.txt is 3000 bytes and occupies six clusters. When we first wrote TEST1.txt to the media it was only 1000 bytes long and occupied two clusters, 288 and 289. TEST2.txt was then written immediately after it, in clusters 290 and 291.

We then went back to TEST1.txt and added 1000 bytes, doubling its length. Because TEST2.txt occupied the clusters immediately after TEST1.txt, the file system had no choice but to write the additional data to the next available clusters, 292 and 293. This caused TEST1.txt to become fragmented. A fragmented file is a file whose data is written to the disk in non-contiguous clusters, hence the term.

We then added TEST3.txt, which took clusters 294 and 295, and again went back to TEST1.txt and added another 1000 bytes, which landed in clusters 296 and 297. Because the data in TEST1.txt was written to three different areas of the media, its MFT record should contain three data runs in its data attribute. Let’s look at the MFT record’s data attribute.
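The sequence of writes can be replayed with a toy model. This is a deliberately simplified sketch, not how NTFS actually allocates space: it assumes the file system always hands out the next free cluster in order, starting at cluster 288 as observed in this test. The names `clusters_needed` and `simulate` are hypothetical.

```python
import math

CLUSTER_SIZE = 512  # bytes, as chosen when formatting


def clusters_needed(size_bytes: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Number of whole clusters required to hold size_bytes."""
    return math.ceil(size_bytes / cluster_size)


def simulate(events, first_free: int = 288) -> dict:
    """Replay (file name, new total size) events with a next-free-cluster pointer."""
    layout = {}
    next_free = first_free
    for name, new_size in events:
        owned = layout.setdefault(name, [])
        extra = clusters_needed(new_size) - len(owned)
        for _ in range(max(extra, 0)):
            owned.append(next_free)   # grow the file into the next free cluster
            next_free += 1
    return layout


events = [
    ("TEST1.txt", 1000),  # first copy
    ("TEST2.txt", 1000),
    ("TEST1.txt", 2000),  # first append
    ("TEST3.txt", 1000),
    ("TEST1.txt", 3000),  # second append
]
for name, clusters in simulate(events).items():
    print(name, clusters)
```

Under these assumptions the model places TEST1.txt in clusters 288, 289, 292, 293, 296 and 297, which matches what the hex viewer showed.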

Below is TEST1.txt’s MFT record. Highlighted in blue is the record’s data attribute. Notice the attribute type identifier of 0x80, which is stored on disk in little endian as the bytes 80 00 00 00.
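The four bytes at the start of the attribute are a 32-bit little-endian integer. A short sketch of reading them follows; the function name `attribute_type` is hypothetical, and only the first four bytes of the attribute header are interpreted here.

```python
import struct


def attribute_type(header: bytes) -> int:
    """Return the attribute type code from the first 4 bytes of an attribute."""
    (type_code,) = struct.unpack_from("<I", header, 0)  # "<I" = little-endian uint32
    return type_code


raw = bytes.fromhex("80 00 00 00")   # bytes as they appear in the hex viewer
print(hex(attribute_type(raw)))      # 0x80
```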


Here is a closer look.

Data Attrib

The attribute in fact contains three data runs. Here are all the data runs.

data runs all

Let’s analyze each one of the runs individually.

First data run

data run 1

The first data run has a value in hex of 0x21 02 20 01. The first byte, 0x21, is a header made up of two nibbles. The right nibble (1) indicates how many bytes are used to store the number of contiguous clusters in the run; here that is one byte, 0x02, which tells us that the run is 2 clusters long. The left nibble (2) indicates how many bytes are used to store the starting cluster of the run; here those are the last two bytes, 0x20 01. These bytes are stored in little endian, so they are read as 0x0120, which is 288 in decimal. The first run therefore covers 2 contiguous clusters starting at cluster 288.
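The same decoding can be written out step by step. This is a minimal sketch for a single run; the function name `decode_first_run` is hypothetical.

```python
def decode_first_run(run: bytes) -> tuple[int, int]:
    """Decode one data run and return (cluster count, starting cluster)."""
    header = run[0]
    length_size = header & 0x0F   # right nibble: bytes holding the cluster count
    offset_size = header >> 4     # left nibble: bytes holding the starting cluster
    length_bytes = run[1:1 + length_size]
    offset_bytes = run[1 + length_size:1 + length_size + offset_size]
    count = int.from_bytes(length_bytes, "little")
    start = int.from_bytes(offset_bytes, "little", signed=True)
    return count, start


count, start = decode_first_run(bytes.fromhex("21 02 20 01"))
print(count, start)   # 2 288
```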

Second data run

data run 2

The second data run has a value in hex of 0x11 02 04. The right nibble of the header byte 0x11 indicates that one byte is used to store the number of contiguous clusters in the run; that byte is 0x02, so the run is 2 clusters long. The left nibble indicates that one byte, 0x04, is used to store the location of the run. For every run after the first, this value is not an absolute cluster number but a signed offset, in clusters, from the starting cluster of the previous run. A single byte reads the same in either byte order, so the offset is 0x04, or 4 in decimal. The file’s data therefore continues 4 clusters after cluster 288, at cluster 292, for 2 contiguous clusters.
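Because the location field of a later run is relative, and is interpreted as a signed number, the next starting cluster is computed by addition. In the sketch below the function name `next_start` is hypothetical, and the 0xFC example is an illustrative value that does not come from the test media.

```python
def next_start(previous_start: int, offset_bytes: bytes) -> int:
    """Apply a run's signed little-endian offset to the previous run's start."""
    delta = int.from_bytes(offset_bytes, "little", signed=True)
    return previous_start + delta


print(next_start(288, bytes.fromhex("04")))   # 292, the second run in this test
print(next_start(292, bytes.fromhex("FC")))   # 288, illustrative: 0xFC is -4
```

A negative offset would simply mean that the next fragment sits earlier on the volume than the previous one.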

Third data run

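By the same logic, the third data run must describe 2 contiguous clusters starting at cluster 296, which is an offset of 4 clusters from the start of the previous run at cluster 292. Encoded the same way as the second run, that corresponds to 0x11 02 04.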


The data belonging to TEST1.txt was written to clusters 288, 289, 292, 293, 296 and 297. The file’s data was written to a total of six clusters that were all accounted for by the file’s Master File Table record.
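Putting the pieces together, the whole list of runs can be decoded in a loop and then used to carve the file out of an image. This is a sketch under stated assumptions: the first two runs are the byte values read above, while the third run (11 02 04) and the 0x00 terminator are inferred from the cluster layout, not read from the screenshots. The function names are hypothetical, sparse runs (a zero-length offset field) are not handled, and `image` is assumed to be a file object opened on a raw image whose volume starts at `volume_offset` bytes.

```python
def decode_runlist(data: bytes) -> list[tuple[int, int]]:
    """Decode data runs into (starting cluster, cluster count) pairs."""
    runs = []
    pos = 0
    lcn = 0                                   # running absolute cluster number
    while pos < len(data) and data[pos] != 0x00:
        header = data[pos]
        length_size = header & 0x0F           # right nibble
        offset_size = header >> 4             # left nibble
        pos += 1
        count = int.from_bytes(data[pos:pos + length_size], "little")
        pos += length_size
        delta = int.from_bytes(data[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        lcn += delta                          # offsets are relative to the previous run
        runs.append((lcn, count))
    return runs


def read_runs(image, runs, cluster_size: int, file_size: int,
              volume_offset: int = 0) -> bytes:
    """Concatenate the clusters of each run and trim to the logical file size."""
    out = bytearray()
    for start, count in runs:
        image.seek(volume_offset + start * cluster_size)
        out += image.read(count * cluster_size)
    return bytes(out[:file_size])


# Third run and terminator are assumptions inferred from clusters 296-297.
runlist = bytes.fromhex("21 02 20 01 11 02 04 11 02 04 00")
runs = decode_runlist(runlist)
print(runs)   # [(288, 2), (292, 2), (296, 2)]
```

With an image opened in binary mode, `read_runs(image, runs, 512, 3000)` would return the 3000 bytes of TEST1.txt in order, which can then be compared against the output of a forensic tool.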

Get the E01 used for this article here.

If this test helped you understand the data runs of a fragmented file, and you were able to use it in the course of your investigation, we would like to hear from you. Please post your comments or email the author of this article at carlos@epyxforensics.com.

Post by Pete McGovern
