Chapter – 14
Programming for “Raw File” Recovery
Raw File Recovery
Many file types have a specific sequence or combination of characters written at the beginning and at the end of the file. We can analyze these sequences easily with any disk editing program. We can also use the EDIT command of DOS to study the structure of a file in ASCII format.
The specific sequence of characters at the beginning of the file is usually called the header, and the sequence stored at the end of the file is called the footer.
If we have lost our data in a disk crash that leaves no FAT or root directory information to recover it from, we can use headers and footers to search for these specific file types. The header indicates where a file of that particular type starts, and the footer indicates where it ends.
Because we use the raw structure of a particular file type to recover the data, the technique is called Raw File Recovery. The surface of the disk is searched sector by sector to find the header and footer information.
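The core operation behind this search is finding a known byte sequence inside a block of data. A minimal sketch in portable C follows; the function name find_signature and its parameters are illustrative assumptions, not part of the recovery program given later in this chapter.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of sig (sig_len bytes)
   inside buf (buf_len bytes), or -1 if it does not occur.
   The name and signature are hypothetical, chosen for illustration. */
long find_signature(const unsigned char *buf, size_t buf_len,
                    const unsigned char *sig, size_t sig_len)
{
    size_t i;

    if (sig_len == 0 || sig_len > buf_len)
        return -1;

    /* Stop early enough that the comparison never reads past buf. */
    for (i = 0; i + sig_len <= buf_len; i++) {
        if (memcmp(buf + i, sig, sig_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Calling it on a 512-byte sector buffer with a header or footer pattern tells you whether, and where, that pattern occurs in the sector.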
Although Raw File Recovery has a wide area of application, there are some specific cases of recovery where it helps a lot. For example, suppose you mistakenly ran a data wiping program on a disk that held important files, and by the time you stopped the program, all the information of the MBR, DBR, FAT and root directory, including the operating system files, had been wiped out.
In such a case even format recovery programs may not help you recover the data. Here you can use Raw File Recovery to recover files of those specific types by searching for their headers and footers.
You can even recover data from a hard disk on which you have deleted all the logical partitions, recreated partitions of different sizes than before, and even installed the operating system.
Now you remember that the disk held some important data before you partitioned and formatted it. If you have only just installed the operating system, there is a good chance that the files can be recovered.
The factors that affect the performance of Raw File Recovery are fragmentation of the data and the amount of data that has been overwritten by other data. You may well find further areas of application for raw file recovery yourself.
A raw file recovery program searches for files according to the following rules:
- Search the sectors of the disk for the header of one file type, or of several file types simultaneously.
- If the header of any file type is found, start saving the data to a file, and close and save the file as soon as any one of the following four conditions is met:
  - The footer of that file type is found
  - Another header of the same file type is found
  - The header of another file type is found
  - No other header or footer of the file types defined in the program is found, and the size of the file in which you are storing the data reaches the maximum file size defined in your program.
The saved file should include the data of the sectors in which the header and the footer of the file type were found.
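The close conditions listed above can be sketched as a single function that, given the sector where a header was found, returns the last sector belonging to that file. This is only a sketch under stated assumptions: the disk is represented as an in-memory image, and the names file_type, sector_contains and find_end_sector are hypothetical. It checks for a new header before the footer, which is a design choice of this sketch rather than something the rules prescribe.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical description of one file type: its header and footer bytes. */
struct file_type {
    const unsigned char *header; size_t header_len;
    const unsigned char *footer; size_t footer_len;
};

/* Returns 1 if pat occurs anywhere inside one sector, else 0. */
static int sector_contains(const unsigned char *sector,
                           const unsigned char *pat, size_t len)
{
    size_t i;
    if (len == 0 || len > SECTOR_SIZE)
        return 0;
    for (i = 0; i + len <= SECTOR_SIZE; i++)
        if (memcmp(sector + i, pat, len) == 0)
            return 1;
    return 0;
}

/* image holds total_sectors sectors; a header of types[current] was found
   at sector start (start < total_sectors). Returns the last sector of the file. */
unsigned long find_end_sector(const unsigned char *image,
                              unsigned long total_sectors,
                              unsigned long start,
                              const struct file_type *types, size_t ntypes,
                              size_t current, unsigned long max_sectors)
{
    unsigned long s;
    size_t t;

    for (s = start + 1; s < total_sectors; s++) {
        const unsigned char *sec = image + (size_t)s * SECTOR_SIZE;

        /* A header of the same or of another type starts a new file,
           so the current file ended one sector earlier. */
        for (t = 0; t < ntypes; t++)
            if (memcmp(sec, types[t].header, types[t].header_len) == 0)
                return s - 1;

        /* The footer of the current type closes the file in this sector. */
        if (sector_contains(sec, types[current].footer,
                            types[current].footer_len))
            return s;

        /* Nothing found: give up once the size limit is reached. */
        if (s - start + 1 >= max_sectors)
            return s;
    }
    return total_sectors - 1;   /* ran off the end of the disk */
}
```

The returned sector number is inclusive, so the sectors from start to the returned value are the ones to copy into the recovered file.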
Headers and footers of some important file types
The headers and footers of some important file types are given in the following table. The footers given in the table either lie at the very end of a file of the specified type or at offsets close enough to the end that you can use them as footers to recover the data.
You can also find headers and footers for file types other than these yourself, by using the EDIT command of DOS or any disk editing tool. The values are given in hexadecimal to make them easy to read.
Extension | Header (Hex) | Footer (Hex)
DOC | D0 CF 11 E0 A1 B1 1A E1 | 57 6F 72 64 2E 44 6F 63 75 6D 65 6E 74 2E
XLS | D0 CF 11 E0 A1 B1 1A E1 | FE FF FF FF 00 00 00 00 00 00 00 00 57 00 6F 00 72 00 6B 00 62 00 6F 00 6F 00 6B 00
PPT | D0 CF 11 E0 A1 B1 1A E1 | 50 00 6F 00 77 00 65 00 72 00 50 00 6F 00 69 00 6E 00 74 00 20 00 44 00 6F 00 63 00 75 00 6D 00 65 00 6E 00 74
ZIP | 50 4B 03 04 14 | 50 4B 05 06 00
JPG | FF D8 FF E0 00 10 4A 46 49 46 00 01 01 | D9 (“Better To Use File size Check”)
GIF | 47 49 46 38 39 61 4E 01 53 00 C4 | 21 00 00 3B 00
PDF | 25 50 44 46 2D 31 2E | 25 25 45 4F 46
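In a program, a table like this is naturally stored as an array of signatures that is checked against the start of each sector. The sketch below is an illustration, not the author's code: the names file_signature, SIGNATURES and identify_sector are hypothetical, and it deliberately uses only the leading bytes of the ZIP, JPG and GIF headers, on the assumption that the bytes after them (for example the image dimensions that follow "GIF89a") differ from file to file. Because DOC, XLS and PPT share the same header, the header alone cannot tell them apart.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one entry of the header table. */
struct file_signature {
    const char *name;          /* file type(s) the header belongs to */
    unsigned char header[8];   /* leading header bytes */
    size_t header_len;         /* number of bytes used in header[] */
};

static const struct file_signature SIGNATURES[] = {
    { "DOC/XLS/PPT", {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}, 8 },
    { "ZIP",         {0x50, 0x4B, 0x03, 0x04},                         4 },
    { "JPG",         {0xFF, 0xD8, 0xFF},                               3 },
    { "GIF",         {0x47, 0x49, 0x46, 0x38, 0x39, 0x61},             6 },
    { "PDF",         {0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E},       7 }
};

/* Returns the name of the first signature that matches the start of the
   sector, or NULL if the sector does not begin with a known header. */
const char *identify_sector(const unsigned char *sector, size_t len)
{
    size_t t;
    size_t count = sizeof SIGNATURES / sizeof SIGNATURES[0];

    for (t = 0; t < count; t++) {
        if (SIGNATURES[t].header_len <= len &&
            memcmp(sector, SIGNATURES[t].header,
                   SIGNATURES[t].header_len) == 0)
            return SIGNATURES[t].name;
    }
    return NULL;
}
```

Searching several file types simultaneously then costs one pass over the table per sector instead of one pass over the disk per file type.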
Writing a program for Raw File Recovery
The code of a Raw File Recovery program for Microsoft Word files (.DOC extension) is given next. The program searches the sectors of the disk for the files and saves each recovered file under an automatically generated name.
The path specified by the user is used as the destination for the recovered data. If the destination directory does not exist, the program can create it, up to one directory level.
The recovery program given here also supports large disks. It has been written to search for data on the second physical hard disk.
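/* Raw File Recovery Program to Recover the Microsoft Word Files */
#include <stdio.h>
#include <stdlib.h>   /* exit, itoa */
#include <string.h>   /* memcmp, strcpy, strcat */
#include <conio.h>    /* clrscr, gotoxy, cprintf, getch, kbhit */
#include <dos.h>      /* int86, int86x, peekb, FP_OFF, FP_SEG, MK_FP */
#include <dir.h>      /* mkdir */
#include <io.h>       /* access */
/* Prototypes of the helper functions defined later in this chapter */
int readabsolutesectors ( int drive, unsigned long sectornumber, int numofsectors, void *buffer );
int Recover_the_file ( void );
int show_hide_cursor ( int ssl, int esl );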
/* Structure to be used by getdrivegeometry function using INT 13H Extension, Function Number 0x48. */
struct geometry
{
    unsigned int size ;        /* (call) size of Buffer */
    unsigned int flags ;       /* Information Flags */
    unsigned long cyl ;        /* Number of Physical Cylinders on Drive */
    unsigned long heads ;      /* Number of Physical Heads on Drive */
    unsigned long spt ;        /* Number of Physical Sectors Per Track */
    unsigned long sectors[2] ; /* Total Number of Sectors on Drive */
    unsigned int bps ;         /* Bytes Per Sector */
} ;
/* Structure of Disk Address packet format, to be used by the readabsolutesectors Function */
struct diskaddrpacket
{
    char packetsize ;              /* Size of Packet, generally 10H */
    char reserved ;                /* Reserved (0) */
    int blockcount ;               /* Number of Blocks to Transfer */
    char far *bufferaddress ;      /* Address of Transfer Buffer */
    unsigned long blocknumber[2] ; /* Starting Absolute Block Number */
} ;
/* Function to get Drive Parameters */
unsigned long getdrivegeometry (int drive)
{
    union REGS i, o ;
    struct SREGS s ;
    struct geometry g = { 26, 0, 0, 0, 0, 0, 0, 0 } ;
    segread ( &s ) ;   /* start from the current segment register values */
    i.h.ah = 0x48 ;    /* Function Number 0x48 */
    i.h.dl = drive;    /* Drive Number */
    i.x.si = FP_OFF ( (void far*)&g ) ;
    s.ds = FP_SEG ( (void far*)&g ) ;
    /* Invoke the specified function number of INT 13H extension with Segment Register Values */
    int86x ( 0x13, &i, &o, &s ) ;
    /* If get drive Geometry function Fails (carry flag set or no geometry returned), Display Error Message and Exit */
    if(o.x.cflag || g.spt==0)
    {
        printf("\n Get Drive Geometry Function Fails....");
        printf("\n Extensions Not Supported, Press any Key to Exit...");
        getch();
        exit(1);
    }
    printf("\n Head = %lu, Sectors Per Track = %lu, Cylinder = %lu\n",
           g.heads,g.spt,g.cyl);
    return g.sectors[0]; /* Return The Number of Sectors on Drive (low 32 bits) */
}
unsigned long file_size=0, i=0;
unsigned long start_file=0, end_file=0;
unsigned long Sectors_in_HDD2=0, loop=0;
char buffer[512], filename[80], temp[8];
char path[80];
unsigned int result,num=0;
/* Header of Microsoft Word Files (8 bytes) */
unsigned char header[8] = {0xD0,0xCF,0x11,0xE0, 0xA1,0xB1,0x1A,0xE1};
/* Footer of Microsoft Word Files: the 13 ASCII characters "Word.Document" */
unsigned char DOC_footer[13] =
    {0x57,0x6F,0x72,0x64, 0x2E,0x44,0x6F,0x63,
     0x75,0x6D,0x65,0x6E,0x74};
/* Start Of main */
void main()
{
    clrscr();
    /* If total no. of hard disks attached is less than two, Display Error Message and Exit. */
    if(((char)peekb(0x0040, 0x0075))<2)
    {
        printf("\n\n You Must Have At least Two Hard Disks Attached to your Computer To Run This");
        printf("\n Program. This Program has been developed to recover the Data of Second Hard Disk.");
        printf("\n Press any Key to Exit... ");
        getch();
        exit(1);
    }
    Sectors_in_HDD2=getdrivegeometry (0x81);
    printf("\n Total Sectors in second Hard Disk = %lu", Sectors_in_HDD2);
    printf("\n\n \"You must save the recovered files in another Hard Disk, Not in the Same Disk,");
    printf("\n in which you are searching the lost data.\"");
    printf("\n\n Enter The Destination Path to save the Recovered Files...\n ");
    /* Read at most 63 characters, so that the prefix, the file number and the extension still fit in filename[80] */
    if(fgets(path, 64, stdin)==NULL)
        exit(1);
    path[strcspn(path, "\r\n")] = '\0';   /* remove the trailing newline */
    /* check if destination directory exists or Not */
    if(access(path, 0) != 0)
    {
        /* if Destination directory does not exist, create the Directory up to one level */
        if(mkdir(path)!=0)
        {
            printf("\n Could Not Create Directory \"%s\"", path);
            printf("\n Check Path..., Press any key to exit...");
            getch();
            exit(1);
        }
    }
    strcat(path,"\\Ptt");
    /* Function to Hide (and show) Cursor on the screen */
    show_hide_cursor ( 32, 0 );
    gotoxy(15,18);cprintf("[ %u ] Files Recovered...", num);
    /* search for the data until the ending sector of the disk */
    while(loop<Sectors_in_HDD2)
    {
        /* Read one Sector (Sector No. = loop) */
        readabsolutesectors ( 0x81, loop, 1, buffer );
        gotoxy(19,16);cprintf("Scanning Sector Number = %lu", loop);
        if(kbhit())
        {
            show_hide_cursor ( 6, 7 );   /* Retrieve the cursor before exiting the program */
            exit(0);
        }
        /* if specified header is found */
        if((memcmp ( buffer, header,7))==0)
        {
            int k, footer_found;   /* local index and flag for the footer search */
            /* logic to provide the file name to automatically create the files to save the recovered data */
            strcpy(filename, path);
            itoa(num,temp,10);
            strcat(filename, temp);
            strcat(filename,".DOC");
            start_file=loop;   /* starting sector of file */
            gotoxy(5,19);cprintf("File Found..., Saving As %s", filename);
            num++;
            /* File Close Conditions */
            file_size=0;
            while( file_size<5000000)
            {
                loop++;
                file_size+=512;
                /* if the end of the disk is reached, close the file at the last sector */
                if(loop>=Sectors_in_HDD2)
                {
                    end_file=Sectors_in_HDD2-1;
                    Recover_the_file();
                    break;
                }
                readabsolutesectors ( 0x81, loop, 1, buffer );
                gotoxy(19,16);cprintf("Scanning Sector Number = %lu", loop);
                /* if file size reaches up to maximum size of 5MB */
                if(file_size>=5000000)
                {
                    end_file=loop;       /* Ending Sector of File */
                    Recover_the_file();  /* write the data to file */
                    break;
                }
                /* if footer of DOC file is found (the search stops 12 bytes before the end of the sector so that memcmp never reads past the buffer) */
                footer_found=0;
                for(k=0;k<=512-12;k++)
                {
                    if( memcmp(buffer+k,DOC_footer,12)==0 )
                    {
                        footer_found=1;
                        break;
                    }
                }
                if(footer_found)
                {
                    end_file=loop;       /* Ending Sector of File */
                    Recover_the_file();  /* write the data to file */
                    break;               /* leave the file close loop, not only the footer search */
                }
                /* if another header is found */
                if( memcmp(buffer,header,7)==0 )
                {
                    loop=loop-1;
                    end_file=loop;       /* Ending Sector of File */
                    Recover_the_file();  /* write the data to file */
                    break;
                }
                if(kbhit())
                {
                    show_hide_cursor ( 6, 7 );
                    exit(0);
                }
            }
        }
        loop++;
    }
    /* While Loop Ends Here */
    /* display message for completion of search and recovery */
    if(loop>=Sectors_in_HDD2 )
    {
        gotoxy(17,23);cprintf("The Saving of files in the Disk is Completed !!");
        gotoxy(17,24);cprintf("Press Any Key to Exit...");
        show_hide_cursor ( 6, 7 );
        getch();
    }
}
The structure geometry is used by the getdrivegeometry function, which calls INT 13H Extensions, Function 48H, to get the various parameters of the disk.
The structure diskaddrpacket holds the Disk Address Packet used by the readabsolutesectors function.
The function getdrivegeometry (int drive) gets the drive parameters of the disk specified by the physical drive number drive.
(char) peekb(0x0040, 0x0075) reads the number of hard disks connected to the computer, which is stored at the memory location segment 0040H, offset 0075H. If fewer than two hard disks are attached, the program displays an error message and exits.
Sectors_in_HDD2=getdrivegeometry (0x81); gets the parameters of the second physical hard disk (0x81) and returns the total number of sectors on the disk.
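The sectors field of the geometry structure holds the sector count as two 32-bit halves, and the program works with the first (low) half only. With 512-byte sectors, 32 bits are enough for disks of up to 2 TiB. The sketch below shows how the two halves combine into a full count and a capacity in bytes; it is written in portable C with fixed-width types, and the helper names total_sectors and capacity_bytes are assumptions made for illustration.

```c
#include <stdint.h>

/* Combine the low and high 32-bit halves (sectors[0] and sectors[1]
   in the geometry structure) into one 64-bit sector count. */
uint64_t total_sectors(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Capacity of the drive in bytes, given the bytes-per-sector value (bps). */
uint64_t capacity_bytes(uint32_t low, uint32_t high, uint16_t bps)
{
    return total_sectors(low, high) * bps;
}
```

For example, a drive reporting 2048 sectors of 512 bytes has a capacity of 1,048,576 bytes.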
The statement if(access(path, 0) != 0) checks whether the path given by the user exists. If the destination directory does not exist, it is created up to one level; if mkdir fails, as tested by if(mkdir(path)!=0), the path is treated as invalid and an error message is displayed.
The names of the automatically created files start with the three characters PTT, which are appended to the path by strcat(path,"\\Ptt");. This is done to avoid duplicate file names in the destination directory. The recovered files are therefore named in the format “PTTxxxxx.DOC”.
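The same naming scheme can be written with a single bounds-checked formatting call, which avoids overrunning the file name buffer when the user enters a long path. This is a sketch in portable C99, not the Turbo C code of the program; the function name build_recovered_name is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "<dir>\PTT<num>.DOC" into out.
   Returns the length of the name, or -1 if it does not fit in out_size bytes. */
int build_recovered_name(char *out, size_t out_size,
                         const char *dir, unsigned int num)
{
    int n = snprintf(out, out_size, "%s\\PTT%u.DOC", dir, num);

    if (n < 0 || (size_t)n >= out_size)
        return -1;   /* formatting error or name truncated */
    return n;
}
```

A caller would skip the file, or ask for a shorter destination path, when the function returns -1.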
The function call show_hide_cursor ( 32, 0 ); hides the cursor, and show_hide_cursor ( 6, 7 ); brings it back to the screen.
The call readabsolutesectors (0x81, loop, 1, buffer); reads one sector of the second physical hard disk, specified by the sector number loop.
If the header of the file is found, start_file = loop; sets start_file to the starting sector number of the file to be recovered. The program then checks the following three conditions to find the ending sector of the file:
- If the file size reaches the maximum size of 5 MB
- If the footer of a DOC file is found
- If another header is found
As soon as any one of the three conditions is satisfied, end_file=loop; sets the long integer end_file to the ending sector number of the file. The data of the sectors from sector number start_file to sector number end_file is then saved to the file by the function Recover_the_file( ).
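Because the footer is searched for in one sector at a time, a footer that happens to be split across two consecutive sectors is not seen by such a search. One way to cover that case, sketched here in portable C under the assumption that the previous sector is still available, is to prepend the last few bytes of the previous sector to the current one before searching. The name footer_in_window and the limit MAX_SIG are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MAX_SIG 64   /* assumed upper bound on the footer length */

/* Returns 1 if footer (len bytes) lies inside cur, or starts in the tail
   of prev and continues into cur. prev may be NULL for the first sector. */
int footer_in_window(const unsigned char *prev, const unsigned char *cur,
                     const unsigned char *footer, size_t len)
{
    unsigned char window[MAX_SIG - 1 + SECTOR_SIZE];
    size_t carry, total, i;

    if (len == 0 || len > MAX_SIG)
        return 0;

    /* Carry over len - 1 bytes: enough for a footer that straddles the
       boundary, too few for one that lies wholly in the previous sector. */
    carry = (prev != NULL) ? len - 1 : 0;
    if (carry > 0)
        memcpy(window, prev + SECTOR_SIZE - carry, carry);
    memcpy(window + carry, cur, SECTOR_SIZE);
    total = carry + SECTOR_SIZE;

    for (i = 0; i + len <= total; i++)
        if (memcmp(window + i, footer, len) == 0)
            return 1;
    return 0;
}
```

The cost is one extra 512-byte copy per sector, which is small next to the time taken to read the sector from the disk.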
The code of the function Recover_the_file( ) is given next:
/* Function to save the data of the sectors starting from sector number start_file to sector number end_file */
int Recover_the_file(void)
{
    FILE *fp;
    if((fp=fopen(filename, "wb"))==NULL)
    {
        gotoxy(10,23);printf("Error Opening File %s", filename);
        getch();
        exit(1);
    }
    for(i=start_file;i<=end_file;i++)
    {
        gotoxy(19,16);cprintf("Scanning Sector Number = %lu", i);
        readabsolutesectors ( 0x81, i, 1, buffer );
        fwrite(buffer,512,1, fp);
    }
    fclose(fp);
    gotoxy(15,18);cprintf("[ %u ] Files Recovered...",num);
    gotoxy(5,19);clreol();   /* erase the "File Found..." message */
    return 0;
}
The code of the function readabsolutesectors is given next. The function uses INT 13H Extensions, Function 42H, to read the sectors.
For a detailed description of the function, refer to the chapter “Making Backups” earlier in this book. The code of the function is as follows:
/* Function to read absolute sector(s) */
int readabsolutesectors ( int drive,
                          unsigned long sectornumber,
                          int numofsectors,
                          void *buffer )
{
    union REGS i, o ;
    struct SREGS s ;
    struct diskaddrpacket pp ;
    pp.packetsize = 16 ;            /* packet size = 10H */
    pp.reserved = 0 ;               /* Reserved = 0 */
    pp.blockcount = numofsectors ;  /* Number of sectors to read */
    /* for Data buffer */
    pp.bufferaddress = (char far*) MK_FP ( FP_SEG((void far*)buffer), FP_OFF((void far*)buffer));
    pp.blocknumber[0] = sectornumber ;  /* Low 32 bits of the sector number to read */
    pp.blocknumber[1] = 0 ;             /* High 32 bits of the sector number */
    i.h.ah = 0x42 ;   /* Function Number */
    i.h.dl = drive ;  /* Physical Drive Number */
    /* ds:si points to the disk address packet */
    i.x.si = FP_OFF ( (void far*)&pp ) ;
    s.ds = FP_SEG ( (void far*)&pp ) ;
    /* Invoke the specified Function of INT 13H with segment register values */
    int86x ( 0x13, &i, &o, &s ) ;
    if ( o.x.cflag )
        return 0 ;  /* failure */
    else
        return 1 ;  /* success */
}
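If you want to try the search logic without touching a physical disk, the same interface can be imitated on top of a disk image file. The following sketch is an assumption-laden stand-in, not a replacement for the INT 13H version: it uses only standard C file functions, the name read_image_sectors is hypothetical, and it keeps the convention of returning 1 on success and 0 on failure.

```c
#include <stdio.h>
#include <limits.h>

#define SECTOR_SIZE 512

/* Reads count sectors starting at sector number lba from an image file
   opened in binary mode. Returns 1 on success and 0 on failure. */
int read_image_sectors(FILE *img, unsigned long lba,
                       unsigned int count, void *buffer)
{
    /* fseek takes a long offset, so reject sector numbers whose byte
       offset would not fit. */
    if (lba > (unsigned long)LONG_MAX / SECTOR_SIZE)
        return 0;
    if (fseek(img, (long)(lba * SECTOR_SIZE), SEEK_SET) != 0)
        return 0;
    return fread(buffer, SECTOR_SIZE, count, img) == count ? 1 : 0;
}
```

On platforms where long is 32 bits this limits the image to 2 GB, which is usually enough for testing the header and footer logic.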
The following function is used to hide or show the cursor on the screen. It uses Interrupt 10H, Function 01H, to set the cursor type. The code is as follows:
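int show_hide_cursor ( int ssl, int esl )
{
    union REGS i, o ;
    i.h.ah = 1 ;     /* Function 01H: Set Cursor Type */
    i.h.ch = ssl ;   /* starting line for cursor */
    i.h.cl = esl ;   /* ending line for cursor */
    i.h.bh = 0 ;
    int86 ( 0x10, &i, &o ) ;   /* INT 10H */
    return 0;
}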
show_hide_cursor( 32, 0 ) hides the cursor and show_hide_cursor( 6, 7 ) brings it back. ssl is the starting line of the cursor and esl is its ending line.
A brief description of Function 01H of INT 10H follows:
INT 10H (16 or 0x10)
Function 01H (or 0x01) --> Set Cursor Type
Call with: AH = 01H
CH bits 0-4 = starting line for cursor
CL bits 0-4 = ending line for cursor
Returns: Nothing.
Comments:
The function is used to set the cursor type by selecting the starting and ending lines for the blinking hardware cursor in text display mode. In the graphics modes, the hardware cursor is not available.
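The register layout described above can be illustrated by packing the two values into the 16-bit CX register by hand. The sketch below is portable C and does not call the BIOS. It rests on one assumption that goes beyond the description: the value 32 (20H) passed as the starting line sets bit 5 of CH, which VGA-compatible BIOSes commonly treat as "cursor off". The function name cursor_cx is hypothetical.

```c
/* Builds the CX value for INT 10H, Function 01H.
   CH (high byte): bits 0-4 = starting line, bit 5 = cursor hidden (assumed).
   CL (low byte):  bits 0-4 = ending line. */
unsigned int cursor_cx(unsigned int start_line, unsigned int end_line,
                       int hidden)
{
    unsigned int ch = start_line & 0x1F;
    unsigned int cl = end_line & 0x1F;

    if (hidden)
        ch |= 0x20;   /* same bit that the value 32 sets in CH */
    return (ch << 8) | cl;
}
```

With these definitions, cursor_cx(6, 7, 0) yields 0607H, the value used to restore the cursor, and cursor_cx(0, 0, 1) yields 2000H, the value used to hide it.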
Page Modified on: 17/01/2022