Wednesday, 30 January 2019

PE File Infector Basics

Welcome back! If this is your first visit to VeXation you may want to start by reading about the project or the development environment I'm using. In this post I'll describe some of my experience starting on a Windows 95 file infector virus.

The first objective of a file infector is to add its own code to another file. In my case the files will be executables. The second objective of a file infector is to make sure the newly added virus code is run in addition to the original executable code. If the virus code isn't run it can't propagate to new executables. If the original executable code isn't run then the virus broke the program it infected and will probably be detected before spreading very far.

To make things manageable I started with the first objective: adding code to another executable. I prefer to work in small chunks where possible so I chose to break the task up as follows:
  1. Finding new target executables.
  2. Deciding if a target is suitable for infection.
  3. Adding a new section to the target with the correct size and metadata.
  4. Writing the virus code into the new section of the target.

As I introduce each topic at a high level I'll share some snippets of my assembly code so far. Towards the end I'll share the full assembly source code along with some pointers, and show how I validated my work with some handy low level tools.

Generation 0

Initially it took me some time to wrap my head around "Generation 0" of a virus versus subsequent generations. I'm not certain if anyone else uses this "generation" terminology but it's what made sense to me.

Typically when you encounter a virus as an end user it's from running a benign program that was infected by a virus. Have you ever asked yourself how that program was infected? Chances are high that it was infected by another infected benign program. If you imagine tracing these infections backwards eventually there must have been a "Generation 1", the first benign executable infected by the virus.

How was the generation 1 program infected? The author of the virus must have conspired to do this using what I call "Generation 0" - a program built to bootstrap the infection process. Unlike "Generation n" there is no benign functionality in generation 0, it exists only to infect.

Having some terminology in mind was important because I found later on there were practical considerations to be made based on whether the code executing is generation 0 of the virus or a subsequent generation.

Finding target .EXEs

One of the classic problems of virus development is making sure that your creation doesn't escape the "lab" or destroy your development system. I imagine this was extra tricky before virtualization was easy. With the potential for disaster in mind I decided to start by only finding target executables to infect within the same directory as the generation 0 program. It isn't very difficult to recursively search other directories down the road.

This simple infection strategy also made development easier. For example I wrote my `Makefile`'s `run` target to copy a clean calc.exe from C:\Windows into the current directory before running the generation zero program. Everything is neatly contained in the working directory.

Finding target files requires using Win32 API functions. I found a copy of win32.hlp for Windows 95 that I use as my primary reference for the available Win32 APIs, their arguments and their return values. Pay particular attention to return values! Some API functions (e.g. FindNextFileA) return 0 for errors. Other API functions (e.g. FindFirstFileA) return something non-zero for errors (e.g. 0xFFFFFFFF for FindFirstFileA).

1995's API docs aren't so bad after all

To keep things simple I have been limiting my code to ASCII strings, which means using the "A" variant of some Win32 APIs (for ANSI, i.e. single-byte character strings) vs the "W" variant (for wide chars). Remember to drop the "A" suffix when looking up documentation (e.g. search for FindFirstFile in the win32.hlp index, not FindFirstFileA). Assembly programmers have to care about "A" vs "W", where normally the Windows C headers hide this distinction from programmers with compile time macro magic.

Finding files in the current directory requires a combination of FindFirstFileA and FindNextFileA. The first is used to start a directory traversal and the second is used to continue it. By providing a pointer to the null terminated string "*.exe" as the lpFileName I'm able to start a traversal of all executables (if any!) in the current directory.
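As a loose analogy for that first/next traversal, here is a minimal Python sketch of a single-directory, non-recursive "*.exe" scan. It does not call the Win32 API, and the function name `find_exe_candidates` is made up for illustration.

```python
import os

def find_exe_candidates(directory="."):
    """List regular files in `directory` whose names end in .exe (any case).

    Only the given directory is scanned; subdirectories are not entered.
    The function name is hypothetical and not part of the VeXation code.
    """
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip directories that merely happen to be named *.exe.
            if entry.is_file() and entry.name.lower().endswith(".exe"):
                matches.append(entry.name)
    return sorted(matches)
```

Keeping the scan to one directory matches the "stay inside the working directory" containment choice described earlier.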



Deciding if a target is suitable

This is another classic virus dilemma. Not all target programs are created equal and infecting the wrong program can be disastrous, breaking the target and making the infection inert.

What should be checked?

A good executable infector needs to check that:
  1. The target file is a true executable (e.g. not something else renamed to have an .exe extension).
  2. The target file is a supported executable format (e.g. a PE executable).
  3. The target file is a "normal" PE executable (e.g. not a DLL).
  4. The target file's code is the right architecture (e.g. x86 code).
  5. The target file is for a supported Windows version (e.g. Win95 not NT).
  6. The target file has space for the infection, or can support adding space.
  7. The target file hasn't already been infected.
There's a lot to consider! Since I'm targeting Windows 95 I knew the answer to all of the above involved understanding the Portable Executable (PE) format. This is the native executable format for Win95 and supplies ways for a diligent virus writer to check all of these things.

I found there was no better resource for understanding PE's than Matt Pietrek's classic from 1994: "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format". It's a lot to take in at once but I hope calling out the most important parts as I go along will make it a bit more accessible. If you've been around the block with modern Windows much of this will be familiar because Windows still uses PE executables!

If you're more visually minded then Ange Albertini's excellent PE 101 Illustrated is another great companion resource to have handy.

PE 101 illustrated by Ange Albertini


Checking a target executable file

At a high level checking a target file is pretty straightforward. I needed to:

  1. open the file for reading.
  2. check the overall file size.
  3. memory map the file.
  4. carefully check offsets within the memory mapped contents.

To open an existing file I need to use the counter-intuitively named CreateFileA function from the Win32 API. This returns a handle for the file that can be used for further operations. The handle doesn't allow reading from the file by itself but can be passed to API calls that do.

To get the file size I used the handle from CreateFileA with the GetFileSize function. The API reports file sizes as 64-bit values, so the result of GetFileSize is split into a low order DWORD (four bytes) returned in the eax register (the Win32 API uses the "stdcall" calling convention, which returns values in eax) and a high order DWORD (stored using the pointer provided as input to GetFileSize). Since I'm only concerned with verifying a target file is at least big enough to hold the expected PE header structures I can ignore the pointer to the high order DWORD and just examine the low order DWORD returned directly from GetFileSize.
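To make the low/high DWORD split concrete, here is a small Python sketch of how the two halves combine into one 64-bit size. The helper name `combine_file_size` is hypothetical.

```python
def combine_file_size(low_dword, high_dword=0):
    """Join the two 32-bit halves reported by a GetFileSize-style call.

    `low_dword` is the value returned directly; `high_dword` is the value
    written through the optional pointer (0 for files under 4 GiB).
    """
    return ((high_dword & 0xFFFFFFFF) << 32) | (low_dword & 0xFFFFFFFF)
```

For any file smaller than 4 GiB the high half is zero, which is why looking only at the low DWORD is enough for a "big enough to hold the headers" check.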

There are two paths forward to read and write data from the file handle: either using SetFilePointer with ReadFile/WriteFile, or using memory mapping. I chose memory mapping because it involved making fewer API calls and seemed slightly more straightforward.

Memory mapping requires calling CreateFileMappingA using the file handle from CreateFileA to get yet another handle, this time to a file mapping object. I was able to use the file mapping object handle with MapViewOfFile to map a specified portion of the underlying file into memory. The return value from this function is a pointer to the region of memory where the file was mapped, and reading and writing this region accesses and changes the file's contents.
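The open, size check, and map sequence can be sketched with Python's standard `mmap` module. This is only an analogy to the CreateFileA / GetFileSize / CreateFileMappingA / MapViewOfFile chain, it maps the file read-only, and `read_mapped_prefix` is a hypothetical helper name.

```python
import mmap
import os

def read_mapped_prefix(path, count=2):
    """Map `path` read-only and return its first `count` bytes.

    Returns None when the file is empty or shorter than `count`
    (an empty file cannot be memory mapped).
    """
    with open(path, "rb") as f:                    # roughly CreateFileA
        size = os.fstat(f.fileno()).st_size        # roughly GetFileSize
        if size == 0 or size < count:
            return None
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return bytes(view[:count])
```

Once the view exists, header fields are just slices at known offsets, which is what makes the mapping approach convenient for header inspection.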


Offsets to check

With the file memory mapped it's possible to check whether it can be infected by examining key offsets within the DOS and PE headers. Initially when I was looking at example PE infector source code from this era I found most were using numeric offsets as magic numbers throughout the code. For example, here's one snippet I found that copies the file and section alignment from a PE header using numeric offsets:


I'm not sure if this is because programmers were often ripping working assembly from viruses found in the wild missing source level context or if it was just how people did it at the time. Either way I found it made code that was difficult to read.

Life was much easier when I used my assembler's ability to define structures. TASM/MASM both support the STRUCT keyword for this. If there was a STRUCT for the PE header and its optional header then the section and file alignment fields could be accessed without using numeric offsets like 0x3C and 0x38:
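The same idea, named fields instead of magic offsets, can be illustrated in Python with `ctypes`. This is a sketch rather than the author's TASM structures: `OPTIONAL_HEADER_PREFIX` and `NT_HEADERS_PREFIX` are hypothetical, truncated stand-ins that stop at FileAlignment, laid out per the PE32 format.

```python
import ctypes
import struct

class IMAGE_FILE_HEADER(ctypes.LittleEndianStructure):
    _fields_ = [
        ("Machine", ctypes.c_uint16),
        ("NumberOfSections", ctypes.c_uint16),
        ("TimeDateStamp", ctypes.c_uint32),
        ("PointerToSymbolTable", ctypes.c_uint32),
        ("NumberOfSymbols", ctypes.c_uint32),
        ("SizeOfOptionalHeader", ctypes.c_uint16),
        ("Characteristics", ctypes.c_uint16),
    ]

class OPTIONAL_HEADER_PREFIX(ctypes.LittleEndianStructure):
    # Hypothetical truncated view: only the fields up to FileAlignment.
    _fields_ = [
        ("Magic", ctypes.c_uint16),
        ("MajorLinkerVersion", ctypes.c_uint8),
        ("MinorLinkerVersion", ctypes.c_uint8),
        ("SizeOfCode", ctypes.c_uint32),
        ("SizeOfInitializedData", ctypes.c_uint32),
        ("SizeOfUninitializedData", ctypes.c_uint32),
        ("AddressOfEntryPoint", ctypes.c_uint32),
        ("BaseOfCode", ctypes.c_uint32),
        ("BaseOfData", ctypes.c_uint32),
        ("ImageBase", ctypes.c_uint32),
        ("SectionAlignment", ctypes.c_uint32),
        ("FileAlignment", ctypes.c_uint32),
    ]

class NT_HEADERS_PREFIX(ctypes.LittleEndianStructure):
    _fields_ = [
        ("Signature", ctypes.c_uint32),
        ("FileHeader", IMAGE_FILE_HEADER),
        ("OptionalHeader", OPTIONAL_HEADER_PREFIX),
    ]

# The structure definitions reproduce the magic numbers for us.
SECTION_ALIGNMENT_OFFSET = (NT_HEADERS_PREFIX.OptionalHeader.offset
                            + OPTIONAL_HEADER_PREFIX.SectionAlignment.offset)
FILE_ALIGNMENT_OFFSET = (NT_HEADERS_PREFIX.OptionalHeader.offset
                         + OPTIONAL_HEADER_PREFIX.FileAlignment.offset)

# Write values at the raw numeric offsets, then read them back by name.
raw = bytearray(ctypes.sizeof(NT_HEADERS_PREFIX))
struct.pack_into("<I", raw, 0x38, 0x1000)
struct.pack_into("<I", raw, 0x3C, 0x200)
headers = NT_HEADERS_PREFIX.from_buffer_copy(raw)
```

Reading `headers.OptionalHeader.SectionAlignment` says what is being fetched in a way that a bare 0x38 never will.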


I found that the MASM32 distribution came with a great windows.inc file that contained many predefined structure definitions, including the ones most important for PE manipulation: IMAGE_DOS_HEADER and IMAGE_NT_HEADERS. One thing I noticed was that the windows.inc field names didn't always match up to the PE docs exactly (e.g. SecHdrVirtualAddress vs VirtualAddress). The reason is that MASM (and TASM in compatibility mode) doesn't locally scope names within structs, meaning there can be only one structure field named VirtualAddress across all structs. If another struct needs a field for a similar purpose it can't use the same name and has to add a prefix (e.g. SecHdr for the section header).

Example IMAGE_OPTIONAL_HEADERS struct

For this project I needed to check each target file for the following things to decide if it could be infected:
  1. There should be an IMAGE_DOS_HEADER DOS MZ header at the very start of the file.
  2. The e_lfanew pointer from the IMAGE_DOS_HEADER should point to an IMAGE_NT_HEADERS PE header.
  3. The IMAGE_NT_HEADERS' OptionalHeader's Subsystem field should be Windows GUI or Windows CUI (e.g. a graphical or command line Windows program).
  4. The IMAGE_NT_HEADERS' FileHeader's Machine field should be i386.
  5. There shouldn't be an IMAGE_SECTION_HEADER present with the same name as the virus code section (or it's already infected).
  6. There should be enough space for an additional IMAGE_SECTION_HEADER to be added.
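Seen from the analysis side, the first five checks amount to a small read-only header inspection. Here is a Python sketch of that inspection, assuming a 32-bit (PE32) image; `inspect_pe` and its return shape are hypothetical, while the offsets and constants follow the PE format. The same section-name test doubles as a detection check for the marker section.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D        # "MZ"
IMAGE_NT_SIGNATURE = 0x00004550     # "PE\0\0"
IMAGE_FILE_MACHINE_I386 = 0x014C
WINDOWS_SUBSYSTEMS = {2, 3}         # Windows GUI, Windows CUI
FILE_HEADER_SIZE = 20
SECTION_HEADER_SIZE = 40
SUBSYSTEM_OFFSET = 68               # within the PE32 optional header

def inspect_pe(data, marker=b".ireloc"):
    """Return header facts for a PE32 image, or None if any check fails."""
    if len(data) < 0x40:
        return None
    if struct.unpack_from("<H", data, 0)[0] != IMAGE_DOS_SIGNATURE:
        return None
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    opt_off = e_lfanew + 4 + FILE_HEADER_SIZE
    if opt_off > len(data):
        return None                 # e_lfanew points outside the file
    if struct.unpack_from("<I", data, e_lfanew)[0] != IMAGE_NT_SIGNATURE:
        return None
    machine, section_count = struct.unpack_from("<HH", data, e_lfanew + 4)
    opt_size = struct.unpack_from("<H", data, e_lfanew + 4 + 16)[0]
    table_off = opt_off + opt_size
    if opt_size < SUBSYSTEM_OFFSET + 2:
        return None                 # optional header too short to trust
    if table_off + section_count * SECTION_HEADER_SIZE > len(data):
        return None                 # section table runs past end of file
    subsystem = struct.unpack_from("<H", data, opt_off + SUBSYSTEM_OFFSET)[0]
    if machine != IMAGE_FILE_MACHINE_I386 or subsystem not in WINDOWS_SUBSYSTEMS:
        return None
    names = []
    for i in range(section_count):
        start = table_off + i * SECTION_HEADER_SIZE
        names.append(bytes(data[start:start + 8]).rstrip(b"\0"))
    return {"sections": names, "has_marker": marker in names}
```

Every offset read from the file is bounds checked before it is used, in the same defensive spirit as the notes on error handling later in the post.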


Adding virus code

There are lots of ways to add new virus code to an existing PE executable. Three common ways I considered were:
  1. Adding the virus code in one piece in the unused padding of an existing code section.
  2. Breaking the virus code into pieces and putting those pieces in unused space found in an existing code section.
  3. Adding the virus code as a whole new code section.

Option 1 is straightforward but could mean some target files won't have enough space to be infected. PE code sections are usually aligned on disk to 0x200 bytes (the actual alignment is specified in the PE header), meaning that in the best case, if an existing code section were 0x201 bytes before being padded out to the alignment, the virus could be up to 0x1FF bytes. In the worst case, if the existing code section was exactly 0x200 bytes before alignment, there would be 0 bytes left for an infection! Since I wasn't sure what an "average" amount of padding was for executables from this era, or how big my virus would eventually be, I decided not to pursue this approach to start with.
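The best and worst case padding figures are easy to check with a line of arithmetic. A minimal Python sketch, with `padding_after` as a hypothetical helper name:

```python
def padding_after(raw_size, alignment=0x200):
    """Bytes of padding between `raw_size` and the next multiple of `alignment`."""
    # Unary minus binds tighter than %, so this is (-raw_size) % alignment.
    return -raw_size % alignment
```

With the default 0x200 alignment, a 0x201 byte section leaves 0x1FF bytes of padding and a 0x200 byte section leaves none.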

Option 2 is more complex but can make better use of free space in places other than the section padding. The complexity involved in breaking up the virus code into little segments and connecting them up at run-time was too much to make this a good choice when I was just starting out.

Option 3 is what I settled on. This option gave me freedom to make my virus as big as I wanted and looked like a good place to start. The only space constraint this approach has is making sure that there is enough room in the PE header for one new section metadata entry.

The downsides of this approach relate to anti-virus. Putting it bluntly, adding a new section is not subtle. You can identify whether a file is infected or not based on the presence of the new section (in fact that's how I prevent reinfection). You could even "inoculate" files against infection by adding a benign section with the same name as the virus section. Unlike options 1 and 2, this option also changes the infected file's size on disk. Lastly, from an AV perspective it's easy to disinfect infected programs by restoring the original entrypoint and deleting the malicious section. Since I was mostly unconcerned with AV these downsides were acceptable.

At a high level adding a new section means:
  1. Increasing the NumberOfSections WORD in the IMAGE_NT_HEADERS' FileHeader.
  2. Finding the end of the IMAGE_SECTION_HEADERS array and adding one more entry.
  3. Calculating the correct VirtualSize, SecHdrVirtualAddress, SizeOfRawData and PointerToRawData for the new IMAGE_SECTION_HEADER.
  4. Setting the correct SecHdrCharacteristics flags for the new IMAGE_SECTION_HEADER.

Increasing the NumberOfSections is self explanatory. Finding the end of the IMAGE_SECTION_HEADER array requires some math. For a typical executable the start of this array is immediately after the end of the base PE IMAGE_NT_HEADERS structure (strictly speaking the optional header's length is given by the FileHeader's SizeOfOptionalHeader field, which normally matches the structure size). I knew where the start of the PE structure was (the e_lfanew pointer from the IMAGE_DOS_HEADER) and I knew the size of the IMAGE_NT_HEADERS structure. That gave me the end of the PE structure and the start of the IMAGE_SECTION_HEADER array. I could calculate the offset to the end of the array as the number of sections (NumberOfSections) multiplied by the size of each IMAGE_SECTION_HEADER structure.
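The offset math can be written out as a short Python sketch. The function name is hypothetical; the sizes (4 byte signature, 20 byte file header, 40 byte section header) come from the PE format, and the optional header size is passed in rather than hard coded.

```python
SIGNATURE_SIZE = 4          # "PE\0\0"
FILE_HEADER_SIZE = 20       # IMAGE_FILE_HEADER
SECTION_HEADER_SIZE = 40    # IMAGE_SECTION_HEADER

def section_table_bounds(e_lfanew, size_of_optional_header, number_of_sections):
    """Return (start, end) file offsets of the section header array."""
    start = (e_lfanew + SIGNATURE_SIZE + FILE_HEADER_SIZE
             + size_of_optional_header)
    end = start + number_of_sections * SECTION_HEADER_SIZE
    return start, end
```

As an illustration with made-up inputs, an e_lfanew of 0x80, the common 224 byte PE32 optional header and 6 sections give a table spanning 0x178 to 0x268.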



Having to calculate two sizes (VirtualSize and SizeOfRawData) and two offsets (SecHdrVirtualAddress and PointerToRawData) for the section header metadata may seem strange at first. The duplication is more understandable when put in context. A PE section describes something that exists both on disk and, once loaded by the OS, in memory. These two contexts have different requirements. As one example, on disk it's beneficial for code to be aligned to suit the filesystem. In memory it's beneficial for code to be aligned to suit memory pages. Having one format that can describe both the "virtual" (in memory) and the "physical/raw" (on disk) makes sense and allows for a lot of flexibility.

Calculating the right virtual size and raw data size required knowing the original unaligned size of the virus code. TASM provides a handy tool for this known as the "location counter symbol".

Borland Turbo Assembler 5.0 "Location Counter Symbol" Docs

By placing a label (viral_payload) at the very start of the virus code I could use the location counter symbol ($) in an equate that provided an accurate size constant (viral_payload_size) for the rest of the code to use:

viral_payload_size EQU $ - viral_payload

This equate stayed current as I tweaked the code and avoided having to update a fixed value. The unoptimized assembly linked to later in this post ends up having viral_payload_size EQU 38Eh, a pretty beefy 910 bytes.

Aligning the virtual and raw sections according to the required alignment sounds difficult but is just a way of saying that the unaligned size must be rounded up until it is evenly divisible by the alignment value. The calculation (using integer division) is:

(((originalSize - 1) / alignment) + 1) * alignment

A typical value for the PE optional header section alignment is 0x1000, so the adjusted VirtualSize of the new section assuming the viral_payload_size is 0x38E is:

VirtualSize = (((0x38E - 0x1) / 0x1000) + 0x1) * 0x1000 = 0x1000 = 4096

For the SizeOfRawData a typical file alignment value is 0x200, so the calculation is:

SizeOfRawData = (((0x38E - 0x1) / 0x200) + 0x1) * 0x200 = 0x400 = 1024

In both cases you can see the aligned size ends up larger than the original virus size. The extra space in the file and in memory will be empty padding and that's why file infectors often find the space they need in existing executable segments.
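The rounding formula translates directly into Python, where `//` is integer division. A minimal sketch, with `align_up` as a hypothetical helper name:

```python
def align_up(size, alignment):
    """Round `size` up to the next multiple of `alignment`."""
    return ((size - 1) // alignment + 1) * alignment

# The two worked examples from the text, for a 0x38E byte payload.
virtual_size = align_up(0x38E, 0x1000)
size_of_raw_data = align_up(0x38E, 0x200)
```

A size that is already a multiple of the alignment is left unchanged, which is why the formula subtracts one before dividing.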

The SecHdrVirtualAddress value is a relative virtual address (RVA) that specifies where the section will start in memory relative to where the loader puts the executable. Many things are specified as RVAs so it's important to be familiar with this concept. I wanted my new section to start after the end of the existing final section in the target executable, which meant the SecHdrVirtualAddress for the new section needed to be the last section's SecHdrVirtualAddress plus the last section's VirtualSize (rounded up to the section alignment if the sum isn't already a multiple of it).

Similar to the SecHdrVirtualAddress, the PointerToRawData value specifies where the section starts, but it is a plain file offset rather than an RVA: it is relative to the beginning of the PE file on disk. The new section should start after the last section on disk, so the PointerToRawData value needed to be the last section's PointerToRawData plus the last section's SizeOfRawData.
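Both "start after the last section" calculations are the same addition followed by a round-up, once with the section alignment and once with the file alignment. A Python sketch of the arithmetic only, with hypothetical names and made-up input values (they are not calc.exe's real section values):

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment`."""
    return ((value - 1) // alignment + 1) * alignment

def next_section_placement(last_rva, last_virtual_size,
                           last_raw_pointer, last_raw_size,
                           section_alignment=0x1000, file_alignment=0x200):
    """Return (rva, file_offset) just past the current last section."""
    rva = align_up(last_rva + last_virtual_size, section_alignment)
    file_offset = align_up(last_raw_pointer + last_raw_size, file_alignment)
    return rva, file_offset

# Made-up example values for a final section.
example = next_section_placement(0x12000, 0x9A0, 0xF800, 0xA00)
```

The two results live in different address spaces: the first is relative to the load address in memory, the second to the start of the file.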

The last part was setting the SecHdrCharacteristics flag. The new section contains code and should be executable and readable. In the future I know I'll want the virus code to be able to modify parts of its own section so I also wanted the section to be writable. All told this meant the flag value was the combination of the IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE, IMAGE_SCN_MEM_EXECUTE and IMAGE_SCN_CNT_CODE bitmasks.
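For reference, here is how those four bitmasks combine, as a small Python sketch. The constant values are the standard ones from the PE format; the `describe` helper is hypothetical.

```python
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

FLAG_NAMES = {
    IMAGE_SCN_CNT_CODE: "CNT_CODE",
    IMAGE_SCN_MEM_EXECUTE: "MEM_EXECUTE",
    IMAGE_SCN_MEM_READ: "MEM_READ",
    IMAGE_SCN_MEM_WRITE: "MEM_WRITE",
}

characteristics = (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE
                   | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE)

def describe(flags):
    """List the names of the known section flags set in `flags`."""
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]
```

The combined value is 0xE0000020. A section that is both writable and executable is unusual for compiler output, which makes this value a handy thing to look for when inspecting a suspicious file.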



Writing the virus code into the new section

The last trick I needed to perform was to expand the target's overall file size. The new section metadata that gets added is inside of existing slack space between the end of the IMAGE_NT_HEADERS structure and the beginning of the first section and didn't require changing the file size. The new section content will be added at the end of the file and so I had to enlarge the overall executable to make room after the last section's contents.

Increasing the file size turns out to be pretty easy and only required remapping the file using CreateFileMappingA and MapViewOfFile again, adjusting the arguments to account for the new space. Because I'm modifying a PE file on disk I need to adjust by the new section's SizeOfRawData, not the original unpadded size or the VirtualSize.


After the enlarged view of the file is mapped it was just a matter of copying the generation 0 code that is currently executing from memory and into the new section. For this I used a fun bit of self-referential code that relies on the viral_payload label and the new SizeOfRawData to know where to begin copying from and how many bytes to copy.



Assembly code

The full code that implements all of the above is available in the VeXation GitHub repo under the `minijector` folder. "Mini" because it isn't a finished virus, "jector" because saying "injector" over and over was driving me batty.

Assuming you have the same dev environment set up as I do you can build the project with make (or with debug symbols using make -DDEBUG). If you want to step through an infection process run make run which will copy a clean calc.exe into the project directory and then run td32 on the minijector.exe generation 0 binary to let you step through the infection process.

A few high level notes:

  • The majority of the good stuff is in minijector.asm.
  • I used .model flat, stdcall at the start so that subsequent call instructions for Win32 APIs are handled correctly by default. Win32 uses the stdcall calling convention and it's a pain to push arguments onto the stack in reverse order manually.
  • I didn't use any of TASM's "Ideal" mode and the code should be MASM compatible.
  • My windows.inc file is a stripped down version of what comes with the MASM32 SDK. Using the unaltered full windows.inc with default tasm settings results in build errors because it's SO BIG. I cut it down to only what minijector uses.
  • I chose to name the virus segment ireloc. That's because most PE binaries already have a reloc section and an idata section. ireloc sounds like it should belong, no? :-)
  • I tried to write defensive code. Lots of public virus source code skips checking error returns from API calls or assumes the DOS/PE headers aren't malicious/invalid and I hope paying attention to that stuff makes my extremely simple virus marginally less lame. That said, I'm sure I messed something up and a malicious PE file could be crafted that will crash the infection routines (send me a sample if you make one!)
  • I used a lot of locally scoped (@@ prefixed) labels as something between a comment and a marker post for navigating the source code. This is probably a "quirk" of my own style and not a great practice.
  • I'm a pretty novice assembly programmer, so (kind) feedback is welcome!

Verifying the work so far

Whenever computers are involved I find it's helpful to verify my work in as many ways as possible. Since PE files exist both at-rest on disk and in-memory at-runtime I found it useful to verify both states of an infected calc.exe looked like I expected after running the minijector.exe generation 0 infector in the same directory.

Borland Turbo Assembler 5.0 comes with a handy program called tdump short for (hold your laughter) "Turbo Dump". This command lets you easily see PE metadata for a given executable. I've included the tdump output I referenced for a clean calc.exe here, and the tdump of an infected calc.exe here.

There is one important difference in the output that I used to verify the work so far. The original calc.exe has the following sections in the object table:


The infected calc.exe has one more section (the virus section!) in addition to all of the above:


It was also reassuring at this point to see an RVA value (corresponding to the SecHdrVirtualAddress field) that was higher than all of the previous sections' RVA values, as well as a Physical Offset (corresponding to the PointerToRawData field) that was higher than all of the previous sections' Physical Offsets. Also reassuring was seeing that the two sizes of the new section both seemed properly aligned and matched my earlier padding calculations.

As mentioned before the new virus section is not subtle and its flags value (corresponding to the SecHdrCharacteristics field) makes it even less so. I set the characteristics flag of the section to be read/write as well as executable in preparation for future work, and a writable code section will likely tickle some AV heuristics.

To be extra sure things were working as expected I used the values from tdump as a map into a raw hexdump -C of both the clean calc.exe and the infected calc.exe. I cheated here and ran hexdump from my Linux host because I really didn't want to find a hex dump utility for Windows 95. I included the hexdump output from a clean calc.exe here and from the infected calc.exe here.

Looking at the diff between the two hexdumps showed what I was expecting. Early in the file there was a diff to the number of sections (a 0006 changed to a 0007). There is also an addition of new section metadata including the .ireloc string bytes (2E 69 72 65 6C 6F 63 00 == ".ireloc\0"). Towards the end of the file was a big blob of new data that isn't present in the original file (the virus code!).
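The section name bytes seen in the hexdump can be decoded with a couple of lines of Python. This sketch assumes the usual layout of an 8 byte, NUL padded name field; `decode_section_name` is a hypothetical helper.

```python
SECTION_NAME_FIELD_SIZE = 8

# The bytes quoted from the hexdump diff.
raw_name = bytes.fromhex("2E 69 72 65 6C 6F 63 00")

def decode_section_name(field):
    """Strip NUL padding from an 8-byte section name field and decode it."""
    return field[:SECTION_NAME_FIELD_SIZE].rstrip(b"\0").decode("ascii")
```

A 7 character name like this one leaves exactly one padding byte in the field.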

That confirmed things look good at-rest on disk, but what about at-runtime when the infected calc.exe is loaded into memory? To verify this I turned to the trusty Turbo Assembler debugger td32.

Running the infected calc.exe in td32 initially generated a warning about there being no debug symbols (this is expected). After closing that I found myself at address 0x0040534E. I was able to turn back to the tdump output to understand why that is. The PE was loaded at the base address 0x00400000 and tdump says the entry point RVA is 0x0000534E. Add those two together and you get 0x0040534E, the start address of the original calc.exe code and the location the debugger is paused.

Since I didn't change the entry point of calc.exe to point to the new section and the virus code I had to go out of my way to find it in memory to look at the disassembly. I found the easiest way to do that (while execution is still paused at the start of the calc.exe code) was to:

  1. right click the viewing area and choose "Go To".
  2. enter the expression 00400000 + 00013000

Enter the target expression

The disassembly should look familiar

Why is 0x00013000 used in the expression? Back in the tdump output I saw that value is the RVA of the virus segment. Adding it to the base address the infected PE was loaded at (0x00400000) gives the start of the virus code in memory.
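The RVA to virtual address conversion used twice above is a single addition. A minimal Python sketch using the numbers from the tdump output, with `rva_to_va` as a hypothetical helper name:

```python
IMAGE_BASE = 0x00400000   # where the loader placed the infected calc.exe

def rva_to_va(rva, image_base=IMAGE_BASE):
    """Convert a relative virtual address to an absolute virtual address."""
    return image_base + rva

entry_point_va = rva_to_va(0x0000534E)    # original calc.exe entry point
new_section_va = rva_to_va(0x00013000)    # start of the added section
```

This only holds when the image is actually loaded at the base address in question, so the base should be read from the debugger rather than assumed.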

Looking at the disassembly that was now in view let me quickly see that the right code was injected. The most obvious "tell" for me was the comparison with 0xFFFFFFFF done a few instructions after the start of the disassembly. That's the cmp eax, INVALID_FILE_HANDLE_VALUE instruction from line 61 of minijector.asm. Neat!

The combination of the tdump output, the hexdump output, and the state of the program at runtime in td32 all gave me confidence that I'm on the right track.

Conclusion

Wow, that was a lot of work! Before getting too excited there are a few hard realities to face. First off, all of the code I injected is entirely inert. Since the entry point RVA wasn't changed it won't ever be run. There won't be a generation 2. Second, and more importantly, even if the code were run it wouldn't work! There are three important reasons why:

  1. It assumes it was loaded in the same location as generation 0/minijector.exe, not the new segment location in calc.exe!
  2. It references locations that were part of a data segment that wasn't copied to the infected file!
  3. It assumes the locations of all of the kernel32.dll win32 API addresses that are used won't change from where they were for generation 0!

Why go through all the trouble of building a file infector that only infects once? Well, you have to start somewhere :-) This was the easiest way I could think to start out and it let me write a more "vanilla" program. The remaining problems help illustrate the unique requirements of virus code compared to normal program code. The solutions are quite interesting and unfortunately will have to wait until next time.

As always, I would love to hear feedback about this project. I'm finding it somewhat challenging to decide on what level of detail to share and what knowledge to assume (e.g. about general assembly programming) so thoughts here are particularly welcome! Feel free to drop me a line on twitter (@cpu) or by email (daniel@binaryparadox.net).

Thursday, 17 January 2019

Getting set up

Virus writing starts with a development environment. Here's how I set up a Windows 95 VM and my development tools. If this is your first visit to VeXation you might want to read the "Welcome" post introducing the project.

Before going too far I think it's valuable to get into a 1995 mindset: Coolio's Gangsta's Paradise is the #1 single. Quebec narrowly remains a province of Canada. A pog collection was still cool. Netscape only recently released SSL and faces a challenge from a brand new web browser called Internet Explorer. A top of the line home PC was something like a 486 with a 1 GB hard drive and a whopping 8 MB of RAM.

Pretty wild! With the right mindset established let's take this new Windows 95 thing for a spin.

Get hype, we're installing Windows 95

Software Choices

Virtualization
I'm writing this on Linux and will be using VirtualBox for virtualization. There are some eccentric new ways to run Windows 95 as an Electron app but VirtualBox is the devil I know. In theory most of this setup could be adapted to macOS or modern Windows but you'll have to try that on your own. Luckily for me most of the hard work involved in setting up Windows 95 with VirtualBox was covered by Socket3's blog post on the subject. Rather than duplicate that effort I will defer to that post for the basic setup instructions and only point out areas of difference.

Windows 95 Version
There are a number of versions of Windows 95 available. To make things simple I chose to use the same one as Socket3: Windows 95 OSR 2.1. Later on I'll want to test code on a few versions to make sure differences in patch level don't break things.


Assembler/Linker/Debugger
I don't know if it's true but I get the impression the prevailing choice for writing ASM malware in the 90s was Microsoft Macro Assembler (almost always referred to as MASM). To spice things up a little bit I decided it would be fun to try using Borland Turbo Assembler (almost always referred to as TASM). I ended up choosing Borland Turbo Assembler 5.0. It was easy to install and supported 32-bit Win32 development. TASM has fairly extensive MASM compatibility so this turned out to be an OK decision. If all else fails it's fun to say "Borland" out loud (I recommend you try it).

Are you ready to pinch individual MOV instructions? I sure am!


Text Editor
This part was a real struggle. I tried a few random "Programmer's Text Editors" that I could remember (Notepad++, UltraEdit, etc) but couldn't get any of them to install on a fresh Windows 95 OSR 2.1 system. This was probably because I was using newer versions meant for Win98+ but it was tedious digging up old installers to try. Ultimately I ended up choosing an ancient freeware program called the Programmer's File Editor (PFE). Compared to a modern IDE it is somewhat feature bare but it sure is... "authentic". Using Programmer's File Editor does provide important features missing from notepad.exe like line numbers and being able to open files > 64KB. Best of all PFE is probably Y2K safe.

The brushed steel effect is how you know we're going to be working close to the metal.

Installing Windows 95 OSR 2.1

To install Win95 in VirtualBox I followed Socket3's blog post on the subject. If you're following along you'll want to download the following:
  1. Windows 95 OSR 2.1 OEM CDROM ISO.
  2. Windows 95 Bootdisk floppy disk image
  3. SciTech Display Doctor 7.0 Beta
You will also need a valid Win95 "Certificate of Authenticity" serial number. These are all over the internet but I'll save you a Google and share the one I used (if you're a cop stop reading this): 24796-OEM-0014736-66386

I chose to use some virtual machine settings that aren't true to the period to make life a little more bearable. I created a VM with a Pentium processor, 64 MB of RAM and a 2 GB disk. Don't forget to disable VT-x/AMD-V and Nested Paging in the CPU settings.

Using a floppy disk image to bootstrap CDROM drivers to be able to run the Win95 installer CD is certainly nostalgic. In the VM the overall install time is quite fast compared to 1995 when it would often take most of an hour.

BANANA powered CD-ROM

Socket3's instructions worked as described except for a few minor things I had to adapt:
  1. The CDROM drive did not default to drive R:\ - instead it was drive D:\ like I would have expected. I replaced any references to R:\ with D:\. I also skipped editing autoexec.bat to rename the CDROM drive.
  2. Despite it being drive D:\ I did have to follow the described process of copying *.cab files from the CDROM to the harddrive before starting the installation process or trouble would ensue.
  3. The process of copying CDROM files to drive C:\ in Socket3's post is described before the fdisk and format process that prepares drive C:\. I had to reverse this order and prepare the drive before copying to it. Remember to restart the VM after partitioning and formatting.
I'm having Fun™ already

Getting the somewhat sketchy SciTech Display Doctor driver installed and configured is a little fiddly but the improved screen resolution is worth it. I had to drag around the SciTech application interface a bit before I could see the "Apply" button I needed to click to change the Display and Graphics Drivers to follow Socket3's instructions.

Following the network setup instructions was important to me because I knew I would want to copy files to the VM from my host machine (installers, goat files, etc). If you want to do the same, remember to reinstall "Client for Microsoft Networks" after you install the TCP/IP protocol because it will be removed when you first remove NetBEUI and IPX/SPX. I put the VM on a network that did not have a DHCP server and so I also had to configure the system IP Address, Gateway, and DNS Configuration manually. Get used to restarting your VM because Literally Everything requires rebooting in Windows 95: installing a new driver, changing the screen resolution, changing the system IP address, you name it.

One important thing to point out is that Socket3's approach to VM networking bridges the VM to your host machine's network adapter which means you may effectively be putting a Windows 95 machine on the big scary Internet unless you're careful. I gave up on using VirtualBox NAT so ultimately I "airgap" the VM by strategically connecting/disconnecting the network adapter virtual cable when I want to send/receive files. The irony of getting a virus while trying to develop a virus would be too much for me to bear.
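The cable trick can also be done from the host command line while the VM is running. This is a small sketch, assuming a VM named "win95" (a placeholder) whose bridged adapter is adapter 1. As written it only prints the VBoxManage command; set RUN=1 to execute it.

```shell
#!/bin/sh
# Dry-run helper: prints the VBoxManage command; set RUN=1 to execute it.
vbox() {
    if [ "${RUN:-0}" = "1" ]; then
        VBoxManage "$@"
    else
        echo "VBoxManage $*"
    fi
}

VM="${VM:-win95}"   # placeholder VM name

# cable on|off: connect or disconnect the virtual cable of adapter 1.
# controlvm only works while the VM is running.
cable() {
    case "$1" in
        on|off) vbox controlvm "$VM" setlinkstate1 "$1" ;;
        *) echo "usage: cable on|off" >&2; return 1 ;;
    esac
}

cable on    # plug in right before a transfer
cable off   # unplug again as soon as it finishes
```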

File Sharing

There are no VirtualBox guest additions for Windows 95 which means I couldn't use conventional means to share files between the host and the guest. I tried mounting the VM's FAT32 disk image directly into Linux using various tricks but found it unreliable and annoying because the guest had to be shut down first. Out of the box Win95 OSR 2.1 has IE 4.0 so browsing the web to download tools is a nightmare. Barely any sites will work and you're almost guaranteed to be more hacked than Marriott. The best solution I could find without getting lost in yak shaving was to enable Windows file sharing in the VM. I use a Linux Samba client from my host machine to interact with the VM shared folder.

File sharing isn't enabled out of the box so after configuring TCP/IP and connecting the virtual cable again (plz don't hack me) I enabled and configured it. To do so yourself:
  1. Right click Network Neighbourhood and choose properties
  2. Click the ugly button labelled "File and Print Sharing"
  3. Click "I want to be able to give others access to my files"
  4. Click OK
  5. Insert the Windows 95 install CDROM again if you've removed it
  6. Click Yes to restart your computer (of course you have to restart your computer for this)
  7. Create a folder in your C:\ drive called "portal"
  8. Right click the folder and choose properties
  9. Click the "Sharing" tab
  10. Configure a share name and access type for the folder
I found it easiest to access the shared folder on the VM from my host using a command line SMB client called, unsurprisingly, "smbclient". Downloading a TAR archive of some files in the portal directory is as easy as running:
smbclient //VMNAME/PORTAL "" -N -Tc backup.$(date +%Y-%m-%d-%s).tar <files>
To poke around in an interactive mode is even easier:
smbclient //VMNAME/PORTAL ""
The smbclient man page has plenty more information. Remember to disconnect the virtual network cable when you're done. Also note that "VMNAME" is a placeholder for the name I chose to identify the computer in Windows file sharing settings, not the VirtualBox VM name.
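Going the other direction (host to VM) works in much the same way using smbclient's -c option and its "put" command. Here is a rough sketch that builds the command line for uploading a single file. The share name reuses the VMNAME/PORTAL placeholders from above and the goat file path is made up for illustration. It prints the command instead of running it, and it makes no attempt to handle file names containing spaces or quotes.

```shell
#!/bin/sh
# Placeholders: VMNAME is the Windows file sharing computer name and
# PORTAL is the share name configured for the "portal" folder.
SHARE="${SHARE:-//VMNAME/PORTAL}"

# upload_cmd FILE: print the smbclient command that copies one local file
# into the top level of the share, keeping its base name.
upload_cmd() {
    file="$1"
    printf 'smbclient %s "" -N -c "put %s %s"\n' \
        "$SHARE" "$file" "$(basename "$file")"
}

# Hypothetical local file; review the printed command, then run it by hand.
upload_cmd ./goat/GOAT1.EXE
```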

Enabling file and print sharing
Enabling file and print sharing pt2
Configuring a file share

Installing Borland Turbo Assembler 5.0

Overall this was a straightforward process. To begin I had to download the three floppy disk images for the Borland Turbo Assembler 5.0 installer. You can find these on Win32World as a 7z archive. If you're following along you'll need a way to extract 7z archives to get at the individual disk images, which you can then mount as virtual floppy disks for the VM using VirtualBox.

After mounting disk01.img to the virtual floppy drive I was able to begin the installation. One way to do this is to:
  1. Open My Computer
  2. Double click the A:\ drive
  3. Double click the "Install" icon
  4. Hit enter as required to proceed
  5. Leave all the defaults and choose Start Installation
  6. Mount disk #2 and disk #3 when prompted
  7. When TSM_RDME.TXT is displayed the installation is complete. Hit ESC to exit. Don't forget to eject the floppy disk before you reboot the VM or the next boot-up will try to boot from the Borland floppy and fail.
T-t-t-t-turboooooo


Next I had to add the TASM tools to the %PATH% environment variable so I can use the tools easily from a command prompt without a fully qualified path. For Win95 this means editing the Autoexec.bat script. One way to do so:
  1. Open My Computer
  2. Open the C:\ drive
  3. Right click "Autoexec" and choose "Edit"
  4. The file will probably be empty since this is a fresh installation. I added the following line without the surrounding quotes:
    "Set PATH=%PATH%;C:\TASM\BIN"
  5. Save the file and exit
  6. Reboot (of course)
I was able to verify that I had a working installation and gain some basic familiarity with the tools by building one of the Win32 sample applications that comes with TASM. To do this:

  1. Open a command prompt (My favourite way is to hold the windows key, hit "r", enter "command" and then hit enter)
  2. Change to the "wap32" example directory by running "cd C:\TASM\EXAMPLES\WAP32"
  3. Build the sample application by running "make". If you get the error "Bad command or file name" you should double check your %PATH% was set up correctly. It's expected to see two warning messages about heap and stack reserve size (Shout out to the Borland dev that shipped example code that builds with warnings...).
  4. Run the built application by running "wap32" in the console
  5. You should see an ugly win32 application window appear
  6. You can try debugging the application by running "td32 wap32". Because we built the sample application without debug symbols the debugger will warn us about this fact and only show disassembled machine code
  7. You can try debugging *with* symbols by running "make -B -DDEBUG". The -B forces a rebuild even though the source code hasn't changed. -DDEBUG passes a "DEBUG" argument to the Makefile so it can change the assembler and linker command line flags
  8. After rebuilding with debug symbols you can run "td32 wap32" again and it should not warn about missing debug symbols anymore and instead show you a source code listing. Much nicer!
Building wap32 example without debug symbols.
wap32 in all its glory.

Debugging wap32 without debug symbols
Debugging wap32 with debug symbols

Wrapping Up

Cool! A real development environment. If you followed along I recommend that you create a snapshot of the VM state at this point so no matter how screwed up things get you can always return to a fresh setup. It would also be useful to duplicate these setup instructions to make a few more Windows 95 VMs that you can use to test your virus down the road without infecting your development machine.
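Both steps can be scripted from the host. The sketch below assumes a VM named "win95" and a snapshot called "fresh-dev-env" (both placeholders). Note that it uses VirtualBox's clone feature to create test VMs, which is a shortcut rather than repeating the setup instructions by hand. It only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: prints each VBoxManage command; set RUN=1 to execute it.
vbox() {
    if [ "${RUN:-0}" = "1" ]; then
        VBoxManage "$@"
    else
        echo "VBoxManage $*"
    fi
}

VM="${VM:-win95}"   # placeholder VM name

# Snapshot the freshly configured development VM.
vbox snapshot "$VM" take "fresh-dev-env"

# make_test_vm NAME: clone the dev VM into a disposable test target.
make_test_vm() {
    vbox clonevm "$VM" --name "$1" --register
}

make_test_vm win95-test1

# Roll the dev VM back later (power it off before restoring).
vbox snapshot "$VM" restore "fresh-dev-env"
```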

A full development environment in action
Bonus fact: If you've read Fabien Sanglard's excellent Game Engine Black Book: Wolfenstein 3D you might have noticed the td32 debugger interface looks just like the one shown in Chapter 3. John Carmack and id Software used Borland Turbo C++ for the development of Wolf 3D and it came bundled with Borland Turbo Debugger (though we've installed a newer version).

At this point I spent most of my time reading the Borland Turbo Assembler 5.0 manual and browsing through the WAP32 source code. It was a pretty good starting example for win32 programming in x86 assembly. Once I had a grasp of WAP32 I dipped my toes into some of Iczelion's tutorials. These are mostly MASM based but work with TASM with minimal fuss. Compared to 2019's version of development tutorials and medium dot com think-pieces I found these older "community style" tutorials very endearing if not always crystal clear.

You might have been surprised to see the "make" command show up in a Win95 dev environment. I admit I was. Initially I assumed it would be something similar to GNU Make and I definitely set myself up for disappointment. Instead it's some kind of proprietary Borland flavour of Make that is missing a lot of what I associate with GNU Make. It takes some getting used to, especially when paired with the lousy command.com shell Win95 offers. Overall it's still nicer (to me anyway) than writing BAT files.

Coming up

With the dev env ready I can move on to more interesting topics. Next time I'd like to talk about the theory behind PE infectors and some challenges that we'll face compared to standard application development.

I would love to hear feedback about this project, especially if you were someone writing assembly code in this era. Feel free to drop me a line on twitter (@cpu) or by email (daniel@binaryparadox.net).

Until next time,

Monday, 14 January 2019

Welcome to VeXation

Over the 2018 Christmas holiday I decided to sit down and start on a project that has been on my mind for a long time: writing a 90s computer virus. I decided to start this blog to chronicle my progress and hopefully spark your interest in retro computer viruses.

VX?



Growing up I was enamoured with the .txt files that came out of what was then called the "computer underground". Within this space of angsty teens and early internet culture I was particularly in love with a corner mostly referred to as "VX": Virus eXchange. Here you'd find people like "Lord Julus" writing posts with names like "Anti-Debugger & Anti-Emulator Lair" and groups like 29a (666 in hexadecimal - I hope you can already see the appeal) putting out ezines and prototype virii.
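If the 29a joke doesn't land right away, a quick sanity check from any POSIX shell shows the conversion in both directions:

```shell
printf '%X\n' 666      # decimal 666 written in hexadecimal: 29A
printf '%d\n' 0x29A    # and back again: 666
```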

Remember when websites had landing pages?


Much of this era was painstakingly catalogued on a website called "VX Heavens". While its original home met an unfair demise there are still mirrors online and archives to download. There were certainly problematic aspects of the VX scene I can't commend but the spirit of creativity, discovery and sharing of knowledge was truly unique.

Why write a 90s virus?

For broader context, in 1995 I was 7 years old. I was probably closer to 12 when I first started reading vx files. I probably understood less than 2% of anything I read at the time. Revisiting this subject as a 30 year old with a decade of professional experience is fascinating. Now I can enjoy the spirit that appealed to me as a kid while also appreciating and implementing the technical aspects.

Despite being over 20 years old lots of the techniques detailed by great VXers of the era remain relevant fundamentals. The core technologies addressed by many articles still exist and are in use. In fact I bet whatever computer you're reading this on right now is still executing PE or ELF executables.

It may not be reflected in citations and RSA conference talks but topics that are now foundational areas of computer security and reverse engineering were pioneered not by academia or industry but by bored teenagers who wanted to make other people's computers show a weed leaf.

Why write a Windows 95 virus?

To start writing a virus I needed to decide on platforms and targets to support. My choice of Windows 95 was driven by both practical and emotional reasons.

From a historic standpoint Win95 marked an interesting inflection point in the VX scene. The DOS era had a huge number of viruses and VX publications but with the release of Win95 much of the community's knowledge was becoming obsolete. Just to start, Win95 had a brand new executable format, ran in protected mode, and operating system functions were accessed through a new API instead of raw interrupts.

Windows 95 was also the first operating system I can remember exploring. My first computer was a Tandy 1000 but it was fairly primitive and I was too young to explore deeply. By the time my family had a Win95 machine I was enthralled by it. I spent a lot of time looking at every single control panel setting and corner of the filesystem. We didn't have home Internet at the time so what counted as "fun" on the computer was certainly broader than today. I'm sure many of you that grew up in this time remember hours lost in mspaint.exe.

There are lots of great Windows 95 resources available online.


From a practical standpoint Win95 is appealing because it is modern enough to have things that will make my life easier (TCP/IP, filesharing, approachable development tools, a flat memory space, basically zero security features). It also runs reasonably well in VirtualBox so we can avoid purchasing any beige monstrosities on eBay.

For a challenge and to match VXer preference of the time I'll be writing my virus in pure x86 assembly. I have limited experience writing assembly so I wanted to be sure I picked a platform that wasn't too esoteric. There are plenty of resources available covering both general Win32 assembly programming on Win95 and VX specific topics for Win95. I've spent most of my career focused on Linux and UNIX systems so there's a lot of new ground for me to cover programming on and for Windows.

Lastly, targeting an operating system 18 years past the end of extended support means I can experiment without worrying too much about destructive consequences and moral hangups. Don't run Windows 95 in the real world, I'm begging you.

Where to start?

With some of the "Why" out of the way let's talk about short term goals and what I want to start out building.
  1. Vanilla infector. My goal is to infect PE executables to propagate the virus without impacting the functionality of the infected application. Put simply, if you run an infected program it should work as intended to avert suspicion but also spread the virus by finding new programs to infect.
  2. KISS. To start I'll ignore advanced techniques (polymorphism, encryptors, anti-debugging, etc) and focus on a minimal viable PE infector. This means I will largely be ignoring anti-virus detection to start with. I'll point out where I'm making a decision that will aid AVers but I will postpone developing countermeasures. (Spoiler: Even using 90s tricks without concern for evasion will turn out to bypass detection from a surprising number of AV engines)
  3. Windows 95 compatibility only. It isn't too difficult to also support Windows NT and Windows 98 but I'm already biting off a lot so focusing on one platform will help manage complexity.
  4. No hardcoding offsets! Even though I'm ignoring Windows NT and 98 I want to be reasonably confident the virus will run on different patch levels of Windows 95.
  5. 100% assembly. It's tempting to use C but to make this more of a challenge and to match the preferences of the VX scene in this era I'll use x86 assembly only (likely targeting the 80386 or 80486).
  6. Period accurate tools. As much as anything I want to feel like a 90s Windows VXer and that means not using vim and a familiar toolset. Instead I'll find a 1990s text editor and use the compilers/debuggers available at the time.
This is ambitious and I tend to abandon projects before they're finished so I hope that by keeping the scope constrained initially and focusing on sharing in-progress work I can finish the above and move on to the more interesting advanced topics :-)

Up Next

My next post will focus on setting up a Windows 95 VM, configuring it with internet access, setting up filesharing, and installing an x86 development environment.

I would love to hear feedback about this project, especially if you were someone active in the 90s vx scene. Feel free to drop me a line on twitter (@cpu) or by email (daniel@binaryparadox.net).

Until next time,