
All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

June 2011 - Posts

SELA Developer Days 2011 – Windows Internals

The SELA Developer Days conference has been adjourned :-)

My one-day session today, titled Windows Internals for Busy Developers, was something I came up with a couple of months ago, and I was sure it wouldn’t be popular. After all, we have a five-day Windows Internals course, and most people interested enough in the subject would want to attend the full training with labs, demos, and detailed walkthroughs of all Windows components.

I was surprised to find 16 attendees in my class, all eager to learn about Windows architecture and components, diagnostic tools, kernel debugging, and internal data structures. I think it’s really awesome that developers (busy as we are!) find time to attend a training session which doesn’t have a well-defined outcome; at the end of the day, I’m not sure if I can promise that you’ll be better at something in particular, but I’m willing to bet that you will have learned something new and become a better developer or systems architect.

We had time today for the following:

Other sessions today included Debugging the Web with Fiddler (I am so out of touch with Web developers :-)), WPF and Silverlight Smart Client Architecture Deep Dive with MVVM and Beyond (winner of the longest title), and Introduction to Test-Driven Development. I’ve already heard great feedback about the other sessions.


This concludes the conference; I’ve had a great time, although I can barely feel my feet after a week of full-day sessions. We are looking forward to hosting similar conferences during the year, what with BUILD/Windows giving us lots of new subjects to cover :-)

SELA Developer Days 2011 – Improving the Performance of .NET Applications

There’s just one day left for the SELA Developer Days, and today I delivered my session titled Improving the Performance of .NET Applications. In this brief one-day session I wanted to distill the best practices and tools for measuring various performance metrics, but also provide some insight into OS and CLR internals relevant to high-performance development.

Other sessions today included Parallel Programming: One Step Beyond, Windows Phone Mango, Introduction to Scrum, and Visual Studio 2010 Testing Tools. I really wish I could have attended Bnaya and Yaniv’s session on parallel programming – they showed lots of cool stuff, including TPL Dataflow, Rx integration, and even some HPC cluster coolness.


So what have I done today? There was a big group of over 50 people at the full-day session, and we had enough time to go over:

  • Using sampling and instrumentation in the Visual Studio profiler to obtain initial execution time readings and more detailed information
  • Tracing memory allocations and heap structure using CLR Profiler, Visual Studio memory allocation profiling, and ANTS Memory Profiler
  • Profiling cache misses along with a couple of examples where cache performance is crucial
  • How managed types are laid out in memory and how this affects the performance of value types in particular
  • How the garbage collector operates and why finalization is a bad idea :-)

SELA Developer Days 2011 - C++ Debugging

I’m keeping up with the updates from the SELA Developer Days conference. Yesterday our classes were full to the brim with attendees – some of the sessions delivered were Parallel Programming in .NET 4.0, Introduction to Windows Phone 7, and a feature-packed day on TFS 2010 and Visual Studio 2010.


Yesterday I delivered a session on C++ debugging, in which we covered the following topics:

  • How to read x86 and x64 assembly listings created by the C/C++ compilers
  • How to match debugging symbols to the debugged process or dump
  • How to generate crash dumps and hang dumps
  • How to analyze crash dumps in Visual Studio 2010 and how to perform initial automatic analysis in WinDbg
  • How to use UMDH to find a memory leak in a C++ application
  • How to traverse wait chains between critical sections, threads, and other synchronization mechanisms
  • How to diagnose heap corruptions and catch them at the earliest point possible
  • How to approach stack corruptions with minimal information

During this very busy day we also had the chance to do a little hands-on work – a one-hour lab on generating crash dumps and pinpointing the crash cause in Visual Studio and WinDbg, and a half-hour lab on debugging a deadlock in an MFC application.

If you attended my session and have any further questions, I’d appreciate it if you let me know, as always. Today and tomorrow I’m delivering two more sessions, so stay tuned for more updates.

SELA Developer Days 2011 - .NET Debugging

As I wrote about a month ago, this week is a very busy one for us – we’re hosting the SELA Developer Days conference at SELA’s headquarters. The conference registration was truly overwhelming – there are close to 600 participants scheduled to attend the conference’s 25 workshops during the week!


I’m a little biased, but after teaching the .NET Debugging one-day session to a group of 40 enthusiastic developers today, I think the conference organization has been superb so far, and that we really managed to create the kind of easygoing, stress-free learning atmosphere that is not easy to attain at technical conferences.

Some of the highlights of what my attendees saw today:

  • How to read x86 and x64 assembly listings
  • How to generate crash dumps and hang dumps automatically and on-demand
  • How to analyze (“root-cause”) a crash dump using Visual Studio or WinDbg
  • How to find a memory leak in the managed heap using SOS and CLR Profiler
  • How to follow wait chains between managed threads
  • How to pinpoint managed assembly loading failures

If you’re attending the conference during the week, feel free to come up and say hi – I will be delivering sessions tomorrow, on Wednesday and on Thursday. If you’re reading this after the conference and have any unanswered questions, topics you think we should have covered, or any feedback on the conference organization and delivery, I would really appreciate it if you could let me know.

Baby Steps in Windows Device Driver Development: Part 4, Kernel Debugging

Now that you have a driver running on the target system, it’s time to learn how to debug it if the need arises.

In the first part, you configured the virtual machine for kernel debugging over a virtual serial port and connected to the kernel debugging session using WinDbg. Familiarity with WinDbg commands for unmanaged debugging is a major plus here, but there are numerous extension commands available only in kernel mode, which you will have to learn anyway.

Commands you might need:

  • bp to set a breakpoint
  • u to disassemble code
  • dt module!STRUCT 0xdeadbeef to inspect an object
  • dd 0xdeadbeef L10 to show the first 0x10 (16) DWORDs at the specified address (the length is read in the default radix, which is 16)

Here are some things to try now that your driver is in the picture:

  • Set a breakpoint in your DriverEntry and inspect the DRIVER_OBJECT you received
  • Set a breakpoint in your DriverDispatch with a condition on the IRP containing the IOCTL_SAYHELLO code you defined

And here are some general ideas to play with while you’re in the kernel debugger:

  • Set a breakpoint in KiSystemService and disassemble the system service dispatcher, the component that handles a system call request from user-mode and calls the appropriate service in kernel mode using a lookup table
  • Inspect the system service table
  • Locate the system’s active process list—it is in the global variable KiProcessListHead which points to a LIST_ENTRY structure—and use it to traverse the link nodes (KPROCESS and EPROCESS structures)
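When you walk the process list by hand, each Flink points at the LIST_ENTRY embedded inside the next structure, not at the start of that structure, so you subtract the field's offset (which dt shows you) to get back to the containing object. The kernel's CONTAINING_RECORD macro does exactly this. The user-mode sketch below models the walk with hypothetical stand-in types (SKETCH_LIST_ENTRY, FAKE_PROCESS, SKETCH_CONTAINING_RECORD); it is not the real KPROCESS/EPROCESS layout, which changes between Windows builds.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */

/* Same shape as the kernel's LIST_ENTRY: a node in a circular, doubly linked list */
typedef struct SKETCH_LIST_ENTRY {
    struct SKETCH_LIST_ENTRY *Flink;   /* next node */
    struct SKETCH_LIST_ENTRY *Blink;   /* previous node */
} SKETCH_LIST_ENTRY;

/* Same idea as CONTAINING_RECORD: from an embedded field back to the enclosing structure */
#define SKETCH_CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

/* Hypothetical stand-in for a process object; real layouts vary by Windows build */
typedef struct FAKE_PROCESS {
    unsigned long ProcessId;
    SKETCH_LIST_ENTRY ListEntry;       /* deliberately not the first field */
    char ImageName[16];
} FAKE_PROCESS;

static void SketchInitListHead(SKETCH_LIST_ENTRY *head)
{
    head->Flink = head->Blink = head;  /* an empty list points back at its head */
}

static void SketchInsertTail(SKETCH_LIST_ENTRY *head, SKETCH_LIST_ENTRY *entry)
{
    SKETCH_LIST_ENTRY *last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

/* Follow Flink from the head until it comes back around, as you would by hand in the debugger */
static int SketchCountProcesses(SKETCH_LIST_ENTRY *head, unsigned long *pidSum)
{
    SKETCH_LIST_ENTRY *cur;
    int count = 0;
    *pidSum = 0;
    for (cur = head->Flink; cur != head; cur = cur->Flink) {
        FAKE_PROCESS *p = SKETCH_CONTAINING_RECORD(cur, FAKE_PROCESS, ListEntry);
        assert(cur->Blink->Flink == cur);  /* neighbors must agree, or the list is corrupt */
        *pidSum += p->ProcessId;
        ++count;
    }
    return count;
}
```

The loop's termination condition is the important part: the list is circular, so you stop when Flink brings you back to the head rather than when you hit a NULL pointer.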

Next up: using a real-life kernel-mode API to do something remotely useful from our driver.

Baby Steps in Windows Device Driver Development: Part 3, Receiving IOCTLs

Now that we can compile, deploy, install and start a driver, it’s time for something more interesting. In this post, we’ll send control requests to our driver from a user-mode application.

To receive information from the outside, we need to teach our driver to respond to device I/O control codes (IOCTLs), which can be delivered to it from user mode using the DeviceIoControl Win32 API. We have already seen how our driver tells Windows which routine to call on unload by setting a field in the DRIVER_OBJECT structure. Handling IOCTLs is very similar; we just need to provide another routine or two.
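To make the registration mechanism concrete, here is a user-mode model of it, assuming heavily simplified hypothetical types (SKETCH_DRIVER_OBJECT, SKETCH_IRP): the driver fills slots in a table of function pointers indexed by IRP major function code, and the I/O manager calls whichever routine sits in the slot for the incoming request. The IRP_MJ_* values mirror the WDK headers; the status values and everything else are made up for the sketch. Slots the driver leaves alone keep a default routine that fails the request, similar to what the real I/O manager sets up before calling DriverEntry.

```c
#include <assert.h>

/* These IRP_MJ_* values mirror the WDK headers; the rest is a hypothetical simplification */
#define SKETCH_IRP_MJ_CREATE            0x00
#define SKETCH_IRP_MJ_CLOSE             0x02
#define SKETCH_IRP_MJ_DEVICE_CONTROL    0x0e
#define SKETCH_IRP_MJ_MAXIMUM_FUNCTION  0x1b

/* Made-up status values for the sketch (real NTSTATUS codes differ) */
#define SKETCH_STATUS_SUCCESS                 0
#define SKETCH_STATUS_INVALID_DEVICE_REQUEST (-1)

typedef struct SKETCH_IRP {
    unsigned char MajorFunction;  /* the real IRP keeps this in its current stack location */
    int Status;
} SKETCH_IRP;

typedef int (*SKETCH_DISPATCH)(SKETCH_IRP *irp);

typedef struct SKETCH_DRIVER_OBJECT {
    SKETCH_DISPATCH MajorFunction[SKETCH_IRP_MJ_MAXIMUM_FUNCTION + 1];
} SKETCH_DRIVER_OBJECT;

/* Default routine: fail any request the driver did not register for */
static int SketchInvalidRequest(SKETCH_IRP *irp)
{
    irp->Status = SKETCH_STATUS_INVALID_DEVICE_REQUEST;
    return irp->Status;
}

/* "I/O manager" side, before the driver runs: every slot gets the default routine */
static void SketchInitDriverObject(SKETCH_DRIVER_OBJECT *driver)
{
    int i;
    for (i = 0; i <= SKETCH_IRP_MJ_MAXIMUM_FUNCTION; ++i)
        driver->MajorFunction[i] = SketchInvalidRequest;
}

/* "I/O manager" side, per request: index the table by major function code */
static int SketchCallDriver(SKETCH_DRIVER_OBJECT *driver, SKETCH_IRP *irp)
{
    assert(irp->MajorFunction <= SKETCH_IRP_MJ_MAXIMUM_FUNCTION);
    return driver->MajorFunction[irp->MajorFunction](irp);
}

/* Driver side: one routine serving several slots */
static int g_deviceControlCount = 0;

static int SketchDriverDispatch(SKETCH_IRP *irp)
{
    if (irp->MajorFunction == SKETCH_IRP_MJ_DEVICE_CONTROL)
        ++g_deviceControlCount;   /* stands in for actually handling the IOCTL */
    irp->Status = SKETCH_STATUS_SUCCESS;
    return irp->Status;
}

static void SketchDriverEntry(SKETCH_DRIVER_OBJECT *driver)
{
    driver->MajorFunction[SKETCH_IRP_MJ_CREATE] =
    driver->MajorFunction[SKETCH_IRP_MJ_CLOSE] =
    driver->MajorFunction[SKETCH_IRP_MJ_DEVICE_CONTROL] = SketchDriverDispatch;
}
```

Because a single function pointer can sit in several slots, the dispatch routine has to look at the major function code itself to decide what to do.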

Before our driver can receive IOCTLs, it needs to create a device object that can be opened from a user-mode application. An application can open a device object using the CreateFile Win32 API and then issue IOCTLs or read/write requests on the device object handle.

Add the following to your DriverEntry to create a device object and register the DriverDispatch routine to handle IOCTLs directed at your device:

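/* Goes into HelloWorldDriver.c from Part 2, which already includes <ntddk.h> */
#define FILE_DEVICE_HELLOWORLD 0x00008337

/* Forward declarations, so DriverEntry can reference routines defined later in the file */
NTSTATUS DriverDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp);
VOID DriverUnload(PDRIVER_OBJECT DriverObject);

NTSTATUS DriverEntry(
    PDRIVER_OBJECT DriverObject,
    PUNICODE_STRING RegistryPath)
{
    NTSTATUS status;
    WCHAR deviceNameBuffer[] = L"\\Device\\HelloWorld";
    UNICODE_STRING deviceNameUnicodeString;
    WCHAR deviceLinkBuffer[] = L"\\DosDevices\\HelloWorld";
    UNICODE_STRING deviceLinkUnicodeString;
    PDEVICE_OBJECT interfaceDevice = NULL;

    UNREFERENCED_PARAMETER(RegistryPath);
    DbgPrint("DriverEntry called\n");

    /* Create the device object that user mode will open */
    RtlInitUnicodeString(&deviceNameUnicodeString, deviceNameBuffer);
    status = IoCreateDevice(DriverObject,
                            0,                      /* no device extension */
                            &deviceNameUnicodeString,
                            FILE_DEVICE_HELLOWORLD,
                            0,                      /* no device characteristics */
                            TRUE,                   /* exclusive: one handle at a time */
                            &interfaceDevice);
    if (!NT_SUCCESS(status))
        return status;

    /* Make the device reachable from user mode as \\.\HelloWorld */
    RtlInitUnicodeString(&deviceLinkUnicodeString, deviceLinkBuffer);
    status = IoCreateSymbolicLink(&deviceLinkUnicodeString,
                                  &deviceNameUnicodeString);
    if (!NT_SUCCESS(status)) {
        IoDeleteDevice(interfaceDevice);   /* don't leave the device object behind */
        return status;
    }

    DriverObject->MajorFunction[IRP_MJ_CREATE] =
    DriverObject->MajorFunction[IRP_MJ_CLOSE] =
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] =
                                DriverDispatch;
    DriverObject->DriverUnload = DriverUnload;
    return STATUS_SUCCESS;
}

/* Replaces the DriverUnload from Part 2: it must now undo what DriverEntry created,
   otherwise the symbolic link and device object are left behind after unloading */
VOID DriverUnload(PDRIVER_OBJECT DriverObject)
{
    UNICODE_STRING deviceLinkUnicodeString;

    RtlInitUnicodeString(&deviceLinkUnicodeString, L"\\DosDevices\\HelloWorld");
    IoDeleteSymbolicLink(&deviceLinkUnicodeString);
    if (DriverObject->DeviceObject != NULL)
        IoDeleteDevice(DriverObject->DeviceObject);
    DbgPrint("Driver unloading\n");
}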
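The two names used in DriverEntry are connected like this: the device object lives at \Device\HelloWorld in the object manager namespace, and \DosDevices\HelloWorld is a symbolic link to it. \DosDevices is itself an alias for the \?? directory, which is where Win32 looks up names of the form \\.\Name, such as the one the user-mode application passes to CreateFile. The sketch below is a deliberately simplified model of that prefix rewrite only; Win32DeviceToNtPath is a hypothetical helper, not a Windows API, and the real Win32 path parser handles many more cases.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: models only the "\\.\Name" -> "\??\Name" prefix rewrite.
   Returns 1 on success, 0 if the path has no \\.\ prefix or out is too small. */
static int Win32DeviceToNtPath(const char *win32Path, char *out, size_t outSize)
{
    static const char prefix[] = "\\\\.\\";   /* the four characters \\.\ */
    const size_t prefixLen = sizeof prefix - 1;
    int written;

    assert(win32Path != NULL && out != NULL && outSize > 0);
    if (strncmp(win32Path, prefix, prefixLen) != 0)
        return 0;
    /* \DosDevices is an alias for \??, so \??\HelloWorld is the link IoCreateSymbolicLink made */
    written = snprintf(out, outSize, "\\??\\%s", win32Path + prefixLen);
    return written >= 0 && (size_t)written < outSize;
}
```

From \??\HelloWorld, the object manager follows the symbolic link to \Device\HelloWorld, and the I/O manager sends an IRP_MJ_CREATE request to the driver that owns that device object.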

What does DriverDispatch have to do? Not much, and indeed for now we are going to handle only a single IOCTL that will trigger a “Hello World” message:

#define IOCTL_SAYHELLO (ULONG) CTL_CODE( FILE_DEVICE_HELLOWORLD, 0x00, METHOD_BUFFERED, FILE_ANY_ACCESS )

NTSTATUS DriverDispatch(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp)
{
    PIO_STACK_LOCATION iosp;
    ULONG  ioControlCode;
    NTSTATUS status;

    UNREFERENCED_PARAMETER(DeviceObject);
    DbgPrint("DriverDispatch called\n");

    /* The request parameters live in the current I/O stack location */
    iosp = IoGetCurrentIrpStackLocation(Irp);
    switch (iosp->MajorFunction) {
    case IRP_MJ_CREATE:
    case IRP_MJ_CLOSE:
        /* Nothing to set up or tear down per handle */
        status = STATUS_SUCCESS;
        break;
    case IRP_MJ_DEVICE_CONTROL:
        ioControlCode =
            iosp->Parameters.DeviceIoControl.IoControlCode;
        if (ioControlCode == IOCTL_SAYHELLO) {
            DbgPrint("Hello World!\n");
            status = STATUS_SUCCESS;
        } else {
            /* Reject control codes we don't recognize */
            status = STATUS_INVALID_DEVICE_REQUEST;
        }
        break;
    default:
        status = STATUS_INVALID_DEVICE_REQUEST;
        break;
    }

    /* Complete the IRP; no data is returned to the caller, so Information is 0 */
    Irp->IoStatus.Status = status;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return status;
}
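IOCTL_SAYHELLO is just a 32-bit number packed by the CTL_CODE macro. The sketch below reimplements that packing as plain C functions (MakeCtlCode and the Ctl* decoders are hypothetical names, not WDK functions) so you can see the resulting value. That value is also what you would compare against if you set a conditional breakpoint on this IOCTL in the kernel debugger.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the CTL_CODE macro:
   DeviceType in bits 16-31, Access in bits 14-15, Function in bits 2-13, Method in bits 0-1.
   MakeCtlCode and the Ctl* decoders are hypothetical helpers, not WDK functions. */
static uint32_t MakeCtlCode(uint32_t deviceType, uint32_t function,
                            uint32_t method, uint32_t access)
{
    assert(deviceType <= 0xFFFF && function <= 0xFFF && method <= 3 && access <= 3);
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

static uint32_t CtlDeviceType(uint32_t code) { return code >> 16; }
static uint32_t CtlAccess(uint32_t code)     { return (code >> 14) & 0x3; }
static uint32_t CtlFunction(uint32_t code)   { return (code >> 2) & 0xFFF; }
static uint32_t CtlMethod(uint32_t code)     { return code & 0x3; }
```

METHOD_BUFFERED and FILE_ANY_ACCESS are both 0, so with function 0x00 the code works out to 0x83370000: the device type FILE_DEVICE_HELLOWORLD in the upper 16 bits and zeros everywhere else.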

All that’s left is to use the driver from a user-mode application by opening a handle to its device and issuing an IOCTL for it to print a message. In our particular case, something like the following will suffice:

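/* Needs <windows.h> and <winioctl.h> (for CTL_CODE), plus the same
   FILE_DEVICE_HELLOWORLD and IOCTL_SAYHELLO definitions as the driver */
HANDLE hDevice;
DWORD nb;
hDevice = CreateFile(
             TEXT("\\\\.\\HelloWorld"),
             GENERIC_READ | GENERIC_WRITE, 0, NULL,
             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hDevice != INVALID_HANDLE_VALUE) {
    DeviceIoControl(
             hDevice, IOCTL_SAYHELLO,
             NULL, 0, NULL, 0, &nb, NULL);
    CloseHandle(hDevice);
}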

You are ready to test your driver and your user-mode application. Compile the driver, deploy it to the target machine, and register and start it using the OSR Driver Loader (see Part 2 for instructions). When the driver is running, run your user-mode application to issue the IOCTL and watch DebugView for debug printouts.

Next, we’ll see how to use WinDbg as a real kernel debugger to set breakpoints and step through your driver’s code.

The Future of Microprocessors—Must Read for Developers

Long-time readers of this blog know that I really don’t like rehashing someone else’s thoughts and linking to material that isn’t my own. However, the ACM article The Future of Microprocessors (S. Borkar, A. Chien) warrants an exception to this rule.

If you can afford the time (approx. 2 hours), I strongly recommend that you read the article instead of my somewhat incoherent ramblings below. If you’re looking for an executive summary highlighting some of the biggest challenges and likely solutions and are willing to sacrifice accuracy or presentation, read on :-)

Moore’s Law and related observations have somewhat clouded the way we look at application performance. For years, it has been common to assume that CPU-bound software doubles its performance every year or two, making it easy to process bigger sets of data, support higher display resolutions, or handle faster and wider network streams. With the end of the free lunch, parallelism rears its ugly head and forces us to think about how processing can be broken into multiple parts and executed simultaneously on multiple cores. For some workloads this is easy; other workloads require highly inventive parallel algorithms that often deviate significantly from their sequential counterparts. The costs of synchronization, inter-core communication, and cache coherency drill a large hole in the high-level language abstractions we have grown used to.

Which of these trends is going to dominate the next 20 years of microprocessors? Are we in for a 1000x increase in processor speeds or numbers of processor cores? Or is there something completely different that we will have to embrace, revolutionizing again the way we reason about software performance?
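A quick back-of-the-envelope check (my own arithmetic, not a figure taken from the article) shows why 1000x is a natural yardstick: if performance doubles every two years, twenty years amount to ten doublings, while doubling every year would compound to roughly a millionfold.

```latex
2^{20/2} = 2^{10} = 1024 \approx 10^{3}
\qquad\text{vs.}\qquad
2^{20} = 1\,048\,576 \approx 10^{6}
```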

The Future of Microprocessors addresses some of these questions in a very accessible way. First, it outlines the way some of the greatest advances in processor performance have been achieved:

  • Increasing individual transistor speeds by scaling them down to unthinkably tiny size (from 10 micrometers 40 years ago to 30 nanometers today)
  • Microarchitecture tricks including multi-cycle execution, pipelining, branch prediction
  • Multiple layers of cache memory, bringing down stalls associated with fetching data directly from main memory

Unfortunately, some of these “automatic” trends cannot proceed further without significant changes. The primary challenge is the power and die area that are reasonable for a desktop CPU. This limited energy budget caps what straightforward scaling of today’s technology can deliver far below the 1000x performance increase expected by 2030. Some of the ways of addressing this challenge are outlined below.

Multiple cores don’t have to deliver linear speedup. From a power efficiency perspective, it might make sense to have a set of small cores with lower single-thread performance but a reasonable energy signature. A hybrid approach, where large cores are used for certain workloads and smaller cores are used for highly parallel execution, is also a feasible alternative.

Data movement across the interconnect and within the processor caches must be rethought to fit within the energy budget. With only a 30x increase in processor performance, and assuming an average propagation distance of 1mm for instruction operands, 90% of the processor’s energy budget would be consumed by data movement alone. This restriction leads to the need for (unconventionally) larger register files, so that more data can be stored within 0.1mm of the execution units that use it.

The interconnect network between CPU components and multiple CPU cores is in need of another radical redesign. This calls for multiple types of buses, combining commonplace packet-switching networks with circuit-switched networks.

Finally, some of the work might fall to us developers: software may well have to become part of the solution to the scaling problems. Some of the conveniences afforded by modern hardware, such as a flat address space and coherent caches, might have to break down and be replaced by explicit software alternatives, which could reach as high up as the programming languages themselves. (By the way, some of these trends are already visible with NUMA systems on modern servers and desktops, and with the challenges associated with scaling even OS kernels to 256 processor cores or more.)

I wonder how long it will take until another “The Free Lunch Is Over” article appears. It might be 5 years, or 10, or 20, but it’s hard to see right now how our current software is going to adapt to the processor scaling trends.

Baby Steps in Windows Device Driver Development: Part 2, “Hello World” Driver

In this installment, we will compile and deploy our first driver. You should have all the tools installed already.

Windows device drivers are reactive programs—all they really do is respond to events, somewhat similar to GUI programs. The kinds of events drivers recognize include:

  • Loading the driver into memory and unloading it from memory
  • Adding a new hardware device for which the driver is responsible
  • Transitioning to a power-saving mode
  • Reading from and writing to a device
  • Handling an interrupt arriving from a device

A driver handles these events by registering functions that Windows invokes. In this post, we will use only two of these functions, invoked when a driver is loaded and unloaded.

Type the following into your favorite code editor and save it as HelloWorldDriver.c:

#include <ntddk.h>

void DriverUnload(
    PDRIVER_OBJECT pDriverObject)
{
    DbgPrint("Driver unloading\n");
}

NTSTATUS DriverEntry(
    PDRIVER_OBJECT DriverObject,
    PUNICODE_STRING RegistryPath)
{
    DriverObject->DriverUnload = DriverUnload;
    DbgPrint("Hello, World\n");
    return STATUS_SUCCESS;
}

This is your first driver. Very simple indeed—all this driver does is print a couple of debugging messages when you load it and unload it. You will need a couple of build files that tell the build engine what to do with your sources. Here are the files:

File name: SOURCES

TARGETNAME = HelloWorldDriver
TARGETPATH = obj
TARGETTYPE = DRIVER

INCLUDES = %BUILD%\inc
LIBS = %BUILD%\lib

SOURCES = HelloWorldDriver.c

File name: makefile

!INCLUDE $(NTMAKEENV)\makefile.def

To compile the driver, open the build environment for your target OS from the Windows Driver Kits start menu folder. The “checked” build is akin to Debug mode, and the “free” build is equivalent to Release mode.

All that’s left now is to compile the driver. In the WDK build environment command prompt, navigate to the directory containing your driver’s source code and the SOURCES and makefile files, and then run the build command. If there were no errors, you should see a .pdb file and a .sys file created in a subdirectory.

And now—deployment time. Copy your files over to the target system (preferably a virtual machine) and run the OSR Driver Loader. Point it to your driver’s location, click “Register Service” and then “Start Service”. Your driver should be running!


If you want to see the debugger output, you need to use a utility like Sysinternals DebugView. In DebugView, hit Ctrl+K to enable kernel debug spew, and then start/stop your driver a few times. You should see load and unload messages pile up in the DebugView window.

How far is this driver of ours from being useful? From talking to actual hardware? Quite far, and I’m not sure we’ll make it that far :-) However, in the near future we sure are going to run some interesting code in kernel mode and in the somewhat farther future see how rootkits use drivers to hide processes and files.