Mercurial Extractor

This is an expansion of part of a talk I did for OWASP East Midlands.

If you actually read the articles posted up here, you may have read about the svnpristine extractor, which was written in October 2012 and not released until February 2013 (hey, it takes me a while).

The crux of that script was that I was stuck on a site where the company managed releases of their web-based software using a Subversion repository, and then left the Subversion working files on their web server. That let me bypass file type checking and download the web.config file directly. Profit!

So, SVN has been done to death, not just by me but by others, including a nifty script that combines my pristine extractor with another one that extracts from the older entries-style Subversion working copy.

It's time to jump onto the next one: Mercurial.

Mercurial is another version control system. It was designed as a replacement for BitKeeper and is used by all sorts of people, including Mozilla and Python (fortunately not on their websites).

It tracks checked-out files and directories, and their versions, through a hidden .hg directory containing several files that we can abuse for information gathering and for bypassing file type checking:

.hg/hgrc is the resource file that holds Mercurial's per-repository configuration. So far I've only seen the default variable populated in the paths section, which shows the source of the repository (and, potentially, another target).
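As a quick illustration, you can pull the default path out of a downloaded hgrc with a few lines of Python. This is a minimal sketch that assumes the file is plain INI-style text. The function name hgrc_default_path and the example URL are made up. Python's configparser is also not a full hgrc parser: it will choke on Mercurial's %include directives, for example.

```python
import configparser

def hgrc_default_path(hgrc_text):
    """Return the 'default' entry of the paths section of an hgrc, or None."""
    # interpolation=None: hgrc values may legitimately contain '%' characters
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    # fallback covers both a missing [paths] section and a missing key
    return parser.get("paths", "default", fallback=None)

# Hypothetical hgrc body as it might come back from /.hg/hgrc
sample = """[paths]
default = https://hg.example.com/webapp
"""
print(hgrc_default_path(sample))
```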

.hg/dirstate is the most important file for stealing other files from the host. It is a binary file which, like SVN's wc.db, acts as an index of the tracked files. Each entry holds inode-like file information (mode, size, last modification time) followed by the length and the name of the file.
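Once you know the layout, parsing dirstate is straightforward. The sketch below assumes the classic (version 1) dirstate format. That format opens with a 40-byte header holding the two parent changeset IDs. Each file then gets a big-endian record: a one-byte state, then 32-bit mode, size, mtime and filename length, followed by the filename itself. parse_dirstate is a hypothetical helper, not code taken from hg-decode.

```python
import struct

HEADER_SIZE = 40                 # two 20-byte parent changeset IDs
ENTRY = struct.Struct(">cllll")  # state, mode, size, mtime, filename length

def parse_dirstate(data):
    """Yield (state, mode, size, mtime, filename) for each dirstate v1 entry.

    state is one of n (normal), a (added), r (removed) or m (merged).
    """
    pos = HEADER_SIZE
    while pos + ENTRY.size <= len(data):
        state, mode, size, mtime, length = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        name = data[pos:pos + length]
        pos += length
        # Copies are recorded as b"name\0source"; keep only the name
        name = name.split(b"\0", 1)[0]
        yield state.decode("ascii"), mode, size, mtime, name.decode("utf-8", "replace")
```

Feed it the raw bytes fetched from /.hg/dirstate and the filenames tell you exactly which files to go after next.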

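.hg/store/data is our target: it stores the revision history of every file in the repository, from which a pristine copy of each file can be recovered. Each file's history is kept in a file named after the original with a .i extension appended (so web.config becomes web.config.i). The .i file combines binary index entries with the revision data, which is deflated (zlib-compressed) to save space.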
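Pulling a file back out of its .i file can be sketched as follows. This assumes the RevlogNG format with inline data, where each 64-byte index entry is immediately followed by that revision's compressed chunk. It only returns revision 0, which is always stored as a full text, so for a file edited since its first commit you get the original version rather than the current one. Later revisions are stored as deltas against earlier ones. Files too large to store inline keep their data in a separate .d file. Neither case is handled here. The function names are hypothetical.

```python
import struct
import zlib

# RevlogNG index entry (64 bytes): offset+flags, compressed length,
# uncompressed length, base rev, link rev, parent 1, parent 2, node ID, padding
INDEX_ENTRY = struct.Struct(">Qiiiiii20s12x")
REVLOGNG = 1
FLAG_INLINE_DATA = 1 << 16

def decompress_chunk(chunk):
    """Undo revlog chunk compression based on the chunk's first byte."""
    if not chunk:
        return chunk
    marker = chunk[:1]
    if marker == b"x":   # zlib stream (0x78 is the usual zlib header byte)
        return zlib.decompress(chunk)
    if marker == b"u":   # stored uncompressed behind a 'u' marker
        return chunk[1:]
    if marker == b"\0":  # data beginning with a NUL byte is stored as-is
        return chunk
    raise ValueError("unsupported chunk type %r" % marker)

def read_inline_revlog(data):
    """Yield (rev, base_rev, chunk_text) for each revision of an inline revlog."""
    # The first 4 bytes of the file double as the version/flags header
    (header,) = struct.unpack_from(">I", data, 0)
    if (header & 0xFFFF) != REVLOGNG or not (header & FLAG_INLINE_DATA):
        raise ValueError("not an inline RevlogNG file")
    pos, rev = 0, 0
    while pos + INDEX_ENTRY.size <= len(data):
        _, comp_len, _, base, _, _, _, _ = INDEX_ENTRY.unpack_from(data, pos)
        pos += INDEX_ENTRY.size
        # Inline revlogs store each revision's data right after its index entry
        yield rev, base, decompress_chunk(data[pos:pos + comp_len])
        pos += comp_len
        rev += 1

def first_revision(data):
    """Return the full text of revision 0, which is never stored as a delta."""
    for _rev, _base, text in read_inline_revlog(data):
        return text
    return None
```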

So, TL;DR: if you know the format of the binary files, and somebody is dopey enough to leave the .hg directory in place on a web root, you can extract pretty much all the files in the repository.
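One catch when going from a dirstate filename to its .i file is that Mercurial's store does not always keep paths verbatim. The sketch below covers the main rules of the store filename encoding as I understand them:

- capital letters become an underscore plus the lowercase letter;
- underscores are doubled;
- control characters, non-ASCII bytes and Windows-hostile characters become ~XX escapes;
- a leading or trailing dot or space in a path component is escaped;
- directories ending in .hg, .i or .d get an extra .hg.

Treat it as an assumption-laden approximation. It skips the escaping of Windows reserved names such as aux and con. It also skips the hashed encoding used for very long paths. Older repositories without the dotencode requirement do not escape leading dots. store_path and encode_chars are hypothetical names.

```python
# Bytes the store escapes as ~XX: control chars, 126-255 and Windows-hostile chars
RESERVED = set(range(32)) | set(range(126, 256)) | {ord(c) for c in '\\:*?"<>|'}

def encode_chars(path):
    """Escape capitals, '_' and reserved bytes the way the store does (sketch)."""
    out = []
    for b in path.encode("utf-8"):
        if b == ord("_") or ord("A") <= b <= ord("Z"):
            out.append("_" + chr(b).lower())
        elif b in RESERVED:
            out.append("~%02x" % b)
        else:
            out.append(chr(b))
    return "".join(out)

def store_path(filename):
    """Map a working-copy filename to its revlog path under .hg/store/."""
    path = "data/" + filename + ".i"
    # Directories whose names end in .hg, .i or .d get an extra .hg suffix
    for suffix in (".hg/", ".i/", ".d/"):
        path = path.replace(suffix, suffix[:-1] + ".hg/")
    parts = encode_chars(path).split("/")
    for i, part in enumerate(parts):
        if part[:1] in (".", " "):
            # dotencode: a leading '.' or space becomes ~2e / ~20
            part = "~%02x" % ord(part[0]) + part[1:]
        if part[-1:] in (".", " "):
            # a trailing '.' or space is escaped the same way
            part = part[:-1] + "~%02x" % ord(part[-1])
        parts[i] = part
    return "/".join(parts)
```

Under these assumptions, a .htaccess in the web root would live at .hg/store/data/~2ehtaccess.i.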

So, without further ado, here's a tool to do this for us: hg-decode (I really can name those tools, can't I?). Usage is primitive, as I'm hoping somebody with more time and enthusiasm can rewrite it into something more useful:

hg-decode localhost:8080

Internally, the argument is mapped to the base URL http://argv[0]/, so the example above targets http://localhost:8080/.
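A minimal Python equivalent of that URL handling might look like the following. base_url, revlog_url and fetch are hypothetical names, not the functions in hg-decode.pl. The target is taken as host[:port] exactly as in the usage line above, and the download uses urllib with no error handling.

```python
import urllib.request
from urllib.parse import quote

def base_url(target):
    """Map the command-line target (host[:port]) to http://target/."""
    return "http://%s/" % target

def revlog_url(target, store_path):
    """URL of a file under .hg/store/ on the target, percent-encoded as needed."""
    return base_url(target) + ".hg/store/" + quote(store_path)

def fetch(url, timeout=10):
    """Download url and return the response body as bytes (no error handling)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, fetch(base_url("localhost:8080") + ".hg/dirstate") would grab the dirstate, and revlog_url("localhost:8080", "data/web.config.i") gives the URL of the revlog for web.config.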

hg-decode can be downloaded from here: hg-decode.pl

Next up: git!