Content management systems and blogging platforms such as Joomla, WordPress, and Drupal make starting a new blog or website simple, and they’re relatively common in a shared hosting environment or even an enterprise network. All systems have their own challenges in terms of installation, configuration, and patch management, and these CMS suites are no exception. When an overworked sysadmin or a hapless web developer doesn’t follow all security and installation procedures, the web server can become easy pickings for an attacker.
Because we can download any open source web application and locally determine its file and directory structure, we can create a purpose-built scanner that can hunt for all files that are reachable on the remote target. This can root out leftover installation files, directories that should be protected by .htaccess files, and other goodies that can assist an attacker in getting a toehold on the web server. This project also introduces you to using Python Queue objects, which allow us to build a large, thread-safe queue of items and have multiple threads pick items for processing. This will allow our scanner to run very rapidly. Let’s open web_app_mapper.py and enter the following code:
import Queue
import threading
import os
import urllib2

threads = 10

target = "http://www.blackhatpython.com"
directory = "/Users/justin/Downloads/joomla-3.1.1"
filters = [".jpg", ".gif", ".png", ".css"]

os.chdir(directory)
We begin by defining the remote target website and the local directory into which we have downloaded and extracted the web application. We also create a simple list of file extensions that we are not interested in fingerprinting.
web_paths = Queue.Queue()
The extension filter list can differ depending on the target application. The web_paths variable is our Queue object, where we will store the paths of the files that we’ll attempt to locate on the remote server.
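To make the Queue behavior concrete, here is a minimal sketch of filling a queue and draining it again. It is written for Python 3, where the Python 2 Queue module is named queue, and the helper names fill_queue and drain_in_order are ours for illustration only; they are not part of web_app_mapper.py.

```python
import queue  # Python 3 name for the Python 2 "Queue" module


def fill_queue(items):
    """Put every item on a new FIFO queue and return it."""
    q = queue.Queue()
    for item in items:
        q.put(item)
    return q


def drain_in_order(q):
    """Remove items one by one; a Queue hands them back first in, first out."""
    drained = []
    # Checking empty() before get() is safe here only because a single
    # thread is consuming the queue.
    while not q.empty():
        drained.append(q.get())
    return drained


if __name__ == "__main__":
    web_paths = fill_queue(["/README.txt", "/htaccess.txt", "/index.php"])
    print(drain_in_order(web_paths))
```

Items come back in the order they were added, which is why the scanner works through the application's files roughly in the order os.walk discovered them.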
for r, d, f in os.walk("."):
    for files in f:
        remote_path = "%s/%s" % (r, files)
        if remote_path.startswith("."):
            remote_path = remote_path[1:]
        # splitext returns a (root, extension) tuple; compare only the extension
        if os.path.splitext(files)[1] not in filters:
            web_paths.put(remote_path)
We then use the os.walk function to walk through all of the files and directories in the local web application directory. As we walk through the files and directories, we’re building the full path to the target files and testing them against our filter list to make sure we are only looking for the file types we want. For each valid file we find locally, we add it to our web_paths Queue.
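The walk-and-filter step can also be tried on its own, away from the scanner. The following is a sketch under a few assumptions: it uses Python 3, the function name collect_web_paths is hypothetical, and the demo builds a throwaway directory with made-up file names rather than a real Joomla tree.

```python
import os
import tempfile


def collect_web_paths(root, skip_extensions):
    """Return remote-style paths ("/dir/file.ext") for the files under root."""
    paths = []
    for current_dir, _subdirs, filenames in os.walk(root):
        for name in filenames:
            # splitext gives (root, extension); only the extension is compared.
            extension = os.path.splitext(name)[1].lower()
            if extension in skip_extensions:
                continue
            relative = os.path.relpath(os.path.join(current_dir, name), root)
            # Use forward slashes so the result is usable in a URL on any OS.
            paths.append("/" + relative.replace(os.sep, "/"))
    return sorted(paths)


if __name__ == "__main__":
    # Build a tiny stand-in for an extracted web application.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "administrator"))
        for rel in ("README.txt", "logo.png",
                    os.path.join("administrator", "index.php")):
            open(os.path.join(root, rel), "w").close()
        print(collect_web_paths(root, {".jpg", ".gif", ".png", ".css"}))
```

Passing the root directory as an argument, instead of calling os.chdir first, is a design choice made for this sketch so that the function leaves the process's working directory alone.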
def test_remote():
    while not web_paths.empty():
        path = web_paths.get()
        url = "%s%s" % (target, path)

        request = urllib2.Request(url)

        try:
            response = urllib2.urlopen(request)
            content = response.read()
On each iteration of the loop, we grab a path from the Queue, add it to the target website’s base path, and then attempt to retrieve it.
            print "[%d] => %s" % (response.code, path)
            response.close()
If we’re successful in retrieving the file, we output the HTTP status code and the full path to the file.
        except urllib2.HTTPError as error:
            # print "Failed %s" % error.code
            pass
If the file is not found or is protected by an .htaccess file, this will cause urllib2 to throw an error, which we handle so the loop can continue executing.
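The request-and-handle pattern can be isolated into a small helper. This is a sketch rather than the script's own code: it targets Python 3, where urllib2 is split into urllib.request and urllib.error, and the probe function, the fake openers, and the target.example URL are all hypothetical stand-ins so the example can run without touching a real server. It also catches URLError, which the script above does not, to cover the case where no server answers at all.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=5):
    """Return the HTTP status code for url, or None if no server answered."""
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403 or 404.
        # HTTPError subclasses URLError, so it has to be caught first.
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar: no HTTP response.
        return None
    try:
        return response.status
    finally:
        response.close()


# Hypothetical stand-ins for urlopen so the sketch can be tried offline.
class FakeResponse:
    status = 200

    def close(self):
        pass


def fake_found(url, timeout=None):
    return FakeResponse()


def fake_missing(url, timeout=None):
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


def fake_unreachable(url, timeout=None):
    raise urllib.error.URLError("connection refused")


if __name__ == "__main__":
    for opener in (fake_found, fake_missing, fake_unreachable):
        print(opener.__name__, probe("http://target.example/README.txt", opener))
```

Returning the status code for error responses, instead of discarding it, makes it possible to tell a missing file (404) from a protected one (403) if that distinction matters for a given target.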
for i in range(threads):
    print "Spawning thread: %d" % i
    t = threading.Thread(target=test_remote)
    t.start()
Looking at the bottom of the script, we are creating a number of threads (as set at the top of the file) that will each call the test_remote function. The test_remote function operates in a loop that will keep executing until the web_paths Queue is empty.
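One subtlety of this pattern is worth a sketch. When several threads check empty() and then call get(), two of them can see the same last item; the loser then blocks in get() indefinitely. A variant that avoids this uses get_nowait() and treats the Empty exception as the signal to stop. The code below is an illustration in Python 3, not the script itself; the scan function and its check callback are hypothetical names, and the callback stands in for the real HTTP request.

```python
import queue
import threading


def scan(paths, check, thread_count=10):
    """Run check(path) for every path using thread_count worker threads.

    check is any callable returning a status code, or None for "no hit".
    """
    pending = queue.Queue()
    for path in paths:
        pending.put(path)

    hits = []
    hits_lock = threading.Lock()

    def worker():
        while True:
            try:
                # get_nowait() never blocks: an empty queue raises Empty.
                path = pending.get_nowait()
            except queue.Empty:
                return
            status = check(path)
            if status is not None:
                with hits_lock:
                    hits.append((status, path))

    workers = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()  # wait for every worker before reading the results
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Stand-in check: pretend only .txt files exist on the target.
    def pretend(path):
        return 200 if path.endswith(".txt") else None

    print(scan(["/README.txt", "/index.php", "/LICENSE.txt"], pretend, 3))
```

Collecting results in a list and joining the threads, rather than printing from each worker, keeps the output from interleaving and lets the caller post-process the hits.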
Let’s Check Our Code
For testing purposes, I installed Joomla 3.1.1 into my Kali VM, but you can use any open source web application that you can quickly deploy or that you have running already. When you run web_app_mapper.py, you should see output like the following:
Spawning thread: 0
Spawning thread: 1
Spawning thread: 2
Spawning thread: 3
Spawning thread: 4
Spawning thread: 5
Spawning thread: 6
Spawning thread: 7
Spawning thread: 8
Spawning thread: 9
[200] => /htaccess.txt
[200] => /web.config.txt
[200] => /LICENSE.txt
[200] => /README.txt
[200] => /administrator/cache/index.html
[200] => /administrator/components/index.html
[200] => /administrator/components/com_admin/controller.php
[200] => /administrator/components/com_admin/script.php
[200] => /administrator/components/com_admin/admin.xml
[200] => /administrator/components/com_admin/admin.php
[200] => /administrator/components/com_admin/helpers/index.html
[200] => /administrator/components/com_admin/controllers/index.html
[200] => /administrator/components/com_admin/index.html
[200] => /administrator/components/com_admin/helpers/html/index.html
[200] => /administrator/components/com_admin/models/index.html
[200] => /administrator/components/com_admin/models/profile.php
[200] => /administrator/components/com_admin/controllers/profile.php
You can see that we are picking up some valid results including some .txt files and XML files. Of course, you can build additional intelligence into the script to only return files you’re interested in—such as those with the word install in them.
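As one way to add that kind of intelligence, the paths could be passed through a keyword filter before they are reported. This is a small sketch in Python 3; the function name interesting and the example paths are hypothetical, chosen only to show the matching.

```python
def interesting(paths, keywords=("install",)):
    """Keep only the paths that contain one of the keywords, ignoring case."""
    lowered = tuple(keyword.lower() for keyword in keywords)
    return [path for path in paths
            if any(keyword in path.lower() for keyword in lowered)]


if __name__ == "__main__":
    found = ["/README.txt", "/installation/index.php", "/INSTALL.txt"]
    print(interesting(found))
```

The same function could be applied either before queuing the paths, to reduce the number of requests, or after the scan, to trim the output.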