Why blindly piping a script into Bash is a bad idea

I’m a regular listener of Steve Gibson’s Security Now! podcast. The last several episodes (557, 558, and 559) have discussed the security implications of piping a script into Bash (or some other script interpreter, such as Ruby or Python). For example, the install instructions for Homebrew, a fantastic package manager for Macs that I use on my work laptop, offer a really slick one-liner for installation:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

This idiom is so common that there’s an entire Tumblr site showing various examples.

So, why is this a bad idea?

Well, let’s take a step back. In the above Homebrew example, you can view the content of the script easily just by visiting https://raw.githubusercontent.com/Homebrew/install/master/install. You can read through the script, clear as day, in your browser to verify that there’s nothing fishy going on. Looks great, right?

Not so fast! Homebrew is hosted by GitHub, which we can trust. However, a number of other random scripts aren’t hosted by sites that we might inherently trust (just look at the Tumblr site for examples). If a script is hosted by a malicious site, it turns out they can easily trick the user into thinking the script is ok, when it’s really not. I’ve cooked up a simple example to show this.

Suppose I wrote a script that you might want to install:

bash <(curl -s https://research.gfairchild.com/bash_pipe/script.sh)

If you open the script in your browser (https://research.gfairchild.com/bash_pipe/script.sh), you can see that it's clearly a friendly script, right? Wrong! If you actually run the script in bash, you get this:

$ bash <(curl -s https://research.gfairchild.com/bash_pipe/script.sh)
I am a bad script! :(

What happened?

I created a single simple PHP file, script.php. I also set up an .htaccess file to rewrite the URL so that it really does look like you're downloading a bash script. The magic happens in script.php:

<?php
    if(substr($_SERVER['HTTP_USER_AGENT'], 0, 4) === 'curl' or
       substr($_SERVER['HTTP_USER_AGENT'], 0, 4) === 'Wget')
        echo 'echo "I am a bad script! :("';
    else
        echo 'echo "I am a good script! :)"';
?>

Here's the .htaccess file:

RewriteEngine On
RewriteRule ^script\.sh$ script.php [NC]

Here, all I do is detect the HTTP request's user agent. In short, the user agent is tacked on to most HTTP requests and tells the server how it is being contacted. For example, I'm running Firefox 46, and my user agent string is this:

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0

curl and wget have their own user agents. My system curl and wget user agents are curl/7.35.0 and Wget/1.15 (linux-gnu), respectively. PHP makes it really easy to check the user agent, so all I have to do is check to see if the user is using curl or wget. If they are, I send them the malicious script. If they aren't using curl or wget, I show them the good script. That way, if a user checks the script in their browser but doesn't check it once it's downloaded using curl, my malicious attempt is successful!
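One way to catch this kind of user agent trick is to request the same URL with more than one user agent and compare what comes back. Here's a minimal Python sketch of that idea. It isn't part of the demo above: the function names (http_fetch, served_differently) are made up for illustration, and the two user agent strings are simply the ones quoted in this post.

```python
import urllib.request

# User agent strings quoted in the post; any browser or curl string would do.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0"
CURL_UA = "curl/7.35.0"


def http_fetch(url, user_agent):
    """Download url while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read()


def served_differently(url, fetch=http_fetch, user_agents=(BROWSER_UA, CURL_UA)):
    """Return True if the server sends different bytes to different user agents."""
    # fetch is injectable (hypothetical parameter) so the check can be tried offline
    bodies = {fetch(url, agent) for agent in user_agents}
    return len(bodies) > 1
```

A result of False doesn't prove a script is safe, since a server can key on things other than the user agent. Reviewing the file you actually downloaded is still the real check.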

So what do I do?

These sorts of 1-liner scripts aren't inherently bad. Installing Homebrew without their install script would be a nightmare! So what do you do?

Simple: download the file using curl or wget, inspect it on the machine it's going to be installed on, and then run the script. Don't take your browser's word for it. Keep in mind that the script might need to download or install more things (perhaps the script itself curls a file), so thoroughly vetting a script can turn into a small rabbit hole. But it's doable. In the case of Homebrew, you'd do this:

$ wget https://raw.githubusercontent.com/Homebrew/install/master/install
$ vim install  # review the install script for any malicious content
$ /usr/bin/ruby install

Ultimately, as Steve Gibson says, we have to trust someone or else we'll never use a computer. However, it's important to recognize when attacks are relatively easy and prioritize spending time analyzing those situations. Obviously, I'm not going to break open my CPU and use a scanning electron microscope to ensure it's doing what it says it is. But there are small, relatively quick things we can do to help maintain a secure environment, and not blindly piping scripts into Bash is one of those things.

Converting a Django ForeignKey to a GenericForeignKey

I’m currently working on a Django project called the SWAP that contains lots of disease outbreak time series data. Previously, the model looked something like this (I’m using Django 1.9 with Python 3.4+):

from django.db import models
 
class Outbreak(models.Model):
    ...
 
class TimeStep(models.Model):
    outbreak = models.ForeignKey(Outbreak)
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count else '[empty]')

We’re working on incorporating disease spread forecasts into the project. Forecasts actually require the same sort of time series infrastructure that outbreaks do. It seemed wasteful to create a new TimeStep class just for forecasts, so I started doing some research and quickly stumbled upon generic relations.

Django’s documentation on this topic is somewhat lacking, so it took a good amount of digging around to figure out how to structure things and migrate my existing models. The following resources were really useful in my search:

The last link was the most useful because it discussed how to create the migrations; however, it was written for South, which is now outdated because South has been absorbed into Django as the Migrations framework. I’m writing this post to describe how I migrated a model using a ForeignKey to a GenericForeignKey.

1. Add the necessary fields

Two primary fields are needed in the TimeStep class, content_type and object_id:

from django.contrib.contenttypes.models import ContentType
from django.db import models
 
class TimeStep(models.Model):
    outbreak = models.ForeignKey(Outbreak)
 
    content_type = models.ForeignKey(ContentType, null=True, blank=True)
    object_id = models.PositiveIntegerField(null=True, blank=True)
 
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count else '[empty]')

A GenericForeignKey is essentially a tuple, (content_type, object_id). The content type and object ID are all Django needs to perform a lookup. We’ll actually add the GenericForeignKey field later.
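To make the tuple idea concrete, here's a tiny plain-Python sketch of the lookup. It is not Django code: the TABLES dictionary and the resolve function are hypothetical stand-ins for database tables and for what Django does behind the scenes, and the data is made up.

```python
# Stand-in "tables": model name -> {primary key -> row}. All data here is made up.
TABLES = {
    "outbreak": {1: "Outbreak #1", 2: "Outbreak #2"},
    "forecastseries": {1: "ForecastSeries #1"},
}


def resolve(content_type, object_id):
    """Follow a generic reference: pick the table first, then the row."""
    return TABLES[content_type][object_id]


# The same object_id points at different objects depending on content_type.
print(resolve("outbreak", 1))        # Outbreak #1
print(resolve("forecastseries", 1))  # ForecastSeries #1
```

This is why object_id alone isn't enough: without content_type, a value of 1 could refer to either table.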

Note that null=True, blank=True is added to the new fields temporarily so that the migration step doesn’t complain about them being blank. We’ll remove those requirements later.

Now, make and run the migration:

./manage.py makemigrations
./manage.py migrate

2. Populate the new fields using a data migration

The fields now exist, but they’re empty. We need to create a data migration. To do this, use the following command to create an empty migration, replacing swap with your app’s name:

./manage.py makemigrations --empty swap

For me, this created a new file under swap/migrations/ called 0039_auto_20160307_1408.py. The final migration looks like this:

# -*- coding: utf-8 -*-
# Generated by Django 1.9.2 on 2016-03-07 14:08
from __future__ import unicode_literals
 
from django.db import migrations
 
 
def migrate_foreign_key(apps, schema_editor):
    """
        Data migration to populate the GenericForeignKey fields.
    """
    TimeStep = apps.get_model('swap', 'TimeStep')
    ContentType = apps.get_model('contenttypes', 'ContentType')
 
    outbreak_content_type = ContentType.objects.get(app_label='swap', model='outbreak')
 
    for timestep in TimeStep.objects.all():
        timestep.content_type = outbreak_content_type
        # outbreak_id is the raw foreign key value; reading it avoids an extra query per row
        timestep.object_id = timestep.outbreak_id
        timestep.save()
 
 
class Migration(migrations.Migration):
 
    dependencies = [
        ('swap', '0038_auto_20160307_1408'),
    ]
 
    operations = [
        migrations.RunPython(migrate_foreign_key),
    ]

migrate_foreign_key is a very simple function that properly specifies the content_type of all TimeStep objects as an Outbreak. It then copies each outbreak’s primary key into the object_id.

Note that the call to apps.get_model is necessary. You cannot just import your model and use it. As the official docs say, this is so that we get the correct versioned model in this context.
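As written, the RunPython operation can't be unapplied, because no reverse function is given. If you want to be able to roll back, RunPython accepts a second callable. Here's a minimal sketch of one, assuming the same swap app label as above; the name reverse_foreign_key is made up for illustration.

```python
def reverse_foreign_key(apps, schema_editor):
    """
        Undo the data migration by clearing the generic fields again.
        This works at this stage because both fields are still nullable.
    """
    TimeStep = apps.get_model('swap', 'TimeStep')
    # a single UPDATE statement; the original outbreak ForeignKey is untouched
    TimeStep.objects.update(content_type=None, object_id=None)


# In the Migration class, the operation would then read:
#     migrations.RunPython(migrate_foreign_key, reverse_foreign_key)
```

Nothing is lost by clearing the fields here, since the old outbreak ForeignKey still holds the original relationship at this point.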

Run the migration:

./manage.py migrate

3. Cleanup

Finally, I need to do 3 things:

  1. Remove the old ForeignKey
  2. Add the new GenericForeignKey
  3. Remove the null and blank requirements

After I do these 3 things, my TimeStep model now looks like this:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
 
class TimeStep(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
 
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count else '[empty]')

Finally, make and run the migrations:

./manage.py makemigrations
./manage.py migrate

Running makemigrations may cause Django to prompt you for a default value for the content_type and object_id fields, because they are no longer nullable. If it does, there should be an option to skip the default because existing rows have already been handled (here, by the data migration); select that option.

I added a couple new methods to TimeStep to make querying easier:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
 
class Outbreak(models.Model):
    ...
 
class ForecastSeries(models.Model):
    ...
 
class TimeStep(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
 
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)  # a null count lets the analyst mark the start/end of the interval

    class Meta:
        ordering = ['timestamp']

    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count else '[empty]')

    @staticmethod
    def get_outbreak_timesteps(outbreak):
        """
            Get a particular Outbreak's time series. You *cannot* do `TimeStep.objects.filter(outbreak=o)` because this class
            uses a GenericForeignKey.
        """
        # look the content type up here rather than in the class body, which would query the database at import time;
        # get_for_model caches its result, so repeated calls are cheap
        return TimeStep.objects.filter(content_type=ContentType.objects.get_for_model(Outbreak), object_id=outbreak.pk)

    @staticmethod
    def get_forecast_series_timesteps(forecast_series):
        """
            Get a particular ForecastSeries' time series.
        """
        return TimeStep.objects.filter(content_type=ContentType.objects.get_for_model(ForecastSeries), object_id=forecast_series.pk)

Because I can no longer do TimeStep.objects.filter(outbreak=o), querying is a tad more complex, but it doesn’t have to be. These methods allow me to do TimeStep.get_outbreak_timesteps(o) or TimeStep.get_forecast_series_timesteps(f) to pull a particular model’s time series.

That’s it! Nice and easy. Let me know if you find this useful or have any questions.

Installing the requirements for Pillow 3 on Debian

I’m installing Mezzanine, a CMS written in Django, for a project I’m working on. Mezzanine requires Pillow, an imaging library for Python, and Pillow in turn requires or recommends a number of libraries. It took me a little while to figure out how to get (mostly) everything working on Debian 8.2 (Jessie). Here’s a command to install all of Pillow’s requirements in one fell swoop:

sudo aptitude install libjpeg62-turbo-dev libopenjpeg-dev libfreetype6-dev libtiff5-dev liblcms2-dev libwebp-dev tk8.6-dev

When I run pip install -v pillow, I see this in the output:

PIL SETUP SUMMARY
--------------------------------------------------------------------
version  Pillow 3.0.0
platform linux 3.4.2 (default, Oct  8 2014, 10:45:20)
 [GCC 4.9.1]
--------------------------------------------------------------------
*** TKINTER support not available
--- JPEG support available
*** OPENJPEG (JPEG2000) support not available
--- ZLIB (PNG/ZIP) support available
--- LIBTIFF support available
--- FREETYPE2 support available
--- LITTLECMS2 support available
--- WEBP support available
--- WEBPMUX support available
--------------------------------------------------------------------
To add a missing option, make sure you have the required
library, and set the corresponding ROOT variable in the
setup.py script.

Unfortunately, it doesn’t seem like Pillow recognizes TCL/TK (despite the fact that I installed tk8.6-dev, which includes tcl8.6-dev), so I can’t get Tkinter support working. I also installed OpenJPEG via libopenjpeg-dev, but the version in Debian seems to be too old:

$ aptitude show libopenjpeg-dev
Package: libopenjpeg-dev                 
State: installed
Automatically installed: no
Multi-Arch: same
Version: 1:1.5.2-3
Priority: extra
Section: libdevel
Maintainer: Debian PhotoTools Maintainers <pkg-phototools-devel@lists.alioth.debian.org>
Architecture: amd64
Uncompressed Size: 111 k
Depends: libopenjpeg5 (= 1:1.5.2-3)
Description: development files for OpenJPEG, a JPEG 2000 image library - dev
 OpenJPEG is a library for handling the JPEG 2000 image compression format. JPEG 2000 is a wavelet-based image compression standard and permits progressive transmission by pixel and resolution accuracy for progressive downloads of an encoded image. It supports lossless and lossy compression, supports higher compression than JPEG 1991, and has resilience to
 errors in the image. 
 
 This is the development package
Homepage: http://www.openjpeg.org

Tags: devel::library, role::devel-lib

The Pillow docs state that OpenJPEG versions 2.0.0 and 2.1.0 are supported, so Debian’s 1.5.2 (packaged as 1:1.5.2-3) must be too old.
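The Debian version string is easy to misread: in 1:1.5.2-3, the leading 1: is the packaging epoch and the trailing -3 is the Debian revision, so the upstream OpenJPEG version is 1.5.2. Here's a small Python sketch of that comparison. The upstream_version helper is made up for illustration and only handles purely numeric dotted versions like the one shown here.

```python
def upstream_version(debian_version):
    """Extract the upstream part of a Debian version: [epoch:]upstream[-revision]."""
    without_epoch = debian_version.split(":", 1)[-1]
    upstream = without_epoch.rsplit("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


installed = upstream_version("1:1.5.2-3")  # version reported by aptitude above
minimum = (2, 0, 0)                        # lowest OpenJPEG version cited from the Pillow docs
print(installed, installed >= minimum)     # (1, 5, 2) False
```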

If someone knows how to get Tkinter or OpenJPEG working, please let me know in the comments! I don’t think it’ll matter much in the end, but it’d be nice to have all of Pillow’s functionality available.

UPDATE: I was able to get Tkinter working with the help of the Pillow devs. Issue #1473 has the full discussion, but the main takeaway is that I had to install python3-tk, which enables Tkinter support.

Additionally, the Pillow docs actually contain a Building on Linux section that I missed before. It more or less echoes what I lay out in this blog post. This is the final command I had to use:

sudo aptitude install libjpeg62-turbo-dev libopenjpeg-dev libfreetype6-dev libtiff5-dev liblcms2-dev libwebp-dev tk8.6-dev python3-tk

Unfortunately, OpenJPEG still isn’t supported, but that’s just because Pillow requires a newer version than is contained in the Debian repos; build it from source if you need it, and you should be good to go.

yelpapi updated with Phone Search API support

Recently, Yelp added a new API, the Phone Search API, which lets the user look up businesses by phone number. I just finished adding Phone Search API support to my yelpapi Python project. After looking around, I couldn’t find any other Yelp v2.0 API implementation that supports the Phone Search API yet. It also appears as though all the other Yelp API implementations still rely on pre-defined classes to represent search results, so I’m sure it’ll be a while before they add support for the new API.

Call me Dr. Geoff

I don’t often write personal blog entries, but this warranted it. As of just a couple weeks ago, I am officially not a student anymore. I am not a student. I’ve been a student for, what, 25 years straight? To suddenly not be a student and have the freedoms (and salary) that come with that is jarring. And to top it off, not only am I not a student, but I now have a Ph.D. People can call me Dr. Geoff.

Here are some stats on my dissertation, titled Improving Disease Surveillance: Sentinel Surveillance Network Design and Novel Uses of Wikipedia:

  • page count: 151
  • word count: 34,573
  • character count (with spaces): 222,941
  • number of references: 198
  • number of tables: 10
  • number of figures: 16

The dissertation will be posted on The University of Iowa’s Institutional Repository some time soon. It’ll be open access. I’m really proud of it. Once it’s published, I’ll post a link here in case anyone wants to read it.

My defense couldn’t have gone better. All the publicity our Global Disease Monitoring and Forecasting with Wikipedia paper has gotten, which just so happens to be chapter 3 of my dissertation, couldn’t have been timed better. I may be a little biased, but chapter 4 of my dissertation, which uses some natural language processing techniques to elicit disease information from article content, is pretty damn cool stuff too. That paper should be submitted in about a month.

I’ll be sticking around Los Alamos National Laboratory (LANL). I’ve become quite fond of this place. I work with some amazingly talented people on some extremely cool work. I mean, we did a Reddit AMA that hit the front page! Besides that, LANL is located in a really neat little town that suits me perfectly; it has one of the best climates I’ve ever experienced, and it’s great for biking in the summer and snowboarding in the winter.

Overall, I feel like a tremendous stress has been lifted from my shoulders. I can now work with fewer distractions and more tenacity. Perhaps more importantly, I no longer feel guilty for doing fun things in my off time. Grad school has this inherent ability to make you feel guilty when you’re not working. It’s certainly nothing my advisor (Alberto Segre) or LANL mentor (Sara Del Valle) pushed on me; it’s just something all grad students feel. I’ve always maintained that it’s incredibly important to separate work from life, but when you’re in grad school, that’s often easier said than done.

During the decompression phase after my defense, I realized that I’ve never gone on a vacation. Sure, I’ve done little weekend snowboarding trips or backpacking trips, but I’ve never taken a real vacation. How could I? I’ve been a student practically since I was born! In the short term, I’m going to be snowboarding a lot. I’m going on a cruise with my sister and some good friends in late January. I want to travel a lot next summer; I’m thinking about a long motorcycle trip with my buddy Rajeev.

Whatever happens, I am done with school. Forever. Here’s to the next phase of life!

How to fix the Home/End/Page Up/Page Down keys for OS X terminal and vim

People all over the internet complain about Apple’s (incorrect) mapping of the Home, End, Page Up, and Page Down keys. I spend a lot of time in the terminal and in vim, and it’s important to me that these keys function properly. Here’s what I needed to do in order to get these keys working properly in the terminal and in vim:

  1. Open up the terminal preferences.
  2. Go to the Settings tab, and select the desired profile.
  3. Go to the profile’s Keyboard tab.
  4. Add (or edit) the Home key so that it sends this text to the shell: \033OH
  5. Add (or edit) the End key so that it sends this text to the shell: \033OF
  6. Add (or edit) the Page Up key so that it sends this text to the shell: \033[5~
  7. Add (or edit) the Page Down key so that it sends this text to the shell: \033[6~

There are some other commonly recommended sequences (e.g., \033[1~ instead of \033OH), but the sequences above are the only sequences I’ve found that work in both the terminal and vim.
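If the \033 notation looks cryptic: it's octal for the ESC control character (decimal 27), which starts each of these escape sequences. Here's a small Python sketch that spells the four sequences out as raw bytes. The KEY_SEQUENCES name is made up for illustration.

```python
# "\033" is octal notation for the ESC control character (decimal 27, hex 1b).
KEY_SEQUENCES = {
    "Home": "\033OH",
    "End": "\033OF",
    "Page Up": "\033[5~",
    "Page Down": "\033[6~",
}

for key, sequence in KEY_SEQUENCES.items():
    # show each sequence as the raw bytes the terminal sends to the shell
    print(key, sequence.encode("ascii"))
```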

How to change a remote repository URL in Git

I just ran into a situation where I needed to change a remote URL for a personal repository in Git. The project lived on a server at work, but I’m going out of town for several weeks starting tomorrow. I need this project, and unfortunately, I can’t access it from home due to the work firewall. What I decided to do is just move the repo to my personal server for now. Here’s how I did it (if it’s not obvious, I work over SSH).

First, I just wanted to see the current configuration:

~/Documents/project> git remote show origin
* remote origin
  Fetch URL: olduser@oldserver.com:/path/to/project.git
  Push  URL: olduser@oldserver.com:/path/to/project.git
  HEAD branch: master
  Remote branch:
    master tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)

Next, I need to SSH into the new server and create a new bare repo into which I’ll push my project. Since I store my git projects in /srv/git, I need to make sure I give the appropriate ownership to the project.

~$ cd /srv/git/
/srv/git$ sudo mkdir project.git
/srv/git$ sudo chown newuser:newuser project.git/
/srv/git$ cd project.git/
/srv/git/project.git$ git init --bare
Initialized empty Git repository in /srv/git/project.git/

The new server is now ready. All that’s left is for me to change the remote repo URL of the project on my local machine and then just push the project to the new server.

~/Documents/project> git remote set-url origin newuser@newserver.com:/srv/git/project.git
~/Documents/project> git push
Counting objects: 37567, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (37556/37556), done.
Writing objects: 100% (37567/37567), 88.91 MiB | 3.76 MiB/s, done.
Total 37567 (delta 4931), reused 0 (delta 0)
To newuser@newserver.com:/srv/git/project.git
 * [new branch]      master -> master

That’s it! All pushes/pulls from now on will happen with the new server. Pretty easy!

eclipse-hasher updated and re-released on GitHub

Several years ago, I released an Eclipse plugin called Hasher. Hasher’s goal is to output values of common hash algorithms (MD5, SHA-1, SHA-256, SHA-384, and SHA-512 right now) of files selected in Eclipse. Hasher had fallen by the wayside and last worked under an early version of Eclipse 3. I recently had a need for it for a personal project and decided to update it. Turns out, quite a bit has changed. Most notably, Eclipse actions are now deprecated in favor of Eclipse commands. Code using commands is much cleaner, but it’s quite a bit different, so I essentially had to rewrite the entire plugin. Also, I had some dependency issues that plagued me for far too long, but thanks to Stack Overflow, I was finally able to get things straightened out.
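For a sense of what the plugin computes, here's a minimal Python sketch that produces the same five digests for a file. This is not Hasher's code (Hasher is a Java Eclipse plugin); the hash_file function and ALGORITHMS tuple are made up for illustration.

```python
import hashlib

# The five algorithms listed above, under the names hashlib uses for them.
ALGORITHMS = ("md5", "sha1", "sha256", "sha384", "sha512")


def hash_file(path, chunk_size=65536):
    """Return {algorithm: hex digest} for one file, reading it in chunks."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as handle:
        # read in fixed-size chunks so large files never sit fully in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The file is read once and every chunk is fed to all five hashers, so adding an algorithm doesn't add another pass over the file.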

Hasher is now live on GitHub (https://github.com/gfairchild/eclipse-hasher), freshly tagged with v1.2. One of the things I learned during this rewrite is that there’s a lack of good examples and documentation out there for modern Eclipse plugins. I’m hopeful that Hasher can be useful to someone wanting to get into writing Eclipse plugins. Hasher is pretty simple right now, but it’s non-trivial (has external dependencies, interacts meaningfully with Eclipse – more than just Hello World). If you find yourself using it to learn, please let me know!

There’s still a to-do list. I want to make the output prettier using a custom view. A tree view or a table view (or perhaps some hybrid) would probably be ideal. I don’t know how to do a custom view yet, though, so that’ll add to the learning process. Also, I want to make use of Eclipse’s Jobs API. Right now, I’m just manually creating a new thread to do computations. This works and leaves the UI free to do its work, but it’s not elegant and doesn’t take advantage of several nice features Eclipse offers for background jobs.

If you use Hasher, let me know what you think!

pyHarmonySearch now supports Python 3+

As promised yesterday, pyHarmonySearch, my open source pure Python implementation of the harmony search algorithm, now fully supports Python 3. As with yelpapi, it was actually a really simple process. Only a few lines of code needed to change.

Also of note, pyHarmonySearch now properly handles KeyboardInterrupt exceptions. pyHarmonySearch uses Python’s multiprocessing.Pool to run multiple searches simultaneously. multiprocessing.Pool doesn’t natively handle KeyboardInterrupt exceptions, so special care must be given to ensure proper termination of the pool. The solution I used comes from this Stack Overflow question.
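For anyone who hasn't run into this, here's a minimal sketch of one common pattern for it. It is not necessarily what pyHarmonySearch does: the run_all function and its parameters are made up for illustration. The idea is that the workers ignore SIGINT so only the parent sees the KeyboardInterrupt, and the parent then terminates the pool.

```python
import multiprocessing
import signal


def run_all(func, inputs, processes=2, timeout=3600, start_method=None):
    """Map func over inputs in a worker pool that shuts down cleanly on Ctrl+C."""
    context = multiprocessing.get_context(start_method)
    # Workers ignore SIGINT, so only the parent process sees KeyboardInterrupt.
    pool = context.Pool(processes, initializer=signal.signal,
                        initargs=(signal.SIGINT, signal.SIG_IGN))
    try:
        # waiting with a timeout keeps the parent responsive to Ctrl+C
        results = pool.map_async(func, inputs).get(timeout)
    except BaseException:
        # KeyboardInterrupt (or any other failure): stop the workers immediately
        pool.terminate()
        raise
    else:
        # no more work is coming; let the workers exit normally
        pool.close()
    finally:
        pool.join()
    return results
```

Something like run_all(abs, [-1, 2, -3]) would run the calls across two workers. On platforms that spawn worker processes instead of forking, the call needs to sit under an if __name__ == "__main__": guard.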