M-Turk Guide

Amazon Mechanical Turk Guide for Social Scientists (updated 29 Jan 2016)
By Michael Buhrmester (buhrmester at gmail dot com)

For our evaluation of MTurk in Perspectives on Psychological Science, see Buhrmester, Kwang, & Gosling (2011) and the accompanying supplement.

Note: This page is meant to help the curious researcher successfully get studies up and running on Mechanical Turk with minimal fuss. I’ve answered a lot of MTurk questions this past year and have tried to condense my answers into the FAQ below. I encourage anyone with any tips/comments/questions to please contact me!

—————————————–

FAQ

Amazon increased their % fee to 40%?!

The best workaround right now is to post your study as HITs with a maximum of 9 assignments (i.e., 9 respondents) each. HITs with fewer than 10 assignments incur only a 20% fee instead of 40%. You can either re-post your 9-person max HIT manually again and again until you hit your desired total N, or you can automate the process by following these steps: https://docs.google.com/presentation/d/1Y_lvecsOefCfkXkkrdFtW1GH-NkoZidKzKXqFVJaLH4/edit#slide=id.p . Turkprime.com also offers this feature.
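
If you'd rather script the re-posting yourself, here is a minimal sketch of the same idea using the AWS SDK for Python (boto3) and its MTurk client. The title, reward, survey URL, and target N below are placeholder assumptions, not values from the slides above:

```python
# Minimal sketch: post a series of 9-assignment HITs until a target N is
# reached, keeping each HIT under the 10-assignment threshold that triggers
# the extra fee. All HIT parameters below are placeholders.
import boto3

TARGET_N = 90      # total respondents you want
BATCH_SIZE = 9     # fewer than 10 assignments keeps the fee at 20%

mturk = boto3.client('mturk', region_name='us-east-1')

# ExternalQuestion pointing at a hypothetical (HTTPS) survey URL.
question_xml = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

for _ in range(TARGET_N // BATCH_SIZE):
    hit = mturk.create_hit(
        Title='5-10 minute survey on self-attitudes',
        Description='Complete a short academic research survey.',
        Keywords='survey, psychology, questionnaire',
        Reward='0.50',                     # dollars, passed as a string
        MaxAssignments=BATCH_SIZE,
        AssignmentDurationInSeconds=3600,  # time allowed per assignment
        LifetimeInSeconds=86400,           # how long the HIT stays posted
        Question=question_xml,
    )
    print('Posted HIT', hit['HIT']['HITId'])
    # In practice, wait for each batch to fill before posting the next,
    # and screen the combined results for duplicate workers.
```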

How large is the participant population on MTurk?

According to Stewart et al. (2015), the answer is about 7,300 workers available to the typical lab at any given time. Here's their report: http://www.ucl.ac.uk/lagnado-lab/publications/harris/StewartEtAl_JDM_MTurk.pdf

Can I screen participants who fit my specific criteria?
Yes (ish). When creating a HIT, click 'advanced' to open the possible 'qualifications' parameters. Currently, MTurk offers six easy-to-use 'qualifications' and a more complicated, user-defined custom qualifications option. With the default options, you can require that, in order to complete your HIT, a participant must be located in a certain country (based on the country they indicated when creating their worker account), have a minimum approval rate (%), and have a minimum total number of approved HITs. Certain workers receive 'Masters' status from MTurk in one of three flavors, but MTurk decides who gets this status and there's no solid information on what qualifies someone to become a Master. Thus, I recommend not using the Masters qualification, because you don't know who you're excluding from your sample pool.
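
If you post HITs through the API rather than the web interface, the same built-in qualifications are expressed as QualificationRequirements. Here's a hedged sketch (boto3 style); the IDs below are MTurk's documented system qualification types for locale, approval rate, and number of HITs approved, but verify them against the current docs before relying on them:

```python
# Sketch: built-in worker qualifications as API QualificationRequirements.
# Pass this list as the QualificationRequirements argument to create_hit().
qualification_requirements = [
    {   # Worker locale: US only
        'QualificationTypeId': '00000000000000000071',
        'Comparator': 'EqualTo',
        'LocaleValues': [{'Country': 'US'}],
    },
    {   # Percent of submitted assignments approved: at least 95%
        'QualificationTypeId': '000000000000000000L0',
        'Comparator': 'GreaterThanOrEqualTo',
        'IntegerValues': [95],
    },
    {   # Total number of HITs approved: at least 100
        'QualificationTypeId': '00000000000000000040',
        'Comparator': 'GreaterThanOrEqualTo',
        'IntegerValues': [100],
    },
]
```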

So what if you want to include only males 18-24, currently pregnant women, or middle-aged men who've recently purchased a convertible? One approach would be to simply ask that only people who fit your criteria participate in your study. The problem, of course, is that people who don't fit your criteria can ignore that request, potentially without consequence. How can this be prevented? One solution that I've found to work is to screen participants yourself. It'll cost a little money, because you'll be paying a small amount to potentially many people who don't fit your criteria, but it will provide you with a sample pool without showing your cards as to the specific population you aim to study. Essentially, you'll want to embed your screening criteria within a number of other questions so the screening items don't look suspicious. For everyone who qualifies, you could 1) instantly give them instructions for how to proceed with the real study (e.g., within SurveyMonkey, use the logic commands to have them continue on to the real survey) or 2) let them know that if they qualify, they'll be contacted via an MTurk message.

Another issue to be aware of is that MTurk workers come from all over the world. If you leave your HIT up overnight (from the US), expect that the vast majority of responses will come from people on the opposite side of the planet. Deciding whom to limit your survey to is obviously important, as is when you have it posted and available.

Can I run longitudinal studies on MTurk?
Yes. However, it is against the MTurk ToS to ask for personally identifying info such as an email address. One option is to contact workers using the "bonus worker" feature. You'll have to award workers at least a penny, but you'll also be able to include a message that they'll receive. You can get to the "bonus worker" feature by clicking "Manage" > Workers and finding the worker ID of the person who completed your wave 1. Alternatively, you can pull up the batch and see all the workers who completed your wave 1 batch at once.
You can also message workers without giving a bonus through the MTurk system's more advanced "command line reference" toolbox. It requires some basic coding know-how to use (see the sketch below).

Another more complicated option can be found here: https://thebehaviorallab.wordpress.com/2014/09/19/how-to-email-multiple-mturk-workers/

Whichever method you choose, you'll want to take steps to ensure that only your wave 1 participants come back for wave 2. One route is to take advantage of the "bonus worker" feature: give each worker a code to enter at the end of wave 2 so you know who completed it successfully, then give the worker their agreed-upon payment in the form of the bonus. The downside of this is that the "contract" or promise to pay is agreed upon through a message rather than through accepting an altogether different HIT. So the second way to go is to create a second HIT and invite workers from wave 1 to complete it for wave 2. You'll want to be explicit in the HIT's description that the HIT is only for participants you've contacted and that people who weren't contacted (i.e., anyone who wasn't in your wave 1) won't be paid. A password/codeword system that you send in the invitation for wave 2 would be useful here.
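
For those comfortable with a little code, here's a minimal sketch of the recontact workflow using boto3. The worker IDs, assignment ID, codeword, and amounts are all placeholders:

```python
# Sketch: invite wave 1 completers back for wave 2 via notify_workers,
# then pay the agreed amount as a bonus once wave 2 is verified.
import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

wave1_workers = ['A1EXAMPLE', 'A2EXAMPLE']  # IDs from your wave 1 batch file

# notify_workers accepts up to 100 worker IDs per call.
mturk.notify_workers(
    Subject='Wave 2 of the study you completed',
    MessageText='Search for the HIT titled "..." and use codeword SWALLOW.',
    WorkerIds=wave1_workers[:100],
)

# Later, after checking a worker's wave 2 completion code, pay the bonus
# against their wave 1 assignment.
mturk.send_bonus(
    WorkerId='A1EXAMPLE',
    AssignmentId='3EXAMPLEASSIGNMENT',
    BonusAmount='1.00',
    Reason='Payment for completing wave 2 of our study. Thank you!',
)
```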

How do I know that participants are paying attention, or even that they're real people and not survey-taking robots?
One method is to include an attention-check item somewhere in your study. Here's a paper describing one such method: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1003424. It's been suggested that the same item (i.e., the one in the paper) not be copied again and again, since reuse could condition workers to recognize it. Get creative and make up your own. For an excellent treatment of the advantages and disadvantages of attention-check items, see: http://onlinelibrary.wiley.com/doi/10.1111/ajps.12081/abstract

Why does it say I have to be from the US to make a requester account?
It's sort of a complicated answer, but there are some potential ways around it. A good discussion can be found here: http://www.behind-the-enemy-lines.com/2010/02/why-mechanical-turk-allows-only-us.html

Is MTurk ethical to use?
There are some ethical concerns about MTurk regarding low payment, with some calling MTurk an online sweatshop. In some respects, these concerns are warranted – getting paid pennies to identify objects in a picture or retrieve the location of some website ad infinitum can look a lot like a sweatshop. One response to these concerns has been to point out that MTurk is a voluntary place to earn money. The investment on the part of workers to get started on the site is extremely low, and they are free to come and go as they please. Basically, or so the argument goes, MTurk is not like a regular job and therefore the same ethical rules don't apply. Multiple findings support this stance to some extent – a majority of workers use the site casually as an alternative to less-productive internet activities (e.g., surfing Facebook) and don't rely on MTurk as their primary source of income.

If this argument doesn’t feel satisfying, you’re not alone. There are a number of things a researcher can (and should) do to mitigate ethical concerns about payment:

1. Simply pay more. When you enter how long you expect the study to take and how much you will pay, MTurk calculates the hourly wage you're paying. Our work and that of others has shown that workers are sensitive to how much they are getting paid – the more you pay, the quicker the data rolls in. Everybody wins.

2. Be explicitly clear about how long the study will take. Requesters input, and workers see, the time allotted to complete the study; it's up to the requester to describe how long the study will actually take in the description write-up. One reason workers may take on tasks that pay a low wage is that they believe they can finish faster than expected, pushing their effective hourly wage higher. Be clear that the study will take everyone about X amount of time.

3. Treat workers more like in-the-lab participants. Simply describing the study as an "academic research study" doesn't mean a whole lot to most people. Describe in more detail how important their participation is to conducting psychological science – that their responses will be used to make generalizations about how people think, feel, and act in general. Workers who may be accustomed to completing relatively mindless tasks will appreciate what they are doing and may realize that their participation is less about making money and more about the experience. Our data and others' suggest that a strongly held motivation of workers is to engage in interesting activities in productive ways. Home in on this motivation.

Part of the supposed reward for participants in undergrad research pools is the knowledge and insight they gain by participating in research studies. Do the same online – give participants the opportunity to learn about cutting-edge research during the debriefing. Go even further than the traditional debriefing if you can – provide personalized feedback based on their responses in the study. Sites like http://www.outofservice.com/ have received millions of responses just by providing Big Five personality feedback to people. Your feedback doesn't necessarily have to be related to the focus of your research – it just needs to be something interesting and informative. If our goal as researchers is to spread psychological knowledge, MTurk is a high-volume way to get this done.

In short, I think there are many simple and creative ways to keep MTurk from becoming an online sweatshop. I’m (hopefully) preaching to the choir here: it’s the responsibility of every researcher using the site to adopt an ethical approach to MTurk – wield your power wisely.

Should I trust the data I collect from MTurk?
There are many ways to answer this question. One route researchers have taken is to replicate reliable effects found in the lab on MTurk. A fast-growing body of work is showing that Turkers think and act a lot like participants in other samples. A number of papers have made more explicit comparisons between Turkers and various other types of samples, finding a lot of similarities. Before getting started, I suggest you review this growing literature to learn about the advantages and limitations of MTurk compared to other methods (see the replication and evaluation papers listed below).

What’s the deal with taxes and the IRS on MTurk?
Because you are paying people and acting essentially as a part-time employer, taxes potentially become an issue if you pay any individual worker more than the minimum threshold for IRS reporting, which I believe is $400 a year. This $400 threshold is per worker in a tax year, not the total amount you're paying out to all workers. I've never come remotely close to that threshold, and you probably won't either. In short, tax reporting should not be an issue. Amazon has a FAQ on these issues from the worker perspective here: https://www.mturk.com/mturk/help?helpPage=worker#tax_why_tax_info
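
If you want to double-check, a quick per-worker tally is easy. Here's a sketch assuming you've combined your batch results and bonus records into one CSV with hypothetical 'WorkerId' and 'Amount' columns (not a standard MTurk export):

```python
# Sketch: tally total payments per worker for a tax year from a
# hypothetical combined payments CSV ('WorkerId', 'Amount' columns).
import pandas as pd

payments = pd.read_csv('payments_2016.csv')
totals = payments.groupby('WorkerId')['Amount'].sum()
print(totals[totals > 400])  # workers above the ~$400 reporting threshold
```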

Does MTurk keep demographic info on file or do I have to ask?
You have to ask.

—————————————–

GETTING STARTED – THE BASICS

Visit www.mturk.com and get the lay of the land. Peruse the introductory pages about being a worker and being a requester. Take a look at the "HITs" tab, where you can see all the currently available tasks. Clicking on the "Get Started" button on the requester side takes you to https://requester.mturk.com/mturk/resources, which has a lot of business-oriented stuff on it that you can ignore.

Read the Basic Getting Started Guide for Requesters.
From the Resource Center page, under the How To Guides box on the right, click on "Requester Website User Guide". Here are some extra things to consider:

Making an account:
1. When creating an account, consider making a new account (with a new email address) that is separate from any existing personal Amazon.com account.

2. You will also be asked for a Requester name. This name is what Turkers will see, so choose wisely (e.g., use X Lab rather than your personal name).

3. Before you are able to post a study (i.e., a HIT/batch), you need to pre-pay for the work the Turkers are about to complete. Once logged in, click on "Account Settings" in the top right. In the bottom left, click on "Prepay for Mechanical Turk HITs". Punch in how much you want in your Turk account, and you'll be taken to a billing screen where you enter credit card information. Ask whoever is in charge of university/department participant payments before doing this – there's likely a form for you to complete and a receipt to return. MTurk will send you an email confirmation of the pre-payment purchase, and you can also find a log under "View Transaction History".

Designing a HIT Template:
1. Across the top of the page should be Design, Publish, and Manage tabs. Start with the Design tab. If you want to create a simple survey within MTurk (rather than linking to a different survey site like SurveyMonkey, described below), I would start with the “Survey Template” near the bottom.

2. The guide contains some potentially confusing tips about how to create multiple HITs within a batch (e.g., if you want people to rate a bunch of images and get paid for each individual image or set of images, MTurk can do that). If you're looking to do simple surveys or experiments, you can safely ignore those sections.

3. On the "Enter Properties" page, I tend to make my titles short and include the estimated completion time (e.g., "5-10 minute survey on self-attitudes"). Then, in the description, I'll explain what the study entails a little more. Enter a bunch of keywords related to your study (survey, experiment, psychology, questionnaire, etc.). "Time allowed per assignment" is the maximum amount of time a worker has between clicking "accept HIT" and submitting the HIT. I generally give plenty of time here in case people forget to submit after completing the survey (this is more common when you link them to a different site like SurveyMonkey and they forget to navigate back to MTurk to submit).

4. The criteria functions are important. People from all over the world are on MTurk, so think about who you want to participate in your study. For example, if you put no "location" restrictions on your HIT and leave the HIT available overnight, expect to wake up to a lot of submissions completed by people living in India or somewhere else on the opposite side of the world (if you are in the US, of course). MTurk seems to have caught on in some Asian countries, so as a researcher, ask yourself whether you would like to sample from those countries. In my experience, some foreign Turkers tend to complete surveys more quickly and are more likely to skip questions that require typed short-answer responses. This may be because they are more motivated by money than workers from other countries (although this is certainly an empirical question). If you do not wish to place a location requirement but want to avoid potentially sub-par work, the approval rating function is your best bet. The approval rating is calculated for each worker as the number of approved submissions divided by the worker's total submissions. So if a worker has 3 approved submissions but 1 rejected submission, he/she would have a rating of 75%. MTurk recommends a 95% approval rating. I'm not sure how they decided on that number – perhaps it's p-value inspired? I've personally moved the approval rating requirement around between 50 and 99, and at least in my experience, a higher rating requirement seems to slow the flow of incoming submissions without affecting data quality, but I've done no formal test of this.

5. Payment amount is up to you. Both the estimated completion time and the payment amount significantly affect how quickly data come in, so you'll have to gauge the "market rate" based on what other Requesters are paying at that time. Generally, it's going to be really cheap… I've collected loads of data paying around 10 cents per person for 10-minute studies (see Buhrmester, Kwang, & Gosling, 2011).

6. On the Design Layout tab, here’s how I generally lay things out…

a. Title
b. Study description (can put IRB consent statements here)
c. Statement about re-posting of the HIT (explained below)
d. The survey / instructions and link to survey (e.g., SurveyMonkey)
e. Completion code / comments box (optional)

7. For c., I have found that data can be collected faster if you re-post your HIT after a day or even a few hours. When you first publish your HIT, it is loaded to the top of the long list of available HITs (if you click on the "HITs" tab from the main page, your freshly published HIT should appear shortly). As other Requesters publish their HITs, theirs get put on top and yours slides down the list. Apparently, most workers hunt for work from the top down, so fewer eyes will see your HIT after it's been posted for a while. My solution has been to re-post the HIT, sending it to the top again. The potential issue with this is that there's no way I know of to disallow people who've already completed the HIT from completing it again. To deter would-be duplicate responders, I include a statement that says: "This HIT is periodically re-posted. If you've already completed this HIT previously, please do not complete it a second time. You will not be compensated a second time." This statement has worked for me for the most part, though I recommend checking your actual data for duplicate responders (a sketch of one way to do this follows).
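
Here's a minimal sketch of that duplicate check, run with pandas over the batch result files you can download from the Manage tab. The file names are placeholders; WorkerId, AssignmentId, and SubmitTime are standard batch-file columns:

```python
# Sketch: flag workers who appear in more than one re-posted batch.
import pandas as pd

batches = pd.concat(
    [pd.read_csv(f) for f in ['Batch_1_results.csv', 'Batch_2_results.csv']]
)
dupes = batches[batches.duplicated('WorkerId', keep=False)]
print(dupes[['WorkerId', 'AssignmentId', 'SubmitTime']].sort_values('WorkerId'))
```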

8. If you want to use an outside survey site like SurveyMonkey, I recommend using a completion-code system to 1) deter people from accepting and submitting the HIT without actually having gone to SurveyMonkey and completed the study and 2) link MTurk submissions to the SurveyMonkey data. The high-tech way to do completion codes (which I don't think you can do with SurveyMonkey) is to assign each person a different code at the end of the study and instruct him/her to enter the same code in a text box on the MTurk page before submitting. The low-tech way that I use is to instruct each participant to make up their own 4- or 5-digit completion code, enter it on SurveyMonkey, and enter it again on MTurk. If more than one person makes up the same code, I can use the timestamp data from each site to figure out who's who.
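
Here's a sketch of the linking step, assuming the MTurk text box was named 'surveycode' (worker-entered fields show up as 'Answer.<fieldname>' columns in the batch file) and that your SurveyMonkey export has a hypothetical 'code' column:

```python
# Sketch: match completion codes across the MTurk batch file and a
# SurveyMonkey export. Column names on the survey side are assumptions.
import pandas as pd

mturk = pd.read_csv('Batch_results.csv')
survey = pd.read_csv('surveymonkey_export.csv')

# Normalize both code columns to strings so the merge keys line up.
mturk['Answer.surveycode'] = mturk['Answer.surveycode'].astype(str)
survey['code'] = survey['code'].astype(str)

merged = survey.merge(
    mturk[['WorkerId', 'AssignmentId', 'Answer.surveycode', 'SubmitTime']],
    left_on='code', right_on='Answer.surveycode', how='left',
)
# Survey responses with no matching MTurk submission (or vice versa) need
# a closer look, e.g., via timestamps when two people picked the same code.
unmatched = merged[merged['WorkerId'].isna()]
print(len(unmatched), 'survey responses without a matching MTurk submission')
```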

9. When linking to a different site, I’ve found that explicitly asking people to open the page in a new window/tab helps out. If you link them with a hyperlink and it opens in that same window, people have a hard time navigating back to the submission page and will likely e-mail you to complain.

Publishing your HIT
This should be pretty straightforward. Double-check that everything looks right, and you're ready to make your HIT available to Turkers. Make sure you've got enough money in your account to cover the number of responses × payment per response, plus Amazon's fee.
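
The arithmetic, as a quick sketch (the 20% fee rate assumes a HIT with fewer than 10 assignments, per the fee question above):

```python
# Back-of-the-envelope funding check before publishing.
n_responses = 90
reward = 0.50      # dollars per response
fee_rate = 0.20    # 20% for HITs with fewer than 10 assignments
total = n_responses * reward * (1 + fee_rate)
print(f'Pre-pay at least ${total:.2f}')  # $54.00 in this example
```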

Managing your HIT
1. I have to admit, the first time I collected data on MTurk, I was glued to the screen, watching the green bar tick up as the submissions came in in real time. If you're linking to an outside survey site, you'll want to keep tabs on how many people have submitted on MTurk versus how many have completed the survey on your site.

2. As soon as submissions come in, you can individually review them and approve or reject payment. Deciding when to reject payment can potentially be tricky – you'll definitely want to speak with your IRB about what circumstances are appropriate. You can also approve or reject payment en masse. Note that under the Design tab, you enter a cutoff time after which payment is automatically approved.
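
If you'd rather script the en-masse review, here's a hedged boto3 sketch; the HIT ID is a placeholder, and the review step is left as a stub for whatever checks your IRB protocol allows:

```python
# Sketch: page through submitted assignments for a HIT and approve them.
import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

paginator = mturk.get_paginator('list_assignments_for_hit')
for page in paginator.paginate(HITId='3EXAMPLEHIT',
                               AssignmentStatuses=['Submitted']):
    for assignment in page['Assignments']:
        # Placeholder review step: e.g., verify the completion code first.
        mturk.approve_assignment(AssignmentId=assignment['AssignmentId'])
```
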
Another good resource for starting up is this best practices guide: http://mturkpublic.s3.amazonaws.com/docs/MTURK_BP.pdf

OTHER LINKS:
Basic Resource Center for Requesters — https://requester.mturk.com/mturk/welcome

Advanced "how-to" guide by Amazon (good if you have programming knowledge and want to do more complex stuff): http://docs.amazonwebservices.com/AWSMechTurk/latest/AWSMechanicalTurkGettingStartedGuide/

FAQs by Amazon — https://requester.mturk.com/mturk/help?helpPage=main

Some metadata on the MTurk world: http://mturk-tracker.com/about/

A recent newsfocus report in Science by John Bohannon about MTurk in the social science world — http://www.sciencemag.org/content/334/6054/307.short

An excellent in-depth guide by Winter Mason & Siddharth Suri can be found here: http://www.mendeley.com/research/guide-conducting-behavioral-research-amazons-mechanical-turk/

REPLICATION & EVALUATION PAPERS:
Berinsky, Huber, and Lenz (2012) successfully replicated several survey experiments from psychology and political science with MTurk: http://www.mit.edu/~glenz/Mechanical_Turk.pdf

Using a non-U.S. MTurk sample, Berinsky, Quek, and Sances (2012) conducted successful replications of the classic framing experiment by Tversky and Kahneman (1981); Kam and Simas’s (2010) experiment in American politics; and Tomz’s (2007) experiment in international relations: http://www.michaelsances.com/papers/mturk.pdf

Heer and Bostock (2010) successfully replicate previous studies on spatial encoding and luminance contrast using MTurk as a platform for graphical perception experiments: http://vis.stanford.edu/files/2010-MTurk-CHI.pdf

In a series of experiments, this paper compares data collected via Mechanical Turk to data obtained using more traditional methods in linguistic research: http://www.doiserbia.nb.rs/img/doi/0048-5705/2010/0048-57051004441S.pdf

This paper presents the results of a comparative study involving classic experiments in judgment and decision-making. The authors found no differences in the magnitude of effects obtained using Mechanical Turk and using traditional subject pools: http://repub.eur.nl/pub/31983/jdm10630a%5B1%5D.pdf

This article reviews recent research about MTurk and compares MTurk participants with community and student samples on a set of personality dimensions and classic decision-making biases. Across two studies, the authors find many similarities between MTurk participants and traditional samples, as well as a few differences: http://onlinelibrary.wiley.com/doi/10.1002/bdm.1753/full

This paper demonstrates replication of traditional findings with MTurk in a prisoner’s dilemma game, with priming, and in replicating a famed Tversky-Kahneman result: http://www.nber.org/papers/w15961

Suri and Watts (2011) successfully replicate a public goods experiment on MTurk that was initially conducted in a classroom by Fehr and Gächter (2000): http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016836

Simons and Chabris (2012) replicated their initial telephone survey using Mechanical Turk, producing remarkably similar results: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0051876

Chandler and Shapiro (2016) review clinical research using MTurk participants, describe the challenges of using MTurk, and provide a great set of best practices. Highly recommended read: http://www.annualreviews.org/doi/abs/10.1146/annurev-clinpsy-021815-093623?journalCode=clinpsy

Gosling and Mason (2015) provided a broad overview of types of internet research, methodological issues, and emerging directions: http://www3.ntu.edu.sg/home/linqiu/publications/Internet%20research%20in%20psychology.pdf

MTURK FOR SOCIAL SCIENCES BLOGS
Gabriele Paolacci & Massimo Warglien’s excellent blog that also touches on many MTurk issues:
http://experimentalturk.wordpress.com/

Panos Ipeirotis’ excellent blog on many MTurk issues: http://www.behind-the-enemy-lines.com/

A post on pros & cons of MTurk for science: http://blogs.scientificamerican.com/guilty-planet/2011/07/07/the-pros-cons-of-amazon-mechanical-turk-for-scientific-surveys/

—————————————–

Feel free to contact me with any MTurk related advice, updates, news, etc. I’m happy to add and collect as many MTurk related resources here as possible. Happy turking.