DIY Survey panels / Polls with Amazon Mturk

Polls! (huh) What are they good for? Hopefully something, say it again.

What would you ask a national poll if you could? Could you afford to commission one?

This is something that’s come up a number of times in talks around audiences – how do we ask the people who AREN’T attending? How do we ask people from outside of city X what they think about city X? Would it be interesting to see what they thought before and after some big event? And so on.

I’m glad I once noted this down as I can’t seem to find it on their website: One closed-options question: £500 – results next day. YouGov Polling Omnibus service: Nationally representative sample of 2000 UK Adults aged 18+

I imagine £500 is probably still the general ball park needed for any reasonable nationally representative poll, whether you used YouGov, Ipsos MORI or others. That’s 25p per person, maths fans. (Update: I think I found the current equivalent; if anything, it’s cheaper now.)

SurveyMonkey Audience offers a similar self-service type of thing; however, given their US basis, it only really seems cost effective if you are buying a US-based sample (79p per response, compared to £3.50 per response for UK!). It’s probably not going to be nationally representative but would be easier and probably accurate enough for many purposes. For reference, the ‘ultimate’ poll, that is to say a national election, has been estimated to cost about £6 per vote to administer.

 

"DSCN0841" by NelC is licensed under CC BY-NC-ND 2.0

“But I thought polls were all a load of cobblers in this post-truth age?”

Nothing’s perfect, of course, but at least in the most recent general election in the UK, the polls were almost spot on. Overall trends are clear even if individual polls differ.

“It’s easy to lie with numbers, it’s even easier to lie without them”

I have been playing around with Amazon Mechanical Turk for other reasons (here and here), but this is absolutely the kind of service that makes you think: “Maybe I can do my own polls?”

Even if the polls are almost certainly not going to be representative, surely it’s got to be of some use to throw out questions to a bunch of random people on the internet? Maybe we should just call it a survey panel instead of a poll.

There are growing numbers of researchers using Mturk to quickly distribute surveys and do social-research type things; some of whom even seem to be able to adjust/weight responses so that they are more representative.
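To give a flavour of what that adjusting/weighting can look like, here is a minimal sketch of post-stratification: each group's "yes" rate is weighted by that group's share of the real population rather than its share of the sample. The counts and the 50/50 population split are made-up illustrations, not figures from any of my surveys, and the function name is my own.

```python
def poststratify(sample, population_share):
    """Return a weighted 'yes' rate.

    sample maps group -> (yes_count, total_count).
    population_share maps group -> assumed share of the target population.
    """
    weighted = 0.0
    for group, (yes, total) in sample.items():
        if total == 0:
            raise ValueError(f"no responses for group {group!r}")
        # Weight each group's own rate by how common the group really is
        weighted += population_share[group] * (yes / total)
    return weighted


# Hypothetical panel that skews heavily towards men
sample = {"female": (10, 25), "male": (15, 75)}
population_share = {"female": 0.5, "male": 0.5}  # assumed, for illustration

raw_rate = sum(y for y, _ in sample.values()) / sum(n for _, n in sample.values())
weighted_rate = poststratify(sample, population_share)
print(f"raw: {raw_rate:.2f}, weighted: {weighted_rate:.2f}")
```

In this toy case the under-represented group is more likely to say yes, so the weighted rate ends up higher than the raw one. Real studies usually weight on several characteristics at once, which this sketch does not attempt.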

Some references before we get to it (should all be open-access)
Running Experiments on Amazon Mechanical Turk (2015)
Evaluating Amazon’s Mechanical Turk as a Tool for Experimental Behavioural Research (2013)
The Demographic and Political Composition of Mechanical Turk Samples (2016)

Let’s have a look at some of my results!

Part 1: A festival

This is with a real festival who will remain anonymous.

I set up a task on Mturk which was restricted to UK-based respondents. This basically asked people to click a link which took them to a Google Form; once they had answered the questions, a code would be shown, which they then copied back into the Mturk form to complete the task. When checking to approve/reject tasks, I could simply look to see if the right code had been entered. Overall I found this easier than using Mturk’s own HTML editing stuff to set up the form, but that’s a possibility too.
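The code-checking step could be sketched like this. I did it by eye, so treat this as an illustration only: the completion code, the worker IDs and the function names are all hypothetical, and it works on a plain dictionary of submissions rather than talking to Mturk directly.

```python
EXPECTED_CODE = "THANKS-4821"  # hypothetical code shown at the end of the form


def normalise(code):
    # Ignore case and stray whitespace, which are easy to add when copy-pasting
    return "".join(code.split()).upper()


def review(submissions, expected=EXPECTED_CODE):
    """Split {worker_id: submitted_code} into approve and reject lists."""
    approve, reject = [], []
    for worker_id, code in submissions.items():
        if normalise(code) == normalise(expected):
            approve.append(worker_id)
        else:
            reject.append(worker_id)
    return approve, reject


# Made-up submissions: two valid (one with messy formatting) and one wrong
submissions = {"W1": " thanks-4821 ", "W2": "THANKS-4821", "W3": "12345"}
approve, reject = review(submissions)
print("approve:", approve)
print("reject:", reject)
```

Being a bit forgiving about case and whitespace seems fair, since the point is to confirm someone reached the end of the form, not to test their typing.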

From reading around, I see that some people do a screening step with multiple tasks, eg: set a low reward for some screening questions, ask for people’s email addresses then send them a larger task/survey directly.

Of course this does nothing if people choose to lie on the form, but I’m happy to stick with the assumption that even if you are mindlessly clicking buttons to make pennies on the internet, it generally costs you more effort to lie than to tell the truth. And hopefully with a big enough sample size, yadda yadda yadda…

Respondents received $0.05 for answering 4 questions: Gender, Age, Location and whether they had heard of the Festival or not.

82 responses were collected over the course of about 1 month, from approximately the middle of the festival to about 2 weeks afterwards. I think I set an upper limit of 100 responses/tasks to be completed so I was surprised it took so long and eventually didn’t complete. I’m guessing it is related to a relatively smaller UK base of respondents compared to other tasks which I have set up to be available internationally. Or once a task has been live for a while it drops down the list of jobs that Mturk workers can select to do.

The main question was whether they had heard of the festival or not:

Interesting if a little arbitrary so far, without anything to compare it to, but we’ll get onto that. In the YouGov Brand Awareness ratings, a national awareness of 32% would place it in the top 2000 brands (1630th): https://yougov.co.uk/ratings/consumer/fame/brands/all

Then we can take the basic awareness yes/no and split it up by the other factors:

This is a fairly small sample, especially so when breaking it up into age groups or regions. For instance, there were 26 total Female respondents and 56 total Male respondents (an open text option was also available though in this case no-one used it).

In terms of age, 25-39 year olds made up 72% of the total sample so I wouldn’t read anything more into those results; and there doesn’t seem to be much to split between these three groups anyway.

However, the regional bit is interesting, because (and this is probably not a surprise if you know where I do most of my work!) X festival did happen to take place in the East Midlands, so the higher awareness here makes sense. Also, compared to the other factors, responses were a bit more evenly spread around. London was the largest single region with about 25% of all responses.

Very little of this stuff is really strong enough to be significant, but also I wouldn’t be surprised if it was done more rigorously and came up with a same or similar result. Maybe I’m just optimistic.
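To put a rough number on how much a sample this size can wobble, here is a minimal sketch of a 95% Wilson score interval for a proportion. I am assuming 26 "yes" answers out of 82, which is my rounding of the 32% headline figure rather than an exact count, and the function name is my own.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one response")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts: 26 of 82 is approximately the 32% awareness figure
low, high = wilson_interval(26, 82)
print(f"{low:.1%} to {high:.1%}")
```

With those assumed counts the interval comes out at roughly 23% to 42%, which is a fair summary of how loosely a sample of 82 pins things down, and that is before worrying about whether the panel is representative.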

But on the upside, that’s $0.05 per response (+$0.02 to Amazon). $5.72 in total? (£4.36)
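As a back-of-envelope check, the cost is just responses multiplied by reward plus fee. This sketch assumes the $0.02 fee applies as a flat amount to every one of the 82 responses, which may not be exactly how Amazon charged it; the function name is my own.

```python
from decimal import Decimal


def total_cost(responses, reward, fee):
    """Total spend in dollars, using Decimal to avoid float rounding on money."""
    return responses * (reward + fee)


cost = total_cost(82, Decimal("0.05"), Decimal("0.02"))
print(f"${cost}")
```

Under that assumption the total is $5.74, a couple of cents off the figure I noted, which may come down to rounding or to how the fee was actually applied.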

Part 2: Arts Council England

Let’s try something a bit bigger.

I kept the survey the same, but replaced X Festival with Arts Council England.

I ran three waves which were open for 7 days each:

Wave start               Total responses   In the first 24 hours
Tuesday October 8th      92                40
Friday November 8th      72                39
Thursday December 12th   53                27
Total                    217               106

Again, I was aiming for 100 responses per wave. All started at about 10-11am on their respective days. Whatever the day, nearly half of all the responses were in the first 24 hours. Maybe earlier in the week seems better? I wonder how many people are sitting around completing these tasks for a bit of pocket money while at school or work. Maybe I should offer more than 5 cents per response. (shrugs emoji)
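The "nearly half in the first 24 hours" claim can be checked wave by wave from the figures above. A small sketch (the variable and function names are my own):

```python
# (wave start, total responses, responses in the first 24 hours)
waves = [
    ("Tuesday October 8th", 92, 40),
    ("Friday November 8th", 72, 39),
    ("Thursday December 12th", 53, 27),
]


def first_day_share(total, first_day):
    """Fraction of a wave's responses that arrived in the first 24 hours."""
    return first_day / total


for label, total, first_day in waves:
    print(f"{label}: {first_day_share(total, first_day):.0%}")

overall = sum(w[2] for w in waves) / sum(w[1] for w in waves)
print(f"Overall: {overall:.0%}")
```

The first wave got the most responses overall but the smallest first-day share, so its advantage came from the days after launch rather than from a bigger opening rush.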

Interestingly when I compared the Amazon completed tasks (186) and Google responses (217), this seems to suggest some people (31) filled in the survey but did not come back to enter the code and complete their response or get their 5 cent reward? This made me feel so guilty I immediately donated the equivalent reward ($1.55/£1.18) to charity in recognition of their noble sacrifice.

Here’s the main question then:

Overall, recognition of ACE is a good 10% higher than X festival was. Not a huge amount, but in the direction you would probably expect. (Actual responses: No: 128, Yes: 89).
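Whether a gap like this is bigger than sampling noise can be roughly checked with a two-proportion z-test. This is only a sketch: the ACE counts are the actual responses above, but the festival counts (26 of 82) are my rounding of the 32% figure, and the test treats the two samples as independent, which is not strictly true if some of the same workers answered both surveys.

```python
import math


def two_proportion_z(yes_a, n_a, yes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ACE: 89 yes of 217 (actual). Festival: 26 yes of 82 (assumed from 32%).
z, p_value = two_proportion_z(89, 217, 26, 82)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

On these numbers z is about 1.5 and the p-value is around 0.14, so the difference points the way you would expect but would not pass a conventional 5% significance test.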

While there was a 7% gap for X festival (M more aware than F), for ACE the gap is only 2% (again M slightly more aware than F). This is from 136 Male responses and 80 Female responses.

(Actual responses: Female: Yes: 32, No: 48. Male: Yes: 57, No: 79. An open text option was also available and in this case 1 person used it).

As before, big old caveat here that many age groups do not have enough responses to be considered useful. I’ve charted the actual responses as well as percentages (#). There are quite a lot of age brackets with a higher % of Yes than the background/whole sample (41%).

Let’s focus in a bit more on the groups where there was a better response, so we can compare it to Festival X:

Given that Festival X is about 10% behind ACE ‘to begin with’, there are some relatively close gaps with the 20-29 year olds, but the gap opens up beyond this.

Finally, the regions:

Again, the yes/no are %ages while the # is actual numbers of responses. The largest share of responses was from London, as was the case with Festival X, in about the same proportion of total responses. Bizarrely enough, the East Midlands has the highest % of yeses but I suspect this shows the unreliability of looking at only 18 responses (3 Nos and 15 Yeses).

Oh – which reminds me – this question had the answer order shuffled each time, so it’s not like people could just pick the same first option each time.

Overall it cost about £15.00 to get these three waves of responses.

Part 2.5: Mo’ waves, mo’ problems

If you did three waves, didn’t you end up asking the same people multiple times?

Each person working on Mturk has a unique ID. Of the 186 responses, there were 162 unique IDs. 141 completed 1 task, 18 completed 2 and 3 completed all 3. (This is on Amazon, remember, because some people filled in the Google Form but didn’t complete the task.)
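Counting repeat workers is a short job once the worker IDs from each wave are in one list. A minimal sketch with made-up IDs (the function name is my own), plus a check that the breakdown quoted above adds up:

```python
from collections import Counter


def completion_distribution(worker_ids):
    """Map 'number of tasks completed' -> 'number of workers who did that many'."""
    tasks_per_worker = Counter(worker_ids)
    return Counter(tasks_per_worker.values())


# Made-up IDs: A1 appears three times, B2 twice, C3 and D4 once each
ids = ["A1", "B2", "A1", "C3", "A1", "B2", "D4"]
dist = completion_distribution(ids)
print(dict(dist))

# Sanity check on the quoted breakdown: 141 did one task, 18 did two, 3 did three
quoted = {1: 141, 2: 18, 3: 3}
unique_workers = sum(quoted.values())
total_tasks = sum(tasks * workers for tasks, workers in quoted.items())
print(unique_workers, total_tasks)
```

The quoted breakdown gives 162 unique workers and 186 completed tasks, matching the totals from Amazon.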

A quick cross reference with a few of the IDs confirmed that these were indeed the same workers completing separate waves of the task. If I had been smarter about it, or done the survey design in Amazon, I could have tracked to verify if they were providing the same responses each time or not.

If you were particularly worried about this skewing things, you might also just take their first response and delete the later ones: 24 of the 186, about 13%, so not the end of the world.
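Keeping only each worker's first response does not have to be done by hand. This sketch assumes every response carries a worker ID and a wave number, which in my case would have meant collecting the ID on the form itself; the field names and data are hypothetical.

```python
def keep_first_response(responses):
    """Keep each worker's earliest response, judged by wave number."""
    seen = set()
    kept = []
    # sorted() is stable, so ties within a wave keep their original order
    for response in sorted(responses, key=lambda r: r["wave"]):
        if response["worker_id"] not in seen:
            seen.add(response["worker_id"])
            kept.append(response)
    return kept


# Made-up responses: worker A1 answered in waves 1 and 2
responses = [
    {"worker_id": "A1", "wave": 2, "heard_of": "Yes"},
    {"worker_id": "A1", "wave": 1, "heard_of": "No"},
    {"worker_id": "B2", "wave": 1, "heard_of": "Yes"},
    {"worker_id": "C3", "wave": 3, "heard_of": "No"},
]
deduped = keep_first_response(responses)
print([(r["worker_id"], r["wave"]) for r in deduped])
```

The same loop could instead compare a worker's answers across waves, which would show whether people respond consistently each time.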

In terms of people completing the task properly, I actually only got 2 responses that had entered the code wrong. Again, if I had been smarter it would have been possible to reject these and either open them up for someone else to complete or to let the person try again. (One of the errors seemed to be a genuine misunderstanding while the other may have been spam). But 2 out of 186 is acceptable anyway.

In terms of the results being similar across waves, that seems to be the case, here are a few final charts:

Part 3: In conclusion

This is not a suitable replacement for a proper poll, but it is an interesting little dip into the world of survey panels. There is a burgeoning discipline of social scientists using Mturk and other tools like it to quickly generate samples for all kinds of surveys. Maybe even just to do a pilot test before doing the actual survey somewhere else. As mentioned before, some of them even control for confounding factors and seem to get comparable results to traditional surveys at a fraction of the cost. (Although they warn this will never really replace “true” observational studies, but still, can’t argue with the price!)

On the other hand, learning more about this sort of thing, I have to say that proper polls are pretty good value (and seem to be getting cheaper), but many charities, events and arts organisations I work with still probably don’t have £300-500 to burn on this sort of thing, especially when you can only get 1-5 questions out of it. Unless you’ve got a REALLY specific burning question to ask or are an organisation that has SOME kind of national profile.

Plus if you are working at a more local or regional level, the relevance of national polls is a bit diminished; you might not really care what people in the West Midlands think about your project in the North East. But on the other hand, while you probably always want to keep your target audience in mind, there is a lot to be said for learning more about prospective audiences – even if they aren’t geographically near enough to realistically attend, what’s to say they don’t share other characteristics and opinions with the more local audiences who also aren’t attending?

At least you can now target the UK, which wasn’t always the case – whether you wanted to set up jobs or be a ‘worker’ to complete jobs. And again, just because someone in Europe or America or wherever is unlikely to attend your UK based thing, they can still give a view on something.

For US tasks, you can select by state as well, no idea on when or if that would ever come to the UK. (More about ‘Premium qualifications’ here)

There are alternatives to Amazon too, with various pros and cons.

Incidentally I’ve never seen the worker side of things, I actually tried signing up for an account but as I already have one as a ‘requester’, I guess they don’t allow you to do both. You do get to preview tasks before publishing them.

I can definitely see it being useful for less critical decision making, but the kind of thing where you’d probably appreciate a quick second opinion from a bunch of relatively objective and disinterested people. The speed of turnaround is appealing: around half of the responses are in within 24 hours, and that’s slowed down quite a bit by only targeting the UK.

What website design or graphic layout do they prefer? What keywords would you associate with X, Y or Z brand? Read this marketing blurb and tell me what you think about it? What’s your position on A, B or C issue? I could see people spending a lot more on focus groups to find out basically the same sort of thing.

So, has anyone out there got any questions they want to try? Paypal me a tenner and we’ll get started!

 

 
