Boto Mturk Tutorial: Fetch results and pay workers

This is another tutorial in the mturk series. In this one I will explain how to fetch the completed results from mturk through Python boto and how to approve or reject payments to the workers.
Before continuing, I suggest you read my first tutorial about boto and mturk if you haven’t already.

To have a good test case, I suggest you publish some hits on the mturk sandbox and complete them through the workers sandbox, so that you will have some results ready to be fetched.
The protagonist of this tutorial is the method get_reviewable_hits:

get_reviewable_hits(hit_type=None, status='Reviewable', sort_by='Expiration', sort_direction='Ascending', page_size=10, page_number=1)

As you can understand from the name, this method fetches the hits that have the status “reviewable” or “reviewing”. These are the hits for which all assignments (the number of answers required from different workers) have been completed, or that have expired.
As you can also see from the parameters, by default this method gives you back just the first 10 reviewable hits. The maximum page size is 100, which means that if you have more than 100 hits ready you have to call this method more than once, incrementing the page number each time.
The first thing we do is write a function that fetches all reviewable hits. It accepts an mturk connection object as its only parameter.

def get_all_reviewable_hits(mtc):
    page_size = 50
    hits = mtc.get_reviewable_hits(page_size=page_size)
    print "Total results to fetch %s " % hits.TotalNumResults
    print "Request hits page %i" % 1
    total_pages = float(hits.TotalNumResults)/page_size
    int_total= int(total_pages)
    if(total_pages-int_total>0):
        total_pages = int_total+1
    else:
        total_pages = int_total
    pn = 1
    while pn < total_pages:
        pn = pn + 1
        print "Request hits page %i" % pn
        temp_hits = mtc.get_reviewable_hits(page_size=page_size,page_number=pn)
        hits.extend(temp_hits)
    return hits
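
The page count computed in the function above is a ceiling division, and it can be checked on its own without any connection to mturk. The sketch below is only an illustration: the helper name pages_needed is my own and is not part of boto.

```python
def pages_needed(total_results, page_size):
    """Return how many pages are needed to cover total_results items."""
    # int() accepts a number or a numeric string, so a count read
    # from an API response can be passed in directly.
    total_results = int(total_results)
    # Ceiling division using integers only.
    return (total_results + page_size - 1) // page_size


# A few boundary cases around a page size of 50.
for total in (0, 1, 50, 51, 100, 101):
    print("%d results -> %d page(s)" % (total, pages_needed(total, 50)))
```

Working with integers avoids the float conversion and the if/else rounding step, while giving the same number of pages.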

The list returned by the function is a list of boto HIT objects.
These objects don’t contain the assignments; you have to call another method to get the assignments for a particular HIT id.
The next step is to iterate through this list and, for each HIT, call the method get_assignments(hit_id).

This method will return all the answers to your hits.
Below is the complete script that prints all the assignments of your hits to the screen.

from boto.mturk.connection import MTurkConnection

ACCESS_ID ='your access id'
SECRET_KEY = 'your key'
HOST = 'mechanicalturk.sandbox.amazonaws.com'

def get_all_reviewable_hits(mtc):
    page_size = 50
    hits = mtc.get_reviewable_hits(page_size=page_size)
    print "Total results to fetch %s " % hits.TotalNumResults
    print "Request hits page %i" % 1
    total_pages = float(hits.TotalNumResults)/page_size
    int_total= int(total_pages)
    if(total_pages-int_total>0):
        total_pages = int_total+1
    else:
        total_pages = int_total
    pn = 1
    while pn < total_pages:
        pn = pn + 1
        print "Request hits page %i" % pn
        temp_hits = mtc.get_reviewable_hits(page_size=page_size,page_number=pn)
        hits.extend(temp_hits)
    return hits

mtc = MTurkConnection(aws_access_key_id=ACCESS_ID,
                      aws_secret_access_key=SECRET_KEY,
                      host=HOST)

hits = get_all_reviewable_hits(mtc)

for hit in hits:
    assignments = mtc.get_assignments(hit.HITId)
    for assignment in assignments:
        print "Answers of the worker %s" % assignment.WorkerId
        for question_form_answer in assignment.answers[0]:
            for key, value in question_form_answer.fields:
                print "%s: %s" % (key,value)
        print "--------------------"
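
One thing to keep in mind about the script above: as far as I know, get_assignments is paginated in the same way as get_reviewable_hits (page_size and page_number parameters, with a default page size of 10), so a hit with many assignments may need more than one call. The sketch below shows one possible way to collect every page. The helper fetch_all_pages and the fake data are hypothetical and exist only so that the example runs without an AWS account.

```python
def fetch_all_pages(fetch_page, page_size=100):
    """Call fetch_page(page_size, page_number) until a short page comes back."""
    items = []
    page_number = 1
    while True:
        page = list(fetch_page(page_size, page_number))
        items.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            return items
        page_number += 1


# Hypothetical stand-in for the real call: 25 fake assignments.
FAKE_ASSIGNMENTS = ["assignment-%d" % i for i in range(25)]


def fake_get_assignments(hit_id, page_size=10, page_number=1):
    start = (page_number - 1) * page_size
    return FAKE_ASSIGNMENTS[start:start + page_size]


all_assignments = fetch_all_pages(
    lambda size, number: fake_get_assignments("HIT_ID", page_size=size,
                                              page_number=number),
    page_size=10)
print(len(all_assignments))
```

With a real connection the lambda would wrap mtc.get_assignments(hit.HITId, page_size=size, page_number=number) instead of the fake function. Check the parameter names against the boto version you have installed.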

As you can see, the script calls the get_assignments method for each hit id and then iterates through the result to fetch the answers.
On line 36 you see answers[0]; maybe you are thinking “why not iterate through all the answers?”
To give a clear explanation, let’s first give some definitions that will be used in the next lines.

  • A “question form answer” is the single answer to a single question of your form.
  • An “answer” element is the set of all the “question form answers” of your QuestionForm.
  • An “assignment” is the set of all the “answers” of the same worker.

In practice each worker can give just one “answer” to the hit, so the assignment will always contain just one “answer”.
The “answers” element is just a reflection of the XML structure; boto translates it into an array of one element.
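
To make the nesting concrete, here is a small sketch that flattens one assignment into a dictionary. The Fake classes are hypothetical stand-ins that only mimic the shape of the boto objects. The loop in the script expects fields to hold (question id, value) pairs. Some boto releases appear to store plain values in fields and keep the question id in a separate qid attribute. That second case is an assumption you should verify against your installed version, so the helper handles both.

```python
def answers_to_dict(assignment):
    """Collect one worker's answers as {question id: [values]}."""
    result = {}
    # answers[0] is the single "answer" element of the assignment.
    for qfa in assignment.answers[0]:
        for field in qfa.fields:
            if isinstance(field, tuple):
                # (question id, value) pairs, as in the script's loop.
                key, value = field
            else:
                # Plain values; ASSUMES the id lives in a qid attribute.
                key, value = qfa.qid, field
            result.setdefault(key, []).append(value)
    return result


class FakeQuestionFormAnswer(object):
    """Hypothetical stand-in for a boto question form answer."""
    def __init__(self, qid, fields):
        self.qid = qid
        self.fields = fields


class FakeAssignment(object):
    """Hypothetical stand-in for a boto assignment."""
    def __init__(self, answers):
        self.answers = answers


assignment = FakeAssignment([[
    FakeQuestionFormAnswer("colour", [("colour", "red")]),
    FakeQuestionFormAnswer("tags", ["cat", "outdoor"]),
]])
print(answers_to_dict(assignment))
```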
If this explanation was clear, all you still need to know is which methods to use to approve and reject payments to the workers.
Approving and rejecting are done at the “assignment” level; in fact, both methods accept the assignment id as a parameter.

approve_assignment(assignment_id, feedback=None)

reject_assignment(assignment_id, feedback=None)

Both methods also accept a feedback string. This is the message that the workers will receive as an explanation for the approved/rejected assignment, so be kind :-D .
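
As a minimal sketch of how the two methods could be combined, the function below approves or rejects a single assignment depending on a check that you perform yourself. The function review_assignment, the feedback texts and the FakeConnection class are all hypothetical. The fake connection only records the calls so that the example runs without contacting mturk.

```python
def review_assignment(mtc, assignment_id, is_acceptable, reason=None):
    """Approve or reject one assignment and return the action taken."""
    if is_acceptable:
        mtc.approve_assignment(assignment_id,
                               feedback="Thank you for your work!")
        return "approved"
    mtc.reject_assignment(
        assignment_id,
        feedback=reason or "The answer did not follow the instructions.")
    return "rejected"


class FakeConnection(object):
    """Hypothetical stand-in that records calls instead of calling mturk."""
    def __init__(self):
        self.calls = []

    def approve_assignment(self, assignment_id, feedback=None):
        self.calls.append(("approve", assignment_id, feedback))

    def reject_assignment(self, assignment_id, feedback=None):
        self.calls.append(("reject", assignment_id, feedback))


fake = FakeConnection()
review_assignment(fake, "A1", True)
review_assignment(fake, "A2", False, reason="The text field was left empty.")
print(fake.calls)
```

With a real MTurkConnection you would pass mtc and assignment.AssignmentId instead of the fake values.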
When you no longer need a hit you can “delete” it from mturk by calling the method

disable_hit(hit_id, response_groups=None)

I suggest you read the documentation about the disable_hit method.
I leave you with an edited version of the loop that pays all workers and disables the hits.
See you soon ;-)

for hit in hits:
    assignments = mtc.get_assignments(hit.HITId)
    for assignment in assignments:
        print "Answers of the worker %s" % assignment.WorkerId
        for question_form_answer in assignment.answers[0]:
            for key, value in question_form_answer.fields:
                print "%s: %s" % (key,value)
        mtc.approve_assignment(assignment.AssignmentId)
        print "--------------------"
    mtc.disable_hit(hit.HITId)

  • Trevor

    Thanks for the example(s)! This has been very helpful. Python is the first language I’ve studied intensely, and MechTurk is my first experience working with an API, so your explanation has been very useful in understanding a lot of the process.

    Thanks again!

    -TA

  • Shashank Jain

    Hello
    Thanks a lot for this post!!!!
    How do we know if the job got completed?
    Do I have to manually check it on the requester account? Or do I have to poll using the above function?
    Is there a way to get a message whenever a hit gets submitted by the user?

    PS: Can you provide some examples on how to assign qualifications and distribute bonuses using boto? It would be really helpful.

    Thanks
    Shashank Jain

  • http://www.toforge.com/ Mauro Rocco

    Hi Shashank,
    I’m glad you appreciate the post, thanks.

    As far as I know you cannot provide any callback URL to Amazon on which it will call you back when a hit is complete, so you will need to poll from time to time to check which hits are complete.

    The function get_reviewable_hits will always return only HITs that are ready to be reviewed.
    As for the examples, I wish I had that much free time, but sadly I don’t.

    Thanks again

    Regards

  • Jacopo

    Mauro, great work, thanks a lot.

    So far, I have used the web interface to submit hits. Generally, I submit a parametrised HTML file that provides the form layout and then a CSV file with the data (each row corresponds to a different hit). I am looking for something similar with Boto. My suspicion is that I have to create a different HTML file for each hit and submit each of them. Would you have any suggestions? Could you point me to the Boto object I should use? Many thanks

  • http://www.toforge.com/ Mauro Rocco

    Hi,

    In your case, as you are creating everything via Python code, I would suggest generating the HTML content in your code together with the data and then wrapping everything in an HTMLQuestion, https://github.com/boto/boto/blob/develop/boto/mturk/question.py#L143. The only inconvenience is that you will upload similar HTML strings every time, but apart from bandwidth usage and speed I don’t see any issue with this.

    I would not recommend taking the HitLayout path when working with boto; HitLayout makes more sense when you do the job manually and you don’t want to copy and paste HTML around.

    Hope This Helps
    Regards

  • Jacopo

    Hi Mauro,

    I followed your instructions, now I am able to create HITs but when it is time to answer them there is a problem.
    Initially, the “Submit” button at the top was disabled so I followed the example in: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_HTMLQuestionArticle.html
    I wrapped my question in an HTML form.
    Now I have a Submit button inside my page (as opposed to the one on top of the frame). Unfortunately, when I click on it I get an error: “There was a problem submitting your results for this HIT.”

    Would you have any suggestion?

  • http://www.toforge.com/ Mauro Rocco

    Hi Jacopo,
    This is more related to Amazon AWS itself than to boto or Python, so I really don’t have the time to investigate it for you.
    Sorry