The Dropbox Python SDK and gevent

Contents

1 Introduction
2 The sequential model
3 The async model
4 The async model with limited concurrency

1 Introduction

This document provides a simple example of using the Dropbox Python SDK to upload files to a Dropbox folder. We'll provide one version that uploads the files synchronously and sequentially, and versions that follow an asynchronous, coroutine-based model using gevent.

You can find information on gevent on the gevent project's website.

And, documentation on the Dropbox Python SDK is here: http://dropbox-sdk-python.readthedocs.io/en/master/.

Each of the example programs reads a "spec" file that describes the files to be uploaded. Each line in this file contains two whitespace-separated fields: the name of the local file to be uploaded and the "target", i.e. the location (a path and file name) for the file in the Dropbox folder. Lines that begin with # are treated as comments. Here is the spec file that I used in my testing:

Data/tmp01.txt /Target/tmp01.txt
Data/tmp02.txt /Target/tmp02.txt
Data/tmp03.txt /Target/tmp03.txt
Data/tmp04.txt /Target/tmp04.txt
Data/tmp05.txt /Target/tmp05.txt
Data/tmp06.txt /Target/tmp06.txt
Data/tmp07.txt /Target/tmp07.txt
Data/tmp08.txt /Target/tmp08.txt
Data/tmp09.txt /Target/tmp09.txt
Data/tmp10.txt /Target/tmp10.txt
Data/tmp11.txt /Target/tmp11.txt
Data/tmp12.txt /Target/tmp12.txt

Notes:

Data/tmp01.txt, Data/tmp02.txt, etc. are the files to be uploaded. They are small text files that exist on my local file system.
/Target/tmp01.txt, /Target/tmp02.txt, etc. are the paths in my Dropbox folder to which the corresponding files will be uploaded, i.e. where the files will appear in my Dropbox folder.

I tested these examples under both Python 2 and Python 3.
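The spec-file format described above is easy to parse line by line. The following is a minimal sketch of a parser for it, assuming the same format (two whitespace-separated fields per line, with lines starting with # treated as comments). The helper name parse_spec_lines is hypothetical; besides skipping comments, it skips blank lines and reports a malformed line together with its line number. It is written to run under both Python 2 and Python 3.

```python
import io


def parse_spec_lines(lines):
    """Return a list of (source, dest) pairs from spec-file lines.

    Hypothetical helper: blank lines and lines starting with '#' are
    skipped; every other line must contain exactly two
    whitespace-separated fields.
    """
    specs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                'line {}: expected 2 fields, got {}: {!r}'.format(
                    lineno, len(fields), line))
        specs.append((fields[0], fields[1]))
    return specs


# Usage with an in-memory file; in practice, pass an open file object.
sample = io.StringIO(u"# source          target\n"
                     u"Data/tmp01.txt /Target/tmp01.txt\n"
                     u"\n"
                     u"Data/tmp02.txt /Target/tmp02.txt\n")
specs = parse_spec_lines(sample)
print(specs)
```

Raising an error with a line number, rather than failing on a tuple unpack, makes a typo in a long spec file much easier to find.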

2 The sequential model

For comparison, this example does not use gevent.

This example uses a simple for loop to repeatedly call a function that uploads one file. Here is the source:

#!/usr/bin/env python

"""
synopsis:
    Upload files to a Dropbox folder, one file at a time.
    Read the names of the files to be uploaded and the path at which
        each is to be stored from spec_file.  spec_file contains one
        line per file with two fields per line: the name and the path.
usage:
    python <script_name>.py <spec_file>
"""

from __future__ import print_function

import sys
import os
import dropbox
import time
import datetime


Auth_key = "<my-dropbox-authentication-key>"


def read_files_and_paths(infilename):
    with open(infilename, 'r') as infile:
        specs = []
        for line in infile:
            line = line.strip()
            if line and not line.startswith('#'):  # skip blanks/comments
                source, dest = line.split()
                specs.append((source, dest))
    return specs


def upload_one_file(dbx, source, dest):
    overwrite = False
    # Read raw bytes; works for text and binary files under
    # both Python 2 and Python 3.
    with open(source, 'rb') as infile:
        bytesdata = infile.read()
    mode = (
        dropbox.files.WriteMode.overwrite
        if overwrite
        else dropbox.files.WriteMode.add)
    # Keep the local modification time (UTC) on the Dropbox copy.
    mtime = os.path.getmtime(source)
    client_modified = datetime.datetime(*time.gmtime(mtime)[:6])
    print('size: {}  dest: {}  client_modified: {}'.format(
        len(bytesdata), dest, client_modified))
    # mute=True: don't notify the user's Dropbox clients.
    res = dbx.files_upload(
        bytesdata, dest, mode,
        client_modified=client_modified,
        mute=True)
    print('res.name: {}  source: {}  dest: {}'.format(
        res.name, source, dest))


def upload_files_seq(dbx, files_and_paths):
    for source, dest in files_and_paths:
        upload_one_file(dbx, source, dest)


def main():
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(__doc__)
    infilename = args[0]
    files_and_paths = read_files_and_paths(infilename)
    print('files_and_paths: {}'.format(files_and_paths))
    dbx = dropbox.Dropbox(Auth_key)
    upload_files_seq(dbx, files_and_paths)


if __name__ == '__main__':
    main()

3 The async model

Here is the version that uses gevent to run those upload requests concurrently:

#!/usr/bin/env python

"""
synopsis:
    Upload files to a Dropbox folder.
    Upload the files concurrently, one gevent Greenlet per file.
    Read the names of the files to be uploaded and the path at which
        each is to be stored from spec_file.  spec_file contains one
        line per file with two fields per line: the name and the path.
usage:
    python upload_batch02.py <spec_file>
"""

from __future__ import print_function

import gevent.monkey
gevent.monkey.patch_all()

import sys
import os
import gevent
import dropbox
import time
import datetime


Auth_key = "<my-dropbox-authentication-key>"


def read_files_and_paths(infilename):
    with open(infilename, 'r') as infile:
        specs = []
        for line in infile:
            line = line.strip()
            if line and not line.startswith('#'):  # skip blanks/comments
                source, dest = line.split()
                specs.append((source, dest))
    return specs


def upload_one_file(dbx, source, dest):
    overwrite = False
    # Read raw bytes; works for text and binary files under
    # both Python 2 and Python 3.
    with open(source, 'rb') as infile:
        bytesdata = infile.read()
    mode = (
        dropbox.files.WriteMode.overwrite
        if overwrite
        else dropbox.files.WriteMode.add)
    # Keep the local modification time (UTC) on the Dropbox copy.
    mtime = os.path.getmtime(source)
    client_modified = datetime.datetime(*time.gmtime(mtime)[:6])
    print('size: {}  dest: {}  client_modified: {}'.format(
        len(bytesdata), dest, client_modified))
    # mute=True: don't notify the user's Dropbox clients.
    res = dbx.files_upload(
        bytesdata, dest, mode,
        client_modified=client_modified,
        mute=True)
    print('res.name: {}  source: {}  dest: {}'.format(
        res.name, source, dest))


def upload_files(dbx, files_and_paths):
    threads = []
    for source, dest in files_and_paths:
        threads.append(gevent.spawn(upload_one_file, dbx, source, dest))
    gevent.joinall(threads)


def main():
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(__doc__)
    infilename = args[0]
    files_and_paths = read_files_and_paths(infilename)
    print('files_and_paths: {}'.format(files_and_paths))
    dbx = dropbox.Dropbox(Auth_key)
    upload_files(dbx, files_and_paths)


if __name__ == '__main__':
    main()

Notes:

Rather than calling our upload function directly, this for loop spawns a Greenlet (a lightweight pseudo-thread) for each file and collects them in a list; each Greenlet encapsulates one call to upload_one_file.
Then we wait for all of these tasks to complete using gevent.joinall(threads).
Note the call to gevent.monkey.patch_all(), which is made before the other modules are imported. It replaces blocking functions in standard-library modules such as socket, ssl, and time with cooperative versions provided by gevent. Because the Dropbox SDK performs its network requests through those modules (via the requests library), a Greenlet that is waiting on a network request gives up control so that another Greenlet can run. That is what enables gevent to schedule the Greenlets "cooperatively".
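To see cooperative scheduling in action without a Dropbox account, here is a minimal sketch in which a hypothetical fake_upload function stands in for upload_one_file and simulates waiting on the network with an ordinary, blocking time.sleep call. Because gevent.monkey.patch_all() is called first, the patched time.sleep yields to the other Greenlets, so five 0.2-second "uploads" should finish in roughly 0.2 seconds instead of about 1 second (exact timings will vary).

```python
# Patch the standard library first, before anything else is imported,
# so that blocking calls such as time.sleep (and socket I/O) become
# cooperative.
import gevent.monkey
gevent.monkey.patch_all()

import time
import gevent


def fake_upload(name, delay):
    """Hypothetical stand-in for upload_one_file: waits `delay` seconds."""
    time.sleep(delay)  # patched: yields to other Greenlets while waiting
    return name


start = time.time()
jobs = [gevent.spawn(fake_upload, 'tmp{:02d}.txt'.format(i), 0.2)
        for i in range(1, 6)]
gevent.joinall(jobs)
elapsed = time.time() - start

# Greenlet.value holds the return value of each spawned call.
results = [job.value for job in jobs]
print(results)
print('elapsed: {:.2f} s'.format(elapsed))
```

Commenting out the patch_all() call should make the same script take about 1 second, because an unpatched time.sleep blocks every Greenlet in the thread, not just the one that called it.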

4 The async model with limited concurrency

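The above example works fine with a small number of files. But what if we attempted to upload a large number of files? Spawning one Greenlet per file would start all of the uploads at once, which might consume a lot of memory and network connections, or run into Dropbox's API rate limits. In that case we might want to put some limit on the number of tasks that can be active concurrently. gevent makes it rather easy to do that, too.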

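The gevent.pool.Pool class enables us to specify a maximum number of Greenlets to be active at any time. When the pool is full, pool.spawn blocks until one of the running Greenlets finishes and frees a slot.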

Using a pool can be as simple as making a couple of modifications to the above gevent example. Here is a diff between the previous gevent example and one that uses gevent.pool.Pool instead:

--- upload_batch02.py   2017-01-10 09:59:05.434850755 -0800
+++ upload_batch03.py   2017-01-10 12:09:06.509149609 -0800
@@ -19,6 +19,7 @@
 import sys
 import os
 import gevent
+import gevent.pool
 import dropbox
 import time
 import datetime
@@ -63,9 +64,10 @@


 def upload_files(dbx, files_and_paths):
+    pool = gevent.pool.Pool(4)
     threads = []
     for source, dest in files_and_paths:
-        threads.append(gevent.spawn(upload_one_file, dbx, source, dest))
+        threads.append(pool.spawn(upload_one_file, dbx, source, dest))
     gevent.joinall(threads)

Notes:

We added an import for the gevent.pool module.
We create an instance of the gevent.pool.Pool class, specifying that the maximum size of the pool is 4.
We spawn our threads (Greenlets) using pool.spawn instead of gevent.spawn.
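A quick way to confirm that the pool caps concurrency is to count how many tasks are running at the same time. Here is a minimal sketch under stated assumptions, not the upload program itself: the hypothetical fake_upload function stands in for upload_one_file and calls gevent.sleep in place of a real network request, and one of the twelve tasks deliberately fails. It also illustrates something that applies to both gevent examples: by default gevent.joinall does not re-raise an exception from a Greenlet (gevent prints the traceback to stderr instead), so a failed upload is easy to miss unless you check each Greenlet's successful() method or exception attribute.

```python
import gevent
import gevent.pool

POOL_SIZE = 4
active = 0       # tasks currently running
max_active = 0   # highest value `active` ever reached


def fake_upload(name):
    """Hypothetical stand-in for upload_one_file."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    try:
        gevent.sleep(0.05)  # cooperative wait, like a network request
        if name == 'tmp07.txt':
            raise IOError('simulated upload failure for ' + name)
        return name
    finally:
        active -= 1


pool = gevent.pool.Pool(POOL_SIZE)
# pool.spawn blocks while POOL_SIZE Greenlets are already running.
jobs = [pool.spawn(fake_upload, 'tmp{:02d}.txt'.format(i))
        for i in range(1, 13)]
gevent.joinall(jobs)  # waits for all; does not raise by default

failed = [job for job in jobs if not job.successful()]
print('max concurrent tasks: {}'.format(max_active))
for job in failed:
    print('failed: {!r}'.format(job.exception))
```

With a pool size of 4, max_active should never exceed 4, and exactly one Greenlet should be reported as failed. Passing raise_error=True to gevent.joinall is an alternative when you would rather stop at the first failure.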
