Tuesday, November 8, 2016

Bypassing Box Upload Limits by API

Box for large files

Box offers 10 GB of online storage for free, double what anyone else offers, with an individual max file size of 2 GB, but you can only upload 250 MB files. So how do you upload that 2 GB file? The Box API, that's how, either with regular ol' requests or their fancy schmancy SDK. First follow the Getting Started instructions, sign up for a developer account, and create a temporary key. Then in Python, try this out:

# import the requests package
import requests

# copy your token here
TOKEN = "<your developer token>"

# try to get the top level folder, id: "0", using this command exactly as below:
r = requests.get(url='https://api.box.com/2.0/folders/0',
                 headers={'Authorization': 'Bearer %s' % TOKEN})

# check the response
r
#  <Response [200]>
# success!

# get the output
r.json()
# lots of stuff

# upload a file, using the commands exactly as below, except put the actual id number
# of the desired folder; the with block closes the file once the upload is done
PAYLOAD = {'attributes': '{"name":"myfile", "parent":{"id":"<id # of desired folder>"}}'}
with open('path/to/myfile', 'rb') as f:
    r = requests.post(url='https://upload.box.com/api/2.0/files/content',
                      headers={'Authorization': 'Bearer %s' % TOKEN},
                      files={'file': f},
                      data=PAYLOAD)

# check the response
r
#  <Response [201]>
# success!
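
Hand-writing the attributes JSON string gets fiddly once a file name contains quotes or other special characters. Here is a minimal sketch that builds the same form field with json.dumps instead. Note that build_upload_payload is a hypothetical helper made up for this post, not something from the Box API or SDK.

```python
import json


def build_upload_payload(name, folder_id):
    """Build the ``data`` form field for the upload request shown above.

    ``build_upload_payload`` is a hypothetical helper for illustration only;
    it is not part of the Box API or SDK.
    """
    # send the folder id as a string; "0" is the top level folder
    attributes = {'name': name, 'parent': {'id': str(folder_id)}}
    # json.dumps takes care of quoting and escaping the file name
    return {'attributes': json.dumps(attributes)}


PAYLOAD = build_upload_payload('myfile', 0)
print(PAYLOAD['attributes'])
```

Pass the result as data=PAYLOAD in the requests.post call, just like the hand-written version.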

References

Check the online Content API reference for full documentation.

Monday, November 7, 2016

Panda Pop

Pandas Offset Aliases

Memorize this table - or just bookmark this link: Pandas Offset Aliases

Offset Aliases

A number of string aliases are given to useful common time series frequencies. We will refer to these aliases as offset aliases (referred to as time rules prior to v0.8.0).

Alias Description
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
SM semi-month end frequency (15th and end of month)
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
SMS semi-month start frequency (1st and 15th)
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter end frequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA business year end frequency
AS year start frequency
BAS business year start frequency
BH business hour frequency
H hourly frequency
T, min minutely frequency
S secondly frequency
L, ms milliseconds
U, us microseconds
N nanoseconds
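
To see what the aliases do, here is a small sketch that passes two of them to pandas.date_range as the freq argument. The dates are arbitrary examples, and it only uses "B" and "MS"; some other aliases in the table have been renamed in later pandas releases, so check the docs for the version you have installed.

```python
import pandas as pd

# "B" skips the weekend: 2016-11-04 is a Friday, so Monday and Tuesday follow
business_days = pd.date_range('2016-11-04', periods=3, freq='B')

# "MS" lands on the first day of each month
month_starts = pd.date_range('2016-11-01', periods=3, freq='MS')

print([d.strftime('%Y-%m-%d') for d in business_days])
print([d.strftime('%Y-%m-%d') for d in month_starts])
```

The same strings should also work anywhere else pandas asks for a frequency, such as resample.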

Tuesday, November 1, 2016

robotic releases

Basic Auto-Versioning from Git

If you're using the winning workflow and the recommended Python project layout then you've set up a CI server to build releases when you tag them in Git, and you set your version in the __init__.py file of your package. But, "Oh, No!" you did it again. You created the Git tag, but forgot to update your code's __version__ string.

Okay, there is a Python package called Versioneer that handles this for you, and it's pretty awesome. But it turns out it's also pretty easy to roll your own, especially if you're just using Git, because Python has a Git implementation called Dulwich that can do this in just a few lines. Maybe it will get integrated into a future version of Dulwich: I've submitted a PR (#462), which was merged into v0.16.3, and an update (#489), which was merged into v0.17, to also list tags that are not objects. Anyway, for now, the easiest way to use this is to copy this file into the top level of your package, install the latest version of Dulwich (>=0.17.1), import it, and then add something like this to your package dunder init module so it works both in your repo during dev and later when deployed to users.

"""
Example package dunder init module implementing
``dulwich.contrib.release_robot`` to get current version.
"""

import os
import importlib

# try to import Dulwich or create dummies
try:
    from dulwich.contrib.release_robot import get_current_version
    from dulwich.repo import NotGitRepository
except ImportError:
    NotGitRepository = NotImplementedError

    def get_current_version():
        raise NotGitRepository

BASEDIR = os.path.dirname(__file__)  # this directory
VER_FILE = 'version'  # name of file to store version
# use release robot to try to get current Git tag
try:
    GIT_TAG = get_current_version()
except NotGitRepository:
    GIT_TAG = None
# check version file
try:
    version = importlib.import_module('%s.%s' % (__name__, VER_FILE))
except ImportError:
    VERSION = None
else:
    VERSION = version.VERSION
# update version file if it differs from Git tag
if GIT_TAG is not None and VERSION != GIT_TAG:
    with open(os.path.join(BASEDIR, VER_FILE + '.py'), 'w') as vf:
        vf.write('VERSION = "%s"\n' % GIT_TAG)
else:
    GIT_TAG = VERSION  # if Git tag is none use version file
VERSION = GIT_TAG  # version

__author__ = u'your name'
__email__ = u'your.email@your.company.com'
__url__ = u'https://github.com/your-org/your-project'
__version__ = VERSION
__release__ = u'your release name'

You can also use it to list all recent tags, sorted from newest to oldest. The name of the most recent tag is then:

get_recent_tags()[0][0]

assuming your tags all use semantic versions like "v0.3". Enjoy!
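
The leading "v" gets stripped by a regular expression. As a rough sketch of how that works, the snippet below applies the default pattern from release_robot to a few sample tags. The helper name strip_tag_prefix is made up for this example; release_robot does the same thing inside get_current_version.

```python
import re

# pattern copied from release_robot: skip any leading letters, spaces,
# underscores and dashes, then capture the version that follows
PATTERN = r'[ a-zA-Z_\-]*([\d\.]+[\-\w\.]*)'


def strip_tag_prefix(tag, pattern=PATTERN):
    """Return the version part of a tag, or the tag itself if nothing matches.

    ``strip_tag_prefix`` is a hypothetical name used only in this sketch.
    """
    match = re.match(pattern, tag)
    return match.group(1) if match else tag


for tag in ('v0.3', 'Release-0.2.1-rc.1', 'version_0.3_rc_1', 'latest'):
    print(tag, '->', strip_tag_prefix(tag))
```

A tag with no digits at all, like "latest", does not match and comes back unchanged.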

"""Determine last version string from tags.

Alternate to `Versioneer <https://pypi.python.org/pypi/versioneer/>`_ using
`Dulwich <https://pypi.python.org/pypi/dulwich>`_ to sort tags by time from
newest to oldest.

Copy the following into the package ``__init__.py`` module::

    from dulwich.contrib.release_robot import get_current_version
    __version__ = get_current_version('..')

This example assumes the tags have a leading "v" like "v0.3", and that the
``.git`` folder is in a project folder that contains the package folder.

EG::

    * project
    |
    * .git
    |
    +-* package
      |
      * __init__.py  <-- put __version__ here
"""

import datetime
import re
import sys
import time

from dulwich.repo import Repo

# CONSTANTS
PROJDIR = '.'
PATTERN = r'[ a-zA-Z_\-]*([\d\.]+[\-\w\.]*)'


def get_recent_tags(projdir=PROJDIR):
    """Get list of tags in order from newest to oldest and their datetimes.

    :param projdir: path to ``.git``
    :returns: list of tags sorted by commit time from newest to oldest

    Each tag in the list contains the tag name, commit time, commit id, author
    and any tag meta. If a tag isn't annotated, then its tag meta is ``None``.
    Otherwise the tag meta is a tuple containing the tag time, tag id and tag
    name. Time is in UTC.
    """
    with Repo(projdir) as project:  # dulwich repository object
        refs = project.get_refs()  # dictionary of refs and their SHA-1 values
        tags = {}  # empty dictionary to hold tags, commits and datetimes
        # iterate over refs in repository
        for key, value in refs.items():
            key = key.decode('utf-8')  # compatible with Python-3
            obj = project.get_object(value)  # dulwich object from SHA-1
            # don't just check if object is "tag" b/c it could be a "commit"
            # instead check if "tags" is in the ref-name
            if u'tags' not in key:
                # skip ref if not a tag
                continue
            # strip the leading text from refs to get "tag name"
            _, tag = key.rsplit(u'/', 1)
            # check if tag object is "commit" or "tag" pointing to a "commit"
            try:
                commit = obj.object  # a tuple (commit class, commit id)
            except AttributeError:
                commit = obj
                tag_meta = None
            else:
                tag_meta = (
                    datetime.datetime(*time.gmtime(obj.tag_time)[:6]),
                    obj.id.decode('utf-8'),
                    obj.name.decode('utf-8')
                )  # compatible with Python-3
                commit = project.get_object(commit[1])  # commit object
            # get tag commit datetime, but dulwich returns seconds since
            # beginning of epoch, so use Python time module to convert it to
            # timetuple then convert to datetime
            tags[tag] = [
                datetime.datetime(*time.gmtime(commit.commit_time)[:6]),
                commit.id.decode('utf-8'),
                commit.author.decode('utf-8'),
                tag_meta
            ]  # compatible with Python-3

    # return list of tags sorted by their datetimes from newest to oldest
    return sorted(tags.items(), key=lambda tag: tag[1][0], reverse=True)


def get_current_version(projdir=PROJDIR, pattern=PATTERN, logger=None):
    """Return the most recent tag, using an optional regular expression pattern.

    The default pattern will strip any characters preceding the first semantic
    version. *EG*: "Release-0.2.1-rc.1" will become "0.2.1-rc.1". If no match
    is found, then the most recent tag is returned without modification.

    :param projdir: path to ``.git``
    :param pattern: regular expression pattern with group that matches version
    :param logger: a Python logging instance to capture exception
    :returns: tag matching first group in regular expression pattern
    """
    tags = get_recent_tags(projdir)
    try:
        tag = tags[0][0]
    except IndexError:
        return
    matches = re.match(pattern, tag)
    try:
        current_version = matches.group(1)
    except (IndexError, AttributeError) as err:
        if logger:
            logger.exception(err)
        return tag
    return current_version


if __name__ == '__main__':
    if len(sys.argv) > 1:
        _PROJDIR = sys.argv[1]
    else:
        _PROJDIR = PROJDIR
    print(get_current_version(projdir=_PROJDIR))

# release_robot.py
#
# Dulwich is dual-licensed under the Apache License, Version 2.0 and the GNU
# General Public License as public by the Free Software Foundation; version 2.0
# or (at your option) any later version. You can redistribute it and/or
# modify it under the terms of either of these two licenses.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# You should have received a copy of the licenses; if not, see
# <http://www.gnu.org/licenses/> for a copy of the GNU General Public License
# and <http://www.apache.org/licenses/LICENSE-2.0> for a copy of the Apache
# License, Version 2.0.
#
"""Tests for release_robot."""

import datetime
import os
import re
import shutil
import tempfile
import time
import unittest

from dulwich.contrib import release_robot
from dulwich.repo import Repo
from dulwich.tests.utils import make_commit, make_tag

BASEDIR = os.path.abspath(os.path.dirname(__file__))  # this directory


def gmtime_to_datetime(gmt):
    return datetime.datetime(*time.gmtime(gmt)[:6])


class TagPatternTests(unittest.TestCase):
    """test tag patterns"""

    def test_tag_pattern(self):
        """test tag patterns"""
        test_cases = {
            '0.3': '0.3', 'v0.3': '0.3', 'release0.3': '0.3',
            'Release-0.3': '0.3', 'v0.3rc1': '0.3rc1', 'v0.3-rc1': '0.3-rc1',
            'v0.3-rc.1': '0.3-rc.1', 'version 0.3': '0.3',
            'version_0.3_rc_1': '0.3_rc_1', 'v1': '1', '0.3rc1': '0.3rc1'
        }
        for testcase, version in test_cases.items():
            matches = re.match(release_robot.PATTERN, testcase)
            self.assertEqual(matches.group(1), version)


class GetRecentTagsTest(unittest.TestCase):
    """test get recent tags"""

    # Git repo for dulwich project
    test_repo = os.path.join(BASEDIR, 'dulwich_test_repo.zip')
    committer = b"Mark Mikofski <mark.mikofski@sunpowercorp.com>"
    test_tags = [b'v0.1a', b'v0.1']
    tag_test_data = {
        test_tags[0]: [1484788003, b'0' * 40, None],
        test_tags[1]: [1484788314, b'1' * 40, (1484788401, b'2' * 40)]
    }

    @classmethod
    def setUpClass(cls):
        cls.projdir = tempfile.mkdtemp()  # temporary project directory
        cls.repo = Repo.init(cls.projdir)  # test repo
        obj_store = cls.repo.object_store  # test repo object store
        # commit 1 ('2017-01-19T01:06:43')
        cls.c1 = make_commit(
            id=cls.tag_test_data[cls.test_tags[0]][1],
            commit_time=cls.tag_test_data[cls.test_tags[0]][0],
            message=b'unannotated tag',
            author=cls.committer
        )
        obj_store.add_object(cls.c1)
        # tag 1: unannotated
        cls.t1 = cls.test_tags[0]
        cls.repo[b'refs/tags/' + cls.t1] = cls.c1.id  # add unannotated tag
        # commit 2 ('2017-01-19T01:11:54')
        cls.c2 = make_commit(
            id=cls.tag_test_data[cls.test_tags[1]][1],
            commit_time=cls.tag_test_data[cls.test_tags[1]][0],
            message=b'annotated tag',
            parents=[cls.c1.id],
            author=cls.committer
        )
        obj_store.add_object(cls.c2)
        # tag 2: annotated ('2017-01-19T01:13:21')
        cls.t2 = make_tag(
            cls.c2,
            id=cls.tag_test_data[cls.test_tags[1]][2][1],
            name=cls.test_tags[1],
            tag_time=cls.tag_test_data[cls.test_tags[1]][2][0]
        )
        obj_store.add_object(cls.t2)
        cls.repo[b'refs/heads/master'] = cls.c2.id
        cls.repo[b'refs/tags/' + cls.t2.name] = cls.t2.id  # add annotated tag

    @classmethod
    def tearDownClass(cls):
        cls.repo.close()
        shutil.rmtree(cls.projdir)

    def test_get_recent_tags(self):
        """test get recent tags"""
        tags = release_robot.get_recent_tags(self.projdir)  # get test tags
        for tag, metadata in tags:
            tag = tag.encode('utf-8')
            test_data = self.tag_test_data[tag]  # test data tag
            # test commit date, id and author name
            self.assertEqual(metadata[0], gmtime_to_datetime(test_data[0]))
            self.assertEqual(metadata[1].encode('utf-8'), test_data[1])
            self.assertEqual(metadata[2].encode('utf-8'), self.committer)
            # skip unannotated tags
            tag_obj = test_data[2]
            if not tag_obj:
                continue
            # tag date, id and name
            self.assertEqual(metadata[3][0], gmtime_to_datetime(tag_obj[0]))
            self.assertEqual(metadata[3][1].encode('utf-8'), tag_obj[1])
            self.assertEqual(metadata[3][2].encode('utf-8'), tag)

"""
Example package dunder init module implementing
``dulwich.contrib.release_robot`` to get current version.
"""

import os
import importlib

# try to import Dulwich or create dummies
try:
    from dulwich.contrib.release_robot import get_current_version
    from dulwich.repo import NotGitRepository
except ImportError:
    NotGitRepository = NotImplementedError

    def get_current_version():
        raise NotGitRepository

BASEDIR = os.path.dirname(__file__)  # this directory
PROJDIR = os.path.dirname(BASEDIR)
VER_FILE = 'version'  # name of file to store version
# use release robot to try to get current Git tag
try:
    GIT_TAG = get_current_version(PROJDIR)
except NotGitRepository:
    GIT_TAG = None
# check version file
try:
    version = importlib.import_module('%s.%s' % (__name__, VER_FILE))
except ImportError:
    VERSION = None
else:
    VERSION = version.VERSION
# update version file if it differs from Git tag
if GIT_TAG is not None and VERSION != GIT_TAG:
    with open(os.path.join(BASEDIR, VER_FILE + '.py'), 'w') as vf:
        vf.write('VERSION = "%s"\n' % GIT_TAG)
else:
    GIT_TAG = VERSION  # if Git tag is none use version file
VERSION = GIT_TAG  # version

__author__ = u'your name'
__email__ = u'your.email@your.company.com'
__url__ = u'https://github.com/your-org/your-project'
__version__ = VERSION
__release__ = u'your release name'