There are many blog posts on effective Git workflows: Stack Overflow questions and answers, Bitbucket tutorials, GitHub guides, and an archived article by Mark McDonnell, formerly of the BBC. So why another post on Git workflow? None of those workflows seemed right for us, but recently it all clicked, and I feel like we've finally found the process that works for us. The key was finding the simplest workflow that still included the most valuable best practices. In particular, we found that complicated multi-branch strategies were unnecessary, but test-driven development (TDD) and continuous integration (CI) were a must.
Winning Workflow
Setting up Remotes
We start with the assumption that all collaborators fork the upstream repository to their personal profile. Each person then clones their fork to their laptop as origin and adds a second remote pointing to the upstream repository. For convenience, they may also add remotes for the forks of their most frequent collaborators.
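Concretely, the one-time setup might look like this. The repository URLs and the collaborator name are placeholders, not real projects:

```shell
# Clone your fork; Git names this remote "origin" automatically.
git clone git@github.com:you/project.git
cd project

# Add the shared repository as "upstream".
git remote add upstream git@github.com:team/project.git

# Optional: a remote for a frequent collaborator's fork.
git remote add alice git@github.com:alice/project.git

git remote -v   # sanity check: origin, upstream, alice
```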
The next assumption is that we all keep our version of master synchronized with upstream master. And we never work out of our own master branch! Basically this means at the start of any new work we do the following:
I like to do `git fetch --all` to get the lay of the land. Combined with `git log --all --graph --date=short --pretty=format:"%ad %h %s%d [%an]"`, this lets me know what everyone is working on, assuming I've added remotes for their forks.
Then I pull from upstream master to get the latest nightly or release,
and push to origin master to keep my fork current.
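Assuming remotes named `origin` (your fork) and `upstream` (the shared repository), the start-of-work ritual above is just:

```shell
# Survey all remotes and see what everyone is working on.
git fetch --all
git log --all --graph --date=short --pretty=format:"%ad %h %s%d [%an]"

# Sync your master with upstream, then mirror it to your fork.
git checkout master
git pull upstream master
git push origin master
```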
Recommended Project Layout
I'm also going to assume that everyone follows the recommended project layout. That means the project lists all dependencies in requirements.txt; is developed and deployed in its own virtual environment; includes tests and documentation aiming for >80% coverage; uses a boilerplate design that lets tests, documentation, and package data be bundled into a distribution; works with a test runner that supports self-discovery; and is written with docstrings for auto-generated documentation. Nothing is ever perfect, so it still pays to be mindful of path clashes, aware of the arcana of Mac OS X [1] and Windows [2], and able to find answers on Stack Overflow.
Branching, Testing, Pull Requests and Collaboration
Now I switch to a new feature branch with a meaningful name. I'll delete this branch everywhere later, so it can be verbose.
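For example, with the branch name here being purely hypothetical:

```shell
# Branch off the freshly synced master; verbose names are fine since the
# branch is deleted everywhere after the merge.
git checkout -b add-csv-export-and-tests master

# Publish it to your fork right away so a PR can be opened early.
git push -u origin add-csv-export-and-tests
```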
The very first code I write is a test or two that demonstrates, more or less exactly, what we want the feature or bug fix to do. This is one of the most valuable steps because it clearly defines the acceptance criteria. It's also important to stay thoughtful and flexible: just because your tests pass doesn't necessarily mean the feature is implemented as intended, and some new tests or adjustments may be needed along the way.
Now, before I write any more code, is when I submit a pull request (PR) from my fork's feature branch to upstream/master. This surprises many people. Many collaborators have told me they thought PRs should be submitted only after their work is complete and passing all tests, but in my opinion that defeats the entire point of collaborating on a short iteration cycle.
If you wait until the end to submit your work you risk diverging from the feature's intended goals especially if the feature's requirements shift or you've misinterpreted the goals even slightly.
Waiting also means you're missing out on collaborating with your teammates and soliciting their feedback mid-project.
On the other hand, submitting your PR right after you write your tests means:
Every push to your fork will trigger a build that runs your tests.
Your teammates get continuous updates, so they can monitor your progress in real time but review on their own time. You won't need to hold a formal review, since collaborators can review your work at any point as the commits queue up in the PR.
I think the reason people wait until the end to submit PRs is the same reason they like to write tests at the end. I used to hate seeing my tests fail because it made me feel like I was failing. I think people delay submitting their PRs because they're nervous about having incomplete work reviewed out of context and receiving unfair criticism or harsh judgment. IMO, punitive behavior is dysfunctional and a collaboration killer, and it should be rooted out with a frank discussion about what mutual success looks like. I also think some people aren't natural collaborators and don't want others interfering with their work. Again, a constructive discussion can help promote new habits, though don't expect people to change overnight. You can take a hard stance on punitive behavior, but you can't expect an introvert to feel comfortable sharing freely without some accommodations.
Now comes the really fun part: we hack and collaborate until the tests all pass. But we don't have too much fun. If the branch runs much past ten commits, we've probably embarked on an epic that needs to be reorganized, and the PR will become difficult to merge. That saps morale and wastes time, so keep it simple.
The final bit of collaboration is the code review and merging the PR into upstream master. This is fairly easy, since there are
already tests that demonstrate what the code should do,
only a few commits,
and all of the collaborators have been following the commits as they've been queuing in the PR.
So really the review and merge is a sanity check. Do these tests really demonstrate the feature as intended? Anything else major would have stood out already.
Whoever owns or maintains the repository should then add a release tag and push it to upstream. This triggers the CI to test, build, and deploy a new release.
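Assuming an `upstream` remote and a purely illustrative version number, cutting the release looks like:

```shell
# Tag the merge commit on master and push only the tag upstream;
# the CI's deploy-on-tag rule takes it from there.
git tag -a v1.2.3 -m "Release v1.2.3"
git push upstream v1.2.3
```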
Continuous Integration
This is key. Set up Travis, Circle, AppVeyor, or Jenkins on upstream to test and build every commit to master and every commit to an open PR, and to deploy on every tag. Easy!
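As one possible sketch, a minimal Travis CI config along these lines tests every commit and deploys to PyPI only on tags. The user name and encrypted password are placeholders, and the test command assumes the nose runner from the layout above:

```yaml
language: python
python:
  - "2.7"
install:
  - pip install -r requirements.txt
script:
  - nosetests
deploy:
  provider: pypi
  user: your-pypi-username      # placeholder
  password:
    secure: YOUR-ENCRYPTED-PASSWORD   # output of `travis encrypt`
  on:
    tags: true                  # deploy only when a tag is pushed
```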
Wrapping Up
There are some features of this style that stand out:
There is only one master branch. Using CI to deploy only on tags eliminates the need for a dev or staging branch, because any untagged commits on master are the equivalent of the bleeding edge.
This method depends heavily on a hosted Git service like GitHub or Bitbucket, use of TDD, strong collaboration, and a CI server like Travis.
Happy Coding!
Footnotes
1. On Mac OS X, matplotlib will not work in a virtual environment unless a framework interpreter is used. The easiest way to do this is to run the framework Python as `PYTHONHOME=/home/you/path/to/project/venv/ python` instead of using `source venv/bin/activate`.
2. On Windows, pip often creates an executable for each script that is bound to the Python interpreter it was installed with. If the virtual environment was created with system site packages, or if the package is not installed in the virtual environment, you may get a confusing path clash. For example, running the nosetests script may use your system Python, so the Python path will not include your virtual environment. The solution is to never use system site packages and to install all dependencies directly in your virtual environment.
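That fix in practice: a clean environment with no system site packages, with every dependency installed inside it. `virtualenv venv` works on Python 2; `python3 -m venv venv` is the stdlib equivalent on Python 3.3+ (shown here), and neither uses system site packages by default:

```shell
# Create an isolated environment (never pass --system-site-packages).
python3 -m venv venv
source venv/bin/activate

# Install all dependencies into the venv so scripts like nosetests
# resolve against the venv's interpreter, not the system one.
pip install -r requirements.txt
```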
Love-struck supervillain (Neil Patrick Harris) loses to superhero (Nathan Fillion) in a musical. This just never gets old. Almost as good as the official Star Wars trailer.
FYI: You do not need to use `obj.empty` to preallocate an object array.
In fact, as soon as you assign a value to any element of the object array, MATLAB grows the array to that size, which allocates (or reallocates) RAM for the new object array, defeating the point of preallocating.
“If you make an assignment to a property value, MATLAB calls the SimpleClass constructor to grow the array to the required size:”
Instead, if you want to preallocate space for an object array, grow the array once by assigning the last object first. This requires the class to have a no-argument constructor. Each time you grow the array you reallocate RAM for it, wasting time and space, so do it once with the maximum expected size of the array. See Initialize Object Arrays and Initializing Arrays of Handle Objects in the OOP documentation.
>> S(max_size) = MyClass(args)
Another option is to preallocate any other container like a cell array (best IMHO), structure or containers.Map and then fill in the class objects as they are created. An advantage to this is you don’t have to subclass matlab.mixin.Heterogeneous to group different classes together.
>> S = cell(1, max_size); args = {1,2,3; 4,5,6; 7,8,9};
>> for x = 1:size(args, 1), S{x} = MyClass(args{x,:}); end
The only time to use an empty object is when you want a default for the situation where nothing gets instantiated, and you need it to be an instance of the class. Of course, any empty array is empty; i.e. '', [] and {} would pass an isempty test too.
>> S = MyClass.empty;
>> if blah, S = MyClass(args); end
>> if isa(S, 'MyClass') && isempty(S), disp('nothing was instantiated'); end
I hope this helps someone; it definitely helped me understand the odd nature of MATLAB. This behavior exists because everything in MATLAB is an array; even a scalar is a <1x1 double>. See the C API's mxArray for external interfaces, and mwArray for compiled/deployed MATLAB, for more info.
MATLAB = Matrix Laboratory
Class definitions didn’t appear in MATLAB until 2008. Other languages like C++, Java, Python, and Ruby are object-first. The empty method is meant to duplicate the ability to be empty, similar to other MATLAB datatypes such as double, cell, struct, etc. IMO, outside of MATLAB it's a very artificial and somewhat meaningless construct.
There are also packages that will create a boilerplate project layout for you, but I wouldn't recommend them except as reference guides. The tutorial by NSLS-2 is the notable exception; please take a look!
Bootstrap a Scientific Python Library: This is a tutorial with a template for packaging, testing, documenting, and publishing scientific Python code.
Cookiecutter: A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python package project template.
It's hard to pin down a standard style. Here’s mine:
MyProject/ <- git repository
|
+- .gitignore <- *.pyc, IDE files, venv/, build/, dist/, doc/_build, etc.
|
+- requirements.txt <- to install into a virtualenv
|
+- setup.py <- use setuptools, include packages, extensions, scripts and data
|
+- MANIFEST.in <- files to include in or exclude from sdist
|
+- readme.rst <- incorporate into setup.py and docs
|
+- changes.rst <- release notes, incorporate into setup.py and docs
|
+- myproject_script.py <- script to run myproject from command line, use Python
| argparse for command line arguments put shebang
| `#! /usr/bin/env python` on 1st line and end with a
| `if __name__ == "__main__":` section, include in
| setup.py scripts section for install
|
+- any_other_scripts.py <- scripts for configuration, documentation generation
| or downloading assets, etc., include in setup.py
|
+- venv/ <- virtual environment to run tests, validate setup.py, development
|
+- myproject/ <- top level package keeps sub-packages and package-data together
   |            for install
   |
   +- __init__.py <- contains __version__, an API by importing key modules,
   |                 classes, functions and constants, __all__ for easy import
   |
   +- docs/ <- use Sphinx to auto-generate documentation
   |
   +- tests/ <- use nose to perform unit tests
   |
   +- other_package_data/ <- images, data files, include in setup.py
   |
   +- core/ <- main source code for myproject, sometimes called `lib`
   |  |
   |  +- __init__.py <- necessary to make `core` a sub-package
   |  |
   |  +- … <- the rest of the folders and files in myproject
   |
   +- related_project/ <- a GUI library that uses myproject_lib or tools that
      |                   myproject_lib depends on, bundled together, etc.
      |
      +- __init__.py <- necessary to make related_project a sub-package
      |
      +- … <- the rest of the folders and files in the related project
If for some bizarre reason you find yourself looking for a YA dystopian novel set in modern-day San Francisco, you can read Cory Doctorow's Homeland, and its prequel Little Brother, for free (if you don't mind plugs by the author for independent brick-and-mortar bookshops nationwide between chapters).
I used the epub file in the Nook reader app on my Android phone; I just dropped the file into the Nook's "My Documents" folder. The epub file is unfortunately formatted incorrectly for Google Books. For an Amazon Kindle-compatible version, scroll to the bottom for the mobi file.
Maybe enjoy? (I did, but beware: the themes hit very close to home. That's good, right?)