Twig's Tech Tips: May 2018

Now, this isn't a post I would normally have written up here, since the stats in it were only meant for my colleagues in an internal email update.

But I noticed some emotional messages in recent discussions regarding pipenv and a distinct lack of solid information about its actual merits/benefits as a tool.

To me, software development should be factual, much like maths and science. You prove yourself through your work. I don't give a shit if it was written by someone who is LGBT, has an illness or has celebrity status.

It's as irrelevant to me as the stupid royal wedding. It doesn't matter and I don't need to know the back story.

That said, stress and anxiety from work should be dealt with by taking a damn break. It's not healthy to do nothing but code or provide support for open source projects.

tl;dr: While I appreciate the effort and intention behind the project to fix the Python workspace, pipenv feels like an early project still trying to find its feet.

Deterministic builds ARE important

As developers, there are plenty of things we'd rather spend our time on, and our dev tools are meant to save us the time to do them.

Last week I had time to pick up a task from mid-2017 to switch our company codebase over to pip-tools.

pip-tools is primarily a tool to pin Python dependencies by generating (and documenting) requirements.txt files from an input file (requirements.in), allowing for deterministic builds across all machines. It's pretty much yarn for Python.
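A build is only deterministic if every line of the generated requirements.txt pins an exact version. As a rough illustration, here is a hypothetical checker (my own sketch, not part of pip-tools) that flags requirement lines not pinned with `==`. It skips blank lines, comments and pip options such as `-r`, `-e` or `--hash`.

```python
import re
from typing import Iterable, List

# "name==version" or "name[extra]==version"; anything after the version
# (e.g. an environment marker) is ignored by this rough check.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^=\s;]+")


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in lines:
        # Strip inline comments such as "# via django" (pip-compile annotations).
        line = re.split(r"\s+#", raw, maxsplit=1)[0].strip()
        # Drop a trailing line-continuation backslash.
        line = line.rstrip("\\").strip()
        if not line or line.startswith("#") or line.startswith("-"):
            # Blank line, full-line comment, or a pip option (-r, -e, --hash=...).
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For a file produced by pip-compile I'd expect `unpinned_requirements(open("requirements.txt"))` to return an empty list; a hand-written file with entries like `requests>=2.0` would not.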

I switched over to pip-tools within a day but there were still a few kinks with our dependencies. Conflicting library dependency versions, badly named libs, etc. Nothing really unexpected after 5+ years of digital hoarding and virtualenv neglect.

Another day of cleansing saw a few unnecessary libraries removed from the codebase and a neatly generated requirements.txt file.

During the process of updating our dev setup guide, I was looking for some documentation on Python.org and saw a little note recommending use of Pipenv for our virtualenv and packaging needs.

Well, if it's recommended by Python.org, it should be good, right? If this is the way it should be, then it'd be in the best interest of our devs to switch to Pipenv so our skillset doesn't fall behind.

A summary of my experience follows below.

Benchmark setup

For the sake of reproducibility, all tests were done in a VM with a fresh install of Ubuntu 16.04 (on a host with a 7200 rpm HDD), 4 GB of RAM, a shitty AMD A10-5800K and standard rubbish Aussie ADSL "broadband" internet.

Library versions are:

pip 10.0.1
Python 2.7.11
pipenv 2018.5.18 (seriously, what is semver?)
virtualenv 16.0.0
pip-tools 2.0.2

Timing was done via the "time" command. It's accessible and easy to use. Results were measured in seconds.

Notes:

for tests without pip cache, I ran "rm -rf ~/.cache/pip*" to clear the pip/pipenv caches
I couldn't time the 2 commands properly in one go, so I just wrote a "compile-sync.sh" script that runs both "pip-compile --verbose" and "pip-sync", and timed that
excuse the charts; they took me forever to figure out in Excel
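The compile-sync.sh script itself isn't reproduced in this post. As a rough stand-in, here is a hypothetical Python harness (function names are my own) that clears the caches the same way as `rm -rf ~/.cache/pip*` and times a sequence of commands. It measures wall-clock time only, roughly the "real" figure reported by `time`, not user/sys CPU time.

```python
import glob
import os
import shutil
import subprocess
import time
from typing import List, Optional, Sequence


def clear_pip_caches(home: Optional[str] = None) -> List[str]:
    """Equivalent of `rm -rf ~/.cache/pip*`: removes the pip and pipenv caches."""
    home = home or os.path.expanduser("~")
    removed = []
    for path in glob.glob(os.path.join(home, ".cache", "pip*")):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)  # plain file or symlink
        removed.append(path)
    return removed


def time_commands(*commands: Sequence[str]) -> float:
    """Run each command in turn and return the total wall-clock time in seconds.

    Raises subprocess.CalledProcessError if any command fails, so a broken
    run is never reported as a timing.
    """
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(list(cmd), check=True)
    return time.perf_counter() - start
```

Something like `time_commands(["pip-compile", "--verbose"], ["pip-sync"])` should cover roughly what the script timed, without needing a wrapper script per benchmark.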

Benchmark results

pipenv: pipenv --two
virtualenvwrapper: mkvirtualenv pt

No issue here. Nobody is gonna complain about a 1 second difference in the grand scheme of things.

pipenv: pipenv install requests==2.18.4 django==1.11.13
pip-tools: ./compile-sync.sh

A 4 second difference, still not too bad.

pipenv: pipenv install requests==2.18.4 django==1.11.13
pip-tools: ./compile-sync.sh

So here was the first time I deleted the virtualenvs. I kept pip/pipenv caches intact to compare the dependency walking times.

Both much faster, but surprisingly still a 4 second difference.

pipenv: pipenv install (includes time to generate new lockfile)
pip-tools: ./compile-sync.sh

Now this is where it becomes interesting.

New virtualenv, no pip/pipenv caching, complicated requirements (pyrax and all of its insanity)

All things being equal, pipenv ends up being 2.7x slower than pip-tools.

If you want to replicate it, this is what the Pipfile looks like:

```toml
[packages]
requests = "==2.18.4"
django = "==1.11.13"
### because pyrax is a cruel mistress
# "Could not find a version that matches pbr!=2.1.0,<2.0,>=1.6,>=2.0.0"
# https://github.com/rackspace/pyrax/issues/623
pyrax = "==1.9.8"
# required to get pyrax working without conflicting with its own dependencies
# https://github.com/pycontribs/pyrax/issues/623#issuecomment-329647249
"oslo.serialization" = "==1.6.0"
"oslo.utils" = "==2.0.0"
"oslo.i18n" = "==1.7.0"
debtcollector = "==0.5.0"
python-keystoneclient = "==1.6.0"
"oslo.config" = "==1.12.0"
stevedore = "==1.5.0"
```

For insight into how truly horrible this library is, check the output of "pipenv graph".

pipenv: pipenv install
pip-tools: ./compile-sync.sh

Same complex requirements as before, but this time I only binned the virtualenv. It's much quicker once the libraries are cached, but pipenv is 4.8x slower when it needs to regenerate the lockfile.

Even with a valid lockfile, it's still 3.7x slower than pip-tools.

pipenv: pipenv install search_google==1.2.1
pip-tools: ./compile-sync.sh (after manually editing requirements.in)

Waiting 1m11s each time I want to add a library does not sound appealing.

Pros and cons

At this point I've only provided speed comparisons between pipenv and pip-tools. Below are a few things I noticed during my week comparing these tools.

virtualenvwrapper

simple virtualenv workflow
have to manually modify .bashrc to get the commands working

would have preferred the syntax to be some variant of "venvwrap mk|rm venv_name" rather than "mkvirtualenv venv_name" and "rmvirtualenv venv_name"

pip-tools

simple and focused
works with projects AND libraries
maintains compatibility with existing deployment tools (puppet, ansible, etc)
generated requirements.txt file is well documented and easy to read
pip-sync both installs new and removes unused libraries
unable to parse GitHub URLs in the #egg==version format (which pip understands)

As with virtualenvwrapper's syntax, I would have preferred "piptools sync|compile" over "pip-sync" and "pip-compile"
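The difference between pip-sync (install new, remove unused) and a sync that only adds packages comes down to two set operations. This is a simplified, hypothetical sketch of that planning step, not pip-sync's actual code: names are normalised PEP 503-style, and the list of tooling packages it never removes is my own choice.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Tooling this sketch never uninstalls (my own choice, not pip-sync's exact list).
KEEP = {"pip", "setuptools", "wheel", "pip-tools"}


def normalize(name: str) -> str:
    """PEP 503-style normalisation: 'Django', 'oslo.utils' -> 'django', 'oslo-utils'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def plan_sync(pinned: Dict[str, str], installed: Dict[str, str],
              keep: Iterable[str] = KEEP) -> Tuple[Dict[str, str], Set[str]]:
    """Return (to_install, to_uninstall) that would make `installed` match `pinned`."""
    want = {normalize(n): v for n, v in pinned.items()}
    have = {normalize(n): v for n, v in installed.items()}
    # Missing packages, or ones installed at a different version.
    to_install = {n: v for n, v in want.items() if have.get(n) != v}
    # Installed but no longer pinned: the removal step that, in my testing,
    # pipenv leaves to a separate "pipenv clean".
    to_uninstall = set(have) - set(want) - {normalize(k) for k in keep}
    return to_install, to_uninstall
```

For example, with `requests` pinned to 2.18.4 but 2.18.0 installed, and a leftover `pyrax`, the plan is to install `requests==2.18.4` and uninstall `pyrax`, in a single step.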

pipenv

provides many useful features like "check" for security vulnerabilities and "graph"
graph output is very nicely laid out
gets the "pipenv install|sync|clean" syntax right
much slower at most tasks (see the benchmarks above)

works with projects, but not libraries

documentation for commands needs work; I had better luck with trial and error

pipenv sync only seems to add libraries; you need to run "pipenv clean" to remove unused libs

Pipfile syntax errors result in a vague TomlDecodeError stack trace instead of a helpful error message

may cause issues with some shell setups due to the way pipenv shell works (lost aliases, no virtualenv label shown, source commands in .bashrc no longer working as expected, etc). This is mitigated by the --fancy flag, but behaviour is inconsistent between dev machines with varied setups

"pipenv run" fails to set the VIRTUAL_ENV environment variable. Apparently this is virtualenv's fault, but isn't pipenv meant to be a tool that makes it easier for Python newbies to pick up?

"Pipenv is primarily meant to provide users and developers of applications with an easy method to setup a working environment" (from homepage, paragraph 3)

deployment tooling/plugins (puppet, ansible, etc) don't quite feel ready yet, so deployment scripts need more work to update
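Since "pipenv run" doesn't export VIRTUAL_ENV, deploy scripts that rely on it need another way to tell whether they're inside the project's virtualenv. One option, sketched here as a hypothetical helper, is to ask the interpreter itself. It assumes the script is executed by the virtualenv's own python (which is what "pipenv run" puts first on the PATH).

```python
import sys
from typing import Optional


def active_virtualenv() -> Optional[str]:
    """Return the virtualenv directory of the running interpreter, or None.

    Doesn't depend on VIRTUAL_ENV being exported, only on which python
    binary is running the script.
    """
    # Legacy virtualenv (< 20, e.g. the 16.0.0 used here) sets sys.real_prefix;
    # venv and newer virtualenv make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix if sys.prefix != base else None
```

The `getattr` fallbacks keep it working on Python 2.7, where `sys.base_prefix` doesn't exist.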

no command for checking whether the virtualenv has already been created (I could be wrong, given the state of the documentation)

So in a deployment script, I tried to detect whether a virtualenv folder exists before trying to sync. Nope, can't do.

"pipenv --venv" should give you the path of the virtualenv, but only if the virtualenv exists. Otherwise, it exits with code 1, which will terminate deploy scripts.

Maybe sync will work? "pipenv sync --help" shows:

Options:
--three / --two Use Python 3/2 when creating virtualenv.

Ahh "when creating", that sounds promising!

But alas, in practice that actually destroys and recreates your virtualenv without warning! Enjoy your additional waiting time...

:~/src/test-piptools$ pipenv sync --two
Virtualenv already exists!
Removing existing virtualenv…
Creating a virtualenv for this project…

I have no words ...
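A possible workaround for both problems, sketched under my own assumptions: treat the exit code of "pipenv --venv" as a yes/no answer instead of letting it kill the script, and only pass --two when no virtualenv exists yet. It relies only on the behaviour described above, plus the assumption that a plain "pipenv sync" reuses an existing virtualenv. The function names and the Python 2 policy are hypothetical.

```python
import subprocess
from typing import List, Optional, Sequence


def venv_path(cmd: Sequence[str] = ("pipenv", "--venv")) -> Optional[str]:
    """Return the project's virtualenv path, or None if pipenv reports none.

    A non-zero exit status means "no virtualenv yet", not a fatal error.
    """
    result = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def sync_command(existing_venv: Optional[str]) -> List[str]:
    """Build the sync command, asking for a Python version only on first creation."""
    cmd = ["pipenv", "sync"]
    if existing_venv is None:
        cmd.append("--two")  # hypothetical policy: this project targets Python 2.7
    return cmd
```

In a deploy step this becomes `subprocess.run(sync_command(venv_path()), check=True)`, so an existing virtualenv is never silently destroyed and rebuilt.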

What's the verdict?

I spent roughly 4-5 days getting things to work with pipenv: some of it learning the ropes of pipenv's workflow, some of it fighting my mostly-vanilla bash shell to work properly with pipenv, some looking up issues on GitHub/StackOverflow, and a lot of it waiting for lockfile generation. I finally had enough when the deployment scripts/tooling needed more work in the staging environment.

Your experience with pipenv on GitHub may vary depending on who you interact with on the contributors team. I've seen a few valid tickets get dismissed, but the friendly assistance I got from uranusjr was highly appreciated.

While I can't argue that pipenv doesn't work, it's definitely one of those things that could test the patience of a saint once used in a real-world environment.

I find it difficult to see why pipenv is recommended by Python.org / PyPA, apart from the fact that it's made by the guy who made requests.

Something to keep an eye on, but for now I don't believe it is as production-ready as its alternatives.
