Python: comparison of pipenv vs pip-tools

Now this isn't a blog post I would normally have written up here, since the stats in it were only meant for my colleagues in an internal email update.

But I noticed some emotional messages in recent discussions regarding pipenv and a distinct lack of solid information about its actual merits and benefits as a tool.

To me software development should be factual, much like maths and science. You prove yourself through your work. I don't give a shit if it was written by someone who is LGBT, has an illness or celebrity status.

It's as irrelevant to me as the stupid royal wedding. It doesn't matter and I don't need to know the back story.

That said, stress and anxiety from work should be dealt with by taking a damn break. It's not healthy to do nothing but coding or provide support for open source projects.

tl;dr: While I appreciate the effort and intention of the project to fix the Python workspace, pipenv feels like an early project still trying to find its feet.

Deterministic builds ARE important

As developers, there are plenty of things we'd like to spend our time doing, and our dev tools are meant to save us time so we can get to them.

Last week I had time to pick up a task from mid 2017 to switch our company codebase to use pip-tools.

pip-tools is primarily a tool for pinning Python dependencies: it generates (and documents) requirements.txt files from an input file (requirements.in), allowing for deterministic builds across all machines. It's pretty much yarn for Python.
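
To make "pinned" a bit more concrete: a pinned requirements.txt is one where every requirement carries an exact == version. Below is a rough Python 3 sketch of a check for that. The function name unpinned is made up for illustration (it is not part of pip-tools), and it assumes a simple file with one requirement per line.

```python
def unpinned(lines):
    """Return requirement lines that lack an exact '==' version pin.

    Assumption: simple requirements, one per line. URLs containing '#'
    and multi-line entries are not handled by this sketch.
    """
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            loose.append(line)
    return loose
```

Feeding it the lines of a generated requirements.txt should give back an empty list; anything it returns is a dependency that could still drift between machines.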

I switched over to pip-tools within a day but there were still a few kinks with our dependencies. Conflicting library dependency versions, badly named libs, etc. Nothing really unexpected after 5+ years of digital hoarding and virtualenv neglect.

Another day of cleansing saw a few unnecessary libraries removed from the codebase and a neatly generated requirements.txt file.

During the process of updating our dev setup guide, I was looking for some documentation on Python.org and saw a little note recommending use of Pipenv for our virtualenv and packaging needs.

Well if it's recommended by Python it should be good, right? If this is the way it should be then it'd be in the best interest of our devs to switch to Pipenv so our skillset doesn't fall behind.

A summary of my experience follows below.

Benchmark setup

For the sake of reproducibility, all tests were done in a VM with a fresh install of Ubuntu 16.04 (on a host with a 7200rpm HDD), 4 GB of RAM, a shitty AMD A10-5800K and standard rubbish Aussie ADSL "broadband" internet.

Library versions are:

  • pip 10.0.1
  • Python 2.7.11
  • pipenv 2018.5.18 (seriously, what is semver?)
  • virtualenv 16.0.0
  • pip-tools 2.0.2

Timing was done via the "time" command. It's accessible and easy to use. Results were measured in seconds.

Notes:

  • for tests without pip cache, I ran "rm -rf ~/.cache/pip*" to clear the pip/pipenv caches
  • I wasn't able to time two commands in one go, so I wrote a "compile-sync.sh" script that runs both "pip-compile --verbose" and "pip-sync" and timed the script instead
  • excuse the charts, took me forever to figure out how to do them in Excel
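
The compile-sync.sh script isn't reproduced in this post. For anyone who would rather do the timing from Python, a rough equivalent of "run both commands, time them as one unit" might look like the sketch below. It assumes Python 3.5+ (for subprocess.run), and both time_commands and PIP_TOOLS_STEPS are hypothetical names made up for illustration.

```python
import subprocess
import time

# The two pip-tools steps named in the notes above, run back to back.
PIP_TOOLS_STEPS = [
    ["pip-compile", "--verbose"],
    ["pip-sync"],
]


def time_commands(commands):
    """Run each command in order and return the total wall-clock seconds.

    check=True makes a failing command raise CalledProcessError, so a
    broken step is not silently counted as a fast run.
    """
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Calling time_commands(PIP_TOOLS_STEPS) from inside the project folder would then give one number covering both the compile and the sync step, which is what gets compared against a single "pipenv install".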

Benchmark results

[image: benchmark chart]

  • pipenv: pipenv --two
  • virtualenvwrapper: mkvirtualenv pt

No issue here. Nobody is gonna complain about a 1 second difference in the grand scheme of things.

[image: benchmark chart]

  • pipenv: pipenv install requests==2.18.4 django==1.11.13
  • pip-tools: ./compile-sync.sh

A 4 second difference, still not too bad.

[image: benchmark chart]

  • pipenv: pipenv install requests==2.18.4 django==1.11.13
  • pip-tools: ./compile-sync.sh

This was the first time I deleted the virtualenvs. I kept the pip/pipenv caches intact to compare the dependency walking times.

Both are much faster, but surprisingly there is still a 4 second difference.

[image: benchmark chart]

  • pipenv: pipenv install (includes time to generate new lockfile)
  • pip-tools: ./compile-sync.sh

Now this is where it becomes interesting.

New virtualenv, no pip/pipenv caching, complicated requirements (pyrax and all of its insanity).

All things being equal, pipenv ends up being 2.7x slower than pip-tools.

If you want to replicate it, this is what the Pipfile looks like:

requests = "==2.18.4"
django = "==1.11.13"
### because pyrax is a cruel mistress
# "Could not find a version that matches pbr!=2.1.0,<2.0,>=1.6,>=2.0.0"
# https://github.com/rackspace/pyrax/issues/623
pyrax = "==1.9.8"
# required to get pyrax working without conflicting with its own dependencies
# https://github.com/pycontribs/pyrax/issues/623#issuecomment-329647249
"oslo.serialization" = "==1.6.0"
"oslo.utils" = "==2.0.0"
"oslo.i18n" = "==1.7.0"
debtcollector = "==0.5.0"
python-keystoneclient = "==1.6.0"
"oslo.config" = "==1.12.0"
stevedore = "==1.5.0"

For an insight into how truly horrible this library is, you should check the output of "pipenv graph".

[image: benchmark chart]

  • pipenv: pipenv install
  • pip-tools: ./compile-sync.sh

Same complex requirements as before, but this time I only binned the virtualenv. It's much quicker once the libraries are cached, but pipenv is 4.8x slower when it needs to regenerate the lockfile.

Even with a valid lockfile, it's still 3.7x slower than pip-tools.

[image: benchmark chart]

  • pipenv: pipenv install search_google==1.2.1
  • pip-tools: ./compile-sync.sh (after manually editing requirements.in)

Waiting 1m11s each time I want to add a library does not sound appealing.

Pros and cons

At this point I've only provided speed comparisons between pipenv and pip-tools. Below are a few things I noticed during my week comparing these tools.

virtualenvwrapper

  • simple virtualenv workflow
  • have to manually modify .bashrc to get the commands working
  • would have preferred the syntax to be some variant of "venvwrap mk|rm venv_name" rather than "mkvirtualenv venv_name" and "rmvirtualenv venv_name"

pip-tools

  • simple and focused
  • works with projects AND libraries
  • maintains compatibility with existing deployment tools (puppet, ansible, etc)
  • generated requirements.txt file is well documented and easy to read
  • pip-sync both installs new and removes unused libraries
  • unable to understand URLs from GitHub with the #egg==version format (which pip understands)
  • as with the virtualenvwrapper syntax, would have preferred "piptools sync|compile" over "pip-sync" and "pip-compile"

pipenv

  • provides many useful features like "check" for security vulnerabilities and "graph"
  • graph output is very nicely laid out
  • gets the "pipenv install|sync|clean" syntax right
  • much slower at tasks
  • works with projects, but not libraries
  • documentation for commands needs work, better luck with trial and error
  • pipenv sync only seems to add libraries - need to run pipenv clean to remove unused libs
  • Pipfile syntax errors result in a vague TomlDecodeError stack trace instead of helpful error messages
  • may cause issues with some shell setups due to the way pipenv shell works (lose aliases, no virtualenv label shown, source commands in .bashrc no longer work as expected, etc). mitigated by using the --fancy flag, but inconsistent between dev machines with varied setups
  • "pipenv run" fails to set the VIRTUAL_ENV environment variable. Apparently this is virtualenv's fault, but isn't pipenv meant to be a tool that makes it easier for Python newbies to pick up?

"Pipenv is primarily meant to provide users and developers of applications with an easy method to setup a working environment" (from homepage, paragraph 3)

  • doesn't quite feel like deployment tooling/plugins are ready yet (puppet, ansible, etc) - requires more work to update deployment scripts
  • no command for checking whether a virtualenv has already been created (I could be wrong, given the state of the documentation)

So in a deployment script, I tried to detect whether a virtualenv folder exists before trying to sync. Nope, can't do.

"pipenv --venv" should give you the path of the virtualenv, but only if the virtualenv exists. Otherwise, it exits with code 1, which will terminate deploy scripts.
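
One way a deploy script could work around that exit code is to treat it as "no virtualenv yet" rather than as a failure. The following is only a sketch under a few assumptions: Python 3.5+, the path being printed on stdout, and find_virtualenv being a hypothetical helper name rather than anything pipenv ships.

```python
import subprocess


def find_virtualenv(command=("pipenv", "--venv")):
    """Return the virtualenv path printed by `command`, or None.

    A non-zero exit status is read as "no virtualenv yet" instead of a
    fatal error, so the calling deploy script can carry on. A missing
    executable still raises FileNotFoundError.
    """
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # decode stdout as text
    )
    if result.returncode != 0:
        return None
    # Assumption: the path is the only thing written to stdout.
    return result.stdout.strip() or None
```

A deploy step could then branch on "find_virtualenv() is None" to decide whether the environment needs creating first, without the exit code killing the whole script.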

Maybe sync will work? "pipenv sync --help" shows:

Options:
  --three / --two  Use Python 3/2 when creating virtualenv.

Ahh "when creating", that sounds promising!

But alas, in practice that actually destroys and recreates your virtualenv without warning! Enjoy your additional waiting time...

:~/src/test-piptools$ pipenv sync --two
Virtualenv already exists!
Removing existing virtualenv…
Creating a virtualenv for this project…

I have no words ...

What's the verdict?

I spent roughly 4-5 days getting things to work with pipenv: a bit of time learning the ropes of pipenv's workflow, some of it fighting to get my mostly-vanilla bash shell working properly with pipenv, some looking up issues on GitHub/StackOverflow, and a lot of it waiting for lockfile generation. I finally had enough when the deployment scripts/tooling needed more work in the staging environment.

Your experience with pipenv on GitHub may vary depending on who you interact with on the contributor team. I've seen a few valid tickets get dismissed, but the friendly assistance I got from uranusjr was highly appreciated.

While I can't argue the fact that pipenv works, it's definitely one of those things that could test the patience of a saint once used in a real world environment.

I find it difficult to see why pipenv is recommended by Python.org / PyPA apart from the reason that it's made by the guy who made requests.

Something to keep an eye on, but for now I don't believe it is as production ready as its alternatives.

 
Copyright © Twig's Tech Tips