Getting Started with Continuous Integration in PHP with Jenkins
My company has made a recent transition from a small shop that cranked out short-term projects that typically had one or two devs on a single project, to a larger team working on more complex projects with 5 or more devs working on the same code. It was time for us to up our game.
Enter Continuous Integration.
If you haven’t heard of CI before, here’s a summary from Wikipedia:
“In software engineering, continuous integration (CI) is the practice of merging all developer workspaces with a shared mainline several times a day. It was first named and proposed as part of extreme programming (XP). Its main aim is to prevent integration problems, referred to as “integration hell” in early descriptions of XP.”
If you’re working on a team of one or two on reasonably straightforward projects, the real benefit of CI may not be plainly apparent (but read on anyway – even smaller teams can benefit!).
Reducing Time Between Defect Creation and Discovery
Coupled with the automated tests in TDD (test-driven development), your CI server can be a powerful tool to continuously run and test builds to reduce the amount of time that passes (and code written) between a new bug being created and when it gets discovered.
For example, our devs push to the Git server several times a day on any given project, and our CI build runs every time a developer does a Git push. Each time that build runs, the unit tests I’ve configured in the CI server run. If any of those builds fail, I know exactly which code push caused the tests to start failing. Code gets looked at and fixed immediately, before new code can get piled on top.
In a non-CI world where nothing runs the tests automatically, I have to rely on the developers to run unit tests locally when they remember to – which results in code getting pushed that introduces new defects, with god knows how much time passing between the moment that defect was created and the moment it was discovered, and no central place to see the results. Worse yet, the more time that passes between defect creation and discovery, the more code gets written on top of or intertwined with that defective code, and the harder the defect is to untangle. The problems compound.
Good News, Everyone!
The good news is that setting up a CI server is easy, and even if you have a small team, you can immediately start benefiting from it – with the added bonus that as your team grows, you’re already in a great position to scale with a solid testing and integration methodology in place. (Scaling without one is incredibly painful, I promise you.)
There are some pretty sexy on-demand style CI services that take care of all of the actual CI server management – you just supply a build file, and they handle the rest. CircleCI and Bamboo OnDemand (part of the Atlassian/Jira suite) are both great products, and if the hassle of building the CI server itself is the only thing stopping you from trying out CI, I highly recommend getting started there. To see what builds look like, check out Travis CI.
I can say that from my experience, the third-party hosted CI servers tend to skew towards Java and Ruby projects. That’s absolutely not to say that if you spend a little time customizing the build file (circle.yml for CircleCI, etc.) you can’t get them to work for PHP projects. You can. I opted to build my own CI server for the experience and knowledge I would gain from it, and also because giving a relatively un-vetted third party access to source code would violate the security requirements of some of our clients.
I decided to go with an industry standard – Jenkins CI (formerly called Hudson). Jenkins is tried and true, and lots of people use it. It has a robust plugin library, so even though it’s written in Java, you don’t really need to know Java to use it. In fact, installing a plugin for Jenkins is usually as easy as point and click through the web-based interface.
In a Nutshell
A CI server consists of a few different moving parts, and each one is pretty important. Jenkins itself runs on a server (any server, although I strongly recommend dedicating that server to automated builds and not trying to share resources). The parts are:
- Your source code
- Jenkins to build/test your project, and to monitor/report on the results
- Apache Ant to automate some build tasks
- Various unit tests
So Jenkins automates and monitors the results of builds. Think of a build as a “from scratch” execution of your project. A CI server will start with a fresh directory and a fresh source code checkout, so everything you need to make your project work (or at least pass its tests) MUST be included in your source code repo OR in a build file that a build automation tool like Apache Ant can grok.
An Apache Ant build file can be as complex or simple as you need it to be. I’ll show you a few examples of my Ant build.xml files later, once we’ve walked through setting up the actual server, but a few examples of what you might include in an Ant build file are:
- chmod’ing a directory within your source to be writable
- running a command-line script to pull your vendors with Composer
- updating database schema or fixtures
Anything that your build requires but that doesn’t just magically happen through a source code checkout is a good candidate for automating with Ant. Bear in mind that Ant does not typically run as root, and Jenkins runs as unprivileged user ‘jenkins’, so that should influence your build tasks.
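As a rough sketch of those kinds of tasks (this is not one of my actual build files), the shell snippet below writes a small build.xml into a scratch directory. The project name, the app/cache path, and the assumption that a composer executable is on the PATH are all placeholders; swap in whatever your project really needs.

```shell
#!/bin/sh
# Sketch only: writes an illustrative Ant build file to a scratch directory.
# "my-project", "app/cache" and the "composer" executable are assumptions.
workdir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding Ant's ${basedir} property.
cat > "$workdir/build.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<project name="my-project" default="build">
  <!-- Ant runs the targets named in "depends", in order, before "build". -->
  <target name="build" depends="permissions,vendors"/>

  <!-- Make a cache directory writable for the unprivileged jenkins user. -->
  <target name="permissions">
    <mkdir dir="${basedir}/app/cache"/>
    <exec executable="chmod" failonerror="true">
      <arg value="-R"/>
      <arg value="ug+rwX"/>
      <arg path="${basedir}/app/cache"/>
    </exec>
  </target>

  <!-- Pull vendor libraries with Composer; fail the build if that fails. -->
  <target name="vendors">
    <exec executable="composer" failonerror="true">
      <arg value="install"/>
      <arg value="--no-interaction"/>
    </exec>
  </target>
</project>
EOF

echo "wrote $workdir/build.xml"
```

Nothing here needs root, which matters because the build runs as the jenkins user.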
You can also find a great base for PHP with Jenkins at jenkins-php.org. I did mess around with these templates and couldn’t quite get them to work for me, but my initial requirements were a little different. It’s still a damn good website to check out, and absolutely worth reading.
You’ll want to standardize your build.xml as much as you can between projects so you can just sort of set it and go, but it’s very likely that each project will require some tweaking, especially if your shop supports multiple frameworks. You may end up with a standard build.xml for Symfony2 and another for Zend Framework, but standardizing as much as you can is a good start.
When you first initiate a build through Jenkins (or any other CI server), it will do a fresh checkout of your source and kick off the Apache Ant build. Within your Apache Ant build.xml you’ll probably initiate your unit tests. More on that later though!
It should go without saying that unit tests are a huge part of the success and benefit of a CI server. It should also go without saying that bad unit tests are arguably worse than no unit tests. Investing the time into learning to write good unit tests will pay off exponentially in the long run for you, for your team, and for your clients.
For excellent resources on learning how to write unit tests, I highly recommend checking out The Grumpy Programmer’s Guide To Building Testable PHP Applications by Chris Hartjes. There’s a ton of info online about how to test in PHP as well, and if you’re using a framework, it hopefully has some solid docs on how to test within your framework.
Getting Started With Jenkins
We need to talk a little bit about unit tests and automated tests and what’s available out of the box with Jenkins, but we’ll get to that in a bit. The more intimidating part is setting Jenkins up in the first place, and I want to get you past that. There are also a few different ways of setting up your Jenkins server (having it spin up new VMs for each project versus building all projects on the same ever-present machine, for example), but that’s less important right now.
The easiest way to get started with Jenkins is simply to fire up a small Amazon Linux instance on AWS. (You’ll probably want small instead of micro, since there’s a bunch of stuff you have to set up on the server, and it will take forever if you try to do that on a micro.) I have our CI server’s web interface accessible from only our internal networks, but if you use the authentication plugin with Jenkins, you don’t have to lock it down that hard. Do take access to this seriously though, as some of the violation reports will expose vast amounts of your code, so treat access to Jenkins the same way as you would treat access to your raw source code.
A Few Quick Notes
Amazon Linux smells a lot like RHEL/CentOS, so you may need to tweak the bash commands to work with your flavor of OS if you don’t use yum as a package manager.
Since you’re working with a brand new instance, you’ll need to install some base software in addition to Jenkins so that your unit tests and code coverage reports can run, etc. Depending on where you spin up an instance, and what type of image you use, you may have some of this stuff installed, but running through the commands below shouldn’t hurt anything.
My team works mostly in PHP, so if you work in a different environment, you may not need all of the pear/php* testing modules shown in my starter bash below, and you may need to install some other language-specific libraries.
Please note that in the commands below, I install PHPUnit 3.4.1 because the first project I set up CI for was using Zend Framework, which relied on 3.4.1 and fails hard if you use a more recent version.
If you need multiple versions of PHPUnit set up on one box for multiple projects, it’s quite easy, and there’s a great article on how to do it here. I have at least two running at any time to support Zend Framework apps and Symfony2 apps, both of which depend on different versions of PHPUnit.
You’ll also need to edit your php.ini file to enable xdebug and to set a timezone (otherwise you’ll end up with PHP errors):
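Here’s a minimal sketch of that php.ini change, assuming the Xdebug extension file is called xdebug.so and that you’re happy to pass in your own timezone. The function name is mine, and the demo runs against a scratch file so you can see the result before touching the real php.ini (usually /etc/php.ini on RHEL-style systems; `php --ini` will tell you which file is loaded).

```shell
#!/bin/sh
# Sketch only: append Xdebug and timezone settings to a php.ini file.
# Assumptions: the extension file is named xdebug.so and the timezone is
# yours to choose. Older PHP versions need the full path to xdebug.so.
configure_php_ini() {
    ini_file=$1
    timezone=${2:-UTC}

    # Only add each setting if php.ini does not already set it.
    grep -q '^date\.timezone' "$ini_file" ||
        printf 'date.timezone = "%s"\n' "$timezone" >> "$ini_file"
    grep -q '^zend_extension.*xdebug' "$ini_file" ||
        printf 'zend_extension=xdebug.so\n' >> "$ini_file"
}

# Demonstrate on a scratch file rather than the real php.ini.
scratch_ini=$(mktemp)
printf '[PHP]\n' > "$scratch_ini"
configure_php_ini "$scratch_ini" "America/New_York"
cat "$scratch_ini"
```

Running the function a second time leaves the file alone, so it’s safe to keep in a provisioning script.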
If your source code is hosted on GitHub, you’ll want to add GH to your known hosts on your CI server to prevent issues during automated code checkout later as well:
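One way to do that is with ssh-keyscan, sketched below. The function name is hypothetical, and the known_hosts path in the usage comment assumes the jenkins user’s home is /var/lib/jenkins, so check yours. Keep in mind that ssh-keyscan trusts whatever the network hands back, so it’s worth comparing the result against the fingerprints GitHub publishes.

```shell
#!/bin/sh
# Sketch only: add a host's public SSH keys to a known_hosts file once.
add_known_host() {
    host=$1
    known_hosts=$2

    mkdir -p "$(dirname "$known_hosts")"
    touch "$known_hosts"

    # Skip the network call if the host is already listed (unhashed entries).
    if awk -v h="$host" '
        { n = split($1, names, ","); for (i = 1; i <= n; i++) if (names[i] == h) found = 1 }
        END { exit !found }' "$known_hosts"; then
        echo "$host already present in $known_hosts"
        return 0
    fi

    # ssh-keyscan prints the host's public keys in known_hosts format.
    ssh-keyscan "$host" >> "$known_hosts"
}

# Typical use, run as the jenkins user (the path is an assumption):
#   add_known_host github.com /var/lib/jenkins/.ssh/known_hosts
```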
And just to make it easy to get started, you can create a job in Jenkins that you can use as a template to start building your real jobs from. It’s not required, but it’s just easier this way.
And then reload the Jenkins configuration to see your changes:
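Both steps can go through the Jenkins command-line client. The sketch below assumes jenkins-cli.jar has been downloaded into the current directory and that the template’s job definition has been saved as config.xml (both are assumptions on my part, as is the wrapper function). It defaults to a dry run that only prints the command, so set DRY_RUN=0 once you’re pointing at a real server. Newer Jenkins versions will also expect credentials.

```shell
#!/bin/sh
# Sketch only: create a template job and reload Jenkins via jenkins-cli.jar.
# Assumptions: jenkins-cli.jar sits in the current directory and Jenkins
# listens on localhost:8080.
JENKINS_URL=http://localhost:8080
DRY_RUN=1   # 1 = print the command instead of running it

jenkins_cli() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "java -jar jenkins-cli.jar -s $JENKINS_URL $*"
    else
        java -jar jenkins-cli.jar -s "$JENKINS_URL" "$@"
    fi
}

# create-job reads the job definition (config.xml) from standard input.
create_job() {   # usage: create_job JOB_NAME CONFIG_XML
    jenkins_cli create-job "$1" < "$2"
}

# reload-configuration makes Jenkins re-read its configuration from disk.
reload_config() {
    jenkins_cli reload-configuration
}

# create_job php-template config.xml   # needs a real config.xml file
reload_config
```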
At this point, Jenkins, a buttload of test libraries and dependencies, and one template job are installed. Huzzah!
Your new Jenkins server will run on port 8080 of wherever you’ve installed it, so you can access it by going to http://YOUR-SERVER-IP-OR-DOMAIN:8080/. If you need to invoke any of the Jenkins command-line tools from the server itself, point them at http://localhost:8080/.
Your web-based Jenkins console will let you configure and view your jobs and builds. You’ll want to copy the starter template we got from jenkins-php.org. In your Jenkins web console:
- Click on “New Job”.
- Enter a “Job name”.
- Select “Copy existing job” and enter “php-template” into the “Copy from” field.
- Click “OK”.
- Disable the “Disable Build” option.
- Fill in your “Source Code Management” information.
- Configure a “Build Trigger”, for instance “Poll SCM”.
- Click “Save”.
You should now have a custom-named job in your Jenkins web console! You’ll probably want to go in and poke around a little to get familiar with what the job configuration page looks like.
You can try running a build now to see if your source code checkout works, but don’t expect the build to pass yet, since we haven’t defined any tests in our Ant build file.
Here’s an example build.xml from a Symfony2 project:
And another build.xml file for a Zend Framework 1.x project:
All of this XML looks really imposing if you’ve never seen a build file before, but it’s actually pretty straightforward. A standard target block might look like:
The target name maps back to the “depends” line at the top of the build file. If you’re executing a command-line request, it goes in the exec line, and any additional arguments your command needs get included as arg values.
You can have blocks in your build.xml that never actually get executed because you haven’t told Ant to invoke them. The target name itself is arbitrary, but it should make sense to you, and you must include it in the build line if you want it to run.
Conversely, if you put something in that build line, and you have no corresponding target in the build.xml, your build will fail.
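To make the relationship between the build line and the targets concrete, here’s a minimal sketch. It writes an illustrative build.xml (the target names and paths are placeholders, not my real build files) and then runs a small, hypothetical helper that lists any name on the build line with no matching target.

```shell
#!/bin/sh
# Sketch only: a minimal build.xml plus a check that every name on the
# "build line" has a matching target. Target names are illustrative.
workdir=$(mktemp -d)

cat > "$workdir/build.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<project name="my-project" default="build">
  <!-- The build line: each name in "depends" must exist as a target. -->
  <target name="build" depends="clean,phpunit"/>

  <target name="clean">
    <delete dir="${basedir}/build/logs"/>
    <mkdir dir="${basedir}/build/logs"/>
  </target>

  <!-- exec runs a command; each arg is one command-line argument. -->
  <target name="phpunit">
    <exec executable="phpunit" failonerror="true">
      <arg value="--log-junit"/>
      <arg value="${basedir}/build/logs/junit.xml"/>
    </exec>
  </target>

  <!-- Defined but not on the build line, so a plain "ant" run skips it. -->
  <target name="phploc">
    <exec executable="phploc">
      <arg value="--log-csv"/>
      <arg value="${basedir}/build/logs/phploc.csv"/>
      <arg path="${basedir}/src"/>
    </exec>
  </target>
</project>
EOF

# Print any name in the build target's depends list that has no target.
missing_targets() {
    build_file=$1
    sed -n 's/.*<target name="build" depends="\([^"]*\)".*/\1/p' "$build_file" |
        tr ',' '\n' |
        while read -r name; do
            grep -q "<target name=\"$name\"" "$build_file" || echo "$name"
        done
}

missing_targets "$workdir/build.xml"
```

The helper prints nothing for this file because clean and phpunit both exist. Add a name like phpcs to the depends list without defining the target and it will print that name, which is exactly the case where Ant fails the build.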
Great PHP Tests
There are a ton of fantastic test suites you can use in your Jenkins/Ant build to check the unit tests you wrote, the overall quality of your code based on accepted standards or your own standards, copy and paste detection, and so on. Here’s a quick rundown of some of the tests I use:
- PHPUnit – The PHP Unit Testing framework. Duh.
- phploc (PHP “lines of code”) – A tool for quickly measuring the size of a PHP project.
- PHPMD – PHP Mess Detector – looks for suboptimal code, overcomplicated expressions and unused parameters, methods and properties
- Copy/Paste Detector (CPD) for PHP code.
- PHP CodeSniffer
There are also some great tools like JSLint4Java, Selenium and more for validating non-PHP code.
Tenets of CI
While there is a lot of flexibility in how you set up your CI server, there are some critical points that must be an aspect of every CI server.
CI builds must be FAST. A build that takes an hour isn’t really as useful as a build that takes 3 minutes or less. If it means increasing the size of your cloud server to give your CI a little more oomph so it can complete builds faster, do it.
CI must be automated. You should not need a human to click a button to initiate a build. By default, you can set Jenkins to poll your source code at certain intervals. For something more fine-grained, you can set up a hook in your source code management system (in Git, a post-receive hook on the server) to trigger a build on push.
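For Git, a hook along these lines is one way to do it. This is a sketch that assumes the Jenkins Git plugin is installed (it provides the notifyCommit endpoint) and that your job has “Poll SCM” enabled; both URLs are placeholders and the helper function is mine. Newer versions of the plugin may also want an access token on the request.

```shell
#!/bin/sh
# Sketch only: a Git post-receive hook that asks Jenkins to check a repo
# for changes right away instead of waiting for the next poll.
# Assumptions: the Jenkins Git plugin is installed and both URLs below
# are placeholders for your own server and repository.
JENKINS_URL="http://YOUR-SERVER-IP-OR-DOMAIN:8080"
REPO_URL="git@example.com:team/project.git"

notify_url() {
    # Build the notification URL from a Jenkins base URL and a repo URL.
    printf '%s/git/notifyCommit?url=%s\n' "$1" "$2"
}

# In the real hook, uncomment the curl line so each push pings Jenkins:
# curl --silent --fail "$(notify_url "$JENKINS_URL" "$REPO_URL")"
notify_url "$JENKINS_URL" "$REPO_URL"
```

Saved as hooks/post-receive in the server-side repository and made executable, this fires once per push.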
Extending Your CI
So now that you’re a CI Rockstar, what’s next? NOW is the really fun part. As your CI server starts to become part of your workflow (and it should, ASAFP!), you can start to implement a more formal process.
For example, you can:
- Trigger a build automagically on every Git push
- Have your Jenkins server push code to a live, staging or QA environment every time the tests pass
- Set up a workflow that prevents code from being deployed to a live environment *unless* all tests pass
- Automate load-testing with a service like Blitz.Io after every build, so if a code change becomes a bottleneck, you’ll know exactly where to look
- Integrate with HipChat or Hubot (you’re using HipChat, aren’t you? It’s amazing!) so each build’s start and end (pass/fail) get posted automatically to the HipChat room, which keeps emails to a minimum while keeping everyone in the know.
- Use the violation reports in weekly code reviews to help all of your devs get better
One thing you’ll see in each build (if you’ve set everything up correctly) is a violations report. The violations reports take the output of the tests you invoked in your Ant build (like PHP CodeSniffer) and parse it into a friendly, human-readable format. It’s up to you how nitpicky you want to be.
Out of the box, the code you’re testing will probably have a LOT of violations. You decide how each kind is classified: whether tabs versus spaces is an error or just a warning, for example.
While you may not want to go too nuts during your first foray into CI, these style violation detection reports can be a great tool in standardizing your code. If you look at the source code of a project that eight devs worked on and it *looks* like eight devs worked on it, you should probably prioritize setting some coding standards, and CI can help you test against them and enforce them.
The plugins that contribute to the violations reports typically have custom rules you can set up if you find default standards too strict, or not a good fit. You can read more about the rules in PHP Mess Detector here, and check out a coding standard ruleset tutorial for PHP CodeSniffer here.
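As a small taste of what a custom ruleset looks like, the sketch below writes an illustrative PHP CodeSniffer ruleset to a scratch directory. Using PSR2 as the base standard and 140 as the line limit are arbitrary choices on my part, and the sniff and message names are from PHP CodeSniffer’s Generic standard as I know it, so double-check them against your installed version.

```shell
#!/bin/sh
# Sketch only: write a custom PHP_CodeSniffer ruleset to a scratch directory.
# Assumptions: PSR2 as the base standard and a 140-character line limit are
# placeholders; pick whatever your team agrees on.
workdir=$(mktemp -d)

cat > "$workdir/ruleset.xml" <<'EOF'
<?xml version="1.0"?>
<ruleset name="TeamStandard">
  <description>Illustrative team coding standard.</description>

  <!-- Start from an existing standard instead of writing every rule. -->
  <rule ref="PSR2"/>

  <!-- Report tab indentation as a warning rather than an error. -->
  <rule ref="Generic.WhiteSpace.DisallowTabIndent.TabsUsed">
    <type>warning</type>
  </rule>

  <!-- Adjust the line-length limit (the value is a placeholder). -->
  <rule ref="Generic.Files.LineLength">
    <properties>
      <property name="lineLimit" value="140"/>
    </properties>
  </rule>
</ruleset>
EOF

echo "wrote $workdir/ruleset.xml"
# Then point phpcs at it, e.g.: phpcs --standard="$workdir/ruleset.xml" src/
```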
Whether we like it or not, sometimes it’s important to the folks that sign our paychecks to understand why investments of time and money on things like CI are worthwhile. The cost is minimal, and the benefits are tremendous, but that may not be specific enough for the person in charge of your IT budget.
The key will be to track the amount of time spent on regression issues and bugfixes on a project before adding CI, and after. If you’re using a bugtracker with good agile reporting, this is actually pretty easy. You can measure based simply on tickets that were opened, or even better, actually use the time-logging for all of those issues. Then you can go to management with something like “Running a CI server cost us $300 in time and servers, and saved us 60 development hours in bugfixes.”
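If it helps to put that pitch into dollars, the arithmetic is simple. In the sketch below the $300 and the 60 hours come from the example sentence, while the hourly rate is a made-up placeholder you’d replace with your own number.

```shell
#!/bin/sh
# Sketch only: compare what CI cost with what the saved bugfix hours are
# worth. The hourly rate is a made-up placeholder.
ci_cost=300          # dollars spent on CI (time and servers)
hours_saved=60       # development hours no longer spent on bugfixes
hourly_rate=50       # assumed cost of one development hour, in dollars

savings=$((hours_saved * hourly_rate))
net=$((savings - ci_cost))

echo "Saved \$$savings in bugfix time for \$$ci_cost of CI: net \$$net"
```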
Even if no one in management is asking you for those numbers, it’s good to pull them together to get your dev team excited and motivated, too. Generally speaking, devs want to write code, not fix bugs. Showing them the benefits will make sure everyone is on-board, especially after you’ve crushed their ego with the first few violation reports ;).