Tuesday, November 05, 2013

Writing Build Packs for CloudFoundry

CloudFoundry Build Packs

Introduction

One of my favorite new features with CloudFoundry v2 is that users now have the ability to run any application on the system, regardless of CloudFoundry's support for a particular development stack or programming language.
This is accomplished through the new build pack system.  As the name implies, a “build pack” is a set of functionality that builds your application and creates the executable unit, called a droplet, that is run by CloudFoundry.  The beauty of the build pack system is that it puts a tremendous amount of power into the users' hands.  In the past, if a user wanted to customize the deployment environment or add support for a new language or framework, he or she had to fork and run a customized installation of CloudFoundry.  Now, a user can simply fork or create his or her own build pack and run it on any existing CloudFoundry provider’s infrastructure (e.g. run.pivotal.io).
In this article, I'm going to discuss the custom build pack system, some of the points you'll need to consider when creating your own build pack and give some tips for troubleshooting a custom build pack.

Usage

To get started with custom build packs, you'll need to know how to tell CloudFoundry that your application requires a custom build pack and which one it requires.  Fortunately, this is a simple process: you just include the --buildpack= argument as you push an application to CloudFoundry.

Example:


cf push --buildpack=http://github.com/someuser/somerepo.git

This additional argument will instruct CloudFoundry to retrieve the specified build pack, using Git, and run the build pack against the application being deployed.

Anatomy of a Build Pack

A build pack is amazingly simple and consists of just three scripts:  detect, compile and release.  CloudFoundry executes the scripts in that order, and they can be written in virtually any scripting language (see the General Considerations section below, where I discuss this point further) so long as the scripts are directly executable in the CloudFoundry environment.  Beyond that, it's just a matter of adhering to the contract that CloudFoundry establishes with each script, which we’ll discuss next.

Detect Script

The detect script is the first script from a build pack that is executed by CloudFoundry.  The responsibility of the detect script is to determine if the build pack recognizes the application that needs to be packaged.  
When the detect script is executed, CloudFoundry passes it one argument, the location of the application files that have been pushed to the server, which is often called the build directory.
How you implement the detect script is entirely dependent on the structure of the application, but it typically results in a scan through the build directory searching for some key identifier like the existence of a specific file, a file ending with a particular extension or a key word being found in a particular file.  If the key identifier is found then the build pack knows it can package the application.  If  not, then it passes and allows another build pack the chance to package the application.
Once the detect script has determined whether it can handle the application, it needs to alert CloudFoundry.  If the detect script is unable to handle the application, it simply writes “no” to STDOUT and exits with an exit code greater than zero.  If the detect script is able to handle the application, it writes the language or framework name to STDOUT and exits with an exit code of zero.  Writing the language or framework name is not strictly necessary, since technically you can write anything other than “no”, but it is the convention followed by most build packs.
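The contract above can be sketched in a few lines of shell.  This is only a minimal illustration, not a script taken from any real build pack: the marker file name app.conf and the framework name myframework are hypothetical, and the logic is wrapped in a function so that it can also be exercised locally.

```shell
#!/usr/bin/env bash
# bin/detect <build-dir>
# Minimal sketch of a detect script. "app.conf" and "myframework" are
# hypothetical placeholders for your own key identifier and framework name.

detect() {
  local build_dir="$1"
  if [ -f "$build_dir/app.conf" ]; then
    echo "myframework"   # recognized: print a name and succeed
    return 0
  fi
  echo "no"              # not recognized: let another build pack try
  return 1
}

# CloudFoundry passes the build directory as the first argument.
if [ "$#" -ge 1 ]; then
  detect "$1"
  exit $?
fi
```

A real detect script would replace the single file test with whatever combination of file names, extensions or key words identifies your applications.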

Compile Script

The compile script is the second script from a build pack that is executed by CloudFoundry and it is typically the most complicated.  The compile script is responsible for the actual bundling and packaging of the application.  In other words, this is where the actual work of creating the application droplet occurs.
When the compile script is executed, CloudFoundry passes it two arguments.  Like the detect script, the first argument is the build directory.  For the compile script, this location has a slightly different meaning though.  Not only does it hold the application files that were pushed to CloudFoundry, but it is also the location where the build pack should add any additional resources that are required to run the application.  After the compile script completes, everything that is included in this directory will be packaged up by CloudFoundry and included into the droplet.  
The second argument passed to the compile script is the location of the cache directory.  The cache directory is a location where the build pack can place files that it wants to retain from execution to execution.  As the name implies, this is often used to cache files which are expensive to create, such as large downloads.  
The cache is scoped to an individual application and exists as long as the application exists.  This means that the first time you push an application, the cache is created and it is empty.  As the application is packaged, the build pack can place files into the cache directory.  When the build pack finishes, the cache directory is automatically saved.  The next time the build pack runs for the same application, the cache directory will be restored with its previous contents.  The only gotcha with the cache directory is that once a file is saved, it cannot be removed or updated.  At present, the only way to remove or update a file is to delete the application, which will reinitialize the entire cache.
Beyond the script arguments, there are a couple of additional locations which may be helpful to a build pack author.  The first is a temporary directory, which can be located by looking at the TMPDIR environment variable.  The second is the location of the build pack itself.  This can be found by taking the full path to the script that was executed, which is often the 0th argument passed to the script, and popping off the last two items in it (i.e. compile and bin).
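As a rough sketch of how a compile script written in shell might find both locations, the snippet below derives the build pack root from the script path and reads TMPDIR.  The function name buildpack_root and the /tmp fallback are my own assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: locate the build pack root and a scratch directory from inside
# bin/compile. The function name buildpack_root is illustrative.

buildpack_root() {
  local script_path="$1"
  local bin_dir
  # Absolute path of the directory holding the script (the "bin" directory).
  bin_dir=$(cd "$(dirname "$script_path")" && pwd) || return 1
  # Dropping "bin" leaves the root of the build pack.
  dirname "$bin_dir"
}

BUILDPACK_DIR=$(buildpack_root "$0")
# Fall back to /tmp if TMPDIR is not set (an assumption for local runs).
WORK_DIR="${TMPDIR:-/tmp}"
```

Knowing the build pack root is handy when the build pack ships its own configuration templates or helper scripts that need to be copied into the build directory.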
How you implement the compile script depends entirely on the steps that it takes to package your application into a droplet.  In most cases, this will involve downloading, installing and configuring external resources like a programming language or a server.  As mentioned above, anything that is required to run the application should be installed into the build directory.  How you organize the build directory is up to your build pack and the contract that it makes with its users.
Once your compile script successfully completes, you simply need to exit with an exit status of zero.  If you want the compile script to fail, simply exit with a non-zero exit code.
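Putting the arguments and exit codes together, a compile script might be laid out roughly as follows.  Treat this as a skeleton under stated assumptions rather than a working build pack: the vendor subdirectory is a hypothetical layout choice, and the actual download and install steps are left as a comment.

```shell
#!/usr/bin/env bash
# bin/compile <build-dir> <cache-dir>
# Skeleton of a compile script. The "vendor" subdirectory is a hypothetical
# layout choice, not something CloudFoundry requires.

compile() {
  local build_dir="$1"
  local cache_dir="$2"

  if [ ! -d "$build_dir" ]; then
    echo "Build directory not found: $build_dir" >&2
    return 1             # non-zero tells CloudFoundry the build failed
  fi

  mkdir -p "$cache_dir" "$build_dir/vendor" || return 1
  echo "-----> Preparing $build_dir/vendor"
  # ... download, unpack and configure runtimes or servers here ...
  return 0               # zero signals success
}

# CloudFoundry passes the build directory and the cache directory.
if [ "$#" -ge 2 ]; then
  compile "$1" "$2"
  exit $?
fi
```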

Release Script

The release script is the third and final script from a build pack that is executed by CloudFoundry and is typically very simple.  The release script is responsible for providing CloudFoundry with the metadata necessary to execute an application.  Specifically, this information must indicate the command to be run to execute the application droplet.
Just like the detect script, the release script is given one argument, the build directory.  With that, the release script should write to STDOUT the metadata in YAML format.  
There are two points of metadata that you can specify, config_vars and default_process_types, both of which are specified as YAML mappings of names to values.  The config_vars mapping should contain environment variables required by your application.  The default_process_types mapping should contain the processes to run, keyed by process type.
Having said that, config_vars is not supported by CloudFoundry at this time.  Furthermore, default_process_types only supports one process, of type web.  These unsupported features are holdovers from the build pack system which was originally created by Heroku and may or may not be implemented on CloudFoundry in the future.
With that, here is an example of what the output should look like.
default_process_types:
   web:
Once that has been printed to STDOUT the release script should complete and exit with an exit code of zero.  Any other exit code or invalid YAML written to STDOUT will result in an error.  Be especially careful if you are writing debug information to STDOUT as this will corrupt the YAML.
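For illustration, a release script that emits this metadata could look something like the sketch below.  The start command sh start.sh is a hypothetical placeholder, and sending diagnostics to STDERR rests on the assumption that only STDOUT is parsed as YAML.

```shell
#!/usr/bin/env bash
# bin/release <build-dir>
# Sketch of a release script. "sh start.sh" is a hypothetical start command.

release() {
  # Diagnostics go to STDERR so that STDOUT contains only the YAML.
  echo "release: build directory is $1" >&2
  cat <<EOF
---
default_process_types:
  web: sh start.sh
EOF
}

# CloudFoundry passes the build directory as the first argument.
if [ "$#" -ge 1 ]; then
  release "$1"
  exit $?
fi
```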

Thoughts and Design Considerations

Because a build pack is essentially a set of shell scripts, what you can do with it is open-ended.  For the most part if you can script it, you can do it.  Having said that, just because you can do something doesn't mean that you should.  In this section, I'm going to talk about some of the design considerations and challenges that you might face when building a build pack.

General Considerations

Before you begin to develop a build pack, the first choice you’ll need to make is whether to write one from scratch or to fork an existing build pack and modify it to fit your needs.
Because the build pack system in CloudFoundry is based on the build pack system from Heroku, many of the Heroku build packs work with little or no modification on CloudFoundry.  Because these build packs already exist and conform to the build pack contracts, starting with one of them can be a quick way to get a custom build pack up and running.  
Another option, if you are looking to create a build pack that is based around the JVM, is to check out the CloudFoundry Java Build pack.  It was written so that it could easily be extended, and it provides a developer with convenience methods which should help to make build pack development quicker and easier.
The other important decision to make is what language to use to write your build pack.  Many of the existing build packs are written as bash scripts, which is probably the safest and most compatible choice, as most of the installations of CloudFoundry are running on Linux.
Bash may not be your first choice though and thankfully it is possible to write your build pack in a few different scripting languages.  When picking the language for your build pack, you’ll want to make sure that the language you would like to use is supported by your CloudFoundry provider.  This is because the environment that executes your build packs could vary from provider to provider.
At the time this article was published, the run.pivotal.io build pack environment has Python 2.6.5, Ruby 1.9.3 p392 and Perl 5.10.1 installed.  Given that, a build pack targeting run.pivotal.io could be written in any of those languages.

Detect Script

As you might expect, the biggest thing to think about when writing the detect script is how should the build pack know if it is able to deploy the given application.  From a technical standpoint, this typically involves searching for some key identifier, such as a language or framework specific configuration file, a file or files ending with a specific extension or even some key word in the files.  Where you need to be careful is in what key identifiers you choose.  If you choose something that is too general, your build pack might falsely think that it can handle an application, when it cannot.  If you choose something too specific, the build pack might skip an application that it could in fact handle.
When authoring a custom build pack, it is not strictly necessary to write a detect script because CloudFoundry will not call the detect script when an application is pushed with the --buildpack argument or a build pack specified in the manifest file.  Despite this, I would still suggest that you write and test a detect script.  It’s generally quick and easy to do, plus CloudFoundry’s behavior could change in the future and that would break your build pack.

Compile Script

Because most of the work happens in a build pack's compile script, it stands to reason that this is where most problems arise.  While this is not a complete list, here are some of the common issues that you might encounter.

Application Requirements

The first thing to consider is what the application needs in order to run.  Because the compile script is tasked with building a complete environment for the application, it needs to include everything.  When writing the compile script, assume that nothing is included out-of-the-box and that you need to include everything that is required to run an application.
Exactly what you need to include will depend on your build pack, but here are some of the common things that might be required by an application.
  • A web or application server such as Apache HTTPD, Apache Tomcat or Nginx to host the application.
  • A programming language runtime or interpreter like Perl, Python, Ruby or the JVM.
  • Individual application libraries
    • In some cases you may want to automatically add libraries, like when a database or service is being used by the application.
    • In other cases, you may want to provide the user with a way to indicate libraries that need to be installed in order for the application to run properly.  Examples of this are Ruby's Gemfile and Python's requirements.txt file.

Downloads

Because it is not practical to bundle all of the resources needed by the application within the build pack, the compile script is able to download external resources into the environment.  This can be done with a tool like curl or with functionality built into the scripting language used by the build pack.  No proxy configuration is needed when making requests to download files.
Once files are downloaded into the environment, it is recommended that you add them to the cache directory, especially if they are large files.  Files added to the cache directory will automatically get stored by CloudFoundry and will be available to the build pack on subsequent runs.  The build pack can then use the files in the cache directory rather than downloading them from a remote location, thus lowering the time it takes to execute.
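One way to combine downloading and caching is a small helper that only reaches for curl when the file is not already in the cache directory.  The helper name fetch_cached is my own invention for this sketch, and any URL passed to it is up to the build pack author.

```shell
#!/usr/bin/env bash
# Sketch: download a file once and reuse it from the cache directory on
# later runs. fetch_cached is an illustrative helper, not a CloudFoundry API.

fetch_cached() {
  local url="$1" cache_dir="$2" dest_dir="$3"
  local file
  file=$(basename "$url")
  mkdir -p "$cache_dir" "$dest_dir" || return 1

  if [ -f "$cache_dir/$file" ]; then
    echo "-----> Using cached $file"
  else
    echo "-----> Downloading $file"
    # Download to a temporary name so a failed transfer never lands in the cache.
    if ! curl -fsSL -o "$cache_dir/$file.part" "$url"; then
      rm -f "$cache_dir/$file.part"
      return 1
    fi
    mv "$cache_dir/$file.part" "$cache_dir/$file" || return 1
  fi

  # Copy into the build directory so the file ends up in the droplet.
  cp "$cache_dir/$file" "$dest_dir/$file"
}
```

Inside a compile script this might be called as fetch_cached "$SOME_URL" "$2" "$1/vendor", where the second and first script arguments are the cache and build directories.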

Binaries

When downloading the external resources required by an application, you may encounter resources that need to be compiled.  Fortunately, the compile script has access to all of the typical Linux build tools like make and autoconf, so you can build those resources as a part of your compile script.  Having said that, you need to be careful when building resources.
Compiling resources can take a significant amount of time and you don't want your users to have to wait a long time for their application to push.  Furthermore, the compile script has to finish in a finite amount of time or an error will occur.  On run.pivotal.io this is currently set to 900 seconds or 15 minutes.
To make the script execute as fast as possible, most build packs make use of precompiled binaries.  The build pack generally knows in advance which resources an application might require.  Build pack authors can then precompile all of those resources and make them available via HTTP.  The precompiled resources can then be downloaded and cached as described in the Downloads section of this article.
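Precompiled resources are commonly shipped as compressed archives, so the compile script usually needs an unpacking step as well.  A minimal sketch, assuming a gzip-compressed tarball and an illustrative helper name, might be:

```shell
#!/usr/bin/env bash
# Sketch: unpack a precompiled, gzip-compressed tarball into a directory
# inside the build directory. install_tarball is an illustrative helper.

install_tarball() {
  local tarball="$1" dest_dir="$2"
  mkdir -p "$dest_dir" || return 1
  # -x extract, -z gunzip, -f archive file, -C target directory
  tar -xzf "$tarball" -C "$dest_dir"
}
```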
The topic of how to compile binaries which are compatible with CloudFoundry is outside the scope of this article.  However, I’ll link to a few resources which show how some build pack authors have accomplished this.

Release Script

Like the detect script, the release script has one main consideration.  What command should CloudFoundry execute to start the application?  Depending on the language and framework used, this could be anything from starting a server like Apache HTTPD to executing a script with a provided language runtime or interpreter.  
Beyond that, you need to decide if the command will be listed directly inside the YAML output by the release script or if you’ll list a wrapper script in the YAML and include the command to start the application in the wrapper script.
For simplicity’s sake, I would suggest that you keep the command inlined in the YAML.  You don’t need to worry about including or generating a wrapper script and it’s easier for someone else to understand what the build pack is doing to start the application.  As with every rule though, there are a few exceptions.
The first exception is pretty obvious: if the process for starting your application is complicated, then using a wrapper script is more convenient.  When you list the command directly in the YAML, it must all fit on one line.  If an application requires multiple commands to start, it’s easier to read when they are on multiple lines.
The other problem is that while the YAML format supports setting environment variables for the command to start your application, CloudFoundry does not implement this functionality.  That means if you need to set environment variables before you start your application, you’ll need to do that in a wrapper script.
An example of using a wrapper script can be found in the PHP build pack, which sets some custom environment variables and also starts two processes: php-fpm and a web server.
Another less obvious reason to use a wrapper script is because the process that starts your application must continue to run in the foreground.  If, like some server applications, the process starts and runs in the background as a daemon, CloudFoundry will think that the application has crashed and try to restart it.  Because of this, it is sometimes necessary to wrap the command to start your application in a loop and prevent it from exiting.  An example of this loop can be seen in the PHP build pack here.
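To give a feel for what such a wrapper might contain, here is a minimal sketch of the "stay in the foreground" loop.  It is not the PHP build pack's actual code; the helper name, the paths and the MY_SETTING variable are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper (start) script for a server that daemonizes itself.
# Paths and the MY_SETTING variable below are hypothetical.

# Block for as long as the process with the given PID is alive.
wait_while_running() {
  local pid="$1"
  while kill -0 "$pid" 2>/dev/null; do
    sleep 1
  done
}

# Typical use inside the wrapper:
#   export MY_SETTING=value                 # environment for the application
#   "$HOME/server/bin/start"                # returns once the daemon is up
#   wait_while_running "$(cat "$HOME/server/server.pid")"
```

When the daemon exits, the loop ends and the wrapper exits with it, so CloudFoundry still notices a genuine crash.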

Troubleshooting

While developing the build pack, it’s likely that you’ll encounter some sort of problem.  Fortunately, debugging the build pack is straightforward.  When you encounter a problem, start by breaking it down and looking to see which script caused the problem.  When a failure occurs, just ask yourself some questions like these.   Did the detect script correctly detect the application?  Did a command fail during the compile script?  Did the release script specify the correct command to start the application?  Once you have determined where the error is occurring, you can begin to debug further.
For the detect and release scripts, which are generally simple, it can often be sufficient to run and debug them locally.  You can simulate the CloudFoundry environment by creating a sample application and passing its location into the script as the build directory.  From there, check the output of each script and make sure it is working as expected.
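A small helper can make these local runs less error-prone by showing both the output and the exit code in one place.  The function below is only a sketch; run_local and the example paths are names I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run a build pack script against a sample application and report
# what it printed and how it exited. run_local is an illustrative helper.

run_local() {
  local script="$1"
  shift
  local output status
  # Capture STDOUT and remember the exit code, whatever it is.
  output=$("$script" "$@") && status=0 || status=$?
  printf 'exit code: %s\n' "$status"
  printf 'stdout: %s\n' "$output"
  return "$status"
}

# Example (paths are hypothetical):
#   run_local ./bin/detect /path/to/sample-app
#   run_local ./bin/release /path/to/sample-app
```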
Another tip that can be helpful for debugging the release script is to intentionally exit with a non-zero exit code, such as one.  This will cause the release script to error and the build pack to halt.  The benefit of this is that anything that has been written to STDOUT should be visible through the console when you execute a push.  This provides an excellent way to spot check the YAML that is produced by the script.
Because it’s generally the most complicated script, most of the time you’ll see issues in the compile script.  While it’s possible to debug the compile script locally, it can be more complicated because the compile script has additional dependencies and it often acts destructively on those dependencies.  Fortunately, it is possible to debug the compile script as it runs by writing information to STDOUT.  Any information written to STDOUT will be displayed on the screen as a part of the output from the push command (if you do not see output written to STDOUT, check to make sure that your scripting language is not buffering the output).  Inserting some additional debugging output is often sufficient to debug the problems that occur in the compile script.
In most cases, errors with the build pack will be obvious.  You’ll see an error or stack trace listed in the output from the push command.  However, there are some errors which are not so obvious.  Sometimes when you push your application, the output will indicate that the application is flapping, or the build pack will run without error but the application will not be running.  When errors like this happen, you’ll want to use the tools that the cf command makes available to you to debug further.
A good place to start debugging is with the commands cf crashlogs and cf logs.  These commands allow you to examine the log files generated by the build pack.  This will include output from the build process, environment variables available when the build pack is run and anything written to STDOUT or STDERR by the application process.  In addition, you can use the cf files command to examine the environment that was built by the build pack.  This is often helpful as applications may have additional log files that are not included in the output of cf logs.  Lastly, the cf events command can be used to see if the application has failed or was killed for some reason.  One instance where this is helpful is when CloudFoundry has killed your application for exceeding its memory limit.

Summary

The CloudFoundry build pack system is a fantastic new feature that gives users more power and control regarding how their applications run on the system.  There are quite a few existing build packs.  A few of them are officially supported on CloudFoundry, but the majority are Heroku build packs that are compatible with CloudFoundry.  If the user’s needs are not serviced by one of the existing build packs or if the user would like a more customized environment, he or she has the ability to create a custom build pack by writing a few scripts.

3 comments:

Mansi said...

I am currently writing custom buildpack for Geronimo.I am able to start Geronimo server in buildpack. But I am not able to deploy app in Geronimo. There are two ways to deploy app.

1.Hot Deployment:
Directly Put Application War in Geronimo_home/deploy directory.
2.Using Command:
Change directory to /bin and run the following command deploy --user system --password manager deploy --inPlace

I am using 1st option. We can only put war file in deploy folder. But when I push app war from cf tool to bluemix it gets extracted in build path. So how could I put this extracted file in deploy folder? Do I need to create war again? If yes how?

Mansi said...

I am currently developing custom buildpack for Apache Geronimo. In that I need to deploy my app in deploy folder of geronimo. I am pushing geronimo war file from cf tool. When I list files in build path of compile script I see war file in extracted format. I have to move those file in deploy folder of geronimo but it should be war file. Do I need to create war file again? If yes how?

Daniel Mikusa said...

@Mansi - When you run "cf push" and set the path to point to a WAR file, cf will extract everything from the WAR file and upload the files independently. The main reason it does this is that it can prevent files that have not changed from being uploaded a second time. You need to account for this in your build pack (i.e. expect multiple files and not just a WAR file). How you handle that is up to you. With the Java Build pack, I believe it just deploys the application as an exploded WAR directory. Not sure if that is an option. If it's not, you could probably use the zip utility or even jar to create a new WAR file.