Recently Attivio was asked to certify that we could run our product in the Microsoft Azure cloud. Since AIE is a 100% Java solution, this posed some challenges that we thought might be of interest to the general Java community. Microsoft and Avkash Chauhan put together a great article and sample code, called the Tomcat Accelerator, on how to run Tomcat in the Azure cloud. While that document is helpful in getting Tomcat running, it leaves some gaps for the reader to fill in and doesn't clearly spell out the whole story for non-Microsoft types.
This blog post is meant to provide a series of tips and tricks for application developers and integrators who are trying to make Java, or any other non-Microsoft application, work in the Azure cloud. I'm sure there are many different ways to accomplish these goals; however, these worked for us and provided a simple means of proving out our product in the cloud with minimal effort. Before I go any further, all of the code used in this project is available upon request.
This article assumes you know how Azure works at a high level, have basic experience with the Azure SDK inside Visual Studio, and have read through the Tomcat Accelerator documentation.
Design Patterns...Deploying Large Applications
Problem: Deploying a large application to the cloud takes a long time and leads to long iteration cycle times.
Solution: Don't package your application
The Tomcat Accelerator starts off by bundling a complete copy of Tomcat into the Visual Studio project. It then instructs you to download a copy of the Java Runtime on your own and add it to your project. There are a few issues with this approach. The first is that you then have to add each file to the deployed project manually. Worse, you have to do it by selecting each file individually, as there is no directory-level manipulation. The second major issue comes up when you try to deploy your application to the cloud: you'll be packaging and uploading the entire artifact each time you deploy. In an iterative deployment scenario, depending on the size of your application, this can add minutes or hours to each deployment. Further, there are known bugs in the Visual Studio deployment tools that cause packages larger than 40 MB to fail to upload correctly. Since the JRE itself is over 80 MB, you're more or less guaranteed to hit this problem.
Our solution was to use Azure cloud storage to host a zipped-up copy of our application. We uploaded this once, and then added code to our worker role to download and unzip the file on startup. This drastically speeds up deployments and makes for a much simpler mechanism for versioning applications running in the cloud: you can effectively deploy a new version of your application independently of your Azure project. You can use a CloudBlob storage client and the free DotNetZip library as follows to download your app on startup:
// at the top of the file:
using Ionic.Zip;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

String storageUri = "http://myteststorage.blob.core.windows.net";
String storageAccountName = "myteststorage";
String storageKey = "someReallyLongKey";
CloudBlobClient blobClient = new CloudBlobClient(storageUri, new StorageCredentialsAccountAndKey(storageAccountName, storageKey));
// containerName and blobName identify the zip file you uploaded earlier
CloudBlobContainer appContainer = blobClient.GetContainerReference(containerName);
CloudBlob blob = appContainer.GetBlobReference(blobName);
// note: zipFileName needs to be underneath a LocalStorage path
blob.DownloadToFile(zipFileName);
// then unzip it using Ionic.Zip
using (ZipFile zip = ZipFile.Read(zipFileName))
{
    // note: again, outputDir needs to be underneath a LocalStorage path
    zip.ExtractAll(outputDir);
}
Summary: Since your app, JRE, interpreter, etc. are probably not going to change, there's no point in uploading them every time you want to deploy your application. Upload them once, and then let your app grab a copy on startup if it needs to.
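The "grab a copy if it needs to" check can be sketched as follows. This is a minimal sketch under stated assumptions, not code from our project: the `AppInstaller` class, the `.installed-version` marker file, and the `downloadAndUnzip` callback are hypothetical names introduced for illustration. In a worker role the callback would wrap the blob download and unzip shown above, and `installDir` would sit underneath a LocalStorage path.

```csharp
using System;
using System.IO;

public static class AppInstaller
{
    // Hypothetical marker file recording which version was last unpacked.
    private const string MarkerFileName = ".installed-version";

    // Returns true if the requested version is already unpacked in installDir.
    public static bool IsInstalled(string installDir, string version)
    {
        string marker = Path.Combine(installDir, MarkerFileName);
        return File.Exists(marker) && File.ReadAllText(marker).Trim() == version;
    }

    // Runs downloadAndUnzip only when the requested version is missing.
    // Returns true if a download was performed.
    public static bool EnsureInstalled(string installDir, string version,
                                       Action<string> downloadAndUnzip)
    {
        if (IsInstalled(installDir, version))
        {
            return false;
        }
        Directory.CreateDirectory(installDir);
        downloadAndUnzip(installDir);
        // Write the marker last, so a failed download is retried on the next start.
        File.WriteAllText(Path.Combine(installDir, MarkerFileName), version);
        return true;
    }
}
```

Changing the version string passed to `EnsureInstalled` is then enough to make a restarted role pick up a newly uploaded zip.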
Local vs Cloud App Fabric Issues
Problem: Your app works locally but not when deployed to the cloud
Solution: Don't trust the local compute emulator, or at least understand its limitations.
The local compute and storage emulators are great for proving basic code concepts; however, they lack some important features for real-world testing. Specifically, they don't check for proper inbound TCP/HTTP endpoint usage, and they don't check for correct read/write file system usage. The Azure cloud is based on a model where many people's applications will run on the same physical OS instance, so port collisions need to be dealt with. For example, two applications may both want to open port 80 for inbound communication, which would cause a fundamental networking conflict.
As a workaround, Azure lets you define which port you'd like opened from an external load balancer's point of view, and internally it assigns you a random port that you can listen on. The Azure load balancer does the rest. This system works great; however, the compute emulator does not make sure it is enforced. Secondly, your application's project directory is READ ONLY. That small and unemphasized fact cost me a couple of days of lost effort. In order to have a read/write file system you need to define Local Storage for your project. For most applications this also means that whatever code you package with your project and/or download from cloud storage needs to end up in your local storage directory. Again, the local emulators do not enforce this, so your app may run fine out of the project directory on the local VM, but when you go to use a real Azure instance it will fail because it can't write some tmp or log file.
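One way to keep both points in view is to pass the assigned port and the writable directory to the Java process explicitly when the worker role launches it. The sketch below is illustrative only: the `JavaLaunchSettings` class and the `app.http.port` and `app.log.dir` property names are hypothetical, so substitute whatever your application actually reads. `java.io.tmpdir` is the standard JVM property for the temporary directory.

```csharp
using System;
using System.IO;

public static class JavaLaunchSettings
{
    // Builds JVM arguments from values Azure assigned at run time.
    // port:         the internal port the role instance must listen on
    // writableRoot: the root of a LocalStorage resource (read/write)
    public static string BuildJvmArgs(int port, string writableRoot)
    {
        // Keep logs and temp files under the writable root, never under
        // the read-only project directory.
        string logDir = Path.Combine(writableRoot, "logs");
        string tmpDir = Path.Combine(writableRoot, "tmp");
        return String.Format(
            "-Dapp.http.port={0} -Dapp.log.dir=\"{1}\" -Djava.io.tmpdir=\"{2}\"",
            port, logDir, tmpDir);
    }
}
```

In a worker role, the two inputs would come from the service runtime: `RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["HttpIn"].IPEndpoint.Port` for the port and `RoleEnvironment.GetLocalResource("AppStorage").RootPath` for the writable root, where `HttpIn` and `AppStorage` are assumed names that must match the endpoint and LocalStorage entries in your service definition. Create the `logs` and `tmp` directories before starting the JVM.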
Summary: There's not much you can do here other than really think through your code and try to get your iteration time for deployment to a minimum. Remember, this affects log files, temporary directories and anything else that writes data to disk.
Logging and Diagnostics
Problem: Getting trace information out of the cloud is complex and error prone
Solution: Use storage clients to do your own logging
When developing my initial code base I used Trace.TraceInformation("My Log Message") statements throughout, and the compute emulator made it amazingly easy to see what was happening in my code. However, once you deploy your application to the real cloud, you lose this information. You can set up diagnostics; however, there are numerous blog posts with conflicting reports about how to do this. More importantly, the messages get transferred to storage on a schedule, i.e., not in real time. There were some occasions where I saw my log messages appear in the diagnostics tables 20-30 minutes after they executed. As you can imagine, this also adds significantly to developer iteration time.
In order to solve this problem I wrote my own log(String message) method as follows:
// requires: using System.Diagnostics; and the blobClient created earlier
int logId = 0;
public void log(String message)
{
    // so we can see it in the local compute emulator easily
    Trace.TraceInformation(message);
    // make sure we see it 'now'
    Trace.Flush();
    // now write it to something that we can see in the cloud.
    // Note: I used a blob client because I was already familiar with the code and I already had a blob client available.
    // A table client is probably much better from a usability standpoint and is closer to what Diagnostics is doing.
    // This is probably a really bad idea from a performance and production standpoint, but it was invaluable for development and testing.
    CloudBlobContainer container = blobClient.GetContainerReference("logs");
    // create a unique sortable id.
    CloudBlob obj = container.GetBlobReference(String.Format("{0:0000000}", logId++));
    obj.UploadText(message);
}
Summary: Once I used this method for logging, I was able to see my log messages in real time as they were happening, and I was able to find those last few problems with my code.
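One caveat with the counter in the log method: `logId` starts again at zero whenever the role restarts, and every instance counts from zero, so later messages can overwrite earlier blobs with the same name. If that matters to you, a possible refinement is to build the blob name from a UTC timestamp, the instance id, and a counter. This is a sketch of that idea rather than what we ran; the `LogNames` class and the name layout are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class LogNames
{
    private static int counter = 0;

    // Builds a blob name that sorts chronologically and stays unique across
    // restarts and role instances: UTC timestamp, instance id, then a counter.
    public static string Next(string instanceId, DateTime utcNow)
    {
        // Interlocked keeps the counter safe if several threads log at once.
        int sequence = Interlocked.Increment(ref counter);
        return String.Format(
            CultureInfo.InvariantCulture,
            "{0:yyyyMMdd-HHmmss-fff}-{1}-{2:0000000}",
            utcNow, instanceId, sequence);
    }
}
```

Inside the log method, the result of `LogNames.Next(RoleEnvironment.CurrentRoleInstance.Id, DateTime.UtcNow)` would replace the `String.Format("{0:0000000}", logId++)` expression.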
Wrap up
The Azure cloud can be coerced into running non-Microsoft code; however, there are a number of gotchas that aren't at all obvious. The one thing I will say is that calling Microsoft Azure Support is a pleasant experience. You get to talk to a real person almost immediately, and they are very serious about helping you solve your problem.




