Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions and information access insights to Agile software development methodology and programming with Java. We hope you'll find the articles informative and will participate in the discussions by leaving a comment.

Recently Attivio was asked to certify that we could run our product in the Microsoft Azure cloud. Since AIE is a 100% Java solution, this posed some challenges we thought might interest the broader Java community. Microsoft and Avkash Chauhan put together a great article and sample code, called the Tomcat Accelerator, on how to run Tomcat in the Azure cloud. While that document is helpful for getting Tomcat running, it leaves some gaps the reader has to fill in and doesn't clearly spell out the whole story for non-Microsoft developers.

This blog post is meant to provide a series of tips and tricks for application developers and integrators who are trying to make Java, or any non-Microsoft-based application, work in the Azure cloud. I'm sure there are many different ways to accomplish these goals; however, these worked for us and gave us a simple way to prove out our product in the cloud with minimal effort. Before I go any further: all of the code used in this project is available upon request.

This article assumes you know how Azure works at a high level, have basic experience with the Azure SDK inside Visual Studio, and have read through the Tomcat Accelerator documentation.

Design Patterns: Deploying Large Applications

Problem: Deploying a large application to the cloud takes a long time and leads to long iteration cycles.

Solution: Don't package your application

The Tomcat Accelerator starts off by bundling a complete copy of Tomcat into the Visual Studio project. It then instructs you to download a copy of the Java Runtime Environment (JRE) yourself and add it to your project. There are a few issues with this approach. The first is that you then have to add each file to the deployed project manually. Worse, you have to do it by selecting each file individually, as there is no directory-level manipulation. The second major issue comes up when you deploy your application to the cloud. Depending on the size of your application, you'll be packaging and uploading the entire artifact every time you deploy. In an iterative deployment scenario, this can add minutes or even hours to each deployment. Further, there are known bugs in the Visual Studio deployment tools that cause packages larger than 40 MB to fail to upload correctly. Since the JRE itself is over 80 MB, you're more or less guaranteed to hit this problem.

Our solution was to use Azure blob storage to host a zipped-up copy of our application. We uploaded it once and then added code to our worker role to download and unzip the file on startup. This drastically speeds up deployments and makes for a much simpler mechanism for versioning applications running in the cloud: you can effectively deploy a new version of your application independently of your Azure project. You can use a CloudBlobClient from the Azure storage client library and the free DotNetZip (Ionic.Zip) library to download and unpack your app on startup, as follows:

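using System;
using System.IO;
using Ionic.Zip;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;
String storageUri = "http://myteststorage.blob.core.windows.net";
String storageAccountName = "myteststorage";
String storageKey = "someReallyLongKey";
String containerName = "apps";   // placeholder: container holding the zipped application
String blobName = "myapp.zip";   // placeholder: name of the zipped application blob
CloudBlobClient blobClient = new CloudBlobClient(storageUri, new StorageCredentialsAccountAndKey(storageAccountName, storageKey));
CloudBlobContainer appContainer = blobClient.GetContainerReference(containerName);
CloudBlob blob = appContainer.GetBlobReference(blobName);
// note: both the download and the unzip destination must be underneath a Local Storage path;
// "AppStorage" is a placeholder for the Local Storage name defined in your service definition
String localRoot = RoleEnvironment.GetLocalResource("AppStorage").RootPath;
String zipFileName = Path.Combine(localRoot, blobName);
String outputDir = Path.Combine(localRoot, "app");
blob.DownloadToFile(zipFileName);
// then unzip it using Ionic.Zip, overwriting files left over from a previous start
using (ZipFile zip = ZipFile.Read(zipFileName))
{
    zip.ExtractAll(outputDir, ExtractExistingFileAction.OverwriteSilently);
}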
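The worker role above does the unpacking in C#. If you'd rather have the Java side do it (for example, from a small bootstrap class the worker role launches), a minimal sketch using only java.util.zip could look like the following. AppUnpacker and extractAll are hypothetical names, and the sketch assumes the zip has already been downloaded into a Local Storage directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Sketch: unpack a downloaded application zip into a writable (Local Storage) directory. */
final class AppUnpacker {
    private AppUnpacker() {}

    /** Extracts every entry of zipFile into outputDir, overwriting files from a previous run. */
    static void extractAll(Path zipFile, Path outputDir) throws IOException {
        Path root = outputDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // refuse entries such as "../../x" that would escape outputDir
                if (!target.startsWith(root)) {
                    throw new IOException("Zip entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // ZipInputStream reports end-of-stream at the end of the current entry,
                    // so this copies exactly one entry's bytes
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
                zip.closeEntry();
            }
        }
    }
}
```

Overwriting existing files lets a restarted role re-run the extraction safely, and the path check rejects entries that would land outside the target directory.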


Summary: Since your app, JRE, interpreter, etc. are probably not going to change often, there's no point in uploading them every time you deploy. Upload them once, and then let your role grab a copy on startup if it needs one.

Local vs Cloud App Fabric Issues

Problem: Your app works locally but not when deployed to the cloud

Solution: Don't trust the local compute emulator, or at least understand its limitations.

The local compute and storage emulators are great for proving basic code concepts; however, they lack some important features for real-world testing. Specifically, they don't check for proper inbound TCP/HTTP endpoint usage, and they don't check for correct read/write file system usage. The Azure cloud is based on a model where many people's applications run on the same physical OS instance, so port collisions need to be dealt with. For example, two applications may both want to open port 80 for inbound communication, which would cause a fundamental networking conflict.

As a workaround, Azure lets you define which port you'd like opened from the external load balancer's point of view, and internally it assigns you a random port that you listen on; the Azure load balancer does the rest. This system works well; however, the compute emulator does not enforce it.

Second, your application's project directory is READ-ONLY. That small and unemphasized fact cost me a couple of days of effort. In order to have a read/write file system, you need to define Local Storage for your project. For most applications this also means that whatever code you package with your project and/or download from cloud storage must end up in your Local Storage directory. Again, the local emulators do not enforce this, so your app may run fine out of the project directory locally, but on a real Azure instance it will fail because it can't write some temp or log file.
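To make a Java application behave the same in the emulator and in the cloud, one approach is to have the worker role hand the assigned internal port and the Local Storage path to the JVM rather than hard-coding port 80 or writing next to the code. A minimal sketch, assuming hypothetical system property names app.listen.port and app.writable.root that your worker role would set with -D options when launching java (CloudRuntimeSettings is our own name):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: startup settings handed to the JVM by the worker role instead of being hard-coded. */
final class CloudRuntimeSettings {
    // hypothetical property names; the worker role would pass them as -D options
    static final String PORT_PROPERTY = "app.listen.port";
    static final String ROOT_PROPERTY = "app.writable.root";

    final int port;          // internal port assigned to the endpoint
    final Path writableRoot; // Local Storage directory; the project directory itself is read-only
    final Path logDir;

    private CloudRuntimeSettings(int port, Path writableRoot, Path logDir) {
        this.port = port;
        this.writableRoot = writableRoot;
        this.logDir = logDir;
    }

    /** Fails fast at startup instead of defaulting to port 80 or writing next to the code. */
    static CloudRuntimeSettings fromSystemProperties() throws IOException {
        String portValue = System.getProperty(PORT_PROPERTY);
        String rootValue = System.getProperty(ROOT_PROPERTY);
        if (portValue == null || rootValue == null) {
            throw new IllegalStateException("Missing -D" + PORT_PROPERTY + " or -D" + ROOT_PROPERTY);
        }
        Path root = Paths.get(rootValue);
        Path logs = Files.createDirectories(root.resolve("logs"));
        // probe writability now, so a read-only location fails here rather than at the first log write
        Files.delete(Files.createTempFile(logs, "probe", ".tmp"));
        return new CloudRuntimeSettings(Integer.parseInt(portValue), root, logs);
    }
}
```

Server code would then bind to settings.port (for example, new ServerSocket(settings.port)) and write logs under settings.logDir. The java.io.tmpdir property is best passed on the command line the same way, since changing it after startup may not affect code that has already read it.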

Summary: There's not much you can do here other than think your code through carefully and keep your deployment iteration time to a minimum. Remember, this affects log files, temporary directories, and anything else that writes data to disk.

Logging and Diagnostics

Problem: Getting trace information out of the cloud is complex and error-prone

Solution: Use storage clients to do your own logging

When developing my initial code base, I used Trace.TraceInformation("My Log Message") statements throughout to see what was happening via the compute emulator. This was amazingly useful. However, once you deploy your application to the real cloud, you lose this information. You can set up Azure Diagnostics; however, there are numerous blog posts, all with conflicting reports about how to do this. More importantly, the messages are transferred to storage on a schedule, i.e., not in real time. On some occasions I saw my log messages appear in the diagnostics tables 20-30 minutes after they were written. As you can imagine, this also adds significantly to developer iteration time.

To solve this problem, I wrote my own log(String message) method:

using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

// blobClient is assumed to be an already-initialized CloudBlobClient field,
// and the "logs" container is assumed to exist already
int logId = 0;
public void log(String message)
{
    // so we can see it in the local compute emulator easily
    Trace.TraceInformation(message);
    // make sure we see it 'now'
    Trace.Flush();
    // now write it to something that we can see in the cloud.
    // Note: I used a blob client because I was already familiar with the code and I already had a blob client available;
    // a table client is probably much better from a usability standpoint and is closer to what Diagnostics does.
    // This is probably a really bad idea from a performance and production standpoint, but it was invaluable for development and testing.
    CloudBlobContainer container = blobClient.GetContainerReference("logs");
    // create a unique, sortable id; Interlocked keeps it unique when several threads log at once
    int id = Interlocked.Increment(ref logId);
    CloudBlob obj = container.GetBlobReference(String.Format("{0:0000000}", id));
    obj.UploadText(message);
}


Summary: Once I used this method for logging I was able to see my log messages in real time as they were happening and was able to find those last few problems with my code.
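The same pattern carries over to Java. Here is a minimal sketch, assuming a local directory stands in for the "logs" blob container (in the cloud you would upload each record with your storage client instead); SortableRecordLog is our own name. It keeps the two ideas that made the method useful, one unbuffered record per message and zero-padded names that sort in logging order, and uses an AtomicLong so concurrent callers never reuse an id.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: one record per message, named with a zero-padded counter so names sort in logging order. */
final class SortableRecordLog {
    private final Path dir;
    private final AtomicLong nextId = new AtomicLong();

    SortableRecordLog(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    /** Writes the message immediately (no buffering) and returns the record name it used. */
    String log(String message) throws IOException {
        // AtomicLong keeps ids unique when several threads log at once
        String name = String.format("%07d", nextId.getAndIncrement());
        System.err.println(message); // console copy, like Trace.TraceInformation in the emulator
        Files.write(dir.resolve(name), message.getBytes(StandardCharsets.UTF_8));
        return name;
    }
}
```

Seven-digit names sort correctly up to 9,999,999 records; beyond that, widen the format. Because the counter restarts with the process, several role instances or a restart would also need a per-instance prefix to avoid overwriting earlier records.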

Wrap up

The Azure cloud can be coerced into running non-Microsoft code; however, there are a number of gotchas that aren't obvious. One thing I will say is that calling Microsoft Azure Support is a pleasant experience: you get to talk to a real person almost immediately, and they are very serious about helping you solve your problem.

At Attivio we strive to create software of the highest quality. One technique we've adopted heavily inside our organization is 'unit' testing. I put unit in quotes because a unit test here might mean validating that MyClass.add2plus2() == 4, but it might also mean starting up an instance of our full application, feeding a few test documents through a workflow, and validating via search that the entire system works end to end. Further, while we try to design code to be easily testable, we try to avoid modifying production code specifically for a given test case.

As you can imagine, the full integration tests are often very difficult to write and verify. We have a highly parallel system that runs multiple processes with multiple threads and uses a message-based architecture to perform the business logic in a workflow. We wanted to share some of the testing challenges we've encountered and the strategies we've come up with to solve them.

Testing log output

The Problem

We need to verify that a certain block of code is being executed or that a certain log message is being sent to the user.

Our Solution

First, a little background. We use logging heavily throughout our system to convey information, error messages, and trace-level debugging. Our logging system is based on SLF4J with a Log4j backing implementation, and all of our normal unit tests run with our default log4j configuration file. To address the logging problem, we created a CapturingLog4jAppender class which, unsurprisingly, captures log output. It does this by dynamically inserting a new Appender into the log4j system. It works in a test as follows:

// first we need to install the logger and tell it to listen on a given class / package / etc
CapturingLog4jAppender log = CapturingLog4jAppender.install(MyClassBeingTested.class);
try {
    // perform test that exercises MyClassBeingTested

    // check to see if there were any unexpected warnings
    Assert.assertEquals(0, log.getWarningCount());

    // we can also check for the existence of certain log messages
    // the last flag determines if it is an exact match (true) or a regex match (false); both of these samples use regex
    Assert.assertEquals(1, log.count(".*Some log message I expect to see.*", false));
    Assert.assertEquals(0, log.count(".*Some log message I didn't expect to see.*", false));
} finally {
    // uninstall the log, even if an assertion failed, so it doesn't leak into later tests
    log.uninstall();
}

The nice thing about this approach is that we can effectively monitor any number of classes for very intricate behaviors assuming we have the logging in place. For example, we can make sure that certain error cases were handled correctly if their side effects are hard to evaluate. We've also found that the log messages we add for testing are almost always useful in the field for debugging problems with customer configurations.
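The post doesn't show CapturingLog4jAppender itself. Below is a minimal sketch of the same capture idea built on java.util.logging instead of Log4j, swapped in only so the example has no dependencies; with Log4j 1.x you would extend AppenderSkeleton and attach it with Logger.addAppender. CapturingHandler and its methods mirror the usage above but are our own.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch of a capturing handler, using java.util.logging in place of Log4j. */
final class CapturingHandler extends Handler {
    private final List<LogRecord> records = new CopyOnWriteArrayList<>(); // safe for logging threads
    private final Logger logger; // strong reference so the logger isn't garbage collected

    private CapturingHandler(Logger logger) {
        this.logger = logger;
    }

    /** Attaches a new capturing handler to the logger named after the given class. */
    static CapturingHandler install(Class<?> clazz) {
        Logger logger = Logger.getLogger(clazz.getName());
        CapturingHandler handler = new CapturingHandler(logger);
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        return handler;
    }

    void uninstall() {
        logger.removeHandler(this);
    }

    @Override
    public void publish(LogRecord record) {
        records.add(record);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}

    long getWarningCount() {
        return records.stream()
                .filter(r -> r.getLevel().intValue() >= Level.WARNING.intValue())
                .count();
    }

    /** Counts raw messages (before parameter substitution) equal to, or matching the regex, pattern. */
    long count(String pattern, boolean exact) {
        return records.stream()
                .map(LogRecord::getMessage)
                .filter(m -> m != null && (exact ? m.equals(pattern) : m.matches(pattern)))
                .count();
    }
}
```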

Checking for exceptions in other threads

The Problem

Since we have such a highly parallel system, we often encounter unhandled exceptions in other threads. While this is usually not a problem in production, it does come up during development of new features.

Our Solution

We use Java's built-in default uncaught exception handler (Thread.setDefaultUncaughtExceptionHandler) to trap these exceptions.

import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;

public class TestExceptionHandler {

    private static final ConcurrentLinkedQueue<Throwable> exceptions = new ConcurrentLinkedQueue<Throwable>();

    public static void init() {
        // any thread that dies from an uncaught exception now reports it here
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread thread, Throwable throwable) {
                exceptions.add(throwable);
            }
        });
    }

    public static Collection<Throwable> getExceptions() {
        return exceptions;
    }

    public static void clear() {
        exceptions.clear();
    }
}

Then in your test you can check for exceptions with the following logic:

// initialize at the beginning of your test
TestExceptionHandler.init();

// run a test that may have lots of exceptions in many threads
// (make sure those threads have finished, e.g. via join() or a latch, before checking)

// check to make sure there were no exceptions thrown
Assert.assertEquals(0, TestExceptionHandler.getExceptions().size());

// clear it out for the next test
TestExceptionHandler.clear();
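One caveat when relying on the default handler: it only sees exceptions that actually terminate a thread. The self-contained sketch below (UncaughtDemo and its methods are illustrative names) contrasts a plain thread that fails, which the handler reports, with a task passed to ExecutorService.submit(), whose exception is captured in the returned Future and never reaches the handler.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: what the default uncaught-exception handler does and does not see. */
final class UncaughtDemo {
    static final Queue<Throwable> seen = new ConcurrentLinkedQueue<>();

    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> seen.add(throwable));
    }

    /** A plain thread that dies with an exception is reported to the handler. */
    static void failInPlainThread() throws InterruptedException {
        Thread t = new Thread(() -> { throw new IllegalStateException("boom"); });
        t.start();
        // the handler runs before the thread terminates, so it has run once join() returns
        t.join();
    }

    /** A submitted task's exception is stored in its Future instead of reaching the handler. */
    static Throwable failInSubmittedTask() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = () -> { throw new IllegalArgumentException("hidden"); };
            Future<?> future = pool.submit(task);
            try {
                future.get();
                return null;
            } catch (ExecutionException e) {
                return e.getCause(); // the original exception, wrapped by get()
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

So for pooled work, either check the Futures (Future.get() rethrows the failure wrapped in an ExecutionException) or hand tasks to execute(), where an uncaught exception terminates the worker thread and does reach the handler.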

Detecting state in an asynchronous workflow engine

The Problem

We need to know when something is done and/or cause work to be paused in the middle of our asynchronous workflow processing engine.

Our Solution

One of the big improvements in Java 5 and 6 was the java.util.concurrent package. We use these classes heavily in our code, but we've also found uses for some of them in our tests, specifically the CountDownLatch. The Javadoc for CountDownLatch has very good examples of how to use it; in addition to its normal role in managing multi-threaded production code, we've used it as a test control mechanism. If you're only doing simple yes/no-type checks, you can instead poll a single volatile boolean flag.

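import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// a field of the test class, so the mock processor below can see it
static final CountDownLatch latch = new CountDownLatch(5);

startSomeAsyncProcess();
// wait for my mock async processor to run 5 times; time out instead of hanging the build if it never does
Assert.assertTrue(latch.await(30, TimeUnit.SECONDS));
// check some other state affected by the asynchronous processing, since we know it's "done"

public static final class MockAsyncProcessor implements Runnable {
    public void run() {
        // do 'stuff'
        // each time this processor is called the counter decreases by 1.
        latch.countDown();
    }
}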
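The Problem above also mentions pausing work partway through, which the latch example doesn't show. One way to do it, sketched under the assumption that you control the mock processor (the class and field names are ours): the processor counts down one latch when it reaches the interesting point, then blocks on a second latch until the test lets it continue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: a mock processor the test can hold in mid-flight, then release. */
final class PausableMockProcessor implements Runnable {
    final CountDownLatch started = new CountDownLatch(1);  // processor -> test: "I'm mid-way"
    final CountDownLatch release = new CountDownLatch(1);  // test -> processor: "continue"
    final CountDownLatch finished = new CountDownLatch(1); // processor -> test: "done"
    final AtomicInteger stage = new AtomicInteger();

    @Override
    public void run() {
        stage.set(1); // first half of the work
        started.countDown();
        try {
            // pause here until the test releases us; bounded so a broken test can't hang forever
            if (!release.await(10, TimeUnit.SECONDS)) {
                return;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        stage.set(2); // second half of the work
        finished.countDown();
    }
}
```

While the processor is parked on release, the test can inspect shared state knowing the workflow is frozen at a known point.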

Each of these mechanisms helps us ensure that our code does what we say it's going to do and more importantly, it enables us to add new functionality at any level of the product and be sure that we're not going to break some small intricate behavior of some other part of the system.

AIE can be used as a comprehensive information application development platform. When it's used this way customers can develop arbitrarily complex applications that can have a variety of performance and memory runtime profiles. So whether we are testing an upcoming feature or helping a customer in the field debug a production application, we often turn to the YourKit Java profiler. Aside from the IDE I use every day (Eclipse), YourKit is hands down the most useful developer's tool I use.

Why YourKit is Awesome

BSD license/API

The YourKit license allows unrestricted shipment and embedding of the server agent. This enables us to ship the agent with our core product. Since the API supports runtime activation and deactivation of profiling, we can include the profiling capability in all executing instances of AIE. Often performance and memory utilization issues can only be observed in the field while running real applications against real data. This capability allows us to have this support ready to turn on, while having minimal impact on runtime performance or stability. Captured profiling snapshots can then be sent back to the office for analysis.


Customer Service

Whenever we have had an issue using YourKit, they have responded exceptionally quickly with fixes and updates. They also have a great forum, with quick turnaround from their development staff on questions. Even without some of the fancier features (which I love but could live without), this kind of customer service would make us feel that we are working with a company we can depend on. This partnership focus is what makes a software company great to work with, and it is what we always strive to achieve at Attivio. Examples of issues YourKit has addressed for us:

  • They quickly added support for specifying the agent listener port at runtime. AIE uses several ports, each located at a standard offset from a user-specified base port. This feature allows the profiler port to play nicely with our standard configurations.
  • They added the -disableall option to allow easily disabling all instrumentation and default profiling activities. AIE runs with this option by default. This new feature allows us to be future-proofed against new instrumentation options.
  • In large-memory environments, dumping the entire JVM heap on an OutOfMemoryError can take a long time. They added a new option, -disableoomedumper, to turn this off in less than 5 business days. See http://forums.yourkit.com/viewtopic.php?f=2&t=3266.

UI/Analysis

YourKit has all the standard profiler capabilities and a great UI to go along with them. The list below is not at all comprehensive, but it includes some of my favorites. It is clear that YourKit has put a lot of thought and effort into providing a tool that helps you work with the snapshot you've got, which is critical for diagnosing production issues.

Threads Visualization

Sometimes in a multi-threaded application it can be difficult to understand what is going on at a specific time in the application. The YourKit profiler has a threads visualization that helps with this. Its color-coded view shows the state of threads over time and allows you to view the stack traces of all threads at a particular point in time.

YourKit Screenshot

CPU Telemetry

When it comes to solving CPU performance issues, the CPU telemetry view is our standard first stop. The Hot Spots link usually directs us to the culprit very quickly. When the problem is more complex than that, the Merged Callees tab allows us to find all the places a particular method is called and to navigate easily up and down the call stack. The image below shows how to select a subset of time in order to fine-tune this analysis further:

YourKit Screenshot

Monitor Tracking

Sometimes a performance problem is not due to an inefficient algorithm, but due to multiple threads competing for a shared resource. The monitor view helps to track down these types of issues. For each thread, the time it has spent waiting or blocked is displayed along with the stack traces of the various cases.
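For a quick look at this kind of contention without a profiler attached, the standard java.lang.management API exposes related per-thread counters, such as ThreadInfo.getBlockedCount() and getBlockedTime(). A minimal sketch (ContentionProbe is our own name) that provokes contention on one lock and reads the counters for the thread that had to wait:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Sketch: provoke monitor contention and read the JVM's blocked counters for the current thread. */
final class ContentionProbe {
    private static final Object LOCK = new Object();

    /** Returns the current thread's ThreadInfo after it has waited for a monitor held by another thread. */
    static ThreadInfo contendOnce(long holdMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true); // otherwise getBlockedTime() returns -1
        }
        CountDownLatch held = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                held.countDown();
                try {
                    Thread.sleep(holdMillis); // keep the shared resource busy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        held.await(); // the holder now owns LOCK
        synchronized (LOCK) {
            // reached only after the holder releases LOCK, so this thread was BLOCKED meanwhile
        }
        holder.join();
        return mx.getThreadInfo(Thread.currentThread().getId());
    }
}
```

getBlockedTime() reports the accumulated milliseconds when contention monitoring is enabled, and while a thread is actually blocked, getLockName() and getLockOwnerName() identify the monitor and its holder.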

YourKit Screenshot

There's a lot more I don't have time to address, but YourKit is continually innovating and preparing for the future. Probes are an exciting new feature we plan to use that will allow customization and fine-tuning of profiling. There is also JRockit and Java 7 support.
