Wednesday, November 3, 2010

JVM: Solving OutOfMemoryError with less Memory

At work we have 6 web applications (WAR) deployed in Glassfish v2.

In production we experienced sporadic java.lang.OutOfMemoryError: Java heap space errors under high load. We were sure that we did not have a classic Java memory leak, since the used HEAP space decreased after some time and returned to normal. We suspected that the problem was related to our use of EHCache (which stores cached objects in HEAP space).

(Note: This blog post is a summary of two days of research. We tried many things and many numbers on several more or less identical servers, so the numbers and values in this post are only approximately correct, but you get the point.)

The JVM, and therefore GlassFish, was running with a maximum of 768 MB of HEAP space and 385 MB of PermGen space.

To reduce the possibility of getting OutOfMemoryError until we had worked out the issue, we decided to increase the maximum HEAP space the JVM could allocate (-Xmx).

GlassFish runs on a 4-core, 32-bit server with Windows Server 2003 and 8 GB of RAM.

We increased the max HEAP memory size to 1 GB (-Xmx1024M). We started GlassFish with no applications deployed, then deployed one application after the other, 6 WARs in total. All applications were deployed without problems and our apps ran fine. After some time the JVM suddenly died.

We found a JVM crash dump. We didn't read it too carefully, but it talked about OutOfMemoryError. We did some more research and found out that the JVM had died before the HEAP space had reached 1 GB. We thought a solution was to set the initial HEAP size as well, so we told the JVM to allocate all the HEAP at startup (-Xms1024M).

So now we had 1024 MB HEAP and 385 MB PermGen, which is a total of 1409 MB.

When we started GlassFish again (with no apps deployed), the JVM and GlassFish started up just fine. So we started to deploy applications, one by one. When GlassFish was in the middle of deploying the second application, the JVM died. So by allocating more memory up front, the JVM died with OutOfMemoryError even earlier.

After a lot of research and reading this great post: http://www.codingthearchitecture.com/2008/01/14/jvm_lies_the_outofmemory_myth.html, this is what we concluded:

We took a closer look at the JVM crash dump:

java.lang.OutOfMemoryError: requested 884680 bytes for Chunk::new. Out of swap space?

It also says that the JVM crashed in this thread:

0x5be76800 JavaThread "CompilerThread1" daemon [_thread_in_native, id=6764, stack(0x5c1a0000,0x5c1f0000)]

We had configured the JVM to use a lot of memory for HEAP and PermGen. On 32-bit Windows a process can by default address a maximum of 2 GB in total. The internals of the JVM (e.g. its JIT compiler) need their own memory, and so do the loaded DLLs. Since so much of those 2 GB was already allocated for HEAP and PermGen, Windows said NO when the JVM asked for more memory inside CompilerThread1. When this happened, the JVM crashed with java.lang.OutOfMemoryError: requested 884680 bytes for Chunk::new. Out of swap space?
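The arithmetic behind this can be sketched in a few lines of Java. The class and method names are made up for illustration, and the 2048 MB figure assumes the default user address space of a 32-bit Windows process. The result is only an upper bound on what is left for native allocations, since thread stacks and DLLs have to fit into the same space.

```java
// Rough address-space budget for a 32-bit Windows JVM process (all figures in MB).
// Assumption: the process can address 2048 MB in total (the default limit).
final class AddressSpaceBudget {
    static final long PROCESS_LIMIT_MB = 2048;

    /** MB left for the JIT compiler, thread stacks, DLLs and other native allocations. */
    static long nativeHeadroomMb(long heapMb, long permGenMb) {
        return PROCESS_LIMIT_MB - heapMb - permGenMb;
    }

    public static void main(String[] args) {
        // Original settings: 768 MB HEAP + 385 MB PermGen
        System.out.println("768 MB heap:  " + nativeHeadroomMb(768, 385) + " MB headroom");
        // "Improved" settings: 1024 MB HEAP + 385 MB PermGen
        System.out.println("1024 MB heap: " + nativeHeadroomMb(1024, 385) + " MB headroom");
    }
}
```

With the numbers from this post, raising the HEAP from 768 MB to 1024 MB shrinks the headroom from 895 MB to 639 MB.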

Solution:

Tell the JVM to use LESS memory for HEAP and PermGen, so that enough of the 2 GB is left for the JVM's own native allocations.

Tuesday, November 2, 2010

maven deptools plugin 1.1 released

Version 1.1 of maven deptools plugin now supports maven 3 and the "maven enforcer plugin"


maven deptools plugin "...gives build error if maven resolves transient dependencies in such a way that the none-newest version is chosen."
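The kind of check described in that quote can be illustrated with a small Java sketch. This is not the plugin's actual code: the class and method names are hypothetical, and the version comparison assumes plain dotted numeric versions (no qualifiers such as -SNAPSHOT).

```java
import java.util.List;

// Illustration only: flags the case where a newer version was requested
// somewhere in the dependency tree than the one that ended up being resolved.
final class NewestVersionCheck {
    /** Compares dotted numeric versions such as "1.2.10"; qualifiers are not handled. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0; // missing parts count as 0
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True if some requested version is newer than the resolved one. */
    static boolean resolvedIsNotNewest(String resolved, List<String> requested) {
        for (String version : requested) {
            if (compare(version, resolved) > 0) {
                return true;
            }
        }
        return false;
    }
}
```

A build tool using such a check would fail the build whenever resolvedIsNotNewest returns true for any artifact.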

This plugin has turned out to be very useful in the company I work for.

The plugin can be found here: http://github.com/mbknor/deptools

Thursday, March 18, 2010

maven deptools plugin - RC1 released

I've just released RC1 of a maven deptools plugin. (This is beta but I need real feedback)

"...Maven 2 plugin which gives build error if maven resolves transient dependencies in such a way that the none-newest version is chosen

At work we have all kinds of different dependencies problems related to transient dependencies...


More info here: http://wiki.github.com/mbknor/deptools/

Wednesday, February 3, 2010

Taking control over legacy code


The problem

Some years ago I faced a situation where a company's main public web application ran on a legacy mainframe (OS/390) web server. It was written in REXX.

The developers had to do the actual coding in a terminal window (3270).

If a developer wanted to code in a regular text editor (TextPad), he first had to download the source file via FTP, edit it, then FTP the new version back up to the mainframe to test it. To compile the uploaded source file he had to use a terminal window and navigate to the file (dataset), then disable and enable it to force a recompile.

Another major problem with the FTP solution was that different developers overwrote each other's changes when they uploaded their new files.

Since the source was not managed by any source control system, it was basically impossible to figure out who had changed the code and why.

As you can see this was not an ideal situation.

The ideal solution

The ideal solution would of course be to rewrite the application from scratch with modern technology, but this was not an option for the company. They felt that they had invested too much in the existing code and that it would take too long to rewrite it. Not to mention that they would have been unable to create new stuff while porting the old stuff.

Taking control over the legacy code

Since rewriting the application was not an option, we needed to make it as convenient as possible to work with.

This is what I ended up doing:

We downloaded all the code and added it to Subversion. Then we "defined" that the version stored in Subversion was the "master (correct) version" of the code, not the version stored on the mainframe.

Then I wrote a deployment tool in Java that automated the deployment process.

Since we could not prevent other developers (in other teams) from editing the code directly on the mainframe, we had to have a mechanism that prevented us from silently overwriting their changes. This was a critical feature when selling the "idea" to my leader.

To detect such changes, the deployment tool automatically added some metadata to the source file when uploading it to the mainframe. This metadata contained a hash value (CRC, fingerprint) representing the exact state of the source code when it was uploaded. This made it possible to validate the existing mainframe version of the file before overwriting it with a new version.

The metadata was generated inside a comment (/* metadata */) since the altered source file still needed to compile.
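The idea can be sketched roughly like this in Java. This is not the original tool: the marker format, the use of CRC32 and all names are assumptions made for illustration, and a real tool would also have to deal with the mainframe's character encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: stamp a source file with a checksum inside a comment,
// and later verify that nobody edited the file after it was stamped.
final class DeployFingerprint {
    // Hypothetical marker format, not the format used by the original tool.
    private static final String PREFIX = "/* deploy-metadata crc=";
    private static final String SUFFIX = " */";

    static String crcOf(String source) {
        CRC32 crc = new CRC32();
        crc.update(source.getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(crc.getValue());
    }

    /** Appends a trailing comment line carrying the checksum of the source. */
    static String stamp(String source) {
        return source + "\n" + PREFIX + crcOf(source) + SUFFIX;
    }

    /** True if the file carries a stamp and its body still matches the checksum. */
    static boolean isUntouched(String stamped) {
        int cut = stamped.lastIndexOf("\n" + PREFIX);
        if (cut < 0 || !stamped.endsWith(SUFFIX)) {
            return false; // no stamp: never deployed by the tool, or the stamp was damaged
        }
        String body = stamped.substring(0, cut);
        String recorded = stamped.substring(
                cut + 1 + PREFIX.length(), stamped.length() - SUFFIX.length());
        return recorded.equals(crcOf(body));
    }
}
```

Before overwriting a file, the tool would download the current mainframe version and refuse to upload (or at least warn) if isUntouched returns false for it.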

The deployment tool could also be used to compile the source remotely on the mainframe. This was done by using a Linux tool called s3270, which lets you script the terminal session. Since we needed to run the deployment tool on Windows, it ran s3270 using Cygwin.

Since the upload and compile process was slow, we wanted to avoid uploading and compiling unchanged files.

To fix this we also included the Subversion URL and revision info in the metadata. This made it possible to work out which files had changed and to upload and compile only those.
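A minimal sketch of that selection step, assuming the tool has already collected the revision of each file in the working copy and the revision recorded in the metadata of each deployed file (the class, method and parameter names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: pick the files that need to be uploaded and compiled again.
final class ChangedFiles {
    /**
     * @param workingCopy revision of each file in the Subversion working copy
     * @param deployed    revision recorded in the metadata of each file on the mainframe
     * @return files that are new or whose revision differs, sorted by name
     */
    static List<String> toUpload(Map<String, Long> workingCopy, Map<String, Long> deployed) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(workingCopy).entrySet()) {
            Long deployedRevision = deployed.get(entry.getKey());
            // Upload if the file was never deployed or was deployed from another revision.
            if (deployedRevision == null || !deployedRevision.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```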

Conclusion

By taking control over the source (adding it to Subversion) and automating the deploy and compile process, we ended up with a much better development environment.

This turned out to be only the first step away from the mainframe. Today this old application still lives on, except that it now runs inside an emulator running in Tomcat on a Windows server.

I hope this blog post inspires someone.

Thursday, January 28, 2010

Replacing Weblogic with Glassfish

At work we're working on a rather big Java integration project. It consists of several applications, some exposing services over REST, others consuming REST services (JSF, Spring WebFlow).

In our development environment we run Jetty and/or Tomcat. Someone decided that we had to use Weblogic in the Test and Production environments.

Here is a list of some of the problems Weblogic caused us:
  • It starts up/restarts extremely slowly
  • The admin console is really slow
  • It takes forever to deploy to it
  • We had problems getting multiple datasources to different DB2 environments (OS390, AS400, Windows) to work at the same time
  • Our applications ran slower than expected
  • We experienced strange problems related to Ajax and RichFaces which were impossible to track down, since the problems were different on different Weblogic instances

Today we managed to persuade our project leader that we should replace Weblogic with Glassfish v2.

Now everything is running much faster without problems on Glassfish in our Test environment. I really like what I have seen of Glassfish so far.