Transformation and verification of file-processing programs

Record-processing, stream-processing, and file-processing programs are common in domains such as image-file processing and manipulation, packet processing in networking applications, and batch processing in the enterprise domain. Our overall goal is to develop static analysis based tools for (a) improving understandability and reuse of existing file-processing programs, by teasing apart different cohesive units of functionality from such programs, and (b) verification of the input acceptance characteristics of file-processing programs relative to a specification of the file format.

Our first result in this direction is a technique, described in this publication, for identifying loops within file-processing programs that process logically related sequences of records in input files. Our subsequent publication is about a technique to precisely obtain slices of general looping programs. A slice is the portion of the program that is responsible for producing output records that satisfy a given condition.

Matching service descriptions with existing code

The objective of this project is to develop tools to match a service in a domain model with fragments of code in a given existing application that potentially implement this service. The applications of this work include (a) identifying portions of code in the application that satisfy a business requirement, (b) transforming an existing monolithic application into a service-oriented architecture (SOA), and (c) migrating custom business logic from a custom application to a standard package.

We describe an initial semi-automated technique for this problem in this publication.

A subsequent automated technique for this problem is described in this publication. We provide here data from two case studies that we performed using our technique, in which we matched two real Java applications with two real domain models.

A third, more precise automated technique for this problem is described in this publication. We provide here data from two similar case studies that we performed using this technique.

Memory optimizations in Java programs

Java programs typically create large numbers of duplicated (i.e., content-wise matching) objects. If these objects are immutable, then memory can be saved by modifying programs to throw away duplicated objects right after they are created, and instead reuse existing objects. We have devised an approach and a tool to identify allocation sites that create large numbers of duplicated objects, and to estimate the memory savings to be obtained by inserting de-duplication code right after these sites. Our approach is described in this paper. Detailed data from the experiments described in the paper are available here.

Null dereference verification for Java programs

We have a developed a verification approach and a tool for sound, demand-driven, scalable null-dereference analysis for Java programs, via a backwards abstract-interpretation-based weakest-preconditions analysis. Our approach, as well empirical results obtained by applying the approach on real Java programs, is described in this publication.

Our open-source tool based on this approach is available for download here.

Detecting clones in source code

This paper describes our approach to detecting code clones using slicing. This paper describes our approach to eliminating clones by factoring out duplicated code into procedures.

This is our software for detecting clones in C code.