Karsten's Blog

February 26, 2010

Prototyping a Configuration Toolkit for Eclipse SMILA with Xtext

Filed under: Eclipse, MDSD - kthoms @ 2:06 PM

I got the chance to get a bit more familiar with Eclipse SMILA and started developing a configuration toolkit for it with Xtext. The goal is a prototype that makes it easier to set up a valid SMILA configuration through a textual DSL, with all the benefits such a DSL brings, like semantic validation and content assist. SMILA is configured by a bunch of XML files conforming to defined XSDs. Sometimes information is spread across different configuration files, and a misconfiguration leads to runtime errors or even to no error at all.

But let's start with SMILA first…

What is SMILA about?

SMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, like connectors to most relevant data sources. Using the framework as their basis will enable developers to concentrate on the creation of higher value solutions, like semantic driven applications etc.

To give a rough idea: you can configure different kinds of agents which search media for information (e.g. files, web pages etc.). Relevant data is extracted from those resources and published to a queue (ActiveMQ is used by default). Listeners react to the entries and execute BPEL processes to process the information. The final goal is to index the data in stores, which can be searched by clients. Lucene is used by default as the indexing engine.

Getting SMILA running

The SMILA project provides distributions for Windows and Linux. Since I’m working on a Mac I could not use them, so I followed the development guideline to set up a dev environment. In my fresh workspace I first checked out the trunk, but then switched back to tag 0.5-M3 to have the same state as the distributions.

After finishing the checkout I was finally able to follow the good 5 Minutes to Success tutorial. But don’t expect to finish the tutorial in 5 minutes ;-) One thing to mention: SMILA requires Java 6, and my development IDE is started with Java 5 by default. So I needed to configure Java 6 for my target platform and also had to add the RCP delta pack, since Java 1.6 requires 64-bit libraries on Mac.

The sources contain an example configuration project, SMILA.application, which can be started by a launch configuration in the SMILA.launch project. Here is a small screenshot of the SMILA.application project structure.

The application contains several XML configuration files and their XSDs in a structure which reflects the plugins that are used. The tutorial explains small changes to the configuration and which files have to be changed, but for setting up a brand new project it might become more complicated if one is not familiar with the structure.

Starting the prototype

First I have to make clear that what follows is at an early stage of development. I plan to extend the functionality when I find time again. Since I’m often on site with customers I cannot tell how fast I will progress. At least I will be able to spend some days on it soon, so I expect to have something useful in the near future.

I created the Xtext projects for the SMILA DSL and added some first rules. After running the MWE workflow, Xtext generated the project infrastructure.

SMILA project wizard

When looking at the example project I realized that a normal project setup would require copying an existing project and changing some files. Therefore extending the generated project wizard seemed to be a good starting point. The extended wizard now lets you set up a SMILA application with all the required files.

After finishing the wizard, a project is created in the workspace. All static resources (especially the project structure and XSDs) are copied from the UI plugin into the new project, and as a start some files are generated using Xpand from the information entered in the wizard.

The wizard generated from the SimpleProjectWizardFragment was not as extensible for my case as it should be, so I had to provide a manual implementation with some code copied from the generated classes. I think the fragment could easily be improved, and I will file a change request for that in Bugzilla later.

At the moment the project wizard generates the following artifacts from the information provided on the pages:

  • SMILA DSL model file
  • log4j.properties
  • Launch configuration
  • Tomcat server config

Here you can see the project the wizard created:

Crawler configuration

The first configuration I set out to describe with the DSL is the configuration of the FileSystemCrawler and FeedAgent. This is pretty straightforward, a nearly 1:1 mapping. Here’s an excerpt from the corresponding configuration file “file.xml” shipped with the example:

<DataSourceConnectionConfig
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.crawler.filesystem/schemas/FileSystemDataSourceConnectionConfigSchema.xsd"
>
  <DataSourceID>file</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.crawler.filesystem</SchemaID>
  <DataConnectionID>
    <Crawler>FileSystemCrawler</Crawler>
  </DataConnectionID>
  ...
  <Attributes>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <FileAttributes>LastModifiedDate</FileAttributes>
    </Attribute>
    <Attribute Type="String" Name="Filename">
      <FileAttributes>Name</FileAttributes>
    </Attribute>
    <Attribute Type="String" Name="Path" KeyAttribute="true">
      <FileAttributes>Path</FileAttributes>
    </Attribute>
    <Attribute Type="String" Name="Content" Attachment="true">
      <FileAttributes>Content</FileAttributes>
    </Attribute>
    <Attribute Type="String" Name="Extension">
      <FileAttributes>FileExtension</FileAttributes>
    </Attribute>
    <Attribute Type="String" Name="Size">
      <FileAttributes>Size</FileAttributes>
    </Attribute>
  </Attributes>
  <Process>
    <BaseDir>/Users/thoms/temp</BaseDir>
    <Filter Recursive="true" CaseSensitive="false">
      <Include Name="*.txt"/>
      <Include Name="*.htm"/>
      <Include Name="*.html"/>
      <Include Name="*.xml"/>
    </Filter>
  </Process>
</DataSourceConnectionConfig>

And here is the same configuration described in the DSL (the box with “caseSensitive” appears because I pressed CTRL+SPACE after the keyword “recursive”, and content assist proposes that “caseSensitive” could be entered here):

I decided that the record attribute name (in XML the Name property of the Attribute element) can be omitted if it matches the File attribute name, which I think will often be the case. Only if the names differ does a mapping have to be specified. In the example this is

FileExtension -> Extension

“FileExtension” is the File attribute name and “Extension” is the name of the Record attribute.

Flags are added in brackets and are optional (key, hash, attachment).
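
To make the naming rule concrete, here is a small Python sketch of how one attribute entry could be expanded into the XML shown above. It is not the actual Xpand generator: the class `AttributeMapping`, the function `to_xml` and the flag table are hypothetical names chosen for illustration. Only the element and attribute names are taken from the XML excerpt.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import xml.etree.ElementTree as ET

# DSL flag name -> XML attribute it switches on (names taken from the excerpt above).
FLAG_TO_XML = {"key": "KeyAttribute", "hash": "HashAttribute", "attachment": "Attachment"}


@dataclass
class AttributeMapping:
    """One attribute entry of the crawler description (hypothetical model class)."""
    file_attribute: str                      # e.g. "FileExtension"
    type: str = "String"
    record_attribute: Optional[str] = None   # omitted when equal to file_attribute
    flags: Tuple[str, ...] = ()


def to_xml(mapping: AttributeMapping) -> ET.Element:
    """Build the <Attribute> element; the record name falls back to the file attribute name."""
    name = mapping.record_attribute or mapping.file_attribute
    element = ET.Element("Attribute", {"Type": mapping.type, "Name": name})
    for flag in mapping.flags:
        element.set(FLAG_TO_XML[flag], "true")
    ET.SubElement(element, "FileAttributes").text = mapping.file_attribute
    return element


# "FileExtension -> Extension" needs an explicit record name, "Path" does not.
extension = to_xml(AttributeMapping("FileExtension", record_attribute="Extension"))
path = to_xml(AttributeMapping("Path", flags=("key",)))
print(ET.tostring(extension, encoding="unicode"))
print(ET.tostring(path, encoding="unicode"))
```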

Builder Integration

Since the Helios M4 milestone, Xtext has included a builder infrastructure. I leveraged this infrastructure to generate the resulting configuration files on the fly when you save the DSL model. So if you, for example, add an “Include” line to your model, the respective crawler config is changed automatically. Even better: when you rename your crawler, let’s say from “file” to “userdir_scanner”, the configuration file “file.xml” gets deleted from your workspace and is replaced by “userdir_scanner.xml”!
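
The rename behaviour can be pictured with a few lines of Python. This is only a sketch of the idea and says nothing about how the Xtext builder tracks generated files internally; the function name `sync_crawler_configs` is made up, and it assumes the output folder contains nothing but generated crawler configs.

```python
from pathlib import Path
from typing import Dict


def sync_crawler_configs(out_dir: Path, crawlers: Dict[str, str]) -> None:
    """Write one <name>.xml per crawler and delete configs of crawlers that no longer exist.

    `crawlers` maps the crawler name from the model to its generated XML text.
    Assumption: out_dir holds only generated files, so every other *.xml is stale.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    expected = {f"{name}.xml" for name in crawlers}
    for existing in out_dir.glob("*.xml"):
        if existing.name not in expected:
            existing.unlink()  # e.g. file.xml after renaming "file"
    for name, xml_text in crawlers.items():
        (out_dir / f"{name}.xml").write_text(xml_text, encoding="utf-8")
```

Calling it first with a crawler named "file" and then with "userdir_scanner" leaves only userdir_scanner.xml in the folder.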

After renaming the FileSystemCrawler:

Outlook

This is just the start of this project and many things remain to be done. I also plan to use this project as a good example of how to use the Xtext features properly, open sourced of course. I also have to learn more about SMILA and its configuration. I’m in contact with Sebastian Voigt from brox, co-lead of the SMILA project. With his help I think this project can become a valuable contribution to SMILA later.

Here are some features that I want to add to this project:

  • Complete language for covering the tutorial
    In a first step at least everything that makes up the “5 Minutes to Success” tutorial should be possible to describe in the DSL and the configuration files should be generated from that description.
  • Integrate existing configuration files
    I saw that some configuration files might not be worth mapping to a DSL and might be better left in XML for editing. One example is QueueWorkerConnectionConfig.xml, where the available brokers are defined. Of course I want to refer to brokers from the DSL in several places, and I need to get them from this file. My first idea here is to generate EMF models from the XSDs using the XSD importer. That makes it possible to reference types from that schema directly in the DSL. It should work like the normal integration of existing Ecore models.
  • Validation
    One of the major benefits that the DSL can provide is how easy it becomes to add validation to the models. Consistency constraints especially make sense, for example to check that every queue that records are routed to has a listener that processes those records further.
  • JDT integration
    In the BPEL configuration files services are invoked. The services are qualified by their class names and parameters that can be passed correspond to properties. At Eclipse Summit 2009 Sven Efftinge and Sebastian Zarnekow showed a nice integration of Xtext with JDT to add content assist and validation on qualified Java classes.
  • Product build
    The complete bundle, SMILA and the Configuration toolkit, should be available as a ready-to-use product. I’m planning to use Maven Tycho for setting up the build process.
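
The consistency constraint mentioned under Validation boils down to a set difference. Here is a minimal Python sketch of it, assuming the model can be reduced to a list of (source, queue) routing pairs and a list of queues that have a listener. The function name and the data layout are hypothetical; in the toolkit this would become a validation rule on the DSL model.

```python
from typing import Iterable, List, Tuple


def queues_without_listener(routes: Iterable[Tuple[str, str]],
                            listened_queues: Iterable[str]) -> List[str]:
    """Return, sorted by name, every queue that receives records but has no listener."""
    routed = {queue for _source, queue in routes}
    return sorted(routed - set(listened_queues))


# Two crawlers publish to queues, but only one queue is actually consumed.
problems = queues_without_listener(
    routes=[("file", "addQueue"), ("web", "orphanQueue")],
    listened_queues=["addQueue"],
)
for queue in problems:
    print(f"Queue '{queue}' has no listener that processes its records")
```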

These are just a few examples of what I can imagine for the future. I hope that I find or get some time to realize them.

February 20, 2010

First experience with SSD in MacBook Pro

Filed under: Mac - kthoms @ 11:57 AM

The weakest link in the hardware chain of a notebook is always the hard disk. What good are dual-core processors and gigabytes of RAM when most of the time you have to wait for I/O? Over time my notebook got steadily slower and the hard disk was running and running. I tried cleaning up the hard disk and followed several pieces of advice, all with only small success. Keep in mind that I’m primarily working on Java software development with Eclipse, where it is natural to have thousands of small files to load and to write. These files cannot be read in one sequential run, and handling lots of small files is much slower than handling larger files. Eventually the notebook got so slow that I was really annoyed and finally decided to ask our admin whether I could get an SSD for my Mac. I hoped this would remove the performance bottleneck and help me work more efficiently again.

Today our supplier received the disk and I immediately went to them to have it exchanged. I just made a backup and ran some benchmarks before the change. Now I have everything working again and am getting my first impressions. All I can say is: go get an SSD! The difference is amazing!

Facts

Technical Details

  • MacBook Pro (2007 Series)
  • 2.33 GHz Intel Core 2 Duo
  • 4 GB 667 MHz DDR2 SDRAM
  • Mac OS X 10.5.8

Old Disk:

  • Hitachi HTS5416116J9SA00, SATA
  • 5400 RPM
  • 160 GB
  • Apple HDD Firmware 2006
  • Manufactured JAN-07

New Disk:

  • Corsair CMFSSD-128GBG2D
  • 128 GB

Benchmark with AJA System Test

Write Performance: 32.6 -> 87.6 MB/s   ( x 2.68 )
Read Performance: 38.5 -> 126.4 MB/s   ( x 3.28 )

Benchmark with Xbench

Disk Test                        Hitachi SATA 5400 RPM 160 GB   Corsair SSD 128 GB   Factor
Sequential
  Uncached Write, 4K blocks      29.69 MB/sec                   50.76 MB/sec          1.71
  Uncached Write, 256K blocks    37.20 MB/sec                   88.88 MB/sec          2.39
  Uncached Read, 4K blocks       18.13 MB/sec                   21.83 MB/sec          1.20
  Uncached Read, 256K blocks     38.30 MB/sec                   99.32 MB/sec          2.59
Random
  Uncached Write, 4K blocks       1.05 MB/sec                   11.29 MB/sec         10.75
  Uncached Write, 256K blocks    20.48 MB/sec                   30.76 MB/sec          1.50
  Uncached Read, 4K blocks        0.47 MB/sec                   12.54 MB/sec         26.68
  Uncached Read, 256K blocks     17.24 MB/sec                   66.08 MB/sec          3.83

The complete results can be downloaded and viewed with Xbench: Before After.
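
The Factor column is simply the SSD throughput divided by the HDD throughput. For anyone who wants to redo the arithmetic, a few lines of Python are enough; the numbers are copied from the Xbench results above, and the dictionary keys are just labels made up for this snippet.

```python
# (Hitachi HDD, Corsair SSD) throughput in MB/sec, copied from the Xbench results.
XBENCH = {
    "sequential write 4K":   (29.69, 50.76),
    "sequential write 256K": (37.20, 88.88),
    "sequential read 4K":    (18.13, 21.83),
    "sequential read 256K":  (38.30, 99.32),
    "random write 4K":       (1.05, 11.29),
    "random write 256K":     (20.48, 30.76),
    "random read 4K":        (0.47, 12.54),
    "random read 256K":      (17.24, 66.08),
}


def speedup(before: float, after: float) -> float:
    """How many times faster the new disk is for one test."""
    return after / before


for test, (hdd, ssd) in XBENCH.items():
    print(f"{test:<22} x{speedup(hdd, ssd):.2f}")
```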

Conclusion

The benchmarks clearly show that I/O performance has improved by large factors. Really extraordinary is the comparison for random writes (about 10x) and random reads (about 26x) of small 4K blocks, which comes close to the behavior you get when working with Eclipse and larger projects. But in the other categories, too, the SSD clearly beats the old HD.

What the benchmarks show matches my subjective feeling when working with the new hardware. It is as if I had a completely different system. Startup of the system and of applications is now really fast. To give you an idea: Open Office (yes, I use it from time to time, don’t send me comments on that!) now starts up in 2 seconds! I did not measure it before, but I guess it took about 10-15 seconds. iTunes takes less than 2 seconds to start. And Eclipse starts up, I would say, about 4 times faster.

When I think about how much time I lost before just waiting for I/O… And finally, it extends the time that I will use my MacBook, since I no longer have the feeling that I need to upgrade to a newer generation. Otherwise I think I would have asked itemis to buy a new one this year.

In a few years I think most notebooks will have SSDs. Besides being much faster, they also consume less energy and thus don’t heat up the notebook as much, are more robust, and are silent. It’s just that SSDs are rather expensive now and have a limited lifetime. Anyway, I believe that the SSD is really worth the investment!

February 15, 2010

Lazy evaluation in Xpand

Filed under: Eclipse, openArchitectureWare, Xpand — kthoms @ 10:36 AM

Usually code generation is a purely sequential process. Since the model does not change during the generation of an artifact, all content can be computed in the template at the place where it is needed for the output. But sometimes you want to defer some output to a later point in time during the generation of an artifact.

The typical use case for this is import statements. If for example you want to generate a Java class and want to import all used types then the following alternatives are given:

  • Compute the types that the about-to-be-generated class will use
  • Print out all type names fully qualified whenever needed and organize the imports with a postprocessor. For Java code generation the Hybridlabs Beautifier is widely used.

However, neither approach solves the problem seamlessly. What is really needed is some kind of lazy evaluation in Xpand. Jos Warmer once wrote a feature proposal for this. The feature that he proposed for Xpand is called Insertion Point. The idea was to mark a point in the Xpand template where some code will be inserted at a later point in time. Code is evaluated into this insertion point once its content can be derived more easily.

From this feature proposal, Feature Request #261607 was created in the Eclipse Bugzilla system. In this bug entry, and also offline, a lively discussion arose in the team. The challenges for this feature request were:

  • The Xpand language has a rather small set of keywords. Adding this feature, which is used in some cases only, should not introduce too many changes to the Xpand language
  • An implementation should not break existing code
  • The solution should enable decent evaluation, e.g. avoiding duplicates, sorting

The latest proposal introduces just one new keyword in Xpand, but requires an implementation pattern with an Xtend function. The proposal is to add a keyword ONFILECLOSE to the EXPAND statement. By calling EXPAND with ONFILECLOSE, the evaluation of the EXPAND statement is deferred until the FILE statement is closed. Any state that is used by the called definition is computed during the FILE evaluation. The EXPAND statement has to be evaluated with the execution context that is active when the EXPAND statement is reached.

Let’s look at an example. We take the project that Xpand’s project wizard creates, with small changes. The entities and types now have an additional ‘packageName’ attribute, and the two entities have been assigned to different packages, ‘entities1’ and ‘entities2’. Additionally, entity ‘Person’ has a feature ‘birthday’ of type Date, which is mapped to java.util.Date. Therefore class Person.java has to import entities2.Address and java.util.Date. The used types should be collected while rendering instance variables and accessor methods, but inserted earlier in the code.
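
Outside of Xpand, the mechanism can be pictured like this. The following Python sketch is not how the Xpand engine implements ONFILECLOSE; the class `GeneratedFile` and its methods are invented for illustration. It only shows the principle: the position of the deferred output is reserved when the statement is reached, but its text is computed when the file is closed.

```python
from typing import Callable, List, Union


class GeneratedFile:
    """Collects output chunks; deferred chunks are callables that are resolved on close."""

    def __init__(self) -> None:
        self._chunks: List[Union[str, Callable[[], str]]] = []

    def write(self, text: str) -> None:
        self._chunks.append(text)

    def write_on_close(self, producer: Callable[[], str]) -> None:
        # Reserve the position now, compute the text only when the file is closed.
        self._chunks.append(producer)

    def close(self) -> str:
        return "".join(chunk() if callable(chunk) else chunk for chunk in self._chunks)


used_types: List[str] = []          # filled while the class body is rendered
out = GeneratedFile()
out.write("package entities1;\n")
out.write_on_close(lambda: "".join(f"import {t};\n" for t in used_types))
out.write("public class Person {\n")
used_types.append("java.util.Date")  # discovered after the import position was passed
out.write("  private Date birthday;\n")
out.write("}\n")
text = out.close()
print(text)
```

The import line ends up before the class body although the type was only recorded while the body was written.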

First take a look at the template code:


As you can see, the definition ImportBlock is called for each Type instance (e.g. Entity and DataType instances) in the collection returned by the Xtend function UsedType(). In an alternative approach this function would be responsible for computing all the types that will be used by this Entity instance. In the new implementation it just creates an empty list and returns it. The implementation of the UsedType() function (in file GeneratorExtensions.ext) is:

create List[Type] UsedType (Entity e) : (List[Type]) {};

So at the time the Xpand engine reaches EXPAND ImportBlock… the list is still empty. Now note the ONFILECLOSE keyword at the end of the EXPAND statement. It tells the engine that the evaluation of this statement should be deferred until the file is about to be closed. During evaluation of the template code, in the FOREACH loop, another extension function addUsedType(f.type) is called. It adds the type of the currently processed feature to the collection returned by UsedType(). Therefore it is important that the UsedType() function uses the create keyword: we want the collection to be created on first access for an Entity and the same instance to be returned when UsedType() is called later for that Entity again.

The function addUsedType() is used in the template like this:

«addUsedType(f.type)»

Xpand would print the result of the function as a string, but we don’t want the call to produce any output. Therefore we make sure that the function adds the type to the collection and returns an empty string:

addUsedType (Entity e, Type t) : UsedType(e).add(t) -> "";
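
For readers not familiar with Xtend’s create extensions, the two functions behave roughly like the following Python sketch. The names mirror the Xtend functions, but the cache dictionary is only an analogy for what create does, and entities are represented by plain strings here to keep the example short.

```python
from typing import Dict, List

_used_type_cache: Dict[str, List[str]] = {}


def used_type(entity: str) -> List[str]:
    """Like the create extension: build the list on first access, then return the same instance."""
    return _used_type_cache.setdefault(entity, [])


def add_used_type(entity: str, type_name: str) -> str:
    """Record the type as a side effect and return "" so the template prints nothing."""
    used_type(entity).append(type_name)
    return ""


printed = add_used_type("Person", "java.util.Date")
print(repr(printed), used_type("Person"))
```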

During evaluation, types may be used multiple times within the template, but we want just one import statement per type. Furthermore, we don’t need imports for types from the java.lang package (here the packageName of the DataType instances String and Integer is null) or for types that are in the same package as the entity. Therefore we transform the UsedType() collection before finally invoking the ImportBlock definition.

«EXPAND ImportBlock FOREACH UsedType()
 .select(t|t.packageName!=null && t.packageName!=this.packageName).toSet()
 .sortBy(t|t.qualifiedName()) ONFILECLOSE»
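
The select / toSet / sortBy chain does three things: filter, remove duplicates, and sort. The same steps in Python could look like the sketch below. The `Type` class and `imports_for` function are stand-ins invented for this illustration and only model the two properties used in the expression above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Type:
    name: str
    package_name: Optional[str]  # None for java.lang types such as String

    def qualified_name(self) -> str:
        return f"{self.package_name}.{self.name}" if self.package_name else self.name


def imports_for(entity_package: str, used: Iterable[Type]) -> List[str]:
    """Filter out unneeded types, drop duplicates, and sort by qualified name."""
    needed = {
        t for t in used
        if t.package_name is not None and t.package_name != entity_package
    }
    return [f"import {t.qualified_name()};"
            for t in sorted(needed, key=Type.qualified_name)]


used = [
    Type("Date", "java.util"),
    Type("String", None),          # java.lang, no import needed
    Type("Address", "entities2"),
    Type("Address", "entities2"),  # used twice, imported once
    Type("Person", "entities1"),   # same package as the entity
]
person_imports = imports_for("entities1", used)
print("\n".join(person_imports))
```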

Conclusion

Lazy evaluation allows code to be generated with Xpand in a non-sequential manner. The proposed solution using ONFILECLOSE delivers the desired Insertion Point feature by adding just one additional keyword to the Xpand language. It does not break existing template code. When accepted, this code will be contributed to Xpand 0.8.0-M6 soon. For those who want to test it I have created a feature patch for the org.eclipse.xpand feature. The example project with the sources listed in this article can be downloaded here.

Feedback is welcome and is best placed in the bugzilla feature request.
