I got the chance to get a bit more familiar with Eclipse SMILA and started development of a configuration toolkit with Xtext. Target is to develop a prototype which enables an easier setup of a valid SMILA configuration by use of a textual DSL with all the benefits which you get from using such a DSL, like semantic validation, content assist etc. SMILA is configured by a bunch of XML files conforming to defined XSDs. Sometimes information is spread around different configuration files, and misconfiguration leads to runtime errors or even to no error at all.
But lets start with SMILA first…
What is SMILA about?
SMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, like connectors to most relevant data sources. Using the framework as their basis will enable developers to concentrate on the creation of higher value solutions, like semantic driven applications etc.
To give a rough imagination: You can configure different kinds of agents which search media for information (e.g. files, web pages etc.), and relevant data is extracted from those resources and published to some queue (ActiveMQ is used by default). Listeners react on entries and execute BPEL processes to process the information. Final goal is to index the data in stores, which can be searched by clients. Lucene is used by default as indexing engine.
Getting SMILA running
The SMILA project provides distributions for Windows and Linux. Since I’m working on a Mac I could not use them. So I followed the development guideline to setup a dev environment. In my fresh workspace I checked out first the trunk, but switched back to tag 0.5-M3 to have the same state as the distributions.
After finishing the checkout I finally was able to follow again the good 5 Minutes to Success tutorial. But don’t expect you can finish the tutorial in 5 minutes ;-) One word to mention: SMILA requires Java 6, and my development IDE is started by default with Java 5. So I needed to configure Java 6 for my target platform and also had to add the RCP delta pack, since 1.6 requires 64 bit libraries on Mac.
Contained in the sources is a example configuration project SMILA.application, which can be started by a launch configuration in the SMILA.launch project. Here is a small screenshot of the SMILA.application project structure.
The application contains several XML configuration files and their XSDs in a structure which reflects the plugins that are used. The tutorial explains small changes to the configuration and which files have to be changed, but for setting up a brand new project it might become more complicated if one is not familiar with the structure.
Starting the prototype
First I have to make clear that the following is early development state. I plan to extend the functionality when getting some time again. Since I’m involved often at customers I cannot tell how fast I progress now. At least I get the possibility to spend some days in the near future on it, so I’m expecting to have something useful in the near future.
I created the Xtext projects for the SMILA DSL and added some first rules. After running the MWE workflow Xtext generated the project infrastructure.
SMILA project wizard
When looking at the example project I recognized that a normal project setup would require copy/paste of an existing one and changing some files. Therefore extending the generated project wizard seemed to be a good starting point. The extended wizard now lets you set up a SMILA application with all the required files.
After finishing the wizard a project in the workspace is created. All static resources (esp. project structure and XSDs) are copied from the UI plugin into the new project and as a start some files are generated using Xpand with the information filled in into the wizard.
The wizard generated from the SimpleProjectWizardFragment was not so extensible for my case as it should be, so I had to copy some code from the generated classes and provide a manual implementation with some copied code. I think the fragment could be changed easily to improve and I will set up a change request on that later and post it to bugzilla.
At the moment the project wizard generates the following artifacts from the information provided on the pages:
- SMILA DSL model file
- Launch configuration
- Tomcat server config
Here you can see the project the wizard created:
The first configuration I targeted to describe with the DSL is the configuration of the FileSystemCrawler and FeedAgent. This is pretty straight forward, nearly 1:1 mapping. Here’s an excerpt from the appropriate configuration file “feed.xml” shipped with the example:
<DataSourceConnectionConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.crawler.filesystem/schemas/FileSystemDataSourceConnectionConfigSchema.xsd" > <DataSourceID>file</DataSourceID> <SchemaID>org.eclipse.smila.connectivity.framework.crawler.filesystem</SchemaID> <DataConnectionID> <Crawler>FileSystemCrawler</Crawler> </DataConnectionID> ... <Attributes> <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true"> <FileAttributes>LastModifiedDate</FileAttributes> </Attribute> <Attribute Type="String" Name="Filename"> <FileAttributes>Name</FileAttributes> </Attribute> <Attribute Type="String" Name="Path" KeyAttribute="true"> <FileAttributes>Path</FileAttributes> </Attribute> <Attribute Type="String" Name="Content" Attachment="true"> <FileAttributes>Content</FileAttributes> </Attribute> <Attribute Type="String" Name="Extension"> <FileAttributes>FileExtension</FileAttributes> </Attribute> <Attribute Type="String" Name="Size"> <FileAttributes>Size</FileAttributes> </Attribute> </Attributes> <Process> <BaseDir>/Users/thoms/temp</BaseDir> <Filter Recursive="true" CaseSensitive="false"> <Include Name="*.txt"/> <Include Name="*.htm"/> <Include Name="*.html"/> <Include Name="*.xml"/> </Filter> </Process> </DataSourceConnectionConfig>
And here the same situation described in the DSL (the box with “caseSensitive” is there because I pressed CTRL+SPACE after the keyword “recursive” and the content assist proposes that “caseSensitive” could be entered here):
I decided that the record attribute name (in XML the Attribute#name property) can be omitted in the case that it matches the File attribute name, which I think often will be the case. Only if the names don’t match a mapping has to be done. Here the example is
FileExtension -> Extension
“FileExtension” is the File attribute name and “Extension” is the name of the Record attribute.
Flags are added in brackets and are optional (key, hash, attachment).
Since Xtext Helios M4 a builder infrastructure was added to Xtext. I leveraged this infrastructure to generate the resulting configuration files on-the-fly when you save the DSL model. So if you, for example, add an “Include” line to your model the respective crawler config is automatically changed. Even better: When you rename your crawler, let’s say from “file” to “userdir_scanner” the configuration file “file.xml” gets deleted from your workspace and is replaced by “userdir_scanner.xml”!
This is just the start of this project and many things have to be done now. I plan to use this project also as a good example for using the Xtext features properly, of course open sourced. Also I have to learn more about SMILA and the appropriate configuration. I’m in exchange with Sebastian Voigt from brox, co-lead of the SMILA project. With his help I think this project can be a valuable contribution to SMILA later.
Here are some features that I want to add to this project:
- Complete language for covering the tutorial
In a first step at least everything that makes up the “5 Minutes to Success” tutorial should be possible to describe in the DSL and the configuration files should be generated from that description.
- Integrate existing configuration files
I saw that some configuration files might not be worth to be mapped to a DSL and might be better left just in XML for editing. One example is the QueueWorkerConnectionConfig.xml, where the available brokers are defined. Of course from the DSL I want to refer to brokers at several places and I need to get them from this file. My first idea is here to use generate EMF models from the XSDs using the XSD importer. That makes it possible to reference types from that schema directly in the DSL. It should be like normal integration of existing Ecore models.
One of the major benefits that the DSL can provide is the ease to add validation on the models. Especially consistency constraints make sense, for example to prove that every queue where records are routed to must have a listener that processes that records further.
- JDT integration
In the BPEL configuration files services are invoked. The services are qualified by their class names and parameters that can be passed correspond to properties. At Eclipse Summit 2009 Sven Efftinge and Sebastian Zarnekow showed a nice integration of Xtext with JDT to add content assist and validation on qualified Java classes.
- Product build
The complete bundle, SMILA and the Configuration toolkit, should be available as a ready-to-use product. I’m planning to use Maven Tycho for setting up the build process.
These are just some few examples of what I can imagine for the future. I hope that I find or get some time to realize this.