OGSA-DAI Introductory Practical 1: A Simple Workflow

OGSA-DAI is a middleware product that allows data resources, such as relational or XML databases, to be accessed via web services. An OGSA-DAI web service allows data to be queried, updated, transformed and delivered. These services can in turn be used to build web services that offer data integration to clients.

This tutorial introduces you to using OGSA-DAI to access files and databases.

This OGSA-DAI tutorial has multiple aims:

    * to show how OGSA-DAI enables easy access to remote data resources (where direct access to the data may not be possible).
    * to show how to use the OGSA-DAI client toolkit to access a data source and perform multiple actions on the data before retrieving the results.

The OGSA-DAI client toolkit comes with several pre-built tools. The first part of this tutorial uses these tools before focussing on the toolkit's API. We will be using an OGSA-DAI server running on tc07.nesc.ed.ac.uk, which has version 3.0 installed.

You can download the files you need from here.


1. Set up the Environment


Using GSI-SSH, connect to the machine tc07.nesc.ed.ac.uk on the training cluster. There should be a Java GSISSH client on your desktop. Just double click the icon, then select open, and enter the machine name.

Next we need to set up the environment we will be using. The following command sets up the environment for Java and for Globus 4.0.5, which contains the OGSA-DAI implementation, and sets the environment variables for Globus.

module load java ogsadai_wsrf


2. Testing OGSA-DAI


These next few commands will test your OGSA-DAI environment.

We will be using the XMLDB client, a simple command-line client for listing collections and resources in XMLDB data resources, running XPath queries and displaying the results.

The first command executes the ListResources method on the OGSA-DAI WSRF-compliant data service. It returns a list of the IDs of the data service resources currently known to this service; in particular we are after the data resource ID, SwissProt_Resource. The -u option provides the base services URL, where OGSA-DAI is listening.

java uk.org.ogsadai.client.toolkit.example.XMLDBClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai -c listResources -d SwissProt_Resource

You should get the following output.

[jlyons@tc07 practical]$ java uk.org.ogsadai.client.toolkit.example.XMLDBClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai -c listResources -d SwissProt_Resource

DRER ID: DataRequestExecutionResource

Data Resource ID: SwissProt_Resource

Base Services URL: http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai

Command: listResources

DRER ID: DataRequestExecutionResource

Data Resource ID: SwissProt_Resource

Base Services URL: http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai

Command: listResources

uk.org.ogsadai.resource.request.status.COMPLETED

shortenedSprot_50000.xml

Next we will execute a SQL query on the SwissProt_Metadata data resource. SwissProt_Metadata is a read-only relational resource using the PostgreSQL server on another machine on the training cluster.

The -q option contains the SQL query, which in this case simply returns the first 10 rows of the swissprot table.

This command should be typed all on one line.

java uk.org.ogsadai.client.toolkit.example.SQLClient -u http://tc07:8080/wsrf/services/dai/ -q "SELECT * FROM swissprot limit 10" -d SwissProt_Metadata

Execute this command and you should get the following list.

[jlyons@tc07 practical]$ java uk.org.ogsadai.client.toolkit.example.SQLClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai/ -q "SELECT * FROM swissprot limit 10" -d SwissProt_Metadata

DRER ID: DataRequestExecutionResource

Data Resource ID: SwissProt_Metadata

Base Services URL: http://tc07:8080/wsrf/services/dai/

SQLQuery: SELECT * FROM swissprot limit 10

uk.org.ogsadai.resource.request.status.COMPLETED

| accession | filename                                           | name                                            |

| P17407    | fasta_files/55eb8cabb0fbdef7d6e62f3d50c48c38.fasta | P17407|21KD_DAUCA|21kDaproteinprecursor                                         |

| P68255    | fasta_files/27b269b27a568164b31fe58c219deafc.fasta | P68255|1433T_RAT|14-3-3proteintheta                                            |

| Q9U408    | fasta_files/e76d6605ba735a7124ee8fad9356f722.fasta | Q9U408|14331_ECHGR|14-3-3proteinhomolog1                                        |

| Q7T356    | fasta_files/384219e5fe652256b8e1bb4d0e170675.fasta | Q7T356|143B2_BRARE|14-3-3proteinbeta/alpha-2                                    |

| P62261    | fasta_files/98301f7c673d7a61694ec2beace5e3b9.fasta | P62261|1433E_BOVIN|14-3-3proteinepsilon                                         |

| Q29940    | fasta_files/61f1aa94349b379c4626659fcedf29f7.fasta | Q29940|1B59_HUMAN|HLAclassIhistocompatibilityantigen,B-59alphachainprecursor    |

| P30382    | fasta_files/341860099d8c18daa8b915a3cef5b75e.fasta | P30382|1B04_GORGO|ClassIhistocompatibilityantigen,GOGO-B0201alphachainprecursor |

| Q06967    | fasta_files/82c75bbf0c70b919776d48950f1e41cf.fasta | Q06967|14336_ORYSA|14-3-3-likeproteinGF14-F                                     |

| P16909    | fasta_files/f99ced15875bfcbb0ffa0ea54d0f501b.fasta | P16909|18C_DROME|Histone-likeprotein18C                                         |

| Q00257    | fasta_files/81b9cdd5ffd0c174505c8cec11904965.fasta | Q00257|1A12_CUCMA|1-aminocyclopropane-1-carboxylatesynthaseCMA101               |

Success. Before the data, the client displays some information about the request: the data resource we are accessing (SwissProt_Metadata), the base services URL where OGSA-DAI is hosted, and the SQL query itself. We then see that the request returns with status COMPLETED, followed by the 10 rows returned from the swissprot table in SwissProt_Metadata.


3. A simple workflow

In the following exercise we will look at how to implement and execute a workflow with the OGSA-DAI client toolkit. The scenario is a simple workflow. Data is read from a file with the ReadFromFile activity. The data is then piped to the Tee activity, which produces two copies that are piped to two different delivery activities. One delivery activity sends the data to you by email. The other writes the data to a remote FTP server.

Now that we have the workflow, let's add it to the file. You will need an editor on the remote machine to edit the file, either a terminal text editor or a graphical editor. Just choose ONE of the ways presented below.

  • You can use an ordinary text editor like nano, which is quite easy to use. Just type the following command:
        nano practical/Scenario1.java
    You can also use vi if you wish.
  • Or you can use a graphical editor, which supports the mouse and menus. Just double click the Excess icon on the desktop, then run kwrite with the following command:
        kwrite practical/Scenario1.java &

Find the following section in the file. You will add the lines below underneath the line // Add code here.

private static void executeRequest(DataRequestExecutionResource drer) throws Exception
{
    // Add code here
}

For each of the activities shown in the diagram there is a corresponding activity in the client toolkit. These are Java classes. If you want to create a request like the one above you must construct each of the activities, and add it to the Workflow. The first step is to create the Workflow, using the following line.

PipelineWorkflow pipeline = new PipelineWorkflow();

Then we need to create each activity, add parameters to the activity, and connect it to the workflow.

We will start with the first activity, ReadFromFile , which reads the content of a file stored in a file system resource.

The first step is to create the activity, which we call readFromFile. ReadFromFile expects the name of a target resource, which is called FileResource on our OGSA-DAI server. Then you need to select the file from which to read data, which is called results1.txt. The final step is to add the activity to the workflow.

ReadFromFile readFromFile = new ReadFromFile();
readFromFile.setResourceID("FileResource");
readFromFile.addFile("results1.txt");
pipeline.add(readFromFile);

The next activity in our workflow is the Tee activity, which accepts any block as its input and sends this block to all its outputs. We then connect the output of the ReadFromFile activity to the input of the Tee activity. The Tee activity should have two outputs, so each block of data is copied to both of them.

Tee tee = new Tee();
tee.connectInput(readFromFile.getDataOutput());
tee.setNumberOfOutputs(2);
pipeline.add(tee);
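
To make the behaviour of Tee concrete, here is a minimal plain-Java sketch of the idea: every block read from the source ends up in each output. It does not use the OGSA-DAI toolkit at all; the class TeeFlowSketch and its tee method are hypothetical names invented for this illustration, and the real activity runs on the server rather than on in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a tee: one source, several identical outputs.
// None of these names come from the OGSA-DAI client toolkit.
class TeeFlowSketch {

    // Copies every block of the source into each of numberOfOutputs lists.
    static List<List<String>> tee(List<String> source, int numberOfOutputs) {
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < numberOfOutputs; i++) {
            outputs.add(new ArrayList<>(source));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Stand-in for the blocks read from the file.
        List<String> fileBlocks = Arrays.asList("line 1", "line 2");

        List<List<String>> outputs = tee(fileBlocks, 2);

        // Output 0 would feed the FTP delivery, output 1 the email delivery.
        System.out.println("to FTP:   " + outputs.get(0));
        System.out.println("to email: " + outputs.get(1));
    }
}
```

Both outputs receive the same blocks in the same order, which is why the two delivery activities in this practical end up with identical data.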

The DeliverToFTP activity reads from an output of Tee and uploads the data as a file to an FTP server. We connect the input of the FTP delivery activity to the first of the Tee outputs and the SMTP delivery activity to the second one.

You must provide a file name to write the data to and the FTP host for the FTP delivery activity. We also need to tell OGSA-DAI to use passive mode for the connection.

DeliverToFTP deliverToFTP = new DeliverToFTP();
deliverToFTP.connectDataInput(tee.getOutput(0));
deliverToFTP.addFilename("/incoming/results_user00.txt");
deliverToFTP.addHost("anonymous:anonymous@tc07.nesc.ed.ac.uk");
deliverToFTP.addPassiveMode(true);
pipeline.add(deliverToFTP);
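
The two string parameters above are easy to mistype, so it can help to build them from their parts. The sketch below assumes that the host string follows the user:password@host pattern seen in the example and that each participant writes to a file named after their own account. FtpTargetSketch, ftpHost and resultFile are hypothetical helpers for illustration, not part of the OGSA-DAI toolkit.

```java
// Hypothetical helpers that assemble the two strings passed to the
// FTP delivery activity. The user:password@host layout is an assumption
// based on the example value used in this practical.
class FtpTargetSketch {

    // Builds a host string such as "anonymous:anonymous@tc07.nesc.ed.ac.uk".
    static String ftpHost(String user, String password, String host) {
        return user + ":" + password + "@" + host;
    }

    // Builds a per-user file name such as "/incoming/results_user00.txt".
    static String resultFile(String userName) {
        return "/incoming/results_" + userName + ".txt";
    }

    public static void main(String[] args) {
        System.out.println(ftpHost("anonymous", "anonymous", "tc07.nesc.ed.ac.uk"));
        System.out.println(resultFile("user00"));
    }
}
```

The returned strings could then be passed to addHost and addFilename in place of the literals.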

DeliverToSMTP is the final activity we need to add. The activity takes a sender address, a list of target email addresses, a subject line and the data to send to the recipients. It sends the data to the list of recipients using the SMTP server taken from the configuration parameters. As there can be multiple recipients we pass a list (with only one address for this example). Make sure you put in your own email address, rather than youremail@example.com.

DeliverToSMTP deliverToSMTP = new DeliverToSMTP();
deliverToSMTP.connectDataInput(tee.getOutput(1));
deliverToSMTP.addFrom("youremail@example.com");
deliverToSMTP.addSubject("OGSA-DAI Test");
List to = Collections.singletonList("youremail@example.com");
deliverToSMTP.addTo(to.iterator());
pipeline.add(deliverToSMTP);
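
The recipient list above is a raw List holding a single address. If you wanted to send the results to several people, a typed list built with the standard java.util classes would work the same way. The sketch below uses only the Java standard library; RecipientListSketch and recipients are hypothetical names, and the assumption is that the resulting iterator could be handed to addTo exactly as in the single-address case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Standard-library sketch of building recipient lists of different sizes.
class RecipientListSketch {

    // Returns a read-only, typed list of the given addresses.
    static List<String> recipients(String... addresses) {
        return Collections.unmodifiableList(Arrays.asList(addresses));
    }

    public static void main(String[] args) {
        // One recipient, as in the practical.
        List<String> one = Collections.singletonList("youremail@example.com");
        System.out.println(one.size() + " recipient");

        // Several recipients, visited through an iterator.
        List<String> several = recipients("first@example.com", "second@example.com");
        Iterator<String> it = several.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```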

The pipeline request is now ready and you can submit it for execution by the DataRequestExecutionResource on the server.

RequestResource requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);

You can see from the above that we have chosen to execute the request synchronously. This means that the method returns only after the request has completed. Once the method has returned you can print out the request status. The request status contains a status for each activity and any warnings or errors that may have occurred.

System.out.println(requestResource.getRequestStatus());
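
If the meaning of "synchronous" is unfamiliar, the following plain-Java sketch shows the general idea: the work runs elsewhere, but the caller is blocked until the result is available. It uses java.util.concurrent only and says nothing about how OGSA-DAI implements the call; SyncCallSketch, runRequest and executeSynchronously are hypothetical names, and the 200 ms sleep is a stand-in for the server doing its work.

```java
import java.util.concurrent.CompletableFuture;

// Plain-Java illustration of a blocking (synchronous) call.
class SyncCallSketch {

    // Stand-in for a request that takes a while to run on a server.
    static String runRequest() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
        return "COMPLETED";
    }

    // Starts the request on another thread, then blocks until it has finished.
    static String executeSynchronously() {
        return CompletableFuture.supplyAsync(SyncCallSketch::runRequest).join();
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String status = executeSynchronously();
        long elapsed = System.currentTimeMillis() - start;
        // The status is only available once the simulated request is done.
        System.out.println(status + " after " + elapsed + " ms");
    }
}
```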

After adding all these lines, your file should have the following code. It is slightly rearranged here, with the lines grouped by workflow element.

//Create our 4 Activities
ReadFromFile readFromFile = new ReadFromFile();
Tee tee = new Tee();
DeliverToFTP deliverToFTP = new DeliverToFTP();
DeliverToSMTP deliverToSMTP = new DeliverToSMTP();
//Tee
tee.connectInput(readFromFile.getDataOutput());
tee.setNumberOfOutputs(2);
//Read from File Resource
readFromFile.setResourceID("FileResource");
readFromFile.addFile("results1.txt");
//Connect two outputs, one to FTP one to email
deliverToFTP.connectDataInput(tee.getOutput(0));
deliverToSMTP.connectDataInput(tee.getOutput(1));
//Deliver to FTP
deliverToFTP.addFilename("/incoming/results_user00.txt");
deliverToFTP.addHost("anonymous:anonymous@tc07.nesc.ed.ac.uk");
deliverToFTP.addPassiveMode(true);
//Deliver to Email
deliverToSMTP.addFrom("youremail@example.com");
deliverToSMTP.addSubject("OGSA-DAI Test");
List to = Collections.singletonList("youremail@example.com");
deliverToSMTP.addTo(to.iterator());
//WorkFlow
PipelineWorkflow pipeline = new PipelineWorkflow();
pipeline.add(readFromFile);
pipeline.add(tee);
pipeline.add(deliverToFTP);
pipeline.add(deliverToSMTP);
RequestResource requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);
System.out.println(requestResource.getRequestStatus());

We are now ready to compile and run the project. The first command compiles the file we just edited with the Java compiler. The next command runs it.

[jlyons@tc07 OGSA]$ javac practical/Scenario1.java 

[jlyons@tc07 OGSA]$ java practical.Scenario1 

Request id="ogsadai-115652573a6" 

Request status="uk.org.ogsadai.resource.request.status.COMPLETED" 

Activity instanceName="uk.org.ogsadai.DeliverToFTP-ogsadai-11565397569" status="COMPLETED" 

Activity instanceName="uk.org.ogsadai.Tee-ogsadai-11565397565" status="COMPLETED" 

Activity instanceName="uk.org.ogsadai.DeliverToSMTP-ogsadai-1156539756a" status="COMPLETED" 

Activity instanceName="uk.org.ogsadai.ReadFromFile-ogsadai-11565397563" status="COMPLETED"

When we run the application, we see all parts successfully completed.

The first thing we get back is a uniquely generated request ID. Then we see a list of the workflow activities that have been run, and their status.
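
With only four activities the output is easy to check by eye, but the same check can be automated. The sketch below parses text in the format printed above and reports whether every activity line says COMPLETED. It works purely on the printed text with java.util.regex; StatusCheckSketch and its methods are hypothetical names, and it makes no use of the toolkit's own status classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses printed request status text and checks the activity statuses.
class StatusCheckSketch {

    // Matches lines such as: Activity instanceName="..." status="COMPLETED"
    private static final Pattern ACTIVITY =
        Pattern.compile("Activity instanceName=\"([^\"]+)\" status=\"([^\"]+)\"");

    // Maps each activity instance name to its reported status.
    static Map<String, String> activityStatuses(String output) {
        Map<String, String> statuses = new LinkedHashMap<>();
        Matcher m = ACTIVITY.matcher(output);
        while (m.find()) {
            statuses.put(m.group(1), m.group(2));
        }
        return statuses;
    }

    // True only if at least one activity was found and all are COMPLETED.
    static boolean allCompleted(String output) {
        Map<String, String> statuses = activityStatuses(output);
        return !statuses.isEmpty()
            && statuses.values().stream().allMatch("COMPLETED"::equals);
    }

    public static void main(String[] args) {
        String output =
            "Activity instanceName=\"uk.org.ogsadai.Tee-ogsadai-11565397565\" status=\"COMPLETED\"\n"
          + "Activity instanceName=\"uk.org.ogsadai.ReadFromFile-ogsadai-11565397563\" status=\"COMPLETED\"\n";
        System.out.println(allCompleted(output));
    }
}
```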

You should receive an email in the next few seconds. You can check that the file has been delivered by FTP by typing the following command, replacing results_user00.txt with the name of the file you created.

more /var/ftp/incoming/results_user00.txt

If the file was correctly delivered via FTP, you will get the following.

[jlyons@tc07 OGSA]$ more /var/ftp/incoming/results_user00.txt 

You have successfully completed the first tutorial!

You have successfully completed the first part of the OGSA-DAI Introductory Practical.

4. Producing useful error information

If you get errors when connecting to the OGSA-DAI resource, the following code displays more information about what exactly is going on. Replace the line:

RequestResource requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);

with the following lines below:

RequestResource requestResource = null;
try
{
    requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);
}
catch (RequestCompletedWithErrorException e)
{
    requestResource = e.getRequestResource();
    System.out.println(requestResource.getRequestStatus());
}

This makes your application print the request status, including any activity errors, when the request completes with an error.
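
The pattern at work here is an exception that carries the status of the failed request, so the caller can still report details. The plain-Java sketch below models that pattern in isolation. It does not use the OGSA-DAI classes; ErrorStatusSketch, RequestFailedException and the status strings are hypothetical stand-ins chosen for illustration.

```java
// Plain-Java model of an exception that carries a status for the caller.
class ErrorStatusSketch {

    // Hypothetical stand-in for an exception thrown when a request fails.
    static class RequestFailedException extends Exception {
        private final String status;

        RequestFailedException(String status) {
            super("request completed with error");
            this.status = status;
        }

        String getStatus() {
            return status;
        }
    }

    // Simulates a request that either succeeds or fails.
    static String execute(boolean fail) throws RequestFailedException {
        if (fail) {
            throw new RequestFailedException("ReadFromFile: ERROR");
        }
        return "ReadFromFile: COMPLETED";
    }

    // Returns the status in both cases, recovering it from the exception on failure.
    static String statusOf(boolean fail) {
        try {
            return execute(fail);
        } catch (RequestFailedException e) {
            return e.getStatus();
        }
    }

    public static void main(String[] args) {
        System.out.println(statusOf(false));
        System.out.println(statusOf(true));
    }
}
```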

5. The Solutions

You can download the solution to this practical here.

6. More Resources

Here are some more resources that you might find useful.

OGSA-DAI 3.0 List of Activities

OGSA-DAI 3.0 User Documentation

OGSA-DAI 3.0: Developing an OGSA-DAI client

Another OGSA-DAI practical

Introductory OGSA-DAI practical 2