OGSA-DAI Introductory Practical 1: A Simple Workflow

OGSA-DAI is a middleware product that allows data resources, such as relational or XML databases, to be accessed via web services. An OGSA-DAI web service allows data to be queried, updated, transformed and delivered. OGSA-DAI can also be used to build web services that offer data integration services to clients.

This tutorial introduces you to using OGSA-DAI to access files and databases.

This OGSA-DAI tutorial has multiple aims:

    * to show how OGSA-DAI enables easy access to remote data resources (where direct access to the data may not be possible).
    * to show how to use the OGSA-DAI client toolkit to access a data source and perform multiple actions on the data before retrieving the results.

The OGSA-DAI client toolkit comes with several pre-built tools. The first part of this tutorial uses these tools; the rest focuses on the toolkit's API. We will be using an OGSA-DAI server running on tc07.nesc.ed.ac.uk, which has OGSA-DAI version 3.0 installed.

 

1. Set Up the Environment


Using GSI-SSH, connect to the machine tc07.nesc.ed.ac.uk on the training cluster. There should be a Java GSI-SSH client on your desktop: double-click its icon, select Open, and enter the machine name.

Next, set up the environment we will be using. The following command configures Java and Globus 4.0.5, which contains the OGSA-DAI implementation, and sets the Globus environment variables.

module load java ogsadai_wsrf

Next, download the files needed for the practical from the tutorial website:

wget http://training.omii-europe.org/Tutorials/OGSA-DAI/WorkflowTC.tar.gz

Then extract them with the tar command. This creates a directory called practical that contains the files for the practical:

tar -xzvf WorkflowTC.tar.gz

You should see the following two files extracted into the practical directory. Scenario1.java is the file we are most interested in: it is the one we will be editing today.

[user00@tc07 test1]$ tar -xzvf WorkflowTC.tar.gz

practical/ClientBase.java

practical/Scenario1.java


2. Testing OGSA-DAI


These next few commands will test your OGSA-DAI environment.

We will first use the XMLDB client, a simple command-line client for listing collections and resources in XMLDB data resources, and for running XPath queries and displaying the results.

The first command runs the client's listResources command against the data resource SwissProt_Resource (selected with -d), which lists the XML resources that data resource contains. The -u option gives the base services URL on which OGSA-DAI is listening.

java uk.org.ogsadai.client.toolkit.example.XMLDBClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai -c listResources -d SwissProt_Resource

You should get the following output.

[jlyons@tc07 practical]$ java uk.org.ogsadai.client.toolkit.example.XMLDBClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai -c listResources -d SwissProt_Resource

DRER ID: DataRequestExecutionResource

Data Resource ID: SwissProt_Resource

Base Services URL: http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai

Command: listResources

uk.org.ogsadai.resource.request.status.COMPLETED

shortenedSprot_50000.xml

 

Next we will execute an SQL query on the LittleBlackBook data resource. LittleBlackBook is a read-only SQL resource backed by a PostgreSQL server on another machine in the training cluster.

The -q option supplies the SQL query, which in this case returns the first 10 rows of the table LittleBlackBook in the data resource LittleBlackBook.

This command should be typed all on one line.

java uk.org.ogsadai.client.toolkit.example.SQLClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai/ -q "SELECT * FROM LittleBlackBook limit 10" -d LittleBlackBook

Execute this command and you should get the following list.

[user00@tc07 test1]$ java uk.org.ogsadai.client.toolkit.example.SQLClient -u http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai/ -q "SELECT * FROM LittleBlackBook limit 10" -d LittleBlackBook
DRER ID: DataRequestExecutionResource
Data Resource ID: LittleBlackBook
Base Services URL: http://tc07.nesc.ed.ac.uk:8080/wsrf/services/dai/
SQLQuery: SELECT * FROM LittleBlackBook limit 10
uk.org.ogsadai.resource.request.status.COMPLETED
| id | name | address | phone |
| 1 | Andrew Hume | 643 Borley Street, Southampton | 09634351446 |
| 2 | Simon Hicken | 130 Watson Crescent, Southampton | 01694469132 |
| 3 | Dave Laws | 315 Watson Lane, San Jose | 01533184035 |
| 4 | Mike Magowan | 115 Atkinson Street, Southampton | 06606928401 |
| 5 | Dave Anjomshoaa | 29 Anjomshoaa Gardens, San Jose | 01511301812 |
| 6 | Ally Laws | 111 Chue Hong Drive, Southampton | 03790714419 |
| 7 | Dave Anjomshoaa | 83 Sugden Place, San Jose | 05080280061 |
| 8 | Paul Hume | 735 Hicken Road, Edinburgh | 03623361318 |
| 9 | Mike Hume | 328 Chue Hong Place, Southampton | 02876302616 |
| 10 | Charaka Palansuriya | 845 Magowan Gardens, Edinburgh | 01714367452 |

Success. The output first shows the data resource we are accessing (LittleBlackBook), the base services URL where OGSA-DAI is hosted, and the SQL query itself. It then shows that the request status is COMPLETED, followed by the first 10 rows of the LittleBlackBook table.
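The two queries in this practical select different rows: the SQLClient command above uses `limit 10`, which returns ten rows, while the SQLQuery activity in Section 3 uses `where id<10`, which returns only ids 1 to 9. The plain-Java sketch below mimics both on a hypothetical table whose ids run consecutively from 1, as in the sample output. Note that without an ORDER BY clause, SQL does not guarantee which ten rows `limit 10` returns.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitVersusFilter {
    // Assumption: ids are consecutive integers starting at 1, as in the sample output.
    static List<Integer> ids(int rowCount) {
        return IntStream.rangeClosed(1, rowCount).boxed().collect(Collectors.toList());
    }

    // Mirrors "limit 10": keep the first n rows in table order.
    static List<Integer> firstRows(List<Integer> rows, int n) {
        return rows.stream().limit(n).collect(Collectors.toList());
    }

    // Mirrors "where id<10": keep only the rows whose id is below the bound.
    static List<Integer> idsBelow(List<Integer> rows, int bound) {
        return rows.stream().filter(id -> id < bound).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = ids(50); // hypothetical table size
        System.out.println("limit 10 -> " + firstRows(rows, 10)); // ids 1 to 10
        System.out.println("id<10    -> " + idsBelow(rows, 10));  // ids 1 to 9
    }
}
```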


3. A simple workflow

In the following exercise we will look at how to implement and execute a workflow with the OGSA-DAI client toolkit. The simple workflow we are going to implement chains three activities. The SQLQuery activity reads data from the SQL database. The data are then piped to the TupleToWebRowSetCharArrays activity, which converts them into a readable WebRowSet XML form. The final activity, DeliverToFTP, writes the data to a remote FTP server.

Now that we have the workflow, let's add it to the file. You can edit the file with a terminal text editor or, if you have X display support, with a graphical editor. Choose ONE of the options below.

  • You can use a terminal text editor like nano, which is quite easy to use. Just type the following command: nano practical/Scenario1.java
  • Or, if you have X desktop support (e.g. Exceed), you can use a graphical editor like kwrite, which supports the mouse and menus: kwrite practical/Scenario1.java &

vi is also available if you would like to use that.

Find the following section in the file. This is where we will add the new code, just below // Add code here.

private static void executeRequest(DataRequestExecutionResource drer) throws Exception {
    // Add code here
}

For each activity in the workflow there is a corresponding activity class in the client toolkit. To create a request like this one, you construct each of the activities and add it to the workflow. Every activity is created in the same three steps: instantiate it, add its parameters, then connect it to the workflow.
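The instantiate, configure and connect pattern can be pictured with the following plain-Java sketch. The Activity and Pipeline classes are hypothetical stand-ins, not the OGSA-DAI client toolkit API, and for simplicity they run the activities one after another, whereas an OGSA-DAI server passes data between activities as blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PipelineSketch {
    // Hypothetical activity: reads the output of the activity it is connected to
    // and produces an output of its own.
    static class Activity {
        final Function<List<String>, List<String>> body;
        Activity upstream;               // null for the first activity in the chain
        List<String> output = List.of();

        Activity(Function<List<String>, List<String>> body) {
            this.body = body;
        }

        void connectInput(Activity from) {
            this.upstream = from;
        }
    }

    // Hypothetical workflow: records activities in the order they are added.
    static class Pipeline {
        final List<Activity> activities = new ArrayList<>();

        void add(Activity a) {
            activities.add(a);
        }

        // Runs each activity once, in order, feeding it its upstream output.
        void execute() {
            for (Activity a : activities) {
                List<String> in = (a.upstream == null) ? List.of() : a.upstream.output;
                a.output = a.body.apply(in);
            }
        }
    }

    static List<String> runExample() {
        Pipeline pipeline = new Pipeline();

        // 1. Source, standing in for SQLQuery: emits two rows.
        Activity query = new Activity(in -> List.of("1|Andrew Hume", "2|Simon Hicken"));
        pipeline.add(query);

        // 2. Transform, standing in for the formatting step.
        Activity format = new Activity(in -> {
            List<String> out = new ArrayList<>();
            for (String row : in) {
                out.add("<row>" + row + "</row>");
            }
            return out;
        });
        format.connectInput(query);
        pipeline.add(format);

        // 3. Sink, standing in for a delivery activity: passes data through.
        Activity deliver = new Activity(in -> in);
        deliver.connectInput(format);
        pipeline.add(deliver);

        pipeline.execute();
        return deliver.output;
    }

    public static void main(String[] args) {
        runExample().forEach(System.out::println);
    }
}
```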

The first step is to create the Workflow, using the following line.

PipelineWorkflow pipeline = new PipelineWorkflow();

We start with the first activity, SQLQuery, which sends an SQL query to a data resource and returns the data in the form of tuples. It takes two parameters: the ID of the resource we will be querying, which is called LittleBlackBook on our OGSA-DAI server, and the SQL expression.

First create the activity and call it dsql. Then set the resource ID (setResourceID) and add the SQL expression (addExpression). The final step is to add the activity to the workflow.

SQLQuery dsql = new SQLQuery();
dsql.setResourceID("LittleBlackBook");
dsql.addExpression("select * from LittleBlackBook where id<10");
pipeline.add(dsql);

The next activity in our workflow is TupleToWebRowSetCharArrays, which accepts lists of tuples and converts them to WebRowSet XML formatted output of type [char[]].

We only need to connect the input of the TupleToWebRowSetCharArrays activity to the output of the SQLQuery activity, and then add it to the pipeline.

TupleToWebRowSetCharArrays tupleToWebRowSet = new TupleToWebRowSetCharArrays();
tupleToWebRowSet.connectDataInput(dsql.getDataOutput());
pipeline.add(tupleToWebRowSet);
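To get a feel for what this conversion produces, here is a simplified plain-Java sketch (not the OGSA-DAI implementation) that turns tuples into the `<data>` section of a WebRowSet document and returns it as a single char[]. A full WebRowSet document, as used by Java's javax.sql.rowset.WebRowSet, also carries properties and metadata sections, which this sketch leaves out; it also assumes every value is a non-null string.

```java
import java.util.List;

public class WebRowSetSketch {
    // Escapes the five characters that are special in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Simplified: emits only the <data> section; values must be non-null.
    static char[] toWebRowSetData(List<List<String>> tuples) {
        StringBuilder xml = new StringBuilder("<data>");
        for (List<String> tuple : tuples) {
            xml.append("<currentRow>");
            for (String value : tuple) {
                xml.append("<columnValue>").append(escape(value)).append("</columnValue>");
            }
            xml.append("</currentRow>");
        }
        xml.append("</data>");
        return xml.toString().toCharArray();
    }

    public static void main(String[] args) {
        List<List<String>> tuples = List.of(
                List.of("1", "Andrew Hume"),
                List.of("2", "Simon Hicken"));
        System.out.println(new String(toWebRowSetData(tuples)));
    }
}
```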

The DeliverToFTP activity reads the output of the TupleToWebRowSetCharArrays activity and uploads it as a file to an FTP server.

After creating the activity, we provide the name and location of the file in which the data will be stored (addFilename). Change the directory user00 in this path to your own user name, so that you don't overwrite other people's uploads. We also need to give the FTP host (addHost) and tell OGSA-DAI to use passive mode for the connection (addPassiveMode). Finally, we connect the input of this activity to the result output of the previous activity; otherwise it would receive no data.

DeliverToFTP dftp = new DeliverToFTP();
dftp.addFilename("/incoming/user00/SQLOutput.txt"); // replace user00 with your own user name
dftp.addHost("anonymous:anonymous@tc07.nesc.ed.ac.uk");
dftp.addPassiveMode(true);
dftp.connectDataInput(tupleToWebRowSet.getResultOutput());
pipeline.add(dftp);
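The host string passed to addHost appears to follow the usual user:password@host convention, here with an anonymous login. Purely as an illustration (OGSA-DAI is not claimed to parse it this way), java.net.URI can split such a location into its parts. The outputPath helper is hypothetical and simply builds the per-user path passed to addFilename; judging by the more command later in this section, that path is relative to an FTP root of /var/ftp.

```java
import java.net.URI;

public class FtpTargetSketch {
    // Value passed to addHost(...) in the tutorial: user:password@host.
    static final String HOST = "anonymous:anonymous@tc07.nesc.ed.ac.uk";

    // Hypothetical helper: the per-user path passed to addFilename(...).
    static String outputPath(String user) {
        return "/incoming/" + user + "/SQLOutput.txt";
    }

    // The "ftp://" prefix is added only so that java.net.URI can parse the parts.
    static URI location(String user) {
        return URI.create("ftp://" + HOST + outputPath(user));
    }

    public static void main(String[] args) {
        URI uri = location("user00");
        System.out.println(uri.getUserInfo()); // anonymous:anonymous
        System.out.println(uri.getHost());     // tc07.nesc.ed.ac.uk
        System.out.println(uri.getPath());     // /incoming/user00/SQLOutput.txt
    }
}
```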

We have now added our three activities and connected them in the pipeline. The pipeline request is now ready, and you can submit it for execution by the DataRequestExecutionResource on the server.

RequestResource requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);

You can see from the above that we have chosen to execute the request synchronously. This means that the method returns after the request has completed. When the request has completed and the method returned you can print out the request status. The request status contains a status for each activity and any warnings or errors that may have occurred.
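If the idea of synchronous execution is unfamiliar, the following generic Java analogy (using java.util.concurrent, not the OGSA-DAI API) contrasts a call that returns only once the work has finished with one that returns a handle immediately. The runRequest method is a hypothetical placeholder for the work done on the server.

```java
import java.util.concurrent.CompletableFuture;

public class SyncVsAsync {
    // Hypothetical placeholder for the work a server would do for a request.
    static String runRequest() {
        return "COMPLETED";
    }

    // Synchronous style: the caller gets the status only after the work has finished.
    static String executeSynchronously() {
        return runRequest();
    }

    // Asynchronous style: the call returns a handle at once and the work
    // runs on another thread.
    static CompletableFuture<String> executeAsynchronously() {
        return CompletableFuture.supplyAsync(SyncVsAsync::runRequest);
    }

    public static void main(String[] args) {
        System.out.println(executeSynchronously());

        CompletableFuture<String> handle = executeAsynchronously();
        // The caller could do other work here before waiting.
        System.out.println(handle.join()); // join() blocks until the status is ready
    }
}
```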

System.out.println(requestResource.getRequestStatus());

Next, compile the code and run the project. The first command compiles the file we just edited with the Java compiler; the second command runs it.

[jlyons@tc07 OGSA]$ javac practical/Scenario1.java 

[jlyons@tc07 OGSA]$ java practical.Scenario1 

When you run the application, you should see that every activity completed successfully.

[jlyons@tc07 OGSA]$  java practical.Scenario1

Request id="ogsadai-115852d323c"

Request status="uk.org.ogsadai.resource.request.status.COMPLETED"

Activity instanceName="uk.org.ogsadai.DeliverToFTP-ogsadai-1175f2b3fc9" status="COMPLETED"

Activity instanceName="uk.org.ogsadai.TupleToWebRowSetCharArrays-ogsadai-1175f2b3fc7" status="COMPLETED"

Activity instanceName="uk.org.ogsadai.SQLQuery-ogsadai-1175f2b3fc5" status="COMPLETED"

If everything ran successfully, the file should now exist on the FTP server. You can check that it was delivered by typing the following command, replacing user00 with your own user name:

more /var/ftp/incoming/user00/SQLOutput.txt

You have successfully completed the first part of the OGSA-DAI Introductory Practical.

4. A more complex workflow

Next we are going to extend the workflow so that it not only uploads the data to an FTP server but also sends it to you in an email. For this we need to add not only a new email delivery activity, DeliverToSMTP, but also an activity called Tee.

Open Scenario1.java again, just as you did before. We will add two new activities to it. As before, we need to create each activity, add its parameters, and connect it to the workflow.

We already have the pipeline defined and three activities created. The first activity we will add is Tee, which accepts any block as its input and sends that block to all of its outputs. In this case, its input is the output of the TupleToWebRowSetCharArrays activity.

The Tee activity needs two settings: its input must be connected to the output of tupleToWebRowSet (connectInput), much as we did for the DeliverToFTP activity above, and its number of outputs must be set to 2 (setNumberOfOutputs).

//Tee - create 2 outputs
Tee tee = new Tee();
...
...
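The Tee behaviour described above can be pictured with the following plain-Java sketch. The tee method is hypothetical and not part of the OGSA-DAI toolkit; it only shows that every input block is copied to each output, so both delivery activities receive the same data.

```java
import java.util.ArrayList;
import java.util.List;

public class TeeSketch {
    // Hypothetical Tee: copies every block from the input to each of the outputs.
    static List<List<String>> tee(List<String> input, int numberOfOutputs) {
        if (numberOfOutputs < 1) {
            throw new IllegalArgumentException("need at least one output");
        }
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < numberOfOutputs; i++) {
            outputs.add(new ArrayList<>());
        }
        for (String block : input) {
            for (List<String> out : outputs) {
                out.add(block); // the same block goes to every output
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Two outputs: one for the FTP delivery, one for the email delivery.
        List<List<String>> outputs = tee(List.of("block-1", "block-2"), 2);
        System.out.println("output 0: " + outputs.get(0));
        System.out.println("output 1: " + outputs.get(1));
    }
}
```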

DeliverToSMTP is the next activity we need to add. It takes a sender address (addFrom), a list of target email addresses (addTo), a subject line (addSubject) and the data to send to the recipients (connectDataInput). The activity sends the data to the list of recipients using the SMTP server taken from its configuration parameters. Because there can be multiple recipients, we pass a list; for this example it holds only one address (see the java.util.Collections class). Make sure you use your own email address.

So the first step is to create the activity:

//Send email
DeliverToSMTP deliverToSMTP = new DeliverToSMTP();
...
...
...
...
...
...

The last step is to change the input of the DeliverToFTP activity so that it now reads from the Tee rather than from the tupleToWebRowSet activity.

If you have problems adding these two activities on your own, you can check the hints file, which goes into more detail.

We are now ready to compile and run the project again.

[jlyons@tc07 OGSA]$ javac practical/Scenario1.java 

[jlyons@tc07 OGSA]$ java practical.Scenario1 

Request id="ogsadai-115652573a6" 

Request status="uk.org.ogsadai.resource.request.status.COMPLETED" 

Activity instanceName="uk.org.ogsadai.DeliverToFTP-ogsadai-11565397569" status="COMPLETED" 

Activity instanceName="uk.org.ogsadai.Tee-ogsadai-11565397565" status="COMPLETED" 

Activity instanceName="uk.org.ogsadai.DeliverToSMTP-ogsadai-1156539756a" status="COMPLETED" 

Activity instanceName="uk.org.ogsadai.ReadFromFile-ogsadai-11565397563" status="COMPLETED"

When you run the application, every activity should report COMPLETED: in addition to the three activities from the first run, the new Tee and DeliverToSMTP activities also return a completed status.

You should receive an email in the next few seconds.

You have successfully completed the second part of the OGSA-DAI Introductory Practical.

5. Producing useful error information

If your request fails, you can display more information about what exactly went wrong. Replace the line:

RequestResource requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);

with the following lines:

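RequestResource requestResource = null;
try
{
    // Normal case: the request runs to completion.
    requestResource = drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);
}
catch (RequestCompletedWithErrorException e)
{
    // The request finished with errors: the exception still carries the
    // request resource, so we can fetch and print its detailed status.
    requestResource = e.getRequestResource();
    System.out.println(requestResource.getRequestStatus());
}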

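When the request completes with an error, the exception still gives you access to the request resource, so your application prints the full request status, including the status of each activity and any warnings or errors that occurred.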
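The key idea in the catch block is that the exception carries the request handle, so the status is not lost when execution throws. The following plain-Java sketch shows that pattern with hypothetical classes (Request, CompletedWithErrorException, execute) and a hypothetical "ERROR" status string; none of these are the OGSA-DAI API.

```java
public class ErrorReportingSketch {
    // Hypothetical stand-in for a request resource that records a status.
    static class Request {
        final String status;

        Request(String status) {
            this.status = status;
        }
    }

    // Hypothetical exception that carries the request, in the same way the
    // tutorial's catch block obtains it with e.getRequestResource().
    static class CompletedWithErrorException extends Exception {
        final Request request;

        CompletedWithErrorException(Request request) {
            super("request completed with an error");
            this.request = request;
        }
    }

    // Hypothetical execute: fails on demand so both paths can be exercised.
    static Request execute(boolean fail) throws CompletedWithErrorException {
        if (fail) {
            throw new CompletedWithErrorException(new Request("ERROR"));
        }
        return new Request("COMPLETED");
    }

    // Returns the status whether or not execution threw.
    static String statusOf(boolean fail) {
        Request request;
        try {
            request = execute(fail);
        } catch (CompletedWithErrorException e) {
            request = e.request; // the handle is recovered from the exception
        }
        return request.status;
    }

    public static void main(String[] args) {
        System.out.println(statusOf(false)); // COMPLETED
        System.out.println(statusOf(true));  // ERROR
    }
}
```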

6. The Solutions

You can download the solution to this practical here.

7. More Resources

Here are some more resources that you might find useful.

OGSA-DAI 3.0 List of Activities: the complete list of activities shipped with OGSA-DAI

OGSA-DAI 3.0 User Documentation

OGSA-DAI 3.0: Developing an OGSA-DAI client

Another OGSA-DAI practical

Introductory OGSA-DAI practical 2