
Sunday, January 20, 2013

A UsefulChem Update from 2006-2013 (a little out of date)


Automation components in UsefulChem


This page describes the evolution of software tools which process the usefulchem-molecules blog into a variety of useful formats, e.g., spreadsheets, RSS feeds, and CML for molecular visualization/manipulation tools such as Jmol, as well as adding further chemical information (InChIs, MWs, supplier info) for the molecules in the UsefulChem project. I will also discuss the ongoing development of an automated RSS feed reader for extracting and further processing this chemical information, and potential future work in these areas. For more information on this work, and to follow new developments, please refer to my blog entries at http://usefulchem.blogspot.com.

Initial work with Excel / Excel VBA:

Molecule entries in http://usefulchem-molecules.blogspot.com are characterized primarily by a UC number (e.g., UC0188), a SMILES notation, and an image, although other information, such as CAS number, is often added. To summarize and expand on this data in a convenient format, a program in Microsoft Excel Visual Basic for Applications (VBA) (http://showme.physics.drexel.edu/usefulchem/Software/MoleculeBlogInfo/MoleculeBlogInfo.zip) was developed which downloads this page, parses out the desired information, and generates a spreadsheet (http://showme.physics.drexel.edu/usefulchem/Software/MoleculeBlogInfo/usefulchem-molecules/usefulchem-molecules.xls) in which each row represents one blog entry. Given that the blog format itself is rather loose – for example, the SMILES entry might be prefixed by “SMILES” or “SMILES:” – and can change over time, the search criteria for fields were made fully configurable by placing them in an initialization (.ini) file.
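As a rough illustration of the configurable search criteria described above, the following Java sketch reads a list of accepted prefixes from .ini-style text and uses them to pull a field out of an entry. The key name "smiles.prefixes", the comma-separated layout, and the class name FieldExtractor are assumptions made for this example; they are not the actual format used by MoleculeBlogInfo.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: the key name "smiles.prefixes" and the
// comma-separated prefix list are assumptions, not the real .ini layout.
class FieldExtractor {

    // Load search criteria from .ini-style "key=value" text.
    static Properties loadCriteria(String iniText) {
        Properties criteria = new Properties();
        try {
            criteria.load(new StringReader(iniText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return criteria;
    }

    // Return the first whitespace-delimited token that follows any of the
    // configured prefixes, or null if no prefix is found in the entry text.
    static String extractField(String entryText, String prefixList) {
        for (String prefix : prefixList.split(",")) {
            // List longer prefixes first so "SMILES:" is tried before "SMILES".
            Pattern p = Pattern.compile(Pattern.quote(prefix.trim()) + "\\s*(\\S+)");
            Matcher m = p.matcher(entryText);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }
}
```

With the criteria text "smiles.prefixes=SMILES:,SMILES", both "SMILES: CCO" and "SMILES CCO" yield the same SMILES string, and a change in blog conventions only requires editing the configuration text.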

Additional information beyond that provided by the blog, such as links to suppliers, was desired, and for this purpose several different freely available software packages and libraries were used. Molecular weight information and molecular format files (CML, MOL) were generated from the SMILES using the CDK Java libraries, while InChI descriptors were produced by OpenBabel. Image files were at first generated using ChemSketch, although these are now simply downloaded directly from the blog itself. Supplier information was acquired by sending HTTP GET requests to chmoogle.com (now eMolecules.com), and processing the responses gleaned from this service.

In addition to the spreadsheet, this software also creates HTML and CML files (e.g., http://showme.physics.drexel.edu/usefulchem/Software/MoleculeBlogInfo/usefulchem-molecules/UC0088.htm) for each blog entry, which in combination allow the molecules in the blog to be viewed with the Jmol applet.

 
RSS feeds and Automation Software in Java:

The spreadsheet format for the usefulchem-molecules blog was a useful beginning. It was, however, not very amenable to automated data processing or other kinds of display desired, particularly for the internet/web. An initial attempt to address these deficiencies involved modifying the Excel VBA software to generate an RSS 1.0 feed (http://showme.physics.drexel.edu/usefulchem/Software/MoleculeBlogInfo/usefulchem-molecules/usefulchem-molecules.rss) of the blog data in addition to its other output. The advantage of having the data in a feed is that it can then be viewed using any number of available desktop or web-based readers, such as RSS Bandit (http://www.rssbandit.org) or Bloglines (http://www.bloglines.com). Furthermore, as RSS is simply XML, feeds can contain other XML formatted data, such as Chemical Markup Language (CML). Thus, a feed can be downloaded and parsed for its CML by software such as Bioclipse (http://www.bioclipse.net) or Jmol (http://jmol.sourceforge.net).
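To make the idea of CML embedded in a feed concrete, here is a minimal Java sketch that parses an RSS 1.0 document with the standard DOM parser and collects the id attribute of every CML molecule element found inside an item. It assumes the molecules are placed directly inside the RSS items and use the namespace http://www.xml-cml.org/schema; the actual layout of the usefulchem-molecules feed may differ, and the class name CmlFeedScanner is invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch under stated assumptions: CML molecules sit inside RSS 1.0 items
// and carry an "id" attribute. The real feed structure may differ.
class CmlFeedScanner {
    static final String RSS_NS = "http://purl.org/rss/1.0/";
    static final String CML_NS = "http://www.xml-cml.org/schema";

    // Return the id of every CML molecule found inside an RSS 1.0 item.
    static List<String> moleculeIds(String feedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the NS-based lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(feedXml)));

            List<String> ids = new ArrayList<>();
            NodeList items = doc.getElementsByTagNameNS(RSS_NS, "item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList molecules = item.getElementsByTagNameNS(CML_NS, "molecule");
                for (int j = 0; j < molecules.getLength(); j++) {
                    ids.add(((Element) molecules.item(j)).getAttribute("id"));
                }
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }
}
```

Items without any CML are simply skipped, so the same routine can be pointed at an ordinary RSS 1.0 feed without failing.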

A shortcoming of using Excel VBA is that it does not easily lend itself to automation. Also, it is neither truly an open source development platform nor portable to other operating systems such as Unix or Macintosh. Therefore, to address these shortcomings, I rewrote the VBA code in the Java programming language, which is both free (see http://java.sun.com/javase/downloads/index.jsp to download the Java Development Kit) and implemented on all major operating systems. Once in Java, it was straightforward to set the software up as a service to be run periodically. As a result, the RSS feed and associated files are now regenerated automatically whenever additions or changes are made to the usefulchem-molecules blog.

A zip file containing both the source and compiled code for the Java software to convert the usefulchem-molecules blog to an RSS feed can be found at http://showme.physics.drexel.edu/usefulchem/Software/Java/MoleculeBlogInfo/MoleculeBlogInfo.zip.


CMLRSSReader:

Having an RSS feed with special fields provides a launching platform of essentially unlimited opportunities for further treatment of chemical information. Standard RSS readers, however, rarely display more than a handful of standard fields in a feed. Furthermore, they are not extensible or configurable to include additional processing via plug-ins or “hook” programs on a feed, its entries, or the various specialized fields it can contain. Thus, a specialized reader seemed necessary.

Writing a simple feed reader is actually not a particularly difficult software project, and there is a lot of help available in books and web sites (I used “RSS and Atom Programming” from Wrox books (Wrox.com) as a guide for all my RSS programming). I have developed such a reader, again using Java, which begins to address some of our specialized requirements for feeds containing CML and other chemical information. This reader and associated software, which can be downloaded from http://showme.physics.drexel.edu/usefulchem/Software/Java/CMLRSSReader/CMLRSSReader.zip, is still at an early stage in development and can currently handle only RSS 1.0 feeds (and so far has only been tested on the usefulchem-molecules and two other closely related feeds), but demonstrates some of what can be done along the lines described above. In addition to the standard reader features of automatically downloading and managing multiple feeds, displaying information contained in their item entries, and tracking new or changed items, the software also allows specialized programs to be executed on the feeds themselves and their contents. In its current form, programs can be configured to run after feed file download and/or processing. These programs can be written in any language, even DOS BAT files (although Java must be used on processed feeds, as they are stored via Java serialization), and can perform any processing/reporting desired, such as calculations using the CML in the feed, internet searches, database entry, and/or e-mailing results to the interested parties.
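The hook mechanism described above could be sketched in Java roughly as follows. This is not the CMLRSSReader implementation: the class name FeedHooks, the two stage names, and the "{feed}" placeholder convention are all assumptions made for illustration. Each hook is an external command line registered for a stage, and the path of the feed file is substituted into it before it is launched with the standard ProcessBuilder class.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of configurable hook programs; names and the
// "{feed}" placeholder are assumptions, not CMLRSSReader's actual design.
class FeedHooks {
    enum Stage { AFTER_DOWNLOAD, AFTER_PROCESSING }

    static final String FEED_PLACEHOLDER = "{feed}";

    private final Map<Stage, List<List<String>>> templates = new EnumMap<>(Stage.class);

    // Register an external command (program plus arguments) for a stage.
    void register(Stage stage, List<String> commandTemplate) {
        templates.computeIfAbsent(stage, s -> new ArrayList<>()).add(commandTemplate);
    }

    // Expand the placeholder in every command registered for the stage.
    List<List<String>> commandsFor(Stage stage, String feedPath) {
        List<List<String>> commands = new ArrayList<>();
        for (List<String> template : templates.getOrDefault(stage, new ArrayList<>())) {
            List<String> command = new ArrayList<>();
            for (String arg : template) {
                command.add(arg.replace(FEED_PLACEHOLDER, feedPath));
            }
            commands.add(command);
        }
        return commands;
    }

    // Run each hook in turn, sharing this process's console for output.
    void runAll(Stage stage, String feedPath) throws IOException, InterruptedException {
        for (List<String> command : commandsFor(stage, feedPath)) {
            int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
            if (exitCode != 0) {
                System.err.println("Hook exited with code " + exitCode + ": " + command);
            }
        }
    }
}
```

Because the hooks are plain command lines, a BAT file, a script, or another Java program can all be registered in the same way.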

Two examples of this capability are already being used to automatically generate and upload information for display on the web. One, ExtractHTMLPages, is a Java program that parses the usefulchem-molecules feed file for its item fields and generates an HTML file for each item. ExtractHTMLPages also generates an index file (http://showme.physics.drexel.edu/usefulchem/Software/MoleculeBlogInfo/usefulchem-molecules/Items/UsefulChemistryMolecules.html) of the item HTML files which, using a combination of JavaScript and HTML iframes, allows any of them to be selected for viewing from a drop-down list. When CMLRSSReader downloads a feed, which it does whenever the feed has been updated (which in the case of usefulchem-molecules, occurs whenever the blog is updated), it automatically runs ExtractHTMLPages, generating and uploading all of these files to the web server.
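The index page idea (a drop-down list that loads the selected item page into an iframe) can be sketched as below. This is only an approximation written for this page, not the ExtractHTMLPages source; the class name IndexPageWriter, the element id "viewer", and the page layout are assumptions.

```java
import java.util.List;

// Hypothetical sketch of an index page generator: a <select> whose
// onchange handler points an <iframe> at the chosen item file.
class IndexPageWriter {

    // Minimal HTML escaping for text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String buildIndex(String title, List<String> itemFiles) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>").append(escape(title))
            .append("</title></head><body>\n");
        // JavaScript: load the selected file into the iframe below.
        html.append("<select onchange=\"document.getElementById('viewer').src=this.value\">\n");
        for (String file : itemFiles) {
            html.append("  <option value=\"").append(escape(file)).append("\">")
                .append(escape(file)).append("</option>\n");
        }
        html.append("</select>\n");
        String first = itemFiles.isEmpty() ? "about:blank" : itemFiles.get(0);
        html.append("<iframe id=\"viewer\" src=\"").append(escape(first))
            .append("\" width=\"100%\" height=\"600\"></iframe>\n");
        html.append("</body></html>\n");
        return html.toString();
    }
}
```

Calling buildIndex with the list of generated item files returns the page as a string, which can then be written to disk and uploaded alongside the item pages.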

The other example, ExtractNewItems, is a Java program which works with processed feeds to record and detail changes to the feed. When new items are added to the usefulchem-molecules feed, or new information about an item is added or modified, ExtractNewItems generates and uploads two files: newItems.html (http://showme.physics.drexel.edu/usefulchem/Software/MoleculeBlogInfo/usefulchem-molecules/newItems.html) and newItems.xls. True to their names, these files list items that have been added or updated since the last time the program was run. Ultimately, the reason for a new listing will also be given, such as new supplier information, but this is not currently implemented.
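The core of this kind of change tracking is a comparison between the items seen on the previous run and the items in the current feed. The sketch below shows one simple way to do that in Java, keyed by item identifier (for example a UC number) and comparing item content as strings. It is an illustration only: the class name ItemChangeDetector and the use of plain maps are assumptions, whereas the actual program works on processed feeds stored via Java serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: compare a snapshot from the previous run with the
// current feed contents and label each item as "new" or "changed".
class ItemChangeDetector {

    // Both maps go from item id (e.g. "UC0188") to the item's content.
    static Map<String, String> newOrChanged(Map<String, String> previous,
                                            Map<String, String> current) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps feed order
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldContent = previous.get(entry.getKey());
            if (oldContent == null) {
                result.put(entry.getKey(), "new");
            } else if (!oldContent.equals(entry.getValue())) {
                result.put(entry.getKey(), "changed");
            }
        }
        return result;
    }
}
```

After each run the current map would replace the stored snapshot, so that the next comparison reports only what has happened since.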

Future Directions:

Quite a bit of ground has been covered, and a lot of evolution occurred, since the initial work with Excel VBA. A certain amount of consolidation and strategic consideration would seem to be worthwhile at this point. To begin, the numerous web sites and pages generated would benefit from some organization. This can be done with a single page, or small set of pages, providing links to and descriptions of the various software tools and the pages they generate.

Second, although I have tried to make the CML RSS reader software highly flexible, it needs to be tested for compatibility with other RSS 1.0 feeds containing CML if it is to become of general use to the scientific community. Additional development is almost certainly going to be needed here (no one should expect to be that lucky!). I am also eager to see how the reader might interact with other software, such as Bioclipse, for example in providing CML and other data in automated fashion. This should prove fruitful, as Bioclipse obviously provides so much more in the way of processing and visualization tools than the reader itself. Other enhancements include a replacement for Java’s JEditorPane for displaying item data (JEditorPane’s handling of HTML is fairly primitive), other improvements to the user interface, and more configurable program extensions and/or plug-ins.

Finally, a lot of technologies have yet to be explored in this area. One excellent candidate is the combination of Ajax in HTML pages with chemical information web services. Ajax provides the ability to dynamically query web sites and services without the overhead in time and resources of retransmitting/reloading entire pages. In conjunction with JavaScript events and dynamic HTML, this can essentially turn an ordinary browser into a full-featured software user interface. Ajax also appears quite easy to use. For some simple examples of what can be done with Ajax, see http://showme.physics.drexel.edu/usefulchem/Software/Ajax/UsefulChemistryMolecules/UsefulChemistryMolecules.htm and http://showme.physics.drexel.edu/usefulchem/Software/Ajax/UsefulChemistryMolecules/UsefulChemistryMolecules2.htm (simply hover over any of the UC numbers).

Also, I have just begun to learn about OpenOffice, and hope to convert the Excel applications to it.

Some More Belated JCAMP Work for UsefulChem

I have developed a Java package to decompress NMR data taken from our Bruker instrument and stored in JCAMP format. This software was adapted from Robert Lancashire's jspecview program, specifically the JDXCompressor.java and Coordinate.java classes. It reads a set of compressed JCAMP NMR files according to a configuration file. The program's output is a BLOCK JCAMP file, in this case output.jdx, containing the decompressed data from the input files. Right now only a few of the header fields are retained, namely those needed for plotting the spectra via Excel VBA software (work in progress!). An example of this can be downloaded here.
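For readers unfamiliar with JCAMP-DX compression, the sketch below decodes two of its compressed number forms as I understand them from the JCAMP-DX ASDF conventions: SQZ, where a letter replaces the sign and leading digit of a value, and DIF, where a letter starts a difference from the previous value. It is not the package described above and not the jspecview code; the class name AsdfDecoder is invented, and it deliberately leaves out DUP repeat counts, the leading X value on each data line, and the end-of-line Y check, all of which a real decompressor needs.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of JCAMP-DX SQZ and DIF decoding for a run of Y values.
// DUP counts, X values and line-end Y checks are intentionally not handled.
class AsdfDecoder {
    private static final String SQZ_POS = "@ABCDEFGHI"; // leading digit 0..9, positive
    private static final String SQZ_NEG = "abcdefghi";  // leading digit 1..9, negative
    private static final String DIF_POS = "%JKLMNOPQR"; // difference, leading digit 0..9
    private static final String DIF_NEG = "jklmnopqr";  // negative difference, 1..9

    static List<Integer> decodeYValues(String packed) {
        List<Integer> values = new ArrayList<>();
        int i = 0;
        while (i < packed.length()) {
            char c = packed.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }

            int idx, sign, lead;
            boolean isDifference;
            if ((idx = SQZ_POS.indexOf(c)) >= 0)      { sign = 1;  lead = idx;     isDifference = false; }
            else if ((idx = SQZ_NEG.indexOf(c)) >= 0) { sign = -1; lead = idx + 1; isDifference = false; }
            else if ((idx = DIF_POS.indexOf(c)) >= 0) { sign = 1;  lead = idx;     isDifference = true; }
            else if ((idx = DIF_NEG.indexOf(c)) >= 0) { sign = -1; lead = idx + 1; isDifference = true; }
            else throw new IllegalArgumentException("Unsupported character '" + c + "' at index " + i);
            i++;

            // Any plain digits that follow are the remaining digits of the number.
            int magnitude = lead;
            while (i < packed.length() && packed.charAt(i) >= '0' && packed.charAt(i) <= '9') {
                magnitude = magnitude * 10 + (packed.charAt(i) - '0');
                i++;
            }
            int number = sign * magnitude;

            if (isDifference) {
                if (values.isEmpty()) {
                    throw new IllegalArgumentException("Difference with no preceding value");
                }
                values.add(values.get(values.size() - 1) + number);
            } else {
                values.add(number);
            }
        }
        return values;
    }
}
```

For example, the packed string "A23b5J%k" decodes to the values 123, -25, -24, -24, -26: two SQZ values followed by differences of +1, 0 and -2.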

SA expert pushes asteroid mining


2012-10-12 14:34
Ron Olivier of SIP wants SA to develop a space mining programme. (Duncan Alfreds, News24)




Cape Town - In the future, South African mining companies may become space firms, if a local engineer has his way.

Engineer Ron Olivier is pushing for SA to develop a space mining programme that will either exploit raw materials on the Moon or on Near Earth Objects (NEOs) like asteroids.

The idea holds promise because of the capacity developed when SA built a satellite and launched it into space, he said.

"It came from expertise; it came from my time at SunSpace where we built spacecraft out of basically nothing and reasonably successfully so," Olivier of Shamayan Innovation Partnerships (SIP) told News24.

His presentation at the SA Space Association Congress in Cape Town proposed that a mission to mine NEOs could "produce the largest economic benefit" to the country since the discovery of gold and diamonds.

Extraterrestrial mining

Olivier suggests that partnerships with countries in the Brics could jumpstart a programme to mine asteroids of at least 1km and rich in mineral resources.

The idea may not be as far-fetched as it sounds: Google's Larry Page and director James Cameron have backed a company called Planetary Resources to mine asteroids.

Some think that NEOs contain high levels of iron ore, platinum, nickel and zinc and that, if these could be extracted efficiently, they may present a business model for extraterrestrial mining.

Olivier suggested that a space port similar to the International Space Station (ISS) could be used to launch missions to asteroids.

"We may want to use an ISS type of organism out there, and then exploit that and launch from there. At the moment the ISS exists and it's been shown to be possible - that you can do that, but it will take a couple of billion to construct that."

In his presentation, Olivier suggested that a 1km asteroid can deliver $150bn in platinum value at current prices and if a re-usable vehicle could be developed to be cost-effective, it made a space mining programme viable within a decade.

"Most probably closer to 10 years than 50 years: Number one, South Africa has immense innovation in the industrialisation of Earth-bound mining machinery," he said.

Partnerships

Unlike Planetary Resources, which plans to send astronauts to mine asteroids, SIP intends for unmanned robots to do the work.

"The automation is restricted in this country because of our requirements to provide a tremendous amount of jobs to people. No such restrictions are out there in outer space.

"You don't need to transport miners to outer space to go and mine there; in fact, it would be stupid to do so. You have to take machines there and necessity is the mother of all invention," said Olivier.

He proposes partnerships with experts in various disciplines to reduce costs and secure funding.

"What I have suggested here is a purely commercial outlook with some government funding on the side of it. But nothing like funding that whole project. It's a commercial venture."

The idea may seem a bit out of this world, but Olivier said that once the programme was up and running companies would back it.

"SIP is at the stage where it needs quite a bit of funding just in order for me to get around, so we're starting off at zero base. And this is the thing that makes it even crazier to the normal mind, but at the SunSat programme we started at zero base as well."

Olivier challenged South African companies to consider that such a project would be viable as the cost of resources escalates.

"I'm going to say to the companies: 'Either come in, or be left out.'"



NASA Funding

I have discovered the recent comments from Penny4NASA:

Penny4NASA was founded to uphold the importance of Space Exploration and Science. We believe wholeheartedly that our federal funding of the National Aeronautics and Space Administration, at a wimpy 0.48% of the total, does not reflect the hugely important economical, technological and inspirational resource that this agency has been throughout its 50+ year history. With approximately $10 coming back into the economy for every $1 spent, thousands of new science and engineering students becoming inspired continuously, and the multitude of technologies that NASA research has both directly and indirectly made possible, we believe that NASA needs to be funded at a level of at least 1% of the US federal budget. This isn’t a partisan argument, and this isn’t a fiscal budget argument. What this is, is the American people saying that as a society, we want our tax dollars to reflect the importance of science and space exploration. And 0.48% doesn’t cut it. We are calling for NASA budget to be increased to at least 1% of the US annual budget.

I wrote the following to the PA congressmen:

Support Doubling Funding for NASA and the Future Priorities of U.S. Involvement in Space
Dear Representative:
I support Doubling Funding for NASA and the Future Priorities of U.S. Involvement in Space because with even one percent of the federal budget allocated toward NASA, we could essentially halve the time needed to develop space technology and bring forward the serious scientific and economic benefits that would result from it; e.g., mining ores and minerals from asteroids would greatly reduce pollution here on Earth. We would all, as well, and not just Americans, finally perceive a future worth working toward, a future which would help us overcome the problems of nations and cultures competing with each other. Thank you.

Proto-metabolism

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wi...