bp, a Perl Bibliography Package (Jont Allen)


Subject: bp, a Perl Bibliography Package
From:    Jont Allen  <jba(at)RESEARCH.ATT.COM>
Date:    Tue, 9 Jun 1998 23:16:35 -0400

Dear List,

Well, here is an update on my progress with NETBIB, an auditory bib service via the internet, to help you (and me) with your (our) references. My (our?) goal is to help provide all of us with a highly accurate source of bib info that can quickly be incorporated into your paper, in any format.

Not too surprisingly, there is a lot of software out there already. The UNIX community has been busy, covering our bases, yet again, filling in the gaps that MS has chosen not to fill (yet they charged us, for their own personal greedy money-grubbing self-serving reasons). Below is open source that does what we (I?) want, available for free. For those of you that know how to use this stuff, here it is. For those of you who do not, and do not have the time to learn it, my (our) project continues. I will research or write an interface that makes it available to us all, for free. But the basic engine has been written, and, as is typical, is free for the asking.

What we still need is a Netscape (HTML) interface that makes it easy to use, plus more of your data files, which can be sorted and stored by subject, author, year, or whatever you want. Namely, an interface that will search the database of papers and return a file preformatted in a format of your choice (LaTeX, ASCII, EndNote, yes, and even a doc format, if such a thing exists in a stable and usable form).

I have received 15 BibTeX databases and 7 text databases. I would guess that the text databases are conversions from EndNote. Perhaps one is not. As it turns out, the text databases are not too useful, as there are no keywords. Thus it looks like I need to get the EndNote source and process it. I am not ready to do that, so please do not send me any EndNote files or any ASCII databases. They are not yet useful to me.
However, I am still interested in BibTeX databases that are 99.99999% free of errors. If the database contains only your personal papers, it is likely pretty free of errors.

More later. And thanks for your tremendous support. This is all based on 1% of my time. So please be patient.

Jont

http://www.ecst.csuchico.edu/~jacobsd/bib/bp/index.html

--
Jont B. Allen, Room E161
AT&T Labs-Research
180 Park AV.
Florham Park NJ 07932
973/360-8545 voice, x8092 fax
http://www.research.att.com/info/jba

Can you believe so much has been made by so few, by pushing so much of so little, on so many with so little resistance? So fight back. Try Linux.

<HTML><HEAD><TITLE>bp, a Perl Bibliography Package</TITLE></HEAD>
<BODY><H1 align=center><CITE>bp</CITE>, a Perl Bibliography Package</H1>
<HR>
<P> <CITE>bp</CITE> is a <A href="http://www.perl.com/perl/">Perl</A> library that is designed to:
<UL>
<LI> Let you quickly make tools to access bibliographies
<LI> Let you access multiple bibliography formats transparently
<LI> Let you convert between formats
<LI> Let you convert between character sets
</UL>
When I first designed the package, my goal was only the first of these -- I had written a number of tools that accessed my BibTeX bibliographies, and I saw that I was reusing a lot of code. So I decided to make a generic package to access BibTeX bibliographies. About a year later I decided that it would be even better if the package could read multiple formats, and convert between them. The result is <CITE>bp</CITE>.
<P> This package is in development. It is in the BETA stage, which means that I may still change the interface, but major changes are unlikely.
Parts of the package are still missing (namely documentation, automatic format recognition, and a good set of utilities), but everything needed for a working system exists. <P> <H2>Availability</H2> <BLOCKQUOTE> <B>Source code:</B> <UL> <LI>Source (version 0.2.97 -- pre 0.3.0) as of 19 December 1996: <A href="http://www.ecst.csuchico.edu/~jacobsd/bib/archives/bp-0.2.97.tar.gz"> 243k gzipped tar file</A> <LI>Old sources <UL> <LI>Source (version 0.2.3) as of 26 March 1996: <A href="http://www.ecst.csuchico.edu/~jacobsd/bib/archives/bp-0.2.3.tar.gz"> 227k gzipped tar file</A> </UL> <LI>Format updates for 0.2.97 <UL> <LI>Medline, 7 Oct 97 (Jr names, new Entrez HTML): <A href="bp-medline.pl">8k bp module</A> </UL> </UL> <P> <B>Examples:</B> <UL> <LI><A href="ConvertForm.html">Form-interface converter</A> </UL> <P> <B>Documentation:</B> <UL> <LI><A href="CanonicalFields.html">Canonical Format fields</A> <LI><A href="Examples.html">Some sample programs that use <CITE>bp</CITE></A> <LI><A href="FormatDescs.html">An idea for format descriptions</A> <LI><A href="CommandLine.html">The command line options recognized by stdargs</A> </UL> </BLOCKQUOTE> <CITE>bp</CITE> is freely available for use without charge, and may be redistributed freely.<BR> It is copyright 1992-1997 by Dana Jacobsen.<BR> <CITE>bp</CITE> works with both Perl 4 and Perl 5. 
<H2>Formats Supported</H2> <table border> <tr> <th>Format</th><th>read</th><th>write</th><th>notes</th> </tr> <tr valign=top><td><A href="../formats/bibtex.html">bibtex</A></td> <td>x</td><td>x</td><td>The BibTeX format</td></tr> <tr valign=top><td><A href="../formats/refer.html">refer</A></td> <td>x</td><td>x</td><td>The Refer / BibIX format</td></tr> <tr valign=top><td><A href="../formats/endnote.html">endnote</A></td> <td>x</td><td>x</td><td>EndNote's refer-like format</td></tr> <tr valign=top><td><A href="../formats/tib.html">tib</A></td> <td>x</td><td>x</td><td>Like refer using TeX</td></tr> <tr valign=top><td>procite</td> <td>x</td><td>x</td><td>Comma-delimited import format</td></tr> <tr valign=top><td><A href="../formats/rfc1807.html">rfc1807</A></td> <td>x</td><td>x</td><td>RFC 1807 / 1357. Not well tested.</td></tr> <tr valign=top><td>text</td> <td>x</td><td>x</td><td>Raw lines or paragraphs</td></tr> <tr valign=top><td><A href="ftp://rdt.monash.edu.au/pub/techreports/reports/README">cstra</A></td> <td>x</td><td> </td><td><A href="http://www.rdt.monash.edu.au/tr/siteslist.html">CS Tech Report</A> format</td></tr> <tr valign=top><td>inspec</td> <td>x</td><td> </td><td>The "Doc Type" version.</td></tr> <tr valign=top><td><A href="../formats/medline.html">medline</A></td> <td>x</td><td> </td><td>MEDLARS as output by <A href="http://atlas.nlm.nih.gov:5700/Entrez/index.html">Entrez</A></td></tr> <tr valign=top><td>melvyl</td> <td>x</td><td> </td><td>Not quite sure which version.</td></tr> <tr valign=top><td>ieee</td> <td>x</td><td> </td><td>The old IEEE catalog</td></tr> <tr valign=top><td>powells</td> <td>x</td><td> </td><td>An old <A href="http://www.technical.powells.portland.or.us/">Powell's</A> format</td></tr> <tr valign=top><td>output</td> <td> </td><td>x</td><td>Styles generic, booklist, aacf.</td></tr> <tr valign=top><td>html</td> <td> </td><td>x</td><td>Synonym for output format with HTML charset</td></tr> </table> <H2>Character Sets Supported</H2> 
<table border> <tr> <th>Charset</th><th>notes</th> </tr>
<tr valign=top><td>8859-1</td> <td>ISO 8859-1 8bit characters</td></tr>
<tr valign=top><td>apple</td> <td>Apple's 8bit mapping</td></tr>
<tr valign=top><td>html</td> <td>HTML -- Hypertext Markup Language</td></tr>
<tr valign=top><td>tex</td> <td>TeX</td></tr>
<tr valign=top><td>troff</td> <td>troff</td></tr>
<tr valign=top><td>dead</td> <td>Dead-key. \'a for &#225;, \/o for &#248;, \-D for &#208;</td></tr>
<tr valign=top><td>none</td> <td>strips accents -- 7bit clean</td></tr>
</table>
<H2>Related Programs</H2>
<A href="http://www.mcs.com/~eryq/www/pub/Bib.pm.html">Bib.pm</A> is a Perl5 module that provides an abstraction on <A href="http://www.ecst.csuchico.edu/~jacobsd/bib/formats/refer.html">refer</A> files. It uses the object oriented features of Perl5 to good effect. I wanted to do something similar with bp, but the ability to work with Perl 4 is too nice to drop.
<P> <A href="http://www.tcisoft.com/tcisoft/bibdb.html">BibDB</A> by Eyal Doron allows importing refer and tib data into its BibTeX-centered database. It is a DOS/Windows-only program, however, and doesn't allow arbitrary programming.
<P> The <A href="http://www.sil.org/sgml/ica2.html">Integrated Chameleon Architecture</A> is a toolset for data translation that claims to come with a refer &lt;--&gt; bibtex translator among other things. I do not know what the current status is.
<H2>Work</H2>
<BLOCKQUOTE> <DL>
<DT><B>1 Jan 97</B> <DD>String evals in Perl 5.003 are still not cached, so are deathly slow. This meant some changes for standard implode. For all formats, define @reginfo at beginning, then register at end (mainly so the registration info will be placed at the head of the file, but the actual registration will happen after all other variables (notably %options) are defined). Charsets now use &amp;reg_charset().
<DT><B>26 Dec 96</B> <DD>Help for formats is set up ("-bibhelp bibtex" gives some information on the BibTeX format and also on module options). There is a standard options parser for modules, and all modules have been changed to use it.
<DT><B>23 Dec 96</B> <DD>Lots of misc work throughout. RFC1807 has been completely rewritten and follows the specification almost exactly.
<DT><B>21 Dec 96</B> <DD>Most of the 01XX Unicode characters are supported by the TeX charset. BibTeX does OPTfield squeezing.
<DT><B>19 Dec 96</B> <DD>Put a pre-0.3.0 release on the web page. BibTeX parsing is the biggest change -- strings are much more robust, crossrefs are filled out. There is a CACM output style.
<DT><B>17 Dec 96</B> <DD>Getting ready for another release in a couple of weeks. Endnote has been put into a separate module from refer because of the charset issues. A BIDS module should be in place. bibconv has a usage display! The BibTeX module now handles crossref entries.
<DT><B>15 Mar 96</B> <DD>More changes to a lot of code. The old HTML code is gone, replaced with a generic output module, with style programs generated from a more generic description file. Support for HTML headers and trailers exists, so all the functionality of the old HTML module (what little there was) is there. Now reads comma-delimited Procite files, write RSN. Automatic EndNote record detection in the refer module. New bibconv program that's a little more user-friendly than tconv.
<DT><B>21 Jan 96</B> <DD>Minor cleanup to most packages. Finished basic EndNote support. Handle BibTeX name conversion with braces. Fixed typo in the convert code. Tried writing a procite converter, but every database seems different.
<DT><B>17 Jan 96</B> <DD>Changed Medline to read <A href="http://atlas.nlm.nih.gov:5700/Entrez/index.html">Entrez</A> MEDLARS format. It reads both the saved files and, with the html option, search results.
<DD>Refer now has an <tt>endnote</tt> option that supports some of the differences to refer that EndNote uses. <DD>Added a CSTRA format that reads the format defined by <A href="ftp://rdt.monash.edu.au/pub/techreports/reports/README">CSTRA</A>. <DT><B>15 Jan 96</B> <DD>Added INSPEC format. <DT><B>2 Dec 95</B> <DD>Just released 0.2.0 which uses an all new character set methodology. <DT><B>1 Dec 95</B> <DD>Set up the newest <A href="ConvertForm.html">form interface converter</A>. <DT><B>28 Nov 95</B> <DD>Began work on <A href="ModuleDesign.html">format module documentation</A>. <DT><B>Previous</B> <DD><A href="worklogs.html">Older logs</A> are archived. </DL> </BLOCKQUOTE> <H2>Major work left to do (in arbitrary order)</H2> <UL> <LI>Tests <UL> <LI>Regression test, looks directly at error string to check proper errors </UL> <LI>End User Programs <UL> <LI>count (count entries) -- <em>done</em> <LI>grep (search for data) -- <em>done</em> <LI>sort (sort data) -- <em>done</em> <LI>merge (merge multiple files) <LI>dupdet (detect duplicates using fuzzy analysis) -- <em>done</em> <LI>gedit (global search and replace) <LI>flist (unique sorted list of keys in a field) <LI>something to keep a database of appropriate names, and flag any entries which are not listed in that database (journal names, for instance) <LI>bibtv, or other text-based interactive interface <LI>Use TK to make an X interface? </UL> <LI>Documentation <UL> <LI>Online <UL> <LI>Formats <LI>Tools </UL> <LI>bp's <I>doc</I> routine <UL> <LI>Specs on valid calls <LI>What to pass on to formats? </UL> </UL> <LI>Automatic format recognition <UL> <LI>How do we pick formats to call? <LI>Do we open, read, close? Check length of file? Do backtrackable I/O? 
</UL>
<LI>Character set conversion <UL> <LI>Make sure we handle charset conversion on the same format <LI>TeX/troff from r2b <LI>Oscar Nierstrasz's <A href="http://iamwww.unibe.ch/~scg/Src/">accent.pl</A> <LI>MARC documents <LI>troff lists <LI>TEI has a <em>lot</em> of character set definitions. </UL>
<LI>User config files <UL> <LI>Default format <LI>auto-recognizer format list <LI>pre/post conversion links <LI>extra directories to search for formats <LI>Possibly overriding formats, charsets, and conversions </UL>
<LI>Hooks <UL> <LI>Where do they go? <LI>How to call without adding excessive overhead </UL>
</UL>
<HR>
<b>2 January 1997</b>
<ADDRESS> <A href="http://www.ecst.csuchico.edu/~jacobsd/Dana.html">Dana Jacobsen</A><BR> dana(at)acm.org </ADDRESS>
</BODY></HTML>

McGill is running a new version of LISTSERV (1.8c on Windows NT). Information is available on the WEB at http://www.mcgill.ca/cc/listserv
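
For readers who want a feel for what "convert between formats" means in the bp page above, here is a rough sketch in Python. It is not bp's code or interface (bp is a Perl library, and none of the names below come from it). The functions parse_bibtex_entry and write_refer, the BIBTEX_TO_REFER table, and the sample entry are all made up for illustration. The sketch assumes a single flat BibTeX entry with no nested braces, reads it into a plain dictionary as an intermediate record, and writes it back out with refer-style tags (%A, %T, %J, %D).

```python
import re

# Assumed mapping from BibTeX field names to refer tags (illustrative only).
BIBTEX_TO_REFER = {"title": "%T", "journal": "%J", "year": "%D",
                   "volume": "%V", "pages": "%P"}

# Matches  name = {value}  or  name = "value"  with no nested braces.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a dict (the intermediate record)."""
    header = re.match(r'\s*@(\w+)\s*\{\s*([^,\s]+)\s*,', text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    record = {"type": header.group(1).lower(), "key": header.group(2)}
    for name, braced, quoted in FIELD_RE.findall(text[header.end():]):
        record[name.lower()] = braced if braced else quoted
    return record


def write_refer(record):
    """Format the intermediate record as refer-style lines."""
    lines = []
    # BibTeX joins authors with "and"; refer uses one %A line per author.
    for author in re.split(r'\s+and\s+', record.get("author", "")):
        if author:
            lines.append("%A " + author)
    for field, tag in BIBTEX_TO_REFER.items():
        if field in record:
            lines.append(tag + " " + record[field])
    return "\n".join(lines)


# A made-up entry, used only to exercise the sketch.
entry = """@article{smith98,
  author = {Jane Smith and John Doe},
  title = {An Example Title},
  journal = {Journal of Examples},
  year = {1998}
}"""

print(write_refer(parse_bibtex_entry(entry)))
```

Going through an intermediate record means each format needs only one reader and one writer, rather than a separate converter for every pair of formats. Real BibTeX files (string macros, nested braces, crossrefs) need far more care than this.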


This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University