It’s FY 2013 appropriations season, and bulk access to THOMAS made an appearance in H.Rpt. 112-511 (under the Government Printing Office). The report asks a series of questions about the virtues of XML, and relegates open access to “a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate.” As a Washingtonian, and as a Methodist, even I think that sounds like a lot of committee-ing.
The report’s primary concern is authentication of official government data, certainly a valid concern. Although the suggestion that PDF is the only securable format is absurd, having good data is important. However, the questions on the virtues of XML are not all about authentication:
- Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized? [Somewhat security related, but also logistical.]
- How would ‘House’ information be differentiated from ‘Senate’ information for the purposes of bulk data downloads in XML? [Both chambers live together on THOMAS as it is; again, logistical.]
- What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? [Excellent question.]
- What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? [Logistical.]
- What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? [Plenty of projects are already scraping information from THOMAS. Because this information is already on the Internet, the ability to say 'no, you can't have that' is gone.]
- Are there other data models or alternative[s] that can enhance congressional openness and transparency without relying on bulk data downloads in XML? [Smart people are saying smart things about this very topic, worth looking into.]
- [How many roads must a man walk down before they call him a man?]
- [How many seas must the white dove sail before she can sleep in the sand?]
In these questions, I sense some apprehension. Indeed, open access to legislative data is a new thing for Congress. But, when one is unsure of how to handle a new legislative situation, one looks back at original intent to find the spirit of the law.
THOMAS was created in the 104th Congress by H.R.2492, which became P.L. 104-53, the Legislative Branch Appropriations for 1996. The 104th defined “legislative information” as “information, prepared within the legislative branch, consisting of the text of publicly available bills, amendments, committee hearings, and committee reports, the text of the Congressional Record, data relating to bill status, data relating to legislative activity, and other similar public information that is directly related to the legislative process.” The Library of Congress was charged with “reduc[ing] the cost of information support for the Congress by eliminating duplication among systems which provide electronic access by Congress to legislative information,” and “examin[ing] issues regarding efficient ways to make this information available to the public.”
People were excited. Senator Warner even stood to publicly refute a WaPo article saying that Congress was against being open on the Internet. Once THOMAS was approved, it was built in mere weeks, and debuted on January 5, 1995 (looking like this).
It’s no stretch that the words “data,” “efficient,” and “available to the public” mean something different today than in 1995. The entire character of the Internet has changed in the last 17 years. What has not changed, however, is the desirability of eliminating redundancy, and making information efficiently accessible. That’s a worthy goal*. To that end:
Whereas elected legislators are creating legislation of, by and for the people;
Whereas legislators require the most timely and effective access to past legislation, reports, debates and hearings in order to best represent their constituents, and serve the American people;
Whereas democratically governed people, in the United States and around the world, must have access to their laws;
Whereas the nature of access and efficiency have changed since THOMAS began;
Whereas the legislative information in question is already being used by third parties in order to strengthen democracy through interaction with government: Now, therefore, let us
Resolve, That open access to legislative data is a continuation of the spirit of THOMAS to support Congress with efficient, cost-effective access to legislative information, and to educate the public about the legislative process, and the laws that affect their lives.

“Is bill writing super fun, or what!?”
*Here is a good place to mention that these are my views as a librarian, budding coder, and regular old citizen, and are not in any way tied to my employer. Official views on legislative data can be found on THOMAS–and isn’t that the point of all this?