Name: Nathan Freitas (NY Senate CIO office - notes taken by Ken Zalewski)
Email: freitas@senate.state.ny.us aka @natdefreitas
Name: Remy DeCausemaker (CIVX.us)
Email: remyd@civx.us aka @remy_d
NYSenate Youtube Channel Link:
http://www.youtube.com/watch?v=1xpydYP5diY
Why OpenData?
- essential to democracy
- your data is not an island
- need interaction and sharing
- citizens need and want data
- avoid duplication
- focus
Other agencies:
ORPS - desire to make real estate transactions, real property transfers, and parcel information available throughout NY (except NYC)
How: apps for querying, snapshot, exported as CSV, KML
Why: mandate transparency
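A rough sketch of the CSV-to-KML export step, using only the Python standard library. The column names (parcel_id, lat, lon) and the sample row are made up for illustration; the real ORPS export layout was not shown in the session.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def csv_to_kml(csv_text):
    """Turn CSV rows into KML placemarks.

    Assumes hypothetical columns: parcel_id, lat, lon.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["parcel_id"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude
        coords = f"{row['lon']},{row['lat']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Made-up sample row, not real parcel data
sample = "parcel_id,lat,lon\nA-1,42.6526,-73.7562\n"
kml_text = csv_to_kml(sample)
print(kml_text)
```

The same CSV snapshot can then be offered both as-is and as KML for mapping tools.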
Campaign finance
How: download data from BOE and FEC and merge into massive donor DB
Why: Improve transparency - discover employers of donors
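A minimal sketch of the merge idea: load donor rows from two sources into one table and total donations by employer. The column names (donor, employer, amount) and the sample rows are invented for illustration; the actual BOE and FEC file layouts differ and would need their own parsing.

```python
import csv
import io
import sqlite3


def load_donors(conn, source, csv_text):
    """Insert rows from one source, normalizing names so they match across sources."""
    rows = [
        (
            source,
            r["donor"].strip().upper(),
            r["employer"].strip().upper(),
            float(r["amount"]),
        )
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO donations VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE donations (source TEXT, donor TEXT, employer TEXT, amount REAL)"
)

# Made-up sample data standing in for the BOE and FEC downloads
boe_csv = "donor,employer,amount\nJane Doe,Acme Corp,100\n"
fec_csv = "donor,employer,amount\njane doe,ACME CORP,250\nJohn Roe,Example LLC,50\n"
load_donors(conn, "BOE", boe_csv)
load_donors(conn, "FEC", fec_csv)

# Total donations per employer across both sources
employer_totals = conn.execute(
    "SELECT employer, SUM(amount) FROM donations GROUP BY employer ORDER BY employer"
).fetchall()
print(employer_totals)
```

Upper-casing is only a crude form of name matching; real donor data would likely need fuzzier matching on employer names.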
Legislative data
How: No API key, XML/JSON
Why: Implement OpenCongress for NYS
CIVX.us
How: scrape arbitrary data sources (e.g. NY Project Sunlight, opensecrets.org)
Why: because we can - we have the technology - let's use it; make data available
Give developers ability to implement interfaces to the data
Improve efficiency, cut costs
Code examples:
Utilize SQLAlchemy, Sphinx
Self-documenting, self-testing code
CSVScrubber
Scraper that grabs data from SunlightNY site
Scraper that grabs data from OpenSecrets (e.g. 527s)
U.S. Code scraper (uscode.house.gov, use archive.org to obtain old version of USC)
Using Git to observe how data sets are changing
Use Python tools and libraries
Excel-to-CSV from pyExcelerator
Python multiprocessing module improves CPU utilization across cores
Code and data licensed under AGPL and/or GPLv3
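To illustrate how a scrubbing step and Git fit together, here is a small sketch of a CSV cleaner. It is not the CSVScrubber shown in the session (its code is not in these notes); scrub_csv is a hypothetical helper that trims whitespace, drops blank lines, normalizes header names, and sorts rows so that successive snapshots committed to Git produce small, readable diffs.

```python
import csv
import io


def scrub_csv(raw_text):
    """Return a normalized copy of raw_text (hypothetical helper, not CSVScrubber)."""
    reader = csv.reader(io.StringIO(raw_text))
    # Trim every cell and skip rows that are entirely empty
    rows = [
        [cell.strip() for cell in row]
        for row in reader
        if any(cell.strip() for cell in row)
    ]
    if not rows:
        return ""
    header = [name.lower().replace(" ", "_") for name in rows[0]]
    body = sorted(rows[1:])  # stable row order keeps diffs between snapshots small
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()


# Made-up messy input
messy = " Name , Amount \nbeta, 2\n\nalpha ,1\n"
clean = scrub_csv(messy)
print(clean)
```

Scrubbing an already clean file returns it unchanged, so re-running the scraper without upstream changes leaves nothing new to commit.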
Takeaways:
- tools, raw data, processed data all together as one offering
- version control provides accountability and credibility
- Senate should be an upstream aggregator, harvesting and passing information down to those who can present and utilize it
Workflow:
Source (nysenate, upstream) ==> Aggregator (Civx, NYTimes, Sunlight) ==> Developer (downstream)
Name: Thom Neale
Email: twneale@gmail.com
Notes:
We talked about open data for developers vs. end users. Many cool ideas were discussed. If we open data to developers, there's no telling what kinds of excellent experiences will emerge for end users. If the data formats are sufficiently neutral, such as CSV files, then other formats will follow as a matter of course; even a layperson can cut and paste CSV values into a spreadsheet. A great session. Kudos to the incredible and improbable (for Albany) new Senate team.
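As a small illustration of the point about neutral formats, a few lines of Python are enough to turn a CSV file into JSON. The column names and the sample row below are made up, not taken from any Senate data set.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text into a JSON array of objects keyed by the header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Made-up sample row
sample = "bill,sponsor\nS1234,Example Senator\n"
as_json = csv_to_json(sample)
print(as_json)
```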