
CapCampNY_OpenDatafordevelopers

Page history last edited by remyd 14 years, 9 months ago

Please add your notes here. Don't forget to add your name and email address...


Name: Nathan Freitas (NY Senate CIO office - notes taken by Ken Zalewski)

Email: freitas@senate.state.ny.us aka @natdefreitas

 

Name: Remy DeCausemaker (CIVX.us)

Email: remyd@civx.us aka @remy_d

 

NYSenate YouTube Channel Link:

http://www.youtube.com/watch?v=1xpydYP5diY

 

Why OpenData?

  • essential to democracy
  • your data is not an island
  • need interaction and sharing
  • citizens need and want data
  • avoid duplication
  • focus

 

Other agencies:

ORPS - wants to make real estate transactions, real property transfers, and parcel information available throughout NY (except NYC)

How: apps for querying, snapshot, exported as CSV, KML

Why: mandate transparency
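The CSV and KML export idea could look roughly like the sketch below. It assumes a hypothetical parcel record with `parcel_id`, `owner`, `lat`, and `lon` fields; none of these names or values come from ORPS, they are only for illustration.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical parcel records; field names and values are illustrative only.
PARCELS = [
    {"parcel_id": "A-001", "owner": "Smith & Sons", "lat": 42.6526, "lon": -73.7562},
    {"parcel_id": "A-002", "owner": "Jane Doe", "lat": 43.0481, "lon": -76.1474},
]

FIELDS = ["parcel_id", "owner", "lat", "lon"]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_kml(rows):
    """Serialize the records as KML, one Placemark per parcel."""
    placemarks = []
    for row in rows:
        placemarks.append(
            "<Placemark><name>{name}</name><description>{desc}</description>"
            "<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>".format(
                name=escape(row["parcel_id"]),
                desc=escape(row["owner"]),
                lon=row["lon"],  # KML expects longitude first, then latitude
                lat=row["lat"],
            )
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )
```

Calling `to_csv(PARCELS)` gives text that opens in a spreadsheet, and `to_kml(PARCELS)` gives text that a mapping tool can load.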

 

Campaign finance

How: download data from BOE and FEC and merge into massive donor DB

Why: Improve transparency - discover employers of donors
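As a rough sketch of merging the two sources into one donor database, the example below loads made-up BOE and FEC rows into a single in-memory SQLite table and totals donations by employer. The column layout, the sample rows, and the function names are assumptions for illustration; the real files are structured differently and would need cleaning first.

```python
import sqlite3

# Hypothetical rows standing in for downloaded BOE (state) and FEC (federal) files.
# Each row is (donor, employer, amount).
BOE_ROWS = [("Pat Lee", "Acme Corp", 500.0), ("Sam Roy", "Self", 250.0)]
FEC_ROWS = [("Pat Lee", "Acme Corp", 1000.0), ("Kim Wu", "Acme Corp", 300.0)]


def build_donor_db(boe_rows, fec_rows):
    """Load both sources into one table, tagging each row with its source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE donations (source TEXT, donor TEXT, employer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO donations VALUES ('BOE', ?, ?, ?)", boe_rows)
    conn.executemany("INSERT INTO donations VALUES ('FEC', ?, ?, ?)", fec_rows)
    conn.commit()
    return conn


def totals_by_employer(conn):
    """Sum donations per employer across both sources, largest total first."""
    cur = conn.execute(
        "SELECT employer, SUM(amount) FROM donations "
        "GROUP BY employer ORDER BY SUM(amount) DESC"
    )
    return cur.fetchall()
```

Keeping the `source` column makes it possible to trace every merged row back to the file it came from.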

 

Legislative data

How: API with no key required; XML/JSON output

Why: Implement OpenCongress for NYS
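To illustrate what offering the same record as both XML and JSON might involve, here is a minimal sketch using only the Python standard library. The bill record and its field names (`id`, `title`, `sponsor`) are hypothetical and are not taken from the Senate's actual API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical bill record; the field names are not taken from any real API.
BILL = {"id": "S1234", "title": "An example bill", "sponsor": "Doe"}


def bill_to_json(bill):
    """Render the record as a JSON object."""
    return json.dumps(bill, sort_keys=True)


def bill_to_xml(bill):
    """Render the record as a <bill> element with the id as an attribute."""
    root = ET.Element("bill", id=bill["id"])
    for key in ("title", "sponsor"):
        ET.SubElement(root, key).text = bill[key]
    return ET.tostring(root, encoding="unicode")


def bill_from_xml(text):
    """Parse the XML form back into the same dictionary shape."""
    root = ET.fromstring(text)
    record = {"id": root.get("id")}
    for child in root:
        record[child.tag] = child.text
    return record
```

A client can then pick whichever format its tools handle best and end up with the same data either way.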

 

CIVX.us

How: scrape arbitrary data sources (e.g., NY Project Sunlight, opensecrets.org)

Why: because we can - we have the technology - let's use it; make data available

Give developers ability to implement interfaces to the data

Improve efficiency, cut costs

 

Code examples:

Utilize SQLAlchemy, Sphinx

Self-documenting, self-testing code

CSVScrubber

Scraper that grabs data from SunlightNY site

Scraper that grabs data from OpenSecrets (e.g., 527s)

U.S. Code scraper (uscode.house.gov, use archive.org to obtain old version of USC)

Using Git to observe how data sets are changing
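Committing each scraped snapshot to Git lets `git diff` show exactly which rows changed between runs. The sketch below imitates that view with Python's `difflib` instead of Git itself, so it runs without a repository. The two CSV snapshots and the function names are made up for illustration.

```python
import difflib

# Two hypothetical snapshots of the same data set, taken at different times.
OLD = "id,name,amount\n1,Acme,100\n2,Globex,200\n"
NEW = "id,name,amount\n1,Acme,150\n2,Globex,200\n3,Initech,50\n"


def snapshot_diff(old_text, new_text):
    """Return a unified diff of the two snapshots as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="snapshot_old.csv",
            tofile="snapshot_new.csv",
            lineterm="",
        )
    )


def changed_rows(old_text, new_text):
    """Split the diff into rows removed from the old snapshot and rows added in the new one."""
    body = snapshot_diff(old_text, new_text)[2:]  # skip the two file header lines
    removed = [line[1:] for line in body if line.startswith("-")]
    added = [line[1:] for line in body if line.startswith("+")]
    return removed, added
```

An edited row appears once in each list: the old version under removed and the new version under added.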

 

Use Python tools and libraries

Excel-to-CSV from pyExcelerator

Python's multiprocessing module spreads work across CPU cores
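A minimal sketch of spreading a cleaning step across cores with the standard `multiprocessing` module follows. The `clean_row` and `clean_rows` functions are hypothetical stand-ins and are not the session's actual code.

```python
from multiprocessing import Pool


def clean_row(row):
    """Strip whitespace from every cell and turn empty cells into None."""
    return [cell.strip() or None for cell in row]


def clean_rows(rows, processes=None):
    """Clean rows in parallel; by default Pool starts one worker per CPU core."""
    with Pool(processes=processes) as pool:
        # chunksize batches rows so each worker receives fewer, larger tasks.
        return pool.map(clean_row, rows, chunksize=1000)
```

When using this from a script, call `clean_rows(...)` under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn rather than fork worker processes.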

 

Code and data licensed under AGPL and/or GPLv3

 

Takeaways:

  • tools, raw data, processed data all together as one offering
  • version control provides accountability and credibility
  • license: AGPL, GPLv3

 

Senate should be an upstream aggregator, harvesting and passing information down to those who can present and utilize it

 

Workflow:

Source (nysenate, upstream) ==> Aggregator (CIVX, NYTimes, Sunlight) ==> Developer (downstream)
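The three stages of this workflow can be sketched as a small chain of Python functions. This is an illustration under stated assumptions: the record fields, sample values, and function names are hypothetical and do not describe any real feed.

```python
def source():
    """Upstream: publish raw records as-is (hypothetical sample data)."""
    yield {"bill": "S1", "status": " passed "}
    yield {"bill": "S2", "status": "IN COMMITTEE"}


def aggregator(records):
    """Middle: normalize raw records into a consistent shape."""
    for record in records:
        yield {"bill": record["bill"], "status": record["status"].strip().lower()}


def developer(records):
    """Downstream: build something presentable, here an index of bills by status."""
    index = {}
    for record in records:
        index.setdefault(record["status"], []).append(record["bill"])
    return index
```

Calling `developer(aggregator(source()))` runs the whole chain; each stage depends only on the shape of the records it receives, so any stage can be swapped out on its own.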

 


Name: Thom Neale     

Email: twneale@gmail.com

Notes:

 

We talked about open data for developers vs. end users. Many cool ideas were discussed. If we open data to developers, there's no telling what kinds of excellent experiences will emerge for end users. If the data formats are sufficiently neutral, such as CSV files, then other formats will become available as a matter of course; even a layperson can cut and paste CSV values into a spreadsheet. A great session. Kudos to the incredible and improbable (for Albany) new Senate team.