Today I was attempting to get CSV data from Estates' Alarm System into Google Docs as a spreadsheet. There were two ways to try and achieve this...
But I got stuck. I think it's because the CSV file was UTF-16 Little Endian, and my regular expressions wouldn't work.
I found bugs if you have funnily named header rows ( CamelCaseOnlyPerhaps ). That I solved by adding one cell at a time. It's slow but reliable....
I also found that if I used a DictReader() it broke, so I just iterated through the lines and the items ( which seemed to work ).
import time, urllib, csv, re, time
from pprint import pprint
import gdata.spreadsheet.service
email = 'your.name@york.ac.uk'
password = '********'
'''Warning. This means of connecting to the GDATA API is being deprecated soon in favour of OAuth'''
spr_client = gdata.spreadsheet.service.SpreadsheetsService()
spr_client.email = email
spr_client.password = password
spr_client.source = 'Example CSV To Spreadsheet Writing Application'
spr_client.ProgrammaticLogin( )
spreadsheet_key = '0Ajnu7JgRtB5CdDFQeGM2YVZBNXROcC1vZ0xCQ2tVX1E'
data_url = 'http://www-users.york.ac.uk/~admn812/alarms.csv.Active BA Alarms.csv'
# All spreadsheets have worksheets. I think worksheet #1 by default always have a value of 'od6'
worksheet_id = 'od6'
#### Examples from http://pseudoscripter.wordpress.com/2011/05/09/automatically-update-spreadsheets-and-graphs/ ####
def _CellsUpdateAction(row,col,inputValue,key,wksht_id):
'''You "can" update an entire row, or rows even with a dict(array) or list of them, but I got a bizarre error when doing so, so in an attempt to find the nasty cell, do it a cell at a time'''
entry = spr_client.UpdateCell(row=row, col=col, inputValue=inputValue,
key=key, wksht_id=wksht_id)
if isinstance(entry, gdata.spreadsheet.SpreadsheetsCell):
print row,",", col, 'updated:', inputValue
def _PrintFeed(feed):
'''Just a way to iterate through what's available'''
for i, entry in enumerate(feed.entry):
if isinstance(feed, gdata.spreadsheet.SpreadsheetsCellsFeed):
print '%s %s\n' % (entry.title.text, entry.content.text)
elif isinstance(feed, gdata.spreadsheet.SpreadsheetsListFeed):
print '%s %s %s' % (i, entry.title.text, entry.content.text)
#Print this row's value for each column (the custom dictionary is
# built using the gsx: elements in the entry.)
print 'Contents:'
for key in entry.custom:
print ' %s: %s' % (key, entry.custom[key].text)
print '\n',
else:
# THIS ONE!
print '%s %s, %s' % (i, entry.title.text, str(entry.id.text))
#print dir(entry)
def show_my_spreadsheets():
print "My spreadsheets are..."
feed = spr_client.GetSpreadsheetsFeed()
_PrintFeed(feed)
def replace(text, look_for, replace_with=''):
reg = look_for
p = re.compile(reg, re.IGNORECASE | re.DOTALL)
t = p.sub(replace_with, text)
return t
def match(s, reg):
p = re.compile(reg, re.IGNORECASE| re.DOTALL)
results = p.findall(s)
return results
def get_data(data_url=data_url):
u = urllib.urlopen(data_url)
data = u.read()
data = data.decode("utf-16 LE")
return data
def get_data_from_file(filepath):
'This should work, not tested it.'
f = open(filepath)
data = f.read()
f.close()
data = data.decode("utf-16 LE")
return data
def write_data( dict ):
'Not used'
entry = spr_client.InsertRow(dict, spreadsheet_key, worksheet_id)
if isinstance(entry, gdata.spreadsheet.SpreadsheetsList):
print "Insert row succeeded."
else:
print "Insert row failed."
def run():
filepath = '/Users/tomsmith/Downloads/alarms.csv.Active BA Alarms (8).csv'
data = get_data() # or ... data = get_data_from_file("C:/myfolder/mycsv.csv")
#Strip the first junky stuff off...
data = match( data, '"Time of last Change.*')[0]
# I chose to add the field headers by hand. You can do this on the fly, but
# I found a bug if they had uppercase letters. Grr!
fields = ["Time of last Change","Category","Technical Description","Status","Priority","Alarm Value","Alarm Message",]
# Write header row
for f,field in enumerate(fields):
_CellsUpdateAction(1,f+1,field,spreadsheet_key,worksheet_id)
## Now write the data, cell by cell
data = data.split("\n")
for l, line in enumerate(data):
if l == 0:
pass #the header line
else:
items = line.split("\t")
the_dict = {}
print "Line:", l #remember, zero-based
for i, item in enumerate(items):
#Agh, line 153 in the data doesn't have enough items
try:
the_value = item.replace('"', '') #Strip quotes off the beginning/end
#print the_value
header_name = fields[i].lower().replace(" ", "").replace("'", "")
the_dict[ header_name ] = str(the_value)
_CellsUpdateAction(l+2,i+1,the_value,spreadsheet_key,worksheet_id)
except Exception, err:
#print "\t Line", l, "only has", len(item), "items", items
#print err
pass
#time.sleep(1) #Give Google a chance to catch up a bit
if __name__ == '__main__':
#show_my_spreadsheets()
run( )
- Create an AppScript in Google that pulled a .CSV file from a web server
- Write a (python) script on the local machine that pushed the data into Google Spreadsheet by using the API.
The Google AppScript Way
As you know, my JavaScript ain't great, but it initially looked like it was going to work... Some code like this below and using the Array to CSV functions from here, looked promising.
function encode_utf8( s ){
//This is the code that "I think" turns the UTF16 LE into standard stuff....
return unescape( encodeURIComponent( s ) );
}
//This is the code that "I think" turns the UTF16 LE into standard stuff....
return unescape( encodeURIComponent( s ) );
}
function get_csv() {
var url = 'http://www-users.york.ac.uk/~admn812/alarms.csv.Active BA Alarms.csv'; // Change this to the URL of your file
var response = UrlFetchApp.fetch(url);
// If there's an error in the response code, maybe tell someone
//MailApp.sendEmail("s.brown@york.ac.uk", "Error with CSV grabber:" + response.getResponseCode() , "Text of message goes here")
Logger.log( "RESPONSE " + response.getResponseCode());
var data = encode_utf8(response.getContentText().toString());
return data //as text
}
function importFromCSV() {
// This is the function to which you attach a trigger to run every hour
var rawData = get_csv(); // gets the data, makes it nice...
var csvData = CSVToArray(rawData, "\t"); // turn into an array
Logger.log("CSV ITEMS " + csvData.length);
//Write data to first sheet in this spreadsheet
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getActiveSheet();
//Logger.log(sheet);
////// From: https://developers.google.com/apps-script/articles/docslist_tutorial
// I think this will write data from the 0th cell. It actually needs a line to select ALL the data and delete it,
// in case there is less data than the previous import.
for (var i = 0; i < csvData.length; i++) {
sheet.getRange(i+1, 1, 1, csvData[i].length).setValues(new Array(csvData[i]));
//this might be where you would look at the data and maybe...
// cell.offset(i,i+2).setBackgroundColor("green");
//Logger.log( "i:" + i + " " + csvData[i] );
}
}
var url = 'http://www-users.york.ac.uk/~admn812/alarms.csv.Active BA Alarms.csv'; // Change this to the URL of your file
var response = UrlFetchApp.fetch(url);
// If there's an error in the response code, maybe tell someone
//MailApp.sendEmail("s.brown@york.ac.uk", "Error with CSV grabber:" + response.getResponseCode() , "Text of message goes here")
Logger.log( "RESPONSE " + response.getResponseCode());
var data = encode_utf8(response.getContentText().toString());
return data //as text
}
function importFromCSV() {
// This is the function to which you attach a trigger to run every hour
var rawData = get_csv(); // gets the data, makes it nice...
var csvData = CSVToArray(rawData, "\t"); // turn into an array
Logger.log("CSV ITEMS " + csvData.length);
//Write data to first sheet in this spreadsheet
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getActiveSheet();
//Logger.log(sheet);
////// From: https://developers.google.com/apps-script/articles/docslist_tutorial
// I think this will write data from the 0th cell. It actually needs a line to select ALL the data and delete it,
// in case there is less data than the previous import.
for (var i = 0; i < csvData.length; i++) {
sheet.getRange(i+1, 1, 1, csvData[i].length).setValues(new Array(csvData[i]));
//this might be where you would look at the data and maybe...
// cell.offset(i,i+2).setBackgroundColor("green");
//Logger.log( "i:" + i + " " + csvData[i] );
}
}
But I got stuck. I think it's because the CSV file was UTF-16 Little Endian, and my regular expressions wouldn't work.
The Python Way
The python way is completely different in that it runs on the same computer as where the CSV file and pushes the data into a Google Spreadsheet.I found bugs if you have funnily named header rows ( CamelCaseOnlyPerhaps ). That I solved by adding one cell at a time. It's slow but reliable....
I also found that if I used a DictReader() it broke, so I just iterated through the lines and the items ( which seemed to work ).
import time, urllib, csv, re, time
from pprint import pprint
import gdata.spreadsheet.service
email = 'your.name@york.ac.uk'
password = '********'
'''Warning. This means of connecting to the GDATA API is being deprecated soon in favour of OAuth'''
spr_client = gdata.spreadsheet.service.SpreadsheetsService()
spr_client.email = email
spr_client.password = password
spr_client.source = 'Example CSV To Spreadsheet Writing Application'
spr_client.ProgrammaticLogin( )
spreadsheet_key = '0Ajnu7JgRtB5CdDFQeGM2YVZBNXROcC1vZ0xCQ2tVX1E'
data_url = 'http://www-users.york.ac.uk/~admn812/alarms.csv.Active BA Alarms.csv'
# All spreadsheets have worksheets. I think worksheet #1 by default always have a value of 'od6'
worksheet_id = 'od6'
#### Examples from http://pseudoscripter.wordpress.com/2011/05/09/automatically-update-spreadsheets-and-graphs/ ####
def _CellsUpdateAction(row,col,inputValue,key,wksht_id):
'''You "can" update an entire row, or rows even with a dict(array) or list of them, but I got a bizarre error when doing so, so in an attempt to find the nasty cell, do it a cell at a time'''
entry = spr_client.UpdateCell(row=row, col=col, inputValue=inputValue,
key=key, wksht_id=wksht_id)
if isinstance(entry, gdata.spreadsheet.SpreadsheetsCell):
print row,",", col, 'updated:', inputValue
def _PrintFeed(feed):
'''Just a way to iterate through what's available'''
for i, entry in enumerate(feed.entry):
if isinstance(feed, gdata.spreadsheet.SpreadsheetsCellsFeed):
print '%s %s\n' % (entry.title.text, entry.content.text)
elif isinstance(feed, gdata.spreadsheet.SpreadsheetsListFeed):
print '%s %s %s' % (i, entry.title.text, entry.content.text)
#Print this row's value for each column (the custom dictionary is
# built using the gsx: elements in the entry.)
print 'Contents:'
for key in entry.custom:
print ' %s: %s' % (key, entry.custom[key].text)
print '\n',
else:
# THIS ONE!
print '%s %s, %s' % (i, entry.title.text, str(entry.id.text))
#print dir(entry)
def show_my_spreadsheets():
print "My spreadsheets are..."
feed = spr_client.GetSpreadsheetsFeed()
_PrintFeed(feed)
def replace(text, look_for, replace_with=''):
reg = look_for
p = re.compile(reg, re.IGNORECASE | re.DOTALL)
t = p.sub(replace_with, text)
return t
def match(s, reg):
p = re.compile(reg, re.IGNORECASE| re.DOTALL)
results = p.findall(s)
return results
def get_data(data_url=data_url):
u = urllib.urlopen(data_url)
data = u.read()
data = data.decode("utf-16 LE")
return data
def get_data_from_file(filepath):
'This should work, not tested it.'
f = open(filepath)
data = f.read()
f.close()
data = data.decode("utf-16 LE")
return data
def write_data( dict ):
'Not used'
entry = spr_client.InsertRow(dict, spreadsheet_key, worksheet_id)
if isinstance(entry, gdata.spreadsheet.SpreadsheetsList):
print "Insert row succeeded."
else:
print "Insert row failed."
def run():
filepath = '/Users/tomsmith/Downloads/alarms.csv.Active BA Alarms (8).csv'
data = get_data() # or ... data = get_data_from_file("C:/myfolder/mycsv.csv")
#Strip the first junky stuff off...
data = match( data, '"Time of last Change.*')[0]
# I chose to add the field headers by hand. You can do this on the fly, but
# I found a bug if they had uppercase letters. Grr!
fields = ["Time of last Change","Category","Technical Description","Status","Priority","Alarm Value","Alarm Message",]
# Write header row
for f,field in enumerate(fields):
_CellsUpdateAction(1,f+1,field,spreadsheet_key,worksheet_id)
## Now write the data, cell by cell
data = data.split("\n")
for l, line in enumerate(data):
if l == 0:
pass #the header line
else:
items = line.split("\t")
the_dict = {}
print "Line:", l #remember, zero-based
for i, item in enumerate(items):
#Agh, line 153 in the data doesn't have enough items
try:
the_value = item.replace('"', '') #Strip quotes off the beginning/end
#print the_value
header_name = fields[i].lower().replace(" ", "").replace("'", "")
the_dict[ header_name ] = str(the_value)
_CellsUpdateAction(l+2,i+1,the_value,spreadsheet_key,worksheet_id)
except Exception, err:
#print "\t Line", l, "only has", len(item), "items", items
#print err
pass
#time.sleep(1) #Give Google a chance to catch up a bit
if __name__ == '__main__':
#show_my_spreadsheets()
run( )
The End Result
Is shocking really. There's no error checking and you have to put your credentials in for it to work... but it works! And it means Estates can continue connecting data from various applications into one visualisation dashboard. Hopefully more on that later.
I believe that it should be possible to grab data using AppScript ( the pull approach ) but I was beaten by unicode text formats and rudimentary JavaScript skills. This approach does require the CSV file to be available online, which is a difficulty, a complication or at best, a security challenge.
Tom,
ReplyDeleteI am curious as to why you didn't just use the Import function in Google Spreadsheets to import the CSV file?
Saqib
I did.
ReplyDeleteAs I said, that was the plan and we had it working that way... BUT... and this is probably ignorance on my part ( of unicode file formats ) and Javascript.
The CSV file being generated was UTF-16 Little Endian, which meant that my regex/string code to strip off a large first part of the file and parse the CSV just didn't work... We were unable to either alter in Javascript or get the software generating the CSV to spit anything else out.
So doing it in python on the client machine just became a LOT quicker... it also means we could take a web server out of the equation AND not have a CSV lying around ( the data isn't sensitive, but still ).
Valuable info. Lucky me I found your blog by accident, and I am shocked why this accident did not happened earlier! I bookmarked it & use your script on my blog's page find : Intraday Nifty Tips
ReplyDeleteFYI, to use InsertRow, three things must be true:
ReplyDelete1. The spreadsheet/worksheet must exist and be empty
2. The column headings must be filled with the cell-by-cell method
3. The column headings/dict keys must be all lower case!
InsertRow has two advantages. First, it's much faster than cell-by-cell. Second, it automatically adds rows when you run out.
Also, you can get the spreadsheet and worksheet keys via their names like this:
def get_spreadsheet_key(gc, ss_name):
''' Get the Google Docs Spreadsheet key based on the name
'''
q = gdata.spreadsheet.service.DocumentQuery()
q['title'] = ss_name
q['title-exact'] = 'true'
feed = gc.GetSpreadsheetsFeed(query=q)
key = feed.entry[0].id.text.rsplit('/')[-1]
return key
def get_worksheet_key(gc, ssk, ws_name):
''' Get the Google Docs Worksheet key based on the name
'''
ws_feed = gc.GetWorksheetsFeed(ssk)
for i in range(len(ws_feed.entry)):
if ws_feed.entry[i].title.text == ws_name:
key = ws_feed.entry[i].id.text.rsplit('/')[-1]
return key
return None
Is there a way to import the data from a CSV file on a local drive into a Google spreadsheet automatically. The name and location of the file remain the same but the data changes once everyday.
ReplyDeleteYes
ReplyDeleteHello Tom, is there any way for getting xlxs data into Google Spreadsheets automatically? I'm trying to convert gmail attachments to spreadsheets. But it seems that no existing class or function can do that. Do you have any good idea? Thanks.
ReplyDeleteIn the olden days I used to use xlrd, a python library.
ReplyDeletehttp://www.python-excel.org/
There's probably a better way by now though, maybe read the xlxs spreadsheet and create a new file with the API?
I really like you post.Thanks for sharing.
ReplyDeletegclub casino
goldenslot
Gclub จีคลับ
Nice blog has been shared by you. before i read this blog i didn't have any knowledge about this but now i got some knowledge. McAfee Antivirus software then just go through the link here. Click Here:- https://www.linkedin.com/company/mcafee-antivirus-toll-free-number
ReplyDeleteThanks for such a great information and to provide us. We have the finest Intuit certified technicians and If you're facing trouble in your QuickBooks accounting software then just go through the link here. Click Here:- QuickBooks Technical Support
ReplyDeleteVery Nice Blog I Read Your Post Its Amazing It Is Very Interesting post Thank You For Post then just go through the click here :-Buying a Old Car
ReplyDeleteNice...
ReplyDeletequickbooks intuit number
quickbooks intuit support number
intuit quickbooks number
quickbooks intuit software
quickbooks payroll service
quickbooks payroll service number
quickbooks payroll support number
quickbooks payroll support phone number
quickbooks bookkeeping services
virtual bookkeeping services
remote bookkeeping serivices
Professional bookkeeping serviecs