Posted 01/03/2016, 16:56
Zipus
 
Incorrect string format

Hi, I'm a PHP programmer and I started with Python barely a week ago because I have to complete a few projects. I would appreciate your help with the following:

I'm trying to build a web scraper that, once the data has been extracted, writes it to an Excel file. The problem is that the data arrives in XML format inside an array, and I don't know how to replace the characters so that they are valid for Excel. I know the script could be written in far fewer lines, but I don't have the experience yet.

I have tried applying month.replace("[","").replace("(","").replace(")","").replace(")", "")
I have also tried something similar with re.sub and with lstrip and rstrip, but nothing seems to work.
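
One likely reason none of these attempts has any effect: judging by the output shown further down, each xpath call returns a list such as ['MAR 16'], so the brackets and quotes are only how Python prints a list, not characters inside the string. A minimal sketch, using hard-coded sample values in place of the scraped ones (the `month` tuple here is an assumption copied from that output):

```python
# Sample data shaped like the scraped result: a tuple of one-element lists.
month = (['MAR 16'], ['MAY 16'], ['JUL 16'], ['SEP 16'], ['DEC 16'])

# The container is a tuple and each item is a list, so str methods do not apply.
print(type(month).__name__)      # tuple
print(type(month[0]).__name__)   # list

# Indexing into the inner list yields the plain string, with no brackets to strip.
first_month = month[0][0]
print(first_month)               # MAR 16

# The same unwrapping for every row at once.
months = [cell[0] for cell in month]
print(months)
```

If that assumption holds, assigning `month[0][0]` rather than `month[0]` to a cell would hand Excel a plain string.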


Python code:
from lxml import html
import requests
from openpyxl import Workbook
wb = Workbook()
ws = wb.active

def Volume( url ):
    tree = html.fromstring(requests.get(url).content)
    month = tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[1]/th/span/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[2]/th/span/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[3]/th/span/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[4]/th/span/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[5]/th/span/text()')
    tvol = tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[1]/td[4]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[2]/td[4]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[3]/td[4]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[4]/td[4]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[5]/td[4]/text()')
    atclose = tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[1]/td[11]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[2]/td[11]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[3]/td[11]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[4]/td[11]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[5]/td[11]/text()')
    change = tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[1]/td[12]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[2]/td[12]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[3]/td[12]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[4]/td[12]/text()'), tree.xpath('//*[@id="volumeDetailProductTable"]/tbody/tr[5]/td[12]/text()')

    if url == 'http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/corn_quotes_volume_voi.html':
        ws['A2']=month[0]
        ws['A3']=month[1]
        ws['A4']=month[2]
        ws['A5']=month[3]
        ws['A6']=month[4]

        ws['B2']=tvol[0]
        ws['B3']=tvol[1]
        ws['B4']=tvol[2]
        ws['B5']=tvol[3]
        ws['B6']=tvol[4]

        ws['C2']=atclose[0]
        ws['C3']=atclose[1]
        ws['C4']=atclose[2]
        ws['C5']=atclose[3]
        ws['C6']=atclose[4]

        ws['D2']=change[0]
        ws['D3']=change[1]
        ws['D4']=change[2]
        ws['D5']=change[3]
        ws['D6']=change[4]
    elif url == 'http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/wheat_quotes_volume_voi.html':
        ws['A8']=month[0]
        ws['A9']=month[1]
        ws['A10']=month[2]
        ws['A11']=month[3]
        ws['A12']=month[4]

        ws['B8']=tvol[0]
        ws['B9']=tvol[1]
        ws['B10']=tvol[2]
        ws['B11']=tvol[3]
        ws['B12']=tvol[4]

        ws['C8']=atclose[0]
        ws['C9']=atclose[1]
        ws['C10']=atclose[2]
        ws['C11']=atclose[3]
        ws['C12']=atclose[4]

        ws['D8']=change[0]
        ws['D9']=change[1]
        ws['D10']=change[2]
        ws['D11']=change[3]
        ws['D12']=change[4]

    return month[1],tvol[2],atclose[4],change[4]

ws['A1']="Vencimiento"
ws['B1']="Total Volumen"
ws['C1']="Op. Int. At Close"
ws['D1']="Op. Int. Change"

print Volume('http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/corn_quotes_volume_voi.html')

print Volume('http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/wheat_quotes_volume_voi.html')

wb.save('balances.xlsx')
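
Since the two branches of the script differ only in the starting row, the cell assignments could probably be driven by a loop. A rough sketch under stated assumptions: `cells_for` and `START_ROW` are hypothetical names, a plain dict stands in for the openpyxl worksheet, and each scraped item is taken to be a one-element list that is unwrapped with `[0]`:

```python
# Hypothetical mapping from product to the first row used for its data.
START_ROW = {'corn': 2, 'wheat': 8}

def cells_for(product, columns):
    """Return {cell_name: value} for columns A to D, starting at the product's row.

    `columns` is a sequence of columns, each a sequence of one-element lists,
    which is the shape the xpath calls appear to return.
    """
    cells = {}
    first_row = START_ROW[product]
    for col_letter, column in zip('ABCD', columns):
        for offset, item in enumerate(column):
            # item looks like ['MAR 16']; take the string inside it.
            cells['%s%d' % (col_letter, first_row + offset)] = item[0]
    return cells

# Shortened sample data: two columns (month, total volume), two rows each.
sample = (
    (['MAR 16'], ['MAY 16']),
    (['23,036'], ['144,696']),
)
layout = cells_for('corn', sample)
print(sorted(layout.items()))
```

With a real worksheet, each pair could then be written with the same `ws[name] = value` assignment the script already uses.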

The output of running the Volume function once, showing month, tvol, atclose and change, is the following:

Console output:
((['MAR 16'], ['MAY 16'], ['JUL 16'], ['SEP 16'], ['DEC 16']), (['23,036'], ['144,696'], ['48,048'], ['18,267'], ['21,333']), (['16,103'], ['656,971'], ['287,740'], ['126,250'], ['175,102']), (['-7,746'], ['4,999'], ['3,182'], ['2,095'], ['2,315']))
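
The numeric columns in this output are text with thousands separators, so even once each string is taken out of its list it would still be stored in the sheet as text. If real numbers are wanted, a small conversion step could help. This is only a sketch: `to_int` is a hypothetical helper, and it assumes the values always use a comma as the thousands separator:

```python
def to_int(text):
    """Convert a string like '-7,746' to the integer -7746."""
    return int(text.replace(',', ''))

# Sample values copied from the output above.
tvol = (['23,036'], ['144,696'], ['48,048'], ['18,267'], ['21,333'])
change = (['-7,746'], ['4,999'], ['3,182'], ['2,095'], ['2,315'])

# Unwrap each one-element list, then convert the string inside it.
tvol_numbers = [to_int(item[0]) for item in tvol]
change_numbers = [to_int(item[0]) for item in change]

print(tvol_numbers)
print(change_numbers)
```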

The error in the console is the following:
Console output:
zipus@zipus-linux ~/Escritorio $ python scrapper.py
Traceback (most recent call last):
  File "scrapper.py", line 71, in <module>
    print Volume('http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/corn_quotes_volume_voi.html')
  File "scrapper.py", line 16, in Volume
    ws['A2']=month[0]
  File "/usr/local/lib/python2.7/dist-packages/openpyxl/worksheet/worksheet.py", line 342, in __setitem__
    self[key].value = value
  File "/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.py", line 305, in value
    self._bind_value(value)
  File "/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.py", line 209, in _bind_value
    raise ValueError("Cannot convert {0} to Excel".format(value))
ValueError: Cannot convert ['MAR 16'] to Excel

Regards!