{"id":3740,"date":"2016-02-29T05:00:02","date_gmt":"2016-02-29T10:00:02","guid":{"rendered":"http:\/\/kodegeek.com\/blog\/?p=3740"},"modified":"2016-02-28T18:40:22","modified_gmt":"2016-02-28T23:40:22","slug":"usando-formatos-de-archivo-a-la-medida-en-python-3","status":"publish","type":"post","link":"http:\/\/kodegeek.com\/blog\/2016\/02\/29\/usando-formatos-de-archivo-a-la-medida-en-python-3\/","title":{"rendered":"Saving data with custom file formats in Python 3"},"content":{"rendered":"<p>In Venezuela, one of the few ways to find out the exchange rate between the &#8220;parallel&#8221; dollar and the Bol\u00edvar fuerte is the &#8220;DolarToday&#8221; portal. The website (https:\/\/dolartoday.com\/historico-dolar\/) offers data from 2010 to the present showing the exchange rate between the two currencies.<\/p>\n<p>In a fit of idleness, I decided to download the Excel file with the exchange rates, exported it to CSV, and from there wrote a Python 3 program that does the following:<\/p>\n<ul>\n<li>Extends the Python &#8216;dict&#8217; class to keep a dictionary with sorted keys that can save itself to disk and load itself back<\/li>\n<li>Uses struct.Struct to write and read binary data (I also show how to read a gzip-compressed file).<\/li>\n<\/ul>\n<p>The code follows:<\/p>\n<pre lang=\"python\">\r\n#!\/usr\/bin\/env python3\r\n# Simple program to save the 'Dolartoday' unofficial dollar rates from CSV to a custom format\r\n# @author Jose Vicente Nunez, josevnz@kodegeek.com\r\nimport gzip\r\nimport re\r\nimport struct\r\nfrom datetime import datetime\r\nfrom optparse import OptionParser\r\n\r\nclass DollarToday:\r\n    \r\n    def __init__(self, ddate, value):\r\n        if isinstance(ddate, datetime):\r\n            self.__date = ddate\r\n        else:\r\n            self.__date = 
datetime.strptime(ddate, \"%m-%d-%Y\")\r\n        self.__value = float(value)\r\n        assert self.__value > 0.0, \"{} is not a valid rate for date {}!\".format(value, self.__date)\r\n        \r\n    @property\r\n    def date(self):\r\n        return self.__date\r\n    \r\n    @date.setter\r\n    def date(self, date):\r\n        assert isinstance(date, datetime), \"Invalid date {}\".format(date)\r\n        self.__date = date\r\n    \r\n    @property\r\n    def value(self):\r\n        return self.__value\r\n    \r\n    def __str__(self):\r\n        return \"DollarToday[date={}, value={}]\".format(self.__date, self.__value)\r\n    \r\n    def __hash__(self):\r\n        # __hash__ must return an int, not a str\r\n        return id(self)\r\n            \r\nclass DollarCollection(dict):\r\n    \r\n    __FILE_MAGIC = b\"DLR\\x00\" # I made this magic number up...\r\n    __FILE_VERSION = b\"\\x00\\x01\"\r\n    # Little-endian record: 4-byte int (date ordinal) followed by an 8-byte double (rate)\r\n    __dollarStruct = struct.Struct(\"&lt;id\")\r\n    \r\n    def values(self):\r\n        for dateId in sorted(self.keys()):\r\n            yield self[dateId]\r\n    \r\n    def items(self):\r\n        # Sorted as well, so items() agrees with values() and __iter__()\r\n        for dateId in sorted(self.keys()):\r\n            yield (dateId, self[dateId])\r\n    \r\n    def __iter__(self):\r\n        for dateId in sorted(super().keys()):\r\n            yield dateId\r\n\r\n    def save(self, file):\r\n        # The context manager closes the file even if a write fails\r\n        with gzip.open(file, \"wb\") as fh:\r\n            fh.write(self.__FILE_MAGIC)\r\n            fh.write(self.__FILE_VERSION)\r\n            for dollar in self.values():\r\n                fh.write(\r\n                    self.__dollarStruct.pack(\r\n                        dollar.date.toordinal(),\r\n                        dollar.value\r\n                    )\r\n                )\r\n\r\n    def 
readBinary(self, file, verbose=False):\r\n        with gzip.open(file, \"rb\") as fh:\r\n            magic = fh.read(len(self.__FILE_MAGIC))\r\n            if magic != self.__FILE_MAGIC:\r\n                # Python 3 can only raise exception instances, not plain strings\r\n                raise ValueError(\"File doesn't look like a KodeGeek.com binary file!\")\r\n            version = fh.read(len(self.__FILE_VERSION))\r\n            if version > self.__FILE_VERSION:\r\n                raise ValueError(\"Unsupported file version: {}, expected {}\".format(version, self.__FILE_VERSION))\r\n            self.clear()\r\n            while True:\r\n                data = fh.read(self.__dollarStruct.size)\r\n                if len(data) == 0:\r\n                    break\r\n                numbers = self.__dollarStruct.unpack(data)\r\n                #if verbose:\r\n                #    print(\"{}\".format(\",\".join([str(x) for x in numbers])))\r\n                dolar = DollarToday(\r\n                    datetime.fromordinal(numbers[0]),\r\n                    numbers[1]\r\n                )\r\n                self[dolar.date] = dolar\r\n\r\n    def readCsv(self, file):\r\n        tempMap = {}\r\n        with open(file, \"r\", encoding=\"UTF-8\") as fh:\r\n            for line in fh:\r\n                # 6-23-2010       9.92\r\n                (date, value) = line.strip().split(\"\\t\")\r\n                # Skip the header row ('Fecha' is the date column title)\r\n                if re.match(\"Fecha\", date):\r\n                    continue\r\n                try:\r\n                    dolVsBol = DollarToday(date, value)\r\n                    tempMap[dolVsBol.date] = dolVsBol\r\n                except (Exception) as err:\r\n                    print(err)\r\n        # Replace the current contents only after the whole file has been read\r\n        self.clear()\r\n        self.update(tempMap)\r\n\r\n    def 
__str__(self):\r\n        return \"Records={},\\n{}\".format(len(self), \",\\n\".join([str(dollar) for dollar in self.values()]))\r\n\r\ndef main(options):\r\n\r\n    verbose = options.verbose\r\n    dc = DollarCollection()\r\n    if options.report is None:\r\n        dc.readCsv(options.read)\r\n        dc.save(options.write)\r\n    else:\r\n        dc.readBinary(options.report, verbose)\r\n    if verbose:\r\n        print(\"{}\".format(dc))\r\n\r\nif __name__ == \"__main__\":\r\n    \r\n    usagetext = \"\"\"\r\n%prog --read csv.file --write binary.file\r\n\r\nOr:\r\n\r\n%prog --report binary.file\r\n\r\n\"\"\"\r\n\r\n    op = OptionParser(usage=usagetext)\r\n    op.add_option(\r\n                  \"-r\", \"--read\",\r\n                  action=\"store\", \r\n                  dest=\"read\", \r\n                  help=\"Full path to the source CSV file; each line is a tab-separated date and value\")\r\n    op.add_option(\r\n                  \"-w\", \"--write\", \r\n                  action=\"store\", \r\n                  dest=\"write\", \r\n                  help=\"Full path where the file is saved in the new binary format\")\r\n    op.add_option(\r\n                  \"-p\", \"--report\", \r\n                  action=\"store\", \r\n                  dest=\"report\",\r\n                  help=\"Load the binary file into memory. 
Not compatible with --read and --write\")\r\n    op.add_option(\r\n                  \"-v\", \"--verbose\", \r\n                  action=\"store_true\", \r\n                  default=False, \r\n                  dest=\"verbose\", \r\n                  help=\"Print the values to the screen\")\r\n    \r\n    (options, values) = op.parse_args()\r\n    # Fail early with a usage message instead of crashing on a missing path\r\n    if options.report is None and (options.read is None or options.write is None):\r\n        op.error(\"Provide either --report, or both --read and --write\")\r\n\r\n    main(options)\r\n<\/pre>\n<p>An example of how to run it:<\/p>\n<pre lang=\"bash\">\r\nDolarToday.py --read \/Users\/josevnz\/Documents\/dolartoday.csv --write \/Users\/josevnz\/Documents\/dolartoday.jose --verbose\r\nDolarToday.py --report \/Users\/josevnz\/Documents\/dolartoday.jose --verbose\r\n<\/pre>\n<p>And the output looks like this:<\/p>\n<pre lang=\"bash\">\r\nRecords=1917,\r\nDollarToday[date=2010-06-23 00:00:00, value=9.92],\r\nDollarToday[date=2010-06-25 00:00:00, value=8.05],\r\nDollarToday[date=2010-06-26 00:00:00, value=8.05],\r\nDollarToday[date=2010-06-27 00:00:00, value=7.91],\r\nDollarToday[date=2010-06-28 00:00:00, value=7.91],\r\nDollarToday[date=2010-06-29 00:00:00, value=7.92],\r\nDollarToday[date=2010-06-30 00:00:00, value=7.97],\r\nDollarToday[date=2010-07-01 00:00:00, value=7.97],\r\nDollarToday[date=2010-07-02 00:00:00, value=7.98],\r\n<\/pre>\n<p>If only the government of Nicol\u00e1s Maduro were as transparent as my programs \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Venezuela, one of the few ways to find out the exchange rate between the &#8220;parallel&#8221; dollar and the Bol\u00edvar fuerte is the &#8220;DolarToday&#8221; portal. The website (https:\/\/dolartoday.com\/historico-dolar\/) offers data from 2010 to the present showing the exchange rate between the two currencies. 
In a fit of idleness, I decided <a class=\"read-more\" href=\"http:\/\/kodegeek.com\/blog\/2016\/02\/29\/usando-formatos-de-archivo-a-la-medida-en-python-3\/\">[&hellip;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[438,239,14],"tags":[791,790,765,789],"_links":{"self":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/3740"}],"collection":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/comments?post=3740"}],"version-history":[{"count":4,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/3740\/revisions"}],"predecessor-version":[{"id":3744,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/3740\/revisions\/3744"}],"wp:attachment":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/media?parent=3740"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/categories?post=3740"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/tags?post=3740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}