{"id":3763,"date":"2016-03-14T06:00:31","date_gmt":"2016-03-14T10:00:31","guid":{"rendered":"http:\/\/kodegeek.com\/blog\/?p=3763"},"modified":"2016-03-15T04:22:02","modified_gmt":"2016-03-15T08:22:02","slug":"usando-data-pipes-y-coroutines-en-python","status":"publish","type":"post","link":"http:\/\/kodegeek.com\/blog\/2016\/03\/14\/usando-data-pipes-y-coroutines-en-python\/","title":{"rendered":"Using &#8216;data pipes and coroutines&#8217; in Python"},"content":{"rendered":"<p>The concept is similar to using &#8216;pipes&#8217; in UNIX. For example, in UNIX we can combine several tools to filter the contents of a text file. What does the following command do?<\/p>\n<pre lang=\"bash\">\r\n# List the lines that mention 'root' in the Linux server's password file\r\ncat \/etc\/passwd | egrep -i root\r\n<\/pre>\n<p>In Python 3 we can do something like the following:<\/p>\n<pre lang=\"python\">\r\n#!\/usr\/bin\/env python3\r\n# pipeline.py \/etc\/passwd root\r\nimport sys, re\r\n\r\ndef grep(expression):\r\n    while True:\r\n        text = (yield) # Wait for the text sent by the other coroutine\r\n        if re.search(expression, text):\r\n            print(text)\r\n\r\ndef cat(file, target):\r\n    next(target) # Prime the other end of the pipe\r\n    with open(file, \"r\") as fh:\r\n        for line in fh:\r\n            target.send(line.strip()) # Send the line to the coroutine\r\n\r\ndef main(args):\r\n    cat(args[0], grep(args[1])) # The 'pipe' is built from the outside in\r\n\r\nif __name__ == \"__main__\":\r\n    main(sys.argv[1:])\r\n<\/pre>\n<p>As another example, let's modify the program that reads the DolarToday data file so that it can filter by start date, end date, and maximum and minimum values. 
Note that the filters are optional; if none are applied, all the data is returned (to keep the example simple, I removed the code that converts the data to the binary format):<\/p>\n<pre lang=\"python\">\r\n#!\/usr\/bin\/env python3\r\n# Simple program to save the 'Dolartoday' extra official dollar rates from CSV to a custom format\r\n# Revisited to use data pipes, coroutines\r\n# @author Jose Vicente Nunez, josevnz@kodegeek.com\r\nimport gzip, struct\r\nfrom optparse import OptionParser\r\nfrom datetime import datetime\r\n\r\nclass DollarToday:\r\n    \r\n    __slots__ = (\"__date\", \"__value\")\r\n    \r\n    def __init__(self, ddate, value):\r\n        if isinstance(ddate, datetime):\r\n            self.__date = ddate\r\n        else:\r\n            self.__date = datetime.strptime(ddate, \"%m-%d-%Y\")\r\n        self.__value = float(value)\r\n        assert self.__value > 0.0, \"{} for date {} is invalid!\".format(value, self.__date)\r\n        \r\n    @property\r\n    def date(self):\r\n        return self.__date\r\n    \r\n    @date.setter\r\n    def date(self, date):\r\n        assert isinstance(date, datetime), \"Invalid date {}\".format(date)\r\n        self.__date = date\r\n    \r\n    @property\r\n    def value(self):\r\n        return self.__value\r\n    \r\n    def __str__(self):\r\n        return \"DollarToday[date={}, value={}]\".format(self.__date, self.__value)\r\n    \r\n    def __hash__(self):\r\n        # __hash__ must return an int; identity-based, matching the default __eq__\r\n        return hash(id(self))\r\n            \r\nclass DollarCollection(dict):\r\n    \r\n    # Yield the stored records ordered by date\r\n    def values(self):\r\n        for dateId in sorted(self.keys()):\r\n            yield self[dateId]\r\n    \r\n    def items(self):\r\n        for dateId in self.keys():\r\n            yield (dateId, self[dateId])\r\n    \r\n    def __iter__(self):\r\n        for dateId in sorted(super().keys()):\r\n            yield dateId\r\n\r\n    def __str__(self):\r\n        return 
\"Records={},\\n{}\".format(len(self), \",\\n\".join([str(dollar) for dollar in self.values()]))\r\n\r\n# Pipe: Read from binary file\r\ndef readBinary(file, target):\r\n    next(target) # Prime the first coroutine in the pipe\r\n    FILE_MAGIC = b\"DLR\\x00\"\r\n    FILE_VERSION = b\"\\x00\\x01\"\r\n    dollarStruct = struct.Struct(\"<id \")\r\n    with gzip.open(file, \"rb\") as fh:\r\n        magic = fh.read(len(FILE_MAGIC))\r\n        if magic != FILE_MAGIC:\r\n            raise ValueError(\"File doesn't look like a KodeGeek.com binary file!\")\r\n        version = fh.read(len(FILE_VERSION))\r\n        if version > FILE_VERSION:\r\n            raise ValueError(\"Unsupported file version: {}, expected {}\".format(version, FILE_VERSION))\r\n        while True:\r\n            data = fh.read(dollarStruct.size)\r\n            if len(data) == 0:\r\n                break\r\n            numbers = dollarStruct.unpack(data)\r\n            target.send(DollarToday(datetime.fromordinal(numbers[0]), numbers[1]))\r\n\r\n# Pipe: Filter elements >= minimum\r\ndef min(min, target):\r\n    next(target)\r\n    while True:\r\n        dollar = (yield)\r\n        if dollar.value >= min:\r\n            target.send(dollar)\r\n\r\n# Pipe: Filter elements <= max\r\ndef max(max, target):\r\n    next(target)\r\n    while True:\r\n        dollar = (yield)\r\n        if dollar.value <= max:\r\n            target.send(dollar)\r\n\r\n# Pipe: filter from date\r\ndef from_date(fromd, target):\r\n    next(target)\r\n    while True:\r\n        dollar = (yield)\r\n        if dollar.date >= fromd:\r\n            target.send(dollar)\r\n\r\n# Pipe: filter up to date\r\ndef to_date(tod, target):\r\n    next(target)\r\n    while True:\r\n        dollar = (yield)\r\n        if dollar.date <= tod:\r\n            target.send(dollar)\r\n\r\n# Pipe: final stage, store each record in the collection\r\ndef adder(collection):\r\n    while True:\r\n        dollar = (yield)\r\n        collection[dollar.date] = dollar\r\n\r\ndef 
main(options):\r\n\r\n    dc = DollarCollection()\r\n    \r\n    pipeline = adder(dc) # Destination of the pipe\r\n    # Add pipes (if needed)\r\n    if options.min != 99999999:\r\n        pipeline = min(options.min, pipeline)\r\n    \r\n    if options.max != -1:\r\n        pipeline = max(options.max, pipeline)\r\n\r\n    if options.fromd is not None:\r\n        pipeline = from_date(datetime.strptime(options.fromd, \"%Y-%m-%d\"), pipeline)\r\n        \r\n    if options.tod is not None:\r\n        pipeline = to_date(datetime.strptime(options.tod, \"%Y-%m-%d\"), pipeline)\r\n\r\n    readBinary(options.report, pipeline) # Start of the pipe that reads the file contents\r\n \r\n    print(dc)\r\n\r\nif __name__ == \"__main__\":\r\n    \r\n    usagetext = \"\"\"\r\n%prog --report binary.file [--from YYYY-MM-DD] [--to YYYY-MM-DD] [--min amount] [--max amount]\r\n\"\"\"\r\n\r\n    op = OptionParser(usage=usagetext)\r\n    op.add_option(\r\n                  \"-p\", \"--report\",\r\n                  action=\"store\",\r\n                  dest=\"report\",\r\n                  help=\"Read the contents from the binary storage and generate a report.\")\r\n    op.add_option(\r\n                  \"-f\", \"--from\",\r\n                  action=\"store\",\r\n                  dest=\"fromd\",\r\n                  help=\"Optional filter. Start date yyyy-mm-dd\")\r\n    op.add_option(\r\n                  \"-t\", \"--to\",\r\n                  action=\"store\",\r\n                  dest=\"tod\",\r\n                  help=\"Optional filter. End date yyyy-mm-dd\")\r\n    op.add_option(\r\n                  \"-m\", \"--min\",\r\n                  action=\"store\",\r\n                  dest=\"min\",\r\n                  default=99999999,\r\n                  type=\"float\",\r\n                  help=\"Optional filter. 
Minimum amount\")\r\n    op.add_option(\r\n                  \"-M\", \"--max\",\r\n                  action=\"store\",\r\n                  dest=\"max\",\r\n                  default=-1,\r\n                  type=\"float\",\r\n                  help=\"Optional filter. Maximum amount\")\r\n    \r\n    (options, values) = op.parse_args()\r\n    if options.report is None:\r\n        op.error(\"--report is required\")\r\n    main(options)\r\n\r\n<\/pre>\n<p> <\/id><\/p>\n<p>Sample output:<\/p>\n<pre lang=\"bash\">\r\nDolarTodayReader.py --report \/Users\/josevnz\/Documents\/dolartoday.jose --min 1050 --from 2016-02-23 --to 2016-02-24\r\nRecords=2,\r\nDollarToday[date=2016-02-23 00:00:00, value=1060.0],\r\nDollarToday[date=2016-02-24 00:00:00, value=1071.19]\r\n<\/pre>\n<p>To wrap up, I recommend the following tutorial: <a href=\"http:\/\/www.dabeaz.com\/coroutines\/index.html\" target=\"_blank\">http:\/\/www.dabeaz.com\/coroutines\/index.html<\/a>. Note that it uses Python 2 syntax.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The concept is similar to using &#8216;pipes&#8217; in UNIX. For example, in UNIX we can combine several tools to filter the contents of a text file. What does the following command do? 
# List the lines that mention &#8216;root&#8217; in the Linux server&#8217;s password file cat \/etc\/passwd | egrep -i root In Python 3 we can do something <a class=\"read-more\" href=\"http:\/\/kodegeek.com\/blog\/2016\/03\/14\/usando-data-pipes-y-coroutines-en-python\/\">[&hellip;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[194,10,438,239],"tags":[791,801,765],"_links":{"self":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/3763"}],"collection":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/comments?post=3763"}],"version-history":[{"count":9,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/3763\/revisions"}],"predecessor-version":[{"id":3773,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/3763\/revisions\/3773"}],"wp:attachment":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/media?parent=3763"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/categories?post=3763"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/tags?post=3763"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}