{"id":4631,"date":"2016-10-30T21:22:01","date_gmt":"2016-10-31T01:22:01","guid":{"rendered":"http:\/\/kodegeek.com\/blog\/?p=4631"},"modified":"2016-10-30T21:22:01","modified_gmt":"2016-10-31T01:22:01","slug":"mis-primeros-pasos-con-mongodb","status":"publish","type":"post","link":"http:\/\/kodegeek.com\/blog\/2016\/10\/30\/mis-primeros-pasos-con-mongodb\/","title":{"rendered":"Mis primeros pasos con MongoDB"},"content":{"rendered":"<p>Para ser honesto yo ya hab\u00eda instalado MongoDB hace a\u00f1os, pero no es sino hasta ahora que me ha tocado un proyecto en el cual creo que le puedo sacar mucho provecho. En vista de esto, me sent\u00e9 a jugar un poco con la herramienta.<\/p>\n<p>La Internet cuenta con muchos lugares de donde podemos bajar juegos de datos, en este caso <a href=\"https:\/\/data.cityofnewyork.us\/Health\/Most-Popular-Baby-Names-by-Sex-and-Mother-s-Ethnic\/25th-nujf\" target=\"_blank\">me decid\u00ed bajarme la lista de los nombres de bebes m\u00e1s populares en el estado de Nueva York entre los a\u00f1os 2011 y 2014<\/a>, en formato <a href=\"https:\/\/data.cityofnewyork.us\/api\/views\/25th-nujf\/rows.json?accessType=DOWNLOAD\" target=\"_blank\">JSON<\/a>.<\/p>\n<pre lang=\"Javascript\">\r\n{\r\n  \"meta\" : {\r\n    \"view\" : {\r\n      \"id\" : \"25th-nujf\",\r\n      \"name\" : \"Most Popular Baby Names by Sex and Mother's Ethnic Group, New York City\",\r\n      \"attribution\" : \"Department of Health and Mental Hygiene (DOHMH)\",\r\n      \"averageRating\" : 0,\r\n      \"category\" : \"Health\",\r\n      \"createdAt\" : 1382724894,\r\n      \"description\" : \"The most popular baby names by sex and mother's ethnicity in New York City.\",\r\n      \"displayType\" : \"table\",\r\n      \"downloadCount\" : 4328,\r\n      \"hideFromCatalog\" : false,\r\n      \"hideFromDataJson\" : false,\r\n      \"indexUpdatedAt\" : 1465427458,\r\n      \"newBackend\" : false,\r\n...\r\n  },\r\n  \"data\" : [ [ 1, \"EB6FAA1B-EE35-4D55-B07B-8E663565CCDF\", 1, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"GERALDINE\", \"13\", \"75\" ]\r\n, [ 2, \"2DBBA431-D26F-40A1-9375-AF7C16FF2987\", 2, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"GIA\", \"21\", \"67\" ]\r\n, [ 3, \"54318692-0577-4B21-80C8-9CAEFCEDA8BA\", 3, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"GIANNA\", \"49\", \"42\" ]\r\n, [ 4, \"17C1236A-5778-412D-8DC9-94EBC01BB9A1\", 4, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"GISELLE\", \"38\", \"51\" ]\r\n, [ 5, \"F53CF696-A3F4-4EC3-8DFD-10C0A111B2D8\", 5, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"GRACE\", \"36\", \"53\" ]\r\n, [ 6, \"6615893F-39B8-440C-98D3-5A37CCF1C44B\", 6, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"GUADALUPE\", \"26\", \"62\" ]\r\n, [ 7, \"CC9BE461-34B8-4BD7-BEF2-BDB23CA1ADC6\", 7, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"HAILEY\", \"126\", \"8\" ]\r\n, [ 8, \"4EA2FFD4-1B1D-4859-A5C2-045949E3FD36\", 8, 1386853125, \"399231\", 1386853125, \"399231\", \"{\\n}\", \"2011\", \"FEMALE\", \"HISPANIC\", \"HALEY\", \"14\", \"74\" ]\r\n]}\r\n<\/pre>\n<p>Antes de importar el archivo en Mongo, hay que darle un masaje a los datos, para ignorar la metadata y para separar cada entrada en el archivo para que luzca como un documento individual, sobre el cual podremos hacer agregaci\u00f3n.<\/p>\n<p>Primero les muestro como arrancar MongoDB, importar los datos (no muestro como bajarse el archivo pero es trivial) y luego corremos unos cuantos consultas:<\/p>\n<pre lang=\"Bash\">\r\n#!\/bin\/bash\r\nexport PATH=$PATH:\/Users\/josevnz\/mongo\/mongodb-osx-x86_64-3.2.10\/bin\r\nmongod --fork --logpath \/Users\/josevnz\/mongo\/mongod.log --noauth --rest --pidfilepath \/Users\/josevnz\/mongo\/mongod.pid --dbpath \/Users\/josevnz\/mongo\/data\r\n<\/pre>\n<p>El programa para masajear los datos, en Python:<\/p>\n<pre lang=\"Python\">\r\n#!\/usr\/bin\/env python\r\n# josevn at kodegeek.com\r\nimport json\r\nimport sys\r\nfrom pprint import pprint\r\nif len(sys.argv) < 2:\r\n  raise ValueError(\"Missing JSON file...\")\r\n\r\n'''\r\n\/\/ From this (array of arrays):\r\n{\r\n    \"data\": [\r\n        [\r\n            1,\r\n            \"EB6FAA1B-EE35-4D55-B07B-8E663565CCDF\",\r\n            1,\r\n            1386853125,\r\n            \"399231\",\r\n            1386853125,\r\n            \"399231\",\r\n            \"{\\n}\",\r\n            \"2011\",\r\n            \"FEMALE\",\r\n            \"HISPANIC\",\r\n            \"GERALDINE\",\r\n            \"13\",\r\n            \"75\"\r\n        ],\r\n        [\r\n          ...\r\n        ]\r\n    ],\r\n    \"metadata\" : {\r\n...\r\n    }\r\n}\r\n \/\/ To this individual maps, no metadata and no extra attributes:\r\n      {\r\n       \"key\": 1,\r\n       \"id\": \"EB6FAA1B-EE35-4D55-B07B-8E663565CCDF\",\r\n       \"year\": \"2011\",\r\n       \"gender\": \"FEMALE\",\r\n       \"ethnicity\": \"HISPANIC\",\r\n       \"name\": \"GERALDINE\",\r\n       \"cnt\": 13\r\n      },\r\n      {\r\n       ...\r\n      }\r\n'''\r\nwith open(sys.argv[1]) as data_file:\r\n  data = json.load(data_file)\r\n  for item in data[\"data\"]:\r\n    newitem = {}\r\n    newitem['key'] = item[0]\r\n    newitem['id'] = item[1]\r\n    newitem['year'] = int(item[8])\r\n    newitem['gender'] = item[9]\r\n    newitem['ethnicity'] = item[10]\r\n    newitem['name'] = item[11]\r\n    newitem['cnt'] = int(item[12])\r\n    print(json.dumps(newitem))\r\n<\/pre>\n<p>Luego importamos los datos y nos conectamos listos para correr un par de consultas:<\/p>\n<pre lang=\"Bash\">\r\n.\/jsonMap.py ~\/Downloads\/NY.babynames.json > ~\/Downloads\/NY.babynames.pretty.json\r\nmongoimport --db ny --collection babynames --file ~\/Downloads\/NY.babynames.pretty.json --upsert\r\nmongo localhost:27017\/ny\r\n<\/pre>\n<p>Me encanta que no tuve que crear una base de datos, o una tabla en SQL :-). El documento se describe s\u00f3lo y ahora estoy listo para hacer consultas:<\/p>\n<pre lang=\"Javascript\">\r\n\/\/ \u00bfCuantos bebes hispanos entre el 2011 y el 2014?\r\n> db.babynames.find({ \"ethnicity\": \"HISPANIC\"}).count()\r\n4254\r\n\/\/ Muestre los total agrupando por raza, sexo y a\u00f1o de mayor  a menor\r\n> db.babynames.aggregate([ { $group: { _id: { ethnicity: \"$ethnicity\", gender: \"$gender\", year: \"$year\" }, total: { $sum: \"$cnt\"} } }, {$sort: {total:-1}} ])\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2011 }, \"total\" : 56236 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2011 }, \"total\" : 54392 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2011 }, \"total\" : 45120 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2011 }, \"total\" : 41532 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2011 }, \"total\" : 24540 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACIFIC ISLANDER\", \"gender\" : \"MALE\", \"year\" : 2011 }, \"total\" : 18580 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2011 }, \"total\" : 17624 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2014 }, \"total\" : 14831 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2013 }, \"total\" : 14537 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISP\", \"gender\" : \"MALE\", \"year\" : 2012 }, \"total\" : 14273 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2012 }, \"total\" : 13809 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACIFIC ISLANDER\", \"gender\" : \"FEMALE\", \"year\" : 2011 }, \"total\" : 13672 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2013 }, \"total\" : 13312 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2014 }, \"total\" : 13126 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2014 }, \"total\" : 12884 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISP\", \"gender\" : \"FEMALE\", \"year\" : 2012 }, \"total\" : 12402 }\r\n{ \"_id\" : { \"ethnicity\" : \"WHITE NON HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2013 }, \"total\" : 12303 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2013 }, \"total\" : 9755 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2012 }, \"total\" : 9738 }\r\n{ \"_id\" : { \"ethnicity\" : \"HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2014 }, \"total\" : 9729 }\r\nType \"it\" for more\r\n> it\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISP\", \"gender\" : \"MALE\", \"year\" : 2012 }, \"total\" : 5965 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACI\", \"gender\" : \"MALE\", \"year\" : 2012 }, \"total\" : 5962 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2013 }, \"total\" : 5866 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISPANIC\", \"gender\" : \"MALE\", \"year\" : 2014 }, \"total\" : 5702 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACIFIC ISLANDER\", \"gender\" : \"MALE\", \"year\" : 2014 }, \"total\" : 5636 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACIFIC ISLANDER\", \"gender\" : \"MALE\", \"year\" : 2013 }, \"total\" : 5281 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACI\", \"gender\" : \"FEMALE\", \"year\" : 2012 }, \"total\" : 4338 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2013 }, \"total\" : 4278 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISPANIC\", \"gender\" : \"FEMALE\", \"year\" : 2014 }, \"total\" : 4255 }\r\n{ \"_id\" : { \"ethnicity\" : \"BLACK NON HISP\", \"gender\" : \"FEMALE\", \"year\" : 2012 }, \"total\" : 4243 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACIFIC ISLANDER\", \"gender\" : \"FEMALE\", \"year\" : 2014 }, \"total\" : 4198 }\r\n{ \"_id\" : { \"ethnicity\" : \"ASIAN AND PACIFIC ISLANDER\", \"gender\" : \"FEMALE\", \"year\" : 2013 }, \"total\" : 4012 }\r\n<\/pre>\n<p>Estoy emocionado con la herramienta, pienso usarla para ver los datos en mi trabajo de manera distinta.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Para ser honesto yo ya hab\u00eda instalado MongoDB hace a\u00f1os, pero no es sino hasta ahora que me ha tocado un proyecto en el cual creo que le puedo sacar mucho provecho. En vista de esto, me sent\u00e9 a jugar un poco con la herramienta. La Internet cuenta con muchos lugares de donde podemos bajar <a class=\"read-more\" href=\"http:\/\/kodegeek.com\/blog\/2016\/10\/30\/mis-primeros-pasos-con-mongodb\/\">[&hellip;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[194],"tags":[808,734,807,765],"_links":{"self":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/4631"}],"collection":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/comments?post=4631"}],"version-history":[{"count":5,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/4631\/revisions"}],"predecessor-version":[{"id":4636,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/posts\/4631\/revisions\/4636"}],"wp:attachment":[{"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/media?parent=4631"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/categories?post=4631"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/kodegeek.com\/blog\/wp-json\/wp\/v2\/tags?post=4631"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}