I've been going round in circles on a problem and can't manage to solve it :(.
I have a text from which I build this string, which contains every word in the text together with its successor.
Code:
$ egg egg and and bacon bacon $ $ egg egg sausage sausage and and bacon bacon $ $ egg egg and and spam spam $ $ spam spam egg egg sausage sausage and and spam spam $ $ egg egg bacon bacon and and spam spam $ $ egg egg bacon bacon sausage sausage and and spam spam $ $ spam spam bacon bacon sausage sausage and and spam spam $ $ spam spam egg egg spam spam spam spam bacon bacon and and spam spam $ $ spam spam spam spam spam spam egg egg and and spam spam $ $ spam spam spam spam spam spam spam spam spam spam spam spam spam spam baked baked beans beans spam spam spam spam spam spam and and spam spam $ $ lobster lobster thermidor thermidor aux aux crevettes crevettes with with a a mornay mornay sauce sauce garnished garnished with with truffle truffle pate pate brandy brandy and and a a fried fried egg egg on on top top and and spam spam $
The $ symbols mark the beginning and end of each sentence. I get this string like this:
Code:
# ar and simbolos are defined earlier in the script (not shown here)
for frase in ar.split(simbolos[0]):
    # wrap each sentence in "$" start/end markers
    word = ["$"] + frase.split() + ["$"]
    # print every word followed by its successor
    for i in range(len(word)-1):
        print word[i], word[i+1]
So far so good, but what I want is a dictionary in which each entry holds a list of the successor words and the number of times each one appears.
So far, the only thing I have managed to get is this:
Code:
[(u'and', 12), (u'bacon', 6), (u'lobster', 1), (u'sausage', 4), ('$', 11), (u'crevettes', 1), (u'spam', 27), (u'top', 1), (u'thermidor', 1), (u'aux', 1), (u'sauce', 1), (u'brandy', 1), (u'pate', 1), (u'truffle', 1), (u'with', 2), (u'garnished', 1), (u'a', 2), (u'on', 1), (u'baked', 1), (u'mornay', 1), (u'beans', 1), (u'fried', 1), (u'egg', 9)]
This gives the number of times each word appears.
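For anyone reading along, a count like this can probably be reproduced with collections.Counter. This is only a sketch, not the code I actually used: it is written for Python 3 (my snippets above are Python 2), it assumes the sentences are already available as a list of strings, and count_words and menu are names made up for the example. The "$" marker is counted once per sentence, which is what the ('$', 11) entry above suggests.

```python
from collections import Counter


def count_words(sentences):
    """Count every token, adding one "$" start marker per sentence."""
    counts = Counter()
    for sentence in sentences:
        # "$" is added once per sentence so it is counted like a word
        counts.update(["$"] + sentence.split())
    return counts


# Hypothetical sample input, not the full menu text
menu = ["egg and bacon", "egg and spam", "spam spam egg"]
print(count_words(menu).most_common())
```

most_common() returns (word, count) pairs, the same shape as the list above, although sorted by count rather than in dictionary order.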
SOLVED
Here is the result :D
Code:
[(u'and', [(9, u'spam'), (2, u'bacon'), (1, u'a')]), (u'bacon', [(2, u'sausage'), (2, u'and'), (2, u'$')]), (u'lobster', [(1, u'thermidor')]), (u'sausage', [(4, u'and')]), ('$', [(5, u'spam'), (5, u'egg'), (1, u'lobster')]), (u'crevettes', [(1, u'with')]), (u'spam', [(11, u'spam'), (9, u'$'), (3, u'egg'), (2, u'bacon'), (1, u'baked'), (1, u'and')]), (u'top', [(1, u'and')]), (u'thermidor', [(1, u'aux')]), (u'truffle', [(1, u'pate')]), (u'sauce', [(1, u'garnished')]), (u'brandy', [(1, u'and')]), (u'pate', [(1, u'brandy')]), (u'aux', [(1, u'crevettes')]), (u'with', [(1, u'truffle'), (1, u'a')]), (u'garnished', [(1, u'with')]), (u'a', [(1, u'mornay'), (1, u'fried')]), (u'on', [(1, u'top')]), (u'egg', [(3, u'and'), (2, u'sausage'), (2, u'bacon'), (1, u'spam'), (1, u'on')]), (u'mornay', [(1, u'sauce')]), (u'beans', [(1, u'spam')]), (u'fried', [(1, u'egg')]), (u'baked', [(1, u'beans')])]
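In case it helps someone with the same problem, here is one way a structure like this could be built. It is a minimal sketch and not necessarily the code that produced the result above: it uses Python 3, assumes the sentences come as a list of strings, and successor_counts, as_sorted_pairs and menu are hypothetical names. Sorting the (count, word) tuples in reverse order appears to give the same ordering as the lists above.

```python
from collections import Counter, defaultdict


def successor_counts(sentences):
    """Map each word to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        # "$" marks the beginning and the end of the sentence
        words = ["$"] + sentence.split() + ["$"]
        # pair each word with the one right after it
        for current, following in zip(words, words[1:]):
            followers[current][following] += 1
    return followers


def as_sorted_pairs(followers):
    """Turn each Counter into a list of (count, successor), highest first."""
    return {
        word: sorted(((n, nxt) for nxt, n in counter.items()), reverse=True)
        for word, counter in followers.items()
    }


# Hypothetical sample input, not the full menu text
menu = ["egg and bacon", "egg and spam"]
for word, pairs in as_sorted_pairs(successor_counts(menu)).items():
    print(word, pairs)
```

Using a defaultdict of Counters avoids having to check whether a word or a successor has been seen before incrementing its count.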