Google probem: Gutierrez, Thomas D. Gutierrez, Tom Gutierrez, T. Gutierrez, T.D. Gutierrez, T. Dominic Gutierrez, Thomas Dominic Gutierrez, Tom Dominic Gutierrez<

Google Probem Writeup

Physics Today Ad for a Job at Google ("the Google FAX problem")

August 27, 2004; posted here 110805

Solution by T.D. Gutierrez.

I never did apply for this job, but had fun doing the problem over a weekend back in August of 2004. Ironically, I solved the problem in a few hours but spent the rest of the time hunting down a bug which turned out to be spelling error in my word-to-number lookup table! Ouch. Silly, silly me. Anyway, I'm only now (Nov 2005) getting around to posting my solution.

There are certainly better solutions out there. I like my solution (I keep calling it "my" solution, but it is clearly just "a" solution) for a few reasons discussed below. If it doesn't make sense to you then I'm sure there are other variations out there that do. Just Google it (which is probably how you found this page in the first place)!

For example, a really great writup by Patrick Mock (that independently used a similar method as me, but doesn't involve lots of personal ranting and asides like I'm doing now) can be found here. Also, some nice clean code by Samuel Santiago (is it python?? Sorry, I'm still living in the 90's, man) outlining and solving the problem can be found here.

Helpful code

A Mathematica notebook to do the calculation is provided below. Details of how to use it are poorly noted in the notebook as comments. I used Mathematica, but in no way do I believe this was the best code forum for this project. It was the first thing I grabbed and I just ran with it.

Be warned of a couple things. There are spoilers in the comments of the notebook. My answer is at the bottom of this page and in the header of the notebook.
Also, while I think my core solution is "reasonably" elegant, my code is not. The notebook is not well written and makes the problem look more complex than it really is. The notebook doesn't really explain how to use itself (when you see the damn thing, you'll see why I am a physicist and could never be a programmer at Google!), but if you know Mathematica, you can probably muddle your way through it and get it to work. There are some fun plots of some of the functions at the bottom of the html version. But the html conversion truncated at a fixed tab, so a bunch of information is missing. Crap. The text output version seems to show promise initially, but then starts adding lots of spacing commands, making the output unreadable. Good luck trying to sort it all out. Sorry about the mess. Look for my summary at the bottom of the page.

Note on number standards: I adopted the standard that "and" would not be used in the name of the number. For example, 1234 is "one thousand thirty four" not "one thousand and thirty four". I don't use "hundred" to describe numbers above 900 (however, I did try it for fun at one point). Nor did I count hyphens. For example, 1450 and 4576 are "one thousand four hundred fifty" (27 letters) and "four thousand five hundred seventy six" (33 letters) not "fourteen hundred fifty" (20 letters) and not "forty five hundred seventy six" (26 letters)

[googleFinA.nb][googleFinA.txt][googleFinA.html]

Given:

E(x)=number of letters in the NAME of the number, x, using American English

f(x)=3 E(x)^3-x

Find:

The four digit value of x given f(f(f(x)))=60097

The (four digit?) value of x given f(f(x))=1

Solution outline:

Create a database/lookup that converts a number to its name in American English (e.g. 5="five")

Expand the database/lookup so it counts the number of letters in the name: i.e. have a means to generate E(x) (e.g. E(5)=4)

With f(x)=3E(x)^3-x=a where a is a given constant, create a hypothesis value of x, xh, by selecting E (e.g. as an iterater in a loop): xh=3E(xh)^3-a -> xh=3E^3-a.

For the hypothesis, xh, to be the "real x" that solves f(x)=a, the statement E(xh)==a must be true. As you iterate over E, when you find this value of x, then you have found the inverse of f(x)=a.

Apply this method again on the output of a previous iteration for nested versions of f(x). For example, let f(f(x))=a. First find y such that f(y)=a. However, y=f(x) so then let a=y and repeat.

Sounds pretty convoluted, but let's do an example:

Let's say you are given f(x)=71 and you want to find x. In the case a=71. The test value of xh=3E^3-71.

Now, iterate over values of E and calculate xh. Then, with a given value of xh, calculate E(xh) and compare to a.

E=1; xh=-68; E(-68)=something nasty (x will probably be greater than zero anyway) is E(-68)=71? No.

E=2; xh=-47; E(-47)=same as above; is E(-47)=71? No.

E=3; xh=10; E(10)=71; is E(10)=71? Yes! We have inverted f(x)=71 and got x=10.

The advantage to this method is you cover a lot of ground quickly by iterating over E and using the constraint of a rather than doing a brute force iteration over x.

The brute force method, even in this simple case where x was only 10, would require you to do 10 calculations and a logical comparison (to a). Iterating over E only required 3 interations and a logical comparison.

The reason this method is generally faster is that the number of letters in a number grows very slowly with the value of the number. Even really giant numbers might only have 300 letters in it (more on this below). This, of course, assumes there is actually a name for the number you are interested in.

Here is another example:

If you were given f(x)=-997000, a=-997000 so the test xh=3E^3-(-997000) or just xh=3E^3+997000.

E=1; xh=997003; is E(997003)=-997000? No.

E=2; xh=997024; is E(997024)=-997000? No.

...

E=10; xh=1000000; is E(1000000)=-997000?. Yes! So the inverse of f(x)=-997000 is x=1000000. After only 10 iterations, you have found an answer. There could be degeneracy (multiple x answers for a given value of a). But this method would quick find them. If you were willing to iterate up to E=300 you would cover all answers for x less than about 10^27 (assuming your database had a number-to-name lookup table that went that high).

Issues and tricks:

Be sure and spell all the numbers' names correctly!

If you are going to construct the database of number names from scratch, you have to select some minimal set of words to describe a number (not hard). You will naturally have to pick some maximum number your algorithm can process. This is fairly straightforward. Above a certain number, the numbers stop having coherent names, so this isn't necessarily an infinite process (unless you decide "a trillion trillion trillion" is a reasonable "American English name for 10^36).

There are some degeneracy issues in language that could make a difference in the solutions depending on what numbers you were using. For example, is 1400 "fourteen hundred" or "one thousand four hundred"? E(x) is obviously different for each of these and will lead to different answers when solving for x. Are negative numbers required? If you want, you can try and include them.

One could, of course, just brute force the solution by simply testing f(x)==a (or f(f(f(x)))==c or whatever) and iterating over x, evaluating f(x) each time. For numbers less than x=10k, this is actually fine. Since you are given that the solutions are four digits, a brute force effort pays off. However, mine iterates over the NUMBER of letters parameter, E. This quirky function scales very, very crudely as the log of the number (of some base). So, for example, by iterating E from 0 to 200, you pretty much cover all numbers from zero to, well, pretty big. Take the number 777,777,777,777. This is both a pretty big number and it has the most letters possible between zero and numbers less than one trillion. But there are only 121 (or so) letters. If you are sick, you can add a minus sign and call it "negative seven hundred seventy seven billion ..." and pick up another 8 letters.

To help really set the scale with the size of the names and the size of the number (as discussed above), the number with the longest name in English is tied between:
777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777, 777
and
888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888, 888
and the rest of the 2^65-2 permutations of 7s and 8s. Seven and eight have the most letters for numbers less than ten. The second number above is same number with sevens replaced by eights. The name for the first number is

Seven hundred and seventy seven vigintillion seven hundred and seventy seven novemdecillion seven hundred and seventy seven ocodecillion seven hundred and seventy seven septendecillion seven hundred and seventy seven sexdecillion seven hundred and seventy seven quindecillion seven hundred and seventy seven quattrodecillion seven hundred and seventy seven tredecillion seven hundred and seventy seven duodecillion seven hundred and seventy seven undecillion seven hundred and seventy seven decillion seven hundred and seventy seven nonillion seven hundred and seventy seven octillion seven hundred and seventy seven septillion seven hundred and seventy seven sextillion seven hundred and seventy seven quintillion seven hundred and seventy seven quadrillion seven hundred and seventy seven trillion seven hundred and seventy seven billion seven hundred and seventy seven million seven hundred and seventy seven thousand seven hundred and seventy seven

According to my count this has 822 characters. Compare this to the 10^65 order of magnatude! By iterating E over less than a thousand numbers, you cover all the numbers in the English language that have writable names. Of course, there are numbers like google (10^100), centillion (10^303) and googleplex (10^google). But since there aren't names for 10^3 steps down from these, you can't easily give a name to a number larger than or equal to 1000 vigintillion (you could, naturally, start saying "one thousand vigintillion", "one quadrillion vigintllion" etc., but that convention isn't used elsewhere -- even if you did, the scaling rule-of-thumb I describe would just continue). You could include negative numbers (with the word "negative" or "minus" out front), but this would just be a perturbation. You might try getting fancy and introducing complex numbers too (1+2i="one plus two i"), but this will give you only roughly a factor of two in name size

An even better way would to be only iterate E over allowed values and not waste your time with values of E that don't exist. For example, E=1. No number has a name with one letter. The smallest is E=3. But as you get to bigger numbers, it is hard to say what is allowed versus not.

My Answers:

The four digit value of x given f(f(f(x)))=60097; x=2465; writeup coming soon

The four digit (tacit from first part) value of x given f(f(x))=1; x=6808 (there are other solutions that are not 4-digits)

First find f(y)=1 for y
For numbers still in my database, there are three values of E that accomplish this. Notice you only have to iterate up to E=47 to get actual numbers as high as 300k.

E=27, y=59048 (i.e. "fifty nine thousand forty eight" has 27 letters in it), and 3*(27)^3-59048=1

E=41, y=206762 (i.e. "two hundred six thousdand seven hundred sixty two" has 41 letters in it), and 3*(41)^3-206762=1

E=47, y=311468 (i.e. "three hundred eleven thousand four hundred sixty eight" has 47 letters in it), and 3*(47)^3-311468=1
Keep on slogging. Now go through each of the above values of y and solve f(x)=y for x. It turns out in this case that y=206762 and y=311468 give numbers outside my database, if the inverse,x, exists at all for those numbers. There are two values of E for y=59048 that returned answers:

E=28, x=6808 (i.e. "six thousand eight hundred eight" has 28 letters), 3*(28)^3-6808=59048

E=53, x=387583 (i.e. three hundred eighty seven thousand five hundred eighty three" has 53 letters), 3*(53)^3-387583=59048
This means f(f(6808))=1 and f(f(387583))=1. Since you are told in the statement of the problem that the answer is 4 digits, the answer is 6808.
For the nested problem f(f(x))=1, using my method, since you are only going on to do new iterations on the values you find solutions for, there are only a total of 27+28=55 iterations before you bumped into a 4 digit solution. The brute force method (i.e. just systematically test f(f(x))=1 for all values of x) would have required 6808. If you were told it would be a 6 digit solution, my method would have required 47+53=100 iterations while the brute force way would have required 387583. For larger and larger number of digits in the final answer, my method wins almost exponentially.