Siri: Apparently no local learning

Siri does not appear to be learning anything at all about what I typically ask "her" to do. Although one of my most frequent "targets" for text messages is my wife, today, when I asked Siri to "Send a text to Elen Callahan," it took three attempts before "she" correctly understood what I was saying. I also frequently send messages to myself; for instance, if I am driving and get an idea, I email it to myself. Yet every few attempts, Siri tells me "I don't see a Jean Callahan in your contacts"! And if I have no Internet connection, Siri does not function at all. I am baffled as to why no aspect of my previous interactions with the software seems to be cached locally.

So what are we to make of Apple's claim here:

"Apple claims that the software adapts to the user's individual preferences over time and personalizes results, and performing tasks such as finding recommendations for nearby restaurants, or getting directions."

The Wikipedia article on Siri (linked above) has no information about the architecture of the software. Any insights?


  1. The entire speech-to-text conversion operates on their servers, so there wouldn't be much to be done without an internet connection. Since it stores two years of past usage (presumably also on the servers), there wouldn't be much to be done locally, especially when the results would require a search or an information transfer (upload, download, connect, dial, email, text). It probably doesn't queue requests for later; since there is such a high probability of misunderstanding, full interactivity is necessary.

    1. "The entire speech to text conversion operates on their servers so there wouldn't be much to be done without an internet connection."

      Yes, I realize it does. That is exactly my complaint. Why not build a two-tier architecture?

      "It probably doesn't queue requests for later..."

      Huh? Who asked for "requests queued for later"?

    2. There is no need to cache much locally if the account on the device can be used to identify the stored material on the server; you can just look it up by id. Lord's point is that this is probably the right way to do it, as otherwise there is too much data to send and send and send and send. So the local cache loses out to the central cache. Air time is pricey.

      The "process later" bit, I expect, is this: the software can only learn interactively, by getting you to respond, so there is nothing that can be queued up for later to train the software with. You need person and server together. A local cache to help with training would not be useful if a connection to the mother ship is required anyway. Once we have the connection, use the server cache.
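      The "look it up by id" idea above can be sketched in a few lines. This is purely illustrative (not Apple's actual design); all class and method names here are invented:

```python
# Hypothetical sketch: the device holds only an account id, while the
# per-user usage history accumulates in a central (server-side) cache.
# Each round trip carries just the id plus the new utterance, keeping
# "air time" cost low.

class ServerCache:
    """Per-account usage history kept centrally, keyed by account id."""
    def __init__(self):
        self._store = {}  # account_id -> list of past interactions

    def record(self, account_id, utterance, resolution):
        self._store.setdefault(account_id, []).append((utterance, resolution))

    def lookup(self, account_id):
        # The device only sends its id; the history never leaves the server.
        return self._store.get(account_id, [])

class Device:
    """The phone stores an id, not the learned data itself."""
    def __init__(self, account_id, server):
        self.account_id = account_id
        self.server = server

    def ask(self, utterance, resolution):
        self.server.record(self.account_id, utterance, resolution)
        return self.server.lookup(self.account_id)

server = ServerCache()
phone = Device("device-123", server)
history = phone.ask("Send a text to Ellen", "contact: Ellen Callahan")
print(len(history))  # the server, not the phone, accumulates the history
```

      The trade-off the comment describes falls out directly: the device ships a few bytes of id per request instead of continually synchronizing a local copy of two years of usage data.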

    3. A two-tier architecture buys you next to nothing if everything relies on an active internet connection, which all those quoted applications do. You could do dictation like Dragon (which also relies on servers), or aural accessibility, or aural listening like Talkler, or queuing. That would be the universe of possibilities for a two-tier architecture. Maybe in the future it will predict what you will ask and download the answers while you still have a connection, but that is a bit much to ask for.

    4. Lord:

      1) I actually use Siri so I know what I would like it to be able to do when it can't connect; and

      2) I've built multi-tier realtime trading apps with layers of caching, so I have some idea of how some local intelligence could help do those things.

      One important thing to note: it is OFTEN the case that one has a network connection but can't get Siri to work. I don't know whether the Siri servers are overloaded or what, but this happens a LOT.
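      The kind of "local intelligence" argued for above could be as simple as a small on-device cache of resolutions the user has already confirmed, consulted before the server is ever contacted. A minimal sketch, with all names invented for illustration:

```python
# Hypothetical two-tier resolver: tier 1 is a local cache of confirmed
# results; tier 2 is the server, which may be unreachable (or overloaded).

class TwoTierResolver:
    def __init__(self, server_resolve):
        self.local = {}                       # utterance -> confirmed result
        self.server_resolve = server_resolve  # callable; may raise OSError

    def resolve(self, utterance):
        # Tier 1: frequent, already-confirmed targets ("text my wife")
        # work even with no connection at all.
        if utterance in self.local:
            return self.local[utterance]
        # Tier 2: fall back to the server, then learn locally for next time.
        result = self.server_resolve(utterance)
        self.local[utterance] = result
        return result

def unreachable_server(utterance):
    # Stand-in for "no internet connection" or overloaded Siri servers.
    raise OSError("cannot reach server")

r = TwoTierResolver(unreachable_server)
r.local["text Ellen Callahan"] = "Ellen Callahan <mobile>"
print(r.resolve("text Ellen Callahan"))  # served locally, fully offline
```

      Speech-to-text itself would still need the server under this scheme, but a cache like this addresses the repeated-misrecognition complaint: once a resolution is confirmed, the device never has to re-ask the server for it.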