Closed Bug 868530 Opened 11 years ago Closed 11 years ago

nsDOMIdentity cannot expect windowID to be unique identifier on FirefoxOS

Categories

(Core Graveyard :: Identity, defect)

22 Branch
All
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, firefox23 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

RESOLVED FIXED
mozilla23
blocking-b2g tef+
Tracking Status
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: jedp, Assigned: jedp)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently, the nsDOMIdentity module uses the outer window id as a unique identifier for RPs (relying parties).  This works on firefox desktop and also FirefoxOS desktop builds, but not FirefoxOS mobile.

On FirefoxOS mobile, tabs in the browser app are all iframes.  As such, they share the same window ID.  Thus in it's present state, the native persona implementation is only communicating with the tab in the browser that most-recently called watch().

The fix for this should be simply to provide a more unique unique id :)
FYI - I moved tracking over to the new followup bug in PayId-v1next.
Blocks: PayId-v1next
No longer blocks: basecamp-id
Ben, what do you think?  I tried to change the minimum number of code lines (hence leaving in outerWindowID).  I wanted to get your feedback and if possible your r+ before asking a dom module owner for r+.

Thanks!
j
Attachment #745289 - Flags: feedback?(benadida)
(In reply to Jason Smith [:jsmith] from comment #1)
> FYI - I moved tracking over to the new followup bug in PayId-v1next.

cool - glad it's on your radar.  Thanks.
Comment on attachment 745289 [details] [diff] [review]
A more unique unique id for firefoxos mobile (dom)

Review of attachment 745289 [details] [diff] [review]:
-----------------------------------------------------------------

looks good overall from identity perspective, should go to DOM folks for further review.

Question: once you're doing the UUID, is there a need to still factor in the window id?
Attachment #745289 - Flags: feedback?(benadida) → feedback+
(In reply to Ben Adida [:benadida] from comment #4)
> Comment on attachment 745289 [details] [diff] [review]
> A more unique unique id for firefoxos mobile (dom)
> 
> Review of attachment 745289 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> looks good overall from identity perspective, should go to DOM folks for
> further review.
> 
> Question: once you're doing the UUID, is there a need to still factor in the
> window id?

Thanks, Ben.  My main reason for leaving in the window id is that it's interesting debugging information.  If we have to trace where things are coming or going to, this gives something concrete to follow, and it won't be reset from invocation to invocation.  Maybe I'm being overly paranoid, but it seems like a useful paper trail.

An alternative would be to print it into the messages that get passed around, but that would touch more code.
:jst, would you mind reviewing this?  Thank you, j.
blocking-b2g: --- → tef?
Ben, Jason, Ed, I want to make sure this is on your scope as a blocker

The on-device testing sequence we'd been using actually missed this all along.  Without this fix, persona works for apps nicely, but not for the browser app on firefoxos, which is a critical component.
Something strikes me odd on this bug. I actually don't think the gecko patch alone is fixing anything here.

Wouldn't this use case only happen if you tried to pop open the identity dialog across two different tabs in the browser?

If so, I don't think that use case is currently possible in the browser app. When I try to open a trusted UI dialog within the browser app, the user is moved to the trusted UI context away from the browser entirely. Until that trusted UI context is removed - whether that be completed through user interaction, canceled, or killed by the process manager, the user will not be able to return to the browser app.

So I don't think this gecko patch is all that's needed here - you would need to allow the browser to support multiple trusted UI contexts per tab. However, for 1.01 at least, that's probably going to have to be OOS at this point. Feel free to go after fixing it for 1.1, though.
(In reply to Jason Smith [:jsmith] from comment #9)
> Something strikes me odd on this bug. I actually don't think the gecko patch
> alone is fixing anything here.

Since I didn't realize the problem until just recently, I completely agree that it's very hard to spot what's going on here.

And what you describe about the trusted ui is true: There can be only one trusted ui session for the browser (or any app) at a time.  But that's not actually what I'm trying to solve here.  I'm trying to make sure that tabs that have already called watch and request still work after new tabs do the same.

Currently, if you login on tab A, and then you open tab B and login somewhere, and then you go back to tab A, the logout button does nothing.  It has been disconnected from its callback.  This is what's broken in the current implementation.

Why? Because of how callbacks are stored for callers of the persona api.

The key thing is that the native dom identity service stores a set of callbacks for each caller of the navigator.id.watch() function.  That code has always identified the caller by the unique id of the window, which on firefox desktop *and* b2g desktop is guaranteed to be unique per tab.  The problem on b2g mobile is that each tab is a frame in the *same* window - they all have the same window id!

What this means is that the last tab on b2g mobile firefox to call navigator.id.watch() *replaces* the callbacks for that window id (the entire browser app), and doing so completely disconnects the previous tab that called navigator.id.watch().

So all that the patch changes is the id.  On firefox and b2g desktop, windowId was unique.  On b2g mobile, it's not unique enough.

Therefore I append a uuid generated by each caller of watch to the window id.  (I'm keeping the window id in the mix for better traceability; it doesn't contribute to the uniqueness of the id any more.)

Because we're dealing with the id of the initial calling context, the patch only touches the content interface to the dom identity module.

A simple fix, but crucial! :)
Okay, I think I understand now. The end-user use case will happen when navigator.id.watch is being used across two or more tabs with the oldest tabs losing their callbacks, given that we're not keeping unique IDs across tabs.
(In reply to Jason Smith [:jsmith] from comment #11)
> Okay, I think I understand now. The end-user use case will happen when
> navigator.id.watch is being used across two or more tabs with the oldest
> tabs losing their callbacks, given that we're not keeping unique IDs across
> tabs.

Precisely.
(In reply to Jed Parsons [:jparsons] from comment #0)
> Currently, the nsDOMIdentity module uses the outer window id as a unique
> identifier for RPs (relying parties).  This works on firefox desktop and
> also FirefoxOS desktop builds, but not FirefoxOS mobile.
> 
> On FirefoxOS mobile, tabs in the browser app are all iframes.  As such, they
> share the same window ID.  Thus in it's present state, the native persona

Cc:ing jlebar here who did most of the work on the browser guts in Firefox OS, but I don't believe the above statement is true. Each iframe has its own outer window (and id), and each page inside each iframe (or thus tab) has its own inner window (and id), all of which is unique *per process*.

If the window ids are passed to the parent process here, then yes, you'll have possible conflicts that you need to deal with, and there's two ways to deal with this, one of which is to do what you do here, use something else that's more unique, but even the UUID approach does not guarantee uniqueness. The other approach is to track the child process where the window id came from (i.e. most likely the message manager associated with the child process) along with the window id itself. That guarantees correct behavior, and I'm wondering if we can do that here as well? I would personally prefer to *guarantee* correct behavior here rather than relying on something that's very likely to be unique across processes, but not guaranteed.
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #13)
> ... I don't believe the above statement is true. Each iframe has its own
> outer window (and id), and each page inside each iframe (or thus tab) has
> its own inner window (and id), all of which is unique *per process*.
> 
> ... even the UUID approach does not guarantee
> uniqueness. The other approach is to track the child process where the
> window id came from (i.e. most likely the message manager associated with
> the child process) along with the window id itself. That guarantees correct
> behavior, and I'm wondering if we can do that here as well? I would
> personally prefer to *guarantee* correct behavior here rather than relying
> on something that's very likely to be unique across processes, but not
> guaranteed.

Thanks, :jst.  Absolutely.  I want a guaranteed unique id.  I did not realize the the per-process unique inner-window id.  But that totally makes sense now that you describe it.  So concatenating the process id and the innerwindow id would be unique.  Would not concatenating the outer and inner window ids also be unique?  :jlebar, can you enlighten?

Thanks to all.
Flags: needinfo?(justin.lebar+bug)
(In reply to Jed Parsons [:jparsons] from comment #14)
[...]
> sense now that you describe it.  So concatenating the process id and the
> innerwindow id would be unique.  Would not concatenating the outer and inner
> window ids also be unique?  :jlebar, can you enlighten?

Yes, the combination of process id and inner window id would be unique, but the combination of outer window and inner window is *not* guaranteed to be unique. Two child processes can have the same inner and outer window id combinations.
What is the correct way to get the child process id?
jst is totally right about everything he's said here.

> What is the correct way to get the child process id?

The piece of data you're looking for is ContentParent::ChildID() or ContentChild::GetID().  I'm not sure this ID is actually exposed to JS, but we can of course do that.
Flags: needinfo?(justin.lebar+bug)
FWIW I have no problem using a UUID here if that's easier.  The chance of a UUID collision is many billions of times lower than the chance of a random bit error in the machine causing us to incorrectly believe that two (window-id, process-id) tuples are the same.
(In reply to Justin Lebar [:jlebar] from comment #17)
> I'm not sure this ID is actually exposed to JS, but we can of course do that.

That would be excellent.

(In reply to Justin Lebar [:jlebar] from comment #18)
> FWIW I have no problem using a UUID here if that's easier.  The chance of a
> UUID collision is many billions of times lower than the chance of a random
> bit error in the machine causing us to incorrectly believe that two
> (window-id, process-id) tuples are the same.

lol.  good point.  jst, given Justin's feedback, are you ok with the patch as it stands?

I'd like to get the parent/child id properly sorted out, but it sounds like there may be extra work involved.  Perhaps that can happen for leo?

Many thanks,
j
(In reply to Jed Parsons [:jparsons] from comment #19)
> (In reply to Justin Lebar [:jlebar] from comment #17)
> > I'm not sure this ID is actually exposed to JS, but we can of course do that.
> 
> That would be excellent.
> 
> (In reply to Justin Lebar [:jlebar] from comment #18)
> > FWIW I have no problem using a UUID here if that's easier.  The chance of a
> > UUID collision is many billions of times lower than the chance of a random
> > bit error in the machine causing us to incorrectly believe that two
> > (window-id, process-id) tuples are the same.
> 
> lol.  good point.  jst, given Justin's feedback, are you ok with the patch
> as it stands?

I'm ok with it, but I don't like it. I'm a fan of designing systems that work as expected assuming no hardware issues, but in the interest of getting this tef+ blocker landed, I'll review the patch.
Comment on attachment 745289 [details] [diff] [review]
A more unique unique id for firefoxos mobile (dom)

+    // On firefox desktop, the window id suffices as a unique id.
+    // However, on FirefoxOS devices, browser tabs are actually
+    // frames, not windows, and as such all have the same windowID.
+    // Therefore we mix in a uuid to ensure that we can identify
+    // each tab, be it on desktop or mobile, uniquely.

Fix this comment to match our discussion in the bug, you're avoiding cross process windowID collisions here.

+    this._windowId = util.outerWindowID;
+    this._uuid = uuidgen.generateUUID().toString();

I don't see this._uuid used anywhere other than right after this code, maybe not store it on this?

+    this._id = this._windowId + this._uuid;

Why concatenate _windowId here? The uuid is unique enough here per Justin's comments in the bug.

r=jst with that fixed.
Attachment #745289 - Flags: review?(jst) → review+
Filed follow-up bug 869182 to resolve the questions in comment #17 and use process id + innerwindow id as the identifier.
Addresses questions in comment #21; r=jst
Attachment #745289 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/cd374ec0bd26
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #20)
> but in the interest of getting
> this tef+ blocker landed, I'll review the patch.

What tef+ bug does this block?
(In reply to lsblakk@mozilla.com from comment #26)
> (In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #20)
> > but in the interest of getting
> > this tef+ blocker landed, I'll review the patch.
> 
> What tef+ bug does this block?

Without this patch, Persona will not work properly on the firefoxos browser.
Lukas, do you need any more context?  This is a significant bug for us.
Thanks
j
Flags: needinfo?(lsblakk)
tef+ for comment 27. Please add verifyme to the keywords with a description of what to test to verify this fix and check for regressions.
blocking-b2g: tef? → tef+
Flags: needinfo?(lsblakk)
(In reply to Alex Keybl [:akeybl] from comment #29)
> tef+ for comment 27. Please add verifyme to the keywords with a description
> of what to test to verify this fix and check for regressions.

Edwin - Can you find an assignee to test this?
Flags: needinfo?(edwong)
(In reply to Jason Smith [:jsmith] from comment #30)
> (In reply to Alex Keybl [:akeybl] from comment #29)
> > tef+ for comment 27. Please add verifyme to the keywords with a description
> > of what to test to verify this fix and check for regressions.
> 
> Edwin - Can you find an assignee to test this?

fwiw, Edwin is here with me in Vancouver this week, and we have dedicated a chunk of time to test this (and everything else :)
Flags: needinfo?(edwong)
Reopening because we need 1.0.1 uplift for tef+
Status: RESOLVED → REOPENED
Keywords: checkin-needed
Resolution: FIXED → ---
(In reply to Jed Parsons [:jparsons] from comment #32)
> Reopening because we need 1.0.1 uplift for tef+

We usually mark bugs as fixed to mean they've landed on trunk. For uplifts, that's handled by the status flags. They aren't set, so someone will come by and uplift this.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
(In reply to Jason Smith [:jsmith] from comment #33)
> (In reply to Jed Parsons [:jparsons] from comment #32)
> > Reopening because we need 1.0.1 uplift for tef+
> 
> We usually mark bugs as fixed to mean they've landed on trunk. For uplifts,
> that's handled by the status flags. They aren't set, so someone will come by
> and uplift this.

Apologies.  I can't help being a little antsy :)

So I should uncheck 'checkin-needed', too, eh?
Keywords: checkin-needed
Nah, it's fine to leave the flag to indicate this needs an uplift.
QA Contact: edwong
(In reply to Ryan VanderMeulen [:RyanVM] from comment #36)
> In fact, I only look for FIXED bugs for uplift, so reopening it is a recipe
> for trouble. Relax, big guy ;)
> 
> https://hg.mozilla.org/releases/mozilla-b2g18/rev/4592c485dd45
> https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/824436f606ff

LOL thanks :)  Yes, this is a hell of a time for me to be getting fully educated on the details of the b2g uplift procedures.  Thanks as always for your help, Ryan.
Reopening this as a fatal regression.  

The uuid is regenerated when the screen is reloaded.  It changes.

Since the marketplace flow involves reloading the screen in the sign-in flow, this will break all payments.

We have to back this out of 1.0.1 and b2g18

I don't see a way to fix this.  If I can't get at the process ID from javascript, I don't see how I can get a unique fix on each tab in the browser.

Is there anything we can do?

Completely dejected, I bow in abject humility.  You may kill me now.
Status: RESOLVED → REOPENED
Keywords: regression
Resolution: FIXED → ---
Ryan - Can you back this out?
Flags: needinfo?(ryanvm)
I'm rebuilding now to run all tests again with the patch backed out, to confirm that marketplace will function again.
Wait - hang on.  The regeneration of the id should be ok.  These assertions are generated according to the userid and the origin - persona knows nothing of these ids, nor does it care.  It ought to be possible to regenerate the id whenever you want.

I've spent the past two hours trying to debug this in my hotel room here, and even with the id patch backed out of my build I'm getting intermittent successes and failures with log messages like "s3.amazonaws.com : server does not support RFC 5746, see CVE-2009-3555" and warnings about facebook connect being called wrong, and deltahotels.com having bad js.  I'm staying in the vancouver delta hotel.

Is all this fail simply a symptom of the hotel injecting a bunch of javascript into all my content and breaking everything?
I'm trying to find out from mozwebqa what their selenium logs look like for 1.0.1 this morning
Backing this out is not fixing my situation, and as I described in Comment #41, there is something *very* weird going on with the http that's coming to my phone.

My initial reaction was one of panic, but given the circumstanes, I would ask that wait a few more hours until everyone's awake and we can test this for real in a non-hotel-jacked-wifi environment.

It may still need to be backed out, of course, but let me have positive controlled tests to be sure.

I'm on IRC now with zac, who's helping me diagnose the 1.0.1 build from this morning.
Flags: needinfo?(ryanvm)
Keywords: checkin-needed
zac has confirmed that getting silent assertions on v1.0.1 is busted.

It works on v1-train and b2g18.  I am still looking for the source of the problem.
Talked with Jed. The root cause is not this patch, so lets track the work over in bug 870412.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: