Open
Bug 24957
(dupesinsearch)
Opened 25 years ago
Updated 11 years ago
include duplicates in search, but return only their originals (-> better search results, less duplicates)
Categories: (Bugzilla :: Query/Bug List, enhancement)
Product: Bugzilla
Component: Query/Bug List
Tracking: NEW
People: (Reporter: derikson3, Unassigned)
Details: (Whiteboard: [relations:dupl])
Attachments (3 files)
(deleted), patch | jouni: review-
(deleted), image/jpeg
(deleted), patch
It's sometimes very difficult to check the database for a particular bug before filing a new bug so that a duplicate isn't created. Sometimes the text of the bug is too technical to match up with the search, but bugs marked as duplicates of that bug may contain the text searched on. A search on RESOLVED and VERIFIED and DUPLICATE might find that text, but it will also return duplicates of fixed bugs and such. For these reasons, I think it'd be very beneficial to have a checkbox for including duplicates of the bugs in the search of summary and description fields. For example, searching for NEW or ASSIGNED or REOPENED bugs in the "bugzilla" component with description containing "duplicate" would select all those NEW or ASSIGNED or REOPENED bugs in the "bugzilla" component, and then search the description of those bugs as well as their duplicates for the text "duplicate". If the query matches against the duplicate, then the bug that it duplicates is returned. There may be a better way of achieving the same results, but something similar may help reduce the number of duplicates submitted.
Comment 1•25 years ago
Duplicate bugs have never been represented in the database very well.
Status: NEW → ASSIGNED
Comment 2•25 years ago
tara@tequilarista.org is the new owner of Bugzilla and Bonsai. (For details, see my posting in netscape.public.mozilla.webtools, news://news.mozilla.org/38F5D90D.F40E8C1A%40geocast.com .)
Assignee: terry → tara
Status: ASSIGNED → NEW
Comment 3•25 years ago
As far as I can see this should be possible by searching on all statuses with resolution of --- or DUPLICATE.
maybe what you want is to do something clever like "if this bug is marked as a duplicate, save the summary off into a separate searchable table" comment?
Updated•24 years ago
QA Contact: matty
Whiteboard: Future-Target
Comment 6•24 years ago
-> Bugzilla product, Query component, reassigning.
Assignee: tara → endico
Component: Bugzilla → Query/Bug List
Product: Webtools → Bugzilla
Version: other → unspecified
Updated•24 years ago
Whiteboard: Future-Target → [relations:dupl]
Comment 7•23 years ago
See also bug 105295, query should show closed duplicate bugs that have the main/parent bug open.
Comment 8•23 years ago
*** Bug 105295 has been marked as a duplicate of this bug. ***
Comment 9•23 years ago
I'm the reporter of the 105295 dupe 8) I searched for dupes, but as I searched with the words "query" and "find", not "search", I didn't find it... a good example of why a solution for this is needed. Thanks to Jesse Ruderman for pointing this out. Below is the text of my submission:
The summary is usually limited to a few words, and there are many different ways of naming bugs. One of the main reasons that people post dupes is that they can't find the open bug because they are using the "wrong" words that aren't in the summary. One option, turned on by default, would search the dupe bugs (all states?!) too, but instead of showing all dupe bugs, it would show only dupes whose main/parent bug is still open. So if one bug has many definitions, dupes will pop up, but will stop once the main bug and its dupes have all the word combinations in their summaries. If reporters can't find the main bug because of the "wrong" words, they will find it in the closed "open" dupes, and the dupe will point to the main open bug.
Comment 10•23 years ago
*** Bug 107982 has been marked as a duplicate of this bug. ***
Comment 11•23 years ago
NOTE: I'm not familiar with the bugzilla code nor the database structure it uses. If the user does not explicitly select to search DUPLICATEs, or if we add a special checkbox that says something like "Find matches by resolving duplicate targets" and is on by default (Note: the text would have to be far less confusing than what I have =), then we do the following: (I'm assuming that bugzilla has a separate relational table that relates bugs and their duplicates) We perform a JOIN between the bug number in the search results and the bug number in the duplicates table so that the result set has a column with a bug number of a duplicate. (I don't know how much of a performance hit the JOIN would be in mysql) With bugs that are duplicates, we display the target bugs in their place. (Obviously we'd need to filter out multiple bugs that point to the same bug and only display one in their place.)
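The JOIN-and-replace idea in this comment could be sketched roughly as follows. This is only an illustration, not Bugzilla code: it uses Python's sqlite3 module in place of MySQL, and the table layout (bugs with a short_desc summary column, duplicates with dupe and dupe_of columns), the sample rows, and the function names are all assumptions made for the example.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database (assumed schema, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT, short_desc TEXT);
        CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
        INSERT INTO bugs VALUES
            (1, 'NEW',      'query page cannot find bug'),
            (2, 'RESOLVED', 'search misses reports'),
            (3, 'RESOLVED', 'search does not find anything'),
            (4, 'NEW',      'search box too small');
        INSERT INTO duplicates VALUES (2, 1), (3, 1);
    """)
    return conn

def find_targets(conn, text):
    """Match `text` against every summary, then show each duplicate's target
    instead of the duplicate itself, listing every target only once."""
    rows = conn.execute(
        """
        SELECT DISTINCT COALESCE(d.dupe_of, b.bug_id) AS shown_id
          FROM bugs AS b
          LEFT JOIN duplicates AS d ON d.dupe = b.bug_id
         WHERE b.short_desc LIKE ?
         ORDER BY shown_id
        """,
        (f"%{text}%",),
    ).fetchall()
    return [row[0] for row in rows]
```

With the sample rows, searching for "search" matches bugs 2, 3 and 4, but bugs 2 and 3 are shown as their common target, bug 1, and DISTINCT keeps that target from appearing twice.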
Comment 12•23 years ago
Ironically I logged a dupe of this bug too. I think the problem is that Bugzilla equates bug reports with bugs and this is not the case. Bug reports should exist separately from bugs. For a given bug there may be many reports, so it's a one to many relationship. In fact it's a many to many relationship as a report may involve several different bugs, although this is usually because it's a badly written report, so I would say make it one to many and allow reports to be broken up if they need it. Having a one-one relationship leads to the whole DUPLICATE problem. DUPLICATE is not a valid resolution, the bug may or may not be resolved. You are actually using the resolution field to flag that this row has some sort of parent-child relationship with another row. This relationship should be made explicit in the schema, with a table for bugs and a table for reports and each report is linked to a bug and each bug is linked to 1 or more reports. Ideally, reports would come in and be assigned a report number, an expert can then assess each report and either create a new bug or simply attach the report to an existing bug (including a special type of bug called the "non-bug"). Reports should be mobile. If after some research it turns out that bug A is just bug B in disguise then you could drop bug A and assign all its reports to bug B. Not only do you have 1 less bug but you also have all the reported information available from a central point. Keyword searches are done by searching in the individual reports which then lead back to bugs. This eliminates DUPLICATE and now all the resolutions actually are resolutions. Under this scheme, bugs have a status, resolution, priority, owner etc. Reports have a reporter, a parent bug, platform, OS, build ID and by examining the linked reports, Bugzilla can figure out things like what platforms are affected by the bug.
This allows you to record multiple build IDs and OS versions against a single bug (one in each report). It also allows the possibility of other types of reports like success reports (confirmation that the bug does not affect a given platform). By combining bug reports and success reports, a bug can know what platforms are and what aren't affected by the bug. Success reports were previously entered as comments, now you can use the drop downs. Yes, I know this is heaps of work and I'm not sure how (or even if) you can migrate from one scheme to the other, I'm just throwing in my .02 euro.
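To make the proposed bugs/reports split concrete, here is a minimal sketch of what such a schema and two of the lookups described above might look like. It is an illustration under assumptions, not an existing Bugzilla schema: the table and column names, the sample data, and the helper functions are all invented for the example, and SQLite (via Python's sqlite3) is used only because it runs anywhere.

```python
import sqlite3

SCHEMA = """
CREATE TABLE bugs (
    bug_id     INTEGER PRIMARY KEY,
    status     TEXT,
    resolution TEXT
);
CREATE TABLE reports (
    report_id   INTEGER PRIMARY KEY,
    bug_id      INTEGER REFERENCES bugs(bug_id),  -- parent bug; NULL until triaged
    reporter    TEXT,
    platform    TEXT,
    description TEXT
);
"""

def build_demo_db():
    """One bug with three reports attached, plus one untriaged report."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO bugs VALUES (1, 'NEW', '')")
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "alice", "Linux", "query page cannot find bug"),
            (2, 1, "bob", "Windows", "search returns nothing"),
            (3, 1, "carol", "Linux", "search is broken"),
            (4, None, "dave", "Mac", "search box too small"),
        ],
    )
    return conn

def bugs_matching(conn, keyword):
    """Keyword search runs over reports and leads back to their parent bugs."""
    rows = conn.execute(
        "SELECT DISTINCT bug_id FROM reports"
        " WHERE bug_id IS NOT NULL AND description LIKE ?"
        " ORDER BY bug_id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]

def affected_platforms(conn, bug_id):
    """Derive the affected platforms from the reports linked to a bug."""
    rows = conn.execute(
        "SELECT DISTINCT platform FROM reports WHERE bug_id = ? ORDER BY platform",
        (bug_id,),
    )
    return [row[0] for row in rows]
```

In this layout a search for "search" finds bug 1 through reports 2 and 3 even though the first report never used that word, which is the effect this bug asks for, without any DUPLICATE resolution involved.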
Comment 13•23 years ago
*** Bug 151964 has been marked as a duplicate of this bug. ***
Comment 14•22 years ago
Is there some sort of hack to fix this on bmo ? By a rough estimate, this is causing around 20% of the dupes. That translates into thousands of the UNCO bugs sitting there.
Alias: dupesinsearch
Comment 15•22 years ago
duplicates should be searched for by default! (the proposed checkbox should start checked.) this would significantly decrease the number of dupes. think of a dup as a pointer or symlink; it should be treated as an aid in searching and indexing. ...isn't much other use for them ;^) perhaps we need some better way of identifying where they point to on buglists, like having the buglist link point to the comment of the dup'ed bug (which should probably mirror the dup's summary and comment count). ...and why is the cool <a ... title=bugname> link-hover feature on the buglist?
Comment 16•22 years ago
I started to work today on a patch for this. The essential idea is:
1) Add an 'include duplicates' checkbox under the 'Status' field of the query form.
2) If it's checked, then add the following to the query:
i) in the tables list, add: LEFT JOIN duplicates ON duplicates.dupe = bugs.bug_id LEFT JOIN bugs parentbug ON parentbug.bug_id = duplicates.dupe_of
ii) in the where clause, change (bugs.bug_status IN (...)) to (bugs.bug_status IN (...) OR parentbug.bug_status IN (...))
So the meaning for the user is, I believe, very simple and straightforward: if they select NEW, ASSIGNED, and REOPENED and check "include duplicates", it means "search for bugs that are NEW, ASSIGNED, REOPENED, or duplicates of bugs that are NEW, ASSIGNED, or REOPENED".
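The query change outlined in this comment can be tried out on a toy database. The sketch below is not the attached patch; it only rebuilds the described JOIN and WHERE clauses with Python's sqlite3 in place of MySQL. The short_desc column, the sample rows, and the function names are assumptions for illustration.

```python
import sqlite3

OPEN_STATUSES = ("NEW", "ASSIGNED", "REOPENED")

def build_demo_db():
    """Assumed schema and made-up rows; bug 5 duplicates an already fixed bug."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT, short_desc TEXT);
        CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
        INSERT INTO bugs VALUES
            (1, 'NEW',      'query page cannot find bug'),
            (2, 'RESOLVED', 'search misses reports'),
            (3, 'RESOLVED', 'search does not find anything'),
            (4, 'NEW',      'search box too small'),
            (5, 'RESOLVED', 'search crashes on submit'),
            (6, 'RESOLVED', 'crash in query form');
        INSERT INTO duplicates VALUES (2, 1), (3, 1), (5, 6);
    """)
    return conn

def search(conn, statuses, text, include_dupes=False):
    """Return matching bug ids; with include_dupes, a bug also qualifies
    when the bug it duplicates has one of the requested statuses."""
    marks = ",".join("?" for _ in statuses)
    sql = "SELECT bugs.bug_id FROM bugs"
    where = f"bugs.bug_status IN ({marks})"
    params = list(statuses)
    if include_dupes:
        sql += (
            " LEFT JOIN duplicates ON duplicates.dupe = bugs.bug_id"
            " LEFT JOIN bugs AS parentbug ON parentbug.bug_id = duplicates.dupe_of"
        )
        where = f"({where} OR parentbug.bug_status IN ({marks}))"
        params += list(statuses)
    sql += f" WHERE {where} AND bugs.short_desc LIKE ? ORDER BY bugs.bug_id"
    params.append(f"%{text}%")
    return [row[0] for row in conn.execute(sql, params)]
```

Note that, as described, this returns the duplicates themselves (bugs 2 and 3 in the sample data) rather than the bug they point to, and it only looks one level up the duplicate chain.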
Comment 18•22 years ago
Here it is. I have the following things to note about it: First, that's my first larger-than-a-line Bugzilla hack. I'm far from fully understanding Bugzilla::Search->init, Template::Toolkit, or from having a clear general picture of Bugzilla. So, although I believe the patch works, I understand that it may well be a bad hack. If it is, give me some clues so that I can improve it. Second, I can't test it adequately. On my test Bugzilla installation with six bugs :-), it seems to work alright. I don't know how such patches are tested in real conditions, but we may be confident that, if the new field is unchecked, Bugzilla will work OK; except that the joins and "where" expressions may be in a different order, the query will be exactly the same.
Comment 19•22 years ago
Comment 20•22 years ago
As far as I can tell from reading the description, this will only add immediate dupes to the search - dupes which are one or more levels removed won't get included. I have issues with the UI as well, and I think this will also negatively impact performance. You are also all thinking about the search page as a tool for bug filers and QA, and this is not its only use. So I'm not convinced that this is a good idea... Gerv
Comment 21•22 years ago
I don't think database performance should be too much of a consideration, the goal should be making the database more effective (whatever that means). Reporters and QA people find it hard to find dupes because the interface sucks. My standard way of searching is to do a limited search for open bugs in the right component, and if that fails, then to search open and resolved bugs in all components using keywords in the summary. I'm doing the performance-hurting queries anyway because I know how. Reporters that don't know how will just fail to find anything (as they do now) and file dupes. but this seems a rather kludgy way of solving a specific instance of a general problem... we don't actually need to find the dupes, we just want to find the original bug based on summary/component info from the dupes. it would make sense for the database to facilitate that, so it's not necessary to trawl the whole database each time.
Comment 22•22 years ago
screenshot says 'search for duplicates' ... as I see it, the main use of this will not be to find duplicates, but rather to find bugs those dups point to. that said, I think the text should be something like 'search duplicates' or 'include duplicates in search' I would also like to repeat my request that this be checked by default.
Comment 23•22 years ago
Comment 20: That's right, it does not search duplicates recursively. Not only would this be difficult to implement and bad for performance, but, I believe, unnecessary as well. You have bug A with 15 duplicates, and one day it is discovered that A is a duplicate of B. I have a feeling that this happens late in the lifetime of B, when filing duplicates (which is what we are trying to avoid here) is not so important. But it might be good practice to change the 15 dups so that they are dups of B instead of A. (I have no opinion on the more general remarks.) Comment 22: The default for bugzilla.mozilla.org is the administrator's decision; the default query is specified in the bugzilla operating parameters page. What _is_ the developers' decision is the default for new Bugzilla installations, which I believe should be off; that's why I did not change defparams.pl.
Comment 24•22 years ago
> I don't think database performance should be too much of a consideration
Believe me, in the real world, database performance is very much a
consideration. Or are you happy with the speed of bugzilla.mozilla.org? :-|
Gerv
Comment 25•22 years ago
The problem is the structure of the database. Right now Bugzilla thinks a bug and a bug report are the same thing. They're not. Some reports are not bugs at all and some bugs have many reports. If there was a table for bugs and a table for reports and a 1 to many relationship, that'd remove the need for recursive queries. A new report comes in, someone who knows what they're doing attaches it to an existing bug or creates a new one for it. Obviously dups will still occur by accident but now you can just transfer all the reports from the dup to the original bug and close the dup. It's a big change but it seems to me that the dup issue is getting bigger all the time and distinguishing between bugs and reports of bugs corresponds better to the reality that bugs and bug reports are 2 very different things. For even more about this see comment number 12 earlier.
Comment 26•22 years ago
You're right, the two concepts should be separated. By the way, this would also solve bug 121805 ... But I guess migrating existing bugs into that new structure would be a little bit tricky.
Comment 27•22 years ago
Re #26. I think the migration would require careful work but is kinda straightforward. As things stand every (or almost every) field which is filled in when creating a new bug is really a bug report field - OS, version, description of symptoms, how to reproduce etc. All other fields belong to the bug, like status, votes, assigned, resolution. For migration:
1: Go through all the non-dup bugs and add a row in the BUGS table and the REPORTS table, filling the fields in the new tables with data from the old table. Link the row in the REPORTS table to the row in the BUGS table.
2: Go through all dup bugs and add a row in the REPORTS table for each one. Do not create an entry in the BUGS table. Find the "master bug" that this is a duplicate of and link the report to the master bug in the BUGS table.
When converting dups, you will have to throw away some information, basically anything that doesn't fit into the bug_reports table, but if the fields have been chosen well then nothing important will be lost. Some info, like votes, can be added to the master bug. This scheme also allows for new kinds of reports like success reports, analysis, extra symptoms etc, solving #121805 also.
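The two migration steps above might look something like the following sketch. Everything here is hypothetical: the old_bugs, bugs and reports tables, their columns, and the helper names are invented for the example (SQLite through Python's sqlite3, not the real Bugzilla schema). It also makes one choice the comment leaves open, namely following a chain of duplicates to its end when looking for the master bug.

```python
import sqlite3

def build_old_db():
    """Old-style data: bug 2 duplicates bug 1, and bug 3 duplicates bug 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT,
                               op_sys TEXT, short_desc TEXT);
        CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE reports (report_id INTEGER PRIMARY KEY, bug_id INTEGER,
                              op_sys TEXT, description TEXT);
        INSERT INTO old_bugs VALUES
            (1, 'NEW',      'Linux',   'query page cannot find bug'),
            (2, 'RESOLVED', 'Windows', 'search misses reports'),
            (3, 'RESOLVED', 'Mac',     'search does not find anything'),
            (4, 'NEW',      'Linux',   'search box too small');
        INSERT INTO duplicates VALUES (2, 1), (3, 2);
    """)
    return conn

def master_of(dupe_of, bug_id):
    """Follow the duplicate chain from bug_id to the bug at its end."""
    seen = set()
    while bug_id in dupe_of and bug_id not in seen:  # `seen` stops on a cycle
        seen.add(bug_id)
        bug_id = dupe_of[bug_id]
    return bug_id

def migrate(conn):
    """Step 1: non-dups get a bugs row and a reports row.
    Step 2: dups get only a reports row, linked to their master bug."""
    dupe_of = dict(conn.execute("SELECT dupe, dupe_of FROM duplicates"))
    old_rows = conn.execute(
        "SELECT bug_id, bug_status, op_sys, short_desc FROM old_bugs"
    ).fetchall()
    for bug_id, status, op_sys, summary in old_rows:
        if bug_id in dupe_of:
            parent = master_of(dupe_of, bug_id)
        else:
            conn.execute("INSERT INTO bugs (bug_id, status) VALUES (?, ?)",
                         (bug_id, status))
            parent = bug_id
        conn.execute(
            "INSERT INTO reports (report_id, bug_id, op_sys, description)"
            " VALUES (?, ?, ?, ?)",
            (bug_id, parent, op_sys, summary),
        )
    conn.commit()
```

Reusing the old bug number as the report number keeps existing references resolvable in this toy version; a real migration would also need to decide what to do with comments, votes and CC lists on the duplicates.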
Comment 28•22 years ago
Strictly speaking, the discussion about separating bugs from bug reports is off-topic. It is true that this bug, and bug 121805, and bug 145588, and the idea about separating reports from bugs, are four different attempts to solve the same problem: "Do something to reduce duplicates." I think I'll start a discussion on netscape.public.mozilla.webtools about this. I briefly searched for such a discussion and found http://groups.google.com/groups?hl=el&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=aa4c53a552207cba&seekm=an26vi%24dl0%241%40news5.svr.pol.co.uk#link1, which the people new to this bug may want to read. Meanwhile, I think it would be nice if we focused this discussion on including duplicates in search. Most of the on-topic discussion has been skepticism about the performance impact. The thing is that it is impossible to know the performance impact just by looking at the query. Even if someone can tell that turning on the duplicates option will take three times more machine time, what does this tell us? That machine response time would become nine times longer? For all we know, the option might help people find what they're searching for with fewer queries, thus actually improving performance. So, if we agree on the user interface, if we agree that the feature may be worthwhile, and our only problem is performance, let's put it there, have it off by default, mark it experimental, discourage its use, and try it. See how much longer it takes, how much better the returned results are, and make some better-informed guesses as to its implications.
Comment 29•22 years ago
Re #28 - off-topicness. I disagree. Separating bugs and bug reports allows you to eliminate the whole concept of duplicates. With no dups, there's no need to include them in searches. The triagers now deal with reports. When the report is for a previously unknown problem they create a new bug which will have the report attached. When someone reports an already known problem, the triager just attaches the report to the existing bug. This second action is what used to lead to a dup but now just leads to an extra report attached to the bug. The only problem is if 2 existing bugs A and B later turn out to be different aspects of a single bug or if a triager creates bug B from a report when they should have attached it to bug A. Then we need to merge A and B together. The merging process would attach all the reports from bug B to bug A and delete bug B. Maybe some other data needs to be merged also, like votes and CCs. Also whoever is merging may have to choose a status and who to assign it to. This merging process should be fairly rare if bug reports are assigned correctly in the first place. Keyword searches are done on fields in the bug reports and since all the reports about bug A are attached to bug A there's no need to worry about dups. The performance should be no worse than before and now no one needs to even think about dups when doing their query.
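The merge step described here could be sketched as below, again purely as an illustration. The bugs and reports tables, the votes column, and the merge_bugs helper are assumptions made for the example; how status, assignee and CCs should be reconciled is left out because the comment leaves that to whoever performs the merge.

```python
import sqlite3

def build_demo_db():
    """Two bugs that turn out to be the same problem (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, status TEXT, votes INTEGER);
        CREATE TABLE reports (report_id INTEGER PRIMARY KEY, bug_id INTEGER,
                              description TEXT);
        INSERT INTO bugs VALUES (1, 'NEW', 2), (2, 'NEW', 3);
        INSERT INTO reports VALUES
            (1, 1, 'query page cannot find bug'),
            (2, 2, 'search misses reports'),
            (3, 2, 'search does not find anything');
    """)
    return conn

def merge_bugs(conn, keep_id, drop_id):
    """Move every report (and the votes) from drop_id to keep_id,
    then delete drop_id. Runs as a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("UPDATE reports SET bug_id = ? WHERE bug_id = ?",
                     (keep_id, drop_id))
        conn.execute(
            "UPDATE bugs SET votes = votes +"
            " (SELECT votes FROM bugs WHERE bug_id = ?)"
            " WHERE bug_id = ?",
            (drop_id, keep_id),
        )
        conn.execute("DELETE FROM bugs WHERE bug_id = ?", (drop_id,))
```

After merge_bugs(conn, 1, 2) on the sample data, all three reports hang off bug 1, so a keyword search over reports reaches bug 1 whichever wording the reporter used.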
Updated•22 years ago
Attachment #108780 -
Flags: review?
Comment 30•22 years ago
Re: searching duplicates recursively, that's what bug 68611 would fix (by storing the end of the duplicate chain in addition to the next dupe in the chain), at which point this fix could switch over to using that "end of chain ID" instead of the "dupe_of" ID. See also bug 204209 about simplifying the duplicates schema, since the current schema is unnecessarily complex. Re: performance, performance matters. This feature needs to be disablable for installations that can't afford it (f.e. b.m.o at the moment; I hope that b.m.o becomes an installation that can afford it, but that depends on us getting new machines, how fast they are, and how much our traffic grows). Re: the UI, I haven't had a chance to look at it closely yet, but in general I agree with Michael that the goal of this fix should be to find the original bugs to which the duplicates refer, not the duplicates themselves, and the UI should reflect that. Ideally the user shouldn't have to select an option at all, Bugzilla should just do the duplicates search and return the matching original bugs, but we would have to be careful to limit this to a known set of fields for which it doesn't matter if we find duplicates (i.e. if someone searches for open bugs assigned to a particular person, we don't want to find open bugs assigned to someone else but which have duplicates assigned to the person being searched for). Perhaps this means that we should limit this feature to searches where it's clear that the user is looking for a specific bug via non-specific criteria (i.e. key words) à la bug 145588. Re: the implementation, again, I haven't looked closely yet, but MySQL 4.0 provides UNIONs, which would probably make this much easier (especially for returning originals instead of duplicates) and more performant.
We will probably start requiring MySQL 4.0+ soon after Bugzilla 2.18 ships (bug 204217), which is likely to happen in the next few months, so it's worthwhile looking into doing this with UNION, i.e.:
(SELECT <columns> FROM bugs WHERE <conditions>)
UNION
(SELECT <columns> FROM bugs
 INNER JOIN duplicates ON bugs.bug_id = duplicates.dupe
 INNER JOIN bugs AS originals ON duplicates.dupe_of = originals.bug_id
 WHERE <conditions>)
ORDER BY <order columns>;
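One possible reading of this UNION approach, filled in so that it returns the originals, is sketched below. It is an assumption-laden illustration rather than the intended implementation: SQLite (via Python's sqlite3) stands in for MySQL 4.0, the short_desc column and sample rows are made up, and the choice to apply the status condition to the original while matching the text against the duplicate is one interpretation of the placeholders above.

```python
import sqlite3

OPEN_STATUSES = ("NEW", "ASSIGNED", "REOPENED")

def build_demo_db():
    """Assumed schema and made-up rows; bug 5 duplicates an already fixed bug."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT, short_desc TEXT);
        CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
        INSERT INTO bugs VALUES
            (1, 'NEW',      'query page cannot find bug'),
            (2, 'RESOLVED', 'search misses reports'),
            (3, 'RESOLVED', 'search does not find anything'),
            (4, 'NEW',      'search box too small'),
            (5, 'RESOLVED', 'search crashes on submit'),
            (6, 'RESOLVED', 'crash in query form');
        INSERT INTO duplicates VALUES (2, 1), (3, 1), (5, 6);
    """)
    return conn

def search_originals(conn, statuses, text):
    """First branch: bugs that match directly. Second branch: originals whose
    duplicates match the text. UNION drops repeated originals."""
    marks = ",".join("?" for _ in statuses)
    sql = f"""
        SELECT bugs.bug_id AS bug_id
          FROM bugs
         WHERE bugs.bug_status IN ({marks}) AND bugs.short_desc LIKE ?
        UNION
        SELECT originals.bug_id
          FROM bugs
          JOIN duplicates ON bugs.bug_id = duplicates.dupe
          JOIN bugs AS originals ON duplicates.dupe_of = originals.bug_id
         WHERE originals.bug_status IN ({marks}) AND bugs.short_desc LIKE ?
         ORDER BY 1
    """
    pattern = f"%{text}%"
    params = [*statuses, pattern, *statuses, pattern]
    return [row[0] for row in conn.execute(sql, params)]
```

On the sample data a search for "search" among open bugs returns bug 4 (a direct match) and bug 1 (reached through duplicates 2 and 3), while the duplicate of the already resolved bug 6 contributes nothing.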
Updated•22 years ago
Hardware: PC → All
Comment 31•21 years ago
Comment on attachment 108780 (Patch): Bitrotten, although I'm surprised how little.
Attachment #108780 -
Flags: review? → review-
Updated•21 years ago
Assignee: endico → nobody
Comment 32•21 years ago
Here's a work in progress that returns duplicates for fulltext searches, attempting to aggregate the relevances of the duplicate bugs so that bugs with more duplicates show up higher in the list. A test of this functionality is available on b.m.o by running a fulltext search and then changing "buglist.cgi" in the URL to "buglist-bm.cgi".
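The aggregation idea can be illustrated with a small stand-alone sketch. It does not reproduce the work-in-progress patch, which is not shown in this thread: the scoring function below is a simple word count standing in for a real fulltext relevance score, and the data structures and names are assumptions for the example.

```python
from collections import defaultdict

def relevance(text, terms):
    """Toy relevance: how many times the search terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term.lower()) for term in terms)

def rank_originals(summaries, dupe_of, terms):
    """Score every bug, credit each duplicate's score to its original,
    and return (bug_id, total_score) pairs, best first."""
    scores = defaultdict(int)
    for bug_id, summary in summaries.items():
        score = relevance(summary, terms)
        if score:
            scores[dupe_of.get(bug_id, bug_id)] += score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample data: bugs 2 and 3 are duplicates of bug 1.
SUMMARIES = {
    1: "query page cannot find bug",
    2: "search misses reports",
    3: "search does not find anything",
    4: "search box too small",
}
DUPE_OF = {2: 1, 3: 1}
```

With this data, rank_originals(SUMMARIES, DUPE_OF, ["search"]) puts bug 1 first with a score of 2, contributed entirely by its two duplicates, ahead of bug 4 with a score of 1.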
Comment 33•21 years ago
I submitted a duplicate of this bug :O Basically, if we did a search of the duplicates, and then ranked the bugs that the duplicates pointed to by the number of duplicates matching the user's query, we'd have a nice system. Another way of dealing with this would be to take the summaries of duplicate bugs and attach their extra words to the summary of the original bug. I don't know which one would be easier. But I'm overwhelmed by the number of duplicates and I'm only an end user!
Comment 34•21 years ago
*** Bug 249372 has been marked as a duplicate of this bug. ***
Comment 35•20 years ago
This bug seems to be the closest match for my suggestion so I'll start here. I think the easiest way to include dups is to add a checkbox to the Quicksearch:
Enter a bug # or some search terms: __________________ [ Show ] [Help]
[ ] Include RESOLVED, VERIFIED, and CLOSED bugs.
This doesn't resolve the whole issue of tracking and backreferencing dups in a search, but it'd be an easy fix since it'd include all bugs (including WONTFIX, INVALID, DUPLICATE, etc.).
Comment 36•20 years ago
*** Bug 216360 has been marked as a duplicate of this bug. ***
Updated•19 years ago
QA Contact: mattyt-bugzilla → default-qa
Updated•18 years ago
Target Milestone: Future → ---
Updated•16 years ago
Assignee: nobody → query-and-buglist
Priority: P3 → --
Comment 39•14 years ago
The benefit of this long-standing bug is immediately obvious:
-> easier to find just the right bug as we exploit the potential of duplicate bugs' data to find their originals (while not cluttering results with duplicates)
-> less duplicates will be filed
Therefore, I'd suggest changing the following flags:
Priority: P1 or P2
Target Milestone: something as near as possible
Summary: include duplicates in search → include duplicates in search, but return only their originals (-> better search results, less duplicates)