Advanced search

Posted: Wed Apr 03, 2013 6:48 am
by terkio
I do not understand.
I searched for "cave story" with "Search for all terms or use query as entered",
expecting posts which have "cave" AND "story".
I get tons of posts (15 pages), and most only have "cave". Such a search result is just about useless!
Am I doing something wrong? What is the right query?

Re: Advanced search

Posted: Wed Apr 03, 2013 8:37 am
by beowuuf
I believe you need to use +cave +story

Re: Advanced search

Posted: Wed Apr 03, 2013 8:39 am
by beowuuf
Hmm, the search does seem borked; it seems to be doing an 'or' search despite all the comments to the contrary.

Re: Advanced search

Posted: Wed Apr 03, 2013 8:57 am
by terkio
Thanks,
Indeed, I had tried +cave +story and saw that it does an OR too.

I hope the advanced search will be fixed; it is frustrating when you search for words which are likely to be in all sorts of posts.

Re: Advanced search

Posted: Wed Apr 03, 2013 9:06 am
by beowuuf
It's a basic function of phpBB, so sadly it would only get fixed when phpBB fixes it and we update the forum software to that version. Anyone else know of this issue/know if it's being addressed?

Re: Advanced search

Posted: Wed Apr 03, 2013 11:41 am
by Gambit37
phpBB (the forum software) has a list of common words that are ignored in a search. This is to prevent overloading the database with searches for things like "and, of, then, it, as" etc.

If you search for "story" on its own, it's actually listed as a common word and is ignored. This is why your search doesn't work.

Common words are determined by the forum software automatically based on the contents of the entire forum. So because we've clearly used the word "story" thousands of times over the 12 years this forum's been alive, the software considers it common.

The only thing I can do is rebuild the search index and see if that helps. This can take hours and it may not even work. Fingers crossed...

EDIT: There is a bug tracker entry for this issue with some manual solutions (editing the database directly) -- I'll look at that as a last resort if the index rebuild doesn't work. The bug is closed as "Won't fix" so it's not something that will be improved.
http://tracker.phpbb.com/browse/PHPBB3-8175

Re: Advanced search

Posted: Wed Apr 03, 2013 11:50 am
by Gambit37
UPDATE: There's actually a simple table in the database that lists all the indexed words and whether or not they are classed as common. I found "story" in there set as common, so I unset it and have now started to re-index the forum. Once the re-index finishes, the search for "cave story" should work (I think!)
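
A rough sketch of what "unsetting" a word amounts to, using an in-memory SQLite database in Python. The table name `search_wordlist` and the columns `word_text` and `word_common` are assumptions made up for illustration; check the real schema before touching a live database, and note that the index still had to be rebuilt afterwards here.

```python
import sqlite3

# Hypothetical stand-in for the forum's word table; the table and column
# names are assumptions for illustration, not copied from a real install.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_wordlist ("
    " word_id INTEGER PRIMARY KEY,"
    " word_text TEXT UNIQUE NOT NULL,"
    " word_common INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO search_wordlist (word_text, word_common) VALUES (?, ?)",
    [("cave", 0), ("story", 1), ("the", 1)],
)


def set_common(conn, word, is_common):
    """Set or clear the 'common' flag for one word; return rows changed."""
    cur = conn.execute(
        "UPDATE search_wordlist SET word_common = ? WHERE word_text = ?",
        (1 if is_common else 0, word),
    )
    conn.commit()
    return cur.rowcount


def common_words(conn):
    """Return the words currently flagged as common, sorted alphabetically."""
    rows = conn.execute(
        "SELECT word_text FROM search_wordlist"
        " WHERE word_common = 1 ORDER BY word_text"
    )
    return [row[0] for row in rows]


# Clear the flag on "story" so it is no longer treated as common.
changed = set_common(conn, "story", False)
```

Using a parameterised UPDATE keyed on the word text means only the one row is touched, and the returned row count shows whether the word was in the table at all.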

Re: Advanced search

Posted: Wed Apr 03, 2013 11:58 am
by terkio
Thanks,
I understand the phpBB guys' reasoning about common words, but what can I do to see whether the game cave_story was already discussed on the forum?
A search request for cave_story is changed into cave story.
What can one do with common words like cave and story? Obviously I cannot use other words, renaming the game I am talking about.
The best I can do is: give me posts or topics that contain both of the common words cave and story.
I am afraid the medicine from the phpBB guys is killing the patient.

Re: Advanced search

Posted: Wed Apr 03, 2013 12:06 pm
by Gambit37
Wait and see if the changes I made actually work first ;-) Then we can address any other issues.

The index re-creation is still running and may take another 30 mins or so.

Re: Advanced search

Posted: Wed Apr 03, 2013 12:47 pm
by Gambit37
OK, index is rebuilt and a search for "cave story" and "cave_story" now works.

At some point I'll probably have to go through the entire common words list and remove anything else that shouldn't be classed as common. Doubt I'll ever get the time though..... ;-)

Re: Advanced search

Posted: Wed Apr 03, 2013 1:03 pm
by Lord_BoNes
Glad the search is fixed. Sounds like a fair job you've got ahead of you there Gambit... wish you the best of luck :)

Re: Advanced search

Posted: Wed Apr 03, 2013 1:46 pm
by terkio
Thanks, I hope this fix will be good for everybody, and for more than just my personal search about cave story.
Sorry about the time you spent fixing it.
Isn't it simpler to completely remove the common word list feature invented by the phpBB guys?
I think it cannot work because there are, and will be, legitimate searches that cannot avoid common words.
Am I missing something?

Re: Advanced search

Posted: Wed Apr 03, 2013 2:07 pm
by beowuuf
Interesting. Can someone cripple the site for a while running a search for 'and' or 'the'?

Re: Advanced search

Posted: Wed Apr 03, 2013 2:30 pm
by Gambit37
phpBB cross references words against posts where they are used, and builds an index list of all the words on the site. It uses some algorithm to calculate which words are "common" based on frequency of a word's use. So if you use a word many times, it becomes "common" as far as phpBB is concerned, even if we wouldn't typically consider the word "common". At least, that's how I understand it.
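
To make that idea concrete, here is a minimal Python sketch of a word index with a frequency-based "common" flag. It is not phpBB's code: the 50% cutoff (`max_fraction`), the function names and the sample posts are all assumptions for illustration.

```python
import re
from collections import defaultdict


def build_index(posts):
    """Map each word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(post_id)
    return index


def find_common(index, total_posts, max_fraction=0.5):
    """Flag words that appear in more than max_fraction of all posts."""
    return {
        word
        for word, post_ids in index.items()
        if len(post_ids) / total_posts > max_fraction
    }


# Made-up sample posts; "story" appears in three of the four.
posts = {
    1: "Cave Story is a great game",
    2: "The story of the dungeon",
    3: "A story about a master",
    4: "Cave maps for the dungeon",
}
index = build_index(posts)
common = find_common(index, len(posts))
```

With these sample posts only "story" crosses the cutoff, which shows how a perfectly ordinary search term can end up flagged purely because of what a particular forum talks about.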

In theory it's possible to overload a site with regular searches for common words, and this is why the feature was built to limit that overloading problem. I think some words are automatically added as common before the index is even built, plus you can't search for words of 3 letters or less. So there are plenty of things built in to prevent overloading and to remain optimised, but some of these measures could be seen as "breaking" search -- 'cos you can't search for all words.
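
A small Python sketch of how such filtering could turn a two-word query into a one-word query. The function name, the `min_length` default and the common word set are assumptions for illustration; the real cutoff is whatever the forum is configured with.

```python
def filter_query(query, common_words, min_length=3):
    """Split a query into usable terms and the terms that were dropped."""
    kept, dropped = [], []
    for term in query.lower().split():
        # Drop terms that are too short or flagged as common.
        if len(term) < min_length or term in common_words:
            dropped.append(term)
        else:
            kept.append(term)
    return kept, dropped


# "story" is flagged as common here, as it was on this forum.
kept, dropped = filter_query("Cave Story of DM", {"story", "the"})
```

Here only "cave" survives, so the search behaves as if "cave" alone had been typed, which matches the pages of cave-only results reported at the top of the thread.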

I'm not really concerned about it on this forum as we have such a small user base it's rarely a problem. If we were running a huge complex forum that contained tons of TLAs (Three Letter Acronyms), then yeah we could have a "broken" search that wouldn't find those TLAs, but for now I'm not even worrying about it :-P

Re: Advanced search

Posted: Wed Apr 03, 2013 3:14 pm
by Lord_BoNes
So that means that both RTC and DSB can't be searched for? Bummer.

I can see 1 major weakness to the "common word" approach... if someone were to bombard the forum with posts containing nothing but "break break break" etc... then the "common word" tactic would block the word "break" from being searched for. But, I'd imagine the posts containing such text would quickly be stomped by admins.

Re: Advanced search

Posted: Wed Apr 03, 2013 3:16 pm
by Gambit37
Actually, DSB and RTC come up just fine -- perhaps it's only 2 letter words that by default are ignored.

Plus I think my explanation is wrong: if it were correct, then Dungeon and Master would not show up in results, but they do.

Re: Advanced search

Posted: Wed Apr 03, 2013 3:39 pm
by Lord_BoNes
Fair point! Something tells me people would notice...

Re: Advanced search

Posted: Wed Apr 03, 2013 7:04 pm
by beowuuf
A brief look into it makes it seem that, as Gambit said, the database should compose the common word list from our posts. I guess we use 'DM' far more than dungeon or master :D

And the 3 / 2 letter word limit is a setting in the control panel; we can lower it more if we want, but it's currently set to 3 letter words.

I notice that we can disable it fully by telling it to make common 0% of the words then disabling the common word search; however, I'm loath to do that given this is the first time in many years we've encountered an issue. I'll leave it up to the more tech-savvy admins to figure out the downsides of it all.

Re: Advanced search

Posted: Wed Apr 03, 2013 7:22 pm
by Gambit37
I just read all the comments on the bug report linked above; the last one explains things a bit better:

"Common words only start to get marked after 100 posts."
http://tracker.phpbb.com/browse/PHPBB3- ... ment-29251

And this comment above it gives some more info on how these common word errors can creep in
http://tracker.phpbb.com/browse/PHPBB3- ... ment-29250

Re: Advanced search

Posted: Wed Apr 03, 2013 7:22 pm
by Lord_BoNes
1st hiccup in numerous years = "pretty damn good" in my opinion :P

Re: Advanced search

Posted: Wed Apr 03, 2013 7:45 pm
by beowuuf
Ah, I thought it meant the forum had 100 posts, not that the word had occurred in 100 posts! Interesting...

Re: Advanced search

Posted: Wed Apr 03, 2013 8:14 pm
by terkio
I see I opened a can of worms. I am sorry.

I had a look at a forum which has a lot of users online, to see what it does against search service overloading:
DIYaudio, a site for audio and electronics geeks http://www.diyaudio.com/forums/
I did a search for "the"; it was rejected because "the" is too common a word. So far so good.
I did a search for "amp"; it was accepted, returning nearly 100,000 posts, and the search took 40 seconds. "amp" is definitely a very common word, yet the search doesn't consider it one. :shock:
I made no more test searches; I do not want to hog their forum.

I don't know whether this helps or brings more confusion. I am lost :oops:

Re: Advanced search

Posted: Wed Apr 03, 2013 8:17 pm
by Gambit37
Don't worry about it :-)

Re: Advanced search

Posted: Thu Apr 04, 2013 8:54 pm
by Gambit37
Actually, this is weird:

If you search for "dungeon master" as a phrase, you get lots of results returned and both words are highlighted in the matched results.
If you search for "master", you get a similar result.
But if you search for "dungeon" on its own, you get the "no results, word too common" response.

....!?!?!?

Re: Advanced search

Posted: Thu Apr 04, 2013 9:01 pm
by beowuuf
Well, think about where most of the custom games are set....indeed what we call them... and think about the fact that I also ran a D&D game set in the same sort of environment....

I think 'dungeon' gets thrown around far, far more as a word than master does :D

Re: Advanced search

Posted: Thu Apr 04, 2013 9:48 pm
by Gambit37
What I meant was it's weird that "dungeon master" as a phrase is matched just fine, but "dungeon" isn't. Especially considering the matches for "dungeon master" are the same as for "master".

Doesn't make any sense. phpBB is weird! :D

Re: Advanced search

Posted: Fri Apr 05, 2013 12:12 am
by terkio
Sure, a real-time shared system must be protected against resource hogging.

I do not understand why they invent weird schemes when it is so simple to use quotas.
So simple to just set a maximum number of responses to a request (and a minimum delay between requests from a user).
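
The quota idea could look something like this in Python. It is only a sketch under stated assumptions: the class name, the limits and the injectable clock are all made up for illustration, and nothing here is taken from phpBB.

```python
import time


class SearchQuota:
    """Cap results per search and enforce a minimum delay per user."""

    def __init__(self, max_results=200, min_interval=10.0, clock=time.monotonic):
        self.max_results = max_results
        self.min_interval = min_interval  # seconds between searches per user
        self.clock = clock  # injectable so the delay can be tested
        self.last_search = {}  # user -> time of last accepted search

    def allow(self, user):
        """Return True and record the search if the user waited long enough."""
        now = self.clock()
        last = self.last_search.get(user)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_search[user] = now
        return True

    def cap(self, results):
        """Keep only the first max_results matches."""
        return results[: self.max_results]
```

One caveat: trimming the list after the search has run saves little work, so in practice the cap would need to be pushed into the database query itself to protect the server.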

Re: Advanced search

Posted: Fri Apr 05, 2013 5:46 am
by Lord_BoNes
@Gambit: If "dungeon" is considered a common word, then I'd reset it like you did for "story" above.