Help needed in deciphering strange App Store rating activity

Discussion in 'Public Game Developers Forum' started by WalterM, Sep 1, 2014.

  1. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    That's a safe bet; the store presumably measures how alive your app is, and an overall stagnation of ratings would act as a penalty relative to apps that get a constant stream of fresh ratings.
    That's almost a given considering Apple needs to produce dynamic charts; it couldn't just count the ratings' values without weighing when they were posted.

    I don't understand how small changes would make such a difference. A self-sustaining level supposes that the app attracts new gamers almost regardless of ratings, reviews or anything else.
    It cannot simultaneously be in a self-sustaining state and be hurt by small-scale, seemingly irrelevant statistical alterations.

    One rogue element wreaking havoc from within a large company for his or her own benefit wouldn't be a first, but that's a grave accusation nonetheless.

    This would mean it already happened before October 2013. Logs, maybe? :)
     
  2. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    There are certain things that need to be pointed out, and most of them will be done thanks to App Annie's statistical service.

    First of all, I observed the performance of your app in the following countries: the United States, the UK, New Zealand, Australia, Canada, Russia and Germany.

    In the US, one of the hardest yet most lucrative markets to crack, your app has been hovering in the top 50 apps in the Word category since Jan 1st. In Australia, the app also gets a good position in the same category, often within the first 100.
    Over its entire life, the app has been doing rather well in those countries. It started to slide slightly, but nothing really massive, and after the 3.1 update in October 2013, your app actually started trending upwards.
    More importantly, we can see that your app's ranking has absolutely not suffered. It's even doing better, and for example reached its highest spot of #21 on September 11th and 12th of this year.

    In other countries, your app ranks lower but we don't see a drop following May 21st or 24th.
    So basically, even if the net gain in ratings was zero, we can see that it has had no influence on your app. In fact, from this alone, we might infer that review removal has little to no relevance and at best plays a small part in Apple's ranking algorithms.

    Also, App Annie reproduces the list of all ratings and reviews, per country. For September thus far, you've gotten more than 15 reviews with ratings in the US alone.

    If the removal is done internally, whether by a rogue element at Apple or one of their routines, why haven't you created a few test accounts, given yourself a couple of three-star ratings, waited a few hours or days, and checked whether those ratings were removed?



    Mind you, if one wanted to hurt an app, I guess it would be possible to script a process that posts multiple ratings in small increments: it would initially help your app a bit (you wouldn't gain that many ratings), but all of a sudden, all accounts would be ordered to remove their ratings at once at a key time.
    That would surely provoke a sharp drop in your ranking. It would, however, be far-fetched and highly inefficient, since the slow drip would have to happen over the course of several months, long enough to let the app rise, become successful and easily absorb a sharp loss in its overall quantity of ratings.



    Also, how can we be sure that you haven't tried to cheat Apple's App Store and got caught, or that you're not trying to generate some kind of buzz?
     
  3. Jez Hammond

    Jez Hammond Well-Known Member

    Oct 20, 2012
    50
    0
    0
    I have a couple more questions which it would be nice to eliminate. I hope that isn't off topic, as the conversation looks quite deep right now!

    Consider: the user is asked to rate, selects yes, the App Store loads up, and the user can now do many things which might result in not actually posting a rating. Examples might be: no password to hand / pursued alternative App Store usage / immediately pressed the home button thinking they have 'tricked' your game into stopping its nagging / received a phone call.

    Also, uninstalling a game offers removal of Game Center; does uninstalling also remove reviews? - even if not immediately as they certainly wouldn't appear immediately... (no device to test/confirm atm)

    And here's the thing: does your statistics generator take this into consideration at all, and if so, how could it possibly know whether anyone had completed the proposed action (rate this app), uninstalled, or just did nothing except play again next time? It does sound like your Flurry(?)-type gathering of actions is presuming completion. You might need to incorporate your test into a third-party app and see if similar results turn up or not. Though I might be misunderstanding how you count the differences, and such data could already be freely available for other apps? - in which case my questions could be eliminated.

    I can confirm that reporting a broken AppStore apparently achieves a function-key-selected-auto-message. I once went ahead and coded an update to fix a black icon on the store as nobody seemed to care to even acknowledge it was happening! A week later v1.0.1 appeared (went from charting to not charting). Now that was years back and I am done trying to find out what happened, so good luck with this issue because nobody cares about the small guy. Oh, that started out humble but ended up reminding me of a nightmare, sorry about that.
     
  4. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Pixelosis, thanks for continuing to cast a critical eye over this thread and raising some interesting questions that give me the opportunity to clarify. This kind of scrutiny of App Store goings-on is very good for the developer community as a whole.

    In this particular post I will address the meta-discussion type of comments, since it may not be of interest to many readers, and cover the discussion of the data and its interpretation in a separate post because of the forum's length constraints.

    I have to address this first.

    I would appreciate you not mischaracterizing my statements in this manner, because there are legal implications to doing so. I have not made any accusations, grave or otherwise. On the contrary, I have explicitly stated that it could happen for any number of speculative reasons, that it is not knowable by us but Apple can determine it easily, and that it is therefore NOT worth going into here.

    If it is obviously interpreted as an accusation by an average reader, then I apologize.

    We can go back and forth on this forever for argument's sake but I will stop here on this aspect in the interests of progressing the study of the data.

    I hear what you are saying and it is a valid point. The problem is that such speculation may not lead to anything meaningful, because such guesses just aren't provable or knowable (unless the data itself proves or explains them). It makes straw-man arguments like the following possible: "It is because X might be doing this, but X has no reason to do this, and therefore the data is not an anomaly". I am sure you see the fallacy and danger in such arguments.

    The main motivation is to arrive at observations, based on the data, that provide meaningful support that it cannot come from normal, legitimate user activity (because of patterns, probability, access to ratings before they are published, etc.) and that should be sufficient to prompt Apple to investigate.

    If you believe that a good theory of why and how leads to behavior that explains the data, please feel free to post the theory and why such a theory is necessary to explain the data; that would certainly help. Without that connection, it would be a distraction and easily dismissed.

    Agreed. Would welcome any such observation/thought that would prompt/help Apple to do such an investigation.

    Not saying here whether we have or we have not within what would never be construed as illegal or against Apple policies (don't want to run that risk). Anyone could check this on their own too.

    You are correct but this is not what is happening as indicated by the data. As I mentioned, there are no net removals except on May 21 which is likely connected to the published event when Apple cleaned up. We don't believe that it is related to this activity at all. If that was part of the activity, we would have seen more net removals either before or after and we have not seen them.

    This is what I mean by coming up with speculations that are consistent with, supported by or having an explanation for observed data. Otherwise, it is easy to propose any number of such theories that can be easily shot down.

    Great question!

    How do we know you aren't behind or vested in this activity and trying to obfuscate it with tangential and strawman arguments that remove the focus on what is going on in the data or calling our credibility into question? :)

    The reason for both is the same - it would be counterproductive to such a motive.

    If we had done something to be caught by Apple, we would have received a friendly letter by now, especially after we have been trying to provoke this public interest, which would be kind of dumb to do if we had anything to hide or anything that could let Apple boot us out of the App Store.

    In the same way, your participation in keeping this thread active is raising more interest and keeping it alive rather than letting it get buried, so it would be counterproductive to a motivation of making this "go away". :)
     
  5. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Continuing with discussion on the analysis of the data itself

    On the contrary: the first reaction from people looking at our logs was NOT that it was a single account. Second, such activity would not have been obvious even to us if we hadn't scraped every hour, unlike app aggregation companies that sweep once a day. The algorithm may not have anticipated that, and only accounted for end-of-day results/consequences OR, as you suggested, what could be assumed to be normal Apple monitoring activities. All shady activities have a weak point that is obvious in retrospect, which is why they get caught!

    Yes, roughly. Our report will make it much more clear and bring out some nuances but we are still analyzing a bit more and will present it to Apple first before posting it here. Meanwhile, I am crowd-sourcing this analysis here which has helped immensely from tips received. There are no net removals of ratings in our theory since that is not consistent with the data overall.

    Not sure I understand your thoughts in the above. Can you please explain independent of the other points in its own post? Thanks.

    The self-sustaining state is a crowded space (and often a fixed pie) and one can be knocked off of it easily (not necessarily because of manipulation). An analogy I can think of (to be taken as a very rough analogy) is soaring in a thermal. In a self-sustained mode in the App Store, the app is like a glider or a bird circling in a thermal and gets a lift without actually having to do anything. But it is also easy to fall off of a thermal with some small deviations.

    This can happen naturally with change in demographics, competition, app updates, change in user behavior or potentially by manipulation if you are close to the "edge of the thermal". Once off the "thermal", apps tend to fall off rapidly depending on their "glide ratio". Don't wish to go too tangentially on this aspect but that is what I meant by small changes can make large differences.

    In theory, one can envision a manipulation designed to knock an app "off of the thermal" with just enough changes, letting the natural "sink" qualities of the App Store design take care of the rest. So it would only be needed for a small period (unless the app developer is doing things to counteract it). Such an algorithm is consistent with what we have seen in our download numbers and ratings since early December (before the start of the holiday season). The performance of an app is not linear in its rankings but almost exponential after the first 10 or so in each category.
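    To illustrate the kind of non-linearity being described (all numbers invented; Apple's actual curve is unpublished), chart-rank downloads are often modeled as a power law, so a modest slip in rank costs a disproportionate share of downloads:

```python
# Hypothetical power-law model of downloads vs. chart rank.
# top_downloads and exponent are illustrative guesses, not App Store figures.

def downloads_at_rank(rank, top_downloads=10000.0, exponent=0.9):
    """Estimated daily downloads at a given chart rank (assumed power law)."""
    return top_downloads / rank ** exponent

d20 = downloads_at_rank(20)
d40 = downloads_at_rank(40)
print(f"rank 20: ~{d20:.0f}/day, rank 40: ~{d40:.0f}/day "
      f"(~{100 * (1 - d40 / d20):.0f}% fewer downloads)")
```

    Under this toy model, dropping from rank 20 to rank 40 loses nearly half the daily downloads, which is the "small changes, large differences" effect described above.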

    Thanks for looking up the App Annie statistics. That is indeed one of the sources we have used along with our download statistics that only we have access to.

    I will just point out that each country's behavior is different, and our app caters to US-centric crosswords (the rest of the world has different kinds of crosswords and doesn't much appreciate the Americanisms in US crosswords). So nothing can be inferred from other countries, whose traffic is in the noise compared to the US and typically comes from US expats. The rankings work independently in each country.


    I also need to clarify something that may have created confusion

    I meant Sept 24th, 2014 (last week) above. And I meant the number of user star ratings reported (ratings and rankings are sometimes confused for each other). By "rating", I mean only the reported user star ratings in the App Store, which is what we are measuring.

    One of the motivations for us to post this in public is to see if it would have an effect on the rating data observed. And we may have succeeded.

    The freezing I mentioned is in the same logs I linked for activity. There have been no new ratings (read: ZERO) added to our app since Sept 24, despite averaging about 2.5 ratings a day since May 21 (and even more before then for the current update), along with absolutely no +1, -1 flipping.

    Reasonable people will agree that it is highly improbable that this is explained by normal user behavior, unless you are willing to postulate that all users decided to stop rating in unison on Sept 24 and the +1, -1 activity also stopped at exactly the same time.
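    As a back-of-the-envelope check (my own calculation, not part of WalterM's report): if ratings arrive independently at about 2.5 a day, the count over a week is roughly Poisson with mean 17.5, and the chance of a zero-rating week is vanishingly small.

```python
import math

rate_per_day = 2.5           # average daily ratings quoted above
days = 7                     # length of the observed freeze
expected = rate_per_day * days

# Poisson model: P(no ratings at all in the window) = e^(-mean)
p_zero = math.exp(-expected)
print(f"P(zero ratings in {days} days) = {p_zero:.2e}")  # ~2.51e-08
```

    That is about 1 in 40 million under the independence assumption, which is why "everyone stopped rating at once" is such a hard sell.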

    There are no significant changes in our download data before and after Sept 24, and no significant difference in the number of people being taken to the App Store in our sampling log. And no, it is not iOS 8 usage being different: from the logs, most of our audience, including new downloads (which tend to be more conservative and less tech-savvy than typical gamer audiences), have not switched to iOS 8 yet.

    Possible theories are:
    1. App Store ratings have not been working since Sept 24. Not true, since they are working for other apps.
    2. Apple became aware of this and suspended rating updates to our app while they investigate. This, we would welcome very much since that is the main intent.
    3. If there was an entity affecting our ratings from the inside, they shut off the algorithm after becoming aware of the public discussion without restoring normal processing, or they took the long weekend off for the Jewish New Year while new ratings back up. :)

    I think most reasonable people would agree that there is something not quite right in what is going on even if they disagree with what and how. Apple can solve this mystery in a few minutes by investigating. We will continue to build our case until they do.
     
  6. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Hi Jez, I will explain briefly as this instrumentation is not our creation but something we picked up a while ago in a forum or a blog or somewhere and the rationale made sense to us.

    This is correct, which is why you might never see a 100% "conversion rate". Note that people can also rate by going directly to the App Store, which will cancel out some of those, so the number of ratings averaged over a week or more will be some percentage of that log count. But the key statistical insight is that if there is a "conversion rate" of about X%, that rate will not change much over time UNLESS there is a significant design change in the App Store that affects user behavior. For example, if the URL for the app page suddenly changed or stopped working, you would see it almost immediately in your statistics as the "conversion rate" plunges.

    If, hypothetically, only 1 in 10 such people going to the App Store were leaving a rating, the rest not doing so for one or more of the reasons you listed, the same ratio would likely continue with some small variation regardless of the number of people involved (which would vary with the number of downloads: the more people download and use the app, the more are likely to see and tap that link). This is how statistical sampling techniques are designed.

    As far as we know uninstalling an app does NOT remove the ratings or reviews. You can see this for yourself by rating an app, uninstalling it and then going to your account in iTunes on a Mac and looking at your list of reviews.

    We also do not believe that deactivating an account removes the rating UNLESS Apple does some housekeeping occasionally. We believe this is what happened on May 21, 2014 related to some clean up of suspect accounts along with deactivated accounts.

    To answer your question, the statistical sampling method does not assume anything about whether someone actually rates or not. But if you had a consistent 1-in-10 conversion over time, you would expect a certain number of ratings for any number of such entries in the log. So, for example, for 43 entries one would expect 4.3 ratings on average. When you suddenly see zero ratings, the log tells you that it is not because of reduced usage of your app or a lack of intent to rate it; something else is going on. Without it, you have no visibility, and anyone can claim that no one wants to rate your app. Everyone should do this for their app.
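    The expectation arithmetic above can be sketched in a few lines of Python. The 1-in-10 rate and the flag threshold here are illustrative placeholders, not the app's actual figures:

```python
# Sketch of the conversion-rate sampling logic: a stable fraction of logged
# rate-prompt visits should convert to ratings, so a window with far fewer
# ratings than expected signals something other than user apathy.

def expected_ratings(log_entries, conversion_rate=0.10):
    """Expected ratings given logged App Store visits (assumed 1-in-10 rate)."""
    return log_entries * conversion_rate

def looks_frozen(log_entries, observed_ratings, conversion_rate=0.10, tolerance=0.25):
    """Flag windows where observed ratings fall well below expectation."""
    return observed_ratings < tolerance * expected_ratings(log_entries, conversion_rate)

print(expected_ratings(43))    # ~4.3, the figure used in the post
print(looks_frozen(43, 0))     # True: zero ratings against ~4 expected
```

    The point is that the flag fires on the ratio, not the absolute count, so it keeps working as download volume rises and falls.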

    This rationale made sense to us and so we incorporated it. It also provides rich information about user behavior that you can use to help improve your app design. For example, how much time does it take to get to that stage? What fraction of people are reaching that stage (relative to downloads)? Can you change the design so that more people reach that stage if it means they are enjoying it more, or has a design change resulted in less people reaching that stage, etc.

    In fact, we have had times when that fraction was greater than 1, because people can rate the app by going directly to the App Store before they ever see our link. That would not be logged, so they cancel out some or all of the people who never rated even after being taken to the App Store. But for a given app and demographic, with no App Store design changes affecting the "conversion rate", that fraction should remain the same, even if it differs widely between apps.

    Thanks for the anecdote and support. Very appreciated.
     
  7. Jez Hammond

    Jez Hammond Well-Known Member

    Oct 20, 2012
    50
    0
    0
    Hi Walter, I am familiar with analytics software, though the popular one I used seemed to have the majority of actions 'not get through'. I think a more direct approach (hosting a server) would be far superior to a patchy free service (my experience is again a couple of years out of date).

    Agreed there should be at least some conversion rate from a proposed rating. I can only imagine that what you are reading is (again) out of synchronisation. It's like a bubble sort, where the data has a multi-pass journey that must complete before the result is valid (for whatever reason on the store; maybe extremes are treated differently than 2s and 4s). Better still, think of screen tearing (when no V-sync is present), except on multiple levels: what you would see is at least one random sample from at least two moments in time. The same would apply to prematurely reading the results of a sorting algorithm. Now, if you [the rating provider] have a network of incoming changes, they will arrive in a random order (timing of sample location, latency, etc.) where no single sample contains the full picture; it can only adjust the current 'big picture', which is approximately based on that particular sweep [and adjusted every 400 years plus every midnight in a European country house haha ;].

    Though I still find it extremely unlikely for tens of ratings to go missing over a day or two; over an hour, though, I think it is quite plausible, because at the end of the day your average is not diminishing at all. It would be futile for anyone to try to adjust an average either way; though I could speculate on exceptions, I don't want to give any potential malicious types ideas (I presume we all write here with this in mind, and hence continue to keep an open mind about the whole scene and even mankind).

    Agreed that (normal usage) uninstalling is not what is going on here. Deactivating account would not be it either, unless big A did so with an array of suspect accounts. It would be in everyone's interest not to disclose success rate, else we could end up in a situation where 'the invisible men' survive by natural process! Better to play them at their own game and mostly conceal investigations imo.

    I think it is safe to presume that nobody rates *iOS* apps in iTunes on a Mac/PC except people who hate touch-screen keyboards! Yes, the conversion rate would remain the same, which brings me to another consideration: some obscure app change could provoke a major change in rating habits. That might be something like the "ask me later" timeout being adjusted so it now falls out of the sweet spot. I've seen players change ratings on my game from 5* to 1* while claiming they used to love it :/ it's like expecting common-sense conversation from a parrot haha.

    Glad to chip in on this, and happy to help discourage any malicious things in existence because they ruin people's lives big time and it's just plain unsportsmanlike if we lose our current level-playing-field that allows everyone a chance to publish their efforts. Keep up the investigating, I think people are curious to see a result either way - but most will not comment on such topics unless something has happened to them personally. This is life.
     
  8. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Leaving no doubt that things going on within the App Store are affecting the app: the user ratings completely froze (no ratings left during the week were added to the app's totals) for exactly a week, and then 29 ratings appeared in the same sweep this morning, after all of them had disappeared for a short period. Perhaps the algorithm/entity "managing" our app ratings took a vacation. :)

    Code:
    Oct 1 08:30:01 |18 9 1 0 1 |4.6104
    Oct 1 16:30:01 |0 0 1 0 -1 |4.61142
    
    and the +1, -1 pattern has resumed as you can see above
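    For anyone wanting to replay logs in that format, here is a small parser sketch. The column meaning is my reading of the excerpt (a timestamp, then per-sweep deltas for the 5★ down to 1★ counts, then the running average); the 18+9+1+0+1 = 29 total matching the 29 ratings mentioned above seems to support it.

```python
# Parse a sweep line like "Oct 1 08:30:01 |18 9 1 0 1 |4.6104".
# Assumed layout: timestamp | 5*..1* per-sweep deltas | running average.

def parse_sweep(line):
    """Split a log line into (timestamp, star-count deltas, running average)."""
    timestamp, counts, avg = [part.strip() for part in line.split("|")]
    deltas = [int(n) for n in counts.split()]  # 5*, 4*, 3*, 2*, 1*
    return timestamp, deltas, float(avg)

ts, deltas, avg = parse_sweep("Oct 1 08:30:01 |18 9 1 0 1 |4.6104")
print(sum(deltas))  # 29 ratings landed in this single sweep
```

    With lines parsed this way, the +1/-1 flipping shows up as paired positive and negative entries in the delta columns across consecutive sweeps.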

    Jez, regarding your comment about sync problems, the above is not a sync problem. What we are measuring is more like packet loss than packet delay and not affected by delays when measured and averaged over time.

    Also, regarding your comment on other developers not commenting: perhaps not publicly, but we have been quite heartened by the private communications we have received, which have helped us progress on this.

    For example, a helpful and smart reader of a forum where we posted imported the data into an Excel spreadsheet to visualize it, which showed a better way to look at it than the raw numbers. Combined with a tip from another reader that there may be obfuscating behavior, this allowed us to process the data in a way that led to a breakthrough.

    As a sneak preview, if we plot the average ratings over the lifetime of that log including all ratings registered, we get the following graph

    [​IMG]

    The downtrend is visible, but without much detail of how exactly it happens.

    However, if we filter out the ratings from the single account moving from rating to rating at almost every sweep with no net additions or removals, we get a much clearer picture like the following:

    [​IMG]

    This allowed us to notice some very interesting repeating patterns (in the shaded areas), based on what appears to be a triggering activity, and gave us a good idea of how the ratings declined algorithmically (and clustered) during that time. The details of what this reveals will be the centerpiece of our report.

    Tips received from forum members have described a lot of suspicious patterns they noticed for themselves before I posted this thread, not all of which we will be able to explore or verify on our own. But even allowing for some paranoid suspicions, there appears to be a feeling among more than a few that what is happening with App Store ratings is not strictly on the level, both for apps doing well and for apps that mysteriously lose downloads/ratings.

    We are just not sure what is real and what is not real in the App Store any more!

    Some of the more credible observations (which can be verified in a few instances) and unrelated to our logs we have received from readers include:

    1. App rating totals that go up in each sweep with such regularity (almost always in multiples of 2, it seems) that it looks automatic. Reviews appear as if on a quota system (e.g., exactly 2 or exactly 3 reviews per day), often all in the same sweep. Apps go through this phase for a fixed period and then suddenly nothing, even if nothing else has changed.

    2. One-liner generic reviews that appear with regularity for certain apps, which appear to be automatically generated and are very often connected to the quota in 1 above. Looking at the review history of the names associated with these reviews (in iTunes) shows a pattern of reviewing apps with a similar review quota in operation, and these reviewing accounts all appear to have similar delays between reviews (typically many months). There is a strong possibility that many of these are fake accounts drawn from a large pool used to create those reviews. Out of curiosity, we are working on a script to crawl through these kinds of short reviews and form a graph of the apps related via such reviews (and their rankings).

    3. Atypical rating activity clustered around Tue evening/Wed early mornings or month ends. Not sure what happens in the App Store on Wednesdays or at the beginning of the month. Coincidentally, the freeze for our app reported above also happened Wed-Wed.
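    The graph idea in point 2 can be sketched very simply. The account and app names below are invented stand-ins for scraped review data; the point is only the shape of the computation:

```python
from collections import defaultdict
from itertools import combinations

# account -> apps reviewed (invented sample data standing in for a crawl)
reviews = {
    "acct_a": ["app1", "app2"],
    "acct_b": ["app2", "app3"],
    "acct_c": ["app4"],
}

# Link any two apps that share a reviewer account. Rings of templated
# reviews would surface as densely connected clusters in this graph.
edges = defaultdict(set)
for apps in reviews.values():
    for app_x, app_y in combinations(sorted(set(apps)), 2):
        edges[app_x].add(app_y)
        edges[app_y].add(app_x)

print({app: sorted(links) for app, links in edges.items()})
# {'app1': ['app2'], 'app2': ['app1', 'app3'], 'app3': ['app2']}
```

    In this toy run, app2 bridges app1 and app3 through shared reviewers, while app4 stays isolated; on real data, suspiciously dense components would be the ones worth a closer look.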

    It gets curiouser and curiouser...
     
  9. Xammond

    Xammond Well-Known Member

    Mar 22, 2014
    168
    0
    0
    UK
    Quote "4.6"

    Alright, so you are utterly convinced about something, which is fair enough. I do think, though, that scraping data is going to be flawed depending on, for example, when exactly the scrape begins. How do you know that by the time you complete your sweep the data hasn't changed, even in those seconds? (Not an immediate repeat scrape from you, but from Apple.) Even a person attending a barber's would find that the first hair cut has grown [by the size of 3 Olympic-sized swimming pools divided by a gazillion...] by the time another hair is cut, though on average they will all look correct, even to trained retinas. I mean, in code a mutex would be required to read and write the same memory at the same time, else the app would crash to desktop.

    I guess you must just hammer the servers looking for 'magic frequencies' where the blanking period might be, or whatever. I think only Apple could provide a (as close as possible) perfect scrape, or at best offer a time slot when user scrapes should be performed: something like 5 minutes to begin a sweep, the next 5 to allow all to finish, and the next 60 as current (scrape if you like), so that would be a 70-minute cycle. But I expect there is some mystery to such a cycle, and for whatever reason it remains unknown and probably inconsistent.

    Your diagram isn't really doing much for me, if I'm honest. I would like to see just the deltas after subtracting normal usage behaviour: the little 0.0s adjusting the 'downtrend' you mention by hundredths of a percentage over six months... Sorry if that sounds a bit loose of me, but this is for sure analytical. I guess your telemetry resolution is much more tell-tale than the shaded image posted. I would say try to show a discrepancy regardless of polling frequency, and then examine the same data without the observed 'adjustment bureau' changes: does doing that over time return the missing percentage, or any change at all? Because -0.03 percent ain't too shabby over half of one Sol lap!

    I wonder, what if you were able to use the +/- as deltas for re-syncing (if it was/is a sync thing). So a +1 would mean that your rating either increased or caught up, and a -1 would mean either unlucky or a 'blanking' period passed during that scrape. If my sync theory was correct then eventually things could settle to an expected amount - though it seems that we expect different things.

    I am not saying for one moment that there aren't criminals on the net! If this many people existed in my street then I would not have any doors or windows on my house because for sure even opportunists would milk the place in no time. So of course there are going to be many external cases of numbers not adding up as expected. I'm just not convinced that this is one of those cases due to the negligible amount of change being reported.

    Finally, some specific thoughts on those external cases which you mentioned: 1) Could this be like when a new app doesn't show the stats initially ("there are too few blah") - Or "we are leading this dance, please wait". 2) If often connected with 1's quota then hm this reminds me of when I had a WinPhone. 3) So I should release my game on Thursday like every one else then :)
     
  10. Xammond

    Xammond Well-Known Member

    Mar 22, 2014
    168
    0
    0
    UK
    (sorry, logged in to wrong account -> same person though!)

    Jez
     
  11. Xammond

    Xammond Well-Known Member

    Mar 22, 2014
    168
    0
    0
    UK
    #31 Xammond, Oct 2, 2014
    Last edited: Oct 2, 2014
    A couple more observations which my subconscious churned out... beats thinking about 'other things' on the side.

    Rough extrapolation predicts that the rating will fall from a visible 4½* (4.6) to 4* (4.24) in just 4 months (Jan 2015), and to 3½* in May - so effectively losing 1 star per year. That might be understandable if no updates were posted, but perhaps not with updates.

    However, the negative effect itself appears to be waning/fading over time, so I predict 4* in May and 3½* in Sep '15; by then you could use these efforts to produce a masterpiece... either a new game or new forensic methods should emerge at least!
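    The rough extrapolation above, redone as a simple linear model (the slope is derived only from the figures quoted in this post: 4.6 now, a predicted 4.24 four months out):

```python
# Linear extrapolation of the displayed average rating, assuming a constant
# monthly decline. Inputs are the figures quoted in the thread, not real data.

def extrapolate(current, slope_per_month, months):
    """Project the average rating `months` ahead under a constant decline."""
    return current - slope_per_month * months

slope = (4.6 - 4.24) / 4                       # ~0.09 stars lost per month
print(round(extrapolate(4.6, slope, 12), 2))   # ~3.52: roughly one star/year
```

    A constant slope is the crudest possible model, of course; if the effect really is fading as suggested, the decline would flatten rather than continue linearly.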

    Observing the (unshaded) graph while listening to Aphex Twin's latest is interesting! It's almost like handwriting (regardless of spline)... You could also probably flip it and find that a trace overlays a potential perpetrator's stats, or at least detect algo fingerprints (as above). BUT if the proposed algo exists and is internal to Apple (whether known to them or not), then the prints could be as hard to find as my evidence was. I mean, if I had a store which simply showed the top 200, then I would have a few screenshots lying around, right? So if someone said "oh, look at day 2 of x's [BubbleSand] release and you will see that the icon magically changed to iTC's placeholder image", it would take mere moments to confirm, right?... Nope, such technology (prtsc) must not have existed. Gah, good luck with it.
     
  12. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Hi Jez,

    The ratings data processing is not as complicated as you think it is.

    Apple's servers update the rating at most once an hour, at the top of the hour, through some kind of batch processing/update. The servers are also in sync 99%+ of the time, but we sample at 30 minutes past the hour just to be sure all servers are in sync.

    It is not real time and stays constant through the hour. So the minimum sampling period is an hour, and you can sample at any time during the hour. There are no race conditions, sync problems, or inconsistent-state issues here.

    In other words, there is not much scope for measurement error here.
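    A minimal sketch of the sampling schedule described above (hourly batch update at the top of the hour, sample at :30 to stay clear of the update boundary). The timestamps are illustrative:

    ```python
    from datetime import datetime, timedelta

    def next_sample_time(now: datetime) -> datetime:
        """Next :30 past the hour, halfway through the assumed hourly update cycle."""
        candidate = now.replace(minute=30, second=0, microsecond=0)
        if candidate <= now:
            candidate += timedelta(hours=1)
        return candidate

    # e.g. at 02:45 the next sample would be taken at 03:30
    print(next_sample_time(datetime(2014, 10, 2, 2, 45)))
    ```

    Sampling at a fixed offset like this is what makes the log lines in the thread directly comparable hour to hour.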

    As to the magnitude of the change, there are two points to note.

    1. Without knowing the sensitivity of the consequence to the changing parameter, the magnitude itself does not mean much. From science, we know that small changes in the Earth's axial tilt, tectonic plate movements, observed Doppler shifts in light, temperature increases at the Poles, etc., are not irrelevant just because they are small. It is the sensitivity of the consequence to that change that is important. We already know from published statistics that a change in ranking produces an exponential decay in downloads/sales. The behavior during the holiday period for this app (where one would normally expect a bump in volume) indicates a significant sensitivity.

    2. The change is small at the moment because of the good reception this app had at launch. But if a similar manipulation were to happen at an update, before the app had a substantial base of good, real ratings, the app could be pushed into the 3-4* range very quickly, destroying its visibility in the rankings. So, if this manipulation can happen, it calls into question the viability of doing an update at all: an update could potentially kill an app unless one took sufficient defensive measures in terms of marketing, encouraging real users to rate it, etc. We suspect many apps get killed this way very early. It is in all of our interests that such activity not be possible in an App Store.
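    Point 1's claim that downloads fall off sharply with rank is commonly modeled as a power law. Here is an illustrative sketch; both constants (`top_downloads`, `decay`) are assumptions for the example, not figures from the thread:

    ```python
    def estimated_daily_downloads(rank: int, top_downloads: float = 10_000.0,
                                  decay: float = 0.9) -> float:
        """Illustrative power-law model: downloads ~ top_downloads * rank**-decay."""
        return top_downloads * rank ** -decay

    # Losing a few places near the top of the chart costs far more downloads
    # than losing the same number of places far down the chart.
    print(estimated_daily_downloads(1), estimated_daily_downloads(2))
    print(estimated_daily_downloads(50), estimated_daily_downloads(51))
    ```

    This shape is why even a small, sustained drag on ratings (and hence rank) can have an outsized effect on revenue, which is the sensitivity argument being made above.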

    We are simply following a scientific approach to this, rather than indulging in conspiracy theories or unsubstantiated claims.

    1. Design the observation infrastructure and check its validity and repeatability. In this case, the observation has been cross-checked with alternate means of looking at the same data, as well as for consistency with the third-party scanning that is the basis of considerable business models for those third parties. The observation is repeatable and verifiable by anybody, and the data collection is ongoing.

    2. Once you have confidence in the observation process, do you notice any anomalies in the observed data? The answer here is yes, particularly in the period since May 21. Can this be an artifact of the measurement itself? No, because of the use of control subjects (other apps) of varying characteristics: any measurement artifact should occur in the control subjects as well. The more such control subjects you have, the greater the confidence level. Is there enough of a sample space to provide statistical significance? Yes.

    3. Is there a hypothesis to explain the anomaly? Yes, we have a strong fit. Are there alternate hypotheses that can also explain the same anomalies? Not to the same extent, unless you are willing to ignore some glaring anomalies; and even then they involve many more assumptions that cannot be proved or supported by the observed data.

    4. Most important, does the hypothesis provide a predictive model of future behavior that can be tested/observed? (Not what real users will rate it, for which there is no predictable pattern, but what the hypothesized manipulative algorithm will do in response to the number and magnitude of incoming ratings and the changes in the average rating.) This is the ultimate test, and we certainly have such a model.

    The latest pattern like

    Code:
    Oct 2 02:30:01 |1 1 -1 0 0 |4.61233
    Oct 2 07:30:02 |-1 0 0 1 0 |4.6108
    
    is exactly the kind of pattern this hypothesis predicts, and the hypothesis explains why and when it is likely to happen before it happens (as long as Apple does not freeze ratings, as happened recently).
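    The mechanics of those logged lines can be sketched as follows. The delta columns run 5* down to 1* (as Walter confirms later in the thread), but the starting histogram below is hypothetical, since the thread never gives the app's actual totals:

    ```python
    def apply_deltas(counts, deltas):
        """counts and deltas are [n5, n4, n3, n2, n1], 5* to 1* as in the logs."""
        return [c + d for c, d in zip(counts, deltas)]

    def average(counts):
        """Plain unweighted mean implied by the star histogram."""
        stars = [5, 4, 3, 2, 1]
        return sum(s * c for s, c in zip(stars, counts)) / sum(counts)

    counts = [700, 150, 50, 20, 30]                    # hypothetical starting histogram
    before = average(counts)
    counts = apply_deltas(counts, [1, 1, -1, 0, 0])    # the "Oct 2 02:30" delta line
    after = average(counts)
    print(round(before, 5), round(after, 5))
    ```

    With these made-up totals the average nudges upward; whether a given delta moves the average up or down depends entirely on the real totals, which is presumably why the thread tracks the per-star columns and the average together.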

    This is the case we will make privately to Apple first and if we get nowhere to the media outlets.
     
  13. Xammond

    Xammond Well-Known Member

    Mar 22, 2014
    168
    0
    0
    UK
    Hi Walter,

    Thanks for listening to my comments I have been fascinated by it all, but it seems quite specialised even on the public forum so enough from me, lots to re-read with new perspective. Cheers for the detailed posts.
     
  14. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    Hi,

    What are the other forums you've posted on?
     
  15. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    Just for confirmation's sake: in the logs, do the columns on the left correspond to the four- and five-star ratings, or to the low ones?
     
  16. augustiner

    augustiner Active Member

    Jul 7, 2011
    27
    0
    1
    Be careful of any assumption that a user can know whether their review is included in the average for an app. Apple has a number of compelling reasons to not tell a user whether their review is part of a computed average even if it looks like it's still active. For one, it would only make the arms race against app store manipulators that much harder.

    Likewise, you don't know for sure whether every text review coupled with a star rating is included in the average. To further complicate things, Apple might weight some reviews more heavily than others.

    Putting myself in Apple's shoes, my ultimate goal would be to have the most accurate rating system so that end users have the best possible experience. Any "manipulation" done would be purely to counter some other artificial influence. Any smoke screen would be employed solely to make it harder for the unscrupulous to understand how they are being detected.

    You mentioned Occam's Razor, but seem to prefer very convoluted explanations over the simple. I'm reminded of those who claim that professional sports are rigged when the leagues have everything to lose if credibility is lost and virtually nothing gained (individual players are a somewhat different story).

    Remember, Apple has everything to gain by promoting quality apps and taking their cut of paid conversions / IAP. They have everything to lose if they promote garbage apps. On a personal note, I have had much worse experience with the Android Play store in terms of being able to trust the rankings compared to the App Store.

    The last possibility you should consider is that Apple wants something of a Bell curve (or other arbitrary distribution) of ratings across the store or categories and will make small adjustments over time to maintain or achieve this.
     
  17. Jez Hammond

    Jez Hammond Well-Known Member

    Oct 20, 2012
    50
    0
    0
    Seems they throw out the subjective trash so I doubt it's that.
     
  18. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    The review, as part of an average? The text?
    I understand Apple would try to make things hard to figure out, but the simple fact that they don't explain how the ratings work is already enough in itself, and you probably don't want to have to deal with the lawyers of a company that grosses three-digit billions a year.

    I think there's a simple way to know: sum each column and see what average you get. It should be settled very quickly.
    If anything, it would easily reveal whether Apple attaches a coefficient to each rating based on the presence of an attached review, its content's value, the posting date, the value of the user who posted it, etc.
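    That column-sum check can be sketched like so: compute the plain unweighted mean from the tallied columns and compare it against the displayed average; a persistent mismatch beyond rounding would hint at per-rating coefficients. The tolerance and sample numbers here are assumptions:

    ```python
    def unweighted_average(counts):
        """counts = [n5, n4, n3, n2, n1]; the simple mean if every rating counts once."""
        stars = [5, 4, 3, 2, 1]
        return sum(s * c for s, c in zip(stars, counts)) / sum(counts)

    def consistent_with_displayed(counts, displayed, tol=0.005):
        """True if the displayed average could plausibly be a plain mean of the tallies."""
        return abs(unweighted_average(counts) - displayed) <= tol

    # Hypothetical tallies: a displayed 4.55 is consistent with a plain mean here.
    print(consistent_with_displayed([700, 150, 50, 20, 30], 4.55))
    ```

    One caveat: a single snapshot that passes this check doesn't rule out weighting; repeating the comparison across many snapshots (and apps) is what would make a mismatch meaningful.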

    Rigged is a bit extreme, but if you want real-life examples, we have had enough doping scandals by now in the European football leagues to know that there's nothing wrong at all with being a lil' paranoid and asking yourself such questions.

    Still, digression aside, I don't think Apple themselves or some rogue bloke inside Apple are behind that.

    Err, no, that I don't buy. It would be easily spotted, and can you imagine the onslaught if it were found that Apple, after claiming that ratings and reviews are the be-all and end-all of the ranking system, and after going apeshit on some devs for cheating the store, took the liberty to "adjust" yours? That would be a direct manipulation of the market itself, something that could actually balloon into big trouble for big A.
     
  19. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    @augustiner, that seems like an admirable defense of Apple against charges never made!

    If I understand you correctly: Apple would never manipulate ratings because their reputation is on the line, but if they do, it will be for good reasons, like preventing gaming of the system, or for business reasons of their own, like shaping a bell curve (even if that might be at odds with developer interests).

    First, I think it is premature to speculate about the who and the why, let alone defend it, before we establish the what. None of the issues you have raised affect the determination of the what. I am not sure what convoluted theories you are referring to. Occam's razor was mentioned in the context of explaining the continuing +1, -1 activity as the work of a single account (the simplest explanation), since any alternate explanation would require a rather improbable set of coincidences to fit the data described in this thread. Try formulating such concrete alternatives and you will see it right away.

    Second, as Pixelosis pointed out very well, I doubt most developers would be OK with your latter assertion, not to mention that Apple would be opening itself to the largest class-action lawsuit in history if it did the kinds of things you suggest it might be doing. It is one thing to promote the apps they want. But if they drop legit user ratings, filter them, or count some more than others to fit a curve, that in effect demotes some apps in a way that does not truly reflect user popularity and feedback; and since this affects developer revenue (which is easy enough to establish), they would be inviting hungry lawyers salivating at the deep pockets.

    So, I doubt very much that they will consciously and intentionally try to manipulate ratings in any of the ways you are suggesting.

    One user, one rating. What you see is what a legit user rated/reviewed. No more AND no less. This is the best system for all concerned and the implicit understanding of the App Store. We are documenting what appears to be a deviation from this ideal. The Who and Why can come later.

    @Pixelosis, yes, the order is 5* to 1*, as detailed in the first post.
     
  20. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    There aren't any risks to take. All you need is valid accounts and some very quick tests. For example, each member of your development team, plus friends and family, can rate the app while you keep an eye on what happens.
    Try to have the ratings grouped at the same time, to see whether the removals rise accordingly or not.

    That would be the point of a bait thread: that at least some people respond before some posters start considering the plausibility of a trick, you see?
    Buzz doesn't require any good analysis; it just needs some rather wild claim in the hope of attracting attention, drowning readers in a mass of data and opaque claims. And if your claim were proven wrong, what's so bad about that? Oops, you just panicked because you didn't read your numbers properly.
    Now, let's be clear, I never claimed that you were good at generating buzz. ;) If anything, by the looks of it, I'd say it has massively failed (aside from me responding here and bumping what looks like a hopeless thread by now). I couldn't blame you though; times are harsh, and such methods are funny and perhaps worthwhile.
    My final word on this subtopic is that I give you the benefit of the doubt.
    For the moment.

    By the way, I have asked you twice which two other fora you posted your thread in, and so far you have missed (or dodged?) the question both times.
    Is there any reason why, despite my precautions to post this question in such a way that it wouldn't get drowned in a wall of text?

    This isn't important anymore. It was a mere tangential idea but I have since acquired more information that nicely allows me to discard this consideration.

    This activity would only be possible with a centralized action system operating right at the base of Apple's servers. You don't even need to lose good ratings; you merely need to stagnate while other apps get more ratings, which will be mostly positive anyway. That will make your app look dead and largely ignored if no one appears to have an opinion about it.

    So basically we can largely limit the study of your case to the US market and the English-speaking markets it influences.
    As I pointed out earlier, the app is certainly not suffering from the repeated removal of ratings in the US market. If anything, it's climbing higher into the self-sustaining region, away from the dangerous edge.
     

Share This Page