Help needed in deciphering strange App Store rating activity

Discussion in 'Public Game Developers Forum' started by WalterM, Sep 1, 2014.

  1. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Dear Friends,

    This is an appeal for crowd-sourcing help in trying to understand the strange behavior in user ratings for an app of ours happening since May 21, 2014. We know that there are some brilliant minds out here that can detect patterns and discern algorithms much better than we can.

    Background: We suspected that there was some persistent "manipulative" activity going on against our app for more than a year (not the typical occasional hit reviews every app gets) and started extensive logging and testing of ratings and reviews. We will have a full report later this year on our findings, which we find disturbing with regard to the integrity of the Apple App Store rating/review system.

    But for now, rather than predispose you to any conclusions, we just want to present one type of raw data we have been collecting, to leverage your minds/skills in arriving at an explanation for this data and what it might suggest to you. It will be obvious when you look at it that it is, at a minimum, very odd. Any thoughts on what might be happening, ideas, and suggestions for further testing/logging are welcome and much appreciated.

    If developers here can do similar logging of data for their own apps and confirm whether it is happening to them as well (which might indicate a bug rather than intentional activity) or not (which would imply it is targeted), that will be helpful. Getting to the bottom of this is for the good of the entire community, especially if malicious activity is the cause. We have benchmarked against a number of apps and have not found this behavior in any of them.

    We will be happy to publicly credit any help/findings/insight here from individuals in our report to be publicized later in the year.

    We have tried to bring this to the attention of App Store folks at Apple but have only received form responses ("Users may delete or change their ratings at any time"). You can convince yourself that this is not the case here.

    The app is a niche market one in crosswords: Across Lite Crosswords (not something that would be an obvious target for ratings manipulation)

    https://itunes.apple.com/us/app/across-lite-crosswords/id480513184?mt=8

    The logging we have been doing in real time and continuing can be found at:
    (although once we have made this public here, the behavior will likely change if what we suspect is true and so the patterns prior to this date and time will be the most valid to study)

    http://icrossword.com/monitor/

    The logging is done as follows:
    1. At 30 minutes past every hour we fetch the number of ratings for each * for the current version of the app via an HTTP call to iTunes (the same way third-party App Store listing services do; nothing unusual here).
    2. If there was any change in the ratings from the previous fetch (the ratings sweeps at the App Store typically happen at the top of the hour), we record the deltas for each * rating along with the time stamp and the current average rating, each line listing deltas from 5*s to 1*s in that order.
    3. The logs for many weeks prior to May 21, 17:30 PDT (exactly when this odd behavior started with a large number of ratings disappearing in the same sweep) can be found at

    http://icrossword.com/monitor/archive.log

    The prior archive does not have time stamps but follows the same procedure, with before and after average ratings covering many weeks. This log provides a baseline for the kind of ratings we were getting prior to this date, so you can see the stark change in behavior since May 21.
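    In code, the procedure above can be sketched roughly as follows (a minimal Python sketch; `fetch_star_counts` is a hypothetical placeholder for the actual HTTP call to iTunes, which we are not publishing here, but the delta logic and line format match the logs):

```python
import datetime

def fetch_star_counts():
    """Placeholder for the hourly HTTP call to iTunes that returns the
    current (5*, 4*, 3*, 2*, 1*) rating counts for the app version."""
    raise NotImplementedError

def log_line(prev, curr, avg, now=None):
    """Return one monitor-log line, or None when nothing changed.

    prev/curr are (5*, 4*, 3*, 2*, 1*) count tuples from consecutive
    fetches; deltas are listed from 5*s to 1*s, as in the logs above.
    """
    deltas = [c - p for c, p in zip(curr, prev)]
    if not any(deltas):
        return None  # no change in this sweep: record nothing
    now = now or datetime.datetime.now()
    stamp = "%s %d %s" % (now.strftime("%b"), now.day, now.strftime("%H:%M:%S"))
    return "%s |%s |%.5f" % (stamp, " ".join(str(d) for d in deltas), avg)
```

    For example, `log_line((100, 5, 3, 2, 1), (101, 5, 4, 1, 1), 4.61651)` produces a line of the same shape as those in the monitor logs.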

    The odd behavior obvious in the above is a rating persistently hopping from one star level to another in almost every sweep. It appears algorithmic and intentional rather than random or related to normal user behavior: users change their ratings very rarely in practice, let alone several times every day for more than 3 months.

    We have noticed several patterns here that rule out randomness and show particular intent, but we will leave it to the best minds here to come up with their own independent conclusions. We will be happy to share our observations in the ensuing discussion.

    Here are some specific questions for now:
    1. Do you think this is random or a bug or an intentional automated/algorithmic activity? What is your rationale for that opinion?

    2. Do you think there is an intentional attempt here to affect the average rating, either to hold it down (a sort of "rev limiter") or even to slowly walk it down in a way that is not easily detectable? (Note that average ratings affect search rankings on the App Store and so can affect the downloads of a product that depends on App Store visibility.) We have noticed patterns in this "bot behavior" correlated with increases in the average rating, and with the appearance of ratings that uncharacteristically reduce the average, totally out of character with the weeks before May 21.

    3. If this appears to be an intentional activity, is it possible for someone to do this from OUTSIDE the App Store rather than it happening from inside the App Store itself? Not concerned with why anyone might want to do it for now.

    Many thanks for your help and indulgence in this curious matter.

    Look forward to hearing your best ideas and brilliant deductions....

    Our apologies if you saw this in another forum. We selected the top 3 forums which we thought would reach the maximum numbers in the community that can help in this effort, not to spam any and all forums.
     
  2. OnlyJoe

    OnlyJoe Well-Known Member

    Sep 29, 2013
    114
    0
    0
    Auckland
    I am not sure how to really read the data you have given. But it looks like your average rating is pretty much the same. So how are you matching up individual ratings to see if they have actually changed? I would assume you are using the username of whoever did the rating, not the order you get them back (because the order they get returned in is going to change every time; it's a very active database). And it makes little sense for someone to run a bot that keeps your rating the same.

    I know that Apple is always removing reviews for various reasons; the most common is that the user has given a high review and then deleted the app. I'm pretty sure Apple expects the app to remain on the person's phone for a while after a 5-star review. This is to stop fake reviews, or companies that offer mass review services.
    Or a review is given but the app is never actually opened, or the review is done in iTunes on a computer and the app is never installed on a device. Apple seems to have made a lot of improvements to how reviews work recently.
     
  3. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Thanks for the reply and the opportunity to clarify. Sorry if my post was not clear on this.

    Each line represents the new number of ratings removed or added since the last sweep when there was a rating added/deleted. So, for example,

    Sep 1 09:30:01 |1 0 1 -1 0 |4.61651

    denotes 1 new 5* added, 1 2* removed, and 1 3* added since the last change in ratings count, and this happened at the ratings sweep at the top of the hour at 9 AM on Sept 1. The total number of ratings does not matter; what matters is the deltas when ratings get added or deleted.

    The latter may happen if for example, someone changed their 2* rating to a 3* rating OR someone's 2* rating was deleted for whatever reason and someone else added a 3*.

    While one might think this latter case is possible (and it perhaps does happen occasionally), such a matching addition/deletion happening several times a day, every day for more than 3 months, and with more frequency than actual user ratings, is extremely improbable to the point of being impossible. As I mentioned in my earlier post, this does not happen for any app we have similarly tracked. Why would our app be different?
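    For anyone who wants to run their own analysis on the monitor logs, a line in the format above can be parsed with something like this Python sketch (the optional `PDT` token appears on some lines; the field layout is taken from the example above):

```python
import re

LINE_RE = re.compile(
    r"(?P<month>\w{3}) +(?P<day>\d+) (?P<time>[\d:]+)(?: PDT)? "
    r"\|(?P<deltas>[-\d ]+) \|(?P<avg>[\d.]+)"
)

def parse_line(line):
    """Parse one monitor log line into (timestamp string,
    deltas listed 5*..1*, average rating), or None if malformed."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    deltas = [int(x) for x in m.group("deltas").split()]
    stamp = "%s %s %s" % (m.group("month"), m.group("day"), m.group("time"))
    return stamp, deltas, float(m.group("avg"))
```

    With the parsed rows in hand, anyone can reproduce the delta and correlation observations discussed in this thread from the public logs.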

    Moreover, it started to happen exactly from the sweep on May 21 at 5:30PM for our app but not for months before it.

    If this was "normal", then it should be happening to more apps than not, no?

    On the other hand, if you take the hypothesis that it is a single rating hopping around algorithmically, it makes a lot more sense, especially compared to the period just before that May 21 "event".

    So the more likely explanation is that it is a single bot rating that for whatever reason is hopping around. It is that reason we are trying to determine and asking for help.

    We have noticed certain patterns around this hopping that appear to be algorithmic. The frequent hopping alone should raise eyebrows.

    It is not random, and how it hops around is influenced by the ratings from users as they come in. For example, if the ratings start to go up from good ratings, the hopping behaves differently than if the ratings were not going up. When it hops to a 5* rating and jumps back, there is a correlation with what happens to the ratings (some uncharacteristically low ratings immediately follow, lowering the average), and this happens regularly enough that it does not appear to be random. How does this "bot" know that there will be lower ratings in the next few sweeps when it moves to a 5*, unless.... :)

    This is what we are trying to determine, because there appears to be some carefully designed logic/reasoning behind the hopping; it is not as crude as spamming an app with 1* ratings. And there is enough data to establish statistical validity that it is not random. For example, it has never hopped to a 4* and back, with one exception over 3 months that may be an actual user changing a rating. We have our own hypothesis as to why but do not want to predispose others to it yet. Random behavior sustained over that long a period would be highly improbable.

    The average rating has been monotonically decreasing, which can happen to any app; given this strange activity, we would like to determine whether the activity is contributing to the decline or at least limiting the rating from going up.

    It is statistical forensics.

    We do not need to know who left what ratings for the above observation. There is no way to know.

    You are right, except that it makes sense if the activity contributes to the ratings not going up (and hence affects the app's search ranking, which uses the unrounded average rather than the rounded value published on the app page, but that is a different topic), or if it contributes to the rating going down slowly enough over time to go unnoticed and be mistaken for normal activity while it should be rising to a higher level.

    We do not depend on search ranking inside the App Store for downloads (although we are affected by it), as we have brand awareness outside the App Store that leads to downloads within our niche market. But for an app that depends entirely on being found via search in the App Store, behavior that slowly reduced its ranking would cause a death spiral very quickly: lower rankings reduce visibility, which leads to fewer downloads, which results in fewer ratings, reducing the ranking further, and so on. This is the reality for most indie apps because they do not have a marketing presence. So, if this activity is manipulative, it can quickly make apps die in the App Store. This is why we believe it is important to investigate, not just for our sake but for all developers.

    It would be unnoticed except for this odd activity that is targeting our app.

    You are absolutely right on this. Except that it does not explain why such "user behavior" should be happening to our app with such frequency but not to any app within our benchmark sample with a similar audience demographic.

    Anyone can verify this for their own app, which is why we have asked people to check whether it is happening to their own apps. If this were part of normal Apple cleanup, then it should be happening to more than just one app regularly, no? We have not found one yet within our sample space. That is very odd.

    How many people have seen 1* - 3* ratings appear and disappear for short periods for their apps every day and more often than ratings that stick around? We would like to see if anyone notices the same.

    Thank you for taking the time to look at the logs and reply. Appreciated.
     
  4. HLW

    HLW Well-Known Member

    Sep 2, 2010
    1,355
    0
    0
    UK
    Is this true? If so, that's a shame - I pretty much only rate games in the App Store once I've completed them so I often delete them soon after (or even before) I rate them.

    Sorry WalterM I can't help you with your problem.
     
  5. OnlyJoe

    OnlyJoe Well-Known Member

    Sep 29, 2013
    114
    0
    0
    Auckland
    There is another possibility, and that is fake review bots. Not saying that you are using them, but maybe another app on the same search terms, or in the same category, is.
    What fake review bots do is give a number of random games a review, say 5 stars, to try and mask the actual game they are trying to promote. It would look kind of obvious which app paid for the reviews if all the bots just reviewed that one app, so they tend to give random reviews to a number of other apps. I have had some random reviews show up on some of my games before, which then later get removed (by Apple, I assume).
    So what you might be seeing is fake reviews on your app, and then Apple slowly catching up to remove them.
    And because your app has a reasonable number of reviews, and is in a category the get-rich-quick app builders are going to go for, there is a high chance that your app might be used to mask fake review bots, or to make them seem more real.
     
  6. Destined

    Destined Well-Known Member

    Aug 11, 2013
    1,063
    0
    0
    That was my immediate thought.

    Apple isn't doing anything to intentionally hurt your app. Not in their interest.
     
  7. Jez Hammond

    Jez Hammond Well-Known Member

    Oct 20, 2012
    50
    0
    0
    This sounds close to what I think is happening. A player makes a spontaneous decision to rate low quite early on - say when some bug affects their experience - then uninstalls the app. The system still processes the actions over a longer period of time, rather than in real time. That's my feeling.
     
  8. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    As an update, we have progressed quite a bit in analyzing the data thanks to some helpful reader tips and an extensive analysis of the data from a member of another forum.

    We are able to establish fairly conclusively that automated/calculated activity of some sort is at work here, and we have a couple of hypotheses that need to be tested to determine what the intent might be and where the source is (e.g., from outside the App Store or from inside). We have been able to rule out normal user behavior as the cause. It is just a matter of time before we get to the bottom of this.

    We have also contacted a number of media outlets that find the data very odd and interesting, some of which are willing to publish the evidence and conclusions if and when we have a compelling report. That should get Apple's attention for sure, since we have not been able to do so by direct e-mail because this is such a niche app from their perspective.

    If anyone wants to get involved in this group effort and help test out a couple of hypotheses that will make this process go faster, please PM me. We need people with US App Store accounts for this - people willing to actively work on it rather than those who are just curious. For the curious, we will post updates here as we learn more; we will not be able to answer a lot of e-mail for that purpose. We hope you understand, and we very much appreciate any offer of help and active, coordinated participation. Nothing we need to do here violates any Apple policy.

    If we were using any fake ratings/reviews ourselves, we would hardly be the ones putting out the entire logs and asking for analysis and trying to get Apple's attention to look at our rating activity. :)

    I think you mean user * ratings rather than (text) reviews. What you suggest is possible. However, the evidence from the pattern seems to point to a single source rather than multiple sources. Otherwise, with multiple sources acting independently, statistical probability would produce numbers other than just 1 and -1 in this pattern, given its consistency over more than 3 months. Moreover, the removal and re-rating in the SAME sweep period (several times a day for over 3 months, with a correlation close to 1) would preclude Apple removing the ratings afterward. How would these sources know in which hour's sweep the rating was going to be removed, so they could re-register in exactly the same sweep, not before and not after?

    The pattern indicates that it is the same source/account removing and adding a rating in each sweep. If these were two independent sources, statistical probability would stagger the addition and the deletion across different sweeps, rather than putting them in the same sweep with a correlation close to 1.
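    The improbability can be made concrete with a rough model (our simplifying assumption: one deletion and one addition, each landing independently at a uniformly random one of the day's 24 hourly sweeps, so the chance of their coinciding is 1/24):

```python
import math

def log10_chance_all_paired(n_pairs, sweeps_per_day=24):
    """log10 of the probability that n_pairs independent
    deletion/addition pairs each land in the same hourly sweep
    purely by chance, under the uniform-sweep assumption."""
    return n_pairs * math.log10(1.0 / sweeps_per_day)

# roughly 3 paired events a day over 3 months:
print(log10_chance_all_paired(270))
```

    That comes out near 10^-373. The exact number is not the point; the point is how fast the chance of coincidence collapses once the pairing repeats daily for months.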

    We agree with you. But we are not assuming or ruling out who or why at this point. It is for Apple to investigate and find out what is going on once this activity is made evident. It calls into question the integrity of the App Store, and the lack of protection around access to their ratings web API, which fails to detect and block what appears to be a single source of continuous activity, several times a day.

    Perhaps our app happens to be just a randomly picked test app for some external entity perfecting an algorithm in stealth on a niche app, for wider-scale ratings manipulation later; in which case investigating this now might prevent a large-scale attempt later on. The reason isn't as important to us as establishing that there is user-unrelated activity violating App Store ratings integrity in a particular fashion.

    If it is coming from inside the App Store, it could be any number of other things - buggy or runaway ratings/approval software that has gotten stuck on our app for some reason, a rogue App Store employee with a vested interest who has found a way to do this undetected (even if being found out might mean being fired), etc. We are not saying any of these is the case. Apple can investigate and determine this in a minute or two at any time. We just need to catch their attention with strong evidence, and perhaps a media report, that this is anything but normal user rating activity.

    This is what we thought at first, but it can be statistically eliminated. Some of the probabilistic reasoning provided above precludes multiple sources of ratings acting independently. The statistical correlations (e.g., exactly one rating added or removed per sweep) are too high to indicate multiple sources.

    1. Users of our app are not different from users of other similar apps, and there are a lot of apps similar to ours with their own bugs. So one would expect to see this, or something similar, for many apps. The fact that we have not found a single instance of it yet (anyone can do this scraping for any app, since it is all public) seems to indicate that it is something targeted at specific app(s). And why would the ratings suddenly start behaving differently starting exactly May 21, when there was no update to the app or any device/iOS release that might have changed the experience?

    2. We have also found a reasonable but not perfect statistical correlation between the average rating and the direction of this activity (addition and deletion toward a higher or lower rating). The difficulty is the "noise" introduced by normal user activity. This is why the logs from the prior period (before May 21) are important: they allow statistical subtraction of this "noise" from the later period, exposing the pattern more clearly for analysis.

    But it does need some hypothesis testing to deduce the intent or algorithm, hence our request for participation in this investigation.
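    As a sketch of that noise subtraction (a crude first pass, under the assumption that organic rating behavior was roughly stationary across the two periods): estimate per-star delta rates from the pre-May-21 baseline and subtract them from the later period's rates, leaving a residual attributable to the anomalous activity.

```python
def residual_rates(baseline_rows, later_rows, baseline_days, later_days):
    """Each *_rows is a list of per-sweep delta lists (5*..1*).
    Returns per-star residual deltas per day after subtracting the
    baseline (pre-May-21) rate from the later period's rate."""
    def per_day(rows, days):
        totals = [sum(col) for col in zip(*rows)]  # column sums, 5*..1*
        return [t / float(days) for t in totals]
    base = per_day(baseline_rows, baseline_days)
    later = per_day(later_rows, later_days)
    return [l - b for l, b in zip(later, base)]
```

    A residual near zero for a star level means its activity is explained by the baseline; a persistent non-zero residual marks the anomaly.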

    Anyone who finds scientific/mathematical forensics interesting and has an aptitude for it should like this challenge... :)

    Thanks for your interest in this matter.
     
  9. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    #9 Pixelosis, Sep 25, 2014
    Last edited: Sep 25, 2014
    First of all, thanks for the clarification on how to read the data presented in the logs.
    First log: damn, lots of -1s everywhere, for every score. I'm not used to seeing so many ratings being removed per day.

    Another odd pattern is that every single time you have at least one rating removal, there's one rating that's been posted (a +1 in other words).
    You never get a -1 that sits alone in a row with zeroes in all other columns.

    As for the -2s:

    Jun 27 04:30:02 PDT |-6 -2 0 0 0 |4.62069
    Jul 13 12:30:01 PDT |-2 0 0 0 1 |4.61506
    Jul 25 05:30:02 PDT |2 1 -2 0 1 |4.61739
    Aug 6 21:30:01 |-2 -1 -1 1 0 |4.61305
    Sep 4 10:30:02 |-3 -2 0 -1 1 |4.61409

    They tend to happen in timeframes when there aren't enough rating gains to counterbalance the loss, contrary to the -1s.
    So every time you had a -2, you were losing at least one rating in total.

    The occurrences of zero sums are also numerous: you'll often get just a +1 in one column matched by a -1 somewhere else in the same row. In fact, I don't think I've spotted a single -1 in a row which didn't sport at least a +1 as well. But these two events could be unrelated. It could be a simple fluke that every single time a rating is removed, someone else also posted a rating.

    However, globally, this does tend to make me think of a change-of-mind pattern that repeats itself way too many times.
    That's kinda absurd; I concur that humans wouldn't do that, it's beyond silly. It's way too regular. Sometimes you get a refresh every 8 hours, sometimes sooner (like 3 hours, for example), leading to a greater concentration of rating loss per day.

    Either someone's playing games with your ratings (what would the effects be? -1s are found in every column: bad scores are removed as well as medium or good ones), or there's a constant correction occurring from Apple. But that would be odd, as it would mean they're targeting your ratings because they think something's fishy about them (and I think you'd have gotten an official notice by now).

    Besides, your app isn't free and is quite expensive, which usually turns out to be a huge paywall. I'm not sure how many ratings per download you get in paid games, but if it's anywhere like in free games, I think you'd get very few ratings per hundred downloads a day.
    There aren't peaks of ratings either, save perhaps for Jun 27th and July 26th, but it's all relative and overall this might be normal. Again, I don't have your download numbers in front of me.

    Other notes, regarding the first log document.
    Large quantity of rating removals:

    May 21 17:30:02 PDT |-13 -3 -1 -1 -1 |4.63345 //what the hell happened there? did you buy some ratings and they all got nuked by Apple? :p
    Jun 27 04:30:02 PDT |-6 -2 0 0 0 |4.62069 //interesting that one of your peaks of high # of ratings also matches a massive loss of them; this didn't happen for July 26th's peak of rating gain.
    Aug 6 13:30:02 |-6 -4 0 1 -1 |4.61322
    Aug 6 21:30:01 |-2 -1 -1 1 0 |4.61305 //bad day, you lost a total of 15 ratings on these two chunks alone, to a total of -19!
    Sep 4 10:30:02 |-3 -2 0 -1 1 |4.61409

    Also, if you find a -1 in a row, in most cases (but not always) you'll find at least one rating gain in the same column in a previous row. That's the thing I'd retain most (let's call that point A).
    I'm not sure if the bot targets anything in particular, although the algorithm might reveal things I can't see. The -1s really seem to hit bad and good ratings alike.

    I suppose you already know how many removals you get per each column in total. Have you lost a majority of good scores, average ones or low ones?

    Besides, the higher negative losses (-2s and more) don't seem to stalk gained ratings. They don't discriminate and can hit columns which have zero rating gains that day.
    But unless they're Apple's doing, they can only be removals of ratings by third-party raters. Which means you have to look at column totals from much earlier rows. That's point B.



    With the older log, you never suffered any rating loss until this line:

    4.65285 |1 -1 0 0 0 |4.65378 //there aren't dates so it's hard to say which day this corresponds to.
    In total, you lost seven ratings over the period covered by this older log.
    Also, contrary to the fresher log, you can get -1s without a single rating gain in the same timeframe (same row, I mean). And -1s can be found in columns where there hasn't been a single rating gain earlier in the same day.
    So at least for this older log, we don't get the feeling that someone's swapping ratings.
    The losses also only occur in the 5* and 4* columns, though.


    It's definitely hard to guess what went on. Some odd phenomenon of rating loss did take place at some point and intensified on the 21st of May, I suppose.

    However, what I can offer as a conclusion (and I think that's what you've observed) is that there might be a rodent in your stats. :)
    Point A: an attack on any rating you obtain. Very few manage to "survive". The phenomenon doesn't care about the score, it cares about erasing the ratings.
    The gain and loss of ratings influences ranking, and I don't think Apple is removing anything from you in particular (as if they had identified you as a bad publisher per se), because with such a low number of ratings per day, you could not be using a rating farm of some kind.
    So these ratings would seem legit, but how could someone remove ratings without having posted them in the first place?
    So since you almost always get a -1 in a row, and since that -1 generally tends to match a +1 in the same column from a couple hours earlier, the process seems to aim at plaguing your app's performance with a negative delta from one row to the next, which only a handful of extra ratings manage to nullify.
    You might say: but when I had a 0 in, say, the first row of a day and then got a +1 in the same column in the following day's row, that's a positive overall. Sure, but it seems the idea is to have "the final word" at the end of the day, and to make sure it stays negative enough. That's the feeling I get, *if* mischief is to be found (which remains to be seen).
    However, sometimes you get days that finish on a positive, so it would be most interesting to count the number of days that ended with an overall positive impression, in comparison to the days that ended with negative "stains". There might be a pattern there that matches weekends, for example, or something else.
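    That day-by-day tally is easy to automate. Given (timestamp, deltas) rows pulled from the log, a sketch like this (assuming timestamps start with a "Mon D" prefix, as in the log) nets out each day:

```python
from collections import defaultdict

def daily_net(rows):
    """rows: iterable of (timestamp_str, deltas), deltas being the
    5*..1* changes for one sweep. Returns {'Mon D': net rating change},
    so positive/negative days can be counted and matched to weekdays."""
    net = defaultdict(int)
    for ts, deltas in rows:
        day = " ".join(ts.split()[:2])  # e.g. 'Sep 1'
        net[day] += sum(deltas)
    return dict(net)
```

    Counting how many values come out positive versus negative, and on which weekdays, would test the weekend-pattern idea directly.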
    With point B, the larger rating losses look like cheap pot shots, but they cannot be anything other than removals of relatively older ratings.
    All in all, the -1s seem to be minor adjustments made to grind the app's ranking down, while the heavier losses really seem to be "critical hits" delivered after letting enough ratings build up - enough to give the impression that you are globally gaining ratings, unless you compare the dates of losses over much larger timeframes. But it could also be Apple's cleaning lady being very active at that moment. Large amounts of rating or review removals bear the mark of Apple.

    And this happens too often to be something like an automatic cleaning process on Apple's part that removes ratings when the app has been removed from the user's device. That would also mean Apple pays no attention to how long the app remained on the machine, which is contrary to the point of looking for short-lived installs that smell like ranking manipulation. In other words, we'd have to believe that Apple deletes old ratings just because the user has cleaned up his phone. :|
    Hogwash.

    Needless to say, I'm a bit wary about this, whatever it is. Please keep us informed!
     
  10. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    #10 Pixelosis, Sep 25, 2014
    Last edited: Sep 25, 2014
    That's a possible explanation, indeed. Basically, our poster's app is not gaining that many real, legit ratings and is only being used as a sort of cache for fake ones, so the fake raters appear legit.
    Just like with the massive fraud with facebook ads, which attract fake accounts' likes but never get caught because those accounts are active on a wide range of pages and topics.
    But that's still terrible because if you're used as the equivalent of toilet paper to collect part of a faker's crap ratings, there's only one way your app can go and it's not up.

    Besides, in order to protect innocent victims, Apple's algorithms would have to be able to differentiate between the apps that actually benefit from fake ratings and those that are only used as collector bins as part of the faker's smoke screen.
    That said, one way to filter the good from the bad is the quantity of identified fake ratings: a few a day is certainly not proof that you're using bots, since anyone paying for them would be using them to chase thousands of downloads a day.
    As for the games that would get tons of legit ratings per day, they would easily soak up those bad rating removals from Apple's hand.
    In the end, it might only affect unfortunate indie apps that get few ratings a day.
    But perhaps Apple doesn't take these changes into account and your app isn't getting negatively affected by those removals? Hard to believe though, since even if Apple knew they were not your fault, they're still removing ratings at the end of the day, and the lasting sign is a minus, not a plus.
    Sucks.
     
  11. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    Pixelosis, thanks for studying this so carefully and sharing your observations. After the initial disbelief and denial, that is very similar to our assessment when we first noticed the odd goings-on.

    We have progressed considerably since then. We believe we have an explanation of what exactly is going on and the intent and algorithm being used which seems to be directed at this app specifically (but not necessarily exclusively). We will have a report soon when we have the writeup along with the best visualization of the activity that makes the intent clear.

    The ratings (including legit user ratings) currently seem frozen in the App Store for more than 24 hours for some reason ... or the manipulation is halted for Jewish holidays :)

    Some brief inline comments to your observations

    There is a very simple explanation for this using Occam's razor alone: it is a single account switching between ratings in each sweep. That is why you get the matching 1 and -1 with 99%+ correlation in each sweep (sometimes the appearance of a legit user rating cancels out the -1, which we caught during data scrubbing). More importantly, the -1 in the next such activity coincides exactly with the previous 1, again 99%+ of the time. The probability of this being a coincidence among multiple legit users in almost every sweep over 4 months is too small to be a possibility. We will have an explanation of the role this activity plays even though it appears pointless, contributing nothing to the ratings over time. It serves a definite purpose.

    Most of the entries where a number of ratings appearing in one sweep almost exactly matches the number disappearing in the previous sweep seem to be due to occasional sync problems between Apple servers. When we scrape for the ratings every hour, we reach one of many Apple servers to process the request. Occasionally, some servers are out of sync and behind in their copy of the ratings database (or in the middle of a restore). So it looks as if we have suddenly lost a number of ratings, which immediately reappear in the next sweep when the request reaches another server, or the same server once it is up to date. These are not related to the activity here. We have had to scrub the data manually for artifacts like these to reach our conclusions.
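    A rough sketch of the kind of scrubbing this implies (the function name and tolerance parameter are our own illustration, not the actual scrubbing script): a drop that is fully reversed in the very next sample is treated as a stale-server read, not a real removal.

    ```python
    def scrub_sync_artifacts(counts, tolerance=0):
        """Given hourly rating-count samples, flag drops that are fully
        reversed in the next sample as likely stale-server artifacts,
        and return a cleaned series with those dips smoothed over."""
        cleaned = list(counts)
        for i in range(1, len(cleaned) - 1):
            drop = cleaned[i - 1] - cleaned[i]
            recovery = cleaned[i + 1] - cleaned[i]
            if drop > 0 and abs(recovery - drop) <= tolerance:
                cleaned[i] = cleaned[i - 1]  # treat the dip as a stale read
        return cleaned

    # A stale server briefly reports 5 fewer ratings, then recovers:
    print(scrub_sync_artifacts([120, 115, 120, 121]))  # → [120, 120, 120, 121]
    ```

    Real removals, by contrast, persist across subsequent sweeps and survive this filter.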

    Exactly. Further, there is virtually no loss of ratings over time when we scrub the data for the artifacts mentioned above. Most users really don't seem to go and change their ratings with any regularity let alone do this for just this app over almost every sweep cycle. Except for the May 21 disappearance that coincides with the start of the -1 flipping activity, there is virtually no persistent removal of ratings for this app at all.

    This log is for our free app, not our paid app which gets far fewer downloads and there is no similar activity there.

    This removal seems to be related to the mass scrubbing of ratings Apple did during May related to some bot activity in apps. This was reported in TechCrunch on June 13.

    Apple Is Taking Action Against Fake Ratings On The App Store

    We suspect Apple also cleaned out ratings from closed accounts as part of that one-time sweep some of which were in our app. This removal actually bumped our app ratings up slightly because of the lower ratings removed so if we had used our own fake ratings earlier, we did a very poor job of distributing those ratings. :) But seriously, we would hardly be trying to get maximum attention to our app rating activity and get Apple to take a look at it if we had engaged in a single fake rating.

    All of the above are Apple server sync problems, immediately added back in the next sweep as you can observe in the logs. There has been no net loss of ratings except for that May 21 clean-up.

    Yes, the reasons will become clear in our report.

    We do not seem to have lost any ratings net over the entire period after scrubbing the data for the server scraping issues mentioned above, which is also consistent with the older log.

    The above line is likely the one legit change of rating from a user. Like I said above, rating changes don't happen much at all for any app (within the same version of the app), let alone ours. Perhaps one or two a month at most for an app like ours.

    Exactly. We believe the -1 swapping is a new automated algorithm that was put in place on May 21, suspiciously coinciding with the Apple sweep to remove fake ratings. We suspect the activity became more mechanical from May 21 onward to create some obfuscation, with what might appear to be regular/random movements hiding the actual algorithm. This is what we have uncovered.

    Your concluding conjectures (thank you for spending the time to put them together) are much like the ideas we put together as hypotheses to test against the data and eliminate one at a time. We are still working on some of them as we learn more. This is why it is taking a long time.

    Absolutely, since we believe this is of serious concern to the entire developer community.

    We believe this manipulation works to affect app ranking and downloads. It is also so carefully constructed that it cannot be a one-off for just our app in this niche space. There are also some odd observations that cannot be explained by activity from an entity outside the App Store. But this possibility is difficult to establish with absolute certainty.

    It is like sending your beloved child to school and finding that someone has been manipulating the reported grades to prevent your child from succeeding and the school does not investigate when evidence is presented privately. :)

    Thanks again for your time and the thoughtful comments. We will include any hypotheses from you that we have not already considered for our testing.
     
  12. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    At this point, we might infer that something is being done from Apple's side, although I wouldn't treat that as a given.
    For what purpose and how, that's interesting to know.

    Let's say their algorithm scans review and rating stacks to purge the illicit ones. OK, fine.
    How do they know they're illicit? Because the reviews themselves (their content) contain odd words which are deemed typical of fake reviews.
    As for ratings, it's rather difficult, if not impossible, to detect anything fishy about a mere score. They could try to detect a pattern in the way a rating is posted and see something from the posting alone, but that's massively open to extrapolation and would most likely make the algo random. I mean, what is there to see in a single rating posted at some random time, giving some random score to an app?
    Well, nothing, really.
    What is doable, though, is to keep tabs on the user who posted the rating or review. His behaviour can be tracked across several apps.

    In your case, if that is what's happening, then fake ratings are either posted by the fake accounts of one single rating provider, or by several providers who are having a "party" with your app.

    Let's go with the idea that you're being used by one single illicit rating provider, and that he decided to use your app (among several others) to give his many accounts some legitimacy by reviewing a wide variety of apps beyond the one he's paid to push up the charts.

    So in theory, what we'd observe is such a provider adding random ratings to your app through multiple accounts.
    Eventually, in order to avoid drawing attention to his activity, he would post ratings evenly so they don't make the app go down or up: any variance in the app's performance would look almost like a glitch on Apple's radar, nothing odd unless perhaps you zoomed in pretty hard on the graphs (and even then it's not a given that you'd see any pattern based on analysis of the variance alone).
    Then, the other thing we'd observe would be a game of cat and mouse where Apple has identified this provider and tracks each fake review/rating and erases them.

    But there's a problem with that.
    In order to be hit by so many -1s, several and presumably always unique accounts must have posted ratings.
    If Apple identifies a counterfeiting user, fine, his account might be frozen or closed. But that is just one account.
    Said rating provider must use several accounts for his scam, and if he does his job well, there's virtually no way Apple would manage to track and recognize him behind each account he switches to, especially with such near perfection and diligence.
    There would be many hits and misses, with a hefty number of ratings flying under the radar.

    The only way to get such a regular number of -1s per day is if the provider were using accounts that follow a pattern, one which Apple could easily recognize and detect.
    Even if said provider created new accounts, he'd have to assume he hasn't changed his method much, so Apple would immediately recognize the nature of those accounts and auto-stalk them.
    Then Apple could keep blocking those accounts, the provider would keep making new ones following the same (and now inefficient) construction pattern, and all Apple would have to do is react ASAP.

    It would be simple bad luck that said lousy counterfeiter picked your app (and some others) to somehow boost the legitimacy of the various fake accounts he uses.
     
  13. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    If I get you right, are you saying Apple has created a secondary "smoke screen" algorithm that creates confusion to cover the activity of the first "hunter cleaner" algorithm?
    This seems rather complicated. Why would they feel the need to keep it hush-hush and obfuscate their primary algo's activity if they're simply hunting down fake ratings?

    In other words, while their hunter algo would be removing ratings detected as fake here and there, the other algo would be padding the holes in the rating evolution by adding and removing fake ratings posted by Apple themselves, taking care to appear random but actually, over the long run, cancelling everything out: the scores would average to a median of 3, and the rating gains and removals would sum to 0?
    My skittles are melting.
    Needlessly complicated, especially since this huge rating removal activity would have more chances to attract attention. Talk about being conspicuous.

    I'd rather think some tracking routine at Apple is perhaps running on too strict a set of parameters and being overzealous, thus killing what you perceive as legitimate ratings within a few hours of their posting.

    What a mess. We're not kidding when we say Apple should do without ratings entirely. They're a sour joke at this point anyway.

    Obviously the addition and removal of ratings has an influence.
    However, I can't see how some third party could have such an influence on your ratings unless, for some bizarro reason, a new kind of scam involved creating fake accounts, posting fake and random ratings, and then removing them in the following hours, for some final goal I would be at pains to understand.

    In my first post, I said that this removal activity left a sort of negative stain on your ranking because, essentially, many +1s were transformed into -1s and that would be like in the stock exchange when closing, people would say it ended on a loss, even if both up and down resulted into a pure zero variance. People would say, yeah but it went down when it closed, and that would stick in their mind. But that's just psychological.
    It would mean that, somehow, the app is seen as "going down" before each server sync and the acknowledgement of the hourly rating postings.
    This would need to be translated into some kind of equation as to make a removal more penalizing, somehow.
    So I think it would only be of importance if, for some reason, rating removals weighed more than rating additions.
    In other words, the multiplier attributed to the value of the removed ratings would be greater than that of the gained ratings.
    For example, you get a 4-star rating, and your app gains +4 pts (let's say the multiplier associated with a gain is 1). But if this rating gets removed, you lose 4.8 pts (removal multiplier = -1.2, to translate the bitter-taste effect of a removal into mathematics).
    And this would be how a removal hurts your app, doing more than just keeping its ranking stagnant.
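    A minimal sketch of this hypothetical weighting (the 1.0 and 1.2 multipliers are illustrative assumptions from the example above, not anything Apple has published):

    ```python
    # Hypothetical weights: gains count at 1.0x, removals at 1.2x.
    GAIN_WEIGHT = 1.0
    REMOVAL_PENALTY = 1.2

    def net_ranking_points(events):
        """events: list of (stars, direction) where direction is
        +1 for an added rating and -1 for a removed one."""
        total = 0.0
        for stars, direction in events:
            weight = GAIN_WEIGHT if direction > 0 else REMOVAL_PENALTY
            total += direction * stars * weight
        return total

    # A 4-star rating added and then removed nets -0.8 points, not zero:
    print(round(net_ranking_points([(4, +1), (4, -1)]), 2))  # → -0.8
    ```

    Under this model, a steady stream of add-then-remove pairs leaks ranking points even though the visible rating count never changes.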

    But this would be a black-hat method, the work of some competitor, not necessarily targeting your app uniquely but rather a group of apps of the same type, to make room in a given app category.
    So the new weapon is not to pimp your app with fake ratings but to apply a blanket ranking reducer to a group of other apps with similar interests.

    This seems to imply Apple fully knows what's going on and they didn't take action after you presented them solid evidence.
    It's a heavy company with some inertia; I don't think they can react that fast either. Not to say that they always fully enforce what they say (see Flappy Bird and the clone removals and keyword-use bans).
     
  14. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    We have no evidence to infer that something is being done from Apple's side that is behind the observed data. Besides, such an assumption is not necessary.

    A simpler and more credible assumption is that the activity is designed to evade any official Apple scrutiny in place for App Store. I am also being careful to not make any specific assertions regarding whether such activity is necessarily happening from outside the App Store or inside (say as some unofficial or rogue activity). We do not need any such assumptions to determine the algorithm behind the activity.

    We are as curious as you and others on who and why but I believe only Apple can determine this and any guesses from outside would be pure speculation. These speculations cannot be inferred from the data and there isn't enough information from the opaque App Store process to come to such conclusions.

    I posted this thread to ask people for help in analyzing the data and coming up with an algorithmic explanation of the activity, rather than to make assertions/speculations about who is involved and why.

    Our goal is to assess whether there is intentional activity and whether there is an impact on the rankings (and hence financial damage, which would make such activity actionable).

    We have received several tips from such requests for help that helped us progress considerably.

    We are NOT saying Apple is behind this or doing something to fight external activity or any such thing.

    All I have said in that regard is that we do not know how some of the observations can be explained by activity from outside the App Store but that it is difficult to establish that it is indeed done from inside the App Store. This is for Apple to investigate and act accordingly.

    All we want to do is to establish the unusual activity and the most apparent algorithm that fits the activity and get Apple to take a look at it and hopefully explain it (whatever the causes may be). It is in their interests as well as Developers' interests to remove all doubts about the integrity of the App Store and prevent any activity that threatens the integrity.

    It only requires one account to explain the -1 flipping since you never see two of the +1s outstanding (after eliminating what are likely legit user ratings) without a corresponding -1 any time in the 4 months. Or in other words, one can reproduce that behavior with a single account switching ratings over the entire period. Given millions of apps, I doubt Apple would necessarily detect this on their own unless they were specifically looking for it. A user changing their ratings is legit. Not sure Apple is necessarily trying to detect some account doing this so often.

    This is where I need to be clear. I have made no assertion that Apple or anyone in particular is doing something. Just establishing that the activity exists and an algorithm that explains it.

    The reason for that is specifically to avoid the kind of inferences made above: that Apple must necessarily be doing this (which appears incredible), or that they must be engaging in some regular hunting of fake reviews on our app. We do not need any such assumptions to establish the algorithm, and the simple explanation is that the activity is simply flying under the radar.

    The simplest credible assumption one can make if one needed to do so is that the obfuscating activity is designed precisely to avoid detection by official Apple monitoring. The latter may not be designed to observe the behavior over multiple days as we did. Such activity has likely gone undetected by Apple. It is our goal to draw attention to this. It requires no additional assumptions of regular hunting algorithms, etc. Occam's razor applies.

    We agree it is a mess.

    The freezing of ratings since early September 24 is curious. Unless the entity responsible for this rate flipping takes Jewish New Year holidays (in which case it will start again tomorrow) :), either Apple has noticed this and frozen the ratings to investigate, or something is broken.

    We know it is unlikely that nobody rated the app in that period, because our app logs when someone explicitly taps to rate/review the app and is taken to the App Store. This is a redirection to the HTTP URL that brings up the App Store app, not the download overlay, which has ratings disabled.

    We wouldn't know who it is or whether they eventually leave a rating, but there is a "conversion rate" (the fraction of people who are so redirected AND leave a rating) amongst such reports that in theory should not vary much over time for a collection of legit users. We have seen excellent correlation between the number of downloads and the number of such taps, which makes sense.

    This is not a nag that users have to explicitly dismiss as an additional step in regular app usage. It becomes visible only after they have used the app successfully for a while, and it clearly explains what will happen before they tap it, to eliminate false positives from people who tap it by mistake or out of confusion. Once they tap it, it never appears again, which avoids logging multiple such visits from the same user.

    Since September 24, there have been 43 such unique user redirections to the App Store from users of the app. A "conversion rate" of 0% ratings from that number is unrealistic. It is possible that there may be zero written reviews from such a sample, since only a very small fraction of people write reviews as opposed to just rating it. But not zero ratings.
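    The 0-for-43 claim can be sanity-checked with a one-line binomial model. The conversion rates below are assumptions for illustration, not measured values:

    ```python
    def prob_zero_ratings(n_redirected, conversion_rate):
        """Probability that none of n independently acting redirected
        users leaves a rating, given a per-user conversion rate."""
        return (1 - conversion_rate) ** n_redirected

    # Even modest assumed conversion rates make 0-for-43 very unlikely:
    for p in (0.05, 0.10, 0.25):
        print(f"p={p:.2f}: P(no ratings from 43 users) = "
              f"{prob_zero_ratings(43, p):.2e}")
    ```

    At an assumed 10% conversion rate the chance of zero ratings from 43 redirections is already around 1%, and it shrinks geometrically as the assumed rate rises.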

    Fake bots/accounts that could be automatically filtered by Apple do not appear in this log, since it requires considerable, genuine usage of the app before the prompt is shown. Typically it takes about 2-3 days of usage after the download before people see it and can tap on it, and a large number of them might never see it. So it is a good and robust sampling technique.

    We are beginning to think that the App Store is seriously broken, compromised or our app has some special processing status at the App Store. :)
     
  15. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    #15 Pixelosis, Sep 26, 2014
    Last edited: Sep 26, 2014
    Are you saying that crosswords are a niche market?
    I don't know a single person above 25 who hasn't done at least one crossword in his or her life. There are even versions for kids.
    Perhaps the premise that it's a niche might already misguide you in your research?

    Secondly, that rather simple method you used to extract that data, are you perfectly sure it is 100% reliable to begin with?
     
  16. Jez Hammond

    Jez Hammond Well-Known Member

    Oct 20, 2012
    50
    0
    0
    There's one statistic which keeps getting my attention: 4.6... I would say that anything malicious would surely want that around a point lower?
     
  17. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    You are correct; it is relative, I suppose. However, that premise is not affecting anything we are doing, since we are ignoring who might want to manipulate this or why. If you see it differently in what we are doing above, please do let me know.

    If you mean getting the rating numbers from iTunes, this is the same method used by all the app sites that keep track of iOS apps, and the only public means available.

    We have verified this with the numbers reported in iTunes on a Mac to make sure there are no errors in our script. In addition, we had also contacted a number of well-known app aggregation/rating sites with this data to see if any of them can explain this odd behavior and one of them shared their collection of rating data archives for the app which is consistent with our data collection.
     
  18. WalterM

    WalterM Member

    Aug 31, 2014
    16
    0
    0
    There are 5* rated apps that get very few downloads/sales, and there are 4* rated apps that are raking it in. The relative number may matter more than the absolute number, depending on what the app is competing with for its downloads.

    This is off on a tangent from this thread, and we do not want to turn it into an App Store discussion thread, but just to explain why we are pursuing this and are unhappy about what we have found ...

    From our experience of studying the App Store even before we introduced our app to iOS (based on which we have had to take several defensive measures), there seem to be several ways to destroy an app if one wanted to using the design of the App Store itself to work against it. It depends on the normal user activity and ratings of the apps and the level it is at.

    For a very low volume of downloads/ratings, the most vulnerable period is right after a new launch or update, where the appearance of one or two bad reviews/ratings can completely skew the average and affect user perception of whether to download. As developers know, it is very difficult to build up volume one user at a time even without such hit ratings/reviews. This is the kind of manipulation that most small developers worry about, but it is not what really seems to make a difference in higher-stakes gaming of the App Store.

    Such hit ratings and reviews don't make much difference for apps that have a higher volume of downloads (from big marketing, brand recognition, a loyal customer base built over time, position in the rankings, search result ranking, etc.) and do not have serious problems that make legit users give bad ratings. It is mathematically difficult to move a rating down much when the app has several hundred or thousand ratings, and a few hit reviews after the first 10-20 make no difference. They affect developer egos more than they affect downloads/sales.

    The manipulation at the level where ranking, App Store placement, search result ranking, third-party app rankings and so on create a self-sustaining, high-download, high-ranking app seems aimed more at affecting those metrics (up or down) by exploiting the App Store design than at influencing individual users downloading the app. Users don't seem to care all that much about 4 vs 4.5 vs 5 stars or the appearance of some hit reviews. The rest is just meaningless gaming amongst developers and a cause of much unnecessary angst.

    From what we have been able to understand from our studies, success in those metrics is a function of the velocity and direction of downloads/ratings/revenues for an app rather than the absolute numbers. Because of the number of apps, and the clustering of apps in the combined metrics on those parameters, small changes seem to make a huge difference at that self-sustaining level.

    If you are manipulating your own app up, you can try to influence the downloads/ratings and sales. If you are manipulating a competing app down, you can only affect the ratings directly, to affect the metrics which snowball and knock the app off of the self-sustaining level (this is what we believe happened in our case). You only seem to need small changes at that level; the design of the App Store takes care of the rest and buries the app, since the natural consequence of the design is to kill off apps. We are not saying this design is intentional, but it works like the music/movie industry: create a few big hits all the time and a lot of "flops" that fail by never reaching sufficient core mass, even if they are very good. Either an app is at that self-sustaining level, or it is headed down to mediocrity/invisibility in downloads/sales.

    Going off on a tangent here and don't want to turn this thread into an App Store design discussion, perhaps better to continue this on a separate thread.

    This is also not to suggest anyone should try to manipulate their apps or others' apps but learning how it works allows one to take defensive measures and not be upset by things like obvious hit reviews that can hurt egos but likely have little or no impact on the app except in very low volume situations.

    What disturbs us the most in what we have observed and why we have started on this to precipitate a review of the process by Apple is the suspicion that the App Store may itself be compromised from the inside (intentionally or not) and not just external bot activity. There are no defensive measures one can take against that.

    For example, our ratings count has been frozen since the 24th. Things like this do not happen because of events from the outside, and this has happened before, with our previous update. When you contact Apple about things like this, you get the usual brain-dead form letter: a request for the iOS version and device where this can be observed, a notice that users can register and remove ratings at any time, and a request to send the iTunes account names of the affected accounts if there are any issues (as if your users share those with you, or even care about problems like this). :)

    There are no effective means (that we know of) to get Apple's attention beyond this brain-dead front-line firewall and report abuses, and the only thing that seems to work is something that can potentially create bad PR outside. Not something we want to do, but we seem to have stumbled on something that has that potential, and we have documented it extensively.

    We will still send that report to Apple first before publishing it, and see if they do something about it or we just get another cut-and-paste form letter.
     
  19. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    I know of only one other forum where you have posted this thread. What are the two other forums beside Touch Arcade?
     
  20. Pixelosis

    Pixelosis Well-Known Member

    Jan 28, 2013
    157
    0
    0
    Your OP didn't seem clear on this question. To wit:

    You certainly have at least a theory and a grave accusation formulated.
    So are you accepting that other people might challenge or expand on your theories or not?


    I'm not certain that asking whether the removal is intentional is the right question.
    Aside from a bug in a routine that keeps deleting new ratings, whatever the reason for that bug might be, it could only be intentional. The motives, the 'why' precisely, are what make the difference. Are they good or bad?
    Good or bad helps us guess where it comes from. It helps us guess whether one would try to fly under the radar or wouldn't care. It'd help us guess what would happen next (the best thing for a theory is to actually provide a "forecast" of behaviour or phenomenon).
    I believe that having good theories as to what's going on helps us move further into the understanding and reading of the data, which in the end would allow people to detect patterns they didn't formerly see because they didn't have a good hypothesis beyond the mere observation of data.
    However, the goal behind the process could be unrelated to any real malign intention. For one, you might be the target of one of Apple's routines that for some reason has gripes with one or more of your app's parameters, or with your account, and is stuck in hunter-cleaner mode. Somehow, we'd have to say in this case that the AI's gone mad. Or silly.
    The question isn't whether there's an intention, but whether your app is targeted for mischievous reasons, directly against your interests.

    Requiring an algorithm that precisely mimics the process you observe is overkill. A simple set of solid statistics would do.
    It would also be rather easy for Apple to know whether the reviews were removed by peers, or by a moderating function activated from within Apple.

    One single account doing all the rating posting and removal would stick out like a sore thumb. You wouldn't need any algorithm here.

    This supposes that Apple doesn't have a sensor homing in on any condensed +-+-+-+- rating activity. That's possible, since Apple would largely be concerned with looking for odd increases in ratings, not keeping an eye on removals.
    Or, if such a tool existed, it might be altered from within to ignore what's happening to certain apps (a rogue dude on the inside keeping the tool from doing its work properly).


    So as far as it goes, if I read you correctly, your theory is that the goal of this process is to make the ratings stagnate, and that it relies on some obfuscation so it doesn't get spotted.
    Obviously, it's not the act of removing ratings that obfuscates anything, but how they are removed.
    The removals look like they target scores at random (instead of going only for the 4* or 5*) and so appear more natural; plus, the activity is so small and constant that it doesn't shine like a thunderbolt in the dark.

    By freezing, do you mean all the ratings removals have, in fact, made sure that your overall rating count never moved? At least, never up?

    To summarize, any user downloading and playing your game will be asked to rate the app only after a couple days of playing.
    Then, a popup appears in game, asking them if they want to rate your app.
    If they tap on the button to rate your app, they'll be taken to the appstore webpage directly, to leave a rating.
    And by reading your logs, you see that on a collection of 43 users who indeed did press the rate button (or whatever takes them to the rating form), you are not left with one single rating.

    I'm wondering if this rating removal process can be scaled up, not in magnitude but in frequency: instead of going for big -10s, -15s, -16s at once, it keeps doing +1s and -1s, but at a much faster rate, so there's a very, very small oscillation.
    This could only happen if the process knows how to limit itself and when to activate, so as to always produce a single removal during the smallest possible timeframe between two server updates.

    Thing is, if this process can be intensified in frequency and keep the same magnitude per action, it could literally turbo-grind the ratings of apps that usually get larger quantities of ratings per day.
     