Topic: How does Google handle storage?

unix

  • Trusted Member
  • Wise Sage
  • ******
  • Posts: 4219
How does Google handle storage?
« on: November 12, 2018, 05:12:50 am »
I realize Gulag stores all our data, like searches, your map location, etc., maybe even a lot more than that, like emails. Supposedly Yahoo never deletes any emails; even if you delete them, they store them... not sure.

But where do they get the space to hold all this stuff??

The requirements must be insane, Tera-Giga-Bazillion bytes. That's the *entire* world that uses Gulag and stuff. They are probably writing at the rate of 1 terabyte per second. Or something close to it.
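For a sense of scale, here is the arithmetic behind a 1 TB/s guess as a minimal Python sketch. It assumes decimal units (1 TB = 10^12 bytes) and a constant write rate around the clock; neither assumption is stated in the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day
DAYS_PER_YEAR = 365             # ignores leap years


def tb_per_day(tb_per_second: float) -> float:
    """TB written per day at a constant rate of `tb_per_second`."""
    return tb_per_second * SECONDS_PER_DAY


def tb_per_year(tb_per_second: float) -> float:
    """TB written per (non-leap) year at a constant rate."""
    return tb_per_day(tb_per_second) * DAYS_PER_YEAR


print(tb_per_day(1))   # 86400
print(tb_per_year(1))  # 31536000
```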

(Gorn corrected Unix's completely unhelpful thread title to something meaningful :P )
« Last Edit: November 17, 2018, 04:03:34 pm by The Gorn »
Brawndo. It's got what plants crave.

ilconsiglliere

  • CCF Winner's Circle - Supporter
  • Wise Sage
  • *
  • Posts: 3248
Re: How do they do it?
« Reply #1 on: November 12, 2018, 05:50:22 pm »
They might be using compression algorithms of some sort to store it. If you think about it, the data collecting will never stop. Where do you put it all?

ArnoldW2

  • Trusted Member
  • Wise Sage
  • ******
  • Posts: 531
To figure it out ...
« Reply #2 on: November 12, 2018, 06:56:14 pm »
Unix asked where Google stores all those hard disks it uses to store all the internet's data.

I did some research and some math.  If you want to skip all the calculations I did, here's the bottom line:

1.  I figure that Google spent a little over $9 Billion for new hard disks to store all the data that the internet generated in 2017.  Google can easily afford that.

2.  I figure that Google had to buy a building the size of one of the largest hotels in the world to house all those new hard disks (or a bunch of smaller buildings that add up to the same number of square feet).  But that building - even at about $2 Billion - costs a lot less than the hard disks, so Google can afford that too.


Now for the research (links to web sites) and details (calculations).  We need some numbers.

How many dollars per TB does hard disk storage cost?
The cheapest option [dollars per TB] on the web page linked below is an 8 TB external hard disk for $160.

Fry's Electronics Prices for External Seagate Drives

That works out to $20 per TB -- retail price.
Wholesale is probably about $10 per TB.  Google may buy storage cheaper than that, but probably not much cheaper.

So multiply the internet's daily data production by $10 for every TB to calculate Google's daily expense for new hard disks (assuming Google retains 100% of all new internet data -- a dubious assumption).
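A minimal Python sketch of the per-terabyte pricing. The $160 / 8 TB figure is the retail listing cited above; the $10/TB wholesale price is the post's own guess, and the function names are illustrative only.

```python
def dollars_per_tb(price_dollars: float, capacity_tb: float) -> float:
    """Price per terabyte for one drive."""
    return price_dollars / capacity_tb


# Retail: the $160, 8 TB external drive from the Fry's listing.
RETAIL_PER_TB = dollars_per_tb(160, 8)  # 20.0
# Wholesale: an assumed price (the post's estimate, not a quote).
WHOLESALE_PER_TB = 10.0


def daily_disk_spend(new_tb_per_day: float,
                     price_per_tb: float = WHOLESALE_PER_TB) -> float:
    """Dollars per day for new disks, assuming 100% of new data is kept."""
    return new_tb_per_day * price_per_tb


print(RETAIL_PER_TB)  # 20.0
```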


Now, how many Terabytes are generated on the internet each day?
I did a little searching.  One web site says that, in 2017, 2.5 quintillion bytes per day were created.

https://www.iflscience.com/technology/how-much-data-does-the-world-generate-every-minute/

That means that the internet generated 2,500,000 Terabytes per day.  How do I know that?

Let's detour into the names of large numbers.
Note that I'm using powers of 10 instead of powers of 2 in order to keep the math easy.

   One million bytes is 10 to the sixth power = 10^6 bytes = 1 Megabyte
   One billion bytes is 10 to the ninth power = 10^9 bytes = 1 Gigabyte
   One trillion bytes is 10 to the 12th power = 10^12 bytes = 1 Terabyte
   One quintillion bytes is 10 to the 18th power = 10^18 bytes = 1 Exabyte = 1 Million Terabytes
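The same decimal prefixes can be written as Python constants, which turns the 2.5-quintillion-bytes figure into terabytes in one line. This is a sketch using SI (powers-of-10) units, as the post does, not the binary (powers-of-2) units some operating systems report.

```python
# Decimal (SI) units, powers of 10, as in the post.
MB = 10**6
GB = 10**9
TB = 10**12
EB = 10**18

bytes_per_day = 2.5 * EB        # "2.5 quintillion bytes per day" (2017)
daily_tb = bytes_per_day / TB   # terabytes per day

print(EB // TB)                        # 1000000 (TB per EB)
print(f"{daily_tb:,.0f} TB per day")   # 2,500,000 TB per day
```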

Returning to the internet's data production, 2.5 quintillion bytes per day = 2,500,000 Terabytes per day.

Multiply by $10 per Terabyte to get $25,000,000 per day to buy hard disks for all the new data in 2017.
Then multiply by 365 days per year to get $9,125,000,000 per year.
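The two multiplications above, as a quick Python check. The inputs are the article's 2017 figure and the assumed $10/TB wholesale price, so the output is only as good as those assumptions.

```python
DAILY_TB = 2_500_000   # 2017 estimate, from the linked article
PRICE_PER_TB = 10      # assumed wholesale dollars per TB
DAYS_PER_YEAR = 365

daily_cost = DAILY_TB * PRICE_PER_TB        # dollars per day
annual_cost = daily_cost * DAYS_PER_YEAR    # dollars per year

print(f"${daily_cost:,} per day")    # $25,000,000 per day
print(f"${annual_cost:,} per year")  # $9,125,000,000 per year
```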

Conclusion:
Google can easily afford to buy all the hard disks it needs to store all internet data production.


HOWEVER, we're not quite done yet.  Unix asked where all those hard disks are stored.

Well, my Seagate backup drive enclosure (sitting on my desk) is 8 inches X 5 inches X 1.5 inches.
Google mounts all its drives on racks, and space between the drives is needed for cooling air flow.
And all the wires take up a significant amount of space too.

I'll assume 10 X 7 X 3 inches of space per drive.

Now let's assume each rack has four stacks of 20 drives each, arranged two stacks wide by two stacks deep.
The height of the rack would be 20 drives X 3 inches per drive = 60 inches = 5 feet.
The width would be 7 inches X 2 stacks = 14 inches.
The length would be 10 inches X 2 stacks = 20 inches.

Overall rack dimensions:  60 inches tall X 14 inches wide X 20 inches long.
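The rack geometry as a small Python sketch. The per-drive footprint (10 X 7 X 3 inches, including airflow and cabling) and the 2 X 2 arrangement of 20-drive stacks are the post's assumptions, not Google's actual hardware.

```python
# Assumed space per drive in inches, including airflow and cabling.
DRIVE_LENGTH, DRIVE_WIDTH, DRIVE_HEIGHT = 10, 7, 3
DRIVES_PER_STACK = 20
STACKS_WIDE, STACKS_DEEP = 2, 2  # four stacks per rack, 2 x 2

rack_height = DRIVES_PER_STACK * DRIVE_HEIGHT   # inches
rack_width = STACKS_WIDE * DRIVE_WIDTH          # inches
rack_length = STACKS_DEEP * DRIVE_LENGTH        # inches
drives_per_rack = DRIVES_PER_STACK * STACKS_WIDE * STACKS_DEEP

print(rack_height, rack_width, rack_length, drives_per_rack)  # 60 14 20 80
```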

Since people need to walk among the racks, we need 2 feet = 24 inches between rows of racks.

Let's assume that each row is 36 racks long = 36 X 14 = 504 inches = 42 feet

Let's assume 9 rows, each taking 20 inches of rack plus a 24-inch aisle: (20 + 24) X 9 = 396 inches = 33 feet.

Total area = 42 feet X 33 feet = 1386 square feet for one room with 9 rows of racks.

Total number of drives in that room = 80 drives per rack X 36 racks per row X 9 rows = 25,920 drives.
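The room layout can be checked the same way in Python. Rack size comes from the rack calculation above; the 24-inch aisle, 36 racks per row, and 9 rows are the post's assumptions.

```python
RACK_WIDTH_IN, RACK_LENGTH_IN = 14, 20
AISLE_IN = 24                 # walkway between rows
RACKS_PER_ROW, ROWS = 36, 9
DRIVES_PER_RACK = 80

row_length_ft = RACKS_PER_ROW * RACK_WIDTH_IN / 12           # 42.0 ft
room_depth_ft = ROWS * (RACK_LENGTH_IN + AISLE_IN) / 12      # 33.0 ft
room_area_sqft = row_length_ft * room_depth_ft
drives_per_room = DRIVES_PER_RACK * RACKS_PER_ROW * ROWS

print(room_area_sqft, drives_per_room)  # 1386.0 25920
```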

Let's make the math easy and assume 10 TB per drive, giving us 259,200 TB.

10 rooms would give us 2,592,000 TB -- more than enough for one day's internet data.  And 3600 rooms would give us more than enough to store a full year's internet data.  Three hotels in Las Vegas have more rooms than that, according to this web page.

https://en.wikipedia.org/wiki/List_of_largest_hotels
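A Python sketch of the room count using ceiling division, so a partly filled room still counts as a whole room. It assumes 10 TB drives, the 25,920-drive room above, and 2017's 2.5 million TB per day held constant for a full year.

```python
import math

TB_PER_DRIVE = 10
DRIVES_PER_ROOM = 25_920
DAILY_TB = 2_500_000
DAYS_PER_YEAR = 365

tb_per_room = TB_PER_DRIVE * DRIVES_PER_ROOM
rooms_per_day = math.ceil(DAILY_TB / tb_per_room)
rooms_per_year = math.ceil(DAILY_TB * DAYS_PER_YEAR / tb_per_room)

print(tb_per_room, rooms_per_day, rooms_per_year)  # 259200 10 3521
```

Under those assumptions the minimum comes out to 3,521 rooms, so the round figure of 3,600 leaves a little headroom.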

Bottom Line:
Google has to buy a building the size of one of the largest hotels in the world to house all the new hard disks it buys every year (or a bunch of smaller buildings that add up to the same number of square feet).  But that building - even at about $2 Billion - costs a lot less than the hard disks, so Google can afford that too.

unix

Re: How do they do it?
« Reply #3 on: November 13, 2018, 05:18:20 am »
But does it pay off?

pxsant

  • CCF Winner's Circle - Supporter
  • Wise Sage
  • *
  • Posts: 1692
Re: How do they do it?
« Reply #4 on: November 13, 2018, 07:03:38 am »
Somehow the estimates of Google's storage requirements do not make sense. For 10 years' worth of data they would need the equivalent of a small city full of hard drives. There must be a bit more to their data capture scenarios than appears at first glance.

The Gorn

  • I absolutely DESPISE improvised sulfur-charcoal-salt peter cannons made out of hollow tree branches filled with diamonds as projectiles.
  • Trusted Member
  • Wise Sage
  • ******
  • Posts: 22598
  • Gorn Classic, user of Gornix
Re: How do they do it?
« Reply #5 on: November 13, 2018, 07:57:59 am »
What I have always wondered is what Google would do if a significant population of Google-using normies (ordinary non-enterprise personal users) each attempted to max out their 15 GB allotment of data on Gmail and Google personal accounts.

It would be kind of like the coordinated Super Bowl toilet flush that destroys the water utilities across the country in late January.
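To put the 15 GB allotment in the units of ArnoldW2's estimate in Reply #2 (259,200 TB per room of drives), here is a rough Python sketch. The one-billion-user count is purely hypothetical, since no user figure appears in this thread, and the sketch ignores deduplication, compression, and replicas.

```python
import math

GB = 10**9
TB = 10**12
QUOTA_BYTES = 15 * GB      # the 15 GB free allotment
TB_PER_ROOM = 259_200      # one room of drives, from Reply #2's layout


def rooms_if_quotas_full(users: int) -> int:
    """Rooms of drives needed if every user fills the whole quota.
    Ignores deduplication, compression, and replicas."""
    total_tb = users * QUOTA_BYTES / TB
    return math.ceil(total_tb / TB_PER_ROOM)


# Hypothetical user count, for illustration only.
print(rooms_if_quotas_full(1_000_000_000))  # 58
```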
Gornix is protected by the GPL. *

* Gorn Public License. Duplication by inferior sentient species prohibited.

ArnoldW2

Re: How do they do it?
« Reply #6 on: November 13, 2018, 04:09:57 pm »
pxsant wrote:
Quote
Somehow the estimates of Google's storage requirements do not make sense. For 10 years' worth of data they would need the equivalent of a small city full of hard drives. There must be a bit more to their data capture scenarios than appears at first glance.

90% of all the internet data in the world was created in only the last two years, according to the article at this link:

https://www.iflscience.com/technology/how-much-data-does-the-world-generate-every-minute/

Note that I linked to this article in my previous post.

If one huge hotel can contain enough hard disks to store one year's data, then two are almost enough to store all the data that the internet has ever generated.
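The step from one hotel per year to two hotels holding almost everything can be made explicit. A rough Python sketch, assuming each of the last two years' data fits in one hotel-sized building and taking the article's 90% figure at face value; the function name is illustrative.

```python
def hotels_for_all_data(hotels_for_recent: float, recent_share: float) -> float:
    """If `recent_share` of all data ever created fits in `hotels_for_recent`
    hotel-sized buildings, the whole archive needs about
    hotels_for_recent / recent_share of them."""
    return hotels_for_recent / recent_share


# Last two years at most one hotel each; 90% share from the linked article.
print(round(hotels_for_all_data(2, 0.90), 2))  # 2.22
```

Under those assumptions, roughly 2.2 hotels would hold everything, so two hotels cover about 90% of it.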

In my previous posts, I acknowledged that my estimate is based on some shaky assumptions.  Remember GIGO!

I don't know if Google really stores all the internet's data on their hard disks.  It might be only 50% or 25% or 10% for all I know.  Who - outside of Google - would know the difference?  And how?

Many web sites disappear from the internet every year.  Does Google keep all the data from deceased web sites?  If so, why?  I'm sure that Google at least removes deceased web sites from their indexes.  If expired sites are no longer indexed, why keep their data around either?

Many ongoing web sites delete old data from time to time.  Does Google keep expired data on their hard disks forever?  If so, why?  And even if Google did keep expired data, how would I access any of it if none of it's in Google's indexes?

The Gorn

Re: How do they do it?
« Reply #7 on: November 13, 2018, 05:20:13 pm »
Google isn't archiving (saving) the entire internet's data output. Just think about it sensibly: that's just plain ridiculous to assert.

There's the Internet Archive, the Wayback Machine, privately run and funded - https://archive.org. That site attempts to snapshot public web pages of all known sites periodically. It's hardly perfect. If you pull up an archived historical copy of the index page from this site, clicking on most of the topics results in a not found error.

Google is most likely curating and constantly saving a slice of per-user activity data that is most important for advertising, and also for the purposes of national security (NSA).

Google saves data from its users. Emails, histories including logins to services, browsing and search history, etc. And Google indexes the web and keeps current topical results from indexing in its cache.

The history data per user probably isn't that big. My guess is that each logged in or identified web user's data known to Google is saved for a brief period of time like 90 days - 1 year, maybe. But Google probably saves other stuff in perpetuity, such as aggregate scores for each user of their general tastes, trends in internet usage, consumer and other preferences, etc. In other words, Google probably knows that I like Trump, and that I watch videos on software development topics, business and marketing in certain niches, and cooking videos and blog posts.

And Google definitely uses that user tracking data to deliver pay-per-click ads to each web user.

ilconsiglliere

Re: To figure it out ...
« Reply #8 on: November 13, 2018, 09:40:05 pm »
That's great math and analysis. If it is as you say, they can easily afford it.

unix

Re: How do they do it?
« Reply #9 on: November 14, 2018, 05:44:36 pm »
Quote
The history data per user probably isn't that big. My guess is that each logged in or identified web user's data known to Google is saved for a brief period of time like 90 days - 1 year, maybe

I looked at my archived search history; it went back several years, I thought. I don't remember precisely, as that was maybe 5 years ago (and I've turned it off since then), but it seemed to me like it went back at least 1.5 to 2 years, just from the impression I got.

I use the DuckDuckGo search engine these days.

Anyway, storing *all* users for *2* years?  That's something. That's just part of the data they are saving. Per person.

unix

Re: How do they do it?
« Reply #10 on: November 14, 2018, 05:48:23 pm »
P.S. Just to make it perfectly clear, I kinda considered this a political topic and used the 'private' section.

I didn't really approach it as a nuts-and-bolts technical issue per se, although I can see from the analysis above that it is a completely valid technical discussion, and thanks for steering this in a productive direction.
I think we should consolidate things into a single section without sub-forums. Just IMO.

Maybe. Maybe.

Maybe not.

The Gorn

Re: How do they do it?
« Reply #11 on: November 14, 2018, 06:37:41 pm »
Ah, ok. I've decided that with the attrition of the board, segregating politics makes little sense now. We are what we are. Feel free to post stuff like this to any appropriate forum. I removed the Politics forum so it's only fair to allow politics wherever.

unix

Re: How do they do it?
« Reply #12 on: November 16, 2018, 08:14:44 pm »

I still think this discussion raises more questions than answers.

ArnoldW2

Re: How do they do it?
« Reply #13 on: November 17, 2018, 03:00:13 pm »
Unix wrote:
Quote
I still think this discussion raises more questions than answers.

Then ask the questions.  I'm interested in reading them.

unix

Re: How does Google handle storage?
« Reply #14 on: November 18, 2018, 08:36:55 am »
Where are they getting the funds to run this operation? It is expensive. Who would pay them to store data on every Gulag user in the world?
