What is the tech stack behind the Google search engine?
Google Search is one of the most popular search engines on the web, handling over 3.5 billion searches per day. But what is the tech stack that powers Google Search?
The PageRank algorithm is at the heart of Google Search. It was developed by Google co-founders Larry Page and Sergey Brin, and the patent application for it was filed in 1998. It ranks web pages by their quality and importance, taking into account signals such as incoming links from other websites. The algorithm has evolved constantly over the years, and it continues to be a key part of Google Search today.
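To make the link-based idea concrete, here is a minimal sketch of the textbook PageRank power iteration on a toy three-page web. The damping factor of 0.85, the iteration count, and the `toy_web` graph are illustrative assumptions; this is the classic published formulation, not Google's production ranking code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked to by both "a" and "b".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

In this toy graph, page `c` ends up with the highest score because two pages link to it, while `b`, which receives only half of `a`'s vote, ends up lowest.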
However, the PageRank algorithm is just one part of the story. The Google Search Engine also relies on a sophisticated infrastructure of servers and data centers spread around the world. This infrastructure enables Google to crawl and index billions of web pages quickly and efficiently. Additionally, Google has developed a number of proprietary technologies to further improve the quality of its search results. These include technologies like Spell Check, SafeSearch, and Knowledge Graph.
The technology stack that powers the Google search engine is immensely complex, and includes a number of sophisticated algorithms, technologies, and infrastructure components. At the heart of the system is the PageRank algorithm, which ranks pages based on a number of factors, including the number and quality of links to the page. The algorithm is constantly being refined and updated in order to deliver more relevant and accurate results. In addition to PageRank, Google uses a number of other algorithms; techniques such as latent semantic indexing, which indexes and retrieves documents based on their meaning, are often mentioned in this context, although Google has not confirmed using that specific method. The search engine also makes use of a massive infrastructure, which includes hundreds of thousands of servers around the world. While Google is the dominant player in the search engine market, there are a number of other well-established competitors, such as Microsoft’s Bing search engine and DuckDuckGo.
The original Google algorithm was called PageRank, named after inventor Larry Page (though, fittingly, the algorithm does rank web pages).
After 17 years of work by many software engineers, researchers, and statisticians, Google search uses algorithms upon algorithms upon algorithms.
Each year, Google changes its search algorithm around 500–600 times. While most of these changes are minor, Google occasionally rolls out a “major” algorithmic update (such as Google Panda and Google Penguin) that affects search results in significant ways.
For search marketers, knowing the dates of these Google updates can help explain changes in rankings and organic website traffic and ultimately improve search engine optimization.
Originally, Google’s indexing algorithm was fairly simple.
It took a starting page and added every unique word on the page to the index (a word that occurred more than once on the page was counted only once), or incremented the word's count if it was already in the index.
Pages were ranked by the number of references the algorithm found to them: each time the system found a link to a page on a newly discovered page, that page's count was incremented.
When you did a search, the system would identify all the pages containing those words and show you the ones that had the most links to them.
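The scheme described above can be sketched in a few lines. This is only an illustration of the description in this answer, not Google's actual early code: the function names, the `pages` data layout, and the example URLs are all hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Build a word index and inbound-link counts.

    `pages` maps URL -> (page text, list of outgoing links).
    """
    word_to_pages = defaultdict(set)  # word -> pages containing it
    inbound_links = defaultdict(int)  # page -> number of pages linking to it
    for url, (text, out_links) in pages.items():
        # set() ensures a word repeated on one page is counted only once.
        for word in set(text.lower().split()):
            word_to_pages[word].add(url)
        # Count each linking page once, and ignore self-links.
        for target in set(out_links):
            if target != url:
                inbound_links[target] += 1
    return word_to_pages, inbound_links


def search(query, word_to_pages, inbound_links):
    """Return pages containing every query word, most-linked first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(word_to_pages.get(w, set()) for w in words))
    # Sort by descending link count, breaking ties by URL for stable output.
    return sorted(matches, key=lambda url: (-inbound_links.get(url, 0), url))


# Hypothetical three-page web.
pages = {
    "a.example": ("cats and dogs", ["b.example"]),
    "b.example": ("cats cats cats", ["c.example"]),
    "c.example": ("dogs", ["b.example"]),
}
word_to_pages, inbound_links = build_index(pages)
results = search("cats", word_to_pages, inbound_links)
```

Here `b.example` is listed before `a.example` for the query "cats" because two pages link to it, even though both pages contain the word.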
As people searched and visited pages from the search results, Google would also track the pages that people would click to from the search page. Those that people clicked would also be identified as a better quality match for that set of search terms. If the person quickly came back to the search page and clicked another link, the match quality would be reduced.
Now, Google is using natural language processing, a method of trying to guess what the user really wants. From that, it finds similar words that might give a better set of results, based on searches done by millions of other people like you. It might assume that you really meant some other word instead of the one you used in your search terms, or it might just include matches for those other words in the list alongside matches for the words you provided.
It really all boils down to the fact that Google has been monitoring a lot of people doing searches for a very long time. It has a huge list of websites and search terms that have done the job for a lot of people.
There are a lot of proprietary algorithms, but the real magic is that they’ve been watching you and everyone else for a very long time.
C++, mostly. There are little bits in other languages, but the core of both the indexing system and the serving system is C++.
Use a ping service. Ping services can speed up your indexing process.
Our ranking algorithm simply doesn’t rank google.com highly for the query “search engine.” There is not a single, simple reason why this is the case. If I had to guess, I would say that people who type “search engine” into Google are usually looking for general information about search engines or about alternative search engines, and neither query is well-answered by listing google.com.
To be clear, we have never manually altered the search results for this (or any other) specific query.
When I tried the query “search engine” on Bing, the results were similar; bing.com was #5 and google.com was #6.
The basic idea is using an inverted index. This means for each word keeping a list of documents on the web that contain it.
Responding to a query comes down to four steps: retrieving the matching documents (basically by intersecting the lists for the query words), processing the documents (extracting quality signals for each document/query pair), ranking the documents (using document quality signals like PageRank, query signals, and query/document signals), and returning the top 10 documents.
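As a rough sketch of that flow, the snippet below intersects sorted posting lists and returns the best-scoring matches. The `index` and `quality` data, the function names, and the use of a single per-document quality score in place of the many real signals are simplifying assumptions made for illustration.

```python
import heapq


def intersect(a, b):
    """Intersect two posting lists sorted by ascending document id."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def answer_query(query, index, quality, k=10):
    """Retrieve documents containing every query word and return the top k."""
    lists = [index.get(word, []) for word in query.lower().split()]
    if not lists:
        return []
    candidates = lists[0]
    for postings in lists[1:]:
        candidates = intersect(candidates, postings)
    # Rank by a single stand-in quality score (for example, PageRank).
    return heapq.nlargest(k, candidates, key=lambda doc: quality.get(doc, 0.0))


# Hypothetical inverted index: word -> sorted list of document ids.
index = {"google": [1, 2, 4, 7], "search": [2, 3, 4, 7, 9], "stack": [2, 7]}
# Hypothetical per-document quality scores.
quality = {1: 0.1, 2: 0.3, 3: 0.9, 4: 0.8, 7: 0.5, 9: 0.2}
top = answer_query("google search", index, quality, k=2)
```

Document 3 has the highest quality score but does not contain "google", so it never becomes a candidate; retrieval filters first and ranking only orders what survives.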
Here are some tricks for doing the retrieval part efficiently:
– distribute the whole thing over thousands and thousands of machines
– do it in memory
– caching
– looking first at the query word with the shortest document list
– keeping the documents in the list in reverse PageRank order so that we can stop early once we find enough good quality matches
– keep lists for pairs of words that occur frequently together
– shard by document id, this way the load is somewhat evenly distributed and the intersection is done in parallel
– compress messages that are sent across the network
etc
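Two of the tricks above, starting from the shortest document list and stopping early when lists are kept in PageRank order, can be sketched together. The `postings` layout and the function name are hypothetical, and building Python sets for the longer lists is a shortcut taken for brevity; a real system would avoid scanning those lists in full.

```python
def top_k_early_stop(query_words, postings, k=10):
    """Return up to k documents that contain every query word.

    `postings` maps word -> document ids ordered by descending PageRank
    (an assumed layout). Because the shortest list is walked in that
    order, the first k matches found are also the k best-ranked matches,
    so the scan can stop as soon as k results are collected.
    """
    lists = [postings.get(word, []) for word in query_words]
    if not lists:
        return []
    lists.sort(key=len)  # begin with the rarest query word
    shortest = lists[0]
    others = [set(lst) for lst in lists[1:]]  # shortcut for membership tests

    results = []
    for doc in shortest:
        if all(doc in other for other in others):
            results.append(doc)
            if len(results) == k:
                break  # early termination: enough good matches found
    return results


# Hypothetical posting lists; document ids appear in descending PageRank
# order (5 is the best-ranked document, then 3, 8, 1, 9).
postings = {"google": [5, 3, 8, 1, 9], "stack": [3, 1, 9]}
best_two = top_k_early_stop(["google", "stack"], postings, k=2)
```

With `k=2` the loop stops after examining documents 3 and 1 and never looks at document 9.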
Jeff Dean in this great talk explains quite a few bits of the internal Google infrastructure. He mentions a few of the previous ideas in the talk.
He goes through the evolution of the Google Search Serving Design and through MapReduce while giving general advice about building large scale systems.
As for complexity, it’s pretty hard to analyze because of all the moving parts, but Jeff mentions that the latency per query is about 0.2 s and that each query touches on average 1000 computers.
LaMDA is Google’s latest artificial intelligence (AI) chatbot. Blake Lemoine, a Google AI engineer, has claimed it is sentient. He’s been put on leave after publishing his conversations with LaMDA.
If Lemoine’s claims are true, it would be a milestone in the history of humankind and technological development.
Google strongly denies LaMDA has any sentient capacity.
Data Source: statcounterGS
Tools Used: Excel & PowerPoint
Edit: Note that the true figure for Baidu/China is likely higher. The way statcounterGS collects its data might understate the number of users from China.
Baidu is popular in China, Yandex is popular in Russia.
Yandex is great for reverse image searches; Google just can’t compete with Yandex in that category.
Normal Google reverse search is a joke (except for finding a bigger version of a pic, it’s good for that), but Google Lens can be as good or sometimes better at finding similar images or locations than Yandex depending on the image type. Always good to try both, and also Bing can be decent sometimes.
Bing has been profitable since 2015 even with less than 3% of the market share. So just imagine how much money Google is taking in.
Firstly: Yahoo, DuckDuckGo, Ecosia, etc. all use Bing to get their search results, which means Bing’s usage is more than the 3% indicated.
Secondly: This graph shows overall market share (phones and PCs). But search engines make most of their money on desktop searches, because there is more screen space for ads. Bing’s market share on desktop is WAY bigger, while its market share on phones is ~0%. Its American desktop market share is 10-15%. That is where the money is.
What you are saying is in fact true though. We make trillions of web searches – which means even three percent market-share equals billions of hits and a ton of money.
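A quick back-of-the-envelope check of that claim, assuming the figure of 3.5 billion searches per day quoted at the top of this article is a fair proxy for total search volume (a rough assumption, since that figure refers to Google alone):

```python
# Rough assumptions: 3.5 billion searches per day, 3% market share.
searches_per_day = 3_500_000_000
searches_per_year = searches_per_day * 365
three_percent_share = searches_per_year * 3 // 100

print(f"{searches_per_year:,} searches per year")
print(f"{three_percent_share:,} searches per year at a 3% share")
```

Under those assumptions the yearly total is a little over a trillion searches, and a 3% share works out to tens of billions of searches per year.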
I like DuckDuckGo, and they have good privacy features. I just wish their maps were better, because if I’m searching for a local restaurant, nothing makes it easier than Google to go from the search to the map to the company’s webpage. But for informative searches I think it gives a more objective, less curated return.
Use Ecosia and profits go to reforestation efforts!
Turns out people don’t care about their privacy, especially if it gets them results.
I recently switched to using the Brave browser and DuckDuckGo, and I basically can’t tell the difference from using Google and Chrome.
The only times I’ve needed to use Google are for really specific searches where DuckDuckGo doesn’t always seem to give the expected results. But for daily browsing it’s absolutely fine and far, far better for privacy.
There is a lot that happens between the moment a user types something in the input field and when they get their results.
Google Search has a high-level overview, but the gist of it is that there are dozens of sub systems involved and they all work extremely fast. The general idea is that search is going to process the query, try to understand what the user wants to know/accomplish, rank these possibilities, prepare a results page that reflects this and render it on the user’s device.
I would not describe the UI as simple. Yes, the initial state looks like a single input field on an otherwise empty page. But there is already a lot going on in that input field and how it’s presented to the user. And then, as soon as the user interacts with the field, for instance as they start typing, there’s a ton of other things that happen: Search is able to pre-populate suggested queries really fast. Plus there’s a whole “syntax” to search, with operators and whatnot, and there are many different modes (image, news, etc.).
One recent iteration of Google Search is Google Lens. The Google Lens interface is even simpler than the single input field: just take a picture with your phone! But under the hood a lot is going on.
The Google search engine is a remarkable feat of engineering, and its capabilities are only made possible by the use of cutting-edge technology. At the heart of the Google search engine is the PageRank algorithm, which is used to rank web pages in order of importance. This algorithm takes into account a variety of factors, including the number and quality of links to a given page. In order to effectively crawl and index the billions of web pages on the internet, Google has developed a sophisticated infrastructure that includes tens of thousands of servers located around the world. This infrastructure enables Google to rapidly process search queries and deliver relevant results to users in a matter of seconds. While Google is the dominant player in the search engine market, a number of other search engines compete for users, including Bing and DuckDuckGo. However, none of these competitors has been able to replicate the success of Google, due in large part to the company’s technological capabilities.