What is the tech stack behind Google Search Engine?
Google Search is one of the most popular search engines on the web, handling over 3.5 billion searches per day. But what is the tech stack that powers Google Search?
The PageRank algorithm is at the heart of Google Search. This algorithm was developed by Google co-founders Larry Page and Sergey Brin, and a patent application for it was filed in 1998. It ranks web pages based on their quality and importance, taking into account things like incoming links from other websites. The PageRank algorithm has been constantly evolving over the years, and it continues to be a key part of Google Search today.
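To make the idea of ranking by incoming links concrete, here is a minimal sketch of the textbook PageRank iteration on a toy link graph. The damping factor of 0.85, the iteration count, and the four-page graph are illustrative assumptions, and this is not a description of Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a small link graph by repeated redistribution.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not Google's settings.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" receives links from three pages, "d" from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most incoming links ("c") ends up with the highest score, and the page nobody links to ("d") ends up with the lowest.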
However, the PageRank algorithm is just one part of the story. The Google Search Engine also relies on a sophisticated infrastructure of servers and data centers spread around the world. This infrastructure enables Google to crawl and index billions of web pages quickly and efficiently. Additionally, Google has developed a number of proprietary technologies to further improve the quality of its search results. These include technologies like Spell Check, SafeSearch, and Knowledge Graph.
The technology stack that powers the Google Search Engine is immensely complex, and includes a number of sophisticated algorithms, technologies, and infrastructure components. At the heart of the system is the PageRank algorithm, which ranks pages based on a number of factors, including the number and quality of links to the page. The algorithm is constantly being refined and updated, in order to deliver more relevant and accurate results. In addition to the PageRank algorithm, Google also uses a number of other algorithms, including the Latent Semantic Indexing algorithm, which helps to index and retrieve documents based on their meaning. The search engine also makes use of a massive infrastructure, which includes hundreds of thousands of servers around the world. While Google is the dominant player in the search engine market, there are a number of other well-established competitors, such as Microsoft’s Bing search engine and DuckDuckGo.
The original Google algorithm was called PageRank, named after inventor Larry Page (though, fittingly, the algorithm does rank web pages).
After 17 years of work by many software engineers, researchers, and statisticians, Google search uses algorithms upon algorithms upon algorithms.
- The various components used by Google Search are all proprietary, but most of the code is written in C++.
- Google publishes a number of technical explanations of how Search works, and that is also the limit of what can be shared publicly.
- Abseil (https://abseil.io) and GoogleTest (https://google.github.io/googletest/) are the main open source Google C++ libraries; both are used extensively in Search.
- Bazel (https://bazel.build) is another open source build tool that is heavily used all across Google, including for Search.
- Google has general information on you, the kinds of things you might like, the sites you frequent, etc. When it fetches search results, they get ranked, and this personal info is used to adjust the rankings, resulting in different search results for each user.
How does Google’s indexing algorithm (so it can do things like fuzzy string matching) technically structure its index?
- There is no single technique that works.
- At a basic level, all search engines have something like an inverted index, so you can look up words and associated documents. There may also be a forward index.
- One way of constructing such an index is by stemming words. Stemming is done with an algorithm that boils words down to their basic root. The most famous stemming algorithm is the Porter stemmer.
- However, there are other approaches. One is to build n-grams, sequences of n letters, so that you can do partial matching. You often would choose multiple n’s, and thus have multiple indexes, since some n-letter combinations are common (e.g., “th”) for small n’s, but larger values of n undermine the intent.
- I don’t know that we can say “nothing absolute is known”. Look at misspellings. Google can resolve a lot of them. This isn’t surprising; we’ve had spellcheckers for at least 40 years. However, the less common a misspelling, the harder it is for Google to catch.
- One cool thing about Google is that they have been studying and collecting data on searches for more than 20 years. I don’t mean that they have been studying searching or search engines (although they have been), but that they have been studying how people search. They process several billion search queries each day. They have developed models of what people really want, which often isn’t what they say they want. That’s why they track every click you make on search results… well, that and the fact that they want to build effective models for ad placement.
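The index structures mentioned in the list above can be sketched in a few lines. The example builds a forward index (document to terms) and an inverted index (term to documents) over three made-up documents. The `crude_stem` helper is a hypothetical stand-in that strips a few suffixes; it is not the Porter stemmer, and none of this reflects how Google's index is actually laid out.

```python
from collections import defaultdict


def crude_stem(word):
    """Strip a few common English suffixes.

    A deliberately naive stand-in for a real stemmer such as Porter's.
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_indexes(docs):
    """Return (forward, inverted) indexes for docs, a dict of doc_id -> text."""
    forward = {}                 # doc_id -> list of stemmed terms
    inverted = defaultdict(set)  # stemmed term -> set of doc_ids
    for doc_id, text in docs.items():
        terms = [crude_stem(w) for w in text.lower().split()]
        forward[doc_id] = terms
        for term in terms:
            inverted[term].add(doc_id)
    return forward, inverted


def lookup(inverted, word):
    """Find documents containing any form of word that stems the same way."""
    return sorted(inverted.get(crude_stem(word.lower()), set()))


# Made-up documents for illustration only.
docs = {1: "crawling the web", 2: "web crawlers index pages", 3: "ranked pages"}
forward, inverted = build_indexes(docs)
```

Because both documents and queries pass through the same stemmer, a query for "crawled" finds the document that contains "crawling", and a query for "page" finds both documents that contain "pages".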
Each year, Google changes its search algorithm around 500–600 times. While most of these changes are minor, Google occasionally rolls out a “major” algorithmic update (such as Google Panda and Google Penguin) that affects search results in significant ways.
For search marketers, knowing the dates of these Google updates can help explain changes in rankings and organic website traffic and ultimately improve search engine optimization. Below, we’ve listed the major algorithmic changes that have had the biggest impact on search.
Originally, Google’s indexing algorithm was fairly simple.
It took a starting page and added each unique word on the page to the index (a word that occurred more than once on the page was counted only once), or incremented the word’s index count if it was already in the index.
The page was indexed by the number of references the algorithm found to the specific page. So each time the system found a link to the page on a newly discovered page, the page count was incremented.
When you did a search, the system would identify all the pages with those words on it and show you the ones that had the most links to them.
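The simple scheme described above can be sketched as follows: each page contributes its unique words to a word index, every link found on a page increments the link count of its target, and a search returns matching pages ordered by link count. The page data, the function names, and the choice to require every query word to appear on a page are assumptions made for illustration.

```python
from collections import defaultdict


def build_simple_index(pages):
    """pages maps url -> (page text, list of outbound links)."""
    word_to_pages = defaultdict(set)  # word -> pages containing it
    inbound_links = defaultdict(int)  # url -> number of pages linking to it
    for url, (text, out_links) in pages.items():
        # Each word counts once per page, however often it appears there.
        for word in set(text.lower().split()):
            word_to_pages[word].add(url)
        # Each linking page increments the target's count once.
        for target in set(out_links):
            inbound_links[target] += 1
    return word_to_pages, inbound_links


def search(query, word_to_pages, inbound_links):
    """Return pages containing every query word, most-linked first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(word_to_pages.get(w, set()) for w in words))
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(matches, key=lambda url: (-inbound_links.get(url, 0), url))


# Hypothetical three-page web.
pages = {
    "a.example": ("search engine basics", ["b.example"]),
    "b.example": ("search engine ranking", ["a.example"]),
    "c.example": ("ranking recipes", ["b.example"]),
}
word_to_pages, inbound_links = build_simple_index(pages)
results = search("search engine", word_to_pages, inbound_links)
```

Here "b.example" is linked from two pages and "a.example" from one, so "b.example" is listed first for the query "search engine".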
As people searched and visited pages from the search results, Google would also track the pages that people would click to from the search page. Those that people clicked would also be identified as a better quality match for that set of search terms. If the person quickly came back to the search page and clicked another link, the match quality would be reduced.
Now, Google is using natural language processing, a method of trying to guess what the user really wants. From that it finds similar words that might give a better set of results based on searches done by millions of other people like you. It might assume that you really meant this other word instead of the word you used in your search terms. It might just give you matches in the list with those other words as well as the words you provided.
It really all boils down to the fact that Google has been monitoring a lot of people doing searches for a very long time. It has a huge list of websites and search terms that have done the job for a lot of people.
There are a lot of proprietary algorithms, but the real magic is that they’ve been watching you and everyone else for a very long time.
What programming language powers Google’s search engine core?
C++, mostly. There are little bits in other languages, but the core of both the indexing system and the serving system is C++.
How does Google handle the technical aspect of fuzzy matching? How is the index implemented for that?
- With n-grams and word stemming, plus correction of misspelled words. N-grams allow partial matching on anything.
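As a rough illustration of n-gram matching, the sketch below indexes a small vocabulary by character trigrams and ranks candidate words for a misspelled query by how many trigrams they share (Jaccard similarity). The vocabulary, the `$` padding character, and the scoring rule are assumptions chosen for the example, not details of Google's implementation.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the set of character n-grams of word, padded with '$' markers."""
    padded = f"${word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_ngram_index(vocabulary, n=3):
    """Map each n-gram to the vocabulary words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


def fuzzy_match(query, index, n=3):
    """Return vocabulary words sharing an n-gram with query, best match first."""
    query_grams = char_ngrams(query, n)
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())

    def jaccard(word):
        grams = char_ngrams(word, n)
        return len(query_grams & grams) / len(query_grams | grams)

    return sorted(candidates, key=lambda w: (-jaccard(w), w))


# Made-up vocabulary for illustration.
ngram_index = build_ngram_index(["search", "engine", "research", "index"])
suggestions = fuzzy_match("serach", ngram_index)
```

The transposed query "serach" still shares its first and last trigrams with "search", so "search" comes out as the top suggestion even though the two strings are not identical.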
Use a ping service. Ping services can speed up your indexing process.
- Search Google for “pingmylinks”
- Click on the “add url” in the upper left corner.
- Submit your website and make sure to use all the submission tools and your site should be indexed within hours.
Our ranking algorithm simply doesn’t rank google.com highly for the query “search engine.” There is not a single, simple reason why this is the case. If I had to guess, I would say that people who type “search engine” into Google are usually looking for general information about search engines or about alternative search engines, and neither query is well-answered by listing google.com.
To be clear, we have never manually altered the search results for this (or any other) specific query.
When I tried the query “search engine” on Bing, the results were similar; bing.com was #5 and google.com was #6.
What is the search algorithm used by the Google search engine? What is its complexity?
The basic idea is using an inverted index. This means for each word keeping a list of documents on the web that contain it.
Responding to a query involves retrieving the matching documents (basically by intersecting the lists for the query words), processing the documents (extracting quality signals for each document and query pair), ranking the documents (using document quality signals like PageRank, query signals, and query/document signals), and then returning the top 10 documents.
Here are some tricks for doing the retrieval part efficiently:
– distribute the whole thing over thousands and thousands of machines
– do it in memory
– look first at the query word with the shortest document list
– keep the documents in each list in reverse PageRank order (highest first) so that we can stop early once we find enough good-quality matches
– keep lists for pairs of words that occur frequently together
– shard by document id, this way the load is somewhat evenly distributed and the intersection is done in parallel
– compress messages that are sent across the network
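Two of the tricks above, starting from the shortest document list and stopping early when lists are kept in rank order, can be sketched on a single machine. The posting lists and document ids are invented for the example, the function name is hypothetical, and sharding, compression, and distribution across machines are left out.

```python
def top_k_matches(query_words, postings, k=10):
    """Intersect posting lists and return up to k document ids.

    postings maps word -> list of doc ids, each list ordered best-ranked
    first (the same global order in every list).
    """
    lists = [postings.get(word, []) for word in query_words]
    if not lists or any(not lst for lst in lists):
        return []  # no query words, or a word that matches nothing
    # Drive the intersection from the shortest list to minimize work.
    lists.sort(key=len)
    shortest = lists[0]
    others = [set(lst) for lst in lists[1:]]
    results = []
    for doc_id in shortest:  # already in best-first order
        if all(doc_id in other for other in others):
            results.append(doc_id)
            if len(results) == k:
                break  # early stop: later matches can only rank lower
    return results


# Invented posting lists; the global rank order of documents is 7, 3, 9, 1, 4.
postings = {
    "google": [7, 3, 9, 1, 4],
    "search": [7, 9, 4],
    "stack": [7, 3, 9, 4],
}
best_two = top_k_matches(["google", "search", "stack"], postings, k=2)
```

Because every list is stored best-first, the first k documents that survive the intersection are also the k best-ranked matches, which is what makes stopping early safe.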
Jeff Dean in this great talk explains quite a few bits of the internal Google infrastructure. He mentions a few of the previous ideas in the talk.
He goes through the evolution of the Google Search Serving Design and through MapReduce while giving general advice about building large scale systems.
As for complexity, it’s pretty hard to analyze because of all the moving parts, but Jeff mentions that the latency per query is about 0.2 s and that each query touches on average 1000 computers.
Is Google’s LaMDA conscious? A philosopher’s view (theconversation.com)
LaMDA is Google’s latest artificial intelligence (AI) chatbot. Blake Lemoine, a Google AI engineer, has claimed it is sentient. He’s been put on leave after publishing his conversations with LaMDA.
If Lemoine’s claims are true, it would be a milestone in the history of humankind and technological development.
Google strongly denies LaMDA has any sentient capacity.
Fun facts about Google Search Engine Competitors
Data Source: statcounterGS
Tools Used: Excel & PowerPoint
Edit: Note that the data for Baidu/China is likely higher. How statcounterGS collects the data might understate # users from China.
Baidu is popular in China, Yandex is popular in Russia.
Yandex is great for reverse image searches; Google just can’t compete with Yandex in that category.
Normal Google reverse search is a joke (except for finding a bigger version of a pic, it’s good for that), but Google Lens can be as good or sometimes better at finding similar images or locations than Yandex depending on the image type. Always good to try both, and also Bing can be decent sometimes.
Bing has been profitable since 2015 even with less than 3% of the market share. So just imagine how much money Google is taking in.
Firstly: Yahoo, DuckDuckGo, Ecosia, etc. all use Bing to get their search results. Which means Bing’s usage is more than the 3% indicated.
Secondly: This graph shows overall market share (phones and PCs). But search engines make most of their money on desktop searches, due to more screen space for ads. And Bing’s market share on desktop is WAY bigger; its market share on phones is ~0%. Its American desktop market share is 10-15%. That is where the money is.
What you are saying is in fact true though. We make trillions of web searches – which means even three percent market-share equals billions of hits and a ton of money.
I like duck duck go. And they have good privacy features. I just wish their maps were better because if I’m searching a local restaurant nothing is easier than google to transition from the search to the map to the webpage for the company. But for informative searches I think it gives a more objective, less curated return.
Use Ecosia and profits go to reforestation efforts!
Turns out people don’t care about their privacy, especially if it gets them results.
I recently switched to using brave browser and duck duck go and I basically can’t tell the difference in using Google and chrome.
The only times I’ve needed to use Google are for really specific searches where duck duck go doesn’t always seem to give the expected results. But for daily browsing it’s absolutely fine and far far better for privacy.
Does Google Search have the most complex functionality hiding behind a simple looking UI?
There is a lot that happens between the moment a user types something in the input field and when they get their results.
Google Search has a high-level overview, but the gist of it is that there are dozens of subsystems involved and they all work extremely fast. The general idea is that Search is going to process the query, try to understand what the user wants to know or accomplish, rank these possibilities, prepare a results page that reflects this, and render it on the user’s device.
I would not describe the UI as simple. Yes, the initial state looks like a single input field on an otherwise empty page. But there is already a lot going on in that input field and how it’s presented to the user. And then, as soon as the user interacts with the field, for instance as they start typing, there’s a ton of other things that happen: Search is able to pre-populate suggested queries really fast. Plus there’s a whole “syntax” to search with operators and whatnot, and there are many different modes (image, news, etc.).
One recent iteration of Google Search is Google Lens. The Google Lens interface is even simpler than the single input field: just take a picture with your phone! But under the hood a lot is going on.
The Google search engine is a remarkable feat of engineering, and its capabilities are only made possible by the use of cutting-edge technology. At the heart of the Google search engine is the PageRank algorithm, which is used to rank web pages in order of importance. This algorithm takes into account a variety of factors, including the number and quality of links to a given page. In order to effectively crawl and index the billions of web pages on the internet, Google has developed a sophisticated infrastructure that includes tens of thousands of servers located around the world. This infrastructure enables Google to rapidly process search queries and deliver relevant results to users in a matter of seconds. While Google is the dominant player in the search engine market, there are a number of other search engines that compete for users, including Bing and DuckDuckGo. However, none of these competitors has been able to replicate the success of Google, due in large part to the company’s unrivaled technological capabilities.