What is the tech stack behind Google Search Engine?
Google Search is one of the most popular search engines on the web, handling over 3.5 billion searches per day. But what is the tech stack that powers Google Search?
The PageRank algorithm is at the heart of Google Search. It was developed by Google co-founders Larry Page and Sergey Brin while they were at Stanford, and the patent application for it was filed in 1998. It ranks web pages by importance, estimated mainly from the number and quality of incoming links from other websites. PageRank has evolved over the years and, combined with many other signals, continues to be a part of Google Search today.
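To make the link-based idea concrete, here is a minimal sketch of the textbook PageRank power iteration in Python. It follows the published formulation, not Google's production system; the damping factor of 0.85, the iteration count, and the three-page toy graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not Google's settings.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "a" and "b" both link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph, "c" ends up ranked highest because two pages link to it, and "b" ranks lowest because nothing links to it.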
However, the PageRank algorithm is just one part of the story. The Google Search Engine also relies on a sophisticated infrastructure of servers and data centers spread around the world. This infrastructure enables Google to crawl and index billions of web pages quickly and efficiently. Additionally, Google has developed a number of proprietary technologies to further improve the quality of its search results. These include technologies like Spell Check, SafeSearch, and Knowledge Graph.
The technology stack that powers the Google Search Engine is immensely complex, and includes a number of sophisticated algorithms, technologies, and infrastructure components. At the heart of the system is the PageRank algorithm, which ranks pages based on a number of factors, including the number and quality of links to the page. The algorithm is constantly being refined and updated in order to deliver more relevant and accurate results. In addition to PageRank, Google is said to use a number of other algorithms, such as latent semantic indexing, which helps to index and retrieve documents based on their meaning rather than their exact wording. The search engine also makes use of a massive infrastructure, which includes hundreds of thousands of servers around the world. While Google is the dominant player in the search engine market, there are a number of other well-established competitors, such as Microsoft’s Bing and DuckDuckGo.
The original Google algorithm was called PageRank, named after inventor Larry Page (though, fittingly, the algorithm does rank web pages).
After 17 years of work by many software engineers, researchers, and statisticians, Google search uses algorithms upon algorithms upon algorithms.
- The various components used by Google Search are all proprietary, but most of the code is written in C++.
- Google has published a number of technical explanations of how Search works, and these mark the limit of what can be shared publicly.
- Abseil (https://abseil.io) and GoogleTest (https://google.github.io/googletest/) are the main open source Google C++ libraries; both are used extensively in Search.
- Bazel (https://bazel.build) is another open source tool, a build system, that is heavily used across Google, including for Search.
- Google has general information on you, the kinds of things you might like, the sites you frequent, etc. When it fetches search results, they get ranked, and this personal info is used to adjust the rankings, resulting in different search results for each user.
How does Google’s indexing algorithm (so it can do things like fuzzy string matching) technically structure its index?
- There is no single technique that works.
- At a basic level, all search engines have something like an inverted index, so you can look up words and associated documents. There may also be a forward index.
- One way of constructing such an index is by stemming words. Stemming is done with an algorithm that boils words down to their basic root. The most famous stemming algorithm is the Porter stemmer.
- However, there are other approaches. One is to build n-grams, sequences of n letters, so that you can do partial matching. You often would choose multiple n’s, and thus have multiple indexes, since some n-letter combinations are common (e.g., “th”) for small n’s, but larger values of n undermine the intent.
- I don’t know that we can say “nothing absolute is known”. Look at misspellings: Google can resolve a lot of them. This isn’t surprising; we’ve had spellcheckers for at least 40 years. However, the less common a misspelling, the harder it is for Google to catch.
- One cool thing about Google is that they have been studying and collecting data on searches for more than 20 years. I don’t mean that they have been studying searching or search engines (although they have been), but that they have been studying how people search. They process several billion search queries each day. They have developed models of what people really want, which often isn’t what they say they want. That’s why they track every click you make on search results… well, that and the fact that they want to build effective models for ad placement.
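The n-gram approach from the list above can be sketched as follows. This is only an illustration of the general technique, not a description of Google's index: the choice of n = 3, the `$` padding markers, the Jaccard similarity measure, the 0.3 threshold, and the tiny vocabulary are all assumptions made for the example.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the set of character n-grams of word, padded with '$' markers."""
    padded = "$" + word.lower() + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_ngram_index(vocabulary, n=3):
    """Map each n-gram to the set of vocabulary words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


def fuzzy_candidates(query, index, n=3, min_similarity=0.3):
    """Rank vocabulary words by Jaccard similarity of their n-gram sets."""
    query_grams = char_ngrams(query, n)
    # Only words sharing at least one n-gram with the query are considered.
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())
    scored = []
    for word in candidates:
        word_grams = char_ngrams(word, n)
        similarity = len(query_grams & word_grams) / len(query_grams | word_grams)
        if similarity >= min_similarity:
            scored.append((similarity, word))
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [word for similarity, word in ranked]


# Hypothetical vocabulary and a misspelled query.
vocabulary = ["search", "engine", "google", "index", "ranking"]
ngram_index = build_ngram_index(vocabulary)
suggestions = fuzzy_candidates("gogle", ngram_index)
```

Here the misspelling "gogle" shares four of its five trigrams with "google" and none with the other words, so "google" is the only candidate returned.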
Each year, Google changes its search algorithm around 500–600 times. While most of these changes are minor, Google occasionally rolls out a “major” algorithmic update (such as Google Panda and Google Penguin) that affects search results in significant ways.
For search marketers, knowing the dates of these Google updates can help explain changes in rankings and organic website traffic and ultimately improve search engine optimization. Below, we’ve listed the major algorithmic changes that have had the biggest impact on search.
It took a starting page and added all the unique (if the word occurred more than once on the page, it was only counted once) words on the page to the index or incremented the index count if it was already in the index.
The page was indexed by the number of references the algorithm found to the specific page. So each time the system found a link to the page on a newly discovered page, the page count was incremented.
When you did a search, the system would identify all the pages with those words on it and show you the ones that had the most links to them.
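The early scheme described above can be sketched in a few lines of Python. This is a simplified reading of that description, not Google's code: the three example pages and their links are hypothetical, and the word index stores the set of pages per word rather than a running count.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> (page text, outgoing links).
pages = {
    "a.example": ("cheap flights and cheap hotels", ["b.example", "c.example"]),
    "b.example": ("flights to paris", ["c.example"]),
    "c.example": ("paris hotels and flights", []),
}

word_index = defaultdict(set)     # word -> pages containing it
inbound_links = defaultdict(int)  # page -> number of links pointing to it

for url, (text, outgoing) in pages.items():
    for word in set(text.split()):  # each word counts once per page
        word_index[word].add(url)
    for target in outgoing:
        inbound_links[target] += 1  # one more reference to the target page


def simple_search(query):
    """Return pages containing every query word, most-linked first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(word_index.get(word, set()) for word in words))
    return sorted(matches, key=lambda url: (-inbound_links[url], url))


results = simple_search("flights")
```

All three pages contain "flights", so the order is decided purely by inbound links: "c.example" (two links) comes first and "a.example" (none) comes last.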
As people searched and visited pages from the search results, Google would also track the pages that people would click to from the search page. Those that people clicked would also be identified as a better quality match for that set of search terms. If the person quickly came back to the search page and clicked another link, the match quality would be reduced.
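A minimal sketch of that feedback loop might look like the following. The scoring weights and the ten-second "quick return" threshold are invented for illustration; the text does not say how Google actually weighs these signals.

```python
from collections import defaultdict

QUICK_RETURN_SECONDS = 10  # hypothetical threshold for a "quick" bounce back

# (query, url) -> accumulated match-quality score
match_quality = defaultdict(float)


def record_click(query, url, seconds_before_return=None):
    """Update the match quality for a clicked result.

    seconds_before_return is None when the user did not come back to the
    results page; a small value means they bounced back quickly.
    """
    key = (query, url)
    match_quality[key] += 1.0  # a click counts as a positive signal
    bounced = (seconds_before_return is not None
               and seconds_before_return < QUICK_RETURN_SECONDS)
    if bounced:
        match_quality[key] -= 1.5  # assumed penalty: a quick return outweighs the click


# The first user stays on the page; the second returns after three seconds.
record_click("python tutorial", "docs.example", seconds_before_return=None)
record_click("python tutorial", "spam.example", seconds_before_return=3)
```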
Now, Google is using natural language processing, a method of trying to guess what the user really wants. From that, it finds similar words that might give a better set of results, based on searches done by millions of other people like you. It might assume that you really meant another word instead of the word you used in your search terms, or it might simply include matches for those other words alongside the words you provided.
It really all boils down to the fact that Google has been monitoring a lot of people doing searches for a very long time. It has a huge list of websites and search terms that have done the job for a lot of people.
There are a lot of proprietary algorithms, but the real magic is that they’ve been watching you and everyone else for a very long time.
What programming language powers Google’s search engine core?
C++, mostly. There are little bits in other languages, but the core of both the indexing system and the serving system is C++.
How does Google handle the technical aspect of fuzzy matching? How is the index implemented for that?
- With n-grams and word stemming, plus correction of misspelled words. N-grams allow partial matching on any substring.
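As a rough illustration of stemming, here is a deliberately crude suffix stripper in Python. It is not the Porter stemmer and not what Google uses; the suffix list and the three-letter minimum are assumptions chosen to keep the example short.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative list, far cruder than Porter's rules


def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [crude_stem(w) for w in ["searching", "searched", "searches", "search"]]
```

All four forms reduce to "search", so a query for any of them can hit the same index entry. A stemmer this simple also makes mistakes (it would turn "news" into "new"), which is why real systems use more careful rules.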
Use a ping service. Ping services can speed up your indexing process.
- Search Google for “pingmylinks”
- Click on the “add url” in the upper left corner.
- Submit your website and make sure to use all the submission tools and your site should be indexed within hours.
Our ranking algorithm simply doesn’t rank google.com highly for the query “search engine.” There is not a single, simple reason why this is the case. If I had to guess, I would say that people who type “search engine” into Google are usually looking for general information about search engines or about alternative search engines, and neither query is well-answered by listing google.com.
To be clear, we have never manually altered the search results for this (or any other) specific query.
The basic idea is using an inverted index. This means for each word keeping a list of documents on the web that contain it.
Responding to a query consists of retrieving the matching documents (basically done by intersecting the lists for the query words), processing the documents (extracting quality signals for each document/query pair), ranking the documents (using document quality signals like PageRank, query signals, and query/document signals), and then returning the top 10 documents.
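The retrieve, score, and rank steps can be sketched on a toy corpus as follows. The documents, the `quality` scores standing in for PageRank, and the scoring formula are all hypothetical; the real signals are not public.

```python
import heapq
from collections import defaultdict

# Hypothetical corpus: document id -> text.
documents = {
    1: "google search ranks web pages",
    2: "bing is a web search engine",
    3: "pagerank ranks pages by links",
}
# Hypothetical query-independent quality scores (stand-ins for PageRank).
quality = {1: 0.9, 2: 0.5, 3: 0.7}

# Build the inverted index: word -> set of document ids containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        inverted_index[word].add(doc_id)


def answer_query(query, top_k=10):
    """Retrieve, score, and rank documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    # Retrieval: intersect the posting lists of all query words.
    postings = [inverted_index.get(word, set()) for word in words]
    matching = set.intersection(*postings)

    # Scoring: combine a document signal with a simple query/document signal.
    def score(doc_id):
        doc_words = documents[doc_id].split()
        term_hits = sum(doc_words.count(word) for word in words)
        return quality[doc_id] + 0.1 * term_hits

    # Ranking: keep only the top_k highest-scoring documents.
    return heapq.nlargest(top_k, matching, key=score)


top_documents = answer_query("web search")
```

Documents 1 and 2 both contain "web" and "search"; document 1 is returned first because of its higher quality score.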
Here are some tricks for doing the retrieval part efficiently:
– distribute the whole thing over thousands and thousands of machines
– do it in memory
– look first at the query word with the shortest document list
– keep the documents in each list in descending PageRank order so that we can stop early once we find enough good quality matches
– keep lists for pairs of words that occur frequently together
– shard by document id, this way the load is somewhat evenly distributed and the intersection is done in parallel
– compress messages that are sent across the network
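A few of these tricks (sharding by document id, starting from the shortest list, and stopping early on quality-ordered lists) can be combined in a small sketch. The two shards, the documents, and their scores are hypothetical, and the shards are queried one after another here, whereas a real system would query them in parallel on separate machines.

```python
import heapq
from collections import defaultdict

NUM_SHARDS = 2  # hypothetical; the text above mentions thousands of machines

# Hypothetical documents: id -> (quality score, words in the document).
docs = {
    1: (0.9, {"web", "search"}),
    2: (0.4, {"web", "search", "engine"}),
    3: (0.8, {"web", "crawler"}),
    4: (0.7, {"web", "search", "index"}),
}

# One inverted index per shard, sharded by document id.
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
for doc_id, (score, words) in docs.items():
    for word in words:
        shards[doc_id % NUM_SHARDS][word].append((score, doc_id))
for shard in shards:
    for posting_list in shard.values():
        posting_list.sort(reverse=True)  # best documents first


def search_shard(shard, words, limit):
    """Intersect posting lists on one shard, stopping after `limit` matches."""
    lists = [shard.get(word, []) for word in words]
    lists.sort(key=len)  # start from the shortest list
    shortest, rest = lists[0], [set(lst) for lst in lists[1:]]
    found = []
    for entry in shortest:  # already in descending quality order
        if all(entry in other for other in rest):
            found.append(entry)
            if len(found) == limit:  # early stop: later entries score lower
                break
    return found


def search(words, limit=2):
    """Query every shard, then merge the per-shard results."""
    if not words:
        return []
    partial = [hit for shard in shards for hit in search_shard(shard, words, limit)]
    return [doc_id for score, doc_id in heapq.nlargest(limit, partial)]


best = search(["web", "search"])
```

Because each shard returns at most `limit` entries, the merge step only has to compare a handful of candidates, however large the individual shards are.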
Jeff Dean in this great talk explains quite a few bits of the internal Google infrastructure. He mentions a few of the previous ideas in the talk.
He goes through the evolution of the Google Search Serving Design and through MapReduce while giving general advice about building large scale systems.
As for complexity, it’s pretty hard to analyze because of all the moving parts, but Jeff mentions that the latency per query is about 0.2 s and that each query touches on average 1000 computers.
If Lemoine’s claims are true, it would be a milestone in the history of humankind and technological development.
Google strongly denies LaMDA has any sentient capacity.
Fun facts about Google Search Engine Competitors
Data Source: statcounterGS
Tools Used: Excel & PowerPoint
Edit: Note that the data for Baidu/China is likely higher. How statcounterGS collects the data might understate # users from China.
Baidu is popular in China, Yandex is popular in Russia.
Yandex is great for reverse image searches; Google just can’t compete with Yandex in that category.
Normal Google reverse search is a joke (except for finding a bigger version of a pic, it’s good for that), but Google Lens can be as good or sometimes better at finding similar images or locations than Yandex depending on the image type. Always good to try both, and also Bing can be decent sometimes.
Bing has been profitable since 2015 even with less than 3% of the market share. So just imagine how much money Google is taking in.
Firstly: Yahoo, DuckDuckGo, Ecosia, etc. all use Bing to get their search results. Which means Bing’s usage is more than the 3% indicated.
Secondly: This graph shows overall market share (phones and PCs). But search engines make most of their money on desktop searches, due to more screen space for ads. And Bing’s market share on desktop is WAY bigger; its market share on phones is ~0%. Its American desktop market share is 10-15%. That is where the money is.
What you are saying is in fact true though. We make trillions of web searches – which means even three percent market-share equals billions of hits and a ton of money.
I like DuckDuckGo, and they have good privacy features. I just wish their maps were better, because if I’m searching for a local restaurant, nothing is easier than Google for moving from the search to the map to the company’s webpage. But for informative searches, I think it gives a more objective, less curated return.
Use Ecosia and profits go to reforestation efforts!
Turns out people don’t care about their privacy, especially if it gets them results.
I recently switched to using the Brave browser and DuckDuckGo, and I basically can’t tell the difference from using Google and Chrome.
The only times I’ve needed to use Google are for really specific searches where DuckDuckGo doesn’t always seem to give the expected results. But for daily browsing it’s absolutely fine and far, far better for privacy.
Does Google Search have the most complex functionality hiding behind a simple looking UI?
There is a lot that happens between the moment a user types something in the input field and when they get their results.
Google Search has a high-level overview, but the gist of it is that there are dozens of sub systems involved and they all work extremely fast. The general idea is that search is going to process the query, try to understand what the user wants to know/accomplish, rank these possibilities, prepare a results page that reflects this and render it on the user’s device.
I would not describe the UI as simple. Yes, the initial state looks like a single input field on an otherwise empty page. But there is already a lot going on in that input field and how it’s presented to the user. And then, as soon as the user interacts with the field, for instance as they start typing, a ton of other things happen: Search is able to pre-populate suggested queries really fast. Plus there’s a whole “syntax” to search, with operators and whatnot, and there are many different modes (image, news, etc.).
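To give a feel for how suggestions can be produced quickly, here is a minimal prefix-lookup sketch over a sorted list of past queries. It is an assumption-laden toy: the query log and its counts are invented, and a production system would use far more elaborate data structures and signals.

```python
import bisect

# Hypothetical query log: past query -> how often it was issued.
query_counts = {
    "google lens": 40,
    "google maps": 90,
    "google search operators": 25,
    "gopher": 5,
    "weather": 70,
}
sorted_queries = sorted(query_counts)


def suggest(prefix, limit=3):
    """Return up to `limit` past queries starting with prefix, most popular first."""
    prefix = prefix.lower()
    # Binary search finds the first query that could start with the prefix.
    start = bisect.bisect_left(sorted_queries, prefix)
    matches = []
    for query in sorted_queries[start:]:
        if not query.startswith(prefix):
            break  # sorted order: no later query can match
        matches.append(query)
    matches.sort(key=lambda q: (-query_counts[q], q))
    return matches[:limit]


completions = suggest("goo")
```

Because the queries are kept sorted, the lookup only scans the entries that actually share the prefix instead of the whole log.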
One recent iteration of Google Search is Google Lens. Its interface is even simpler than the single input field: just take a picture with your phone! But under the hood a lot is going on.
The Google search engine is a remarkable feat of engineering, and its capabilities are only made possible by the use of cutting-edge technology. At the heart of the Google search engine is the PageRank algorithm, which is used to rank web pages in order of importance. This algorithm takes into account a variety of factors, including the number and quality of links to a given page. In order to effectively crawl and index the billions of web pages on the internet, Google has developed a sophisticated infrastructure that includes tens of thousands of servers located around the world. This infrastructure enables Google to rapidly process search queries and deliver relevant results to users in a matter of seconds. While Google is the dominant player in the search engine market, there are a number of other search engines that compete for users, including Bing and DuckDuckGo. However, none of these competitors has been able to replicate the success of Google, due in large part to the company’s unrivaled technological capabilities.