How does a database handle pagination?

It doesn’t. First, a database is a collection of related data, so I assume you mean DBMS or database language.

Second, pagination is generally a function of the front-end and/or middleware, not the database layer.

But some database languages provide helpful facilities that aid in implementing pagination. For example, many SQL dialects provide LIMIT and OFFSET clauses that can be used to emit up to n rows starting at a given row number, i.e., a “page” of rows. If the query results are sorted via ORDER BY and are generally unchanged between successive invocations, then that can be used to implement pagination.
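
For instance, here is a minimal sketch of that idea, assuming a hypothetical users table with an indexed id column (MySQL-style syntax):

    -- Page 3 with 25 rows per page: skip 2 x 25 rows, return the next 25.
    -- A stable ORDER BY on an indexed column keeps successive pages consistent.
    SELECT id, name, email
    FROM users
    ORDER BY id
    LIMIT 25 OFFSET 50;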

That may not be the most efficient or effective implementation, though.


So how do you propose pagination should be done?

In the context of web apps, let’s say there are 100 million users. One cannot dump all the users in a single response.

Cache database query results in the middleware layer using Redis or similar and serve out pages of rows from that.

What if you have 30,000-plus rows? Do you fetch all of that from the database and cache it in Redis?

I feel the most efficient solution is still OFFSET and LIMIT. It doesn’t make sense to use a database and then end up putting all of your data in Redis, especially data that changes a lot. Redis is not for storing all of your data.

If you have a large data set, you should use OFFSET and LIMIT; getting only what is needed from the database into main memory (and maybe caching those pages in Redis) at any point in time is very efficient.

With 30,000 rows in a table, if offset/limit is the only viable or appropriate restriction, then that’s sometimes the way to go.

More often, there’s a much better way of restricting 30,000 rows via some search criteria that significantly reduces the displayed volume of rows — ideally to a single page or a few pages (which are appropriate to cache in Redis.)
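
One way to sketch that, assuming a hypothetical users table with a country column: combine a search filter with keyset (seek) pagination, so each page starts from the last key seen rather than an ever-growing OFFSET:

    -- Filter first, then page from the last key seen on the previous page.
    SELECT id, name, email
    FROM users
    WHERE country = 'CA'      -- search criterion that cuts 30,000 rows down
      AND id > 1025           -- last id displayed on the previous page
    ORDER BY id
    LIMIT 25;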

It’s unlikely (though it does happen) that users really want to casually browse 30,000 rows, page by page. More often, they want this one record, or this small set of records.

 

Question: This is a general question that applies to MySQL, Oracle DB or whatever else might be out there.

I know for MySQL there is LIMIT offset,size; and for Oracle there is ‘ROW_NUMBER’ or something like that.
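
For reference, the Oracle pattern alluded to here is usually written with ROW_NUMBER() in a subquery; a sketch with a made-up table name:

    -- Rows 51-75 of an ordered result on Oracle (pre-12c style).
    SELECT *
    FROM (
      SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.id) AS rn
      FROM some_table t
    )
    WHERE rn BETWEEN 51 AND 75;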

But when such ‘paginated’ queries are called back to back, does the database engine actually do the entire ‘select’ all over again and then retrieve a different subset of results each time? Or does it do the overall fetching of results only once, keeps the results in memory or something, and then serves subsets of results from it for subsequent queries based on offset and size?

If it does the full fetch every time, then it seems quite inefficient.

If it does full fetch only once, it must be ‘storing’ the query somewhere somehow, so that the next time that query comes in, it knows that it has already fetched all the data and just needs to extract next page from it. In that case, how will the database engine handle multiple threads? Two threads executing the same query?


Answer: First of all, do not make assumptions in advance about whether something will be quick or slow without taking measurements, and do not complicate the code up front by downloading 12 pages at once and caching them because “it seems to me that it will be faster”.

YAGNI principle – the programmer should not add functionality until it is deemed necessary.
Do it in the simplest way (ordinary pagination, one page at a time), measure how it works in production; if it is slow, try a different method; if the speed is satisfactory, leave it as it is.


From my own practice: an application retrieves data from a table containing about 80,000 records; the main table is joined with 4-5 additional lookup tables, the whole query is paginated at about 25-30 records per page, about 2,500-3,000 pages in total. The database is Oracle 12c, there are indexes on a few columns, and the queries are generated by Hibernate. Measurements on the production system at the server side show that the average time (median, i.e. the 50th percentile) to retrieve one page is about 300 ms. The 95th percentile is less than 800 ms, meaning 95% of requests for a single page take less than 800 ms; when we add the transfer time from the server to the user and a rendering time of about 0.5-1 seconds, the total time is less than 2 seconds. That is enough; users are happy.
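
As a rough sketch of what such a paginated, joined query boils down to on Oracle 12c (table and column names are invented for illustration), using the 12c OFFSET/FETCH syntax:

    -- One page (rows 51-75) of a joined, ordered result on Oracle 12c.
    SELECT o.id, o.created_at, c.name AS customer, s.label AS status
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN statuses s ON s.id = o.status_id
    ORDER BY o.created_at DESC, o.id
    OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;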


And some theory: see this answer to learn what the purpose of the Pagination pattern is.


Budget to start a web app built on the MEAN stack

I want to start a web app built on the MEAN stack (MongoDB, Express.js, Angular, and Node.js). How much would it cost me to host this site? What resources are there for hosting websites built on the MEAN stack?

I went through the same questions and concerns and I actually tried a couple of different cloud providers for similar environments and machines.


  1. At Digital Ocean, you can get a fully loaded machine to develop and host at $5 per month (512 MB RAM, 20 GB disk). You can even get a $10 credit by using this link of mine.[1] It is very easy to sign up and start. Just don’t use their web console to connect to your host; it is slow. I recommend using an SSH client to connect, and it is very fast.
  2. GoDaddy will charge you around $8 per month for a similar MEAN stack host (512 MB RAM, 1-core processor, 20 GB disk) for your MEAN stack development.
  3. Azure uses Bitnami’s MEAN stack on a minimum DS1_v2 machine (1 core, 3.5 GB RAM), and your average cost will be $52 per month if you never shut down the machine. The setup is a little more complicated than Digital Ocean’s, but very doable. I also recommend SSH to connect to the server and develop.
  4. AWS also offers Bitnami’s MEAN stack on EC2 instances similar to the Azure DS1_v2 described above, and it is around $55 per month.
  5. Other suggestions

All those solutions will work fine and it all depends on your budget. If you are cheap like me and don’t have a big budget, go with Digital Ocean and start with $10 off with this code.

Basic Gotcha Linux Questions for IT DevOps and SysAdmin Interviews

Some IT DevOps, SysAdmin, and Developer positions require knowledge of the basics of the Linux operating system. Most of the time we know the answers but forget them when we don’t practice often. This refresher will help you prepare for the Linux portion of your IT interview by answering some gotcha Linux questions for IT DevOps and SysAdmin interviews.

Get a $10 credit to have your own linux server for your MEAN STACK development and more. It is only $5 per month for a fully loaded Ubuntu machine.


I- Networking:

  1. How many bytes are there in a MAC address?
    6 bytes (48 bits).
    A MAC (Media Access Control) address is a globally unique identifier assigned to a network interface, and is therefore often referred to as a hardware or physical address. MAC addresses are 6 bytes (48 bits) long and are written in the MM:MM:MM:SS:SS:SS format.
  2. What are the different parts of a TCP packet?
    The term TCP packet appears in both informal and formal usage, whereas in more precise terminology segment refers to the TCP protocol data unit (PDU), datagram to the IP PDU, and frame to the data link layer PDU: … A TCP segment consists of a segment header and a data section.
  3. Which command is used to initialize an interface, assign an IP address, etc.?
    ifconfig (interface configuration). The equivalent command on Windows/DOS is ipconfig.
    Other useful networking commands are: ping, traceroute, netstat, dig, nslookup, route, lsof
  4. What’s the difference between TCP and UDP; Between DNS TCP and UDP?
    There are two types of Internet Protocol (IP) traffic: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). TCP is connection-oriented: once a connection is established, data can be sent bidirectionally. UDP is a simpler, connectionless Internet protocol.
    DNS queries normally use UDP port 53, but can also use TCP port 53 if UDP is not accepted.
    DNS uses TCP for zone transfers over port 53.
    DNS uses UDP for DNS queries over port 53.

  5. What are the default ports used by http, telnet, ftp, smtp, dns, dhcp, snmp, ssh, and squid?
    All of those services are part of the application layer of the TCP/IP stack.
    http => 80
    telnet => 23
    ftp => 20 (data transfer), 21 (control connection)
    smtp => 25
    dns => 53
    snmp => 161
    dhcp => 67 (server), 68 (Client)
    ssh => 22
    squid => 3128
  6. How many hosts are available in a subnet (Class B and C networks)?
    A Class B (/16) network has 2^16 - 2 = 65,534 usable host addresses; a Class C (/24) network has 2^8 - 2 = 254 (the network and broadcast addresses are reserved).
  7. How does DNS work?
    When you enter a URL into your web browser, your DNS server resolves the host name into the IP address of the appropriate web server.
  8. What is the difference between class A, class B and class C IP addresses?
    Class A networks (/8 prefixes)
    An 8-bit network prefix. IP addresses range from 0.0.0.0 to 127.255.255.255.
    Class B networks (/16 prefixes)
    A 16-bit network prefix. IP addresses range from 128.0.0.0 to 191.255.255.255.
    Class C networks (/24 prefixes)
    A 24-bit network prefix. IP addresses range from 192.0.0.0 to 223.255.255.255.
  9. Difference between OSPF and BGP?
    Generally speaking, OSPF and BGP are routing protocols for two different things. OSPF is an IGP (Interior Gateway Protocol) and is used internally within a company’s network to provide routing. BGP is an EGP (Exterior Gateway Protocol) used to exchange routes between autonomous systems, which is why BGP is far more scalable than OSPF.

II- Operating System

  1. How to find the Operating System version?
    $uname -a
    To check the distribution, e.g. for Red Hat: $ cat /etc/redhat-release
  2. How to list all running processes?
    top
    To list Java processes: ps -ef | grep java
    To list processes listening on a specific port:
    lsof -i :80
    (on Windows: netstat -aon | findstr :port_number)
  3. How to check disk space?
    df shows the amount of disk space used and available.
    du displays the amount of disk space used by the specified files and by each subdirectory.
    To drill down and find out which file is filling up a drive: du -ks /drive_name/* | sort -nr | head
  4. How to check memory usage?
    free or cat /proc/meminfo
  5. What is the load average?
    It is the average of the number of processes waiting in the run queue plus the number of processes currently executing, measured over the last 1, 5 and 15 minutes. Use top (or uptime) to find the load average.
  6. What is a load balancer?
    A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancers are used to increase capacity (concurrent users) and reliability of applications.
  7. What is the Linux Kernel?
    The Linux Kernel is a low-level systems software whose main role is to manage hardware resources for the user. It is also used to provide an interface for user-level interaction.
  8. What is the default kill signal?
    There are many different signals that can be sent (see signal for a full list), although the signals in which users are generally most interested are SIGTERM (“terminate”) and SIGKILL (“kill”). The default signal sent is SIGTERM.
    kill 1234
    kill -s TERM 1234
    kill -TERM 1234
    kill -15 1234
  9. Describe Linux boot process
    BIOS => MBR => GRUB => KERNEL => INIT => RUN LEVEL
    As power comes up, the BIOS (Basic Input/Output System) is given control and executes MBR (Master Boot Record). The MBR executes GRUB (Grand Unified Boot Loader). GRUB executes Kernel. Kernel executes /sbin/init. Init executes run level programs. Run level programs are executed from /etc/rc.d/rc*.d
    Mac OS X Boot Process:

    Boot ROM: firmware, part of the hardware; the BootROM firmware is activated.
    POST: Power-On Self Test; initializes some hardware interfaces and verifies that sufficient memory is available and in a good state.
    EFI: Extensible Firmware Interface; does basic hardware initialization and selects which operating system to use.
    BootX (boot.efi): the boot loader; loads the kernel environment.
    Kernel: the boot loader starts the kernel’s initialization procedure; various Mach/BSD data structures are initialized by the kernel; the I/O Kit is initialized; the kernel starts /sbin/mach_init.
    Run level: mach_init starts /sbin/init; init determines the runlevel and runs /etc/rc.boot, which sets up the machine enough to run single-user; rc.boot figures out the type of boot (Multi-User, Safe, CD-ROM, Network, etc.).
  10. List services enabled at a particular run level
    chkconfig --list | grep 5:on
    Enable or disable a service at a specific run level: chkconfig --level 5 <service> on|off
  11. How do you stop a bash fork bomb?
    Limit how many processes a user can spawn beforehand by editing /etc/security/limits.conf:
    root hard nproc 512
    A fork bomb looks like this:
    :(){ :|:& };:
    Assuming you still have access to a shell, stop and then kill the offending user’s processes:
    kill -STOP <pid>
    killall -STOP -u user1
    killall -KILL -u user1
  12. What is a fork?
    fork is an operation whereby a process creates a copy of itself. It is usually a system call, implemented in the kernel. Fork is the primary (and historically, only) method of process creation on Unix-like operating systems.
  13. What is the D state?
    The D state code means that the process is in uninterruptible sleep, which may mean different things but usually indicates that it is waiting on I/O.

III- File System

  1. What is umask?
    umask is “User File Creation Mask”, which determines the settings of a mask that controls which file permissions are set for files and directories when they are created.
  2. What is the role of the swap space?
    A swap space is a certain amount of space used by Linux to temporarily hold some programs that are running concurrently. This happens when RAM does not have enough memory to hold all programs that are executing.
  • What is the null device in Linux?
    The null device (/dev/null) is typically used for disposing of unwanted output streams of a process, or as a convenient empty file for input streams; this is usually done by redirection. /dev/null is a special file, not a directory, so one cannot move a whole file or directory into it with the Unix mv command. You might receive the “Bad file descriptor” error message if /dev/null has been deleted or overwritten; you can infer this cause when the file system is reported as read-only at boot time through error messages such as “/dev/null: Read-only filesystem” and “dup2: bad file descriptor”.
    In Unix and related operating systems, a file descriptor (FD, less frequently fildes) is an abstract indicator (handle) used to access a file or other input/output resource, such as a pipe or network socket.
  • What is an inode?
    The inode is a data structure in a Unix-style file system that describes a filesystem object such as a file or a directory. Each inode stores the attributes and disk block location(s) of the object’s data.

IV- Databases

  1. What is the difference between a document store and a relational database?
    In a relational database system you must define a schema before adding records to the database. The schema is the structure, described in a formal language supported by the database, that provides a blueprint for the tables in the database and the relationships between tables of data. Within a table, you need to define constraints in terms of rows and named columns as well as the type of data that can be stored in each column.
    In contrast, a document-oriented database contains documents, which are records that describe the data in the document as well as the actual data. Documents can be as complex as you choose; you can use nested data to provide additional sub-categories of information about your object. You can also use one or more documents to represent a real-world object.
  2. How to optimise a slow DB? (a sketch of the first two points follows this list)
    • Rewrite the queries
    • Change indexing strategy
    • Change schema
    • Use an external cache
    • Server tuning and beyond
  3. How would you build a 1 Petabyte storage with commodity hardware?
    Use JBODs with large-capacity disks, running Linux in a distributed storage system, stacking nodes until 1 PB is reached.
    JBOD (“just a bunch of disks”) generally refers to a collection of hard disks that have not been configured to act as a redundant array of independent disks (RAID).
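
A minimal sketch of the first two bullets in question 2 above, assuming a hypothetical orders table and MySQL-style syntax:

    -- Inspect the query plan to see whether an index is used.
    EXPLAIN SELECT id, total FROM orders
    WHERE customer_id = 42 AND created_at >= '2024-01-01';

    -- Add a composite index matching the WHERE clause, then re-check the plan.
    CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at);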

V- Scripting

  1. What is @INC in Perl?
    The @INC Array. @INC is a special Perl variable that is the equivalent to the shell’s PATH variable. Whereas PATH contains a list of directories to search for executables, @INC contains a list of directories from which Perl modules and libraries can be loaded.
  2. Strings comparison – operator – for loop – if statement
  3. Sort access log file by HTTP response codes
    Via shell using Linux commands:
    cat sample_log.log | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -rn
  4. Sort access log file by HTTP response codes using awk
    awk '{print $9}' sample_log.log | sort | uniq -c | sort -rn
  5. Find broken links from an access log file
    awk '($9 ~ /404/)' sample_log.log | awk '{print $7}' | sort | uniq -c | sort -rn
  6. Most requested page:
    awk -F\" '{print $2}' sample_log.log | awk '{print $2}' | sort | uniq -c | sort -r
  7. Count all occurrences of a word in a file
    grep -o "user" sample_log.log | wc -w

Learn more at http://career.guru99.com/top-50-linux-interview-questions/


Install and run your first noSQL MongoDB on Mac OSX

Classified as a NoSQL database, MongoDB is an open source, document-oriented database designed with both scalability and developer agility in mind. Instead of storing your data in tables and rows as you would with a relational database, in MongoDB you store JSON-like documents with dynamic schemas; this makes the integration of data in certain types of applications easier and faster.
Why?
MongoDB can help you make a difference to the business. Tens of thousands of organizations, from startups to the largest companies and government agencies, choose MongoDB because it lets them build applications that weren’t possible before. With MongoDB, these organizations move faster than they could with relational databases at one tenth of the cost. With MongoDB, you can do things you could never do before.

    1. Install Homebrew
      $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
      Homebrew installs the stuff you need that Apple didn’t.
      $ brew install wget
    2. Install MongoDB
      $ brew install mongodb
    3. Run MongoDB
      Create the data directory: $ mkdir -p /data/db
      Set permissions for the data directory:$ chown -R you:yourgroup /data/db then chmod -R 775 /data/db
      Run MongoDB (as non root): $ mongod
    4. Begin using MongoDB. (MongoDB will be running as soon as you ran mongod above.) Open another terminal and run: mongo


References: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/

