How does a database handle pagination?

It doesn’t. First, a database is a collection of related data, so I assume you mean a DBMS or a database language.

Second, pagination is generally a function of the front-end and/or middleware, not the database layer.

But some database languages provide helpful facilities that aid in implementing pagination. For example, many SQL dialects provide LIMIT and OFFSET clauses that can be used to emit up to n rows starting at a given row number; that is, a “page” of rows. If the query results are sorted via ORDER BY and are generally unchanged between successive invocations, then that can be used to implement pagination.
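
For instance, a minimal sketch in a MySQL/PostgreSQL-flavored dialect, assuming a hypothetical users table and a page size of 25, fetching page 3:

    -- Page 3 at 25 rows per page: skip the first 50 rows, return the next 25.
    -- The ORDER BY matters: without a stable sort order, successive page
    -- requests can return overlapping or missing rows.
    SELECT id, name, created_at
    FROM users
    ORDER BY id
    LIMIT 25 OFFSET 50;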

That may not be the most efficient or effective implementation, though.
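
A frequently recommended alternative is keyset (“seek”) pagination, which remembers where the previous page ended instead of counting rows from the top. A sketch under the same assumptions (a hypothetical users table with an indexed id column):

    -- Keyset pagination: if the previous page ended at id = 1075, an index
    -- on id lets the engine seek directly to the next page instead of
    -- scanning and discarding the first 50 rows the way OFFSET 50 does.
    SELECT id, name, created_at
    FROM users
    WHERE id > 1075
    ORDER BY id
    LIMIT 25;

The trade-off is that keyset pagination naturally supports only next/previous navigation, not jumping to an arbitrary page number.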


So how do you propose pagination should be done?

In the context of web apps, let’s say there are 100 million users. One cannot dump all the users in a single response.

Cache database query results in the middleware layer using Redis or similar and serve out pages of rows from that.

What if you have 30,000-plus rows? Do you fetch all of that from the database and cache it in Redis?

I feel the most efficient solution is still offset and limit. It doesn’t make sense to use a database and then end up putting all of your data in Redis, especially data that changes a lot. Redis is not for storing all of your data.


If you have a large data set, you should use offset and limit. Getting only what is needed from the database into main memory at any point in time (and maybe caching those pages in Redis) is very efficient.

With 30,000 rows in a table, if offset/limit is the only viable or appropriate restriction, then that’s sometimes the way to go.

More often, there’s a much better way of restricting 30,000 rows via some search criteria that significantly reduces the displayed volume of rows, ideally to a single page or a few pages (which are appropriate to cache in Redis).

It’s unlikely (though it does happen) that users really want to casually browse 30,000 rows, page by page. More often, they want this one record, or this small set of records.
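
As a sketch of that idea (column names are purely illustrative):

    -- Narrow the result set with search criteria first, then paginate
    -- whatever is left; ideally the filter reduces it to a page or two.
    SELECT id, name, created_at
    FROM users
    WHERE last_name LIKE 'Smi%'
      AND status = 'active'
    ORDER BY last_name, id
    LIMIT 25;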


Question: This is a general question that applies to MySQL, Oracle DB or whatever else might be out there.

I know for MySQL there is LIMIT offset,size; and for Oracle there is ‘ROW_NUMBER’ or something like that.
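
For reference, the same page expressed in both dialects, assuming a hypothetical orders table (since 12c, Oracle also supports the ANSI OFFSET … FETCH syntax, which largely replaces ROW_NUMBER-based pagination):

    -- MySQL: LIMIT offset, size
    SELECT * FROM orders ORDER BY id LIMIT 50, 25;

    -- Oracle 12c+ (ANSI syntax)
    SELECT * FROM orders ORDER BY id
    OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;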

But when such ‘paginated’ queries are called back to back, does the database engine actually do the entire ‘select’ all over again and then retrieve a different subset of the results each time? Or does it fetch the overall results only once, keep them in memory or something, and then serve subsets of them for subsequent queries based on offset and size?

If it does the full fetch every time, then it seems quite inefficient.

If it does the full fetch only once, it must be ‘storing’ the query somewhere somehow, so that the next time that query comes in, it knows it has already fetched all the data and just needs to extract the next page from it. In that case, how will the database engine handle multiple threads? Two threads executing the same query?

Answer: First of all, do not assume in advance whether something will be quick or slow without taking measurements, and do not complicate the code up front by downloading 12 pages at once and caching them because “it seems to me that it will be faster”.

YAGNI principle: the programmer should not add functionality until it is deemed necessary.
Do it in the simplest way (ordinary pagination, one page at a time), measure how it works in production; if it is slow, try a different method; if the speed is satisfactory, leave it as it is.


From my own practice: an application retrieves data from a table containing about 80,000 records, with the main table joined to 4-5 additional lookup tables. The whole query is paginated at about 25-30 records per page, roughly 2,500-3,000 pages in total. The database is Oracle 12c, there are indexes on a few columns, and the queries are generated by Hibernate. Measurements on the production system, at the server side, show that the average (median, i.e. 50th percentile) time to retrieve one page is about 300 ms, and the 95th percentile is under 800 ms; in other words, 95% of requests for a single page take less than 800 ms. When we add the transfer time from the server to the user and a rendering time of about 0.5-1 seconds, the total time is under 2 seconds. That’s enough; users are happy.


And some theory – see this answer to understand the purpose of the Pagination pattern.
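
As a side note on the original question: in general, each LIMIT/OFFSET query is planned and executed afresh (possibly with a cached execution plan), which is why large offsets get progressively slower. The closest built-in mechanism to “fetch once, then read in chunks” is an explicit server-side cursor. A PostgreSQL-flavored sketch, assuming a hypothetical users table:

    -- A cursor keeps the query open inside a transaction and hands back one
    -- "page" per FETCH. Each session gets its own cursor, which is also how
    -- two threads running the same query are kept apart.
    BEGIN;
    DECLARE page_cur CURSOR FOR
        SELECT id, name FROM users ORDER BY id;
    FETCH 25 FROM page_cur;   -- page 1
    FETCH 25 FROM page_cur;   -- page 2
    CLOSE page_cur;
    COMMIT;

Holding a cursor (and its transaction) open between HTTP requests is rarely practical for web pagination, though, which is why stateless LIMIT/OFFSET or keyset queries remain the norm.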

  • Last Updated Players — 27/02/2024
    by Tudo pelo Futebol (Database on Medium) on February 27, 2024 at 8:14 pm

  • Last Updated Players — 27/02/2024
    by Todo por el Fútbol (Database on Medium) on February 27, 2024 at 8:14 pm

  • Last Updated Players — 02/27/2024
    by Everything for Football (Database on Medium) on February 27, 2024 at 8:14 pm

  • Exploring the Frontiers of Database Innovation
    by Faith Anthony-okoh (Database on Medium) on February 27, 2024 at 6:26 pm

  • How to choose the Right OLAP Storage
    by Aleh Belausau (Database on Medium) on February 27, 2024 at 5:43 pm

    In the realm of OLAP storage solutions, organizations face the crucial task of selecting a platform that aligns seamlessly with their…

  • Most Viewed Players of the Day — 27/02/2024
    by Tudo pelo Futebol (Database on Medium) on February 27, 2024 at 5:20 pm

  • Most Viewed Players of the Day — 27/02/2024
    by Todo por el Fútbol (Database on Medium) on February 27, 2024 at 5:20 pm

  • Most Viewed Players of the Day — 02/27/2024
    by Everything for Football (Database on Medium) on February 27, 2024 at 5:20 pm

  • Working with Multiple Databases: A SQL Learning Series Guide. (15)
    by Tamanna shaikh (Database on Medium) on February 27, 2024 at 4:56 pm

    In this post, we will learn how to work with multiple databases using SQL. This is a useful skill for data analysts, DBAs, developers, and…

  • Beyond Basic Queries: SQL Inception, the Power of Subqueries
    by B. Krouch (Database on Medium) on February 27, 2024 at 4:53 pm

    Throughout my SQL learning journey, becoming familiar with concepts such as data selection, record filtering, aggregation functions…

  • My select queries are taking 2 minutes long. HELP!
    by /u/Alexander_Chneerov (Database) on February 27, 2024 at 7:21 am

    I am using MySQL and have a database with one table with one attribute of VARCHAR(1024). Within this table I have 700 million domain names. When I try to see whether one exists with a query such as SELECT * FROM com_domains WHERE domain_name = “example.com” LIMIT 1, it takes over a minute to search ("1 row in set (1 min 6.70 sec)"). Is it possible to improve this?

  • Are there any distributed databases out there other than Aurora that uses witness replicas?
    by /u/the123saurav (Database) on February 27, 2024 at 5:31 am

    Was reading the AWS Aurora paper, where they mention the notion of "full" and "tail" segments for a partition and how it aids in reducing tail latency while still giving high-availability guarantees. Does anyone know of any open source database that does the same?

  • Are there databases that can emit exactly which tables / rows changed during a transaction?
    by /u/T_N1ck (Database) on February 26, 2024 at 10:01 pm

    Hello all, I hope this is the right place to ask this. In my day-to-day work, I use the pattern of previewing data changes quite a bit before applying them. I do this by splitting database changes into a plan step and an execute step. The planning has all the logic (what rows to add, update, and delete) and the execute step is rather simple. It's quite some work to do this, and I'm wondering: since most databases already have transactions, where one can apply changes and then either commit or roll back, why can none of the popular ones (e.g. Postgres or MySQL) also emit a data structure that lists exactly what changes were done to which tables and rows? The information should already be there. If you need more details on what exactly I mean, I wrote about it here, asking the same question at the end, without an answer yet.

  • DB/Distributed Systems Podcasts
    by /u/sfskiteam (Database) on February 26, 2024 at 5:37 pm

    What are the best / most popular DB and Distributed Systems podcasts? Looking to compile some resources to help some new grads on my team get more context on how the rest of the industry operates.

  • Best practices for database benchmarking
    by /u/wheeler1432 (Database) on February 26, 2024 at 2:17 pm

  • CMS vs Database
    by /u/Averroes2 (Database) on February 26, 2024 at 2:21 am

    Can you make a CMS into a database or make a database into a CMS?

  • Unique constraint error in Oracle
    by /u/OreoWaffle96 (Database) on February 25, 2024 at 9:59 am

    I tried to create a PL/SQL program to insert values into a table 3 times. It gave me the unique constraint error (I've attached the pictures above). I searched everywhere but couldn't find a solution. I believe the unique index is due to the primary key; though I'm using the primary key in the program and I'm not inserting any duplicate values, the error still arises. Please help me with this.

  • Database Selection
    by /u/sdas99 (Database) on February 25, 2024 at 3:39 am

    I'm developing a fairly simple program and am running into scaling constraints with my current approach: 1) data is continuously saved to a .csv (500k rows/day with 4 columns; ~25 MB/day); 2) the .csv is manually uploaded to a server; 3) an index.html webpage queries the .csv to create graphs. The problem is the .csv file is getting way too big as the dataset grows. I'm hoping to find a way to scale in the least complicated way possible and was curious if folks had advice on switching to MySQL, PostgreSQL, or something else.

  • Managing encrypted DB push notifications with only IDs
    by /u/CoroBuddy (Database) on February 24, 2024 at 7:46 am

    Hi everybody, in our company we have the following problem. We are switching to an encryption system that stores push notifications encrypted in our DB until they become valid for transfer to the clients. Our security team recommended having all the push notifications encrypted in our DB and organizing them only via the IDs of the notification and user. So when a push notification is requested by a device or 3rd-party BE for the user, the push notification service detects how many devices are registered, creates a message for each of the devices, and encrypts this message with a device-specific public key. The notifications can have the same content for multiple devices if one user registers multiple devices for the push notification system. Depending on the context, the notification type is important to find out which of the services initiated the push notification creation, but this needs to be encrypted as well. The problem is that we can't identify the types of some notifications and therefore can't perform certain "bulk" deletes. Scenario 1: as a user, I want to delete/edit all my notifications of a certain type for a certain service, but I want to keep the other notifications. Scenario 2: as a user, I want to delete/edit specific notification content for multiple devices. When we were scratching our heads over this, I thought "ok, this could be a job for Reddit". Do you have any idea how we could manage this?

  • Help with messagestore.db over 10 years old?
    by /u/Hoollyweeds (Database) on February 24, 2024 at 1:09 am

    Hey, I basically have 2 files (text chats) that I have kept over the years in the hope of gaining access to them one day; so far no luck. I have never tried asking around here before, so I figured why not. I have no idea how any of this works, so I don't even know if it's possible to open them anymore.

  • 4NF Normalization Question
    by /u/LockedPockets23 (Database) on February 24, 2024 at 12:53 am

    Noob here. Is this correct? https://preview.redd.it/amg79amukfkc1.png?width=719&format=png&auto=webp&s=4e96c9f351702c304d38873f550f786b776ecc0c How do you go about selecting which candidate key to make your primary key? In the example above, what would make {EmployeeName, ProjectName} a better primary key than {ProjectName, Date}, or vice versa?

  • Time series database
    by /u/Either_Vermicelli_82 (Database) on February 23, 2024 at 8:17 pm

    We are a small research group at a university looking into storing information from our lab equipment in an open-source, local time-series database. The middleware will be an MQTT broker, to ensure flexibility if we ever have to switch systems at some point. We had a test running on InfluxDB v2 and it works, but we would like to keep this system running (and updated) for a fair few years, and we are watching where InfluxDB v3 is going. The setup will likely be a single database to which all our equipment streams information, with different tags to distinguish project-related measurements. I was wondering: what is out there that has proven trustworthy, has a stable query language, and is at least a little bit user-friendly?

  • MongoDB Enterprise Advanced option
    by /u/Aztreix (Database) on February 23, 2024 at 6:40 pm

    I am considering MongoDB as the database for my application and evaluating the cost of Atlas vs. a self-hosted Enterprise Advanced option. While Atlas cost is listed on the MongoDB pricing page, the enterprise option is not. Does anyone have any experience to share on the cost of the self-hosted option? Any suggestions/opinions on the cost factors between the two? A managed service would mean less setup and maintenance overhead, but I would like to know how much of a difference we would be shelling out. [Expecting 3 servers, 8 cores/16 GB, 1 TB storage * 2]

  • Restoring incremental backups in Postgres
    by /u/Rareness_ (Database) on February 23, 2024 at 4:53 pm

    Hey guys, I have successfully restored a base backup of Postgres. However, after that I tried to restore an incremental backup (it was a couple of writes to a test table), and it had a couple of WAL files and some WAL files ending in .backup. I tried to put them into the pg_wal archive directory that I'm using in the following restore command in postgresql.conf: wal_level = replica; max_connections = 100; archive_mode = on; archive_command = 'test ! -f /var/lib/postgresql/16/main/pg_wal/wal_archive/%f && cp %p /var/lib/postgresql/16/main/pg_wal/wal_archive/%f'; restore_command = 'cp /var/lib/postgresql/16/pg_wal/wal_archive/%f %p'. After putting them into /var/lib/postgresql/16/pg_wal/wal_archive I tried restarting PostgreSQL a couple of times, but my newly added WAL files were still not applied. Worth mentioning: it seems the restarts did not trigger recovery mode, and restore_command is usually executed from that mode. Any help or advice is greatly appreciated!

  • Transitioning Database Access Modes: Moving from Exclusive to Shared Access in Advantage Database Server.
    by /u/xxcriticxx (Database) on February 23, 2024 at 3:57 pm

    I'm currently utilizing Advantage Database Server 11.10 on a Windows 10 platform with exclusive access rights. My goal is to enable access to a single database from multiple locations. I'm seeking guidance on transitioning the database access from exclusive to shared. I recall that earlier versions offered a registry fix for this issue. Could someone kindly direct me to the appropriate solution?

  • History of Database
    by /u/Straight-Rule-1299 (Database) on February 23, 2024 at 3:49 am

    What would be some must-read research papers on databases that show the progression of the technology?

  • Need a good EAN database
    by /u/Cozy_Kozyge (Database) on February 22, 2024 at 10:53 pm

    Hey, I'm working on a product-scanning app, and I don't know where to look for an EAN barcode database with an API that has all the info, like producer name, product name, etc. Can anybody help me with this?

  • Separate tables based on business goals
    by /u/8483 (Database) on February 22, 2024 at 5:49 am

    Is it worth splitting tables based solely on the business goal? NOTE: the tables all have identical columns. 1) One table: a single orders table, with an orderType column to differentiate them. 2) Two tables: sales_orders and purchase_orders. 3) Many tables: how about even more tables, like sales_quotes, sales_orders, sales_backorders, purchase_requisitions, purchase_orders... I believe approach 3 is pushing it too far, while approach 1 is too much mental load, constantly filtering 2 very distinct business goals. Approach 2 seems like just enough separation, but I'm not sure if it will haunt me later.

  • Need help with calling file.
    by /u/DetailedLogMessage (Database) on February 22, 2024 at 3:30 am

    Hi guys, I'm currently working with an Oracle database, and I receive multiple ".sql" files from many people (40). I'm always responsible for assembling calling files with all the SQL files that need to be executed. I want to automate this, but they do not allow me to use any tool or compile anything. It must be a script that runs autonomously from any folder. I've done something similar in the past using shell script, but it wasn't good at all, too fiddly. Can anyone give me suggestions?

  • How do you scale an e-commerce website?
    by /u/cakemachines (Database) on February 21, 2024 at 10:52 pm

    You can't add shards according to the last digit of the ID, because when you search for a product you would have to query all of the sharded databases, so how is sharding done for an e-commerce website?

  • Time series for not metrics
    by /u/surpyc (Database) on February 21, 2024 at 5:47 pm

    I am searching for a time-series DB where we will save data (SQL format), not just metrics or logs, and search without the time-series option if possible. One solution is Elasticsearch, but I don't think it is the correct solution for this. I found VictoriaMetrics, Mimir, and Timescale, but I am not sure if they support this or are the correct tools for the job. Does anyone use anything except Elasticsearch for this?

  • Looking for a fast embeddable database that supports highly concurrent writing
    by /u/phaethornis-idalie (Database) on February 21, 2024 at 1:11 pm

    I have a rather unique use case for a database. Going into it here would take way too long, but basically: I'm parsing a VERY large XML file. I want to use multiple threads so it takes a manageable amount of time. I can guarantee that writes from different threads will not modify the same data. I need very few features of a conventional database. This is going to be used for a visualization, so risk of data loss, SQL, etc. are unimportant. It just needs to hold data in a structured way and let me query it with reasonable ease. Ideally, I want something with Rust bindings to minimize the amount of work I have to redo, and I need it to be fast. I was looking at RocksDB, but I was unable to evaluate from their documentation how applicable it would be. Any help would be hugely appreciated!

  • When you should NOT use MongoDB?
    by /u/Lopsided-Variety1530 (Database) on February 21, 2024 at 11:46 am

  • How does Cassandra achieve strong consistency with failed writes if there is no rollback and no two phase commit?
    by /u/Rough_Source_123 (Database) on February 20, 2024 at 11:51 pm

    Confusion about how Cassandra claims strong consistency with no rollback and no two-phase commit, given the following scenario: a write of key1, value1 is requested with a consistency level of QUORUM, but only N replicas responded with success, where N < QUORUM. What happens to those N nodes that just updated key1? Do they get rolled back? From the Cassandra documentation (https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlTransactionsDiffer.html): "if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails on one of the nodes but succeeds on the other, Cassandra reports a failure to replicate the write on that node. However, the replicated write that succeeds on the other node is not automatically rolled back." It mentions that if a write failed and did not satisfy the consistency level, the coordinator will return failure, but the data will persist on the nodes where the write succeeded. But this means strong consistency can never be achieved even if R + W > number of replicas, as the official documentation suggests (https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html). Consider the following situation: number of replicas = 5, write consistency level = 3, read consistency level = 3. If a write is attempted but only one node succeeds, the coordinator will return failure, but that one node will not roll back, so you would need a read consistency level of 5 to achieve strong consistency. The documentation has conflicting information. What am I getting wrong here?

  • Translating extended SQL syntax into relational algebra
    by /u/8u3b87r7ot (Database) on February 20, 2024 at 5:41 pm

    I've been going through the CMU courses lately and wanted to experiment with writing a basic optimizer. I have a parsed representation of my query and I want to translate it into a relational algebra expression, which can later be optimized into a physical operator tree. I managed to translate basic operations (e.g. WHERE predicates into selections, SELECT items into projections) but I'm stuck on 'extended' SQL syntax such as common table expressions and lateral joins. How do databases typically implement those? Is it even possible to use regular algebra trees for this, or should I use bespoke data structures? In particular: for CTEs, my intuition would be to inline each reference, but wouldn't that force the optimizer to run multiple times on the same CTE? For lateral joins, consider the following example: SELECT * FROM (SELECT 1 id) A, ( (SELECT 2) B JOIN LATERAL (SELECT A.id) C ON TRUE ) D; A tree would be
        └── NAT. JOIN
            ├── A
            └── LATERAL JOIN (D)
                ├── B
                └── C
    How can C reference A's columns given that A is higher in the tree?
