In information systems design, engineers have many options when it comes to database management systems. Relational database management systems (RDBMS) are very common, and since 2010, document databases, caching databases, and NoSQL databases designed to scale to very large sizes have provided other options for specific use cases. For example, companies such as Google use NoSQL databases for their search engine.
While some of the topics in this article may apply to both types of systems, this article primarily focuses on relational database management best practices that make an enterprise database easier to maintain and faster over time. Managing large enterprise databases, of course, requires more expertise, credentials, and experience than what is provided in this high-level overview.
Security
Malicious hackers or unscrupulous workers could gain access to an organization’s database and use it in unauthorized ways, but IT departments have many ways of securing that access. To prevent hackers from accessing a business’s internal systems and making a copy of its production database, the IT department should try to reduce its attack surface. This means auditing networks and switches for vulnerable systems exposed to the internet, and using DMZs for on-premises websites, to block unauthorized access to internal networks.
For direct database access, create strong passwords and store them in a password manager. Most DBMSs today offer the ability to encrypt fields so that unauthorized individuals cannot view the contents of those fields. When hosting a database in the cloud, use a key management service to securely store connection strings and configuration information instead of storing them in a configuration file alongside the application. And consider using multifactor authentication (MFA) when using an online password manager or cloud provider.
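As a minimal sketch of keeping the connection string out of configuration files, the secret here is assumed to be injected into the process environment by the hosting platform or a key management service (the variable name DB_CONNECTION_STRING is illustrative):

```python
import os

def get_connection_string() -> str:
    """Read the database connection string from the environment.

    A key management or secrets service (or the container platform) is assumed
    to inject this value at deploy time, so nothing sensitive lives in a
    configuration file shipped with the application.
    """
    conn = os.environ.get("DB_CONNECTION_STRING")  # illustrative variable name
    if conn is None:
        raise RuntimeError("DB_CONNECTION_STRING is not set; check the secrets configuration")
    return conn
```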
When granting access to a database, follow the principle of least privilege and assign only the permissions staff need to perform their work. In other words, if someone doesn’t need the privilege of making schema changes, don’t assign them that permission.
Documentation
Document the database schema, maintenance schedules, triggers and functions, and other related items. This helps management distribute the practices, policies, and procedures that keep the IT staff informed. Document which systems and software depend on the database and who should have access to it.
Document a disaster recovery plan in the event the database needs to be recovered. That disaster recovery plan should include instructions for what to do if the host goes offline or becomes inaccessible.
Document change management procedures to track changes to the database and schema. This may be useful if a recent change created a bug. Document procedures for testing and deploying changes to the DBMS and schema.
When making changes to the database schema, a common practice among database administrators (DBAs) is to script the changes in data definition language (DDL) and data manipulation language (DML), which is self-documenting.
And document any business and regulatory requirements, such as record retention policies and access authorization.
Managing Schema
Follow common naming conventions for databases, tables, columns, and indexes to improve maintainability. Use prefixes or suffixes to add further descriptive detail. In my own databases, lookup tables and associative entities have the word “map” in their names to distinguish them from other tables.
Know how to normalize database tables so the database is extensible (e.g., using first normal form, second normal form, third normal form, and so on). Any time a column could hold more than one value per record, move those values to a separate table. For example, an email field could be put in a user record. However, if management later requests that users be allowed to store an indefinite number of email addresses, those addresses should be moved into a separate table, and a one-to-many relationship would be used to access them.
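Here is a minimal sketch of that one-to-many layout, using SQLite for illustration (the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One row per user; no email column here, because a user may have many.
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")

# One row per email address, linked back to its owner (one-to-many).
conn.execute("""
    CREATE TABLE user_emails (
        email_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        email    TEXT NOT NULL UNIQUE
    )
""")

conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO user_emails (user_id, email) VALUES (?, ?)",
    [(1, "ada@example.com"), (1, "ada@work.example.com")],
)

# All addresses for a user come back through the foreign key.
rows = conn.execute(
    "SELECT email FROM user_emails WHERE user_id = ?", (1,)
).fetchall()
print(rows)
```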
Fixing Data
Occasionally a data fix may be required because of a mistake or junk data in the database. Developing a standard process for fixing data reduces the possibility of mistakes. How that is accomplished largely depends on the size and impact of the database. If it’s purely an internal database for an internal business application, running a simple fix script may be enough. Of course, test the script before running it on live data to avoid accidentally modifying the production database in unexpected ways. But if it’s a database used by millions of customers, a more structured process may be required. For example, a validation script may be required to test the fix, in addition to a script that reverses the action in the event the fix produces unexpected side effects.
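As a rough sketch of that kind of structured fix, rehearsed here against an in-memory SQLite copy (the table, the bad value, and the expected row count are all hypothetical), the update and its validation run in one transaction so an unexpected result rolls back instead of committing:

```python
import sqlite3

# Rehearsal copy: in a real fix this would be a restored test copy, not production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("CANCELED",), ("shipped",), ("CANCELED",)],
)
conn.commit()

EXPECTED_ROWS = 2  # how many rows the fix should touch, estimated beforehand

try:
    with conn:  # transaction: commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE orders SET status = 'cancelled' WHERE status = 'CANCELED'"
        )
        # Validation step: an unexpected row count aborts the fix.
        if cur.rowcount != EXPECTED_ROWS:
            raise RuntimeError(
                f"fix touched {cur.rowcount} rows, expected {EXPECTED_ROWS}"
            )
    print("Fix committed")
except RuntimeError as exc:
    print(f"Fix rolled back: {exc}")
```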
Testing
Test database changes, whether to the schema or the data, before applying them to a production database. DDL and DML scripting is particularly useful for this. It also requires a test database, or a backup copy restored for testing, that accurately represents the production database.
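A minimal sketch of that workflow, using SQLite for illustration, scripts the change once and rehearses it against a throwaway test copy before it ever touches production (the table and the ALTER statement are hypothetical):

```python
import sqlite3

# The change lives in one DDL script so it is reviewable and repeatable.
MIGRATION = "ALTER TABLE users ADD COLUMN last_login TEXT;"  # hypothetical change

def apply_migration(conn: sqlite3.Connection) -> None:
    with conn:
        conn.executescript(MIGRATION)
    # Smoke test: the new column should now be queryable.
    conn.execute("SELECT last_login FROM users LIMIT 1")

# Rehearse against a throwaway copy of the schema before touching production.
test_conn = sqlite3.connect(":memory:")
test_conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
apply_migration(test_conn)
print("Migration succeeded on the test copy")
```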
Microsoft’s Entity Framework provides the ability to perform “code first” schema changes, which is ideal for small applications that need to create their own database. But this approach could prove precarious when allowing developers who are not DBAs to make schema changes to large, complex production databases.
Transactions
When many tables must be updated in a single operation, use transactions to prevent junk data from being saved in the database if an error occurs. Suppose saving an employee record requires an update to five tables. The first three tables are successfully updated, but the fourth produces an error and the operation stops, leaving junk data in the database. In an ACID-compliant database, a transaction rolls back those previous updates, preventing that junk data from entering the database and ensuring the integrity of the data.
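A minimal sketch with SQLite (two hypothetical tables standing in for the five) shows the roll-back behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE salaries  (emp_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

try:
    with conn:  # one transaction for the whole multi-table operation
        conn.execute("INSERT INTO employees (emp_id, name) VALUES (1, 'Ada')")
        # This second statement fails (NULL violates NOT NULL), so the
        # employees insert above is rolled back along with it.
        conn.execute("INSERT INTO salaries (emp_id, amount) VALUES (1, NULL)")
except sqlite3.IntegrityError as exc:
    print(f"Rolled back: {exc}")

# No partial data remains.
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```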
Parameterize Queries
Parameterize queries instead of concatenating user input into SQL statements, something object-relational mappers usually do automatically. Without parameterization, certain characters have to be manually “escaped” and replaced with other characters, and the database is less secure and more likely to be exploited, or have its data stolen, through SQL injection attacks.
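A small SQLite example of the difference (the hostile input is contrived):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

user_input = "Ada'; DROP TABLE users; --"  # hostile input

# Unsafe: string concatenation invites SQL injection.
# query = f"SELECT * FROM users WHERE name = '{user_input}'"

# Safe: the driver binds the value as data, never as SQL.
rows = conn.execute(
    "SELECT user_id, name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the hostile string matches nothing and executes nothing
```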
Stored Procedures
Stored procedures and database functions are useful for wrapping complex queries and updates in transactions, and for breaking up large, complex joins that could slow the performance of the database. Try to keep stored procedures simple and efficient. If they are too complex, not only are they very difficult to manage and modify without introducing new bugs, but some DBMSs will throw errors (e.g., “Error: This stored procedure is too complex. Consider simplifying it.”). Keeping them simple also makes them more readable and efficient.
Optimize Dataflows
Pull only the information required for a request, but ensure that enough data is pulled to complete the lifecycle of the request. In other words, don’t pull an ID and then, two seconds later, pull more information based on that ID. If it’s a small amount of information (e.g., a record), pull it all at once. This allows for more efficient dataflows.
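A minimal SQLite sketch of the difference (the users table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

# Inefficient: two round trips for data needed within the same request.
user_id = conn.execute("SELECT user_id FROM users WHERE name = 'Ada'").fetchone()[0]
email = conn.execute("SELECT email FROM users WHERE user_id = ?", (user_id,)).fetchone()[0]

# Better: one round trip pulls everything the request lifecycle needs.
user_id, name, email = conn.execute(
    "SELECT user_id, name, email FROM users WHERE name = 'Ada'"
).fetchone()
print(user_id, name, email)
```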
Data Quality
Implement data quality controls to ensure the data going into the database is accurate. Several ways of doing this exist. One is designing the schema to only allow certain data types and certain amounts of data. For example, when storing a number, make the database column numeric, not a string. Storing numbers as strings invites the datatype conversion errors that come from “boxing-and-unboxing” strings into numbers, such as: “Error converting data type nvarchar to numeric.” The application developer or DBA may have no idea which record in the database triggered that error, which could lead to a massive waste of time trying to hunt down that mysterious record among millions of records. In the application, include validation controls and validation checks to ensure the data going into the database conforms to those datatypes.
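As a rough sketch, both layers of defense can be combined: a numeric column with a database-side check, plus application-side validation before the insert (SQLite for illustration, with a hypothetical invoices table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Numeric column plus a CHECK so non-numeric values are rejected by the database.
conn.execute("""
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        amount     NUMERIC NOT NULL CHECK (typeof(amount) IN ('integer', 'real'))
    )
""")

def save_amount(raw: str) -> None:
    # Application-side validation: reject junk before it reaches the database.
    try:
        amount = float(raw)
    except ValueError:
        raise ValueError(f"not a number: {raw!r}")
    conn.execute("INSERT INTO invoices (amount) VALUES (?)", (amount,))

save_amount("19.99")          # stored as a real number
try:
    save_amount("nineteen")   # rejected before it can corrupt the data
except ValueError as exc:
    print(exc)
```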
Auditing
Regularly review the database and schema to see whether improvements can be made, such as better database optimizations, better security, emerging standards, or compliance with rules and regulations. And review backups and recovery procedures by testing those backups to ensure they work as expected.
Monitoring
Monitoring and regularly reviewing database performance, size, and resource consumption may prevent unexpected problems, such as running out of system resources. Monitoring software, or routine scripts that generate resource consumption statistics, can alert staff so they are able to detect and respond to problems. Monitor database logs to identify unusual or suspicious activity, such as someone trying to hack into the systems. Monitoring also helps identify the regular maintenance needed to keep the database smaller, faster, and easier to back up and recover.
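As a small illustration, a routine script run from a scheduler could report the database size and alert staff when it crosses a threshold (SQLite for illustration; the file name and threshold are hypothetical):

```python
import sqlite3

SIZE_ALERT_BYTES = 500 * 1024 * 1024  # illustrative threshold: 500 MB

def report_database_size(db_path: str) -> int:
    """Return the database size in bytes and warn when it crosses a threshold."""
    conn = sqlite3.connect(db_path)
    try:
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
    size = page_count * page_size
    if size > SIZE_ALERT_BYTES:
        print(f"ALERT: {db_path} is {size / 1_048_576:.1f} MB; schedule maintenance")
    else:
        print(f"{db_path}: {size / 1_048_576:.1f} MB")
    return size

report_database_size("app.db")  # hypothetical database file; run from a scheduler
```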
Performance Tuning & Optimization
Performance tuning is necessary for high-transaction systems to ensure they are highly available and scalable. Try to optimize slow, inefficient queries that put excessive burden on the DBMS and its resources, and design optimal dataflows.
Expensive and complex “Where” clauses may be eliminated by restructuring how information is gathered when the record is originally saved. Don’t write code that calls the DBMS repeatedly within a “foreach” or “while” loop. Instead, pass an array to the DBMS, or write a stored procedure, and have the DBMS do that work in a single call.
Also, don’t allow developers to run long-running or indefinite queries and then loop through the results of those queries. That will lead to database timeouts and network timeouts, and may result in deadlocked transactions where multiple users are trying to access the same resources. Microsoft SQL Server, for example, will produce errors such as:
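A minimal SQLite sketch of replacing per-row round trips with a single set-based query (the products table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)", [(i, 10.0 * i) for i in range(1, 6)]
)

wanted_ids = [2, 3, 5]

# Anti-pattern: one round trip per ID inside a loop.
# for pid in wanted_ids:
#     conn.execute("SELECT price FROM products WHERE product_id = ?", (pid,))

# Better: hand the whole set to the DBMS in a single query.
placeholders = ",".join("?" for _ in wanted_ids)
rows = conn.execute(
    f"SELECT product_id, price FROM products WHERE product_id IN ({placeholders})",
    wanted_ids,
).fetchall()
print(rows)
```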
“Transaction (Process ID 66) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.”
“Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.”
Use in-memory caching, or database caching, to cache the results of queries so that they don’t have to be repeatedly pulled from the database. This speeds up access to information and reduces load on the DBMS, which allows the DBMS to respond to a greater number of requests with fewer problems.
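As a minimal sketch of in-memory caching on the application side (the settings table and the 60-second TTL are illustrative):

```python
import sqlite3
import time
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")

@lru_cache(maxsize=256)
def get_setting(key, ttl_bucket):
    """Hit the database only on a cache miss; ttl_bucket expires old entries."""
    row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

def cached_setting(key, ttl_seconds=60):
    # Results are reused until the TTL bucket rolls over, reducing DBMS load.
    return get_setting(key, int(time.time() // ttl_seconds))

print(cached_setting("theme"))  # first call queries the database
print(cached_setting("theme"))  # repeat call is served from memory
```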
Optimizing Performance and Storage
When storing files or images, consider storing them outside the database in external storage, such as a CDN. Web browsers will cache images that have a web address so they do not have to be retrieved every time. When the image is stored in the database instead, the database size grows substantially, the image has to be pulled from the DBMS every time the record is pulled, and load time and performance suffer. If encryption is necessary for files that contain sensitive information, a backend service could encrypt and decrypt those files.
Data Warehouses
Because databases continue to grow over time, it may be a good idea to warehouse old records that are no longer in use on a regular (e.g., annual) basis. This means moving those records to an inactive, read-only replica of the database. Doing so reduces the size of the database and improves performance. Otherwise, loading records gets slower over time, and analyzing the data, or generating complex reports from it, becomes increasingly expensive.
How much information to retain depends on the organization’s data retention policies as well as any laws or regulations that apply to the organization’s industry. For example, user records or financial transactions may be needed for auditing or accounting purposes. Or, organizations operating in the healthcare industry may be required to comply with laws, such as HIPAA, that require sufficient data protection, privacy, and data retention policies. Archiving old data into data warehouses on lower-cost storage systems, or purging unnecessary records, will improve the performance of the database and make it easier to manage over time.
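As a rough sketch of the archiving step, using SQLite and a second table standing in for the warehouse (the table names and cutoff date are hypothetical, and a real warehouse would live on a separate, lower-cost system), old rows are copied and deleted in one transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, created TEXT, total REAL)")
conn.execute("CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, created TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2015-03-01", 10.0), (2, "2024-06-01", 20.0)],
)

CUTOFF = "2020-01-01"  # illustrative retention boundary

with conn:  # move and delete in one transaction so no rows are lost in between
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE created < ?", (CUTOFF,)
    )
    conn.execute("DELETE FROM orders WHERE created < ?", (CUTOFF,))

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])          # 1
print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])  # 1
```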
Precompute (or pre-query) historical data to improve performance when generating complex reports. Historical data never changes, so it does not need to be recomputed or re-queried. In other words, expensive queries on old data to generate numbers for financial analysis are unnecessary. Instead, precompute those numbers once, store the results in a record, and combine them with the most recent data for fast, responsive retrieval. As an example, when averaging sales, compute the average over the historical data one time, then fold the new data into those results, which makes it an extremely fast and simple computation. This is especially useful for dashboards and business intelligence (BI) systems that make many expensive queries and computations across millions of records and decades of time. Offloading this process to a service that runs updates on a regular interval (e.g., weekly) and checks for discrepancies in the data may also help. Alternatively, computations could be performed when saving the data, so complex SQL queries don’t have to perform them again later.
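A small sketch of the arithmetic: storing the historical count and sum (rather than only the average) keeps the combined figure exact when new data is folded in. The numbers are illustrative.

```python
# Computed once from historical records and stored (e.g., in a summary table).
historical_count = 1_200_000
historical_sum = 54_000_000.0   # total sales across those records

# Freshly queried from the small set of recent records.
recent_sales = [450.0, 320.0, 780.0]

# The overall average needs only the stored totals plus the new rows,
# never another scan over the historical records.
total_sum = historical_sum + sum(recent_sales)
total_count = historical_count + len(recent_sales)
overall_average = total_sum / total_count
print(f"Average sale: {overall_average:.2f}")
```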
Scalability
Large systems require some capacity planning to ensure they are able to handle increasing load as they continue to grow. This involves a number of considerations, such as storage, replicas, and connection volume. These problems may be addressed with database clustering, eventually consistent databases, and read-only views that speed up read time.
High availability features, like replication and failover, help ensure database availability in the event of a system failure. With clustering, or the use of eventually consistent databases, failing over to a different database keeps the application resilient if the primary database becomes inaccessible because of a network outage, system failure, or other unexpected event. And perform load testing and stress testing to ensure that the database remains responsive under heavy load.
Separating data from different domains into different databases may also distribute the load across systems and improve scalability and performance. A number of ways of doing that exist: clustering, giving each instance of the application its own database (database-per-app instances, which can significantly improve performance, backups, and recoverability), and sharding or partitioning, where a large database is split into smaller databases.
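As a rough sketch of sharding at the application level, a stable hash can route each customer to one of several shards (the shard names are hypothetical, and a production system would need a strategy, such as consistent hashing, for adding or removing shards):

```python
import hashlib

# Hypothetical shard connection targets; a real deployment would load these
# from configuration or a service registry.
SHARDS = [
    "db-shard-0.internal/app",
    "db-shard-1.internal/app",
    "db-shard-2.internal/app",
]

def shard_for(customer_id: str) -> str:
    """Route a customer to a shard using a stable hash of their ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    index = int(digest, 16) % len(SHARDS)
    return SHARDS[index]

print(shard_for("customer-42"))    # always maps to the same shard
print(shard_for("customer-1337"))
```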
Scheduled Maintenance
Databases need to be maintained as they grow, and maintenance schedules are typically created for that regular work. Downtime may need to be scheduled during off-peak hours to apply patches and software updates that reduce the possibility of bugs and security vulnerabilities. These maintenance windows need to be communicated to impacted users so they are not taken by surprise.
Automating these maintenance tasks, such as backups, performance tuning, reindexing, archiving, and purging, saves time and reduces the risk of human error. These maintenance tasks are necessary to prevent the database from becoming unbearably slow due to fragmentation.
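A minimal sketch of an automated maintenance job meant to be triggered by a scheduler such as cron (SQLite for illustration; the file name is hypothetical):

```python
import sqlite3
from datetime import date

def run_maintenance(db_path: str) -> None:
    """Routine tasks intended to run from a scheduler (e.g., nightly or weekly)."""
    conn = sqlite3.connect(db_path)
    try:
        # Back up the database file before touching anything.
        backup_path = f"{db_path}.{date.today().isoformat()}.bak"
        backup_conn = sqlite3.connect(backup_path)
        conn.backup(backup_conn)
        backup_conn.close()

        conn.execute("REINDEX")   # rebuild indexes
        conn.execute("VACUUM")    # reclaim space and reduce fragmentation
    finally:
        conn.close()

run_maintenance("app.db")  # hypothetical database file
```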
Disaster recovery testing should also be part of scheduled maintenance. Changes are regularly made to databases, environments, and configuration, and those changes may break the disaster recovery plan. So, testing the backups on a regular schedule, such as every quarter, biannually, or annually, gives the organization a better chance of a speedy recovery in a disaster scenario. Disaster recovery automation may also be implemented so that if a site goes down, the systems fail over to another site to minimize downtime. This may also require a disaster recovery site to fail over to.