0 votes
111 views
by mathew-m-3605 (120 points)
Hello All!

 

I have a channel making a SELECT query from an MSSQL database and then sending the results to a PostgreSQL database using an INSERT query.  The process works well enough, but it is very slow.  When using the export/import tools built into SSMS and pgAdmin, I can export the data to CSV and then import it into PostgreSQL in under 30 minutes.  The Qvera channel takes 8+ hours.  It seems the INSERT is the bottleneck, but I'm not 100% sure about that.  Is there something I am missing in Qvera that can help speed things up?

 

INSERT INTO allergy.cus_shc_patientvisitprocsagg_1(

patientid, ticketnumber, patientvisitid, visitdate, patientvisitprocsid, dateofservice, currentinsurancecarrierid, currentinsurancecarriername, currentinsurancecarrierlistname, primaryinsurancecarrierid, primaryinsurancecarriername, primarycarrierlistname, chargecode, cptcode, totalcharge, totalpvpbalance, insurancebalance, paientbalance, totaladjustment, insuranceadjustment, patientadjustment, totalpayment, insurancepayment, patientpayment, voidchargesdetail, appiaaccountdetails, baddebtaccountdetails, denialcodes, visitdescription, lastpaymentdate, collecttype, lastmodified)

VALUES ('{m:PatientId}', '{m:TicketNumber}', '{m:PatientVisitId}', '{m:VisitDate}', '{m:PatientVisitProcsId}', '{m:DateOfService}', '{m:CurrentInsuranceCarrierID}', '{m:CurrentInsuranceCarrierName}', '{m:CurrentInsuranceCarrierListName}', '{m:PrimaryInsuranceCarrierID}', '{m:PrimaryInsuranceCarrierName}', '{m:PrimaryCarrierListName}', '{m:ChargeCode}', '{m:CPTCode}', '{m:TotalCharge}', '{m:TotalPVPBalance}', '{m:InsuranceBalance}', '{m:PaientBalance}', '{m:TotalAdjustment}', '{m:InsuranceAdjustment}', '{m:PatientAdjustment}', '{m:TotalPayment}', '{m:InsurancePayment}', '{m:PatientPayment}', '{m:VoidChargesDetail}', '{m:ApPiaAccountDetails}', '{m:BadDebtAccountDetails}', '{m:DenialCodes}', '{m:VisitDescription}', '{m:LastPaymentDate}', '{m:CollectType}', '{m:LastModified}');

 

Thanks for the help!

1 Answer

0 votes

Short Answer:

To improve the performance of the channel, the best approach is to rework the channel so that it reads the records from the source system in batches and then inserts them into the destination database in batches.

This will reduce the number of transactions required to insert all of the data from the source system into the destination system.

The batch size needs to be small enough that a full batch of records can be inserted without exceeding the database's query size limit. For example, if the query size limit is 64 MB and each record from the source system could be up to 1 MB in size, then the batch size would be 64. If the maximum record size is closer to 500 KB, you could safely set the batch size to 100. This is configured on the Source Node under '# of rows to include in each group'.
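As a minimal sketch of that sizing calculation (the limit and record sizes below are just the assumed figures from the example, not values read from QIE):

// Minimal sketch of the batch-size arithmetic; the limit and record sizes
// are the assumed figures from the example above, not values read from QIE.
var maxQuerySizeBytes = 64 * 1024 * 1024;   // assumed 64 MB query size limit
var maxRecordSizeBytes = 1 * 1024 * 1024;   // assumed worst-case record size of 1 MB

// Largest whole number of records that still fits within the query size limit.
var batchSize = Math.floor(maxQuerySizeBytes / maxRecordSizeBytes);   // 64

// With 500 KB records the same formula allows 128 rows, so 100 leaves some headroom.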

Here is a sample of code that loops through all of the rows in a group and inserts them into the database with a single statement:

// Build one multi-row INSERT statement for the whole group of rows in this message.
var query = new java.lang.StringBuilder();

// Template for a single VALUES tuple; the {alt:...} placeholders are filled from the
// parsed CSV record passed to evaluateTemplate below.
var template = "('{alt:PatientId}', '{alt:TicketNumber}', '{alt:PatientVisitId}')";

query.append('INSERT INTO client.cus_shc_patientvisitprocsagg_1\n');
query.append('(patientid, ticketnumber, patientvisitid)\n');
query.append(' VALUES ');

var rowCount = message.getRowCount();
for (var i = 1; i <= rowCount; i++) {
   // Separate each VALUES tuple with a comma.
   if (i > 1) {
      query.append(',\n');
   }

   // Combine the header row (*[0]) with the current data row so it can be parsed as CSV.
   var record = message.getNode('*[0]') + '\n';
   record = record + message.getNode('*[' + i + ']');

   var csvRecord = qie.parseCSVString(record, true, '"', ',', true, 'UTF-8');

   // Fill the template from the parsed row and append the tuple to the statement.
   var insertRecord = qie.evaluateTemplate(template, null, null, false, null, csvRecord);

   query.append(insertRecord);
}

// Log the generated statement (useful while testing; remove or lower the log level in production).
qie.warn(query.toString());

// Execute the whole batch as a single statement against the destination connection.
qie.doUpdateQuery('dataBaseName', query.toString(), false);
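Because the whole group of rows is combined into a single INSERT ... VALUES (...), (...), ... statement, the database sees one statement (and one transaction) per group instead of one per record, which is where the time savings come from.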


Long Explanation:

When inserting records into a database, the system has to start a transaction, run the insert, and then close the transaction. This means there are 3 interactions with the database for every statement that updates data. When processing a large number of records and persisting the data to the database, the most common approach is to do a separate insert for each record. This approach works and is used very frequently.

As to why the channel is so much slower than doing an import directly into the database, the import process has two advantages:
1) The transaction is started and ended on the same server, so network latency for these interactions is eliminated completely.
2) The import process does not add the records one at a time, but in batches, and the batch size is determined by the database itself.

This means that if I am importing 1 million records into the database using the standard approach of inserting one record per query, there are 3 million interactions with the database. Each interaction carries network latency, so the overall load can take much longer than importing directly on the database server.

If you import the same 1 million records directly into the database and it batches 100 records per transaction, it is actually only doing about 30,000 interactions with the database. This is significantly fewer transactions due to the batching.
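As a minimal sketch of that arithmetic, using the record count and batch size from the example:

var totalRecords = 1000000;
var interactionsPerStatement = 3;   // start transaction, execute statement, close transaction

// One INSERT per record: every record costs its own three interactions.
var unbatchedInteractions = totalRecords * interactionsPerStatement;     // 3,000,000

// 100 records per statement/transaction: the three interactions are shared by the batch.
var batchSize = 100;
var statements = Math.ceil(totalRecords / batchSize);                    // 10,000
var batchedInteractions = statements * interactionsPerStatement;         // 30,000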

by ben-s-7515 (13.0k points)
edited by amanda-w-3695