Skip to content

Add Performance Optimizations and Bug Fix for Import Browser#146

Open
cwcrystal8 wants to merge 4 commits into
datacommonsorg:mainfrom
cwcrystal8:master
Open

Add Performance Optimizations and Bug Fix for Import Browser#146
cwcrystal8 wants to merge 4 commits into
datacommonsorg:mainfrom
cwcrystal8:master

Conversation

@cwcrystal8
Copy link
Copy Markdown
Contributor

This PR has the following changes:

  • Performance optimizations
    • Wrote custom csvtojson method
    • Removed quadratic time method
    • Switch over to bulk api
    • Build timedata object as you iterate originally
    • Don’t make subject nodes for StatVarObservations
  • Bug fix for the select all/clear all button to update state properly
  • Updated tests for the back end changes
  • Fixed indentation

@cwcrystal8 cwcrystal8 marked this pull request as ready for review August 25, 2022 22:52
Comment on lines -46 to -71
test('testing fillTemplateFromRow', () => {
const parser = new ParseTmcf();
parser.csvIndex = 8;
const filledTemp =
parser.fillTemplateFromRow(TestStr.testTMCF2, TestStr.testCSV2[0]);
expect(filledTemp).toBe(TestStr.expectedFilledTemp0);

// testing multiple propValues that are comma separated
const template =
'Node: dcid:test1\npropLabel1: C:TestSet->Col1, C:TestSet->Col2';
const row = {
'Col1': 'dcid:propVal1',
'Col2': 'dcid:propVal2',
};
const expectedMCF =
'Node: dcid:test1\npropLabel1: dcid:propVal1, dcid:propVal2';
const filledTemp2 = parser.fillTemplateFromRow(template, row);
expect(filledTemp2).toBe(expectedMCF);
});

test('testing csvToMCF', () => {
const parser = new ParseTmcf();
const mcf = parser.csvToMcf(TestStr.testTMCF1, TestStr.testCSV1);
expect(mcf).toBe(TestStr.expectedMCF1);
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding: why are these tests being removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are test cases for functions that no longer exist in the rewritten parse-tmcf file. The functions were essentially duplicates of previous parsing logic and now that the whole process is rewritten to do everything in one pass, the tests are useless.

Comment on lines +193 to +195
headers: {
'Content-Type': 'application/json',
},
Copy link
Copy Markdown

@juliawu juliawu Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self, and the team: We'll need to add an API key to this, and all functions that use the v1 API.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean:

a. we need to add an API key in this function and pass it when making the API call - if this is the case, we want to add a TODO in the code
b. mixer needs to enforce API key for this endpoint - if this is the case, please create a github issue for this in the mixer repository

Copy link
Copy Markdown

@juliawu juliawu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a first pass, deferring approval to @beets or @chejennifer

Copy link
Copy Markdown
Contributor

@chejennifer chejennifer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice job finding these optimizations!

* Current row number of the csv file that is being parsed.
* @type {number}
*/
* Current row number of the csv file that is being parsed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert these formatting changes

[header: string]: string | number;
}

interface ParsedCsvRow {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, is it guaranteed that each csv row will only have one observation point?

i.e., are users not allowed to input a csv file where there are multiple observations in a single row like:
date,sv1,sv2
2020,1,2
2021,2,3
...

Or am I misunderstanding this data structure?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users are allowed to input a CSV file like that but it gets parsed accordingly later on, I think that's why it's called CsvRow and not DataPoint or ObservationPoint

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so there can be multiple ParsedCsvRow objects for the same row? the naming is a little confusing in that case

}

interface ParsedCsv {
otherMcfs: string[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just mcfs? what is the "other" referring to?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it refers to non-StatVarObservation nodes that may have been included in the CSV file, these nodes will have MCF strings generated for them and stored here to display in the Graph Explorer later on.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then maybe something like nonSvObMcfs or something more descriptive & leave a comment


interface ParsedTemplate {
dataEntities: string[];
otherEntities: string[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "other" is not very descriptive, maybe "nonDataEntities" or something else that actually describes what type of entities these will be

* @param {CsvRow} row The json representation of the csv row.
* Each CsvRow element of the array represents one row of the csv.
* @return {ParsedCsvRow | null} an object containing the facet, date,
* value, and other additional properties
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of "other additional properties", mention how its the mcf representation of properties for the node or something along those lines

}
finalReturn['localNodes'] = finalReturn['localNodes'].concat(
tmcfOut['localNodes'],
const mcfs = parsedCsv.otherMcfs;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defining this constant also feels unnecessary since it's only used in line 144 where you can just use parsedCsv.otherMcfs

const parsedMcf = mcfParser.parseMcfStr(mcf as string);

if (parsedMcf['errMsgs'].length !== 0) {
finalReturn['errMsgs'] = finalReturn['errMsgs'].concat({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since finalReturn['errMsgs'] is just an array, we should be able to just push the additional element like finalReturn['errMsgs'].push({...});`?


type ValueObject = {
value: number | undefined;
mcf: string;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what this mcf is, can you add a comment?

`${dcid}`,
data.data.map(
(entity: NameResponse) =>
entity.values ? entity.values[0].value : entity.entity,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty arrays will evaluate to true, probably want to check length of entity.values before accessing the first element

body: JSON.stringify(data),
};
// response is {"data": [...list of entities...]}
// expected entity if dcid exists is {"entity": entity, "values":"[...]}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you've already defined the NameResponse type, you could just say response is {"data": NameResponse[]} and don't need this whole comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants