Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
3922114
Add migration for new imported urls table
luis-pabon-tf Mar 19, 2026
a5641c4
Added model and fixed syntax
luis-pabon-tf Mar 19, 2026
06cb026
Imported URL columns are updated on save
luis-pabon-tf Mar 23, 2026
ee19458
Add relationship for imported works and save imported url to new rela…
luis-pabon-tf Mar 23, 2026
27519b2
Add test case for imported url autosave
luis-pabon-tf Mar 23, 2026
b72f2ea
Remove bad copypaste
luis-pabon-tf Mar 23, 2026
c49e84f
Bringing over backfill task from different implementation to test on …
luis-pabon-tf Mar 23, 2026
766791c
Use original url string not a symbol
luis-pabon-tf Mar 23, 2026
c09e6ff
Updated relation save
luis-pabon-tf Mar 23, 2026
95a9553
Update task to work with my implementation
luis-pabon-tf Mar 23, 2026
2f89e65
Merge remote-tracking branch 'upstream/master' into AO3-6978
luis-pabon-tf Mar 23, 2026
81385a1
Fix syntax on new test class
luis-pabon-tf Mar 23, 2026
46ebd6b
Add spacing to please Rubocop
luis-pabon-tf Mar 23, 2026
3d0d165
Fix relation reference on ImportedUrl
luis-pabon-tf Mar 24, 2026
50a0997
Fix the way to create a relation when making the ImportedUrl object
luis-pabon-tf Mar 24, 2026
49c693e
Fix loop in url import rake
luis-pabon-tf Mar 24, 2026
138cea1
Using unless instead of negative if
luis-pabon-tf Mar 25, 2026
d26a5ab
Fix spacing
luis-pabon-tf Mar 25, 2026
354809d
Update ImportedUrl test to compare to raw strings
luis-pabon-tf Apr 12, 2026
77c37a5
Add test case for work import url rake
luis-pabon-tf Apr 13, 2026
b694c8a
Fix work import url rake loop to skip duplicates
luis-pabon-tf Apr 13, 2026
a0716a0
Rubocop fixes
luis-pabon-tf Apr 14, 2026
df40d50
Extended limit of imported_urls columns that can be too long and adde…
luis-pabon-tf May 7, 2026
63f68b4
Fix typo on test case name
luis-pabon-tf May 8, 2026
efc7a52
Add exception catching and logging for import_url backfill
luis-pabon-tf May 8, 2026
f0eaf92
Merge remote-tracking branch 'upstream/master' into AO3-6978
luis-pabon-tf May 8, 2026
5b304f2
Fix Rubocop errors
luis-pabon-tf May 8, 2026
3c52794
Add rationale to the string limits on create_imported_urls migration
luis-pabon-tf May 10, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions db/migrate/20260318013837_create_imported_urls.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,16 @@ def change
t.string :minimal, null: false
t.string :minimal_no_protocol_no_www, null: false
t.string :no_www, null: false
t.string :with_www, null: false
# as of this migration, with_www is about 4 chars longer than the original string
# since the default string length is limited to 255, we need at least 259
# putting 300 to give a bit of extra wiggle room
t.string :with_www, null: false, limit: 300
Comment thread
luis-pabon-tf marked this conversation as resolved.
t.string :with_http, null: false
t.string :with_https, null: false
t.string :encoded, null: false
# encoded can be much larger than original if there's enough special characters in the string
# setting a limit of 2080 to match the limit used on abuse reports
# as that is the largest url limit currently on the project
t.string :encoded, null: false, limit: 2080
t.string :decoded, null: false

t.timestamps
Expand Down
11 changes: 9 additions & 2 deletions lib/tasks/work_import_urls.rake
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,24 @@ namespace :work_import_urls do
scope = Work.where("imported_from_url IS NOT NULL AND imported_from_url != ''")
total = scope.count
processed = 0
failed = 0

puts "Backfilling #{total} work_import_urls records..."

scope.find_each(batch_size: batch_size) do |work|
next if ImportedUrl.exists?(work_id: work.id)

work.imported_url = ImportedUrl.create(original: work.imported_from_url)
begin
work.imported_url = ImportedUrl.create(original: work.imported_from_url)
rescue StandardError => e
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update spec/lib/tasks/work_import_urls.rake_spec.rb‎ with a test that shows the output for the error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure a way to do that. With the new limits in place the error won't hit since the original url can't be larger than 255 already. The largest I've gotten on the encoded column has been around 600 chars.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay -- it looks like we don't test the output for fully successful runs, either, so we can live without it.

puts "------- Error on import for work_id #{work.id}: #{e.inspect}"
failed += 1
next
end
processed += 1
puts "Processed #{processed}/#{total}" if (processed % batch_size).zero?
end

puts "Done! Backfilled #{processed} records."
puts "Done! Backfilled #{processed} records, failed #{failed} records."
end
end
13 changes: 13 additions & 0 deletions spec/models/imported_url_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,4 +17,17 @@
expect(url.encoded).to eq("http://www.trickster.org/llwyden/misc/cracked.html")
expect(url.decoded).to eq("http://www.trickster.org/llwyden/misc/cracked.html")
end

it "handles longer urls where the encoded column will be much larger gracefully" do
# default string column max is 255 so we create an url with exactly 255 chars
# and encoding that will extend the
repeated_string = "#() {}" * 38
long_url = "http://www.faikes.org/#{repeated_string}.html"

formatter = UrlFormatter.new(long_url)
url.original = formatter.original

expect { url.save }
.not_to raise_error
end
end
Loading