Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Hey guys, nice shiny new thread. Let's get it dirty with some idiot posting.

The problem:
I'm using StringBuilder to concatenate a whole biggo bunch of strings, calling StringBuilder.ToString on the result, and then writing the resultant string to an XML file. I'm sure many of you know what's coming next since I said "biggo" and "StringBuilder"; that's right, the dreaded 'System.OutOfMemoryException' caused by StringBuilder over-allocating contiguous memory.

So the first question is obviously, any sneaky ways around this? The file is about 94mb raw text, but it can also fail randomly on smaller files (around 50mb), so 94mb is not a 'magic number' to stay under by any means. I've tried assigning double that to the StringBuilder when instantiating, but all that does is give me the memory error right up front at variable declaration instead of at the end when trying to convert to string. Based on what I can figure out it's a Bad Idea to even use StringBuilder this way anyway, and most of the workarounds are pretty dodgy.

So assuming the answer to the first question is "no", my second question is advice on chunking up this monster. Right now things are structured like so (this is not real code):
code:
Protected Sub Page_Load() Handles Me.Load
 dsThinger = Big database call to grab about 80,000 rows

 MakeXML(dsThinger.Tables(0))
End Sub

Protected Sub MakeXML(ByVal dtThinger As DataTable)
 Dim sbThinger As New StringBuilder

 sbThinger.AppendLine("<?xml version=""1.0"" ?>") 'etc start tags

 For Each drThinger As DataRow In dtThinger.Rows
   'About 2000 lines of if/then, case statements, value lookups, and translations, most of which look like:
  If drThinger("value")="other value" then
   sbThinger.AppendLine("<Thinger>whatever value=othervalue means</Thinger>")
  End If
   'Repeat 80,000 times
 Next
 
 sbThinger.AppendLine("stuff") 'etc end tags
 PostXMLDocument(sbThinger.ToString) '<-- Failure Point 1
End Sub

Protected Sub PostXMLDocument(ByVal strThinger As String)
 Dim xmlDoc As New XmlDocument
 Dim filePath As String = "c:\Feeds\Outbox\" & Left(System.Guid.NewGuid.ToString, 8) & ".xml"
 xmlDoc.LoadXml(strThinger) '<-- Failure Point 2, sometimes
 xmlDoc.Save(filePath)
 xmlDoc = Nothing

 'API calls, etc. to publish the XML data
End Sub
One of the things I tried is having PostXMLDocument accept the StringBuilder directly, but all that does is move the point of failure to Failure Point 2, because LoadXml has no overload that takes a StringBuilder as input; this was kind of a dumb thing to try in the first place.

The obvious chunking candidate is the big SQL query, but that's kind of a pain because A) it's not my query and B) it's reused to generate OTHER files. On the other hand, I can't chunk the XML itself too much, because the publisher only accepts (x) connections at once and dumps the ones over that, and this is relatively timely information that has to be generated twice a day.

The last thing I thought of is chunking the StringBuilder itself before converting to string (so I could go xml = string1 & string2 & string3), but by the time the StringBuilder is 'done' it's too big to modify, since everything I've found that modifies it either implicitly converts it to a string or explicitly requires one.

I'm not looking to you guys to 'solve' the stringbuilder memory problem; I'm fully aware it's the result of bad design on my part. More trying to pick brains for possible workarounds.

EDIT-I also looked into splitting the datatable itself and then passing the table chunks to MakeXML, but the ways I've seen to do that seem pretty inefficient: http://stackoverflow.com/questions/8132208/c-how-to-partitioning-or-splitting-datatable

Scaramouche fucked around with this message at 03:55 on Jun 27, 2014


Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

hirvox posted:

If you are certain that your code will output valid XML, you should just write each line to the file as soon as you have generated each snippet. That way you'll only need to have one copy of the data (the DataTable) in memory at any time. The way you're doing it now you'll have at least four copies: The DataTable, the StringBuilder, the String itself and finally the XmlDocument.

ljw1004 posted:

I think you should change it to using streams throughout.

Thanks guys, that's basically what I did:
code:
Protected Sub MakeXML2(ByVal dtThinger As DataTable)

Dim strFileName As String = "c:\feeds\outbox\" & Left(System.Guid.NewGuid.ToString, 8) & "_POST_PRODUCT_DATA_" & ".xml"
Dim fsX As New FileStream(strFileName, FileMode.Create)

Using sbThinger As New StreamWriter(fsX)
 sbThinger.WriteLine("<?xml version=""1.0"" ?>") ' header etc.
 For Each drThinger As DataRow In dtThinger.Rows
  If drThinger("value")="other value" then
   sbThinger.WriteLine("<Thinger>whatever value=othervalue means</Thinger>")
  End If
 Next
 sbThinger.WriteLine("stuff") ' footer etc.
End Using

PostXMLDocument2(strFileName)

End Sub

Protected Sub PostXMLDocument2(ByVal strFileName As String)
 'Don't load/save xml file from string anymore, go straight to existing file

 Dim buffer As Byte() = File.ReadAllBytes(strFileName)
 'connect, authenticate, upload, etc.
End Sub
So far, works fine.
Pros:
- Entire string is never in memory at once while building it
- Not passing string between subs any more; the only time it's fully loaded is when I'm getting it from drive
- Generated file size is actually about 15% smaller for some reason (I think from whitespace introduced by the XmlDocument LoadXml/Save round trip)
- Faster; for the 94mb file it actually takes longer to upload than create now

Cons:
- String isn't loaded into an XML doc anymore for a quick/dirty validation check (one possible workaround sketched below)
- Drive thrashing? I have no idea if there'll be eventual performance concerns of going to drive iteratively instead of all at once
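
One hedged way to claw back that quick/dirty check without reloading the DOM: stream the finished file back through an XmlReader, which throws an XmlException at the first problem (well-formedness only, no schema validation; strFileName as above):
code:
Using xr As System.Xml.XmlReader = System.Xml.XmlReader.Create(strFileName)
    While xr.Read() 'read to the end; a malformed file throws along the way
    End While
End Using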

Malcolm XML posted:

what the gently caress

2000 lines/ 80k times? Refactor that poo poo, stat.

Unless you're being hyperbolic but even then goddamn

Also use xmlwriter or schematize it and reflect it into a class via xsd.exe. But there be dragons.

Those 2000 lines don't 'live' in the MakeXML sub, about 90% are in external abstracted functions (e.g. GetColor, GetLength, GetBrand, that kind of thing). I'm parsing data from about 30 different suppliers, and need to convert their values into numeric codes. E.g. "Red" = 87550701, with the resulting XML being "<Color>87550701</Color>". Except I have to parse "Red","Cherry","Blood","Redd",etc.

I realize the 'real' solution is make a rockin hardcore XSD translation but man, this data is so all over the place it's currently 'easier' to add cases to GetColor.
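
For what it's worth, a minimal sketch of the XmlWriter route Malcolm mentioned, using names from the earlier pseudocode (the root element and the exact GetColor call shape are my guesses); it also escapes the values for free:
code:
Dim settings As New System.Xml.XmlWriterSettings With {.Indent = True}
Using xw = System.Xml.XmlWriter.Create(strFileName, settings)
    xw.WriteStartDocument()
    xw.WriteStartElement("Thingers")
    For Each drThinger As DataRow In dtThinger.Rows
        'WriteElementString escapes <, > and & in the value automatically
        xw.WriteElementString("Color", GetColor(drThinger("value")))
    Next
    xw.WriteEndElement() 'closes Thingers
    xw.WriteEndDocument()
End Using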

Thanks again guys! This is why this forum rocks, as usual.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Malcolm XML posted:

why are you writing to a file on disk first (do you need to keep it around?). you could just stream it all the way out to the connection (if you must keep the file around and read from it at least stream it out via file.openread)

Hmm, you might be right on some of this. I keep the file around because it's for an Amazon product feed, and Amazon is constantly changing the specification in strange ways, so I need to be able to refer back to the original when an 'error cvc complex blah blah line 54615 column 132' pops up.

The changes I've made have improved the situation, but not solved it. Now I'm getting OutOfMemory errors in a different place, when submitting my XML here:
code:
Dim buffer As Byte() = File.ReadAllBytes(xml) '<-- errors out here
Dim ms As New MemoryStream(buffer)

request.FeedContent = ms

request.ContentMD5 = MarketplaceWebServiceClient.CalculateContentMD5(request.FeedContent)
request.FeedContent.Position = 0

Dim response As SubmitFeedResponse = service.SubmitFeed(request)
This makes sense, since it is loading the entire file into memory. The MarketplaceWebServiceClient is provided by Amazon, so I don't want to mess with it too much. I see how you can instantiate the File.OpenRead, but all the examples I've seen chunk the read into byte buffers (usually 1024 or 2048), apply an encoding, and loop from there. I'm not sure how I'd apply that to what the Amazon service is expecting, since the way it's constructed above it seems to be expecting the content all of a piece. Unless that FeedContent.Position var means more than I think it does...

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Night Shade posted:

So if instead you were to do:
code:
Dim response As SubmitFeedResponse
Using fs As Stream = File.OpenRead(xml)

request.FeedContent = fs

request.ContentMD5 = MarketplaceWebServiceClient.CalculateContentMD5(request.FeedContent)
request.FeedContent.Position = 0

response = service.SubmitFeed(request)

End Using
both the MD5 calculation and the web service client should read the data directly out of the file on disk.

I suspect the examples you're referring to describe the manual process of converting stream data to and from text, which almost nobody ever does because StreamReader and StreamWriter exist and which you don't have to do anyway because you're handing all of the work off to the Amazon library.

e: vb may be bogus, I'm a C# guy
e2: The using block is important so that you close the file when the Amazon lib is done with it, regardless of stuff otherwise breaking

Huh, I had really hoped you were onto something there, but alas, OutOfMemoryException strikes again when I use the exact code you posted above.
code:
Dim response As SubmitFeedResponse
Using fs As Stream = File.OpenRead(xml)
 request.FeedContent = fs

 request.ContentMD5 = MarketplaceWebServiceClient.CalculateContentMD5(request.FeedContent)
 request.FeedContent.Position = 0
 request.FeedType = feedType

 response = service.SubmitFeed(request) '<-- error on this line
End Using
Here's the error log deets:
code:
Exception type: MarketplaceWebServiceException 
Exception message: Exception of type 'System.OutOfMemoryException' was thrown.
 at MarketplaceWebService.MarketplaceWebServiceClient.Invoke[T,K](IDictionary`2 parameters, K clazz)
 at MarketplaceWebService.MarketplaceWebServiceClient.SubmitFeed(SubmitFeedRequest request)

Exception of type 'System.OutOfMemoryException' was thrown.
 at System.Net.ScatterGatherBuffers.AllocateMemoryChunk(Int32 newSize)
 at System.Net.ScatterGatherBuffers.Write(Byte[] buffer, Int32 offset, Int32 count)
 at System.Net.ConnectStream.InternalWrite(Boolean async, Byte[] buffer, Int32 offset, 
 Int32 size, AsyncCallback callback, Object state)
 at System.Net.ConnectStream.Write(Byte[] buffer, Int32 offset, Int32 size)
 at MarketplaceWebService.MarketplaceWebServiceClient.CopyStream(Stream from, Stream to)
 at MarketplaceWebService.MarketplaceWebServiceClient.Invoke[T,K](IDictionary`2 parameters, K clazz)
The interesting thing is I had stumbled on a similar solution earlier, the only difference being mine didn't use the Using... block, but the error is identical, right down to ScatterGatherBuffers. I'm not sure what's going on here, other than the stream is obviously being buffered in its entirety instead of streamed. Here's an example where someone chunks an upload too big for WebClient, but it's so specific and so much goes on in the While... loop that I can't see how I'd get it to work with the Amazon SubmitFeed(request) model:
http://blogs.msdn.com/b/johan/archive/2006/11/15/are-you-getting-outofmemoryexceptions-when-uploading-large-files.aspx
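
For reference, the fix in that linked post boils down to turning off write buffering on the HttpWebRequest; that buffering is exactly the ScatterGatherBuffers allocation in the stack trace. Whether the Amazon client exposes a hook for it I can't say, so this is a plain HttpWebRequest sketch with a made-up URL:
code:
Dim req = CType(System.Net.WebRequest.Create("https://example.com/upload"), System.Net.HttpWebRequest)
req.Method = "POST"
req.AllowWriteStreamBuffering = False 'don't buffer the whole body in memory
req.ContentLength = New IO.FileInfo(xml).Length 'length known up front, so no buffering needed
Using fs = IO.File.OpenRead(xml), rs = req.GetRequestStream()
    fs.CopyTo(rs) 'streams the file to the wire in small chunks
End Using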

EDIT-Hmm thought I could mess with MarketplaceWebService.Model.ContentType but the only choice is OctetStream

EDIT EDIT-I'm poking around in the ObjectExplorer and I've found a MarketplaceWebService.Attributes.RequestType, which can be Default, Mixed, or Streaming but I think it's an internal book-keeping thing.

Scaramouche fucked around with this message at 03:59 on Jul 2, 2014

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This might be an algorithmic question instead of a .NET question, but I'm doing it in VB.NET so I figured I'd ask here.

I've got to 'sanitize' a couple hundred thousand product titles. These titles were constructed by a relatively simple algo created years ago that was basically:
(company name) + (quality) + (title) + (optional dimension) + (optional size)

The problem being, this algo has been applied to several suppliers and manufacturers, and not all of them are consistent. Upon reflection, and uploading products to a new platform, we're seeing some hinky names like (plusses added to show which part is which):
"Butts Corp + 14k Yellow Gold + Yellow Gold Back Scratcher + + 8 inches"

The problem being, some vendors are including the quality in the title, and it's leading to redundancy. I've already figured out how to replace the ones that are exactly the same (e.g. "Butts Corp Sterling Silver Sterling Silver Butt Plug 5 inch diameter") by doing a relatively simple regex/matchcollection. However, in the example above, the quality ("14k Yellow Gold") is not an exact match with the quality in the title ("Yellow Gold Back Scratcher"). So what I'm wondering, before going down a long road of split and for...next, is there an elegant way to match and replace fragments of Quality against Title, deleting on match so I only have one Quality and a modified Title? Ideally the result would look like:
"Butts Corp 14k Yellow Gold back Scratcher 8 inches"

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Ithaqua posted:

You might want to look into NUML (http://numl.net/) for this one. It's pretty neat. Basically, you give it a bunch of examples of clean data, a bunch of examples of unclean data, and then run it against your real dataset and let it sort things into two buckets. You'll still have to clean it up manually, but at least it'll help you identify the items that need human love.

Thanks for the reply; I've actually been whacking away at this. I looked into numl and agree that it's pretty neat, but it doesn't work super good for me since I don't really have a list of 'good' values, I just have a list of values. I did this quick hack that I think is going to get me 80% of the way there though:

code:
strTitle = FuckedUpTitle
'split on spaces and drop duplicate words, keeping first-occurrence order
Dim strDistinct = strTitle.Split(" "c).Distinct()
strFixed = String.Join(" ", strDistinct.ToArray())

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Newf posted:

For that level of responsiveness I think client-side solutions are worth checking out. tinysort is a pretty rad js library for sorting dom elements via custom search functions (which can be attached to the keyup event of your favorite search box). Bottom of the page there has it working with a table.

I'll reiterate tinysort is really good at this. We've implemented it literally dozens of times.

Also thanks in general to everyone else for the Xamarin talk too; we're thinking of taking the plunge on it so it's good to hear from people who have experience with it.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Is this an internal app or a public facing one? A lot of the "never use code behind" advice is because of speed concerns, AXD chicanery, etc. I don't really get the hate for it myself; I've built enterprise scale sites (e.g. 1 million uniques a day) that use code-behind and not really had a problem with it. There's some session management and program flow gotchas, but once you learn to work around them it's really not a big deal.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This Post Sucks posted:

Hey everyone, I hope this is the right place for this. I've got an issue that I'm banging my head against. We've got a big project that's supposed to deploy this Tuesday, and over the last two weeks I've been getting the following message on our QA box. The biggest thing is that it seems like random sections will break. I can do a redeploy and the error may fix in a previously broken section, yet break in another one. Furthermore, I did a deploy on Thursday; Friday everything worked, then I came in this morning and a section that was working on Friday broke.


After enabling Fusion logging, I get the following Assembly Load Trace:


I can't find any explicit calls in the project that reference System.Web.Razor. I've tried several different solutions that I've come across, but none have seemed to work.

The variable and seemingly random nature of the error is what is really bothering me. Anyone have any ideas?

Thanks!

Edited for more info: Using MVC4, IIS7.5

This could be a lot of things. Is there a reference to Razor in your project (Solution > Project Name > References; you may have to hit the 'Show All Files' button)? And if so, is it valid? If it's not there, what happens if you add it?

The second thing that springs to mind: can you find a reference to Razor in any of machine.config, aspnet.config, or web.config?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Yeah, every time I see columns with Something1, Something2, Something3 that usually indicates a normalization problem.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This is a relatively simple question, but I can't seem to google-fu my way to an answer. I'm reading a CSV file like so:
code:
Dim afile As FileIO.TextFieldParser = New FileIO.TextFieldParser("c:\" & filename)
Dim csv As String()
afile.TextFieldType = FileIO.FieldType.Delimited
afile.Delimiters = New String() {","}
afile.HasFieldsEnclosedInQuotes = False

Do While Not afile.EndOfData
 csv = afile.ReadFields
 If csv(0) <> "Col1" AndAlso csv(0) <> "" AndAlso csv(0) <> "LOL I'M A DICKBAG AND DON'T UNDERSTAND CSV SUPERFLUOUS TITLE" Then
  'stuff with csv(0) and csv(1) and so on
 End If
Loop
Except the CSVs I have to read are made by a dickbag who doesn't understand what the gently caress they are for and has this beautiful setup:
code:
Col1,Col2,Col3
Value1,Value2,Value3
(lots of data)
(empty line break)
LOL I'M A DICKBAG AND DON'T UNDERSTAND CSV SUPERFLUOUS TITLE
Value1,Value2,Value3
(lots more data)
My code works up until the empty line, then it stops parsing and won't touch anything that comes after. At first I thought 'aw crap, it's treating the new line as EndOfData', but some research shows that this apparently is not the case with TextFieldParser:
http://social.msdn.microsoft.com/Fo...orum=vblanguage

According to the documentation I can find, TextFieldParser should keep going until EOF, since it apparently ignores empty lines. I had thought my IF statement above would skip the offending line(s) and keep beavering on, but apparently not. Has anyone run into something like this and have a solution?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

ljw1004 posted:

It looks like TextFieldParser doesn't actually buy you much in this situation, and is more trouble than it's worth. I'd do it like this:

code:
Using afile As New IO.StreamReader("TextFile1.csv")
    afile.ReadLine() ' discard the first row, which contains column headings
    While Not afile.EndOfStream
        Dim line = afile.ReadLine()
        Dim csv = line.Split(","c)
        If csv.Count <> 3 Then Continue While
        Console.WriteLine("{0}...{1}...{2}", csv(0), csv(1), csv(2))
    End While
End Using
You'll need some heuristic for how to detect column headings, and blank lines, and subtitles. In your code your heuristic was to look for the exact content. I picked different heuristics in the hope that they'd be more general (i.e. wouldn't need you rewriting your code when the CSV starts looking slightly different).

Thanks for your speedy response; I was just heading out of the office last night when I wrote that (it was 9pm my time) and haven't had a chance to revisit til now.

That's... kind of the conclusion I came to myself after doing some more reading: if I'm dealing with unstructured data, I might as well not be using a structured reader like ReadFields. What makes this data even better is that the number of columns isn't actually fixed, which is why I had to go with the string-literal approach I was taking. I'm hoping it'll still work moving forward, and luckily converting to ReadLine doesn't really affect the core logic of what I'm doing with the columns, just the While Not loop that reads them in.
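
Since the column count varies, a hedged tweak on ljw1004's heuristic above: treat any line that doesn't split into at least two fields as a blank line or a subtitle, instead of matching exact strings:
code:
While Not afile.EndOfStream
    Dim csv = afile.ReadLine().Split(","c)
    'blank lines and the one-field dickbag subtitle both fail this test
    If csv.Length < 2 OrElse csv(0) = "Col1" Then Continue While
    'stuff with csv(0), csv(1), and however many columns this row happens to have
End While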

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I know why this is happening, but I thought I'd pick youse guys brains on how to possibly get around it. I'm grabbing some images, resizing them, copying them somewhere, and then deleting them. The code I do it with is like so:
code:
Dim myClient As New System.Net.WebClient
Dim dsImagesToGet As DataSet 'filled by the SQL that grabs the image rows
For Each dr As DataRow In dsImagesToGet.Tables(0).Rows
  Dim fn As String = RegexFound("\/([A-Z0-9]+\.JPG)", dr("first_image"))
  Dim fna As String = RegexFound("\/([A-Z0-9]+a\.JPG)", dr("second_image"))

  myClient.DownloadFile(dr("first_image"), "d:\temp\images\small\" & fn)
  myClient.DownloadFile(dr("second_image"), "d:\temp\images\alt\" & fna)
  Dim source As New System.Drawing.Bitmap("d:\temp\images\small\" & fn)
  Dim target As Bitmap = ResizeImage(source, 1000, 1000) 'External function that does resizing
  target.Save("d:\images\pub\1000\" & fn, System.Drawing.Imaging.ImageFormat.Jpeg)
  File.Copy("d:\temp\images\alt\" & fna, "d:\images\pub\alt\" & fna)
  File.Delete("d:\temp\ruby\images\small\" & fn)
  File.Delete("d:\temp\ruby\images\alt\" & fna)
Next
Everything works except for the last step of deleting the file, where I get "access denied". I've run into this before when dealing with CSV/TXT/XML files I've created in the past; they basically get locked until the entire process is over. I can set up a scheduled task to clear the directory at (x) PM every night, but I was wondering, is there a way to do it inside the For...Next loop without having to make another one?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Essential posted:

Is it just the first File.Delete that has the access denied? I wonder if it's the ResizeImage() that's locking that file. I'm almost positive (I don't have the code in front of me at the moment) that I've called webclient.downloadfile and then deleted the file after moving it. Can you wrap the webclients in using statements to make sure they get disposed?

Or possibly you need to dispose target first?

I'm pretty sure any object that touches the images has to be disposed before you can delete. If that's the case, then one (or more) of those objects is what's locking the file.

I'm looking into this now since I have to do this with another image source that already has images of the right size. However, that leads to:
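
For reference, here's roughly what the disposal fix looks like, assuming ResizeImage returns a new Bitmap that doesn't hang onto the source; GDI+ keeps the source file locked until the Bitmap is disposed, which is what produces the access denied:
code:
Using source As New System.Drawing.Bitmap("d:\temp\images\small\" & fn)
    Using target As Bitmap = ResizeImage(source, 1000, 1000)
        target.Save("d:\images\pub\1000\" & fn, System.Drawing.Imaging.ImageFormat.Jpeg)
    End Using
End Using
File.Delete("d:\temp\images\small\" & fn) 'handle released, delete works now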

Another Question for .NET Experts

I'm downloading these image files using a webclient as above, however this is an 'image server' that I have to give parameters to like so:
code:
myClient.DownloadFile("http://blah.com/imgsrc/" & filename & "?wid=1000&hei=1000", "d:\temp\images\1000\" & filename)
However someone on the other end is being a clever dick. If the file doesn't exist, it returns an 'image not available' image called missimg.jpg@wid=1000&hei=1000.

The problem being, I don't want those crappy images. And given how myClient.DownloadFile works, they're going to be invisibly renamed to be legit images, since the second parameter provided is what filename to save it as. Is there a way to check what filename is being provided after the Address gets downloaded, but before the Filename gets saved? Kind of hijack it in the middle as it were.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Sedro posted:

Pull the filename out of the response header, examples here.

Yeah, that's definitely looking like a thing I'll be checking out. I ran it through Fiddler to see what's coming back, and those sneaks are using a 302:
code:
HTTP/1.1 302 Found
Date: Fri, 12 Sep 2014 01:09:42 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 4.0.30319
Location: /images/missimg.jpg?wid=1000&hei=1000
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 192

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/images/missimg.jpg?wid=1000&amp;hei=1000">here</a>.</h2>
</body></html>
So theoretically this will catch it:
code:
Using myClient
 myClient.OpenRead("http://blah.com/imgsrc/" & fn & "?wid=1000&hei=1000")
 Dim strHeader As String = myClient.ResponseHeaders("status-code")
 If Not strHeader.Contains("302") Then
  'do file saving stuff
 End If
End Using
The only thing I'm not sure of is how to peel the file off the WebClient OpenRead without making another DownloadFile/request operation; something to do with StreamReader maybe.

Scaramouche fucked around with this message at 02:28 on Sep 12, 2014

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Sedro posted:

C# code:
using (var contents = webClient.OpenRead(...))
using (var fs = File.Create(@"\path\to\file"))
{
    contents.CopyTo(fs);
}
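
(A VB rendering of the snippet above for the thread's dialect; my translation, not Sedro's, with the URL shape from the earlier post:)
code:
Using contents = myClient.OpenRead("http://blah.com/imgsrc/" & fn & "?wid=1000&hei=1000")
    Using fsOut = File.Create("d:\temp\images\1000\" & fn)
        contents.CopyTo(fsOut) 'streams straight to disk; the whole file is never in memory
    End Using
End Using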

You guys are way too good to me. That took care of that problem. However, that leads to

Another .NET problem (or possibly an sql problem)

I'm downloading these images and then updating the database with them so the item knows what its image URL is. However, I'm getting this message around halfway through*:
code:
System.Data.OleDb.OleDbException: Not enough storage is available to complete this operation
On the line that executes my stored proc, which is basically an InsertOrUpdate ItemID,ImageURL1,ImageURL2 kind of call. I'm surprised this is happening, because I'm actually batching the statements per 1000 like so:
code:
Dim sql As New StringBuilder
Do While Not afile.EndOfStream
 Dim line = afile.ReadLine()
 Dim csv = line.Split(","c)
 '0 - ItemID
 '1 - Image1
 '2 - Image2

 If (stuff that confirms quality of the row in the csv) Then
  counter += 1
  sql.AppendLine("exec InsertOrUpdateThinger '" & Trim(csv(0)) & "','" & Trim(csv(1)) & "','" & Trim(csv(2)) & "'")
    If counter >= 1000 Then
     ExecuteSql(sql.ToString)
     counter = 0
     sql = New StringBuilder
    End If
 End If
Loop
ExecuteSql(sql.ToString)
So the idea being, the StringBuilder can only get up to 1000 rows before it gets sent off to the ExecuteSql function (where the error occurs), and the ExecuteSql after the loop will only catch the last <1000 rows. Is 1000 too much, or is this somehow occurring in the same scope/operation despite the batching? The InsertOrUpdate proc is a basic one, like (from memory) select id from table where id=@id, if @id is not null then update blah blah blah.

* Halfway being about 10,000 rows; there's only about 25,000 rows in the database >total<
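
Side note, hedged: if the giant concatenated batch turns out to be the culprit, one alternative is a single parameterized command reused per row (the parameter names are guesses at the proc's signature, and cnn is an assumed already-open connection). It also removes the quoting/injection hazard of building SQL by concatenating Trim(csv(...)):
code:
'assumes Imports System.Data and Imports System.Data.OleDb
Using cmd As New OleDbCommand("InsertOrUpdateThinger", cnn)
    cmd.CommandType = CommandType.StoredProcedure
    cmd.Parameters.Add("@ItemID", OleDbType.VarChar, 50)
    cmd.Parameters.Add("@ImageURL1", OleDbType.VarChar, 255)
    cmd.Parameters.Add("@ImageURL2", OleDbType.VarChar, 255)
    Do While Not afile.EndOfStream
        Dim csv = afile.ReadLine().Split(","c)
        'same row-quality checks as above would go here
        cmd.Parameters(0).Value = Trim(csv(0))
        cmd.Parameters(1).Value = Trim(csv(1))
        cmd.Parameters(2).Value = Trim(csv(2))
        cmd.ExecuteNonQuery()
    Loop
End Using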

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Odd question for you guys. I have to classify and create text based on a relatively complex table. The table is the AGTA gem treatment manual (found here http://www.firemountaingems.com/encyclobeadia/beading_resources.asp?docid=8930 under the "Introduction of the Gemstone Information Chart" section partway down the page). As you can see, there are 7 columns and about 140 rows. The workflow is thus:
1. Product comes in
2. Extract gem info from product (products can have more than one kind of gem)
3. Based on the gem, present treatment tag, explanation of treatment, and any special care instructions
4. If gems>1 goto 3 until finished
5. Append string of tags, explanations, and treatments to product description

I started doing it badly, basically if gem="ruby" then description.append("Gem: Ruby, Treatment: H, Explanation: Many gems are heated to enhance their color and intensity., Special Care: None") and so on and so on. I got about 5 gems into this before realizing that it sucks and I hate it.

I consulted a co-worker who suggested putting the whole table into the database, then at step 3 querying it from session when it's present and loading it into session when it's not.

I didn't want to do pure database because I want to avoid the roundtrip. This is going to be a bulk operation, with thousands of products being processed at a time. Is his solution reasonable, or do you guys feel there's a better way? I'm using VS2012 and vb.net.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Thanks for the replies guys; I think I either gave too much information or too little! If I were to break down my question into the abstract it'd be: "If I need to refer to a static series of values, is it better to have one database hit and store the table in session, or should I make a database call every time? (keeping in mind that the table data doesn't change)"

I have to do this in real time because I'm parsing over 500,000 pieces of jewelry per day from EDI, csv, XML, our own database, etc. etc. In about 50% of the cases the AGTA information isn't included, so I have to make assumptions using the table. The application is not on the same server as the database, and the job is already so big it's chunked up by vendor, with each vendor taking about 40 minutes to process. This is why I'm trying not to introduce more time into the operation.

By doing it badly I mean typing out 140 if/then/else or case statements, even though the actual number of treatments is only about 20 or so. In the short term I did a database-only solution last night; we'll see how it performs.
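
A minimal sketch of the load-once option, assuming the job can live with a Shared cache instead of Session (the names, the value format, and the Shared placement are all mine). Lazy(Of T) is thread-safe by default, so concurrent vendor chunks would share the single load:
code:
Private Shared ReadOnly GemInfo As New Lazy(Of Dictionary(Of String, String))(
    Function()
        Dim map As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        'single database hit here to fill map from the AGTA table, e.g.
        'map("ruby") = "Gem: Ruby, Treatment: H, Explanation: ..., Special Care: None"
        Return map
    End Function)

'per product, no round trip:
Dim info As String = Nothing
If GemInfo.Value.TryGetValue(gemName, info) Then description.Append(info)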

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

mortarr posted:

- PageExtractor comes in x86 and x64 versions, however we need to use the x86 version because OpenXmlSdk is x86 and there is no x64 version, and both assemblies are used in the same VS project.

Can't say much without knowing the exception, but based on my past experience with 3rd party apps something is going on here.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I think this is possible, but I was wondering if you guys could give me a direction to get started. I've got a large number of images provided by a third party. Some of them are either contrasty, adjusted poorly, or what have you. This means that backgrounds that should be white, are instead kind of an off-white. Here's an example image:


So what I'd like to be able to do, programmatically, is 'fix' these images so the white is actually RGB 255 white. Do you guys have any suggestions or libraries that you think are up to this?

Note: I don't have to be able to >find< them, I have another service that does this. I'd be inputting a big list of file names with weird backgrounds into whatever solution I develop.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Destroyenator posted:

If this is a one-off batch conversion I'd probably go Python or similar if you're comfortable with it, because there tend to be simpler libraries for this sort of thing.

If you are going .Net, I'd just fire up nuget, search for image processing, filter by popularity and start googling library names until I found one that looked right.

The other thing to consider is whether the "blended" bits where the background meets the item are going to look okay. Try paint bucket fill on a few with the more "off" backgrounds and see if it's noticeable. If you need those bits converted properly I think you might be heading into scripting photoshop/gimp territory, unless there's a library that explicitly does this.

Yeah, I was hoping that I could avoid a bucket-fill style thing and instead do a levels adjustment or something similar, e.g. set the white point/black point. I've tinkered with straight-up colormap replacement and it looks kind of assy, especially when you need to color a range of values.
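
For the levels-style route in plain System.Drawing, a crude sketch: scale each channel so the measured background value maps to 255 (the 245 and the paths are made up; GDI+ clamps anything that overshoots):
code:
'assumes Imports System.Drawing and Imports System.Drawing.Imaging
Dim scale As Single = 255.0F / 245.0F '245 = the off-white background value you measured
Dim cm As New ColorMatrix(New Single()() {
    New Single() {scale, 0, 0, 0, 0},
    New Single() {0, scale, 0, 0, 0},
    New Single() {0, 0, scale, 0, 0},
    New Single() {0, 0, 0, 1, 0},
    New Single() {0, 0, 0, 0, 1}})
Dim attrs As New ImageAttributes
attrs.SetColorMatrix(cm)
Using src As New Bitmap("d:\images\offwhite.jpg")
    Using dest As New Bitmap(src.Width, src.Height)
        Using g As Graphics = Graphics.FromImage(dest)
            g.DrawImage(src, New Rectangle(0, 0, src.Width, src.Height),
                        0, 0, src.Width, src.Height, GraphicsUnit.Pixel, attrs)
        End Using
        dest.Save("d:\images\fixed.jpg", ImageFormat.Jpeg)
    End Using
End Using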

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

mortarr posted:

Not sure if this does what you're after, but I've been using ImageResizer for about two years commercially and found it to be pretty good. I'm using it for one-off resizing of A1 and A2 colour pages taken out of pdf and tiff documents, which is probably way outside what it was intended for, but it's pretty sweet and you can override quite a lot of the processing pipeline if you need custom features.

There is a paid-for version that has a plugin to do white balance adjustment which might be what you need. The source is available though, so check out \Plugins\AdvancedFilters\AutoWhiteBalance.cs in the zip.

From the docs:
Automatic white balance (alpha feature of 3.3.0)

Automatically corrects the white balance of the photo with 1 of 3 algorithms.
- area - Threshold is applied based on cumulative area at the lower and upper ends of the histogram. Much larger thresholds are required for this than SimpleThreshold.
- simple - Simple upper and lower usage thresholds are applied to the values in each channel's histogram to determine the input start/stop points for each individual channel. The start/stop points are used to calculate the scale factor and offset for the channel.
- gimp - Threshold is applied based on strangely skewed cumulative area, identical to the process used by GIMP.

Thanks for the insight guys. I wanted to do this programmatically because I'm already doing some other processing on them (resizing, cropping, etc.) when I download them from the supplier so I figured it made sense to keep it all in one process as it were. However, mention of using an actual program to do it made me realize that I already have a (free!) program sitting on my hard drive that can handle this: FastStone Photo Resizer. It's only about one in 100 images that need processing, so I guess I can use a program to do it manually for now. If this gets any more complicated though I'm definitely going to be checking out imageresizing.net.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Newb question:
VS2012 on a basic web project

I've got a solution (let's call it Helper) that's been running on my local workstation for the last 18 hours, on the local IISExpress worker process. As you can probably guess from my earlier questions, it's doing a bunch of image downloading and manipulation (370,000 images to be exact). I'm doing this on my local because it's a one-time thing and I thought it would finish overnight. The thing is, I want to run/build Helper again, in a separate instance that won't blow up/stop my huge image download (which has at least another 14 hours to go). How can I run another instance of the same project, build it, and not have it interfere/mess with the existing and running version? Due to the chancy nature of losing 18 hours of downloading if I'm wrong, I'm kind of scared to guess and test on this one. These are possibilities I've thought of in order of ugliness:
1. Just build it again in the same instance of VS; IIS is smart enough to not shut down an existing running process.
2. Open another instance of VS, open the solution, and VS will be smart enough to keep them separate
3. Copy entire solution folder somewhere else and rename it and open as solution in VS (uggh the RCS implications)
or
4. Man up and just RDP into the development server, get latest from Git, and go from there.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Thanks guys, looks like I was right to be paranoid instead of diving right into 1 or 2.

It's running as a web app because this is kind of my swiss-army knife utility for functions related to a particular web site, and I like being able to instantiate events via URL (e.g. /helper.aspx?op=update-inventory). While this batch is a one-off, the idea is that it'll run iteratively in future, and only have to handle 200-300 images a day (product images for ecommerce). The way it's set up now is a manufacturer can launch new products and they'll automatically be created for the site and uploaded to 8 different marketplaces within a few hours. The downside is when a vendor decides to launch 100,000 new products all at once.

I can come back if it's interrupted; it's downloading images based on an incremental id, so worst came to worst I could just look at the last id processed and jump on from there. I knew it was going to be long running, but not this long running. I also might have mis-estimated the timing: based on where it's at now there's still about 18 hours to go, which means it's time to go home and then bite the bullet tomorrow by using the dev server.

EDIT-For anyone curious, it just finished now.

Scaramouche fucked around with this message at 01:06 on Oct 18, 2014

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This is more of an academic question, but what's up with String.Format("0.00") versus Math.Round? I've got a weight value that I always want to underreport/round down a little bit, so I've been using:
code:
Format(SomeDecimal,"0.0")
I thought it would just truncate x.xx to x.x without getting all mathematical. Instead it appears to be doing midpoint rounding away from zero, as demonstrated here:
http://stackoverflow.com/questions/2226081/why-does-net-use-a-rounding-algorithm-in-string-format-that-is-inconsistent-wit

This means that say, 2.77 ends up being 2.8 instead of the 2.7 I want. Is there a consistent way to do this? .ToString("0.#")?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Bognar posted:

You could always cheat: Format(((int)(SomeDecimal * 10))/10, "0.0").

Yeah, as crashdome says, if I'm going to be messing with multiply/divide I'd just do Math.Truncate(SomeDecimal * 10)/10, which is apparently what I'll have to do. Just seemed weird because "Format" to me implies that no math would be going on. Lesson learned I guess; the whole point of the exercise is that I'm converting from pennyweight, pounds, troy ounces, grains, whatever the supplier provides. Since it involves precious metals I always want to be rounding down instead of up.
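
Spelled out, that truncate-then-format approach would be something like this sketch (Math.Truncate heads toward zero, which for positive weights means always down):
code:
Dim truncated As Decimal = Math.Truncate(SomeDecimal * 10D) / 10D '2.77 -> 2.7
Dim display As String = truncated.ToString("0.0")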

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

The Wizard of Poz posted:

Has anyone here tried using WP.NET (WordPress running under .NET)? The website makes it sound pretty good but they have virtually no concrete information on how it works or how to develop in C# for it or anything.

That's weird, other than the original press release there's like literally zero discussion of it. I'm running WP on IIS for a few projects, but it's all php on Windows.

It looks like it's just the WP code compiled by Phalanger:
http://www.php-compiler.net/

I guess if you wanted you could just get Phalanger and the latest WP and see if it works for yourself, since that's apparently all they're doing.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I would too actually, wouldn't mind getting rid of php on that machine entirely, mostly for performance reasons. The only customizing we do is CSS/style, though I guess the occasional change to header.php and functions.php would have to be re-compiled in.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

epalm posted:

The "help file" that ships with our software is a directory containing html files, which comes from an MS Word document -> Save As Web Page :stare:

Part of the reason why our documentation isn't great is because one guy here "owns" that word document. It would be way better if it was public (within the company), and a group of people could edit it, and it would have edit versioning, and so on. Sounds like a wiki to me.

IIS is a dependency of our software, so I'm looking into .NET wikis and CMSs. Anyone have any suggestions? Something like http://userguide.tomecms.org/ looks like it might fit the bill, but I have no idea what's "good".

How do you provide Documentation / User Guides to your clients via the .NET stack?

They were used for internal documentation (e.g. Customer Support scripts, company policies, holiday calendaring, release notes, etc.) and not customer facing, but I've used ScrewTurn Wiki quite a few times in the past:
http://www.microsoft.com/web/screwturnwiki/

You say IIS is a dependency, which implies enterprise stuff though; I'm not sure lumping a whole CMS/wiki package into the install wouldn't create headaches in terms of a security audit (e.g. what happens if an exploit is found in STW?).

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Ah I get you. Then ScrewTurn Wiki might not be the solution; it doesn't have native export support:
http://www.wikimatrix.org/show/screwturn-wiki

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I think I see your confusion; you're struggling with the idea that each button could potentially be a submit and how to handle them independently? What I'm wondering is, why have a button for each one and the round-trip that implies? You could have checkboxes or radios instead (with the default set to approve/reject depending how often each comes up). Then the form can handle each box/id in a FOR...EACH operation with only one submit.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Che Delilas posted:

Mostly I'm dealing with people who I know really struggle with anything computer. I didn't want to introduce an extra step, and "push button, thing happen" is as simple as I came up with. Radio buttons probably are a better plan though.

I got you. If you're still married to the idea, you could fake it out with link buttons. I have no idea how 'MVC' that is, but instead of doing a full submit just have a link button with href="blah.aspx?userid=xx&approve=1" and then let the code behind process it. Insecure as hell and messy to boot.
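
The code-behind side of that hack would be something like this sketch (and again: anyone can forge the URL, so don't ship it without an auth check):
code:
Protected Sub Page_Load() Handles Me.Load
    Dim userId As String = Request.QueryString("userid")
    If userId IsNot Nothing AndAlso Request.QueryString("approve") = "1" Then
        'approve userId here, then redirect to clear the query string
        Response.Redirect("blah.aspx")
    End If
End Sub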

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Gul Banana posted:

very nice! nothing quite as exciting as record classes, but I'll be making immediate use of half this stuff. how on earth did you solve the memcpy problem to allow for parameterless value type constructors?

Is it sad that I'm excited for multi-line string literals in VB? Man maybe it's time to switch over.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Any of you guys work with AWS S3? I'm uploading about a million images using this pretty straightforward code (VB) and the AWS SDK (2.3.8.1):
code:
Dim s3Client As IAmazonS3 = New AmazonS3Client("AWSAccessKey", "AWSSecretKey", RegionEndpoint.USWest2)
Dim files = Directory.EnumerateFiles("c:\tonsofimages\", "*.jpg")
For Each filename As String In files
   Dim reqUp As New PutObjectRequest()
   reqUp.BucketName = strBucket
   reqUp.Key = Path.GetFileName(filename)
   reqUp.FilePath = filename
   s3Client.PutObject(reqUp)
Next
It worked for probably 500,000 or so until I got the dreaded "The request has been aborted". This is apparently quite common when dealing with S3 and large transactions. What I'm wondering is, is there a way to upload only the files that aren't there? From what I've read so far S3 is kind of a 'write once read many' system, which means there's no hierarchical data on the buckets; there's a one-way relationship where the object 'knows' the bucket that owns it, and that's it. What I'm hoping to do is:
1. Get enormous list of files (enumerate *.jpg)
2. Check if the object exists in S3 bucket >without downloading the entire object<
3. If not, upload

What I'm running into trouble with is the second step. There's a couple of solutions in this SO question but I'm not comfortable using exception handling as program flow:
http://stackoverflow.com/questions/8303011/aws-s3-how-to-check-if-specified-key-already-exists-in-given-bucket-using-java
or
https://forums.aws.amazon.com/message.jspa?messageID=219046

I think I can use S3FileInfo.Exists (and man was this a pain to hunt down):
http://docs.aws.amazon.com/sdkfornet1/latest/apidocs/html/M_Amazon_S3_IO_S3FileInfo__ctor.htm

But what's not clear to me is, when I'm instantiating it, e.g.:
code:
Dim s3FI As New S3FileInfo(s3Client, strBucket, filename)
   If s3FI.Exists Then
    'Stuff
   End If
Am I downloading the whole object? Just metadata? Because I don't want to download a million files while uploading a million files. Any S3-pro-bros out there have any idea?

EDIT-Sorry, just to clarify: I know my approach above will work, but I'm wondering if it's the most efficient when dealing with a million objects. For example, if it keeps failing and I end up doing more exists-checks than uploading, should I maybe keep a local log of what uploaded successfully and check against that? Of course that list will get to about 90mb assuming a million lines of text.

Scaramouche fucked around with this message at 22:46 on Nov 19, 2014

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Scaramouche posted:

Any of you guys work with AWS S3? I'm uploading about a million images using this pretty straightforward code (VB) and the AWS SDK (2.3.8.1):
code:
Dim s3Client As IAmazonS3 = New AmazonS3Client("AWSAccessKey", "AWSSecretKey", RegionEndpoint.USWest2)
Dim files = Directory.EnumerateFiles("c:\tonsofimages\", "*.jpg")
For Each filename As String In files
   Dim reqUp As New PutObjectRequest()
   reqUp.BucketName = strBucket
   reqUp.Key = Path.GetFileName(filename)
   reqUp.FilePath = filename
   s3Client.PutObject(reqUp)
Next
EDIT-Sorry, just to clarify: I know my approach above will work, but I'm wondering if it's the most efficient when dealing with a million objects. For example, if it keeps failing and I end up doing more exists-checks than uploading, should I maybe keep a local log of what uploaded successfully and check against that? Of course that list will get to about 90mb assuming a million lines of text.

I know you guys had lots of info about this but were just pulling for me behind the scenes to figure it out on my own, and here's what I figured out: I was expecting S3 to be like a database, but it is not a database. So instead I'm making a database table of all the filenames with an uploaded bool and a modified date, and using that to direct my upload/modify operations instead of using the minimal S3 tools to do it. You certainly >can< do it as I do above, but it starts to get unwieldy at around 300,000 objects or so. The best way I've found is to get a local copy of your list (either via ListObjectsRequest or your own database) and work from there; with the limited hardware/bandwidth I'm currently using, it's not really worth it to do it all in S3. Luckily this builds in iterative updating for the future as well.
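
For anyone landing here later, the "local copy via ListObjectsRequest" loop looks roughly like this against SDK 2.x (a sketch; S3 returns up to 1000 keys per page, and paging from the last key seen avoids relying on NextMarker, which S3 only sets when a delimiter is specified):
code:
Dim known As New HashSet(Of String)
Dim req As New ListObjectsRequest With {.BucketName = strBucket}
Do
    Dim resp As ListObjectsResponse = s3Client.ListObjects(req)
    For Each o As S3Object In resp.S3Objects
        known.Add(o.Key)
    Next
    If Not resp.IsTruncated Then Exit Do
    req.Marker = resp.S3Objects(resp.S3Objects.Count - 1).Key 'page from the last key seen
Loop
'then: upload only when Not known.Contains(Path.GetFileName(filename))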

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Dumb question but you guys are my go-to for dumb questions.

I'm building a List (of String), and I want to check if something is already in the list before adding it. However what I'm not sure of, does the .Contains method match any part of any string in the list, or only entire items? So for example if the list is currently:
code:
ruby
emerald
black-diamond
And I want to add 'diamond', will List.Contains("diamond") return true because it's part of the "black-diamond" entry? Or will it return false?
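
(For the record, Contains compares whole entries with the default equality comparer, so it returns False here; substring matching needs a LINQ Any. A quick sketch:)
code:
Dim gems As New List(Of String) From {"ruby", "emerald", "black-diamond"}
Console.WriteLine(gems.Contains("diamond"))                    'False: whole-item equality
Console.WriteLine(gems.Any(Function(g) g.Contains("diamond"))) 'True: substring match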

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Hah, thanks guys. I'm working off of mobile hotspot while switching broadband providers and my first Googling didn't give me what I wanted so I figured it'd be faster to ask here rather than wait 10 seconds for each page load.

Newf: That looks ideal, will have to see how it handles low bandwidth usage.

EDIT-In retrospect it feels like I'm spending more time checking if the list contains things than actually inserting into it. Might switch to a HashSet.

Scaramouche fucked around with this message at 05:52 on Dec 5, 2014

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This is pretty old but when I used to get that error in the past I'd follow some of it:
http://blogs.msdn.com/b/dougste/archive/2006/09/05/741329.aspx

The use of FUSELOG and FILEMON especially.

Maybe someone will contradict me but I've seen the second warning all the time and generally ignore it, if it occurs in development.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Azubah posted:

Has anyone had experience integrating SSRS reports into an MVC project? I can get the /ReportServer folder directories to show up in an iFrame and navigate to the reports, but you still have to provide login credentials and it isn't as pretty. It's also not what is expected.

According to the database guy here I should be able to use Reportviewer control to do this, but so far I've only found that it renders one report if you've provided the report name in the specific directory. A lot of the tutorials I've found assume you need to build the report and use aspx pages.

What I'd like to do is render the directories as folders and drill down to the reports like Reporting Services's Folder.aspx page using credentials already provided by the system. Unlike that page though, we don't want the report builder or folder settings to appear, only the navigation.

My only experience with SSRS is that the three times I've had to implement something with it, it turned out not to be capable of doing what we wanted and we moved to other solutions. Sorry that doesn't help, and maybe it's gotten better (this was 2008R2), but good god was it a tangle of permissions and lovely bridge code and not-quite programming but not-quite scripting.


Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Yeah, they're putting a URL literal in code but apparently don't want people to be able to find that URL by text search, maybe?
